Retrieval-Augmented Generation, or RAG, is the key technology that makes AI usable for regulated industries. General language models rely solely on their training data: they hallucinate, deliver outdated information, and provide no sources. RAG instead anchors the AI in a verified knowledge base. The result: source-based, current, and traceable answers.
This article explains the technical architecture of a RAG system for regulated industries. It is aimed at professionals who want to understand what happens under the hood when they use such a system. No computer science degree required, but some technical interest assumed.
The Core Principle
A RAG system consists of two main components: a retrieval component (search) and a generation component (text generation). The combination solves the three main problems of general language models.
Without RAG: The user asks a question. The language model generates an answer based solely on what it learned during training. No external data source, no fact-checking, no source references.
With RAG: The user asks a question. The system first searches a knowledge base for relevant documents. These documents are passed to the language model as context. The model generates its answer based on the retrieved documents and cites the sources.
The difference is comparable to that between an expert answering from memory and an expert who first consults the relevant materials and then answers. The second approach is more reliable, and its answers are verifiable.
The Retrieval Pipeline in Detail
The quality of a RAG system stands or falls with its retrieval pipeline. The better the system is at finding the relevant documents, the better the answers. A modern retrieval pipeline for regulated industries consists of several stages.
Stage 1: Document Processing and Indexing
Before the system can answer questions, the source documents must be processed and indexed. For regulated industries, this is more complex than for general applications.
Structured extraction. Laws, ordinances, and court decisions are not unstructured text. They have a hierarchical structure. A federal act consists of parts, titles, chapters, sections, articles, and paragraphs. A court decision has a statement of facts, considerations, and a ruling. This structure must be preserved during processing so that the system can search at the right level of granularity.
Chunking. Documents are divided into sections (chunks) that are small enough for efficient retrieval but large enough for sufficient context. For legal texts, the natural structure (article, paragraph, consideration) is often the best chunk boundary. Arbitrarily splitting by character count destroys coherence.
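A minimal sketch of structure-aware chunking, splitting at article boundaries rather than at fixed character counts. The regex pattern and field names are illustrative, not the actual implementation:

```python
import re

def chunk_by_article(statute_text: str) -> list[dict]:
    """Split a statute into chunks at article boundaries, preserving
    the document's natural structure instead of cutting by length."""
    # Split before lines that start with an article heading such as "Art. 1".
    parts = re.split(r"(?m)^(?=Art\.\s)", statute_text)
    return [
        {"article": p.splitlines()[0].strip(), "text": p.strip()}
        for p in parts
        if p.strip()
    ]

statute = """Art. 1
Every person must act in good faith.
Art. 2
The law applies to all legal relationships."""

chunks = chunk_by_article(statute)
```

Each chunk now carries its article heading, so retrieval can operate at exactly the granularity the legal text defines.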
Contextual embedding. Before a chunk is converted into a vector, the system generates a contextual description: which law does this article belong to? In which chapter does it appear? What is it about? This context information is prepended to the chunk and significantly improves search accuracy. An isolated paragraph reading “The tax rate is 8%” is of little use without context. The same paragraph with the context “Art. 55 of the Tax Act of the Canton of Zurich, section on income tax for natural persons” is precisely findable.
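The idea can be sketched in a few lines: before embedding, the chunk is prefixed with its document context. The function signature and separator are hypothetical; the point is that the embedded text carries its provenance:

```python
def contextualize(chunk: str, law: str, chapter: str) -> str:
    """Prepend document context to a chunk before embedding, so an
    isolated provision remains findable. Field names are illustrative."""
    return f"{law} | {chapter}\n{chunk}"

text = contextualize(
    "The tax rate is 8%.",
    law="Tax Act of the Canton of Zurich",
    chapter="Income tax for natural persons",
)
```

The embedding model then sees the statute and chapter alongside the bare provision, which is what makes the 8% figure retrievable by a question about Zurich income tax.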
Vectorisation. Each chunk (with context information) is converted by an embedding model into a high-dimensional vector. This vector is a mathematical representation of the text’s meaning. Similar meanings produce similar vectors. This enables semantic search: the system finds documents that are relevant in content, even if they use different words.
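The comparison behind semantic search is typically cosine similarity between embedding vectors. The three-dimensional vectors below are toy values for illustration; real embedding models emit hundreds or thousands of dimensions:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, the standard measure for comparing embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": two provisions about defects use different words
# but get similar vectors; an unrelated provision does not.
defects_query = [0.9, 0.1, 0.2]
warranty_text = [0.8, 0.2, 0.3]
deadline_text = [0.1, 0.9, 0.1]

sim_related = cosine(defects_query, warranty_text)
sim_unrelated = cosine(defects_query, deadline_text)
```

Documents whose vectors have the highest cosine similarity to the question vector are returned, regardless of vocabulary overlap.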
BM25 indexing. In parallel with vectorisation, each chunk is captured in a classical full-text index (BM25). This index is optimised for exact terms: article numbers, case numbers, specific technical terms, statutory abbreviations. Semantic search and exact search complement each other.
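For intuition, here is a self-contained BM25 scorer over a handful of chunks. A production system would use an inverted index (e.g. in a search engine); this sketch only shows why exact terms like article numbers score so well:

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with the BM25 formula.
    Whitespace tokenisation is a simplification for illustration."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter()  # document frequency per term
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = [
    "Art. 259a OR defects of the rented property",
    "Art. 266 OR notice periods",
    "general provisions",
]
scores = bm25_scores("Art. 259a OR", docs)
```

The chunk containing the exact article number dominates, while a chunk sharing no query term scores zero, which is precisely the behaviour semantic search cannot guarantee.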
Stage 2: Hybrid Search
When the user asks a question, two searches run in parallel.
Vector search. The question is converted into a vector and matched against the vector database. The system finds chunks whose meaning is most similar to the question. Example: the question “What are the landlord’s obligations regarding defects?” finds relevant provisions in the Code of Obligations, even when those provisions refer to “warranty for defects” and “remediation.”
Keyword search (BM25). Simultaneously, the system searches the full-text index for exact matches. This is particularly important for article references (“Art. 259a OR”), case numbers of court decisions, specific legal terms, and abbreviations.
Reciprocal Rank Fusion (RRF). The results of both searches are combined using an algorithm that prioritises documents ranking highly in both searches. A document that is both semantically relevant and scores well in keyword search receives the highest rank.
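RRF itself is short enough to show in full. The constant k=60 comes from the original RRF paper; the document identifiers are made up:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists: each document earns 1/(k + rank)
    per list it appears in, and the fused list is sorted by total score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits  = ["art_259a", "art_256", "art_267"]
keyword_hits = ["art_259a", "art_267", "art_260"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

"art_259a" tops both input lists and therefore the fused list; "art_267" appears in both lists at lower ranks and still beats documents found by only one search.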
Stage 3: Reranking
The initial search casts a wide net. Reranking filters the results.
Cross-encoder. A specialised model evaluates each document-question pair in detail. Unlike vector search, which embeds documents and questions independently and then compares them, the cross-encoder considers the document and the question together. This enables a finer relevance assessment.
Context window optimisation. The number of documents passed to the language model is limited by the context window. Reranking ensures that the most relevant documents fill the available space. Less relevant documents are filtered out so they do not distract the model.
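The reranking stage can be sketched as follows. The term-overlap scorer is a deliberately crude stand-in for a real cross-encoder model; what the sketch shows is the structure: score each (question, document) pair jointly, then keep only the top candidates for the context window:

```python
def rerank(question: str, candidates: list[str], top_k: int = 2) -> list[str]:
    """Score each (question, document) pair and keep the top_k documents.
    pair_score is a toy overlap heuristic standing in for a cross-encoder."""
    def pair_score(q: str, d: str) -> float:
        q_terms = set(q.lower().split())
        d_terms = set(d.lower().split())
        return len(q_terms & d_terms) / len(q_terms)

    ranked = sorted(candidates, key=lambda d: pair_score(question, d),
                    reverse=True)
    return ranked[:top_k]

candidates = [
    "landlord obligations in case of defects",
    "notice periods for termination",
    "warranty for defects of the rented property",
]
top = rerank("landlord obligations defects", candidates)
```

Only the two most relevant chunks survive into the model's context; the off-topic chunk is filtered out before generation.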
Stage 4: Generation
The language model receives the user’s question together with the retrieved documents as context.
Instructions. The model is instructed to base its answer exclusively on the provided documents. If the documents contain no information on the question, the model should communicate this transparently rather than inventing an answer.
Source attribution. The model references the specific sources it used in its answer. Every statement is attributed to a specific document. The user sees: this information comes from Art. 259a OR, that one from BGE 135 III 345.
Structured output. For regulated applications, structured output is important. Instead of unstructured prose, the system delivers the answer organised by legal basis, summary, and source references.
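One common way to enforce such structure is a typed schema that the generation step must fill and that serialises cleanly. The field names below are an illustrative sketch, not the system's actual schema:

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class LegalAnswer:
    """Illustrative schema: answer organised by legal basis, summary,
    and source references rather than free-form prose."""
    legal_basis: list[str] = field(default_factory=list)
    summary: str = ""
    sources: list[str] = field(default_factory=list)

answer = LegalAnswer(
    legal_basis=["Art. 259a OR"],
    summary="The tenant may demand remediation of defects.",
    sources=["Art. 259a OR", "BGE 135 III 345"],
)
payload = json.dumps(asdict(answer))
```

Because every answer conforms to the same schema, downstream tooling (display, audit trail, export) can rely on the sources field always being present.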
Special Requirements for Regulated Industries
Standard RAG, as deployed in general applications, is insufficient for regulated industries. The following extensions are necessary.
Multilingual capability. In Switzerland, federal laws exist in German, French, and Italian. All three language versions are equally authoritative. A RAG system for Swiss law must be able to search across languages. A German-language question must also find the French statutory version if it is more relevant. Multilingual embedding models such as BGE-M3 enable this by mapping texts of different languages into the same vector space.
Citation graphs. Legal texts do not exist in isolation. A statutory article references other articles. Court decisions cite laws and other decisions. Ordinances specify laws in greater detail. These relationships form a graph. A RAG system for regulated industries must know and use this graph. When a user asks about a statutory article, the system should also find the relevant court decisions that interpret that article and the ordinances that specify it.
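A minimal sketch of such a graph, assuming a simple "cited-by" index: retrieval for a statutory article can then be expanded to the decisions and ordinances that reference it. The class and citation entries are illustrative:

```python
from collections import defaultdict

class CitationGraph:
    """Toy citation graph: maps each cited document to the set of
    documents that cite it (decisions, ordinances, other articles)."""
    def __init__(self) -> None:
        self.cited_by: dict[str, set[str]] = defaultdict(set)

    def add_citation(self, source: str, target: str) -> None:
        self.cited_by[target].add(source)

    def related(self, article: str) -> set[str]:
        """Documents to pull in alongside a direct hit on `article`."""
        return self.cited_by[article]

graph = CitationGraph()
graph.add_citation("BGE 135 III 345", "Art. 259a OR")  # decision interprets article
graph.add_citation("VMWG", "Art. 259a OR")             # ordinance specifies article
hits = graph.related("Art. 259a OR")
```

A question that retrieves Art. 259a OR can thus automatically surface the leading decision and the implementing ordinance as additional context.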
Versioning. Laws change. An article that was in force in a certain version in 2024 may read differently in 2026. The system must handle different versions. It must use the current version by default but also be able to retrieve historical versions on request. For cases relating to a prior legal state, the historical version is the relevant one.
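Version selection reduces to a date lookup: given a list of (effective date, text) pairs sorted by date, the version in force on a given day is the latest one whose effective date is not after it. A sketch, with made-up version texts:

```python
from bisect import bisect_right
from datetime import date

def version_in_force(versions: list[tuple[date, str]], as_of: date) -> str:
    """Return the text version in force on `as_of`. `versions` must be
    sorted by effective date (ascending)."""
    dates = [d for d, _ in versions]
    idx = bisect_right(dates, as_of) - 1
    if idx < 0:
        raise ValueError("no version in force on that date")
    return versions[idx][1]

versions = [
    (date(2020, 1, 1), "text as of 2020"),
    (date(2026, 1, 1), "revised text as of 2026"),
]
answer_for_2024_case = version_in_force(versions, date(2024, 6, 1))
```

By default the system would pass `date.today()` as `as_of`; for a case governed by a prior legal state, the relevant historical date is used instead.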
Granular retrieval. In general RAG systems, search often operates at the document level. For regulated industries, this is too coarse. The system must be able to search at the article level, paragraph level, or even sentence level. When the user asks about the limitation period for a specific claim, they need the specific paragraph, not the entire statute.
Audit trail. Every interaction with the system must be logged: the question, the retrieved documents, the generated answers, the sources used. This audit trail is essential for quality control, compliance requirements, and accountability towards clients and authorities.
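An audit record can be as simple as a timestamped, hashed entry per interaction. The field names are illustrative; the content hash makes later tampering with a logged record detectable:

```python
import hashlib
import json
from datetime import datetime, timezone

def log_interaction(question: str, retrieved: list[str], answer: str) -> dict:
    """Build one append-only audit record: question, retrieved sources,
    answer, timestamp, and a SHA-256 hash over the record's content."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "retrieved": retrieved,
        "answer": answer,
    }
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return record

entry = log_interaction(
    "What is the limitation period?", ["Art. 127 OR"], "Ten years."
)
```

In practice such records would be written to append-only storage; the sketch only shows what a complete entry has to capture.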
Quality Measurement
A RAG system is only as good as its ability to find the right documents and generate correct answers. Quality is measured by several metrics.
Recall. How many of the relevant documents does the system find? A system that finds 8 out of 10 relevant articles has a recall of 80%. For regulated industries, high recall is critical. An overlooked relevant article can distort the entire analysis.
Precision. How many of the found documents are actually relevant? A system that delivers 100 documents of which only 5 are relevant has low precision. This overwhelms the user with irrelevant information.
Faithfulness. Does the generated answer match the retrieved documents? A system that retrieves documents correctly but deviates from their content in generation has a faithfulness problem.
Hallucination rate. How often does the answer contain information not found in any of the retrieved documents? For regulated industries, this rate must tend towards zero.
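Recall and precision are straightforward set ratios over the retrieved and the truly relevant documents. Reproducing the article's 8-out-of-10 example with made-up identifiers:

```python
def recall(found: set[str], relevant: set[str]) -> float:
    """Share of relevant documents that were actually found."""
    return len(found & relevant) / len(relevant)

def precision(found: set[str], relevant: set[str]) -> float:
    """Share of found documents that are actually relevant."""
    return len(found & relevant) / len(found)

relevant = {f"art_{i}" for i in range(10)}        # 10 relevant articles
found = {f"art_{i}" for i in range(8)} | {"art_x", "art_y"}  # 8 hits + 2 misses

r = recall(found, relevant)
p = precision(found, relevant)
```

Faithfulness and hallucination rate cannot be computed from sets alone; they require comparing the generated answer against the retrieved texts, typically with an evaluation model or human review.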
Why Architecture Matters
The technical architecture of a RAG system determines whether it is suitable for regulated industries or not. A system with inferior embeddings, without BM25 search, without reranking, and without source attribution may work for general questions. For legal research, compliance review, or tax analysis, it is insufficient.
The architecture also determines where data is processed. For Swiss companies in regulated industries, the entire pipeline must run in Switzerland: embedding computation, vector database, language model, audit trail. An architecture that sends data to foreign servers for processing is not permissible for many use cases.
Enclava implements this architecture entirely in Switzerland. The platform comprises verified legal and regulatory data, multilingual retrieval, granular source attribution, and complete auditability. The SwissLaw knowledge base contains over 27,000 laws and 1.1 million court decisions, structured, versioned, and continuously updated.
If you want to understand how RAG for regulated industries works in practice, visit enclava.ch or contact us at [email protected].