As I expand my data engineering knowledge base, I stumble across new concepts and terms on a fairly regular basis. One concept that keeps appearing but remains a mystery is the semantic layer. If you are in the same boat and would like to learn more, put on your investigative cap and keep reading! This blog post will serve as the case file for solving the mystery of what the semantic layer is and how it can improve data analytics across departments.
Case Overview
This case involves a data warehouse organized according to dbt's recommended project structure of staging, intermediate, and marts layers. The staging layer holds the source tables with minimal changes (e.g., data type casting or field renaming). The intermediate layer is where the bulk of the transformations happen. The marts layer consists of gold-standard tables or views optimized for analysis, reporting, machine learning, and artificial intelligence.
The semantic layer sits between the gold-standard tables and the end-user tools. The goal of the semantic layer is to translate complex data structures into familiar business terms, create unified metric logic across departments, and serve as a single point of context for your data to optimize querying and AI integration.
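As a rough illustration of the "translate technical structures into business terms" role, the Python sketch below relabels the columns of a query result using a small glossary. Everything in it is hypothetical and invented for this example, including:

- the column names (`cust_id`, `ord_dt`, `net_amt_usd`),
- the `GLOSSARY` dictionary,
- the helper functions.

Real semantic layers keep this metadata in their own definition formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessField:
    """Business-friendly label and definition for one warehouse column."""
    label: str
    description: str


# Hypothetical glossary mapping technical column names to business terms.
GLOSSARY: dict[str, BusinessField] = {
    "cust_id": BusinessField("Customer ID", "Unique identifier for a customer."),
    "ord_dt": BusinessField("Order Date", "Date the order was placed."),
    "net_amt_usd": BusinessField(
        "Net Revenue (USD)", "Order total after discounts and refunds."
    ),
}


def translate_row(row: dict[str, object]) -> dict[str, object]:
    """Return a copy of a result row keyed by business labels.

    Columns missing from the glossary keep their technical names.
    """
    return {
        (GLOSSARY[col].label if col in GLOSSARY else col): value
        for col, value in row.items()
    }


def describe(label: str) -> str:
    """Look up the business definition behind a label."""
    for field in GLOSSARY.values():
        if field.label == label:
            return field.description
    raise KeyError(f"No business definition for {label!r}")


if __name__ == "__main__":
    raw = {"cust_id": 42, "ord_dt": "2024-03-01", "net_amt_usd": 129.5, "src_sys_cd": "WEB"}
    # prints {'Customer ID': 42, 'Order Date': '2024-03-01', 'Net Revenue (USD)': 129.5, 'src_sys_cd': 'WEB'}
    print(translate_row(raw))
    print(describe("Net Revenue (USD)"))
```

Unmapped columns such as `src_sys_cd` pass through under their technical names. This is a deliberate choice in the sketch: a missing glossary entry stays visible instead of being silently dropped.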
Case History
In the 1990s, Business Objects introduced the semantic layer to simplify access to relational databases, calling it “Universes”. Querying a relational database required technical knowledge of SQL and complex schema structures, which put the data largely out of reach for business users. Universes acted as a metadata abstraction layer, mapping technical database structures to familiar business terms and allowing analysts to query data without needing to understand what was happening under the hood.
By the 2000s, with the rise of the modern internet, data volumes and variety grew, demanding more flexible database architectures that traded rigidity for scalability. Moving away from rigid, centrally modeled architectures meant business logic could now live anywhere (in dashboards, spreadsheets, or individual scripts), making it increasingly easy for metrics to fall out of sync across departments. Now that AI is being integrated into data stacks, a layer that provides consistent business context is more important than ever, because AI is only as reliable as the data definitions it is given.
Witness Testimony
The investigative team interviewed a handful of witnesses directly and indirectly involved in this case. Each witness statement added valuable insight into the investigation and is summarized in this section. Full witness statements will be added to the report file.
Witness #1:
This witness works at a financial institution and has noticed that certain business metrics do not always add up to the same value across departments. This is causing stakeholders to distrust the accuracy of reporting and the quality of the data. A colleague mentioned exploring a code-first approach in which metrics could be defined once in version-controlled files and shared across all their BI tools, but the witness could not confirm whether this was the chosen solution.
Witness #2:
This witness is a stakeholder at a marketing company. As a business user, this witness used to struggle to find and gather the data needed to support their customer retention strategy, because their database had no data definitions explaining coded field names or field lineage. A recent change to their data stack made the fields far more accessible to business users. The witness mentioned that the solution sits between their data warehouse and their BI tools, automatically translating technical field names into business-friendly definitions.
Witness #3:
This witness is a data engineer at a retail company that recently integrated an AI assistant into their data stack. Initially, the AI was returning inconsistent, sometimes inaccurate answers to business queries, pulling from different definitions depending on which BI tool was used. After escalating the issue, their team identified that the AI lacked a governed business context to rely on. The witness noted that the solution their team is evaluating would sit upstream of both their BI tools and AI models, providing a single source of validated business definitions.
Evidence
Exhibit A: Unified Metrics & Business-Friendly Abstraction
The semantic layer centralizes business logic and calculations, hiding technical details such as foreign keys, complex joins, and SQL syntax from end users. Because a metric (e.g., "Margin") is defined exactly once, every connected BI tool or AI model returns the same number, preventing conflicting spreadsheets and reporting errors. Business users can then query data intuitively through familiar dimensions (e.g., region, product category, date) rather than writing code.
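The "define once, reuse everywhere" idea can be sketched in Python as a single metric definition. That definition is compiled into SQL for whatever dimensions a consumer asks for. This is a toy under stated assumptions. The following are all hypothetical:

- the `Metric` class and the `compile_query` function,
- the `marts.fct_orders` table,
- the allowed dimensions,
- the formula used for Margin (revenue minus cost, divided by revenue).

The sketch does not reflect the API of any real semantic layer product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    expression: str    # SQL aggregate expression, defined in exactly one place
    source_table: str


# Hypothetical single definition of "Margin"; table and columns are illustrative.
MARGIN = Metric(
    name="margin",
    expression="(SUM(revenue) - SUM(cost)) / NULLIF(SUM(revenue), 0)",
    source_table="marts.fct_orders",
)

# Hypothetical business dimensions that consumers are allowed to group by.
ALLOWED_DIMENSIONS = {"region", "product_category", "order_date"}


def compile_query(metric: Metric, dimensions: list[str]) -> str:
    """Generate SQL for a metric grouped by the requested business dimensions."""
    unknown = set(dimensions) - ALLOWED_DIMENSIONS
    if unknown:
        raise ValueError(f"Unknown dimensions: {sorted(unknown)}")
    dims = ", ".join(dimensions)
    select_prefix = f"{dims}, " if dimensions else ""
    sql = f"SELECT {select_prefix}{metric.expression} AS {metric.name} FROM {metric.source_table}"
    if dimensions:
        sql += f" GROUP BY {dims}"
    return sql


if __name__ == "__main__":
    # A dashboard and an AI assistant request the same thing and get identical SQL,
    # because neither keeps its own copy of the Margin formula.
    dashboard_sql = compile_query(MARGIN, ["region"])
    assistant_sql = compile_query(MARGIN, ["region"])
    assert dashboard_sql == assistant_sql
    print(dashboard_sql)
```

Because neither consumer stores its own copy of the formula, the two cannot drift apart. Production tools such as MetricFlow also generate SQL from metric definitions, but they handle much more, including joins across models and time granularities.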
Exhibit B: Centralized Governance, Security & Metadata
The semantic layer enables administrators to manage security, compliance, and access controls in a single place while incorporating detailed metadata, business glossaries, and descriptions. Row-level security and column masking travel with the semantic definitions, ensuring users only see data they are authorized to view. Data lineage tracks exactly which upstream source tables feed into specific metrics, making impact analysis and change management safer and more transparent.
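Impact analysis on top of lineage boils down to a graph traversal. Starting from a changed table, you walk every downstream edge and collect the metrics you reach. The minimal Python sketch below uses breadth-first search over a hypothetical lineage graph. The node names are illustrative and loosely follow the staging, intermediate, and marts layers from the case overview. The `metric:` prefix is an assumed naming convention, not one used by any particular tool.

```python
from collections import deque

# Hypothetical lineage graph: each node lists the nodes that read from it directly.
LINEAGE: dict[str, list[str]] = {
    "raw.orders": ["stg_orders"],
    "raw.products": ["stg_products"],
    "stg_orders": ["int_order_items"],
    "stg_products": ["int_order_items"],
    "int_order_items": ["fct_orders"],
    "fct_orders": ["metric:margin", "metric:revenue"],
}


def downstream_of(node: str, graph: dict[str, list[str]]) -> set[str]:
    """Breadth-first search for everything that depends on `node`, directly or indirectly."""
    seen: set[str] = set()
    queue = deque(graph.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        queue.extend(graph.get(current, []))
    return seen


def impacted_metrics(node: str, graph: dict[str, list[str]]) -> list[str]:
    """Sorted list of metrics that a change to `node` would affect."""
    return sorted(n for n in downstream_of(node, graph) if n.startswith("metric:"))


if __name__ == "__main__":
    print(impacted_metrics("raw.products", LINEAGE))  # ['metric:margin', 'metric:revenue']
```

Running this before altering `raw.products` would flag both metrics for review. The `seen` set keeps the traversal from looping if the graph ever contains a cycle.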
Exhibit C: Integration & AI Enablement
A mature semantic layer serves as a centralized integration layer that standardizes data access across disparate platforms and end-user applications, so teams can use multiple BI tools while relying on the same underlying logic. This business context extends to AI agents as well, reducing the risk of hallucinations and helping Generative AI tools give trustworthy, policy-aligned answers grounded in validated corporate data.
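A semantic layer can ground an AI assistant in two ways: it supplies the governed definitions as context, and it refuses requests for metrics it does not know instead of letting the model improvise. The Python sketch below shows both ideas under loud assumptions:

- the `CATALOG` and `SYNONYMS` dictionaries and their contents are hypothetical,
- the `build_context` and `resolve_metric` helpers are hypothetical,
- how the context is actually passed to a model depends on the AI tooling in use.

```python
# Hypothetical governed catalog: metric name -> business definition.
CATALOG: dict[str, str] = {
    "margin": "Revenue minus cost, divided by revenue.",
    "active_customers": "Customers with at least one order in the last 90 days.",
}

# Hypothetical synonyms so natural-language requests resolve to governed names.
SYNONYMS: dict[str, str] = {
    "profit margin": "margin",
    "gross margin": "margin",
    "active users": "active_customers",
}


def build_context(catalog: dict[str, str]) -> str:
    """Render governed definitions as plain text that can be handed to an AI model."""
    lines = ["Use only these metric definitions:"]
    lines += [f"- {name}: {definition}" for name, definition in sorted(catalog.items())]
    return "\n".join(lines)


def resolve_metric(phrase: str, catalog: dict[str, str], synonyms: dict[str, str]) -> str:
    """Map a requested metric to its governed name, or refuse instead of guessing."""
    key = phrase.strip().lower()
    name = synonyms.get(key, key)
    if name not in catalog:
        raise LookupError(f"{phrase!r} is not a governed metric")
    return name


if __name__ == "__main__":
    print(build_context(CATALOG))
    print(resolve_metric("Profit Margin", CATALOG, SYNONYMS))  # margin
```

`resolve_metric("Profit Margin", CATALOG, SYNONYMS)` returns `"margin"`. An ungoverned request such as `"churn"` raises `LookupError`. Failing loudly is the point: a guessed definition produces exactly the kind of inconsistent answer Witness #3 described.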
Persons of Interest
Based on the evidence, witness statements, and background information gathered, the investigation identified several persons of interest who warranted a closer look.
Code-First & Universal Semantic Layers: These tools define metrics and business logic in code (typically YAML or SQL) and operate independently of any specific BI tool or data warehouse, making them portable across the modern data stack.
Examples: dbt Semantic Layer (MetricFlow), Cube, AtScale.
Native & Ecosystem-Specific Solutions: These solutions are tightly coupled to a specific platform or ecosystem, offering deep native integration at the cost of flexibility across other tools.
Examples: Looker (LookML), Power BI semantic models, warehouse-native views.
Findings
The investigation identified the main culprit in this case as the dbt Semantic Layer. First released in 2022 and later rebuilt around MetricFlow, the dbt Semantic Layer is an impactful addition to an already powerful transformation tool.
The goal of the semantic layer is to give business logic and metric definitions a single, centralized home instead of scattering them across downstream tools. Because the layer sits between your data warehouse and your BI or AI tools, a definition updated in one place flows to every downstream consumer, which makes it easier to change existing logic or add new metrics. The semantic layer also hides the technical complexity of the data, making it more intuitive for business users to query.
With consistent definitions and logic, integrating AI into the data stack is easier. The AI has a single source of business context to help it generate answers to business-user prompts. This reduces the risk of inconsistent or inaccurate outputs, making AI a more trustworthy participant in the data workflow.
Case Closed
The semantic layer is like a restaurant's menu. The end user orders a dish from the menu based on their understanding of and interest in the listed meals, and the kitchen prepares it and sends it out.
The end user does not need to know what happens in the kitchen or how the meal is cooked. They can simply enjoy a delicious meal and feel satisfied. The mystery of the semantic layer has been solved - and it turns out, the answer was on the menu all along.
