
The failure mode: an LLM with raw schema access
Hand a language model your information_schema and ask it for "monthly revenue". It will find a column called amount, or amt, or maybe amt_cents_v2_final, and it will write a query that sums it. The query will run. It will return a number. The number will be wrong in a way nobody on the call notices until the finance lead opens the dashboard the next morning.
The model had no way to know:
- that refunded_amount_cents has to be subtracted
- that one-time charges are excluded from MRR by agreement with finance
- that the amount column on the legacy table is in dollars, not cents
- that a JOIN against invoice_line_items multiplies the customer count
- that "active" means status equals active AND current_period_end is in the future
None of that lives in the schema. It lives in your team's heads, in Notion pages nobody re-reads, and in the queries one senior analyst wrote two years ago. That gap, between what the schema says and what the business actually means, is the gap a semantic layer for SQL exists to close.
What a semantic layer is
The deterministic map from a business term to the exact table, column, grain, and filter.
A semantic layer for SQL is a deterministic mapping. The input is a business term. The output is, for that term:
- the table it lives on
- the column or expression that carries the value
- the grain of one row (per order, per customer, per day)
- the filter expression that defines membership
- whether it is additive across that grain
It is metadata, not SQL. A single business term maps to one row of metadata, and every tool that reads that row produces the same SQL. That is what makes it a business logic layer in SQL rather than a folder of saved queries — saved queries are outputs of the layer, not the layer itself.
What the layer stores, per column
business_nameHuman label. "amt_cents_paid_v2" becomes "Paid amount (cents)" so prompts can refer to it by meaning, not column name.semantic_typetemporal, metric, categorical, identifier, or descriptor. Drives chart-axis assignment and aggregation choice.is_additiveYes for SUM-safe metrics (revenue, count). No for ratios and balances, which need AVG or a recompute. Prevents the classic "sum of percentages" bug.grainOne row per what? Order, customer, line item, day. Required to know whether a JOIN multiplies the metric.value_samplesTop categorical values pulled from the actual data. Lets a prompt "lane A vs lane B" resolve to the real codes without guessing.verified_ruleThe exact filter or expression the business agreed on. The line between hallucinated and verified.Business terms to SQL columns: the mapping in practice
Here is what one team's sql business logic layer looks like for the six terms that show up in nearly every revenue question. Click a row to see the verified rule the SQL generator will pull on the next prompt.
billing.subscriptionsstatus, current_period_endstatus = 'active' AND current_period_end >= now()
Six rows. Six unambiguous definitions. Any prompt mentioning these terms now resolves to the same SQL, because the layer is the source — not the prompt, not the model, not whoever happens to be on call.
Hallucinated vs verified answers
The trick is what happens before the SQL generator runs. With a semantic layer in place, the pipeline does not hand the prompt straight to the model. It first resolves every business term in the prompt against the layer.
Prompt
"Show me MRR by plan last quarter"
What the model writes
SUM(amount) FROM invoice_line_items GROUP BY plan_id
Wrong column (amount, not amount_cents). Missing the paid filter. Includes one-time charges. Returns a number, nobody can defend it.
Prompt
"Show me MRR by plan last quarter"
What the layer resolves to
SUM(amount_cents)/100.0 FROM billing.invoice_line_items WHERE status = 'paid' AND product_type = 'subscription' GROUP BY plan_id, month
Same prompt, but every term routed through the mapping. Result matches the number finance already publishes.
The model is still in the loop. It still does the shape of the query, the joins, the time-bucketing. What it no longer gets to invent is the meaning of MRR, the meaning of active, the meaning of revenue.
The semantic layer takes those off the table. That is the entire conversion from hallucinated to verified.
Semantic layer vs headless BI
The category named headless BI (Cube, dbt Semantic Layer, MetricFlow, AtScale) is sometimes pitched as the same thing as a semantic layer. They are related but not identical, and the distinction matters when you are evaluating an AI analyst.
A headless BI product is a semantic layer plus a runtime: it owns the definitions and also serves the queries via a metrics API. The dashboard talks to that API instead of writing SQL itself. That shape is excellent for pre-modeled dashboards. It is awkward for ad hoc questions, because every new business term needs to be defined in the headless-BI YAML before any tool can ask about it.
A semantic layer in Chion's sense is the metadata only. It does not intercept your queries; it informs them. The SQL generator reads the layer, resolves the business terms in the prompt, and emits a query that runs directly against your warehouse. New terms can be added without redeploying anything. The trade-off: you do not get a metrics API for other tools to call — for that, headless BI and a semantic layer are complementary, not competitive.
Short version: headless BI = semantic layer + a serving runtime. Chion = the semantic layer that an AI analyst reads, persisted next to a real Postgres warehouse.
How Chion's semantic layer works
On first connect, Chion profiles every column on a read only role, then a second pass — the semantic analyzer — proposes the layer fields from section two: business_name, semantic_type, is_additive, grain hints, persisted in ai_semantic_attributes with categorical sample values so prompts like "lane A vs lane B" resolve to the actual codes. The full connect-discover-ask product flow is walked through in how Chion builds the layer.
That gets you started. The layer becomes real when the data team uses it. Every time someone runs a question in chat, refines the SQL, and promotes the verified query, the filter expression on that query is added to the layer. The mapping row for "active customer" stops being a guess and becomes the exact expression your senior analyst signed off on.
The persistence model matters: the layer lives in structured database rows, not a Markdown doc. That is what lets the SQL generator, the chart renderer, and the verified-query exporter all read the same source. It is also what makes the layer survive a schema change — when a column is renamed, only the column_profiles row moves; the business_name and verified_rule that point at it travel with it.
When the team is ready to ship the layer outside Chion, it compiles into the same CHION.md bundle the rest of the platform produces — every verified rule travels with the query that uses it. The SQL AI analyst page covers how that bundle is consumed at runtime, and the AI analyst overview covers the role context. For the upstream generation step see text-to-SQL explained, and for the read-only execution guarantees see verified SQL agents. This page stays on the layer itself.
Quick reference
- A semantic layer for SQL is a deterministic map from business term to table, column, grain, and filter.
- A sql business logic layer stores definitions as structured rows, not saved queries or prose.
- Six fields do the work: business_name, semantic_type, is_additive, grain, value_samples, verified_rule.
- Without it, an LLM guesses column meaning. With it, every prompt resolves the same SQL.
- The conversion from hallucinated to verified happens before SQL generation, not after.
- Chion builds the layer by column profiling plus a semantic analyzer, and persists it in ai_semantic_attributes and column_value_samples.
- Every promoted verified query writes its filter expression back into the layer, so the layer hardens over time.
Frequently asked
What is a semantic layer for SQL, in one sentence?
A semantic layer for SQL is a deterministic mapping from business terms (active customer, MRR, churn) to the exact tables, columns, grain, and filter expressions an analyst would use to compute them — stored as metadata next to the database, not inside any one query.
What is a semantic layer?
A semantic layer is the layer of metadata that sits between a database and the tools that query it (dashboards, notebooks, AI agents) and translates business terms into the exact SQL the team has agreed on. It is not a database, not a query, not a dashboard — it is the dictionary every downstream tool reads from so they all compute the same number for the same word.
Do AI analysts need a semantic layer?
Yes. An AI analyst without a semantic layer will guess what your column names mean and produce SQL that runs but returns numbers nobody can defend. A semantic layer takes the meaning of business terms (revenue, MRR, active) off the model and into structured metadata the data team owns — that is the line between an AI analyst you trust and one you do not.
Semantic layer vs metrics layer — what is the difference?
A metrics layer is a subset of a semantic layer scoped to time-series KPIs (MRR, DAU, conversion rate). A semantic layer covers metrics plus entities (customer, order), dimensions (plan, region), and the filter expressions that bind them. Every metrics layer is part of a semantic layer; not every semantic layer ships a metrics-store API on top.
How does a semantic layer for Postgres work in practice?
On a Postgres warehouse, the semantic layer is rows in metadata tables (in Chion: ai_semantic_attributes and column_value_samples) that point at real columns in your information_schema. The layer is populated by profiling the database on a read-only role, then refined as the team promotes verified queries. The Postgres semantic layer never modifies your database — it lives next to it as a parallel meaning layer.
Why is a business logic layer in SQL different from just writing views?
A view freezes one definition. A sql business logic layer stores the definition as structured metadata (term, table, column, grain, rule, additivity) so any tool — an LLM, a dashboard, a notebook — can resolve the same business term to the same SQL. Views are an output. The semantic layer is the source.
Can an LLM write good SQL without a semantic layer?
Sometimes, on a small clean schema with friendly column names. On a real warehouse with thousands of columns named like amt_v2_final_2024, it cannot. The model has no way to know that revenue means amount_cents minus refunded_amount_cents — that knowledge lives in your team, not in the schema.
What is the difference between hallucinated and verified SQL?
Hallucinated SQL is syntactically valid but uses the wrong column, the wrong filter, or the wrong grain. It runs and returns a number that nobody can defend. Verified SQL resolves every business term through the semantic layer first, so the number traces back to a rule a human at your company actually agreed on.
Does the semantic layer for SQL replace the data team?
No. It captures what the data team already knows so the rest of the company can self-serve without re-asking. The data team owns the definitions. The semantic layer is how those definitions become machine-readable.
How does Chion build the semantic layer?
Chion profiles every column on first connect (cardinality, type, sample values), then runs a semantic analyzer that proposes business_name, semantic_type, is_additive, and value_samples. The data team confirms or edits. Verified queries promoted from chat add their own filter expressions to the layer. The layer is structured rows in ai_semantic_attributes and column_value_samples, not a free-text doc.
Is the semantic layer the same as a metric store?
Overlapping but not identical. A metric store usually focuses on time-series metrics (MRR, DAU). A semantic layer for SQL covers metrics, dimensions, entities, and the filters that bind them. Chion stores both because an AI analyst needs both to answer real questions.
See your warehouse through a semantic layer.
Connect a read only Postgres role. Chion will profile your columns, propose the semantic layer, and let you confirm or edit each business term.