Back to Resources

How Chion's Two-Layer SQL Validator Catches Errors Before Execution

Deep dive into Chion's two-layer SQL validator: L1 blocks non-SELECT mutations, L2 lints for structural issues. How contract enforcement prevents hallucinated columns and unsafe queries.

By Jonathan Dag··9 min read

The problem with unvalidated SQL generation

Most text-to-SQL tools wrap a prompt around your question and return whatever the LLM produces. If the model hallucinates a column name, generates an UPDATE, or forgets a LIMIT — you find out when the query fails or, worse, when it succeeds with wrong data.

Chion's two-layer validator

L1 — Pre-validation (read-only check)

The first layer runs before the SQL touches your database. It blocks anything that isn't a read-only SELECT:

  • INSERT, UPDATE, DELETE, DROP, ALTER — rejected at the contract level
  • GRANT, REVOKE, TRUNCATE — blocked
  • Multiple statements (semicolon injection) — blocked
  • Enforced in code, not in the LLM prompt

L2 — Runtime lint (mode-based validation)

The second layer validates structural correctness:

  • SELECT * is blocked — columns must be explicit
  • LIMIT is enforced on every query
  • JOIN conditions are validated against the schema profile
  • Column references are checked against the SQL contract
  • Aggregation rules are validated (can't SUM a categorical column)

Row budget truncation

Results are capped at the adapter level: ≤1,000 rows / 12,000 cells. If results are truncated, the chart discloses it.

The SQL contract

Between intent extraction and SQL generation, the pipeline builds a SQL contract — a typed specification of allowed columns, filters, aggregations, grain, and ordering. The SQL generator cannot introduce out-of-contract elements. This is what makes the pipeline model-agnostic: the LLM generates SQL against the contract, not against your raw schema.

3-cycle repair loop

If validated SQL still fails at execution (runtime error, empty result), the pipeline enters an automatic repair loop with error context. Up to 3 cycles of deterministic rewrites — not random retries. If it still fails, a hard failure with actionable suggestions.

Why this matters

Contract enforcement + two-layer validation + repair cycles push effective accuracy well beyond raw model benchmarks. The model generates SQL; the pipeline ensures it's correct.