The problem with unvalidated SQL generation
Most text-to-SQL tools wrap a prompt around your question and return whatever the LLM produces. If the model hallucinates a column name, generates an UPDATE, or forgets a LIMIT — you find out when the query fails or, worse, when it succeeds with wrong data.
Chion's two-layer validator
L1 — Pre-validation (read-only check)
The first layer runs before the SQL touches your database. It blocks anything that isn't a read-only SELECT:
- INSERT, UPDATE, DELETE, DROP, ALTER — rejected at the contract level
- GRANT, REVOKE, TRUNCATE — blocked
- Multiple statements (semicolon injection) — blocked
- Enforced in code, not in the LLM prompt
L2 — Runtime lint (mode-based validation)
The second layer validates structural correctness:
- SELECT * is blocked — columns must be explicit
- LIMIT is enforced on every query
- JOIN conditions are validated against the schema profile
- Column references are checked against the SQL contract
- Aggregation rules are validated (can't SUM a categorical column)
Row budget truncation
Results are capped at the adapter level: ≤1,000 rows / 12,000 cells. If results are truncated, the chart discloses it.
The SQL contract
Between intent extraction and SQL generation, the pipeline builds a SQL contract — a typed specification of allowed columns, filters, aggregations, grain, and ordering. The SQL generator cannot introduce out-of-contract elements. This is what makes the pipeline model-agnostic: the LLM generates SQL against the contract, not against your raw schema.
3-cycle repair loop
If validated SQL still fails at execution (runtime error, empty result), the pipeline enters an automatic repair loop with error context. Up to 3 cycles of deterministic rewrites — not random retries. If it still fails, a hard failure with actionable suggestions.
Why this matters
Contract enforcement + two-layer validation + repair cycles push effective accuracy well beyond raw model benchmarks. The model generates SQL; the pipeline ensures it's correct.