Back to Resources

Text-to-SQL Tutorial: Plain English to Verified PostgreSQL in 3 Steps

Learn how text-to-SQL works in Chion: connect your database, let the pipeline profile your schema, and ask questions in plain English. Two-layer validation ensures every query is safe.

By Jonathan Dag··8 min read

What is text-to-SQL?

Text-to-SQL converts a natural-language question into an executable SQL query. Instead of writing SELECT c.name, SUM(o.amount) FROM customers c JOIN orders o ..., you type "top 10 customers by revenue last quarter" and the system generates, validates, and runs the SQL for you.

The 3-step flow

Step 1: Connect your database

Chion connects to PostgreSQL with read-only credentials stored in an AES-256-GCM encrypted vault. No data leaves your infrastructure — only metadata passes through the pipeline.

Step 2: Schema profiling

Before any LLM sees anything, Chion runs a profiling pass: every table, column, data type, cardinality, and value distribution is cataloged. Columns are classified as temporal, quantitative, categorical, or identifier. Entity resolution uses pgvector embeddings to match your words to real column values.

Step 3: Ask a question

Type your question in plain English. The 13-phase pipeline extracts intent, resolves entities, generates a SQL contract, produces read-only SQL, validates it in two layers (L1 read-only check + L2 runtime lint), and renders an interactive D3.js chart with the SQL visible underneath.

Two-layer validation

Every query passes through two validation layers before reaching your database:

  • L1 — Pre-validation: Blocks anything that isn't a read-only SELECT. INSERT, UPDATE, DELETE, DROP — rejected in code, not in the LLM.
  • L2 — Runtime lint: SELECT * is blocked. LIMIT is enforced. JOIN conditions are validated against the schema profile. Row budget: ≤1,000 rows / 12,000 cells.

If the generated SQL violates the contract, it enters an automatic 3-cycle repair loop with error context.

Why pipeline architecture matters more than model choice

Academic benchmarks like Spider and WikiSQL report 85%+ accuracy on text-to-SQL tasks. But raw model accuracy isn't the bottleneck for production workloads. Schema profiling, contract enforcement, and repair cycles push effective accuracy well beyond what model benchmarks suggest — because the pipeline catches and fixes errors the model makes.

Try it

Connect your PostgreSQL database and ask your first question. 10-day trial, no credit card required.