The problem with naive aggregation

Most SQL generators pick aggregation functions based on the question text alone. "Average salary" → AVG. "Total revenue" → SUM. But what happens when the model applies SUM to a categorical column, or AVG to an identifier?

Column type classification

During schema profiling, every column gets classified:

Quantitative: numeric columns where arithmetic makes sense (revenue, salary, count)
Temporal: date/timestamp columns that drive GROUP BY grain
Categorical: string/enum columns with bounded cardinality (region, status, plan_name)
Identifier: primary keys, foreign keys, unique identifiers

Aggregation rules

The SQL contract enforces aggregation rules based on column type:

Quantitative → SUM, AVG, MIN, MAX, PERCENTILE_CONT
Categorical → COUNT, COUNT(DISTINCT)
Temporal → GROUP BY grain (day/week/month/quarter/year)
Identifier → never aggregated, used only in JOIN conditions and WHERE filters

The budget algorithm

Chion's aggregation engine uses a bi-dimensional (grain, K) search algorithm to stay within the row budget (≤1,000 rows / 12,000 cells). If the natural grain produces too many rows, the pipeline coarsens the time grain before applying TopK — never auto-filtering dates.

Why this matters

Invalid aggregations produce numbers that look right but are meaningless. SUMming a categorical column returns a number — but it's nonsense. The contract prevents this at the structural level, not the prompt level.

How Chion's Aggregation Rules Engine Works

The problem with naive aggregation

Column type classification

Aggregation rules

The budget algorithm

Why this matters

More in this topic

SQL CTE Examples: Common Table Expressions with Chion-Generated Queries

SQL Window Functions Tutorial: Rankings, Running Averages, and Growth Rates

How Chion Prevents Hallucinated SQL: Schema Profiling to Execution