Serving Apache Superset dashboards over billions of rows requires thinking differently than over millions. The secret lies not in Superset itself but in the data modeling and the warehouse behind it. This guide details the key patterns as of 2026.
1. Limits of naive approaches
- Postgres at 1 billion rows: queries take minutes, or time out entirely;
- `SELECT *` on a columnar fact table: far too many bytes scanned;
- no partitioning or index: systematic full scans;
- heavy virtual datasets: an explosion of query complexity.
If you want Superset ready for large volumes, TVL Managed Superset Pro+ includes managed ClickHouse.
2. Choose the right backend
| Backend | Comfortable limit |
|---|---|
| Postgres | ~100 M rows |
| BigQuery | Terabytes |
| Snowflake | Terabytes |
| ClickHouse | Billions-trillions |
| Druid | Trillions (real-time) |
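Once the backend is chosen, Superset connects to it through a SQLAlchemy URI in the "Connect database" form. A sketch for ClickHouse, assuming the clickhouse-connect driver is installed in the Superset environment; the host and credentials here are hypothetical:

```python
# SQLAlchemy URI for Superset's database connection form (sketch).
# Assumes the clickhouse-connect driver; host/credentials are placeholders.
CLICKHOUSE_URI = "clickhousedb://analytics_user:s3cret@clickhouse.internal:8443/default"
```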
3. dbt pre-aggregation
Main pattern: pre-aggregate via dbt into physical tables:
```sql
-- marts/fct_orders_daily.sql
SELECT
    DATE_TRUNC('day', created_at) AS day,
    product_id,
    country,
    COUNT(*) AS orders,
    SUM(amount) AS revenue
FROM {{ ref('stg_orders') }}
GROUP BY 1, 2, 3
```
Instead of 1 billion raw rows, the mart holds a few million (day × dimension granularity), and Superset queries return in under a second.
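The reduction is simple combinatorics: the mart's row count is bounded by the product of the dimensions' cardinalities, not by the number of events. A quick sanity check, with hypothetical cardinalities:

```python
# Upper bound on mart size: days x products x countries.
# Cardinalities below are hypothetical, for illustration only.
days = 3 * 365        # three years of daily history
products = 200
countries = 30

max_mart_rows = days * products * countries   # worst case: every combination exists
raw_rows = 1_000_000_000

print(f"mart upper bound: {max_mart_rows:,} rows")
print(f"reduction factor: {raw_rows // max_mart_rows}x")
```

In practice the mart is even smaller, since most day × product × country combinations never occur.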
4. Materialized views
For ClickHouse / BigQuery, use materialized views:
```sql
-- ClickHouse
CREATE MATERIALIZED VIEW mv_orders_5min
ENGINE = AggregatingMergeTree
ORDER BY (product_id, time_bucket)
AS SELECT
    product_id,
    toStartOfFiveMinutes(created_at) AS time_bucket,
    countState() AS orders_state,
    sumState(amount) AS revenue_state
FROM orders
GROUP BY product_id, time_bucket;

-- Reading back requires the -Merge combinators:
SELECT
    time_bucket,
    countMerge(orders_state) AS orders,
    sumMerge(revenue_state) AS revenue
FROM mv_orders_5min
GROUP BY time_bucket;
```
5. Partitioning
- Postgres: `PARTITION BY RANGE (created_at)`, one partition per month;
- ClickHouse: `PARTITION BY toYYYYMM(created_at)`;
- BigQuery: partitioned tables on `_PARTITIONDATE`;
- Snowflake: clustering keys.
Filtering on the partition column is mandatory in dashboards (enforce it via dashboard filters); without it, every query degenerates into a full scan.
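The effect of partition pruning can be sketched in a few lines: with monthly partitions, a filter on the partition column lets the engine skip whole partitions instead of reading every row. A toy model, not warehouse code:

```python
from datetime import date

def to_yyyymm(d: date) -> int:
    """Monthly partition key, analogous to ClickHouse's toYYYYMM."""
    return d.year * 100 + d.month

# Toy table: 36 monthly partitions of ~28 M rows each (~1 B rows total).
partitions = {to_yyyymm(date(2023 + y, m, 1)): 28_000_000
              for y in range(3) for m in range(1, 13)}

def rows_scanned(filter_from):
    """Rows read depending on whether the WHERE clause hits the partition key."""
    if filter_from is None:                      # no partition filter: full scan
        return sum(partitions.values())
    key = to_yyyymm(filter_from)
    return sum(v for k, v in partitions.items() if k >= key)

full = rows_scanned(None)                 # all 36 partitions
pruned = rows_scanned(date(2025, 11, 1))  # only the last 2 partitions
print(f"full scan: {full:,} rows / pruned: {pruned:,} rows")
```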
6. Sampling
To explore in SQL Lab on very large volumes:
```sql
-- ClickHouse (the table must declare a SAMPLE BY clause)
SELECT * FROM events
SAMPLE 0.01  -- 1% of rows
WHERE created_at > today() - 7;

-- BigQuery
SELECT * FROM events TABLESAMPLE SYSTEM (1 PERCENT)
WHERE _PARTITIONDATE BETWEEN '2026-05-01' AND '2026-05-09';
```
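A 1% sample still estimates aggregates well: scale counts by the inverse sampling rate, and the relative error shrinks roughly as 1/√n of the sampled rows. A back-of-envelope check with a hypothetical sampled count:

```python
import math

# Extrapolating a count from a 1% sample (back-of-envelope).
sample_rate = 0.01
sampled_count = 98_500            # hypothetical count returned by the sampled query

estimated_total = sampled_count / sample_rate
# Poisson-style relative error ~ 1/sqrt(sampled rows)
relative_error = 1 / math.sqrt(sampled_count)

print(f"estimate: {estimated_total:,.0f} rows, within about {relative_error:.1%}")
```

At this scale the estimate is within a fraction of a percent, which is more than enough for exploration in SQL Lab.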
This configuration is applied by default on TVL Managed Superset, which follows community best practices.
7. Aggressive Superset cache
- 24h TTL on stable dashboards;
- Nightly cache warming;
- A cache miss taking a few seconds is acceptable; a cache hit is sub-second.
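In `superset_config.py`, chart-data caching is configured with a Flask-Caching dict. A minimal sketch of a 24h TTL backed by Redis; the Redis URL is a placeholder for your own instance:

```python
# superset_config.py -- chart-data cache with a 24h TTL (sketch).
DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 60 * 60 * 24,   # 24h TTL for stable dashboards
    "CACHE_KEY_PREFIX": "superset_data_",
    "CACHE_REDIS_URL": "redis://localhost:6379/1",  # placeholder URL
}
```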
8. Async queries
Essential on large volumes: queries of several seconds don't block the browser.
```python
# superset_config.py
import os

FEATURE_FLAGS = {"GLOBAL_ASYNC_QUERIES": True}
GLOBAL_ASYNC_QUERIES_JWT_SECRET = os.environ["GAQ_SECRET"]
```
9. Reference metrics
| Volume | Backend | Target latency |
|---|---|---|
| 10 M rows | Postgres | < 1s |
| 100 M rows | Postgres+marts | < 2s |
| 1 B rows | ClickHouse | < 1s |
| 10 B rows | ClickHouse+MV | < 2s |
| 100 B rows | BigQuery+marts | < 5s |
10. Common pitfalls
- `SELECT *` on a columnar table;
- multi-million-row JOINs without prior aggregation;
- no filter on the partition column;
- virtual datasets stacking subqueries;
- a default time range that is far too large (e.g. 5 years).
11. Conclusion
Scaling Apache Superset to billions of rows essentially means scaling the backend (ClickHouse, BigQuery) and pre-aggregating with dbt. Superset itself stays lightweight: it only issues SELECTs. Performance comes from modeling, not from tuning Superset.
Want the benefits of Apache Superset without the friction of installation and maintenance? Deploy your instance in 3 clicks with TVL Managed Superset, hosted in Europe (OVHcloud, Roubaix, France).
Further reading: ClickHouse, scaling to many users, data modeling.