TVL Managed Superset

Scale Apache Superset to Millions of Rows 2026

Scale Apache Superset on large volumes: pre-aggregation, ClickHouse, materialized views, sampling.

Serving Apache Superset dashboards on billions of rows requires a different approach than serving them on millions. The secret is not in Superset itself but in the data modeling and the warehouse behind it. This guide details the key patterns in 2026.

1. Limits of naive approaches

  • Postgres at 1 billion rows: queries take minutes, or simply time out;
  • SELECT * on a columnar fact table: far too many bytes scanned;
  • No partitioning or indexes: every query is a full scan;
  • Heavy virtual datasets: query complexity explodes.
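To see why SELECT * hurts on a columnar store, a back-of-the-envelope estimate helps. The sketch below is pure arithmetic; the row count, table width, and bytes-per-value figures are illustrative assumptions, not measurements:

```python
# Columnar engines read only the columns a query references, so bytes
# scanned grows linearly with the projection width. Figures are illustrative.

ROWS = 1_000_000_000   # 1 B-row fact table (assumption)
AVG_COL_BYTES = 8      # average encoded bytes per value (assumption)
TOTAL_COLS = 100       # width of the fact table (assumption)

def scanned_gb(cols_read: int) -> float:
    """Approximate GB scanned for a query projecting `cols_read` columns."""
    return ROWS * cols_read * AVG_COL_BYTES / 1e9

select_star = scanned_gb(TOTAL_COLS)   # SELECT *  -> reads every column
select_three = scanned_gb(3)           # SELECT day, product_id, amount

print(f"SELECT *  : {select_star:,.0f} GB scanned")
print(f"3 columns : {select_three:,.0f} GB scanned")
```

Under these assumptions the SELECT * query scans over 30x more data for the same chart, which is the entire difference between a sub-second and a multi-second dashboard.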

If you want Superset ready for large volumes, TVL Managed Superset Pro+ includes managed ClickHouse.

2. Choose the right backend

Backend       Comfortable limit
Postgres      ~100 M rows
BigQuery      Terabytes
Snowflake     Terabytes
ClickHouse    Billions to trillions
Druid         Trillions (real-time)

3. dbt pre-aggregation

Main pattern: pre-aggregate via dbt into physical tables:

-- marts/fct_orders_daily.sql
SELECT
  DATE_TRUNC('day', created_at) AS day,
  product_id,
  country,
  COUNT(*) AS orders,
  SUM(amount) AS revenue
FROM {{ ref('stg_orders') }}
GROUP BY 1, 2, 3

Instead of 1 billion raw rows, the mart holds a few million (one row per day × dimension combination). Superset queries come back in under a second.
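The row-count reduction is easy to sanity-check with cardinality arithmetic. The dimension cardinalities below are assumptions for illustration, not figures from a real dataset:

```python
# Upper bound on rows in the daily mart: one row per (day, product, country)
# combination. Real marts are smaller because not every combination occurs.

DAYS = 365        # 1 year of history (assumption)
PRODUCTS = 200    # distinct product_id values (assumption)
COUNTRIES = 30    # distinct country values (assumption)

raw_rows = 1_000_000_000
mart_rows_max = DAYS * PRODUCTS * COUNTRIES

print(f"raw rows : {raw_rows:,}")
print(f"mart max : {mart_rows_max:,}")
print(f"reduction: at least {raw_rows // mart_rows_max}x fewer rows scanned")
```

With these cardinalities the mart tops out around 2 million rows, a reduction of more than 400x before any query runs.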

4. Materialized views

For ClickHouse / BigQuery, use materialized views:

-- ClickHouse
CREATE MATERIALIZED VIEW mv_orders_5min
ENGINE = AggregatingMergeTree
ORDER BY (product_id, time_bucket)
AS SELECT
  product_id,
  toStartOfFiveMinutes(created_at) AS time_bucket,
  countState() AS orders_state,
  sumState(amount) AS revenue_state
FROM orders
GROUP BY product_id, time_bucket;

-- Read back with the -Merge combinators:
-- SELECT product_id, countMerge(orders_state), sumMerge(revenue_state)
-- FROM mv_orders_5min GROUP BY product_id;

5. Partitioning

  • Postgres: PARTITION BY RANGE (created_at) per month;
  • ClickHouse: PARTITION BY toYYYYMM(created_at);
  • BigQuery: partitioned tables on _PARTITIONDATE;
  • Snowflake: clustering keys.

Filtering on the partition column is mandatory in dashboards: enforce it with dashboard filters and a sensible default time range.
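Partition pruning is easy to reason about. A minimal sketch, assuming monthly partitions identified as YYYYMM over five years of data (the dates and ranges are illustrative):

```python
from datetime import date

def monthly_partitions(start: date, end: date) -> list:
    """List the YYYYMM partition IDs touched by a [start, end] date filter."""
    parts = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        parts.append(f"{y}{m:02d}")
        m += 1
        if m == 13:
            y, m = y + 1, 1
    return parts

# Five years of monthly partitions in total (assumption)
total = len(monthly_partitions(date(2021, 6, 1), date(2026, 5, 31)))

# A "last 30 days" dashboard filter touches at most two of them
touched = monthly_partitions(date(2026, 4, 10), date(2026, 5, 9))

print(f"scanned {len(touched)}/{total} partitions: {touched}")
```

Without the date filter, every query scans all 60 partitions; with it, the engine reads 2. That ratio is why the filter is non-negotiable.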

6. Sampling

To explore in SQL Lab on very large volumes:

-- ClickHouse
SELECT * FROM events
SAMPLE 0.01  -- 1% of rows
WHERE created_at > today() - 7;

-- BigQuery
SELECT * FROM events TABLESAMPLE SYSTEM (1 PERCENT)
WHERE _PARTITIONDATE BETWEEN '2026-05-01' AND '2026-05-09';
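A 1% sample is usually plenty for exploration because the relative error of a sampled count shrinks with the sample size. A quick sanity check, assuming uniform sampling (which SAMPLE and TABLESAMPLE approximate) and the standard 1/sqrt(n) error of a count:

```python
import math

def relative_error(total_rows: int, sample_fraction: float) -> float:
    """Approximate relative standard error of a count estimated from a sample."""
    n = total_rows * sample_fraction
    return 1 / math.sqrt(n)

# 1% sample of 1 B events still contains 10 M rows, so the estimated
# count is off by only ~0.03% (illustrative figures)
err = relative_error(1_000_000_000, 0.01)
print(f"relative error ≈ {err:.5%}")
```

In other words, sampling trades a negligible amount of accuracy for a 100x reduction in data scanned, which is exactly the right trade in SQL Lab.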

These practices are applied by default on TVL Managed Superset, which follows community best practices.

7. Aggressive Superset cache

  • 24h TTL on stable dashboards;
  • Nightly cache warming;
  • Cache miss = a few seconds acceptable, cache hit = sub-second.
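Nightly cache warming can be as simple as replaying each chart's data request before office hours, so the first visitor of the day gets a cache hit. A minimal sketch, assuming the GET chart-data endpoint available in recent Superset releases; the base URL, chart IDs, and token handling are placeholders for your instance:

```python
import os
import urllib.request

# Placeholders: point these at your own instance and charts.
SUPERSET_URL = os.environ.get("SUPERSET_URL", "https://superset.example.com")
CHART_IDS = [42, 43, 44]

def warmup_urls(base: str, chart_ids: list) -> list:
    """Chart-data endpoints to request; a successful call populates the cache."""
    return [f"{base}/api/v1/chart/{cid}/data/" for cid in chart_ids]

def warm_cache(token: str) -> None:
    """Replay each chart's data request so the nightly run pays the query cost."""
    for url in warmup_urls(SUPERSET_URL, CHART_IDS):
        req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
        try:
            urllib.request.urlopen(req, timeout=120)
        except Exception as exc:
            print(f"warm-up failed for {url}: {exc}")

print(warmup_urls(SUPERSET_URL, CHART_IDS))
```

Wire `warm_cache()` to a nightly cron with a service-account token, and size the cache TTL so warmed entries survive until the next run.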

8. Async queries

Essential on large volumes: multi-second queries run asynchronously instead of blocking the browser.

FEATURE_FLAGS = {"GLOBAL_ASYNC_QUERIES": True}
GLOBAL_ASYNC_QUERIES_JWT_SECRET = os.environ["GAQ_SECRET"]

9. Reference metrics

Volume        Backend             Target latency
10 M rows     Postgres            < 1 s
100 M rows    Postgres + marts    < 2 s
1 B rows      ClickHouse          < 1 s
10 B rows     ClickHouse + MV     < 2 s
100 B rows    BigQuery + marts    < 5 s

10. Common pitfalls

  • SELECT * on a columnar table;
  • JOINs across multi-million-row tables without prior aggregation;
  • No filter on the partition column;
  • Virtual datasets stacking sub-queries;
  • Default time range too large (e.g. 5 years).
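Several of these pitfalls can be caught mechanically before a chart ships. A toy linter sketch using regex heuristics only (not a real SQL parser; the partition-column name is an assumption):

```python
import re

PARTITION_COL = "created_at"  # assumed partition column

def lint_chart_sql(sql: str) -> list:
    """Flag common large-volume pitfalls in a chart's SQL. Heuristic only."""
    issues = []
    if re.search(r"select\s+\*", sql, re.IGNORECASE):
        issues.append("SELECT * on a (possibly columnar) table")
    if PARTITION_COL not in sql.lower():
        issues.append(f"no filter on partition column '{PARTITION_COL}'")
    return issues

bad = "SELECT * FROM events"
good = ("SELECT day, SUM(amount) FROM fct_orders_daily "
        "WHERE created_at > now() - INTERVAL 7 DAY GROUP BY day")

print(lint_chart_sql(bad))   # flags both pitfalls
print(lint_chart_sql(good))  # clean
```

A check like this can run in CI against exported chart definitions, turning the pitfall list into an enforceable policy rather than tribal knowledge.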

11. Conclusion

Scaling Apache Superset to billions of rows essentially means scaling the backend (ClickHouse, BigQuery) and pre-aggregating via dbt. Superset itself stays lightweight (it only issues SELECTs). Performance comes from modeling, not from Superset tuning.

Want the benefits of Apache Superset without the friction of installation and maintenance? Deploy your instance in 3 clicks with TVL Managed Superset, hosted in Europe (OVHcloud, Roubaix, France).

For more: ClickHouse, scale users, data modeling.