DuckDB is the dark horse of the data world in 2026: an embedded columnar SQL engine that is very fast on local files (CSV, Parquet, JSON) and is now available in the cloud via MotherDuck. Connecting Apache Superset to DuckDB turns your laptop or your Superset instance into a personal data warehouse.
1. Why DuckDB?
- Performance: columnar, vectorized execution, comparable to ClickHouse on gigabyte-scale datasets;
- Zero-config: a single .duckdb file, no server to operate;
- Native reading of Parquet, CSV, JSON, Arrow without prior import;
- Standard SQL with analytical extensions (window functions, CTEs);
- MotherDuck: DuckDB in the cloud, with collaboration features.
If you want Superset ready to connect to DuckDB, TVL Managed Superset includes the DuckDB driver by default.
2. Prerequisites
- An accessible Superset instance;
- A local DuckDB file or MotherDuck account;
- The duckdb-engine driver installed.
3. Install the driver
uv pip install duckdb-engine
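To confirm that the driver landed in the same Python environment Superset runs in, a quick check along these lines may help. It is a minimal sketch: `duckdb` and `duckdb_engine` are the import names of the two packages, while the helper name `missing_modules` is made up for this example.

```python
import importlib.util

# Import names of the DuckDB library and its SQLAlchemy dialect.
REQUIRED_MODULES = ("duckdb", "duckdb_engine")


def missing_modules(names=REQUIRED_MODULES):
    """Return the modules from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("DuckDB driver is importable")
```

Run it with the interpreter that Superset itself uses (for example inside the Superset container), since a driver installed in another environment will not be visible to Superset.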
4. Local DuckDB URI
For a .duckdb file mounted in Superset:
duckdb:////data/analytics.duckdb
In memory (volatile, useful for testing):
duckdb:///:memory:
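The four slashes in the file URI are easy to get wrong: three belong to the `duckdb:///` prefix and the fourth is the leading slash of the absolute path. A small sketch of that rule, with a hypothetical helper name `local_duckdb_uri`:

```python
def local_duckdb_uri(path):
    """Build a SQLAlchemy URI for a DuckDB file path, or for ":memory:".

    The prefix is always "duckdb:///". An absolute path brings its own
    leading slash, which is why the result then has four slashes.
    """
    return "duckdb:///" + path


print(local_duckdb_uri("/data/analytics.duckdb"))  # absolute path
print(local_duckdb_uri("analytics.duckdb"))        # relative to the working directory
print(local_duckdb_uri(":memory:"))                # volatile in-memory database
```

A relative path is resolved against the working directory of the Superset process, so an absolute path is usually the safer choice for a mounted file.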
5. MotherDuck URI (cloud DuckDB)
duckdb:///md:<db_name>?motherduck_token=<token>
MotherDuck brings DuckDB's benefits to a collaborative cloud setting, with S3 storage under the hood.
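Because the token ends up inside a URI, it is safer to build the string programmatically than to paste the token by hand. The sketch below follows the URI shape shown above. The helper name `motherduck_uri` and the environment variable name `MOTHERDUCK_TOKEN` are assumptions made for this example.

```python
import os
from urllib.parse import quote


def motherduck_uri(db_name, token):
    """Build the MotherDuck URI, percent-encoding the token for use in a query string."""
    return f"duckdb:///md:{db_name}?motherduck_token={quote(token, safe='')}"


# Assumption: the token is stored in an environment variable of this name.
token = os.environ.get("MOTHERDUCK_TOKEN", "<token>")
uri = motherduck_uri("<db_name>", token)

# Avoid printing the full URI in logs, since it embeds the secret.
print(uri.split("?")[0])
```

Keeping the token out of source control and reading it from the environment at startup limits where the secret can leak.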
6. Add to Superset
- UI → Settings → Database Connections → + Database;
- Type: DuckDB;
- Paste the URI;
- Test → Save.
7. Typical use cases
- Exploratory analysis of Parquet files on S3 without import;
- Local dashboards for a data analyst working on their laptop;
- Fast pre-aggregation of events before pushing them to a warehouse;
- Analytics POC without cloud infra;
- Education: teach SQL without provisioning a server.
This configuration is applied by default on TVL Managed Superset, which follows community best practices.
8. Direct Parquet/CSV reading
DuckDB can directly query files without import:
SELECT *
FROM 's3://my-bucket/events/year=2026/month=05/*.parquet'
WHERE event_type = 'purchase';
Configure S3 credentials:
SET s3_access_key_id='AKIA...';
SET s3_secret_access_key='...';
SET s3_region='eu-west-1';
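Hard-coding keys in SQL is risky, so one option is to generate the SET statements from environment variables. The sketch below only builds the SQL text and does not connect to anything. The helper names are hypothetical, and the `AWS_*` variable names follow the usual AWS convention rather than anything the setup above requires.

```python
import os


def sql_literal(value):
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def s3_set_statements(access_key_id, secret_access_key, region):
    """Return the three SET statements shown above, with values safely quoted."""
    settings = {
        "s3_access_key_id": access_key_id,
        "s3_secret_access_key": secret_access_key,
        "s3_region": region,
    }
    return [f"SET {name}={sql_literal(value)};" for name, value in settings.items()]


statements = s3_set_statements(
    os.environ.get("AWS_ACCESS_KEY_ID", "AKIA..."),
    os.environ.get("AWS_SECRET_ACCESS_KEY", "..."),
    os.environ.get("AWS_DEFAULT_REGION", "eu-west-1"),
)
# Print only the setting names to keep secrets out of logs.
for statement in statements:
    print(statement.split("=")[0])
```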
9. DuckDB limits
- One writing process per file (lock file): no concurrent writers;
- No native replication: for HA, use MotherDuck;
- Not suited for massive writes: DuckDB excels at reads/aggregations;
- Memory limits: DuckDB can spill to disk but remains bounded by the machine's resources.
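Two of these limits can be softened from the Superset side. The duckdb-engine driver accepts `read_only` and a `config` dictionary through SQLAlchemy's `connect_args`, and Superset lets you supply engine parameters as JSON in the database's advanced settings. The sketch below generates such a JSON document. The `4GB` value is purely illustrative, and the helper name `engine_parameters` is hypothetical.

```python
import json


def engine_parameters(read_only=True, memory_limit="4GB"):
    """Build engine parameters for a DuckDB file connection.

    read_only=True opens the file without taking the write lock, so several
    processes can read it at once. memory_limit caps DuckDB's memory use.
    """
    return {
        "connect_args": {
            "read_only": read_only,
            "config": {"memory_limit": memory_limit},
        }
    }


# JSON to paste into the engine parameters field of the database settings.
print(json.dumps(engine_parameters(), indent=2))
```

Read-only mode only makes sense for a file-backed database that already exists. It does not apply to the in-memory URI.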
10. Common pitfalls
- File not accessible to Superset pod: mount a shared volume or use MotherDuck;
- Lock file: another process has the file open for writing; close that process;
- Legacy driver (duckdb without -engine): use duckdb-engine for SQLAlchemy;
- Aggressive Superset cache recommended because DuckDB is very fast but not instant on 10+ million rows.
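For the caching advice, Superset reads its chart data cache settings from `DATA_CACHE_CONFIG` in `superset_config.py`. The fragment below is a sketch under stated assumptions: a Redis instance reachable at the URL shown and a one-hour timeout, both of which are illustrative values to adapt.

```python
# superset_config.py (fragment)

# Cache chart query results so repeated dashboard loads do not hit DuckDB again.
DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    # Assumption: one hour suits data that changes rarely; lower it for fresher data.
    "CACHE_DEFAULT_TIMEOUT": 60 * 60,
    "CACHE_KEY_PREFIX": "superset_data_",
    # Assumption: Redis runs locally on the default port, database 1.
    "CACHE_REDIS_URL": "redis://localhost:6379/1",
}
```

A longer timeout suits files that are refreshed by a nightly batch. For data that changes during the day, a shorter timeout or a per-dataset override is the more cautious choice.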
11. Conclusion
DuckDB + Apache Superset is a unique combo for exploratory and local BI. For multi-user production loads, consider MotherDuck or a classic cloud data warehouse. But for data analysts who want a lightweight and performant setup, it's unbeatable.
Want the benefits of Apache Superset without the friction of installation and maintenance? Deploy your instance in 3 clicks with TVL Managed Superset, hosted in Europe (OVHcloud, Roubaix, France).