TVL Managed Superset

Connect DuckDB to Apache Superset: local analytics 2026

Tutorial to connect DuckDB to Apache Superset: embedded SQL, Parquet files, MotherDuck, performance.

DuckDB is the data outsider of 2026: an embedded columnar SQL engine that is very fast on local files (CSV, Parquet, JSON) and is now also available in the cloud via MotherDuck. Connecting Apache Superset to DuckDB turns your laptop or your Superset instance into a personal data warehouse.

1. Why DuckDB?

  • Performance: columnar, vectorized execution, comparable to ClickHouse on datasets of a few gigabytes;
  • Zero-config: a single .duckdb file, no server to operate;
  • Native reading of Parquet, CSV, JSON, Arrow without prior import;
  • Standard SQL with full analytical support (window functions, CTEs);
  • MotherDuck: cloud DuckDB + collaboration.

If you want Superset ready to connect to DuckDB, TVL Managed Superset includes the DuckDB driver by default.

2. Prerequisites

  • An accessible Superset instance;
  • A local DuckDB file or MotherDuck account;
  • The duckdb-engine driver installed.

3. Install the driver

uv pip install duckdb-engine
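
After installing, it can help to confirm that the packages are visible to the same Python environment that runs Superset. The following is a minimal sketch using only the standard library; the helper names has_module and missing_duckdb_packages are hypothetical and exist only for this illustration.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


def missing_duckdb_packages() -> list[str]:
    # duckdb_engine is the SQLAlchemy dialect; duckdb is the engine it wraps.
    required = ["duckdb", "duckdb_engine"]
    return [name for name in required if not has_module(name)]


missing = missing_duckdb_packages()
print("all present" if not missing else f"missing: {', '.join(missing)}")
```

Run it with the interpreter Superset uses (for example inside the Superset container), since a driver installed in another environment will not be found.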

4. Local DuckDB URI

For a .duckdb file mounted in Superset:

duckdb:////data/analytics.duckdb

In memory (volatile, useful for testing):

duckdb:///:memory:
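
The number of slashes is easy to get wrong: three slashes introduce the path, and an absolute path brings its own leading slash, hence four in total. A small sketch of how the URI could be built from a path follows; the function name duckdb_uri is hypothetical.

```python
from pathlib import PurePosixPath


def duckdb_uri(path: str) -> str:
    """Build a duckdb-engine SQLAlchemy URI from a file path or ':memory:'."""
    if path == ":memory:":
        return "duckdb:///:memory:"
    # An absolute path keeps its leading slash, giving four slashes in total.
    return "duckdb:///" + str(PurePosixPath(path))


print(duckdb_uri("/data/analytics.duckdb"))  # duckdb:////data/analytics.duckdb
print(duckdb_uri("analytics.duckdb"))        # duckdb:///analytics.duckdb
```

The second form is a relative path, resolved against the working directory of the Superset process, so the absolute form is usually the safer choice for a mounted file.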

5. MotherDuck URI (cloud DuckDB)

duckdb:///md:<db_name>?motherduck_token=<token>

MotherDuck offers the benefits of DuckDB in a collaborative cloud mode, with S3 storage under the hood.
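
To avoid hand-editing the token into the URI, it can be assembled from an environment variable. This is a sketch only: the function name motherduck_uri, the variable name MOTHERDUCK_TOKEN, and the database name my_db are assumptions made for the example.

```python
import os
from urllib.parse import quote


def motherduck_uri(db_name: str, token: str) -> str:
    """Build the MotherDuck URI, percent-encoding the token for the query string."""
    if not token:
        raise ValueError("a MotherDuck token is required")
    return f"duckdb:///md:{db_name}?motherduck_token={quote(token, safe='')}"


# MOTHERDUCK_TOKEN is an assumed variable name for this sketch.
token = os.environ.get("MOTHERDUCK_TOKEN", "")
if token:
    uri = motherduck_uri("my_db", token)
    # Print only the part before the token so the secret never reaches logs.
    print(uri.split("?")[0] + "?motherduck_token=***")
```

Because the token is embedded in the URI, treat the resulting string as a secret.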

6. Add to Superset

  1. UI → Settings → Database Connections → + Database;
  2. Type: DuckDB;
  3. Paste the URI;
  4. Test → Save.

7. Iconic use cases

  • Exploratory analysis of Parquet files on S3 without import;
  • Local dashboards for a data analyst working on their laptop;
  • Fast pre-aggregation of events before pushing them to a warehouse;
  • Analytics POC without cloud infra;
  • Education: teach SQL without provisioning a server.

This configuration is applied by default on TVL Managed Superset, which follows community best practices.

8. Direct Parquet/CSV reading

DuckDB can directly query files without import:

SELECT *
FROM 's3://my-bucket/events/year=2026/month=05/*.parquet'
WHERE event_type = 'purchase';
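
The path in the query above follows a Hive-style partition layout (year=…/month=…). When the month changes often, the glob and the query text can be generated rather than typed. The sketch below assumes that layout; partition_glob and purchases_query are hypothetical helper names.

```python
def partition_glob(bucket: str, prefix: str, year: int, month: int) -> str:
    """Return the S3 glob for one month of Hive-style partitioned Parquet files."""
    return f"s3://{bucket}/{prefix}/year={year:04d}/month={month:02d}/*.parquet"


def purchases_query(glob: str) -> str:
    # Single quotes are doubled so the path stays a valid SQL string literal.
    literal = glob.replace("'", "''")
    return f"SELECT *\nFROM '{literal}'\nWHERE event_type = 'purchase';"


print(purchases_query(partition_glob("my-bucket", "events", 2026, 5)))
```

The generated text can be pasted into SQL Lab or used as the SQL of a virtual dataset.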

Configure S3 credentials (S3 access relies on DuckDB's httpfs extension):

SET s3_access_key_id='AKIA...';
SET s3_secret_access_key='...';
SET s3_region='eu-west-1';
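
Hard-coding keys in saved queries is risky. One option is to render the SET statements from environment variables at deploy time. The sketch below assumes the standard AWS variable names (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION); the helper s3_set_statements is hypothetical.

```python
import os
from typing import Mapping

# DuckDB setting name -> standard AWS environment variable name.
S3_SETTINGS = {
    "s3_access_key_id": "AWS_ACCESS_KEY_ID",
    "s3_secret_access_key": "AWS_SECRET_ACCESS_KEY",
    "s3_region": "AWS_DEFAULT_REGION",
}


def s3_set_statements(env: Mapping[str, str]) -> list[str]:
    """Render one SET statement per setting whose variable is present in env."""
    statements = []
    for setting, variable in S3_SETTINGS.items():
        value = env.get(variable)
        if value:
            escaped = value.replace("'", "''")  # keep the SQL literal valid
            statements.append(f"SET {setting}='{escaped}';")
    return statements


for statement in s3_set_statements(os.environ):
    # Show only the setting name so real secrets are never echoed.
    print(statement.split("=")[0] + "='***';")
```

The statements themselves still contain the secrets, so they should go to the database session only, never to logs or version control.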

9. DuckDB limits

  • Single writer per file (lock file): only one process can open the file in write mode; several processes can share it only in read-only mode;
  • No native replication: for HA, use MotherDuck;
  • Not suited for massive writes: DuckDB excels at reads/aggregations;
  • Memory limits: DuckDB can spill to disk for larger-than-memory queries but remains bounded by the resources of the machine.
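
Two of these limits can be softened from the connection settings: opening the file read-only so that several processes can share it, and capping memory with a spill directory. The sketch below prints a JSON value that could be pasted into the Engine Parameters field of the Superset database form (under Advanced). It relies on duckdb-engine forwarding connect_args to DuckDB; the 4GB limit and the /tmp/duckdb_spill path are placeholder values, not recommendations.

```python
import json

# Arguments forwarded to the DuckDB connection by duckdb-engine.
engine_params = {
    "connect_args": {
        "read_only": True,  # lets several processes share the same file
        "config": {
            "memory_limit": "4GB",                    # placeholder value
            "temp_directory": "/tmp/duckdb_spill",    # placeholder path
        },
    }
}

print(json.dumps(engine_params, indent=2))
```

Read-only mode requires that the file already exists and that no other process holds it open for writing.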

10. Common pitfalls

  • File not accessible to Superset pod: mount a shared volume or use MotherDuck;
  • Lock file: another process has the file open in write mode; close it or open the file read-only;
  • Wrong package (duckdb without -engine): the duckdb package alone does not provide a SQLAlchemy dialect; install duckdb-engine;
  • Slow dashboards on 10+ million rows: DuckDB is very fast but not instant, so configure aggressive Superset caching.
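
For the caching point, Superset reads its chart data cache settings from superset_config.py using Flask-Caching keys. The fragment below is a sketch that assumes a Redis instance reachable at redis://localhost:6379/1; the one-hour timeout, key prefix, and URL are illustrative values to adapt.

```python
# superset_config.py (fragment)
ONE_HOUR = 60 * 60  # cache lifetime in seconds; tune to how often data changes

# Cache for chart query results, so repeated dashboard loads skip DuckDB.
DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": ONE_HOUR,
    "CACHE_KEY_PREFIX": "superset_data_",
    "CACHE_REDIS_URL": "redis://localhost:6379/1",  # assumed Redis location
}
```

A longer timeout suits files that are refreshed in batches; a shorter one suits files that are rewritten frequently.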

11. Conclusion

DuckDB + Apache Superset is a unique combo for exploratory and local BI. For multi-user production loads, consider MotherDuck or a classic cloud data warehouse. But for data analysts who want a lightweight and performant setup, it's unbeatable.

Want the benefits of Apache Superset without the friction of installation and maintenance? Deploy your instance in 3 clicks with TVL Managed Superset, hosted in Europe (OVHcloud, Roubaix, France).
