TVL Managed Superset

Data modeling for Apache Superset 2026

Data modeling best practices for Apache Superset: dbt, star schema, virtual datasets, calculated metrics.

A good Apache Superset dashboard is 80% upstream data modeling. Without clean modeling (dbt, a star schema, consistent metric definitions), dashboards end up slow, contradictory, and hard to maintain. This guide compiles best practices for 2026.

1. Why model for Superset?

  • Performance: pre-aggregate for sub-second dashboards;
  • Consistency: a single definition per KPI;
  • Maintainability: SQL versioned in Git, not scattered across 100 virtual datasets;
  • Documentation: dbt generates documentation automatically from your models.

If you want template dbt datasets, TVL Managed Superset offers a turnkey dbt + Superset stack.

2. dbt layered architecture

  1. Sources: raw data as it lands in the warehouse;
  2. Staging (stg_*): cleanup, typing, casts;
  3. Intermediate (int_*): enrichment, joins;
  4. Marts (fct_*, dim_*): tables ready to be consumed by Superset;
  5. Exposures: dbt references pointing to Superset charts and dashboards.
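Concretely, a staging model at layer 2 might look like this (source and column names are illustrative, not a prescribed schema):

```sql
-- models/staging/stg_orders.sql — hypothetical source and columns
SELECT
    CAST(id AS BIGINT)            AS order_id,
    CAST(customer AS BIGINT)      AS customer_id,
    CAST(total_amount AS NUMERIC) AS amount,
    CAST(created_at AS TIMESTAMP) AS ordered_at
FROM {{ source('shop', 'raw_orders') }}
```

Each layer only references the one below it (`source()` in staging, `ref()` everywhere else), which keeps the dependency graph readable.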

3. Star schema

The star schema is the reference pattern for BI:

  • Fact tables (fct_orders, fct_events): events with numeric measures;
  • Dimension tables (dim_customers, dim_products): descriptive attributes;
  • Foreign keys linking facts to dimensions;
  • Single-hop joins only: no chained joins inside Superset.
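With this layout, a typical Superset chart query is a single hop from the fact table to one dimension (table and column names are illustrative):

```sql
-- Revenue by customer country: one join, no chains
SELECT d.country,
       SUM(f.amount) AS revenue
FROM fct_orders f
JOIN dim_customers d ON f.customer_id = d.customer_id
GROUP BY d.country
```

If a chart needs two or three chained joins, that is usually a sign the mart is missing a column.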

4. Consistent metrics

Centralize definitions:

-- marts/fct_orders.sql
SELECT
  order_id,
  customer_id,
  amount,
  amount / 1.15 AS amount_net,  -- strip 15% VAT (net = gross / 1.15)
  ...
FROM stg_orders;

-- Document in schema.yml
columns:
  - name: amount_net
    description: "Net revenue excl. VAT. CFO definition 2026-Q1."

For SaaS KPIs (MRR, ARR, churn), centralize in dedicated marts.
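As a sketch, a dedicated MRR mart can be very small (this assumes a hypothetical `fct_subscriptions` fact carrying a net monthly amount):

```sql
-- models/marts/mart_mrr.sql — illustrative; fct_subscriptions is assumed
SELECT
    DATE_TRUNC('month', billed_at) AS month,
    SUM(amount_net)                AS mrr
FROM {{ ref('fct_subscriptions') }}
GROUP BY 1
```

The point is that "MRR" lives in exactly one versioned file, not in five chart-level formulas.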

5. Superset datasets

Three simple rules:

  1. Prefer physical datasets (on dbt tables/views) rather than virtual;
  2. Virtual datasets only for prototyping;
  3. Once stabilized, materialize via dbt (cf. virtual datasets).
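Promoting a stabilized virtual dataset means moving its SQL into a dbt model (the model name here is hypothetical):

```sql
-- models/marts/mart_daily_revenue.sql
{{ config(materialized='table') }}

SELECT
    DATE_TRUNC('day', ordered_at) AS day,
    SUM(amount)                   AS revenue
FROM {{ ref('fct_orders') }}
GROUP BY 1
```

Then point a physical Superset dataset at `mart_daily_revenue` and delete the virtual one.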

This configuration is applied by default on TVL Managed Superset, which follows community best practices.

6. Dataset-level metrics

Rather than recalculating in each chart, define at dataset level:

-- Dataset "fct_orders" → Metrics tab
total_revenue: SUM(amount)
total_orders: COUNT(*)
avg_order_value: AVG(amount)
unique_customers: COUNT(DISTINCT customer_id)

These metrics are then reusable in all charts.
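Under the hood, when a chart groups `total_revenue` by a dimension, Superset generates SQL along these lines (simplified sketch, not the exact generated query):

```sql
SELECT customer_id,
       SUM(amount) AS total_revenue
FROM fct_orders
GROUP BY customer_id
LIMIT 1000
```

This is why a metric expression must be a valid aggregate in the warehouse's SQL dialect.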

7. Naming conventions

| Prefix | Usage |
| --- | --- |
| stg_ | Staging (cleaning) |
| int_ | Intermediate (logic) |
| fct_ | Fact tables |
| dim_ | Dimension tables |
| mart_ | Final aggregated marts |
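In a dbt project, these prefixes typically map onto the folder layout (illustrative file names):

```
models/
├── staging/
│   └── stg_orders.sql
├── intermediate/
│   └── int_orders_enriched.sql
└── marts/
    ├── fct_orders.sql
    ├── dim_customers.sql
    └── mart_daily_revenue.sql
```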

8. Performance optimization

  • dbt incremental materializations for large fact tables;
  • Partitioning on warehouse side (BigQuery, ClickHouse);
  • Index on Postgres filtering columns;
  • Monthly pre-aggregation for exec dashboards;
  • Superset cache aligned with dbt refresh frequency.
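The incremental pattern from the first bullet, sketched for a large fact table (column names illustrative):

```sql
-- models/marts/fct_events.sql
{{ config(materialized='incremental', unique_key='event_id') }}

SELECT event_id, customer_id, event_type, amount, occurred_at
FROM {{ ref('stg_events') }}
{% if is_incremental() %}
  -- only process rows newer than what is already in the target table
  WHERE occurred_at > (SELECT MAX(occurred_at) FROM {{ this }})
{% endif %}
```

Each `dbt run` then processes only the new rows instead of rebuilding the whole table.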

9. dbt tests

  • Unique on primary keys;
  • Not null on critical columns;
  • Accepted values on enumerations;
  • Relationships between fact and dim;
  • Custom business tests (e.g. amount > 0).
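The first four bullets map directly onto dbt's built-in generic tests; a minimal schema.yml might read (model and column names illustrative):

```yaml
# models/marts/schema.yml
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests: [unique, not_null]
      - name: status
        tests:
          - accepted_values:
              values: ['pending', 'paid', 'refunded']
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
```

A custom business test (e.g. `amount > 0`) is a singular test: a SELECT in `tests/` that returns the offending rows, failing if any come back.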

10. Common pitfalls

  • SQL scattered across 100 Superset virtual datasets → maintenance nightmare;
  • Contradictory metrics between dashboards (different revenue figures);
  • No dbt tests → bad data goes undetected;
  • Modeling too late: retrofitting a clean model once 50 dashboards are in place is very painful.

11. Conclusion

Data modeling is the invisible but essential foundation of a productive Apache Superset instance. Investing 1-2 weeks in dbt and star-schema setup at project start saves months of friction later. For teams starting out, treat Superset and dbt as a single system, never one without the other.

Want the benefits of Apache Superset without the friction of installation and maintenance? Deploy your instance in 3 clicks with TVL Managed Superset, hosted in Europe (OVHcloud, Roubaix, France).

For more: virtual datasets, SaaS metrics, dashboard best practices.