TVL Managed Superset

Apache Superset Virtual Datasets: Reusable SQL 2026

Create and use Apache Superset virtual datasets: turn a SQL query into a dataset, performance, best practices.

Virtual datasets in Apache Superset turn a SQL query into a reusable data source for charts. The data analyst's prototyping tool par excellence, and a useful complement to physical tables modeled in dbt. This guide explains how to use them without degrading performance in 2026.

1. Physical vs virtual datasets

AspectPhysicalVirtual
SourceA DB tableA SQL query
PerformanceOptimal (direct read)Depends on query
UpdateColumn refreshQuery modification
Use caseProductionPrototyping, light aggregations

If you want an instance with template datasets, TVL Managed Superset offers ready-to-use datasets for SaaS metrics.

2. Create a virtual dataset from SQL Lab

  1. SQL Lab → write and test your query;
  2. Click Explore at the top right of the results;
  3. Enter the dataset name (e.g., orders_daily_summary);
  4. Save → the dataset is created in virtual mode.

3. Create a virtual dataset from the Datasets UI

  1. UI → Datasets → + Dataset;
  2. Database = your connection;
  3. Toggle Virtual Dataset;
  4. Paste the SQL query;
  5. Save.

4. Example — Top 10 products

SELECT
  product_name,
  category,
  SUM(amount) AS revenue,
  COUNT(*) AS orders_count
FROM orders
JOIN products ON products.id = orders.product_id
WHERE created_at >= NOW() - INTERVAL '90 days'
GROUP BY product_name, category
ORDER BY revenue DESC
LIMIT 10;

5. Computed metrics

Once the dataset is created, add metrics at the dataset level (reusable in all charts):

-- Metrics tab of the dataset
total_revenue : SUM(amount)
avg_order_value : AVG(amount)
unique_customers : COUNT(DISTINCT customer_id)

6. Performance pitfalls

Virtual datasets are simply SELECT * FROM (...) wrappers. If the subquery is heavy, each chart re-runs it.

  • Aggregations in virtual: OK if data already filtered;
  • Multi-million JOINs: prefer materializing via dbt;
  • Nested subqueries: flatten with CTE for readability;
  • No cache by default on virtual datasets: adjust Cache timeout.

This configuration is applied by default on TVL Managed Superset, which follows community best practices.

7. Best practices

  • Virtual datasets = prototyping;
  • Physical datasets (dbt) = production as soon as a dashboard is consolidated;
  • Document each dataset (description, owner, refresh);
  • Naming convention: prefix with virt_ to distinguish;
  • Versioning: export datasets in YAML and commit in Git.

8. Convert a virtual dataset to physical

When a virtual dataset becomes critical:

  1. Reproduce the query in dbt (e.g., marts/orders_daily_summary.sql);
  2. Materialize as table or incremental;
  3. In Superset, create a physical dataset pointing to the new table;
  4. Update charts to point to the new dataset;
  5. Delete the virtual dataset.

9. Common pitfalls

  • Too many virtual datasets: complexity explosion;
  • Unversioned modifications: a user modifies the query, breaks 5 charts;
  • Performance degraded without pre-aggregation;
  • Dependency on DB-specific functions not portable.

10. Conclusion

Virtual datasets are a powerful tool to move fast, but to use with discipline. Golden rule: prototype in virtual, materialize in dbt as soon as a dataset becomes critical. This guarantees performance and maintainability.

Want the benefits of Apache Superset without the friction of installation and maintenance? Deploy your instance in 3 clicks with TVL Managed Superset, hosted in Europe (OVHcloud, Roubaix, France).

For more: Jinja templating, SQL Lab, Superset data modeling.