Virtual datasets in Apache Superset turn a SQL query into a reusable data source for charts. The data analyst's prototyping tool par excellence, and a useful complement to physical tables modeled in dbt. This guide explains how to use them without degrading performance in 2026.
1. Physical vs virtual datasets
| Aspect | Physical | Virtual |
|---|---|---|
| Source | A DB table | A SQL query |
| Performance | Optimal (direct read) | Depends on query |
| Update | Column refresh | Query modification |
| Use case | Production | Prototyping, light aggregations |
If you want an instance with template datasets, TVL Managed Superset offers ready-to-use datasets for SaaS metrics.
2. Create a virtual dataset from SQL Lab
- SQL Lab → write and test your query;
- Click Explore at the top right of the results;
- Enter the dataset name (e.g.,
orders_daily_summary); - Save → the dataset is created in virtual mode.
3. Create a virtual dataset from the Datasets UI
- UI → Datasets → + Dataset;
- Database = your connection;
- Toggle Virtual Dataset;
- Paste the SQL query;
- Save.
4. Example — Top 10 products
SELECT
product_name,
category,
SUM(amount) AS revenue,
COUNT(*) AS orders_count
FROM orders
JOIN products ON products.id = orders.product_id
WHERE created_at >= NOW() - INTERVAL '90 days'
GROUP BY product_name, category
ORDER BY revenue DESC
LIMIT 10;
5. Computed metrics
Once the dataset is created, add metrics at the dataset level (reusable in all charts):
-- Metrics tab of the dataset
total_revenue : SUM(amount)
avg_order_value : AVG(amount)
unique_customers : COUNT(DISTINCT customer_id)
6. Performance pitfalls
Virtual datasets are simply SELECT * FROM (...) wrappers. If the subquery is heavy, each chart re-runs it.
- Aggregations in virtual: OK if data already filtered;
- Multi-million JOINs: prefer materializing via dbt;
- Nested subqueries: flatten with CTE for readability;
- No cache by default on virtual datasets: adjust Cache timeout.
This configuration is applied by default on TVL Managed Superset, which follows community best practices.
7. Best practices
- Virtual datasets = prototyping;
- Physical datasets (dbt) = production as soon as a dashboard is consolidated;
- Document each dataset (description, owner, refresh);
- Naming convention: prefix with
virt_to distinguish; - Versioning: export datasets in YAML and commit in Git.
8. Convert a virtual dataset to physical
When a virtual dataset becomes critical:
- Reproduce the query in dbt (e.g.,
marts/orders_daily_summary.sql); - Materialize as
tableorincremental; - In Superset, create a physical dataset pointing to the new table;
- Update charts to point to the new dataset;
- Delete the virtual dataset.
9. Common pitfalls
- Too many virtual datasets: complexity explosion;
- Unversioned modifications: a user modifies the query, breaks 5 charts;
- Performance degraded without pre-aggregation;
- Dependency on DB-specific functions not portable.
10. Conclusion
Virtual datasets are a powerful tool to move fast, but to use with discipline. Golden rule: prototype in virtual, materialize in dbt as soon as a dataset becomes critical. This guarantees performance and maintainability.
Want the benefits of Apache Superset without the friction of installation and maintenance? Deploy your instance in 3 clicks with TVL Managed Superset, hosted in Europe (OVHcloud, Roubaix, France).
For more: Jinja templating, SQL Lab, Superset data modeling.