TVL Managed Superset

PostgreSQL Tuning for Apache Superset: 2026 Guide

Optimize PostgreSQL for Apache Superset: configuration, indexing, vacuum, monitoring, slow queries, and production best practices.

PostgreSQL is the recommended metadata database for Apache Superset. Properly configured, it supports several hundred concurrent users with imperceptible latency. Mistuned, it becomes the main bottleneck of your instance. This guide details the key parameters to adjust, indexing, vacuum, and monitoring to keep Postgres performant in 2026.

1. Postgres in Superset architecture

Apache Superset uses PostgreSQL for two distinct uses: the metadata database (internal to Superset, storing dashboards, datasets, users) and potentially as an analytical data source (queried by charts). This article focuses on the first case, where tuning impacts UI latency and reliability.

If you want to skip this configuration work, TVL Managed Superset relies on a Postgres OVHcloud Business pre-tuned for Superset, deployed in less than 3 minutes.

2. Initial sizing

LoadvCPURAMSSD storage
POC / dev24 GB20 GB
Team ≤30 users2-48 GB50 GB
Production 100-300 users4-816 GB100 GB
Production 500+8-1632 GB+200 GB+

3. Key postgresql.conf parameters

# Connections
max_connections = 200
superuser_reserved_connections = 5

# Memory
shared_buffers = 4GB              # ~25% of RAM
effective_cache_size = 12GB       # ~75% of RAM
work_mem = 16MB                   # per sort operation
maintenance_work_mem = 512MB      # vacuum, index

# WAL
wal_buffers = 16MB
min_wal_size = 1GB
max_wal_size = 4GB
checkpoint_completion_target = 0.9

# Planner
random_page_cost = 1.1            # SSD
effective_io_concurrency = 200    # SSD

# Parallelism
max_parallel_workers = 8
max_parallel_workers_per_gather = 4

# Slow query logging
log_min_duration_statement = 500  # log queries >500ms
log_lock_waits = on

These values are starting points for 16 GB RAM. Adjust with PgTune for your exact profile.

4. Connection pooling with PgBouncer

Superset opens a connection per query. Without pooling, 100 users = 100 Postgres connections = saturation. Insert PgBouncer in transaction mode between Superset and Postgres:

[databases]
superset_meta = host=postgres port=5432 dbname=superset

[pgbouncer]
listen_port = 6432
auth_type = md5
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5

PgBouncer multiplexes 1,000 client connections onto 25 effective Postgres connections. Typical gain: ×5 to ×10 on latency under load.

5. Indexing critical tables

The Superset DB has about 30 tables. A few extra indexes improve performance:

-- User activity (FAB logs)
CREATE INDEX CONCURRENTLY idx_logs_user_dttm
  ON logs(user_id, dttm DESC);

-- Slices and dashboards by owner
CREATE INDEX CONCURRENTLY idx_slices_created_by
  ON slices(created_by_fk);

-- Query cache
CREATE INDEX CONCURRENTLY idx_query_user_status
  ON query(user_id, status);

-- Permissions
CREATE INDEX CONCURRENTLY idx_permissions_view_role
  ON permission_view_role(permission_view_id, role_id);

The CONCURRENTLY keyword avoids the exclusive lock on the table during index creation. Essential in production.

6. Vacuum and autovacuum

PostgreSQL uses MVCC creating multiple row versions. Without regular vacuum, the table grows indefinitely and performance drops. Recommended Superset DB settings:

autovacuum = on
autovacuum_max_workers = 4
autovacuum_naptime = 30s
autovacuum_vacuum_scale_factor = 0.05    # vacuum at 5% variation
autovacuum_analyze_scale_factor = 0.02

For very active tables (logs, query_status), force a more aggressive threshold:

ALTER TABLE logs SET (
  autovacuum_vacuum_scale_factor = 0.02,
  autovacuum_analyze_scale_factor = 0.01
);

7. Monitoring: what to observe

  • Active connections (pg_stat_activity);
  • Query latency via pg_stat_statements;
  • Cache hit ratio (target >99% on read);
  • Replication lag on replicas;
  • Bloat on tables and indexes;
  • Lock waits and deadlocks;
  • Disk space and growth.

Recommended tools: pgBadger for log analysis, pgwatch2 or pganalyze for real-time monitoring, Prometheus + postgres_exporter + Grafana to integrate with your observability stack.

8. Identify slow queries

-- Top 10 most consuming queries
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Currently locked queries
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event_type IS NOT NULL;

This configuration is applied by default on TVL Managed Superset, which follows community best practices.

9. Backups and PITR

Postgres allows Point-In-Time Recovery via WAL archiving:

archive_mode = on
archive_command = 'rsync -a %p backup-server:/wal-archive/%f'
wal_level = replica

Tools: pgBackRest, Barman, WAL-G. Test restoration quarterly.

10. Optimization summary

OptimizationExpected gainDifficulty
shared_buffers / effective_cache_size adapted×2 to ×3 on latencyLow
PgBouncer in transaction mode×5 to ×10 under loadMedium
Missing indexes×10 to ×100 ad-hocLow
Aggressive autovacuumStable latency over timeLow
Read replica for reportingOffloads primaryHigh

11. Conclusion

A well-tuned Postgres is invisible: Superset runs, users click, no one complains. The above parameters cover 80% of cases. For the remaining 20% (very high loads, multi-tenant), often it requires moving to a replicated architecture and advanced monitoring.

Want the benefits of Apache Superset without the friction of configuration and tuning? Deploy your instance in 3 clicks with TVL Managed Superset, hosted in Europe (OVHcloud, Roubaix, France), pre-tuned Postgres included.

For more, also read Redis cache for Superset, high availability, and Prometheus monitoring.