Without monitoring, you discover incidents through users. With Prometheus + Grafana, you see degradations before they become visible. This guide explains how to instrument Apache Superset, expose metrics, build dashboards, and configure alerts in 2026.
1. Why Prometheus + Grafana?
Prometheus and Grafana are the reference open source pair for observability: Prometheus collects and stores time-series metrics, while Grafana visualizes them and triggers alerts. The stack is free, performant, and fits naturally alongside the rest of a modern Kubernetes deployment.
If you want this monitoring without the setup, TVL Managed Superset integrates Prometheus + Grafana by default on Pro+ instances.
2. Metrics to expose
Three layers to instrument:
- Superset application: request latency, error rate, Celery queue, dashboards rendered;
- Infrastructure: CPU, memory, disk, network;
- Databases: Postgres and Redis exporters.
3. Enable Superset metrics
In superset_config.py:
```python
from werkzeug.middleware.dispatcher import DispatcherMiddleware
from prometheus_client import make_wsgi_app
from superset.stats_logger import StatsdStatsLogger

# Ship Superset's internal counters/timers to StatsD
# (pair with statsd_exporter to translate them into Prometheus metrics)
STATS_LOGGER = StatsdStatsLogger(host="localhost", port=8125, prefix="superset")

# Expose a /metrics endpoint on the Superset web app itself
def FLASK_APP_MUTATOR(app):
    app.wsgi_app = DispatcherMiddleware(app.wsgi_app, {
        "/metrics": make_wsgi_app()
    })
```
The /metrics endpoint then returns metrics in Prometheus format.
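To sanity-check what the endpoint returns, it helps to know the text exposition format: `# HELP`/`# TYPE` comment lines followed by one `name{labels} value` sample per line. This minimal sketch parses a small illustrative payload (the metric names and values below are made-up examples; in practice, fetch the live payload and prefer `prometheus_client.parser.text_string_to_metric_families`):

```python
# Illustrative /metrics payload (sample values are made up)
sample = """\
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 8.0
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 3.1e+07
"""

def parse_metrics(text):
    """Crude parser for the Prometheus text exposition format."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE metadata and blank lines
        name, value = line.rsplit(" ", 1)
        metrics[name] = float(value)
    return metrics

print(parse_metrics(sample)["process_open_fds"])  # 8.0
```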
4. Complementary exporters
| Exporter | Metrics | Image |
|---|---|---|
| node_exporter | CPU, RAM, disk | quay.io/prometheus/node-exporter |
| postgres_exporter | Connections, latency, replication lag | quay.io/prometheuscommunity/postgres-exporter |
| redis_exporter | Hit ratio, memory, evictions | oliver006/redis_exporter |
| kube-state-metrics | State of pods, deployments, jobs | registry.k8s.io/kube-state-metrics/kube-state-metrics |
| blackbox_exporter | HTTP synthetic checks | prom/blackbox-exporter |
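As a sketch, the host and database exporters from the table can run next to Superset with a minimal Compose file. The service names, credentials, and connection strings below are assumptions to adapt to your environment:

```yaml
services:
  node-exporter:
    image: quay.io/prometheus/node-exporter
    ports: ["9100:9100"]
  postgres-exporter:
    image: quay.io/prometheuscommunity/postgres-exporter
    environment:
      DATA_SOURCE_NAME: "postgresql://monitor:secret@postgres:5432/superset?sslmode=disable"
    ports: ["9187:9187"]
  redis-exporter:
    image: oliver006/redis_exporter
    environment:
      REDIS_ADDR: "redis://redis:6379"
    ports: ["9121:9121"]
```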
5. Prometheus scrape configuration
```yaml
scrape_configs:
  - job_name: superset
    metrics_path: /metrics
    static_configs:
      - targets: ['superset:8088']
    scrape_interval: 30s
  - job_name: postgres
    static_configs:
      - targets: ['postgres-exporter:9187']
  - job_name: redis
    static_configs:
      - targets: ['redis-exporter:9121']
  - job_name: blackbox
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://superset.example.com/health
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115
```
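The three blackbox relabeling rules can be traced step by step. This small Python sketch is a simplified model of what Prometheus does to each target's label set, not Prometheus internals:

```python
# Simplified model of the relabel_configs above: copy the target URL into
# the probe's ?target= parameter, keep it as the "instance" label, then
# redirect the actual scrape to the blackbox exporter.
def relabel(labels):
    labels = dict(labels)
    labels["__param_target"] = labels["__address__"]  # rule 1
    labels["instance"] = labels["__param_target"]     # rule 2
    labels["__address__"] = "blackbox-exporter:9115"  # rule 3
    return labels

result = relabel({"__address__": "https://superset.example.com/health"})
print(result["__address__"])  # blackbox-exporter:9115
print(result["instance"])     # https://superset.example.com/health
```

Net effect: Prometheus scrapes `blackbox-exporter:9115/probe?target=https://superset.example.com/health` while the resulting series keep the original URL as their `instance` label.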
6. Essential Grafana dashboards
Dashboard 1 — Superset overview
- Requests per second;
- HTTP 5xx error rate (alert if >5% over 5 min);
- Latency p50, p95, p99;
- Celery queue (pending tasks);
- Pods Up/Down and restarts.
Dashboard 2 — Databases
- Active Postgres connections;
- Query latency (top 10);
- Redis cache hit ratio (target >90%);
- Replication lag;
- Disk space used.
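The cache hit ratio panel is usually derived from Redis's `keyspace_hits` and `keyspace_misses` counters (exposed by redis_exporter). A sketch of the computation, with made-up sample numbers:

```python
def hit_ratio(keyspace_hits, keyspace_misses):
    """Fraction of reads served from cache; 0.0 when there is no traffic."""
    total = keyspace_hits + keyspace_misses
    return keyspace_hits / total if total else 0.0

ratio = hit_ratio(keyspace_hits=9_300, keyspace_misses=700)
print(f"{ratio:.1%}")  # 93.0% -- above the 90% target
```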
Dashboard 3 — Infrastructure
- CPU and memory per node / pod;
- Network in/out;
- Disk I/O;
- OOM kills.
This configuration is applied by default on TVL Managed Superset, which follows community best practices.
7. Critical alerts to configure
```yaml
groups:
  - name: superset
    rules:
      - alert: SupersetDown
        expr: up{job="superset"} == 0
        for: 2m
        labels: { severity: critical }
        annotations:
          summary: "Superset {{ $labels.instance }} is down"
      - alert: SupersetHighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
          / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels: { severity: warning }
      - alert: SupersetHighLatency
        expr: >
          histogram_quantile(0.95,
            sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 5
        for: 10m
        labels: { severity: warning }
      - alert: PostgresConnectionsHigh
        expr: sum(pg_stat_activity_count) > 150
        for: 5m
        labels: { severity: warning }
      - alert: RedisMemoryHigh
        expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.9
        for: 10m
        labels: { severity: warning }
```
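The `for:` clause means the expression must stay true continuously for the whole window before the alert fires; a short spike that recovers never pages anyone. A toy evaluation loop (a simplified model, not Prometheus internals) makes the semantics concrete:

```python
def fires(samples, threshold, for_minutes):
    """samples: one value per minute. The alert fires only once the
    condition has held for `for_minutes` consecutive evaluations."""
    streak = 0
    for value in samples:
        streak = streak + 1 if value > threshold else 0
        if streak >= for_minutes:
            return True
    return False

# Error rate spikes to 8% but recovers after 3 minutes: no alert.
print(fires([0.08, 0.08, 0.08, 0.01], threshold=0.05, for_minutes=5))  # False
# Sustained 6% for 5 minutes: SupersetHighErrorRate fires.
print(fires([0.06] * 5, threshold=0.05, for_minutes=5))  # True
```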
8. Notification
Alertmanager routes alerts to:
- Slack for teams;
- PagerDuty / Opsgenie for on-call;
- Email as backup;
- Webhook for custom integration.
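This routing can be expressed in `alertmanager.yml`; as a hedged sketch, warnings go to Slack by default and only `severity=critical` alerts page on-call. The receiver names, channel, and integration key below are placeholders to adapt:

```yaml
route:
  receiver: slack-default
  routes:
    - matchers: ['severity="critical"']
      receiver: pagerduty-oncall
receivers:
  - name: slack-default
    slack_configs:
      - channel: "#superset-alerts"
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: "<pagerduty-integration-key>"
```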
9. SLO and error budget
Define measurable, continuously-tracked SLOs:
| SLO | Target | Monthly error budget |
|---|---|---|
| Availability | 99.9% | 43 min |
| Latency p95 < 2s | 99% of requests | 1% of requests |
| 5xx error rate | < 0.1% | 0.1% |
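The availability budget in the table follows directly from the SLO target; a quick sketch of the arithmetic (assuming a 30-day month, which gives the ~43 minutes above):

```python
def error_budget_minutes(slo, days=30):
    """Monthly downtime budget implied by an availability SLO."""
    return (1 - slo) * days * 24 * 60

print(round(error_budget_minutes(0.999), 1))  # 43.2
print(round(error_budget_minutes(0.99), 1))   # 432.0
```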
10. Alternative tools
- Datadog: commercial but turnkey, expensive at scale;
- New Relic: powerful APM;
- Elastic Stack: log-oriented but with metrics;
- OpenObserve: open source, lighter.
11. Conclusion
A well-monitored Superset is a Superset you can stop worrying about. The initial investment of one to two engineer-days pays for itself within the first week by catching an incident that would otherwise have been detected too late. For a serious production deployment, monitoring is non-negotiable.
Want the benefits of Apache Superset without the friction of installation and maintenance? Deploy your instance in 3 clicks with TVL Managed Superset, hosted in Europe (OVHcloud, Roubaix, France), monitoring included.
For more: centralized logs, high availability, production checklist.