
Monitoring Apache Superset with Prometheus + Grafana

Set up Apache Superset monitoring with Prometheus and Grafana: metrics, dashboards, alerts, p95, error rate.

Without monitoring, you discover incidents through users. With Prometheus + Grafana, you see degradations before they become visible. This guide explains how to instrument Apache Superset, expose metrics, build dashboards, and configure alerts in 2026.

1. Why Prometheus + Grafana?

Prometheus and Grafana are the reference open source pair for observability: Prometheus collects and stores time-series metrics, while Grafana visualizes them and raises alerts. Both are free, perform well, and fit naturally into a modern Kubernetes stack.

If you want this monitoring without the setup, TVL Managed Superset integrates Prometheus + Grafana by default on Pro+ instances.

2. Metrics to expose

Three layers to instrument:

  1. Superset application: request latency, error rate, Celery queue, dashboards rendered;
  2. Infrastructure: CPU, memory, disk, network;
  3. Databases: Postgres and Redis exporters.

3. Enable Superset metrics

In superset_config.py:

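from prometheus_client import make_wsgi_app
from superset.stats_logger import StatsdStatsLogger  # requires the statsd package
from werkzeug.middleware.dispatcher import DispatcherMiddleware

# STATS_LOGGER expects a logger instance, not a dotted-path string.
# Superset has no built-in Prometheus logger: StatsdStatsLogger emits StatsD
# packets, which a statsd_exporter (host and port below are assumptions)
# translates into Prometheus metrics.
STATS_LOGGER = StatsdStatsLogger(host="statsd-exporter", port=9125, prefix="superset")

# Mount /metrics on the Superset app; it serves the prometheus_client default registry
def FLASK_APP_MUTATOR(app):
    app.wsgi_app = DispatcherMiddleware(app.wsgi_app, {
        "/metrics": make_wsgi_app()
    })

The /metrics endpoint then returns the in-process prometheus_client registry in Prometheus text format. Superset's own counters and timers are emitted through STATS_LOGGER, so they reach Prometheus through the statsd_exporter, which exposes its own /metrics endpoint and needs its own scrape job.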
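A quick way to check the endpoint is to read its output and confirm the series you expect are present. The sketch below is a simplified parser for the Prometheus text format, not a full implementation: it assumes samples carry no trailing timestamp, and the sample text is illustrative rather than real Superset output.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}.

    Simplification: assumes no trailing timestamp after the value.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Illustrative payload, shaped like what a /metrics endpoint returns.
SAMPLE = """\
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 12.5
python_gc_collections_total{generation="0"} 341.0
"""

metrics = parse_metrics(SAMPLE)
print(sorted(metrics))
```

In a real check, the text would come from an HTTP GET on the endpoint (for example with urllib.request.urlopen) instead of the SAMPLE constant.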

4. Complementary exporters

| Exporter | Metrics | Image |
| --- | --- | --- |
| node_exporter | CPU, RAM, disk | quay.io/prometheus/node-exporter |
| postgres_exporter | Connections, latency, replication lag | quay.io/prometheuscommunity/postgres-exporter |
| redis_exporter | Hit ratio, memory, evictions | oliver006/redis_exporter |
| kube-state-metrics | State of pods, deployments, jobs | registry.k8s.io/kube-state-metrics/kube-state-metrics |
| blackbox_exporter | HTTP synthetic checks | prom/blackbox-exporter |

5. Prometheus scrape configuration

scrape_configs:
  - job_name: superset
    metrics_path: /metrics
    static_configs:
      - targets: ['superset:8088']
    scrape_interval: 30s

  - job_name: postgres
    static_configs:
      - targets: ['postgres-exporter:9187']

  - job_name: redis
    static_configs:
      - targets: ['redis-exporter:9121']

  - job_name: blackbox
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://superset.example.com/health
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115
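The relabel_configs of the blackbox job are the least obvious part of this file. The sketch below replays the three rules in plain Python to show what Prometheus ends up requesting; the function names are hypothetical, and the exact URL encoding Prometheus applies may differ while remaining equivalent.

```python
from urllib.parse import urlencode


def relabel(target: str, exporter: str) -> dict[str, str]:
    """Replay the three relabel rules of the blackbox job on one target."""
    labels = {"__address__": target}
    # Rule 1: copy the original target into the ?target= URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Rule 2: keep the probed URL as the human-readable instance label.
    labels["instance"] = labels["__param_target"]
    # Rule 3: send the actual scrape to the blackbox exporter.
    labels["__address__"] = exporter
    return labels


def probe_url(labels: dict[str, str], module: str) -> str:
    """Build the URL that is scraped once relabeling is done."""
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return f"http://{labels['__address__']}/probe?{query}"


labels = relabel("https://superset.example.com/health", "blackbox-exporter:9115")
url = probe_url(labels, "http_2xx")
print(url)
```

The point of the design is that Prometheus never contacts Superset directly for this job: it asks the exporter to probe the URL, while the instance label still shows the probed address in dashboards and alerts.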

6. Essential Grafana dashboards

Dashboard 1 — Superset overview

  • Requests per second;
  • HTTP 5xx error rate (alert if >5% over 5 min);
  • Latency p50, p95, p99;
  • Celery queue (pending tasks);
  • Pods Up/Down and restarts.
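The latency percentiles on this dashboard are estimates derived from histogram buckets, not exact values. As a rough illustration, the sketch below mirrors the linear interpolation that PromQL's histogram_quantile applies to classic cumulative buckets; the bucket boundaries and counts are made-up sample data.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Uses linear interpolation inside the bucket that contains the rank.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            share = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * share
        prev_bound, prev_count = bound, count
    return math.nan


# Made-up request durations: cumulative counts per upper bound, in seconds.
BUCKETS = [(0.5, 800), (1.0, 900), (2.0, 960), (5.0, 990), (math.inf, 1000)]

p50 = histogram_quantile(0.50, BUCKETS)
p95 = histogram_quantile(0.95, BUCKETS)
print(p50, round(p95, 2))
```

Because the estimate is interpolated, its precision depends on the bucket layout: with a 1 s to 2 s bucket, a p95 reported as about 1.83 s only tells you the true value lies somewhere in that bucket.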

Dashboard 2 — Databases

  • Active Postgres connections;
  • Query latency (top 10);
  • Redis cache hit ratio (target >90%);
  • Replication lag;
  • Disk space used.
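The cache hit ratio is computed from two counters, hits and misses, over a time window. A minimal sketch of the arithmetic, assuming two successive readings of each counter (the numbers are illustrative):

```python
from typing import Optional


def hit_ratio(hits_delta: float, misses_delta: float) -> Optional[float]:
    """Share of cache lookups served from Redis over a window.

    Returns None when there were no lookups, so an idle cache is not
    reported as a 0% hit ratio.
    """
    lookups = hits_delta + misses_delta
    if lookups == 0:
        return None
    return hits_delta / lookups


# Illustrative counter readings at the start and end of the window.
hits_start, hits_end = 120_000, 129_500
misses_start, misses_end = 8_000, 8_500

ratio = hit_ratio(hits_end - hits_start, misses_end - misses_start)
print(ratio)
```

Working on deltas rather than raw counter values matters: the lifetime ratio since Redis started can stay above the 90% target long after the cache has stopped being effective.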

Dashboard 3 — Infrastructure

  • CPU and memory per node / pod;
  • Network in/out;
  • Disk I/O;
  • OOM kills.

This configuration is applied by default on TVL Managed Superset, which follows community best practices.

7. Critical alerts to configure

groups:
  - name: superset
    rules:
      - alert: SupersetDown
        expr: up{job="superset"} == 0
        for: 2m
        labels: { severity: critical }
        annotations:
          summary: "Superset {{ $labels.instance }} is down"

      - alert: SupersetHighErrorRate
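        # Metric names depend on how the application is instrumented.
        # sum() is required: without it each 5xx series is divided by itself.
        expr: |
          sum(rate(http_requests_total{job="superset", status=~"5.."}[5m]))
          / sum(rate(http_requests_total{job="superset"}[5m])) > 0.05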
        for: 5m
        labels: { severity: warning }

      - alert: SupersetHighLatency
        expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket{job="superset"}[5m]))) > 5
        for: 10m
        labels: { severity: warning }

      - alert: PostgresConnectionsHigh
        expr: sum(pg_stat_activity_count) > 150
        for: 5m
        labels: { severity: warning }

      - alert: RedisMemoryHigh
        # The "> 0" filter skips instances with no maxmemory limit (division by zero)
        expr: redis_memory_used_bytes / (redis_memory_max_bytes > 0) > 0.9
        for: 10m
        labels: { severity: warning }
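To make the SupersetHighErrorRate threshold concrete, the sketch below computes the ratio the rule is after: the per-second rate of 5xx responses across all series, divided by the rate of all responses. The counter values are invented, and counter resets are deliberately ignored.

```python
def rate(first: float, last: float, seconds: float) -> float:
    """Per-second increase of a counter between two samples (no reset handling)."""
    return (last - first) / seconds


def error_ratio(samples: dict[str, tuple[float, float]], window: float) -> float:
    """Ratio of 5xx request rate to total request rate.

    samples maps an HTTP status code to (counter at window start,
    counter at window end).
    """
    total = sum(rate(a, b, window) for a, b in samples.values())
    errors = sum(
        rate(a, b, window)
        for status, (a, b) in samples.items()
        if status.startswith("5")
    )
    return errors / total if total else 0.0


WINDOW = 300  # seconds, matching the [5m] range in the rule
SAMPLES = {
    "200": (10_000, 12_700),
    "404": (300, 360),
    "500": (40, 220),
    "503": (5, 65),
}

ratio = error_ratio(SAMPLES, WINDOW)
print(round(ratio, 3), ratio > 0.05)
```

Here 240 of 3,000 requests failed with a 5xx status, a ratio of 8%, so the 5% threshold is crossed. The alert fires only if that stays true for the full "for" duration.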

8. Notification

Alertmanager routes alerts to:

  • Slack for teams;
  • PagerDuty / Opsgenie for on-call;
  • Email as backup;
  • Webhook for custom integration.
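For the webhook option, Alertmanager POSTs a JSON document whose "alerts" array carries each alert's status, labels, and annotations. The sketch below turns such a payload into one-line messages; the function name and the sample payload are illustrative, and a real receiver would wrap this in an HTTP handler.

```python
import json


def format_alerts(payload: dict) -> list[str]:
    """Turn an Alertmanager webhook payload into one message per alert."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        # Fall back to the alert name when no summary annotation is set.
        summary = annotations.get("summary", labels.get("alertname", "unknown alert"))
        status = alert.get("status", "unknown").upper()
        severity = labels.get("severity", "none")
        lines.append(f"[{status}] {severity}: {summary}")
    return lines


# Illustrative payload, trimmed to the fields used above.
RAW = """
{
  "version": "4",
  "status": "firing",
  "alerts": [
    {
      "status": "firing",
      "labels": {"alertname": "SupersetDown", "severity": "critical"},
      "annotations": {"summary": "Superset superset:8088 is down"}
    },
    {
      "status": "firing",
      "labels": {"alertname": "SupersetHighErrorRate", "severity": "warning"},
      "annotations": {}
    }
  ]
}
"""

messages = format_alerts(json.loads(RAW))
for message in messages:
    print(message)
```

The second alert has no summary annotation, which is why the fallback to the alert name is useful: rules without annotations still produce a readable message.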

9. SLO and error budget

Define measurable, continuously-tracked SLOs:

| SLO | Target | Monthly error budget |
| --- | --- | --- |
| Availability | 99.9% | 43 min |
| Latency p95 < 2s | 99% of requests | 1% of requests |
| 5xx error rate | < 0.1% | 0.1% of requests |
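The budget figures follow directly from the targets. A small sketch of the arithmetic, assuming a 30-day month; the function names and the request counts are illustrative:

```python
def downtime_budget_minutes(slo: float, days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a period."""
    return (1 - slo) * days * 24 * 60


def budget_consumed(bad: float, total: float, slo: float) -> float:
    """Fraction of a request-based error budget already spent (1.0 = exhausted)."""
    allowed = (1 - slo) * total
    return bad / allowed if allowed else float("inf")


availability_budget = downtime_budget_minutes(0.999)

# Illustrative month so far: 250 failed requests out of 1,000,000,
# against the 0.1% budget for 5xx errors.
spent = budget_consumed(bad=250, total=1_000_000, slo=0.999)

print(round(availability_budget, 1), round(spent, 2))
```

A 99.9% target over 30 days leaves about 43.2 minutes of downtime, which is where the 43 min in the table comes from. Tracking the fraction of budget spent, rather than the raw error rate, shows how much room is left before the SLO is breached.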

10. Alternative tools

  • Datadog: commercial but turnkey, expensive at scale;
  • New Relic: powerful APM;
  • Elastic Stack: log-oriented but with metrics;
  • OpenObserve: open source, lighter.

11. Conclusion

A well-monitored Superset is one that lets you sleep peacefully. The initial investment of one to two engineer-days can pay for itself within the first week by catching an incident that would otherwise have been detected too late. For a serious production deployment, it is non-negotiable.

Want the benefits of Apache Superset without the friction of installation and maintenance? Deploy your instance in 3 clicks with TVL Managed Superset, hosted in Europe (OVHcloud, Roubaix, France), monitoring included.
