Cohort analysis is one of the most powerful tools to understand retention, expansion, and acquisition cohort value. Apache Superset lets you implement it with a bit of SQL modeling and a native visualization. This guide explains how to build robust cohort retention curves in 2026.
1. What is a cohort analysis?
A cohort is a group of users sharing a time-based commonality, typically their acquisition month. Cohort analysis observes the behavior of these groups over the following months to measure retention, cumulative revenue, or conversion rate.
If you want to start immediately, TVL Managed Superset offers a pre-configured cohort dashboard on Pro+ instances.
2. Three types of cohorts to know
- Retention cohort: % of users still active N months after acquisition;
- Revenue cohort: cumulative revenue per cohort (LTV);
- Behavioral cohort: groups defined by an event (e.g., users having performed action X in month M).
3. Minimal data model
For retention cohorts, you need at least:
-- customers table
CREATE TABLE dim_customers (
customer_id INT PRIMARY KEY,
acquired_at DATE NOT NULL,
segment VARCHAR
);
-- monthly activity table
CREATE TABLE fct_customer_active_monthly (
customer_id INT,
active_month DATE,
is_active BOOLEAN,
revenue DECIMAL,
PRIMARY KEY (customer_id, active_month)
);
4. SQL retention cohort query
WITH cohort AS (
SELECT
customer_id,
DATE_TRUNC('month', acquired_at) AS cohort_month
FROM dim_customers
),
activity AS (
SELECT
f.customer_id,
c.cohort_month,
DATE_DIFF('month', c.cohort_month, f.active_month) AS months_since_acq,
f.is_active,
f.revenue
FROM fct_customer_active_monthly f
JOIN cohort c ON c.customer_id = f.customer_id
)
SELECT
cohort_month,
months_since_acq,
COUNT(DISTINCT CASE WHEN is_active THEN customer_id END) AS active_customers,
SUM(revenue) AS cohort_revenue
FROM activity
GROUP BY 1, 2
ORDER BY 1, 2;
5. Visualization in Superset
- Create a virtual dataset with the above query;
- Chart type: Heatmap or Line chart;
- X axis:
months_since_acq; - Color (heatmap) or line (line chart):
cohort_month; - Metric:
active_customers / FIRST_VALUE(active_customers)for the percentage.
6. Retention curve pattern
Healthy cohorts show:
- An initial drop at M+1 (~30-50% B2C, 5-10% B2B SaaS);
- A stabilization between M+3 and M+6;
- Ideally a reactivation on older cohorts (smiley curve).
This configuration is applied by default on TVL Managed Superset, which follows community best practices.
7. Revenue cohort (LTV)
To calculate cumulative LTV per cohort:
SELECT
cohort_month,
months_since_acq,
SUM(revenue) AS monthly_revenue,
SUM(SUM(revenue)) OVER (
PARTITION BY cohort_month
ORDER BY months_since_acq
) AS cumulative_ltv
FROM activity
GROUP BY 1, 2;
8. Common pitfalls
- Cohorts too small (<30 users): variance too strong, signal lost in noise;
- No baseline: impossible to judge if retention is good without sector benchmark;
- Cohort by first payment vs by signup: different depending on product maturity;
- Ignoring seasonality: a B2B November cohort includes the year-end slump.
9. Going further
- Cohort by acquisition channel: identify the best-retaining channel;
- Cohort by first product purchased: optimize the onboarding funnel;
- Cohort by feature usage level M+1: measure the impact of product activation.
10. Conclusion
Cohort analysis is one of the most telling indicators of a recurring business's health. Apache Superset, combined with clean dbt modeling, lets you implement it in a few hours and turn it into continuous monitoring accessible to the whole team.
Want the benefits of Apache Superset without the friction of installation and maintenance? Deploy your instance in 3 clicks with TVL Managed Superset, hosted in Europe (OVHcloud, Roubaix, France).
For more: SaaS metrics, retention analysis, funnel analysis.