
Connect BigQuery to Apache Superset: 2026 Guide

Tutorial to connect Google BigQuery to Apache Superset: service account, URI, optimization, partitions, cache. Step-by-step.

Google BigQuery is a cloud data warehouse heavily used by modern data teams. Its connection to Apache Superset has been solid since Superset 4.x, but it requires service-account authentication and a few optimizations to keep query costs under control. This tutorial details the steps as of 2026.

1. Prerequisites

  • A Superset instance (see hosting guide);
  • A Google Cloud project with BigQuery enabled;
  • A service account with the right roles;
  • The sqlalchemy-bigquery and google-cloud-bigquery drivers in Superset.

If you want a Superset already connected to BigQuery, TVL Managed Superset includes GCP drivers by default.

2. Create a GCP service account

  1. GCP Console → IAM & Admin → Service Accounts;
  2. Create a superset-reader service account;
  3. Assign roles: BigQuery Data Viewer, BigQuery Job User, and optionally BigQuery Read Session User for large reads through the Storage Read API;
  4. Create a JSON key and download it (a quick local sanity check is sketched below).
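
To check the key locally before touching Superset, here is a minimal sketch with the google-cloud-bigquery client (the key file name is an assumption):

# Quick sanity check of the downloaded key — running a job needs BigQuery Job User
from google.cloud import bigquery
from google.oauth2 import service_account

creds = service_account.Credentials.from_service_account_file("superset-reader.json")
client = bigquery.Client(credentials=creds, project=creds.project_id)
print(list(client.query("SELECT 1 AS ok").result()))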

3. Install drivers

If not already installed:

# In Superset bootstrap (Helm) or derived Dockerfile
uv pip install sqlalchemy-bigquery google-cloud-bigquery
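
Once installed, a minimal import check from a Python shell inside the Superset container confirms both drivers are available (a sketch, nothing Superset-specific):

# Run with the Superset environment's Python interpreter
import sqlalchemy_bigquery          # SQLAlchemy dialect behind bigquery:// URIs
import google.cloud.bigquery as bq  # client library used by the dialect
print("google-cloud-bigquery version:", bq.__version__)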

4. Configure the connection in Superset

  1. UI → Settings → Database Connections → + Database;
  2. Type: Google BigQuery;
  3. SQLAlchemy URI: bigquery://<project-id>;
  4. In Advanced → Other, paste the service account JSON content into Engine Parameters:
{
  "credentials_info": {
    "type": "service_account",
    "project_id": "...",
    "private_key_id": "...",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "...",
    ...
  }
}

In production, do not paste the JSON into Superset: mount the key file as a Kubernetes secret and point to it, either through the standard GOOGLE_APPLICATION_CREDENTIALS environment variable or through the dialect's credentials_path parameter.
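
As an illustration of the credentials_path route (project ID and secret mount path are placeholders), this is how the sqlalchemy-bigquery dialect resolves a key file outside Superset; in the UI, the same can be expressed as a URI query parameter, e.g. bigquery://<project-id>?credentials_path=/var/secrets/bq/superset-reader.json:

# Sketch: pointing the dialect at a key file mounted as a K8s secret
from sqlalchemy import create_engine, text

engine = create_engine(
    "bigquery://my-gcp-project",
    credentials_path="/var/secrets/bq/superset-reader.json",
)
with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())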

5. Test

Click Test connection. If successful, you see your BigQuery datasets in Datasets → + Dataset.
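
If the connection tests fine but no datasets appear, it can help to list what the service account actually sees, outside Superset (the key file name is an assumption):

# Datasets visible to the service account — what Superset should offer as sources
from google.cloud import bigquery

client = bigquery.Client.from_service_account_json("superset-reader.json")
for ds in client.list_datasets():
    print(ds.dataset_id)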

6. Optimize costs

BigQuery's on-demand pricing bills per byte scanned. A few golden rules:

  • Always filter on the partition column (_PARTITIONDATE or explicit column);
  • Explicit SELECT of columns, never SELECT *;
  • Enable the Superset cache with a long timeout (24 h) on executive dashboards (config sketch after this list);
  • BI Engine for repeated queries on small datasets;
  • Materialized views or aggregated tables for heavy fact tables.
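
For the cache rule, a minimal sketch of what the data-cache block can look like in superset_config.py (the Redis URL is an assumption; any Flask-Caching backend works):

# superset_config.py — chart/data cache with a 24 h timeout
DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 60 * 60 * 24,  # 24 h
    "CACHE_KEY_PREFIX": "superset_data_",
    "CACHE_REDIS_URL": "redis://superset-redis:6379/1",
}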

This configuration is applied by default on TVL Managed Superset, which follows community best practices.

7. Performance

  • Async queries with Celery: essential, since BigQuery queries can take 30-60 s (see the sketch after this list);
  • Connection pool: less critical than with Postgres, since BigQuery runs each query as a job;
  • A Superset connection mutator to automatically attach labels to BigQuery jobs and trace consumption per dashboard.
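
For the async point, a minimal Celery sketch for superset_config.py (Redis host and database indexes are assumptions); asynchronous execution is then enabled per database in the connection's advanced settings:

# superset_config.py — run SQL Lab / chart queries asynchronously through Celery
from cachelib.redis import RedisCache

class CeleryConfig:
    broker_url = "redis://superset-redis:6379/0"
    result_backend = "redis://superset-redis:6379/0"
    imports = ("superset.sql_lab",)

CELERY_CONFIG = CeleryConfig

# Store finished query results where the web workers can fetch them
RESULTS_BACKEND = RedisCache(host="superset-redis", port=6379, key_prefix="superset_results")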

8. Security

  • Dedicated service account per environment (dev / prod);
  • Minimal roles: only grant BigQuery Data Viewer on the relevant datasets (a short grant sketch follows this list);
  • VPC Service Controls, with an access level, to restrict BigQuery access to the Superset egress IPs only;
  • GCP audit logs enabled to trace each query.
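
For dataset-scoped access, a sketch with the Python client (project, dataset and service-account email are placeholders); dataset-level READER is the equivalent of Data Viewer restricted to that dataset:

# Grant the Superset service account read access on a single dataset
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")
dataset = client.get_dataset("my-gcp-project.analytics")

entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",
        entity_type="userByEmail",
        entity_id="superset-reader@my-gcp-project.iam.gserviceaccount.com",
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])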

9. Common pitfalls

  • "Access denied": forgetting BigQuery Job User (different from Data Viewer);
  • Costs that explode: an unfiltered dashboard scans several TB/day, up to €5-10/day;
  • Schema too granular: avoid datasets with hundreds of tables, BigQuery is slow on the UI side;
  • Timezone: BigQuery stores in UTC, Superset can display in local. Align via WITH ZONE in queries.

10. Conclusion

BigQuery + Superset is a very productive combo for data teams on Google Cloud. The connection itself is simple, but costs can climb quickly if you ignore the partition-filtering and caching rules. A few days of careful setup pays for itself with years of economical, performant use.

Want the benefits of Apache Superset without the friction of installation and maintenance? Deploy your instance in 3 clicks with TVL Managed Superset, hosted in Europe (OVHcloud, Roubaix, France).

For more: connect Snowflake, connect ClickHouse, caching strategies.