Distributed Workloads

ParadeDB is designed to scale vertically on a single Postgres node with potentially many read replicas, and many production deployments comfortably operate in the 1–10TB range. The largest single ParadeDB database we’ve seen in production is 10TB. For datasets that significantly exceed this scale, ParadeDB supports partitioned tables and can be deployed in sharded Postgres configurations. If you’re working with very large datasets, please reach out to us. We’d be happy to provide guidance and share our roadmap for future distributed query support.

Join Support

ParadeDB supports all PostgreSQL JOINs:
  • INNER JOIN
  • LEFT / RIGHT / FULL OUTER JOIN
  • CROSS JOIN
  • LATERAL
  • Semi and Anti JOINs
For the most part, you can mix search and relational queries without changing your SQL. However, JOINs do incur some performance tradeoffs. See the joins guide for more details.
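
As an illustration, a search predicate can sit in the WHERE clause of an ordinary join. The sketch below assumes two hypothetical tables, orders and mock_items, a BM25 index on mock_items that includes description, and ParadeDB's @@@ search operator. The table and column names are placeholders, not taken from this page.

```sql
-- Hypothetical schema (illustrative only):
--   orders(order_id, product_id, customer_name)
--   mock_items(id, description, rating), with a BM25 index covering description
SELECT o.order_id, o.customer_name, m.description
FROM orders AS o
INNER JOIN mock_items AS m ON m.id = o.product_id
WHERE m.description @@@ 'keyboard'   -- full-text search predicate
  AND m.rating >= 4                  -- ordinary relational filter
ORDER BY o.order_id
LIMIT 10;
```

How well a query like this performs depends on the join shape and the planner's choices, which is the tradeoff the joins guide covers.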

Covering Index

The BM25 index in ParadeDB is a covering index, which means it stores all indexed columns inside a single index per table. This decision is intentional: by colocating all the relevant data, ParadeDB optimizes for fast reads and boolean conditions. The tradeoff is that all columns must be defined up front at index creation time, and adding or removing columns requires a REINDEX.
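
A minimal sketch of what defining columns up front can look like, assuming a hypothetical mock_items table and the CREATE INDEX ... USING bm25 syntax with a key_field option. The index name, table, and columns are illustrative, and the exact syntax may differ between ParadeDB versions. Here the rebuild is done by dropping and recreating the index with the new column list, which is one possible approach rather than the documented procedure.

```sql
-- Initial index: every column that should be searchable is listed here.
-- key_field names a unique column (assumed here to be id).
CREATE INDEX search_idx ON mock_items
USING bm25 (id, description, category)
WITH (key_field = 'id');

-- Later: to make rating searchable as well, the index is rebuilt
-- with the full column list (assumed approach: drop and recreate).
DROP INDEX search_idx;
CREATE INDEX search_idx ON mock_items
USING bm25 (id, description, category, rating)
WITH (key_field = 'id');
```

Because the rebuild scans the whole table, it is worth deciding the column list carefully before indexing a large dataset.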

DDL Replication

A well-known limitation of Postgres logical replication is that DDL (Data Definition Language) statements are not replicated. This includes operations like CREATE TABLE or CREATE INDEX. If ParadeDB is running as a logical replica of a primary Postgres, DDL statements from the primary must be executed manually on the replica. We recommend version-controlling your schema changes and applying them in a coordinated, repeatable way, through either a migration tool or deployment automation, to keep source and target databases in sync.
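
The manual step can be sketched as follows, assuming a hypothetical mock_items table that is already part of a publication and a subscription named paradedb_sub on the replica. Both names are placeholders. Applying additive changes on the replica first is a common way to avoid replication errors while the two schemas briefly differ.

```sql
-- 1. On the replica (subscriber): apply the additive schema change first.
ALTER TABLE mock_items ADD COLUMN in_stock boolean;

-- 2. On the primary (publisher): apply the same DDL. Logical replication
--    does not carry this statement across.
ALTER TABLE mock_items ADD COLUMN in_stock boolean;

-- 3. On the replica, only if a new table was added to the publication:
--    refresh the subscription so it starts receiving that table's rows.
ALTER SUBSCRIPTION paradedb_sub REFRESH PUBLICATION;
```

Keeping these statements in a versioned migration file makes it easier to run the identical DDL against both databases.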