In Progress
Analytics
- A custom scan node for aggregates. This will allow “plain SQL” aggregates to go through the same fast execution path as our aggregate UDFs, further accelerating aggregates like COUNT and SQL clauses like GROUP BY.
JOIN Improvements
- Scoring and highlighting across JOINs. BM25 score and snippet functions can be used in JOIN queries.
- Smarter JOIN planning for search indexes. Apply index-aware optimizations and cost estimation strategies when multiple BM25-indexed tables are joined.
- Faster JOIN performance through predicate pushdown. Search predicates are selectively pushed down to relevant tables based on indexability and selectivity, improving JOIN query speed.
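To illustrate why pushing a predicate below a JOIN helps, here is a minimal Python sketch. It is not ParadeDB's planner: the tables, column names, and the substring check standing in for a search predicate are all hypothetical, and the sketch ignores the indexability and selectivity decisions mentioned above. It only compares how many row pairs a hash join touches when the filter runs after the join versus before it.

```python
def join_then_filter(left, right, key, predicate):
    """Join every left row first, then apply the predicate to the joined pairs."""
    right_by_key = {}
    for r in right:
        right_by_key.setdefault(r[key], []).append(r)
    out, pairs_touched = [], 0
    for l in left:
        for r in right_by_key.get(l[key], []):
            pairs_touched += 1
            if predicate(l):
                out.append({**l, **r})
    return out, pairs_touched


def filter_then_join(left, right, key, predicate):
    """Push the predicate down: skip non-matching left rows before joining."""
    right_by_key = {}
    for r in right:
        right_by_key.setdefault(r[key], []).append(r)
    out, pairs_touched = [], 0
    for l in left:
        if not predicate(l):
            continue
        for r in right_by_key.get(l[key], []):
            pairs_touched += 1
            out.append({**l, **r})
    return out, pairs_touched


# Hypothetical data: the predicate stands in for a search query on "description".
products = [
    {"product_id": 1, "description": "running shoes"},
    {"product_id": 2, "description": "wool socks"},
    {"product_id": 3, "description": "leather shoes"},
]
orders = [
    {"product_id": 1, "order_id": 10},
    {"product_id": 2, "order_id": 11},
    {"product_id": 2, "order_id": 12},
    {"product_id": 3, "order_id": 13},
]


def matches(row):
    return "shoes" in row["description"]


late, late_pairs = join_then_filter(products, orders, "product_id", matches)
early, early_pairs = filter_then_join(products, orders, "product_id", matches)
```

Both functions return the same rows, but the pushed-down version touches fewer row pairs because rows that fail the predicate never reach the join.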
Long Term
Managed Cloud
- Today, you can deploy ParadeDB either self-hosted or with ParadeDB BYOC. We are working on a fully managed cloud offering, with a focus on scalability and supporting distributed workloads.
Deeper Analytics Improvements
- Push Postgres visibility rules into the index. Visibility is currently checked by a filter applied after the index scan, which adds overhead to large scans.
- Evaluate more industry-standard OLAP tools. A new file format? Query execution library?
Vector Search Improvements
- Postgres (and by extension, ParadeDB) uses pgvector for vector search. Contingent on demand and internal resources, we may investigate how to address the known limitations of pgvector.
Completed
Write Throughput
- Background merging. Improves write performance by merging index segments asynchronously without blocking inserts.
- Pending list. Buffers recent writes before flushing them to the LSM tree.
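The two items above can be pictured with a toy Python model. This is a sketch under stated assumptions, not ParadeDB's implementation: the class `TinyIndex`, its `pending_limit` parameter, and the key-value data model are hypothetical. Writes land in a small in-memory pending list, a full pending list is flushed into an immutable sorted segment, and `merge` compacts segments. Here `merge` runs synchronously for simplicity, whereas background merging would run it off the insert path.

```python
class TinyIndex:
    """Toy write path: a pending list in front of immutable sorted segments."""

    def __init__(self, pending_limit=4):
        self.pending_limit = pending_limit
        self.pending = []    # recent writes, cheap to append to
        self.segments = []   # immutable sorted segments, oldest first

    def insert(self, key, value):
        """Append to the pending list; flush once it reaches the limit."""
        self.pending.append((key, value))
        if len(self.pending) >= self.pending_limit:
            self.flush()

    def flush(self):
        """Turn the pending list into one sorted segment."""
        if self.pending:
            latest = dict(self.pending)  # later writes to the same key win
            self.segments.append(sorted(latest.items()))
            self.pending = []

    def merge(self):
        """Compact all segments into one (synchronous in this sketch)."""
        merged = {}
        for segment in self.segments:  # oldest first, so newer values overwrite
            merged.update(segment)
        self.segments = [sorted(merged.items())] if merged else []

    def get(self, key):
        """Check the newest data first: pending list, then segments newest to oldest."""
        for k, v in reversed(self.pending):
            if k == key:
                return v
        for segment in reversed(self.segments):
            for k, v in segment:
                if k == key:
                    return v
        return None
```

Inserts only pay the cost of an append until the pending list fills, and reads stay correct before and after a merge because newer data is always consulted first.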
Improved UX
- More intuitive index configuration. Overhaul the complicated JSON WITH index options.
- More ORM friendly. Overhaul the query builder functions to use actual column references instead of string literals.
- New operators. In addition to the existing @@@ operator, introduce new operators for different query types (e.g. phrase, term, conjunction/disjunction).