This guide will walk you through a few queries to give you a feel for ParadeDB.
Create Example Table
ParadeDB comes with a helpful procedure that creates a table populated with mock data to help
you get started. Once connected with psql, run the following commands to create and inspect
this table.
CALL paradedb.create_bm25_test_table(
schema_name => 'public',
table_name => 'mock_items'
);
SELECT description, rating, category
FROM mock_items
LIMIT 3;
description | rating | category
--------------------------+--------+-------------
Ergonomic metal keyboard | 4 | Electronics
Plastic Keyboard | 4 | Electronics
Sleek running shoes | 5 | Footwear
(3 rows)
Next, let’s create a BM25 index called search_idx on this table. A BM25 index is a covering index, which means that multiple columns can be included in the same index.
CREATE INDEX search_idx ON mock_items
USING bm25 (id, description, category, rating, in_stock, created_at, metadata, weight_range)
WITH (key_field='id');
As a general rule of thumb, any columns that you want to filter, GROUP BY,
ORDER BY, or aggregate as part of a full text query should be added to the
index for faster performance.
Match Query
We’re now ready to execute a basic text search query. We’ll look for matches where description matches running shoes where rating is greater than 2.
SELECT description, rating, category
FROM mock_items
WHERE description ||| 'running shoes' AND rating > 2
ORDER BY rating
LIMIT 5;
description | rating | category
---------------------+--------+----------
White jogging shoes | 3 | Footwear
Generic shoes | 4 | Footwear
Sleek running shoes | 5 | Footwear
(3 rows)
||| is ParadeDB’s custom match disjunction operator, which means “find me all documents containing
running OR shoes.
If we want all documents containing running AND shoes, we can use ParadeDB’s &&& match conjunction operator.
SELECT description, rating, category
FROM mock_items
WHERE description &&& 'running shoes' AND rating > 2
ORDER BY rating
LIMIT 5;
description | rating | category
---------------------+--------+----------
Sleek running shoes | 5 | Footwear
(1 row)
BM25 Scoring
Next, let’s add BM25 scoring to the results, which sorts matches by relevance. To do this, we’ll use pdb.score.
SELECT description, pdb.score(id)
FROM mock_items
WHERE description ||| 'running shoes' AND rating > 2
ORDER BY score DESC
LIMIT 5;
description | score
---------------------+-----------
Sleek running shoes | 6.817111
Generic shoes | 3.8772602
White jogging shoes | 3.4849067
(3 rows)
Highlighting
Finally, let’s also highlight the relevant portions of the documents that were matched.
To do this, we’ll use pdb.snippet.
SELECT description, pdb.snippet(description), pdb.score(id)
FROM mock_items
WHERE description ||| 'running shoes' AND rating > 2
ORDER BY score DESC
LIMIT 5;
description | snippet | score
---------------------+-----------------------------------+-----------
Sleek running shoes | Sleek <b>running</b> <b>shoes</b> | 6.817111
Generic shoes | Generic <b>shoes</b> | 3.8772602
White jogging shoes | White jogging <b>shoes</b> | 3.4849067
(3 rows)
Top N
ParadeDB is highly optimized for quickly returning the Top N results out of the index. In SQL, this means queries that contain an ORDER BY...LIMIT:
SELECT description, rating, category
FROM mock_items
WHERE description ||| 'running shoes'
ORDER BY rating
LIMIT 5;
description | rating | category
---------------------+--------+----------
White jogging shoes | 3 | Footwear
Generic shoes | 4 | Footwear
Sleek running shoes | 5 | Footwear
(3 rows)
Facets
Faceted queries allow a single query to return both the Top N results and an aggregate value,
which is more CPU-efficient than issuing two separate queries.
For example, the following query returns the top 3 results as well as the total number of results matched.
SELECT
description, rating, category,
pdb.agg('{"value_count": {"field": "id"}}') OVER ()
FROM mock_items
WHERE description ||| 'running shoes'
ORDER BY rating
LIMIT 5;
description | rating | category | agg
---------------------+--------+----------+----------------
White jogging shoes | 3 | Footwear | {"value": 3.0}
Generic shoes | 4 | Footwear | {"value": 3.0}
Sleek running shoes | 5 | Footwear | {"value": 3.0}
(3 rows)
That’s it! Next, let’s load your data to start running real queries.