Quickstart

This guide will walk you through a few queries to give you a feel for ParadeDB.

Create Example Table

ParadeDB comes with a helpful procedure that creates a table populated with mock data to help you get started. Once connected with psql, run the following commands to create and inspect this table.

CALL paradedb.create_bm25_test_table(
  schema_name => 'public',
  table_name => 'mock_items'
);

SELECT description, rating, category
FROM mock_items
LIMIT 3;

Expected Response

       description        | rating |  category
--------------------------+--------+-------------
 Ergonomic metal keyboard |      4 | Electronics
 Plastic Keyboard         |      4 | Electronics
 Sleek running shoes      |      5 | Footwear
(3 rows)

Next, let’s create a BM25 index called search_idx on this table. A BM25 index is a covering index, which means that multiple columns can be included in the same index.

CREATE INDEX search_idx ON mock_items
USING bm25 (id, description, category, rating, in_stock, created_at, metadata, weight_range)
WITH (key_field='id');

As a general rule of thumb, any columns that you want to filter, GROUP BY, ORDER BY, or aggregate as part of a full text query should be added to the index for faster performance.

Note the mandatory key_field option. See choosing a key field for more details.

Match Query

We’re now ready to execute a basic text search query. We’ll look for matches where description matches running shoes where rating is greater than 2.

SELECT description, rating, category
FROM mock_items
WHERE description ||| 'running shoes' AND rating > 2
ORDER BY rating
LIMIT 5;

Expected Response

     description     | rating | category
---------------------+--------+----------
 White jogging shoes |      3 | Footwear
 Generic shoes       |      4 | Footwear
 Sleek running shoes |      5 | Footwear
(3 rows)

||| is ParadeDB’s custom match disjunction operator, which means “find me all documents containing running OR shoes. If we want all documents containing running AND shoes, we can use ParadeDB’s &&& match conjunction operator.

SELECT description, rating, category
FROM mock_items
WHERE description &&& 'running shoes' AND rating > 2
ORDER BY rating
LIMIT 5;

Expected Response

     description     | rating | category
---------------------+--------+----------
 Sleek running shoes |      5 | Footwear
(1 row)

BM25 Scoring

Next, let’s add BM25 scoring to the results, which sorts matches by relevance. To do this, we’ll use pdb.score.

SELECT description, pdb.score(id)
FROM mock_items
WHERE description ||| 'running shoes' AND rating > 2
ORDER BY score DESC
LIMIT 5;

Expected Response

     description     |   score
---------------------+-----------
 Sleek running shoes |  6.817111
 Generic shoes       | 3.8772602
 White jogging shoes | 3.4849067
(3 rows)

Highlighting

Finally, let’s also highlight the relevant portions of the documents that were matched. To do this, we’ll use pdb.snippet.

SELECT description, pdb.snippet(description), pdb.score(id)
FROM mock_items
WHERE description ||| 'running shoes' AND rating > 2
ORDER BY score DESC
LIMIT 5;

Expected Response

     description     |              snippet              |   score
---------------------+-----------------------------------+-----------
 Sleek running shoes | Sleek <b>running</b> <b>shoes</b> |  6.817111
 Generic shoes       | Generic <b>shoes</b>              | 3.8772602
 White jogging shoes | White jogging <b>shoes</b>        | 3.4849067
(3 rows)

Top N

ParadeDB is highly optimized for quickly returning the Top N results out of the index. In SQL, this means queries that contain an ORDER BY...LIMIT:

SELECT description, rating, category
FROM mock_items
WHERE description ||| 'running shoes'
ORDER BY rating
LIMIT 5;

Expected Response

     description     | rating | category
---------------------+--------+----------
 White jogging shoes |      3 | Footwear
 Generic shoes       |      4 | Footwear
 Sleek running shoes |      5 | Footwear
(3 rows)

Faceted queries allow a single query to return both the Top N results and an aggregate value, which is more CPU-efficient than issuing two separate queries. For example, the following query returns the top 3 results as well as the total number of results matched.

SELECT
     description, rating, category,
     pdb.agg('{"value_count": {"field": "id"}}') OVER ()
FROM mock_items
WHERE description ||| 'running shoes'
ORDER BY rating
LIMIT 5;

Expected Response

     description     | rating | category |      agg
---------------------+--------+----------+----------------
 White jogging shoes |      3 | Footwear | {"value": 3.0}
 Generic shoes       |      4 | Footwear | {"value": 3.0}
 Sleek running shoes |      5 | Footwear | {"value": 3.0}
(3 rows)

That’s it! Next, let’s load your data to start running real queries.

Documentation

Create Example Table

Match Query

BM25 Scoring

Highlighting

Top N

Facets

Documentation

​Create Example Table

​Match Query

​BM25 Scoring

​Highlighting

​Top N

​Facets

Create Example Table

Match Query

BM25 Scoring

Highlighting

Top N

Facets