Wednesday, September 20, 2017

Query optimization when using a JSON field

Leave a Comment

Running a PostgreSQL 9.6.4 on my laptop, I have a table called node which has a primary key id field and a properties::jsonb field.

I have a GIN index setup on the properties field.

When I run this query:

SELECT   n.* FROM     node n WHERE    node_type_id = '2' AND      properties @> '{"slug":"wild-castles"}'::JSONB ORDER BY n.id ASC OFFSET 0 LIMIT 10; 

on ~5M rows table it takes about 20 seconds to get an answer. Looking into the explain plan I found out the query optimizer is first sorting the table by the primary key and then filtering by the properties field:

Limit  (cost=0.56..1517.94 rows=10 width=154)   ->  Index Scan using node_pkey on node n  (cost=0.56..739571.11 rows=4874 width=154)         Filter: ((properties @> '{"slug": "wild-castles"}'::jsonb) AND ((node_type_id)::text = '2'::text)) 

But when I remove the ordering I'm seeing the optimizer using the index as expected:

SELECT n.* FROM   node n WHERE  node_type_id = '2' AND    properties @> '{"slug":"wild-castles"}'::JSONB OFFSET 0 LIMIT 10;  Limit  (cost=93.77..127.10 rows=10 width=154)   ->  Bitmap Heap Scan on node n  (cost=93.77..16338.56 rows=4874 width=154)         Recheck Cond: (properties @> '{"slug": "wild-castles"}'::jsonb)         Filter: ((node_type_id)::text = '2'::text)         ->  Bitmap Index Scan on node_ix02  (cost=0.00..92.55 rows=4874 width=0)               Index Cond: (properties @> '{"slug": "wild-castles"}'::jsonb) 

Also, a simple WHERE properties @> '{"slug":"wild-castles"}'::JSONB behaves as expected:

EXPLAIN SELECT   n.* FROM     node n WHERE    properties @> '{"slug":"wild-castles"}'::JSONB ;  Bitmap Heap Scan on node n  (cost=93.77..16326.38 rows=4874 width=154)   Recheck Cond: (properties @> '{"slug": "wild-castles"}'::jsonb)   ->  Bitmap Index Scan on node_ix02  (cost=0.00..92.55 rows=4874 width=0)         Index Cond: (properties @> '{"slug": "wild-castles"}'::jsonb) 

So I guess I'm wondering why would the optimizer not use the index to filter out the rows first and then order them by the id field?

2 Answers

Answers 1

Change Planner Method Configuration and force planer not to do seqscan

eg

      SET enable_seqscan = OFF;         SELECT   n.*         FROM     node n                WHERE    node_type_id = '2'                AND      properties @> '{"slug":"wild-castles"}'::JSONB              ORDER BY n.id ASC OFFSET 0 LIMIT 10; 

Answers 2

In my experience, you sometimes have to trick the query planner to get it to perform well, and it takes some tweaking and fiddling with...

I would try running this to see how it performs:

SELECT nn.* FROM (     SELECT n.*     FROM   node n     WHERE  node_type_id = '2'     AND    properties @> '{"slug":"wild-castles"}'::JSONB ) nn ORDER BY nn.id ASC OFFSET 0 LIMIT 10; 
If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment