'[HACKERS] GIN internal query-plan alternatives.'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-general
Subject:    [HACKERS] GIN internal query-plan alternatives.
From:       jesper () krogh ! cc
Date:       2011-11-30 11:38:20
Message-ID: 2f73f1348320af71b0551e2934d588ca.squirrel () shrek ! krogh ! cc
[Download RAW message or body]

Hi

I have a "feeling" that the internal query-plans (or alternative) 
query-plans when executing GIN-searches are not being exhausted as much as
they generally are in PG.

More specifically a query like:

select id from table where fts @@ to_tsquery('english','verycommon &
veryrare');

Can do an search on the "very rare" and postfilter it on the
very-common keyword. I had problems trying to force the query-planner into
executing a query that forced that behavior, but here is my attempt:


2011-11-30 11:33:41.010 testdb=# explain analyze select id from
testdb.testtable where id in (select id from testdb.testtable where fts @@
pptsquery('veryrare')) and fts @@ pptsquery('verycommon') order by id desc
limit 300;
                                                                        QUERY
PLAN
























----------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=14203.33..14204.08 rows=300 width=4) (actual
time=62.561..62.567 rows=14 loops=1)
   ->  Sort  (cost=14203.33..14206.91 rows=1430 width=4) (actual
time=62.561..62.566 rows=14 loops=1)
         Sort Key: testdb.testtable.id
         Sort Method:  quicksort  Memory: 25kB
         ->  Nested Loop  (cost=4799.17..14137.35 rows=1430 width=4)
(actual time=11.225..62.530 rows=14 loops=1)
               ->  HashAggregate  (cost=4799.17..4813.47 rows=1430
width=4) (actual time=6.792..7.941 rows=2409 loops=1)
                     ->  Bitmap Heap Scan on testtable
(cost=1943.29..4795.59 rows=1430 width=4) (actual
time=0.962..5.174 rows=2409 loops=1)
                           Recheck Cond: (fts @@ '''veryrare'''::tsquery)
->  Bitmap Index Scan on testtable_gin_idx
(cost=0.00..1942.93 rows=1430 width=0) (actual
time=0.635..0.635 rows=2419 loops=1)
                                 Index Cond: (fts @@
'''veryrare'''::tsquery)
               ->  Index Scan using testtable_pkey on testtable
(cost=0.00..6.51 rows=1 width=4) (actual time=0.022..0.022
rows=0 loops=2409)
                     Index Cond: (testdb.testtable.id =
testdb.testtable.id)
                     Filter: (testdb.testtable.fts @@
'''verycommon'''::tsquery)
 Total runtime: 62.679 ms
(14 rows)

Time: 125.899 ms
2011-11-30 11:40:59.673 testdb=# explain analyze select id from
testdb.testtable where fts @@ pptsquery('verycommon veryrare') order by id
desc limit 300;
       QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=2522.12..2522.87 rows=300 width=4) (actual
time=1282.967..1282.972 rows=14 loops=1)
   ->  Sort  (cost=2522.12..2523.88 rows=704 width=4) (actual
time=1282.965..1282.968 rows=14 loops=1)
         Sort Key: id
         Sort Method:  quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on testtable  (cost=1081.67..2489.63
rows=704 width=4) (actual time=1282.902..1282.948 rows=14
loops=1)
               Recheck Cond: (fts @@ '''verycommon'' &
''veryrare'''::tsquery)
               ->  Bitmap Index Scan on testtable_gin_idx
(cost=0.00..1081.49 rows=704 width=0) (actual
time=1282.880..1282.880 rows=17 loops=1)
                     Index Cond: (fts @@ '''verycommon'' &
''veryrare'''::tsquery)
 Total runtime: 1283.274 ms
(9 rows)

Time: 1300.587 ms
2011-11-30 11:41:13.217 testdb=#

This may of-course not always be the optimal query-plan, but in this
situation the alternative plan is roughly 20-times better.
(both queries cached).

The in-clause is not "the best way", a regular filter on the results of 
the veryrare-term would be natural, but the query-planner cleverly
collapses that to be the exact same thing.

This can be stressed by just adding the same "common" keyword several time
to the GIN-search, where the query, even producing the same results gets
slower and slower:

2011-11-30 11:51:10.239 testdb=# select count(id) from testdb.testtable
where id in (select id from testdb.testtable where fts @@
pptsquery('veryrare')) and fts @@ pptsquery('verycommon verycommon
verycommon');
 count
-------
    14
(1 row)


Time: 90.389 ms
2011-11-30 11:51:16.777 testdb=# select count(id) from testdb.testtable
where fts @@ to_tsquery('english','veryrare & verycommon & verycommon &
verycommon');
 count
-------
    14
(1 row)

Time: 2207.125 ms

PG does a lot of query-rewriting and testing different query-plans, there
seems to be room for improvements here. I'll craft a test-dataset
where it can be reproduced. Then someone with internal PG knowledge can
tell me if it just isn't implemented or I might have some configuration 
option that prevents my system from doing it correctly.

-- 
Jesper





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic