
List:       pgsql-performance
Subject:    Re: [PERFORM] Increasing pattern index query speed
From:       "Andrus" <kobruleht2 () hot ! ee>
Date:       2008-11-26 19:24:48
Message-ID: 91652AA4F32448288A88CD62CF84B330 () andrusnotebook

Scott,

>My first thought on the query where a pattern being faster than the query
>with an exact value is that the planner does not have good enough
>statistics on that column.  Without looking at the explain plans further, I
>would suggest trying something simple.  The fact that it is faster on 8.3
>but slower on 8.1 may have to do with changes between versions, or may
>simply be due to luck in the statistics sampling.
>See if increasing the statistics target on that column significantly does
>anything:
>EXPLAIN (your query);
>ALTER TABLE orders_products ALTER COLUMN product_id SET STATISTICS 2000;
>ANALYZE orders_products;
>EXPLAIN (your query);
>2000 is simply a guess of mine for a value much larger than the default.
>This will generally make query planning slower but the system will have a
>lot more data about that column and the distribution of data in it.  This
>should help stabilize the query performance.
>If this has an effect, the query plans will change.
>Your question below really boils down to something more simple:
>--Why is the most optimal query plan not chosen?  This is usually due to
>either insufficient statistics or quirks in how the query planner works on
>a specific data set or with certain configuration options.

Thank you very much.
I found that AND dok.kuupaev = date'2008-11-21' runs fast, but
AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21' runs very
slowly.
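
For a single day the same filter can also be written without BETWEEN (just a
sketch; these are logically equivalent forms of the condition above):

   AND dok.kuupaev = date'2008-11-21'
   -- or as a half-open range covering the same day:
   AND dok.kuupaev >= date'2008-11-21' AND dok.kuupaev < date'2008-11-22'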

explain SELECT sum(1)
   FROM dok JOIN rid USING (dokumnr)
   JOIN toode USING (toode)
   WHERE rid.toode like '99000010%'
   AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21'

plan with default statistics:

"Aggregate  (cost=17.86..17.87 rows=1 width=0)"
"  ->  Nested Loop  (cost=0.00..17.85 rows=1 width=0)"
"        ->  Nested Loop  (cost=0.00..11.84 rows=1 width=24)"
"              Join Filter: ("outer".dokumnr = "inner".dokumnr)"
"              ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..5.81 
rows=1 width=4)"
"                    Index Cond: ((kuupaev >= '2008-11-21'::date) AND 
(kuupaev <= '2008-11-21'::date))"
"              ->  Index Scan using rid_toode_pattern_idx on rid 
(cost=0.00..6.01 rows=1 width=28)"
"                    Index Cond: ((toode ~>=~ '99000010'::bpchar) AND (toode 
~<~ '99000011'::bpchar))"
"                    Filter: (toode ~~ '99000010%'::text)"
"        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.00 rows=1 
width=24)"
"              Index Cond: ("outer".toode = toode.toode)"

After the statistics target is changed, the query runs fast (70 ... 1000 ms):

ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 1000;
analyze rid;
explain analyze SELECT sum(1)
   FROM dok JOIN rid USING (dokumnr)
   JOIN toode USING (toode)
   WHERE rid.toode like '99000010%'
   AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21'
"Aggregate  (cost=27.04..27.05 rows=1 width=0) (actual time=44.830..44.834 
rows=1 loops=1)"
"  ->  Nested Loop  (cost=0.00..27.04 rows=1 width=0) (actual 
time=0.727..44.370 rows=108 loops=1)"
"        ->  Nested Loop  (cost=0.00..21.02 rows=1 width=24) (actual 
time=0.688..40.519 rows=108 loops=1)"
"              ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..5.81 
rows=1 width=4) (actual time=0.027..8.094 rows=1678 loops=1)"
"                    Index Cond: ((kuupaev >= '2008-11-21'::date) AND 
(kuupaev <= '2008-11-21'::date))"
"              ->  Index Scan using rid_dokumnr_idx on rid 
(cost=0.00..15.20 rows=1 width=28) (actual time=0.011..0.011 rows=0 
loops=1678)"
"                    Index Cond: ("outer".dokumnr = rid.dokumnr)"
"                    Filter: (toode ~~ '99000010%'::text)"
"        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.00 rows=1 
width=24) (actual time=0.016..0.020 rows=1 loops=108)"
"              Index Cond: ("outer".toode = toode.toode)"
"Total runtime: 45.050 ms"

It seems that you are a genius.

I used 1000 since the documentation says that the maximum value is 1000.
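
To confirm that the new target is in place and that the larger sample was
gathered, the standard system catalogs can be checked (a sketch; rid and
toode are the table and column from the queries above):

SELECT attname, attstattarget
  FROM pg_attribute
 WHERE attrelid = 'rid'::regclass AND attname = 'toode';

SELECT null_frac, n_distinct, most_common_vals
  FROM pg_stats
 WHERE tablename = 'rid' AND attname = 'toode';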

The rid table contains 3.5 million rows, will grow by 1 million rows per
year, and is updated frequently, mostly by inserts.
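
Since rows are mostly added, these statistics will only stay useful if the
column is re-analyzed from time to time; a per-column ANALYZE is enough for
that (a sketch using standard syntax):

ANALYZE rid (toode);  -- re-gathers statistics for this column only, using its per-column target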

Is it OK to leave the

SET STATISTICS 1000;

setting on this column of this table, or should I try to decrease it?
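
If a lower setting is worth trying, the same steps as above can be repeated
with a smaller value (the 300 below is only a placeholder guess):

ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 300;
ANALYZE rid;
EXPLAIN ANALYZE SELECT sum(1)
   FROM dok JOIN rid USING (dokumnr)
   JOIN toode USING (toode)
   WHERE rid.toode like '99000010%'
   AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21'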

Andrus. 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance