[prev in list] [next in list] [prev in thread] [next in thread] 

List:       pgsql-performance
Subject:    Re: [PERFORM] Increasing pattern index query speed
From:       Mario Weilguni <mweilguni () sime ! com>
Date:       2008-11-26 14:15:29
Message-ID: 492D5A01.3060109 () sime ! com
[Download RAW message or body]

Andrus schrieb:
> Richard,
> 
> > These are the same but the times are different. I'd be very surprised if
> > you can reproduce these times reliably.
> 
> I re-tried today again and got same results: in production database 
> pattern query  is many times slower that equality query.
> toode and rid base contain only single product starting with 99000010
> So both queries should scan exactly same numbers of rows.
> 
> > Can I give you some wider-ranging suggestions Andrus?
> > 1. Fix the vacuuming issue in your hash-join question.
> 
> I have ran VACUUM FULL VERBOSE ANALYSE and set max_fsm_pages=150000
> So issue is fixed before those tests.
> 
> > 2. Monitor the system to make sure you know if/when disk activity is 
> > high.
> 
> I optimized this system. Now there are short (some seconds) sales 
> queries about after every 5 - 300 seconds which cause few disk 
> activity and add few new rows to some tables.
> I havent seen that this activity affects to this test result.
> 
> > 3. *Then* start to profile individual queries and look into their plans.
> > Change the queries one at a time and monitor again.
> 
> How to change pattern matching query to faster ?
> 
> Andrus.
> 
> Btw.
> 
> I tried to reproduce this big difference in test server in 8.3 using 
> sample data script below and got big difference but in opposite 
> direction.
> 
> explain analyze   SELECT sum(1)
> FROM   orders
> JOIN orders_products USING (order_id)
> JOIN products USING (product_id)
> WHERE orders.order_date>'2006-01-01' and ...
> 
> different where clauses produce different results:
> 
> AND orders_products.product_id =     '3370000000000000'  -- 880 .. 926 ms
> AND  orders_products.product_id like '3370000000000000%' -- 41 ..98 ms
> 
> So patter index is 10 .. 20 times (!) faster always.
> No idea why.
> 
> Test data creation script:
> 
> begin;
> CREATE OR REPLACE FUNCTION Counter() RETURNS int IMMUTABLE AS
> $_$
> SELECT 3500000;
> $_$ LANGUAGE SQL;
> 
> CREATE TEMP TABLE orders (order_id INTEGER NOT NULL, order_date DATE 
> NOT NULL);
> CREATE TEMP TABLE products (product_id CHAR(20) NOT NULL, product_name 
> char(70) NOT NULL, quantity numeric(12,2) default 1);
> CREATE TEMP TABLE orders_products (order_id INTEGER NOT NULL, 
> product_id CHAR(20),
> id serial, price numeric(12,2) default 1 );
> 
> INSERT INTO products SELECT (n*power( 10,13))::INT8::CHAR(20),
> 'product number ' || n::TEXT FROM generate_series(0,13410) AS n;
> 
> INSERT INTO orders
> SELECT n,'2005-01-01'::date + (4000.0 * n/Counter() * '1 DAY'::interval)
> FROM generate_series(0, Counter()/3 ) AS n;
> 
> SET work_mem TO 2097151;
> 
> INSERT INTO orders_products SELECT
> generate_series/3 as  order_id,
> ( (1+ (generate_series % 13410))*power( 10,13))::INT8::CHAR(20) AS 
> product_id
> FROM generate_series(1, Counter());
> 
> ALTER TABLE orders ADD PRIMARY KEY (order_id);
> ALTER TABLE products ADD PRIMARY KEY (product_id);
> ALTER TABLE orders_products ADD PRIMARY KEY (id);
> 
> ALTER TABLE orders_products ADD FOREIGN KEY (product_id) REFERENCES 
> products(product_id);
> ALTER TABLE orders_products ADD FOREIGN KEY (order_id) REFERENCES 
> orders(order_id) ON DELETE CASCADE;
> 
> CREATE INDEX orders_date ON orders( order_date );
> CREATE INDEX order_product_pattern_idx ON orders_products( product_id 
> bpchar_pattern_ops );
> 
> COMMIT;
> SET work_mem TO DEFAULT;
> ANALYZE;
> 
No wonder that = compares bad, you created the index this way:
CREATE INDEX order_product_pattern_idx ON orders_products( product_id 
bpchar_pattern_ops );
why not:
CREATE INDEX order_product_pattern_idx ON orders_products( product_id);

explain analyze   SELECT sum(1)
FROM   orders
JOIN orders_products USING (order_id)
JOIN products USING (product_id)
WHERE orders.order_date>'2006-01-01'
AND orders_products.product_id =     '3370000000000000';

                                                                QUERY 
PLAN                                                                
-------------------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=3013.68..3013.69 rows=1 width=0) (actual 
time=8.206..8.207 rows=1 loops=1)
   ->  Nested Loop  (cost=10.83..3013.21 rows=185 width=0) (actual 
time=2.095..7.962 rows=189 loops=1)
         ->  Index Scan using products_pkey on products  
(cost=0.00..8.27 rows=1 width=18) (actual time=0.036..0.038 rows=1 loops=1)
               Index Cond: ((product_id)::text = '3370000000000000'::text)
         ->  Nested Loop  (cost=10.83..3003.09 rows=185 width=18) 
(actual time=2.052..7.474 rows=189 loops=1)
               ->  Bitmap Heap Scan on orders_products  
(cost=10.83..949.68 rows=253 width=22) (actual time=0.161..0.817 
rows=261 loops=1)
                     Recheck Cond: ((product_id)::text = 
'3370000000000000'::text)
                     ->  Bitmap Index Scan on foo  (cost=0.00..10.76 
rows=253 width=0) (actual time=0.116..0.116 rows=261 loops=1)
                           Index Cond: ((product_id)::text = 
'3370000000000000'::text)
               ->  Index Scan using orders_pkey on orders  
(cost=0.00..8.10 rows=1 width=4) (actual time=0.020..0.021 rows=1 loops=261)
                     Index Cond: (orders.order_id = 
orders_products.order_id)
                     Filter: (orders.order_date > '2006-01-01'::date)
 Total runtime: 8.268 ms


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic