'Re: [PERFORM] Talking about optimizer, my long dream'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       pgsql-performance
Subject:    Re: [PERFORM] Talking about optimizer, my long dream
From:       Robert Haas <robertmhaas () gmail ! com>
Date:       2011-02-28 19:09:15
Message-ID: AANLkTikcRfxMaiegtDytPin_jdfV8mO_cabJWZTxxBpL () mail ! gmail ! com
[Download RAW message or body]

2011/2/27 ÷¦ÔÁĚ¦Ę ôÉÍŢÉŰÉÎ <tivv00@gmail.com>:
> 
> 
> 27 ĚŔÔĎÇĎ 2011 Ň. 19:59 Robert Haas <robertmhaas@gmail.com> ÎÁĐÉÓÁ×:
> > 
> > 2011/2/4 ÷¦ÔÁĚ¦Ę ôÉÍŢÉŰÉÎ <tivv00@gmail.com>:
> > > Hi, all.
> > > All this optimizer vs hint thread reminded me about crazy idea that got
> > > to
> > > my head some time ago.
> > > I currently has two problems with postgresql optimizer
> > > 1) Dictionary tables. Very usual thing is something like "select * from
> > > big_table where distionary_id = (select id from dictionary where
> > > name=value)". This works awful if dictionary_id distribution is not
> > > uniform.
> > 
> > Does it work better if you write it as a join?
> > 
> > SELECT b.* FROM big_table b, dictionary d WHERE b.dictionary_id = d.id
> > AND d.name = 'value'
> > 
> > I would like to see a concrete example of this not working well,
> > because I've been writing queries like this (with MANY tables) for
> > years and it's usually worked very well for me.
> > 
> Here you are:
> šPostgreSQL 8.4.7 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real
> (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
> create table a(dict int4, val int4);
> create table b(dict int4, name text);
> create index c on a(dict);
> insert into b values (1, 'small'), (2, 'large');
> insert into a values (1,1);
> insert into a select 2,generate_series(1,10000);
> analyze a;
> analyze b;
> test=# explain analyze select * from a where dict=1;
> šš š š š š š š š š š š š š š š š š š š š š š QUERY PLAN
> 
> -----------------------------------------------------------------------------------------------------
>  šIndex Scan using c on a š(cost=0.00..8.27 rows=1 width=8) (actual
> time=0.014..0.016 rows=1 loops=1)
> šš Index Cond: (dict = 1)
> šTotal runtime: 0.041 ms
> (3 rows)
> test=# explain analyze select * from a where dict=2;
> šš š š š š š š š š š š š š š š š š š š š š š QUERY PLAN
> 
> -----------------------------------------------------------------------------------------------------
>  šSeq Scan on a š(cost=0.00..170.01 rows=10000 width=8) (actual
> time=0.014..6.876 rows=10000 loops=1)
> šš Filter: (dict = 2)
> šTotal runtime: 13.419 ms
> (3 rows)
> test=# explain analyze select * from a,b where a.dict=b.dict and
> b.name='small';
> šš š š š š š š š š š š š š š š š š š š š š š š šQUERY PLAN
> 
> -----------------------------------------------------------------------------------------------------------
>  šHash Join š(cost=1.04..233.55 rows=5000 width=18) (actual
> time=0.047..13.159 rows=1 loops=1)
> šš Hash Cond: (a.dict = b.dict)
> šš -> šSeq Scan on a š(cost=0.00..145.01 rows=10001 width=8) (actual
> time=0.009..6.633 rows=10001 loops=1)
> šš -> šHash š(cost=1.02..1.02 rows=1 width=10) (actual time=0.011..0.011
> rows=1 loops=1)
> šš š š š -> šSeq Scan on b š(cost=0.00..1.02 rows=1 width=10) (actual
> time=0.006..0.008 rows=1 loops=1)
> šš š š š š š š Filter: (name = 'small'::text)
> šTotal runtime: 13.197 ms
> (7 rows)
> test=# explain analyze select * from a,b where a.dict=b.dict and
> b.name='large';
> šš š š š š š š š š š š š š š š š š š š š š š š šQUERY PLAN
> 
> -----------------------------------------------------------------------------------------------------------
>  šHash Join š(cost=1.04..233.55 rows=5000 width=18) (actual
> time=0.074..21.476 rows=10000 loops=1)
> šš Hash Cond: (a.dict = b.dict)
> šš -> šSeq Scan on a š(cost=0.00..145.01 rows=10001 width=8) (actual
> time=0.012..7.085 rows=10001 loops=1)
> šš -> šHash š(cost=1.02..1.02 rows=1 width=10) (actual time=0.021..0.021
> rows=1 loops=1)
> šš š š š -> šSeq Scan on b š(cost=0.00..1.02 rows=1 width=10) (actual
> time=0.015..0.016 rows=1 loops=1)
> šš š š š š š š Filter: (name = 'large'::text)
> šTotal runtime: 28.293 ms
> (7 rows)
> It simply don't know that small=1 and large=2, so it never uses nested loop
> + iindex scan:
> test=# set enable_hashjoin=false;
> SET
> test=# explain analyze select * from a,b where a.dict=b.dict and
> b.name='small';
> šš š š š š š š š š š š š š š š š š š š š š š š š š QUERY PLAN
> 
> ----------------------------------------------------------------------------------------------------------------
>  šNested Loop š(cost=0.00..253.28 rows=5000 width=18) (actual
> time=0.041..0.047 rows=1 loops=1)
> šš -> šSeq Scan on b š(cost=0.00..1.02 rows=1 width=10) (actual
> time=0.010..0.012 rows=1 loops=1)
> šš š š š Filter: (name = 'small'::text)
> šš -> šIndex Scan using c on a š(cost=0.00..189.75 rows=5000 width=8)
> (actual time=0.021..0.023 rows=1 loops=1)
> šš š š š Index Cond: (a.dict = b.dict)
> šTotal runtime: 0.089 ms
> (6 rows)

Oh, I see.  Interesting example.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic