[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-general
Subject:    [HACKERS] pattern match,  index use and performance
From:       "John Liu" <johnl () emrx ! com>
Date:       2003-04-30 15:13:28
[Download RAW message or body]

This seems an old topic to the list,
but it's one of the confusing area for
postgresql users -

example table:

1. about "like" and "similar to" operators
postgresql can't use index if your database
encoding is anything other than 'C'. After I did
this, I knew this will not take care of
'%pattern%' map, which requires the text search module.

question 1: query uses the index, why the performance is still bad?

emrxdbs=# explain analyze select * from patient where lastname like 'BEC%';
                                                             QUERY PLAN
----------------------------------------------------------------------------
--------------------------
 Index Scan using patient_lname_idx on patient  (cost=0.00..251.65 rows=62
width=176) (actual time=63.48..8860.14 rows=1732 loops=1)
   Index Cond: ((lastname >= 'BEC'::character varying) AND (lastname <
'BED'::character varying))
   Filter: (lastname ~~ 'BEC%'::text)
 Total runtime: 8870.27 msec

emrxdbs=#  explain analyze select * from patient where firstname like
'RAN%';
                                                             QUERY PLAN
----------------------------------------------------------------------------
--------------------------
 Index Scan using patient_fname_idx on patient  (cost=0.00..18.25 rows=4
width=176) (actual time=39.84..12515.44 rows=2649 loops=1)
   Index Cond: ((firstname >= 'RAN'::character varying) AND (firstname <
'RAO'::character varying))
   Filter: (firstname ~~ 'RAN%'::text)
 Total runtime: 12532.38 msec

question 2: why "not like" is not using index?

emrxdbs=#  explain analyze select * from patient where firstname not like
'MAR%';
                                                    QUERY PLAN
----------------------------------------------------------------------------
---------------------------------------
 Seq Scan on patient  (cost=0.00..20102.05 rows=745897 width=176) (actual
time=0.09..76515.69 rows=717610 loops=1)
   Filter: (firstname !~~ 'MAR%'::text)
 Total runtime: 81983.46 msec
(3 rows)


2. about POSIX Regular Expressions, it doesn't use index at all?!
emrxdbs=# explain analyze select * from patient where firstname ~ 'RAN%';
                                                   QUERY PLAN
----------------------------------------------------------------------------
------------------------------------
 Seq Scan on patient  (cost=0.00..20102.05 rows=5985 width=176) (actual
time=25040.93..25040.93 rows=0 loops=1)
   Filter: (firstname ~ 'RAN%'::text)
 Total runtime: 25041.23 msec
(3 rows)

3. what's the general guidelines in terms of simple pattern
match in postgresql (not the text search module)?

Can someone help to clear the above questions?

thanks.

johnl
============================================================================
========
exmple table - count=750000
     Column     |            Type             |
Modifiers
----------------+-----------------------------
 patseq         | integer                     | not null default
nextval('public.patient_patseq_seq'::text)
 mpin           | character varying(12)       | not null
 mrn            | character varying(18)       | not null
 facilityid     | character varying(12)       | not null
 dob            | date                        |
 sex            | character(1)                |
 lastname       | character varying(30)       |
 firstname      | character varying(30)       |
 middlename     | character varying(30)       |
 terminatedcare | boolean                     |
 iscurrent      | boolean                     | not null
 fielda         | character varying(10)       |
 fieldb         | character varying(10)       |
 fieldc         | character varying(10)       |
 fieldd         | character varying(10)       |
 fielde         | character varying(10)       |
 fieldf         | timestamp without time zone | default
('now'::text)::timestamp(6) with time zone
Indexes: patient_pkey primary key btree (patseq),
         patient_fname_idx btree (firstname),
         patient_lname_idx btree (lastname),
         patient_mpin_idx btree (mpin),
         patient_mrnfac_idx btree (mrn, facilityid)


---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic