'Re: [GENERAL] how to improve this similarity query?'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-general
Subject:    Re: [GENERAL] how to improve this similarity query?
From:       pasman_pasmañski <pasman.p () gmail ! com>
Date:       2011-09-30 16:17:19
Message-ID: CAOWY8=bx+t=XHy8T3am5qhK3VaYRxzSZcibL4E1VtHctrxbEUQ () mail ! gmail ! com
[Download RAW message or body]

Look at doc chapter II.12

2011/9/30, hamann.w@t-online.de <hamann.w@t-online.de>:
> Hi,
>
> I have a table of names, and searches are usually performed on prefix match.
> This could nicely translate into an index search
> Suppose first name is stored as either 'Jim' or 'Jimmy', searching
> ... where firstname ~* '^jim';
> gets proper result. I had hoped that creating a functional
> index on lower(firstname) and using a query like
> .... where lower(firstname) ~ '^jim'
> would improve the search, but it does not.
> I ended up with adding a lowercased column for matching
>
> Now a few names (a few percent of the entire lot) have alternates, like
> 'James'.
> These could be nicknames, informal variants, language variants, alternate
> spellings
>
> I have already split off these few percent into a separate table and can
> query that like
> ... where 'jim' ~* firstname_pattern;'jim' ~* firstname_pattern;'jim' ~*
> firstname_pattern;
>
> There are two problems with this approach: when I use 'Jimbo' for the plain
> query,
> I do not get any 'Jim' or 'Jimmy', as expected. To achieve the same effect
> on the pattern
> query, I need to "decorate" the pattern somewhat. Actually, when I specify
> 'J(im|ames)' for
> the pattern, it gets preprocessed - and is stored in the database as -
> 'J(im|am($|e($|s)))$'
> Unfortunately there are regex patterns which the preprocessing script cannot
> handle, so
> I might have to try a different regex.
> The other, bigger, problem: the search cannot make use of  an index, and it
> has to compile
> a regex for every entry in the table. I am considering a change to that
> part: in the Jim/James
> case it is obvious that I could speed up the query with
> .... where firstname_pattern ~* '^j' and 'jim' ~* firstname_pattern;
> If the pattern was 'Bob|Robert' instead, I would have to change the
> preprocessing so the
> 'B' and 'R' parts would be separate.
>
> So, I wonder whether there is any better way of doing these. I have looked
> into tsquery
> resp. fulltext, but they do not seem to support prefix matches, only exact
> ones.
>
> Regards
> Wolfgang Hamann
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


-- 
------------
pasman

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic