List: postgresql-general
Subject: Re: [GENERAL] Fastest Index/Algorithm to find similar sentences
From: Beena Emerson <memissemerson () gmail ! com>
Date: 2013-07-31 14:08:22
Message-ID: CAOG9ApEaGjHaFtm2XrVGYc6WbYFva3JzLxa6ANSFFyW_-mFkQA () mail ! gmail ! com
I am sorry, I just re-read your mail and realized you have already tried
with pg_trgm.
On Wed, Jul 31, 2013 at 7:23 PM, Beena Emerson <memissemerson@gmail.com> wrote:
> On Sat, Jul 27, 2013 at 10:34 PM, Janek Sendrowski <janek12@web.de> wrote:
>
>> Hi Sergey Konoplev,
>>
>> Suppose I'm searching for a sentence like "The tiger is the largest cat
>> species".
>>
>> I can only find the sentences that include the words "tiger, largest,
>> cat, species", but I would also like to find the sentences with only
>> three or even two of these words.
>>
>> Janek
>>
>
> Hi,
>
> You can use the similarity functions of pg_trgm.
>
> Example:
> =# \d+ test
>                  Table "public.test"
>  Column | Type | Modifiers | Storage  | Stats target | Description
> --------+------+-----------+----------+--------------+-------------
>  col    | text |           | extended |              |
> Indexes:
>     "test_idx" gin (col gin_trgm_ops)
> Has OIDs: no
>
> =# SELECT * FROM test;
>                    col
> -----------------------------------------
>  The tiger is the largest cat species
>  The cheetah is the fastest cat species
>  The peacock is the largest bird species
> (3 rows)
>
> =# SELECT show_limit();
> show_limit
> ------------
> 0.3
> (1 row)
>
> =# SELECT col, similarity(col, 'The tiger is the largest cat species') AS sml
>    FROM test WHERE col % 'The tiger is the largest cat species'
>    ORDER BY sml DESC, col;
>                    col                   |   sml
> -----------------------------------------+----------
>  The tiger is the largest cat species    |        1
>  The peacock is the largest bird species | 0.511111
>  The cheetah is the fastest cat species  | 0.466667
> (3 rows)
>
> =# SELECT set_limit(0.5);
> set_limit
> -----------
> 0.5
> (1 row)
>
> =# SELECT col, similarity(col, 'The tiger is the largest cat species') AS sml
>    FROM test WHERE col % 'The tiger is the largest cat species'
>    ORDER BY sml DESC, col;
>                    col                   |   sml
> -----------------------------------------+----------
>  The tiger is the largest cat species    |        1
>  The peacock is the largest bird species | 0.511111
> (2 rows)
>
> =# SELECT set_limit(0.9);
> set_limit
> -----------
> 0.9
> (1 row)
>
> =# SELECT col, similarity(col, 'The tiger is the largest cat species') AS sml
>    FROM test WHERE col % 'The tiger is the largest cat species'
>    ORDER BY sml DESC, col;
>                  col                  | sml
> --------------------------------------+-----
>  The tiger is the largest cat species |   1
> (1 row)
>
>
> A higher limit keeps only the closer matches: at 0.5 the cheetah row
> (0.466667) drops out, and at 0.9 only the identical sentence remains.
>
>
> --
> Beena Emerson
>
>
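For reference, the sml values in the quoted output can be reproduced by hand.
The following is a minimal Python sketch, not pg_trgm's actual code. It assumes
the trigram rules described in the pg_trgm documentation (lowercase the text,
split on non-alphanumeric characters, pad each word with two leading spaces and
one trailing space) and ASCII-only input. The names trigrams, similarity and
similar_rows are made up for illustration, and treating the limit as inclusive
(>=) is an assumption.

```python
import re


def trigrams(text):
    """Return the set of distinct trigrams in text (ASCII-only simplification)."""
    grams = set()
    # Lowercase, then treat every run of non-alphanumerics as a word break.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams in either string."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def similar_rows(rows, query, limit=0.3):
    """Rough stand-in for: WHERE col % query ORDER BY sml DESC, col."""
    scored = [(row, similarity(row, query)) for row in rows]
    # Assumption: rows scoring exactly at the limit are kept.
    kept = [(row, sml) for row, sml in scored if sml >= limit]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


rows = [
    "The tiger is the largest cat species",
    "The cheetah is the fastest cat species",
    "The peacock is the largest bird species",
]
query = "The tiger is the largest cat species"

for limit in (0.3, 0.5, 0.9):
    print("limit", limit)
    for row, sml in similar_rows(rows, query, limit):
        print(f"  {row:<40}| {sml:.6f}")
```

With these three rows the sketch gives 23/45 = 0.511111 for the peacock
sentence and 21/45 = 0.466667 for the cheetah sentence, which agrees with the
output quoted above. The score counts shared three-character sequences, not
shared words, so it only approximates "how many of the query words appear".
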
--
Beena Emerson