List: postgresql-general
Subject: Re: [HACKERS] hunspell and tsearch2 ?
From: "Dirk Lutzebäck" <dirk.lutzebaeck () thinkproject ! com>
Date: 2012-08-31 13:07:24
Message-ID: 5040B70C.70805 () thinkproject ! com
Hi Robert,
there is a note in the PostgreSQL documentation, section
12.6.5 (Ispell Dictionary):
*Note:* MySpell does not support compound words. Hunspell has
sophisticated support for compound words. At present, PostgreSQL
implements only the basic compound word operations of Hunspell.
Regards
Dirk
On 08/30/2012 05:39 PM, Robert Haas wrote:
> On Mon, Aug 27, 2012 at 8:31 AM, Dirk Lutzebäck
> <dirk.lutzebaeck@thinkproject.com> wrote:
>> we have issues with compound words in tsearch2 using the German (ispell)
>> dictionary. This has been discussed before, but there is no real solution
>> using the recommended German dictionary at
>> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2 (convert old
>> OpenOffice dict file to ispell format suitable for tsearch):
>>
>> # select ts_lexize('german_ispell', 'vollklimatisiert');
>> ts_lexize
>> --------------------
>> {vollklimatisiert}
>> (1 row)
>>
>> This should return at least
>>
>> {vollklimatisiert, voll, klimatisiert}
>>
>>
>> The issue with compound words in ispell has been addressed in Hunspell, but
>> this has not been fully integrated into tsearch2 (according to the
>> documentation).
> Just out of curiosity, which part of the documentation are you looking
> at? The only mention of hunspell I see in the documentation is a
> mention that we apparently support their dictionary-file format.
>
>> Are there any plans to fully integrate Hunspell into tsearch2? What is
>> needed to do this? What functionality is missing? Maybe we
>> can help...
--
Best regards,
*think project! International GmbH & Co. KG*
Dirk Lutzebäck
Managing Director, CTO
Tel +49 30 921 017 90
Fax +49 30 921 017 50
dirk.lutzebaeck@thinkproject.com
Legal information about the sender (imprint): www.thinkproject.com/de/info
Legal information (imprint): www.thinkproject.com/en/info