List:       postgresql-general
Subject:    Re: [HACKERS] hunspell and tsearch2 ?
From:       Dirk Lutzebäck <dirk.lutzebaeck () thinkproject ! com>
Date:       2012-08-31 13:07:24
Message-ID: 5040B70C.70805 () thinkproject ! com

Hi Robert,

there is a note in the PostgreSQL documentation, in chapter

    12.6.5 Ispell Dictionary

    *Note:* MySpell does not support compound words. Hunspell has
    sophisticated support for compound words. At present, PostgreSQL
    implements only the basic compound word operations of Hunspell.
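
For reference, a minimal sketch of how one could check what the current
implementation does with a Hunspell-format dictionary. The file names
de_de.dict and de_de.affix and the dictionary name german_hunspell are
assumptions for illustration only; the files would have to be converted to
the database encoding and placed in $SHAREDIR/tsearch_data first:

```sql
-- Sketch only: de_de.dict / de_de.affix are hypothetical file names for a
-- Hunspell-format German dictionary copied into $SHAREDIR/tsearch_data.
CREATE TEXT SEARCH DICTIONARY german_hunspell (
    TEMPLATE  = ispell,   -- the ispell template also reads MySpell/Hunspell files
    DictFile  = de_de,    -- resolves to de_de.dict
    AffFile   = de_de,    -- resolves to de_de.affix
    StopWords = german    -- german.stop ships with PostgreSQL
);

-- Compare the result with the german_ispell output quoted below.
SELECT ts_lexize('german_hunspell', 'vollklimatisiert');
```

Whether the compound is split depends on the compound flags in the affix
file and on which of them PostgreSQL actually honours.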

Regards
Dirk


On 08/30/2012 05:39 PM, Robert Haas wrote:
> On Mon, Aug 27, 2012 at 8:31 AM, Dirk Lutzebäck
> <dirk.lutzebaeck@thinkproject.com> wrote:
>> we have issues with compound words in tsearch2 using the german (ispell)
>> dictionary. This has been discussed before but there is no real solution
>> using the recommended german dictionary at
>> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2 (convert old
>> openoffice dict file to ispell suitable for tsearch):
>>
>> # select ts_lexize('german_ispell', 'vollklimatisiert');
>>       ts_lexize
>> --------------------
>>   {vollklimatisiert}
>> (1 row)
>>
>> This should return at least
>>
>>   {vollklimatisiert, voll, klimatisiert}
>>
>>
>> The issue with compound words in ispell has been addressed in hunspell. But
>> this has not been integrated fully to tsearch2 (according to the
>> documentation).
> Just out of curiosity, which part of the documentation are you looking
> at?  The only mention of hunspell I see in the documentation is a
> mention that we apparently support their dictionary-file format.
>
>> Are there any plans to fully integrate hunspell into tsearch2? What is
>> needed to do this? What is the functional delta which is missing? Maybe we
>> can help...


-- 

Mit freundlichen Grüßen / Best regards,

*think project! International GmbH & Co. KG*

Dirk Lutzebäck
Geschäftsführer / Managing Director, CTO

Tel +49 30 921 017 90
Fax +49 30 921 017 50
dirk.lutzebaeck@thinkproject.com

Rechtliche Informationen zum Absender (Impressum): 
www.thinkproject.com/de/info <http://www.thinkproject.com/de/info>

Legal information (imprint): www.thinkproject.com/en/info 
<http://www.thinkproject.com/en/info>

