List: lucene-user
Subject: Re: Lucene Arabic Internationalization Question
From: Nader Henein <nsh () bayt ! net>
Date: 2005-05-27 21:19:42
Message-ID: 42978EEE.6070309 () bayt ! net
Dear Rasha,
Sorry for the delay. I've indexed Arabic and English side by side in
Lucene without trouble; the only thing you have to watch out for is
stemming. As for indexing PDFs, I have not used that part of the API,
but in my experience this comes down to using, or in some cases
forcing, the correct encoding. Debug it by reducing your setup to the
lowest common denominator: for example, if you're doing this from a web
service, try it from the command prompt first, so that you only have to
contend with the OS encoding (UTF-8 is highly recommended) and not the
browser and server encodings as well.
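As a rough illustration of what checking the encoding from the prompt
can look like, here is a minimal Java sketch. It does not touch Lucene
or any PDF library; the class name EncodingCheck and its two methods are
made up for this example. It prints the platform default charset and
shows that Arabic and Persian text survives a UTF-8 round trip but not
an ISO-8859-1 one, which is the kind of damage a wrong encoding causes
before the text ever reaches the index.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Lucene.
class EncodingCheck {

    // Decode raw bytes with an explicit charset instead of relying on
    // whatever the platform default happens to be.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // True if encoding the text and decoding it again with the same
    // charset gives back exactly the original text.
    static boolean roundTrips(String text, Charset charset) {
        byte[] raw = text.getBytes(charset);
        return decode(raw, charset).equals(text);
    }

    public static void main(String[] args) {
        // The word for "book", written with Unicode escapes so the
        // encoding of this source file cannot interfere with the test.
        String arabic = "\u0643\u062A\u0627\u0628";  // Arabic KAF
        String persian = "\u06A9\u062A\u0627\u0628"; // Persian KEHEH

        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("UTF-8, Arabic:       "
                + roundTrips(arabic, StandardCharsets.UTF_8));
        System.out.println("UTF-8, Persian:      "
                + roundTrips(persian, StandardCharsets.UTF_8));
        // ISO-8859-1 cannot represent these characters, so getBytes
        // substitutes '?' and the round trip fails.
        System.out.println("ISO-8859-1, Persian: "
                + roundTrips(persian, StandardCharsets.ISO_8859_1));
    }
}
```

If the default charset printed on your machine is not UTF-8, that is a
hint that any code path relying on the default (rather than passing a
charset explicitly) may be mangling the text.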
A more detailed example of the problem you're facing would help me
understand it better.
Nader
Rasha wrote:
>Dear Nader,
>
>I have a big problem indexing PDFs containing Persian words.
>
>lucenePDFIndexer cannot index them, and the words indexed from the PDF
>are unusable.
>
>Is there a way to get them indexed correctly?
>
>
>regards,
>rasha malek
--
Nader S. Henein
Senior Applications Architect
Bayt.com