'Re: Lucene Arabic Internationalization Question'

List:       lucene-user
Subject:    Re: Lucene Arabic Internationalization Question
From:       Nader Henein <nsh () bayt ! net>
Date:       2005-05-27 21:19:42
Message-ID: 42978EEE.6070309 () bayt ! net

Dear Rasha,

Sorry for the delay. I've indexed Arabic and English side by side in 
Lucene without trouble; the only thing you have to watch out for is 
stemming. As for indexing PDFs, I have not used that part of the API, 
but in my experience this kind of problem comes down to using, or in 
some cases forcing, the correct encoding. Debug it by reducing your 
setup to the lowest common denominator: for example, if you're doing 
this from a web service, try it first from the command prompt, so that 
you only have to contend with the OS encoding (UTF-8 is highly 
recommended) and not the browser and server encodings as well.
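
As a rough illustration of that kind of command-line check, here is a 
minimal sketch. It is not part of Lucene or of any PDF indexer; the 
class name EncodingCheck and its helper methods are made up for this 
example, and it assumes a current JDK (Java 8 or later). It decodes 
the same bytes with an explicitly chosen charset and prints the 
resulting code points, so you can tell whether the text survived even 
if the console cannot display Persian or Arabic glyphs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for this example only; not a Lucene class.
class EncodingCheck {

    // Decode raw bytes with an explicitly chosen charset instead of
    // relying on the platform default.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // Render each code point as U+XXXX so the result can be inspected
    // on a console that cannot show the glyphs themselves.
    static String toCodePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample word written with escapes, so the source file's own
        // encoding cannot interfere with the test.
        String word = "\u0633\u0644\u0627\u0645";
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);

        System.out.println("platform default charset: " + Charset.defaultCharset());
        // Correct charset: the original four code points come back.
        System.out.println("decoded as UTF-8:      "
                + toCodePoints(decode(utf8, StandardCharsets.UTF_8)));
        // Wrong charset: each byte becomes its own Latin-1 character.
        System.out.println("decoded as ISO-8859-1: "
                + toCodePoints(decode(utf8, StandardCharsets.ISO_8859_1)));
    }
}
```

If the code points printed for text extracted from your PDF look like 
the second line rather than the first, the bytes are probably being 
decoded with the wrong charset somewhere before they reach the index.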

A more detailed example of the problem you're facing would help me 
understand it better.

Nader

Rasha wrote:

>Dear Nader,
>
>I have a big problem indexing PDFs that contain Persian words.
>
>lucenePDFIndexer cannot index them, and the words it indexes from the PDF are unusable.
>
>
>Is there a way to get them indexed properly?
>
>
>regards,
>rasha malek

-- 

Nader S. Henein
Senior Applications Architect

Bayt.com





