
List:       lucene-user
Subject:    Re: Some questions
From:       Karl Øie <karl () gan ! no>
Date:       2002-04-19 12:59:42

> Well, I saw that Lucene creates the index on the filesystem; I think 
> that this is a problem for a production environment. I usually use a 
> database, for example Oracle. 
> Is it possible to integrate Lucene with Oracle or some other DB (MySQL)?

You can store the index in BLOB fields, but that's about it so far.

> I think that there isn't any Italian analyzer, is there?
> How can I write one?

Implementing an analyzer for Lucene is pretty straightforward; take a look at 
the contributed GermanAnalyzer. Inside the implementing class you define the 
stop words, language-dependent case folding, etc.
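
To make the stop-word and case-folding steps concrete, here is a minimal 
sketch in plain Java. It deliberately does not use Lucene's Analyzer API; the 
class name, the method name and the tiny stop-word list are all made up for 
illustration, and a real Italian list would be much longer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene: shows the kind of work an
// analyzer does before terms reach the index.
class ItalianAnalysisSketch {
    // Illustrative stop-word list only (assumption); far from complete.
    static final Set<String> STOP_WORDS =
        Set.of("il", "la", "le", "di", "e", "che", "un", "una", "per", "in");

    // Lower-case the text, split on anything that is not a letter or
    // digit, and drop the stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        String lowered = text.toLowerCase(Locale.ITALIAN);
        for (String raw : lowered.split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [ricerca, documento]
        System.out.println(analyze("La ricerca di un documento"));
    }
}
```

In a real analyzer the same steps would live inside the Lucene class you 
implement, with GermanAnalyzer as the template.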

The English and German analyzers also perform stemming (making "computers" 
match "computer" and "histories" match "history", etc.). For Italian this 
requires a program that understands Italian plurals and singulars. A good 
start might be to look at http://snowball.sourceforge.net, as they already 
have an Italian stemmer.
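
To illustrate what stemming means for the two examples above, here is a toy 
sketch in plain Java. It is not the Porter or Snowball algorithm and it only 
handles a couple of English plural endings; the class and method names are 
made up. A usable Italian stemmer needs many more rules, which is why the 
Snowball one is the better starting point.

```java
import java.util.Locale;

// Toy plural stripper (assumption: English input only). It exists purely
// to show the idea of mapping word forms onto a common stem.
class ToyPluralStemmer {
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ENGLISH);
        if (w.length() > 4 && w.endsWith("ies")) {
            // histories -> history
            return w.substring(0, w.length() - 3) + "y";
        }
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            // computers -> computer
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(stem("computers")); // prints computer
        System.out.println(stem("histories")); // prints history
    }
}
```

Whatever stemmer you end up with, apply it both when indexing and when 
parsing queries, otherwise the stemmed terms in the index will not match the 
unstemmed terms in the query.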

> The last question is: I suppose that my search engine should be able to 
> spider web sites. Is it possible to spider URLs?
> For example, is it possible to spider a page, then 
> extract the links from the page, and finally spider 
> these links as well?
> How can I do this?

As Lucene works only with the text content of a document, you will have to 
create a spider that retrieves a URL, extracts the text and feeds it to 
Lucene, then extracts the links and processes each of them in the same 
manner. For this you will need an HTML parser.
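
The loop described above could be sketched roughly like this in plain Java. 
Everything here is an assumption for illustration: the class and method 
names are made up, the fetcher and the indexer are passed in as functions 
(so neither HTTP nor Lucene code appears), links are assumed to be absolute 
URLs, and the regular expressions are only a crude stand-in for a real HTML 
parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical spider skeleton: fetch a page, index its text, queue its links.
class SpiderSketch {
    // Crude href matcher; a real HTML parser handles far more cases.
    private static final Pattern HREF = Pattern.compile(
        "href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Strips tags and collapses whitespace; scripts, styles and entities
    // are not treated specially here.
    static String extractText(String html) {
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    // fetch: URL -> HTML, or null if the page is unavailable.
    // index: receives (url, text); this is where Lucene would be fed.
    static void crawl(String startUrl, int maxPages,
                      Function<String, String> fetch,
                      BiConsumer<String, String> index) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>(); // avoids visiting a URL twice
        queue.add(startUrl);
        seen.add(startUrl);
        int processed = 0;
        while (!queue.isEmpty() && processed < maxPages) {
            String url = queue.poll();
            String html = fetch.apply(url);
            if (html == null) {
                continue;
            }
            index.accept(url, extractText(html));
            processed++;
            for (String link : extractLinks(html)) {
                if (seen.add(link)) {
                    queue.add(link);
                }
            }
        }
    }
}
```

The set of seen URLs and the page limit are there because pages link back to 
each other; without them the spider would loop forever on even a small site.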


Happy hacking!


Best regards, Karl Øie

