
List:       lucene-user
Subject:    JLemmaGen project
From:       Michal Hlavac <hlavki () hlavki ! eu>
Date:       2013-10-23 15:17:32
Message-ID: 4612022.0EkpJiYGq2 () hlavki

Hi,

I have rewritten the LemmaGen lemmatizer project (http://lemmatise.ijs.si/) in Java.
The original is written in C#. LemmaGen uses rules to lemmatize words. The algorithm
is described here: http://lemmatise.ijs.si/Download/File/Documentation%23JournalPaper.pdf
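
To give a rough idea of what "uses rules to lemmatize words" means, here is a minimal sketch of suffix-rule lemmatization. It is not the JLemmaGen API and not the exact algorithm from the paper: the class name SuffixRuleLemmatizer, its methods, and the hand-written English rules are all hypothetical, and picking the longest matching suffix is a simplifying assumption. The real project builds its rule tree from a dictionary instead of hard-coded rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of JLemmaGen.
class SuffixRuleLemmatizer {

    /** One rule: if a word ends with `suffix`, replace that suffix with `replacement`. */
    static final class Rule {
        final String suffix;
        final String replacement;

        Rule(String suffix, String replacement) {
            this.suffix = suffix;
            this.replacement = replacement;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    SuffixRuleLemmatizer addRule(String suffix, String replacement) {
        rules.add(new Rule(suffix, replacement));
        return this;
    }

    /** Applies the rule with the longest matching suffix; returns the word unchanged if none match. */
    String lemmatize(String word) {
        Rule best = null;
        for (Rule r : rules) {
            if (word.endsWith(r.suffix)
                    && (best == null || r.suffix.length() > best.suffix.length())) {
                best = r;
            }
        }
        if (best == null) {
            return word;
        }
        String stem = word.substring(0, word.length() - best.suffix.length());
        return stem + best.replacement;
    }

    public static void main(String[] args) {
        // Hand-written example rules (assumption); a real tree is learned from a dictionary.
        SuffixRuleLemmatizer lemmatizer = new SuffixRuleLemmatizer()
                .addRule("s", "")
                .addRule("ies", "y")
                .addRule("ing", "");

        for (String word : new String[] {"cities", "cats", "walking", "dog"}) {
            System.out.println(word + " -> " + lemmatizer.lemmatize(word));
        }
    }
}
```

More specific (longer) suffixes override more general ones, so "cities" is handled by the "ies" rule and not by the plain "s" rule.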


The project is licensed under GPLv3. The sources are hosted on Bitbucket:
https://bitbucket.org/hlavki/jlemmagen

There is also the Lemmagen4j project, which uses more memory and comes without prebuilt trees.

I also obtained licensed dictionaries to build the rule trees for 15 languages.
The dictionaries are licensed, but the prebuilt trees are not. You can also build
your own dictionary.

The project also contains a TokenFilter for Lucene/Solr.
The project is not stable yet, but any feedback is appreciated.

Supported languages are:
mlteast-bg - Bulgarian
mlteast-cs - Czech
mlteast-en - English
mlteast-et - Estonian
mlteast-fr - French
mlteast-hu - Hungarian
mlteast-mk - Macedonian
mlteast-pl - Polish
mlteast-ro - Romanian
mlteast-ru - Russian
mlteast-sk - Slovak
mlteast-sl - Slovene
mlteast-sr - Serbian
mlteast-uk - Ukrainian

thanks, miso


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

