List: lucene-user
Subject: JLemmaGen project
From: Michal Hlavac <hlavki () hlavki ! eu>
Date: 2013-10-23 15:17:32
Message-ID: 4612022.0EkpJiYGq2 () hlavki
Hi,
I rewrote the LemmaGen lemmatizer project (http://lemmatise.ijs.si/) in Java. It was
originally written in C#. LemmaGen uses rules to lemmatize words. The algorithm is
described here: http://lemmatise.ijs.si/Download/File/Documentation%23JournalPaper.pdf
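To illustrate what "rules to lemmatize words" can mean, here is a minimal sketch of suffix-rewrite rules, where the longest matching suffix wins. It is not the JLemmaGen API: the class SuffixRuleLemmatizer, its methods, and the three English rules are hypothetical and written by hand, whereas LemmaGen learns its rules from a dictionary as described in the paper.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of JLemmaGen.
public class SuffixRuleLemmatizer {
    // Hand-written rules for the example: word suffix -> replacement.
    private final Map<String, String> rules = new LinkedHashMap<>();

    public SuffixRuleLemmatizer addRule(String suffix, String replacement) {
        rules.put(suffix, replacement);
        return this;
    }

    // Applies the rule with the longest matching suffix.
    // Returns the word unchanged when no rule matches.
    public String lemmatize(String word) {
        String best = null;
        for (String suffix : rules.keySet()) {
            if (word.endsWith(suffix) && (best == null || suffix.length() > best.length())) {
                best = suffix;
            }
        }
        if (best == null) {
            return word;
        }
        return word.substring(0, word.length() - best.length()) + rules.get(best);
    }

    public static void main(String[] args) {
        SuffixRuleLemmatizer lemmatizer = new SuffixRuleLemmatizer()
                .addRule("ies", "y")
                .addRule("ing", "")
                .addRule("s", "");
        System.out.println(lemmatizer.lemmatize("parties")); // party
        System.out.println(lemmatizer.lemmatize("walking")); // walk
        System.out.println(lemmatizer.lemmatize("tree"));    // tree (no rule matches)
    }
}
```

Preferring the longest suffix lets a specific rule ("ies" to "y") override a general one ("s" to nothing), which is roughly the role that exceptions play in a learned rule tree.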
The project is licensed under GPLv3. The sources are hosted on Bitbucket:
https://bitbucket.org/hlavki/jlemmagen
There is also the Lemmagen4j project, which uses more memory and comes without prebuilt trees.
I also obtained licensed dictionaries to build rule trees for 15 languages. The
dictionaries are licensed, but the prebuilt trees are not. You can also build your
own dictionary.
The project also contains a TokenFilter for Lucene/Solr.
The project is not stable yet, but any feedback is appreciated.
Supported languages are:
mlteast-bg - Bulgarian
mlteast-cs - Czech
mlteast-en - English
mlteast-et - Estonian
mlteast-fr - French
mlteast-hu - Hungarian
mlteast-mk - Macedonian
mlteast-pl - Polish
mlteast-ro - Romanian
mlteast-ru - Russian
mlteast-sk - Slovak
mlteast-sl - Slovene
mlteast-sr - Serbian
mlteast-uk - Ukrainian
thanks, miso