List:       wikitech-l
Subject:    Re: [Wikitech-l] Lsearch and MWSearch: how to turn on morphology for Russian
From:       Yury Katkov <katkov.juriy () gmail ! com>
Date:       2014-01-31 7:34:12
Message-ID: CAAT7DEE=cOK3gq8pmw-BC5gc=zaip-iTY3+GiAFCimFFYEoS-g () mail ! gmail ! com

Hi! I'll definitely try Cirrus, but it's still interesting to see Lucene
working. Besides, every new extension by WMF typically requires a very fresh
MediaWiki version, which can be a burden for third parties.

I tried adding InitializeSettings.php and running ./build and ./lsearchd again.
Still no good: when I search for the word "банк", I expect Lucene to also find
"банков", "банки", "банке", etc., and I can see that these word forms
are present in the files
LuceneSearch.jar/uzip://org/apache/lucene/analysis/ru/stemsUnicode.txt
and words.Unicode.txt.
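
As a rough illustration of the behaviour expected here, the sketch below reduces the four word forms to a common stem. It is a toy suffix stripper written only for this example: the `ENDINGS` list and the `toy_stem` helper are hypothetical and are not the stemmer or the stem files shipped in LuceneSearch.jar.

```python
# Toy illustration only: NOT the stemmer shipped in LuceneSearch.jar.
# ENDINGS is a small, hand-picked (hypothetical) list of Russian noun endings.
ENDINGS = ("ами", "ов", "ам", "ах", "ом", "а", "у", "е", "и")


def toy_stem(word: str) -> str:
    """Strip the longest matching ending, keeping a stem of at least 3 letters."""
    for ending in sorted(ENDINGS, key=len, reverse=True):
        if word.endswith(ending) and len(word) - len(ending) >= 3:
            return word[: -len(ending)]
    return word


forms = ["банк", "банков", "банки", "банке"]
stems = {form: toy_stem(form) for form in forms}

for form, stem in stems.items():
    print(form, "->", stem)
```

If the index and the query were both analyzed this way, all four forms would map to the same term, which is why a search for "банк" would be expected to match the other forms too.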

Still, when I search for "банк", I only get "банк", along with the following log:

18409 [pool-2-thread-1] INFO  org.wikimedia.lsearch.search.SearchEngine  -
Using FilterWrapper wrap: {} []
18414 [pool-2-thread-1] INFO  org.wikimedia.lsearch.search.SearchEngine  -
search wikivote: query=[банк] parsed=[custom(+contents:банк^0.2 relevance
([((P contents:"банк") (P sections:"банк"^0.25))^2.0], (P
alttitle:"банк"~20^2.5) (P related:"банк"^12.0)) (P alttitle:"банк"~20))]
hit=[0] in 7ms using IndexSearcherMul:1391088160991
18439 [pool-2-thread-1] INFO  org.wikimedia.lsearch.spell.Suggest  -
wikivote for original=[банк] suggest: [банк] using=[] in 18 ms
24262 [pool-2-thread-2] INFO  org.wikimedia.lsearch.frontend.HttpHandler  -
query:/search/wikivote/%D0%B1%D0%B0%D0%BD%D0%BA?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15%2C90%2C91%2C92%2C93%2C102%2C103%2C106%2C107%2C108%2C109%2C170%2C171&offset=0&limit=20&version=2.1&iwlimit=10&searchall=1
 what:search dbname:wikivote term:банк
24263 [pool-2-thread-2] INFO  org.wikimedia.lsearch.search.SearchEngine  -
Using FilterWrapper wrap: {} []
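
The request line in the log above can be decoded to check what actually reaches lsearchd. This is a minimal sketch using only Python's standard `urllib.parse`; the variable names are illustrative, and the URL is copied from the HttpHandler log entry.

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Request path and query copied from the HttpHandler log entry.
request = (
    "/search/wikivote/%D0%B1%D0%B0%D0%BD%D0%BA"
    "?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13"
    "%2C14%2C15%2C90%2C91%2C92%2C93%2C102%2C103%2C106%2C107%2C108%2C109"
    "%2C170%2C171&offset=0&limit=20&version=2.1&iwlimit=10&searchall=1"
)

parts = urlsplit(request)

# The last path segment is the percent-encoded search term.
term = unquote(parts.path.rsplit("/", 1)[1])

# parse_qs decodes %2C to "," so the namespace list can be split directly.
params = parse_qs(parts.query)
namespaces = params["namespaces"][0].split(",")

print("term:", term)
print("namespaces:", namespaces)
print("limit:", params["limit"][0])
```

The decoded term matches the `term:банк` field in the same log entry, so the query appears to arrive unmodified; that would point at the analysis or indexing side rather than at request encoding.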


-----
Yury Katkov, WikiVote



On Fri, Jan 31, 2014 at 1:02 AM, Nikolas Everett <neverett@wikimedia.org> wrote:

> I hate to say this after all you went through setting up Lucene Search, but
> it is end-of-life and not receiving any real support.  We're in the process
> of replacing it with the combination of CirrusSearch
> <https://www.mediawiki.org/wiki/Extension:CirrusSearch> and Elasticsearch
> <http://www.elasticsearch.org/>, which work pretty much the same way the
> MWSearch/Lucene Search combination does.  CirrusSearch has to be smarter
> than MWSearch because Elasticsearch doesn't have any MediaWiki knowledge,
> but because it links into MediaWiki it can do things like expand
> templates.  I like it, but I'm biased.
> 
> That aside, it looks like Lucene Search is supposed to read
> InitializeSettings, which is kind of a WMF-specific thing.  You might be able
> to trick it into doing so by putting a file called InitializeSettings.php
> in the conf directory with the contents
> 
> 'wgLanguageCode' => array(
> 'your $wgDBname' =>  'ru',
> ),
> 
> 
> CirrusSearch, if you care to try it, reads the language code from
> wgLanguageCode.
> 
> Nik
> 
> 
> 
> On Thu, Jan 30, 2014 at 3:39 PM, Yury Katkov <katkov.juriy@gmail.com>
> wrote:
> 
> > Hi guys!
> > 
> > I've installed the MWSearch and Lucene Search extensions, but I can see that
> > the search engine doesn't understand Russian morphology (it doesn't
> > recognize word forms). How can I turn the morphological analyzer on? How
> > is it done on Russian Wikipedia?
> > 
> > Cheers,
> > -----
> > Yury Katkov, WikiVote
> > _______________________________________________
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l

