From kde-devel Wed Nov 27 14:23:01 2013
From: Martin Sandsmark
Date: Wed, 27 Nov 2013 14:23:01 +0000
To: kde-devel
Subject: Re: Please upload articles for automatic Language/Layout Switching
Message-Id: <20131127142301.GA25619 () viritrilbia ! samfundet ! no>
X-MARC-Message: https://marc.info/?l=kde-devel&m=138556220418840

On Tue, Nov 26, 2013 at 11:12:37PM +0530, Shivam Makkar wrote:
> Implementation: https://github.com/amourphious/Language-Detection

Looking at the corpus you already have, it is not up to par.
https://github.com/amourphious/Language-Detection/blob/master/LanguageDetection/langdata/norwegian
is 110 years old today, for example. The Danish corpus seems to be part of an
outdated Bible translation.

So I would recommend (as others have done) either using Wikipedia or,
alternatively, a proper corpus like this one for Norwegian:
http://www.tekstlab.uio.no/norsk/bokmaal/english.html

Or, if you just want a simple n-gram algorithm, there are several ready-made
alternatives, like this one from Chromium:
https://code.google.com/p/chromium-compact-language-detector/

-- 
Martin Sandsmark