On Tue, Nov 26, 2013 at 11:12:37PM +0530, Shivam Makkar wrote:
> Implementation: https://github.com/amourphious/Language-Detection

Looking at the corpus you already have, it is not up to par.
https://github.com/amourphious/Language-Detection/blob/master/LanguageDetection/langdata/norwegian
is 110 years old today, for example. The Danish corpus seems to be part of an
outdated Bible translation.

So I would recommend (as others have done) either using Wikipedia or,
alternatively, a proper corpus like this one for Norwegian:
http://www.tekstlab.uio.no/norsk/bokmaal/english.html

Or, if you just want a simple n-gram algorithm, there are several ready-made
alternatives, like this one from Chromium:
https://code.google.com/p/chromium-compact-language-detector/

-- 
Martin Sandsmark
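
For readers unfamiliar with the "simple n-gram algorithm" mentioned above, the idea can be sketched roughly as follows. This is only a minimal illustration, not code from the Chromium detector or from the linked repository: the function names, the trigram size, the overlap score and the two toy training sentences are all assumptions made for the example. A usable detector would build its profiles from a large, modern corpus such as the ones recommended in the message.

```python
from collections import Counter


def trigrams(text):
    """Count character trigrams in text, after lowercasing and
    collapsing whitespace; the text is padded with one space per side."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def detect(text, profiles):
    """Return the language whose profile shares the most trigram
    occurrences with text (a plain overlap score, chosen for brevity)."""
    doc = trigrams(text)

    def score(lang):
        return sum(n for gram, n in doc.items() if gram in profiles[lang])

    return max(profiles, key=score)


# Toy training text, invented for this example only (hypothetical data).
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog "
               "and then it runs away from the house",
    "norwegian": "den raske brune reven hopper over den late hunden "
                 "og så løper den bort fra huset",
}
PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}

print(detect("the lazy dog", PROFILES))
print(detect("den late hunden", PROFILES))
```

The quality of the result depends almost entirely on the training text behind each profile, which is why an outdated or unrepresentative corpus hurts even a simple detector like this one.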