On Tue, Nov 26, 2013 at 11:12:37PM +0530, Shivam Makkar wrote:
> Implementation: https://github.com/amourphious/Language-Detection

Looking at the corpus you already have, it is not up to par.
https://github.com/amourphious/Language-Detection/blob/master/LanguageDetection/langdata/norwegian
is 110 years old today, for example.
The Danish corpus seems to be part of an outdated Bible translation.

So I would recommend (as others have done) using either Wikipedia or,
alternatively, a proper corpus like this one for Norwegian:
http://www.tekstlab.uio.no/norsk/bokmaal/english.html

Or if you just want a simple n-gram algorithm, there are several ready-made
alternatives, like this one from Chromium:
https://code.google.com/p/chromium-compact-language-detector/
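
To illustrate what I mean by a simple n-gram algorithm, here is a rough
sketch: build a character trigram profile per language and pick the profile
closest to the input by cosine similarity. This is not how the Chromium
detector or your implementation works; the function names and the tiny
training strings are made up for the example, and a real profile would be
built from a proper modern corpus like the ones mentioned above.

```python
from collections import Counter
import math


def trigram_profile(text, n=3):
    """Count overlapping character n-grams in whitespace-normalised, lowercased text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect(text, profiles):
    """Return the language whose profile is most similar to the text."""
    sample = trigram_profile(text)
    return max(profiles, key=lambda lang: cosine(sample, profiles[lang]))


# Toy training data, purely illustrative (assumption: far too small for real use).
TRAINING = {
    "english": "the quick brown fox jumps over the lazy dog and this is what the weather is like",
    "norwegian": "jeg vet ikke hva det er og hvor vi skal være i dag når det regner",
}
profiles = {lang: trigram_profile(text) for lang, text in TRAINING.items()}

print(detect("what is the weather like", profiles))
print(detect("jeg vet ikke hvor det er", profiles))
```

The quality of the result depends almost entirely on the training text,
which is why the age of the corpus matters.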


-- 
Martin Sandsmark
