
List:       kde-devel
Subject:    Re: Please upload articles for automatic Language/Layout Switching
From:       Martin Sandsmark <martin.sandsmark () kde ! org>
Date:       2013-11-27 14:23:01
Message-ID: 20131127142301.GA25619 () viritrilbia ! samfundet ! no

On Tue, Nov 26, 2013 at 11:12:37PM +0530, Shivam Makkar wrote:
> Implementation: https://github.com/amourphious/Language-Detection

The corpus you already have is not up to par. For example, the text in
https://github.com/amourphious/Language-Detection/blob/master/LanguageDetection/langdata/norwegian
is 110 years old today.
The Danish corpus seems to be part of an outdated Bible translation.

So I would recommend (as others have done) either using Wikipedia or,
alternatively, a proper corpus like this one for Norwegian:
http://www.tekstlab.uio.no/norsk/bokmaal/english.html

Or if you just want a simple n-gram algorithm, there are several ready-made
alternatives, like this one from Chromium:
https://code.google.com/p/chromium-compact-language-detector/
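
To illustrate what "a simple n-gram algorithm" can mean, here is a minimal
sketch in Python. It is not how the Chromium detector or the linked
implementation works; it just compares character trigram counts by cosine
similarity. The function names and the tiny training sentences are made up
for illustration, and a real detector would need a proper corpus per
language, as discussed above.

```python
from collections import Counter
import math


def ngram_profile(text, n=3):
    """Count overlapping character n-grams in lowercased, space-padded text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect(text, profiles):
    """Return the language whose profile is most similar to the text."""
    sample = ngram_profile(text)
    return max(profiles, key=lambda lang: cosine(sample, profiles[lang]))


# Toy training text, invented for this example; far too small for real use.
TRAINING = {
    "english": "the quick brown fox jumps over the lazy dog and this is a simple sentence",
    "norwegian": "dette er en enkel setning og jeg liker norsk mat og norske byer",
}
PROFILES = {lang: ngram_profile(text) for lang, text in TRAINING.items()}

print(detect("this is the dog", PROFILES))
print(detect("jeg liker norsk", PROFILES))
```

With training data this small the result is only reliable for text that
closely resembles the samples, which is why the quality and age of the
corpus matter so much.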


-- 
Martin Sandsmark


