
List:       kde-sonnet
Subject:    Re: [KDE-Sonnet] [Mountain Goat Programmer] New comment on Queen
From:       Henrique Pinto <henrique.pinto () kdemail ! net>
Date:       2007-01-19 13:55:33
Message-ID: 200701191155.33331.henrique.pinto () kdemail ! net

On Fri 19 Jan 2007 01:38, Jacob Rideout wrote:
> It now appears to me that Portuguese is a special case, and a more
> general solution isn't acceptable. Tcatng uses a combined pt_PT and
> pt_BR corpus generated model to detect Portuguese, then uses
> specialized models to differentiate.
>
> Take a look at the .corpus files at this site:
> http://tcatng.cvs.sourceforge.net/tcatng/tcatng/language-profiles/pt-br/
>
> Are those words characteristic of their respective dialects?

Yes, they are. However, there are some very small problems with 
brazilian.corpus:

"António" should be "Antônio";
"Brasilia" should be "Brasília";
"adóque" should be "adoque";
"Boceta" and "Buceta" are slang for "vagina" and are considered really, really,
really impolite. I don't think it is a good idea to include these terms;
they're rarely used (especially in written form).

-- 
	Henrique Pinto
	henrique.pinto@kdemail.net
_______________________________________________
kde-sonnet mailing list
kde-sonnet@kde.org
https://mail.kde.org/mailman/listinfo/kde-sonnet
