List: kde-sonnet
Subject: Re: [KDE-Sonnet] [Mountain Goat Programmer] New comment on Queen
From: Henrique Pinto <henrique.pinto () kdemail ! net>
Date: 2007-01-19 13:55:33
Message-ID: 200701191155.33331.henrique.pinto () kdemail ! net
On Fri 19 Jan 2007 01:38, Jacob Rideout wrote:
> It now appears to me that Portuguese is a special case, and a more
> general solution isn't acceptable. Tcatng uses a model generated from a
> combined pt_PT and pt_BR corpus to detect Portuguese, then uses
> specialized models to differentiate between the two.
>
> Take a look at the .corpus files at this site:
> http://tcatng.cvs.sourceforge.net/tcatng/tcatng/language-profiles/pt-br/
>
> Are those words characteristic of their respective dialects?
Yes, they are. However, there are some very small problems with
brazilian.corpus:
"António" should be "Antônio";
"Brasilia" should be "Brasília";
"adóque" should be "adoque";
"Boceta" and "Buceta" are slang for "vagina", and considered really, really,
really impolite. I don't think it is a good idea to include these terms;
they're rarely used (especially in written form).
--
Henrique Pinto
henrique.pinto@kdemail.net