On Fri 19 Jan 2007 01:38, Jacob Rideout wrote:
> It now appears to me that Portuguese is a special case, and a more
> general solution isn't acceptable. Tcatng uses a combined pt_PT and
> pt_BR corpus generated model to detect Portuguese, then uses
> specialized models to differentiate.
>
> Take a look at the .corpus files at this site:
> http://tcatng.cvs.sourceforge.net/tcatng/tcatng/language-profiles/pt-br/
>
> Are those words characteristic of their respective dialects?

Yes, they are. However, there are a few minor problems with
brazilian.corpus:

"António" should be "Antônio";
"Brasilia" should be "Brasília";
"adóque" should be "adoque";
"Boceta" and "Buceta" are slang for "vagina" and are considered very
impolite. I don't think it is a good idea to include these terms; they
are rarely used (especially in written form).

-- 
Henrique Pinto
henrique.pinto@kdemail.net