From: Adriaan de Groot
Date: Wed, 31 Jan 2007 11:19:51 +0000
To: kde-devel
Subject: Re: What about a glossary
Message-Id: <200701311219.52086.groot () kde ! org>
X-MARC-Message: https://marc.info/?l=kde-devel&m=117024607315771

On Wednesday 31 January 2007 10:43, Krzysztof Lichota wrote:
> Ian Wadham wrote:
> > On Tue, 30 Jan 2007 12:11 pm, David Jarvie wrote:
> >> Glossaries might well be useful, but the proposed list of ambiguous
> >> English translatable strings is a separate idea. As I have proposed,
> >> that list would ideally be referenced by Krazy, whereas glossaries
> >> wouldn't be.
> >
> > Maybe this would be of interest http://rogersreference.com/rrdetail.htm
> > It is a dictionary (for sale) of English words that have several meanings
> > and are spelt the same, as well as words that sound the same, but have
> > different spellings and meanings. Glancing at the example pages, though,
> > it might be too comprehensive for KDE needs. I googled on "homonym
> > dictionary" BTW.
>
> Good idea, though we cannot use this dictionary, because it is not free.
> Maybe such information can be extracted from Wiktionary?

There was an earlier thread on -i18n-doc about using FP7 (European research
framework program 7) money for translation work. Now, since it's a *research*
project it won't pay for straightforward work, but it might pay for research
into simplifying the work. This glossary discussion reminds me of it and I
can finally point to some kind of research topic.
One thing we need is detection of potentially ambiguous terms in source
code, that is, cases where the gettext calls don't have enough context.

Another thing is a glossary.

Another thing is detection of inconsistent translation, not only within one
project but across (Open Source) projects. By increasing consistency, we
increase the acceptability of these projects in an SME environment in Europe,
where training and translation costs are high.

Extending KBabel's automatic translation features with advanced natural
language processing, in order to attempt translation *within the narrow
subject area covered by application translations* with higher accuracy, will
increase the effectiveness of the translation tools.

The larger and broader a repository is, the more valuable it is for
translating software in a European context. By using NLP and AI techniques we
can attempt to speed up the translation process. This is possible because of
the fairly narrow scope of the translations. I think we could pull off a
research project on this subject, since the topic of machine translation
remains open and the use of localized Open Source products in Europe will
likely grow in importance. The support of "marginal" languages (not official
languages of the EU) may be of cultural interest.

I only know one NLP research group, and they are mostly focused on search,
and mostly in English. I'd be interested in hearing from other researchers
(NLP as well as AI and DB) on -devel or -i18n-doc if there is interest in
trying to write up such a proposal formally. We would need at least two
research groups from universities in different countries and some corporate
tie-in as well. Are there Open-Source friendly translation agencies? How do
local governments (like Extremadura) check and extend translation quality?
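To make the first item (ambiguous terms without context) a bit more concrete,
here is a rough sketch of what a very naive detector could look like. It is
only an illustration, not how Krazy or any existing tool works. It assumes
that context-free calls are written as i18n("text") and that calls carrying
context use a different function name (for example i18nc), and it treats
"single word" as a stand-in for "potentially ambiguous". The function name
and the max_words parameter are made up for this example.

```python
import re

# Assumption: a context-free call looks like i18n("text"). Calls with
# context are assumed to use another function name (e.g. i18nc) and are
# therefore not matched by this pattern.
PLAIN_CALL = re.compile(r'\bi18n\(\s*"((?:[^"\\]|\\.)*)"\s*\)')


def find_context_free_short_strings(source, max_words=1):
    """Return (line number, text) pairs for short strings lacking context.

    Heuristic only: a string of at most max_words words is considered
    potentially ambiguous, because a single English word such as "Open"
    can be a verb or an adjective and may need different translations.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PLAIN_CALL.finditer(line):
            text = match.group(1)
            if 0 < len(text.split()) <= max_words:
                hits.append((lineno, text))
    return hits


if __name__ == "__main__":
    sample = (
        'button->setText(i18n("Open"));\n'
        'label->setText(i18nc("verb", "Open"));\n'
        'status->setText(i18n("Could not open file"));\n'
    )
    for lineno, text in find_context_free_short_strings(sample):
        print(f"line {lineno}: '{text}' has no context")
```

A real checker would need a proper list of ambiguous words (which is what the
proposed list is about) rather than a word count, and would have to cope with
calls that span several lines.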
-- 
These are your friends - Adem
    GPG: FEA2 A3FE Adriaan de Groot