From kde-i18n-doc Thu Jan 20 10:08:31 2000
From: Eric Bischoff
Date: Thu, 20 Jan 2000 10:08:31 +0000
To: kde-i18n-doc
Subject: Re: .po/.docbook encoding
X-MARC-Message: https://marc.info/?l=kde-i18n-doc&m=94836291130154

Stephan Kulow wrote:
>
> Hi!
>
> We would like to switch the encoding of all translated
> files to a common one - namely utf8. This would reduce
> the needed hacks that we're currently going through.

I am strongly in favour of this change in the documentation area.

Every character set that uses 8 bits or more per character will see a
change. It means that French "é", German "ü", Greek theta, Russian je
and Japanese characters will be encoded on two or more bytes. The only
characters that will not change are the ones currently encoded in
7 bits: the English (ASCII) ones ;-).

In return, all languages will share one single encoding!

We could switch to UTF-8:
a) only for source DocBook files
OR
b) for both source DocBook files and resulting HTML files

Solution b) is strongly opposed by many people because they are not
used to browsing UTF-8 pages in their web browser (Netscape or other),
even if it already works. And some browsers are not UTF-8 enabled yet.

So I suggest we switch to UTF-8 only for source DocBook files, not for
the resulting HTML. At least for the moment.

In the documentation writing process, we will need to:
A - Convert existing docs from Latin, KOI and other encodings to UTF-8
B - Edit the source DocBook files once they are in UTF-8
C - Convert them to HTML
D - Convert them to PostScript

A - Convert from existing encodings to UTF-8.
==> To check (1): Do we have a tool to convert from ISO, KOI, etc. to
UTF-8? "recode" will probably do the job.

B - Edit the source DocBook file directly in UTF-8.
We need a UTF-8 enabled text editor.
==> To check (2): Is at least *one* UTF-8 enabled editor available?
KWrite?
(and because some people are allergic to Emacs and others to Vi, we
shouldn't take the command-line editors into consideration)

C - Produce HTML documentation.
If I remember well, if the input is Unicode, Jade can only output
Unicode. So we may need to convert it back to ordinary Latin-1 or
KOI-8 (Russian), either before or after the DocBook => HTML conversion.
==> To check (3): Is it true that if Jade input is UTF-8, then output
is UTF-8 too? ("transparent" behavior)
==> To check (4): If this is true, do we have a tool to convert back
from UTF-8 into ISO, KOI, etc.? Would "recode" work? Probably yes.

D - Produce PostScript documentation.
If I remember well, TeX, LaTeX, babel or jadetex (or all of them) are
not Unicode-enabled yet, but people are working on that. If this is
true, we should convert the UTF-8 DocBook file into a temporary ISO or
KOI DocBook file *before* processing by the jadetex utility.
==> To check (5): Is it true that TeX, LaTeX, babel or jadetex are not
Unicode-enabled?
==> To check (4): If it is true, do we have a tool to do the conversion
before jadetex processing? (this point has already been mentioned
above)

Volunteers for checking points 1 to 5 could join the kde-docbook
mailing list so that we can distribute the work. Frederik, would you
help investigate too?

Any comments welcome.

Eric

-- 
Éric Bischoff - mailto:ebisch@cybercable.tm.fr
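
For anyone who wants to try the conversions from steps A and C/D
without "recode", here is a minimal sketch in Python. It only
illustrates the byte-level effect described above; the helper names
to_utf8 and from_utf8 and the sample strings are assumptions made for
this example, not part of any existing KDE tooling.

```python
# Hypothetical sample strings, keyed by a standard Python codec name.
SAMPLES = {
    "latin-1": "é ü",  # French and German letters
    "koi8-r": "е",     # Russian "je"
}


def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Step A: decode legacy bytes and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def from_utf8(raw: bytes, target_encoding: str) -> bytes:
    """Steps C/D: convert UTF-8 bytes back to a legacy encoding.

    Raises UnicodeEncodeError if a character does not exist in the
    target encoding.
    """
    return raw.decode("utf-8").encode(target_encoding)


for encoding, text in SAMPLES.items():
    legacy = text.encode(encoding)
    utf8 = to_utf8(legacy, encoding)
    # Non-ASCII characters grow from one byte to two or more.
    print(f"{encoding}: {len(legacy)} bytes -> UTF-8: {len(utf8)} bytes")
    # The round trip back to the legacy encoding is lossless here.
    assert from_utf8(utf8, encoding) == legacy
```

Plain 7-bit text passes through to_utf8 unchanged, which matches the
remark that only the English characters keep their encoding.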