
List:       kde-i18n-doc
Subject:    Re: .po/.docbook encoding
From:       Eric Bischoff <ebisch () cybercable ! tm ! fr>
Date:       2000-01-20 10:08:31

Stephan Kulow wrote:
> 
> Hi!
> 
> We would like to switch the encoding of all translated
> files to a common one - namely utf8. This would reduce
> the needed hacks that we're currently going through.

I am strongly in favour of this change in the documentation
area.

Every character set that uses 8 bits or more per character
will see a change: French "é", German "ü", Greek theta,
Russian je and Japanese characters will be encoded on two or
more bytes. The only characters that will not change are
those currently encoded on 7 bits: the English ones ;-). In
return, all languages will share one single encoding!
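
As a rough illustration of those byte counts, here is a
small Python sketch. Python is used only for convenience,
and the sample characters are picked for the example (in
particular, "Russian je" is assumed to mean the Cyrillic
letter U+0435):

```python
# Sample characters, chosen only for illustration.
# "Russian je" is assumed to be Cyrillic U+0435.
samples = {
    "English e": "e",
    "French e-acute": "\u00e9",
    "German u-umlaut": "\u00fc",
    "Greek theta": "\u03b8",
    "Russian je": "\u0435",
    "Japanese hiragana a": "\u3042",
}


def utf8_length(ch: str) -> int:
    """Return the number of bytes needed to store ch in UTF-8."""
    return len(ch.encode("utf-8"))


for name, ch in samples.items():
    print(f"{name}: {utf8_length(ch)} byte(s) in UTF-8")
```

The 7-bit character stays on one byte, the accented, Greek
and Cyrillic ones take two, and the Japanese one takes
three.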

We could switch to UTF-8:
a) only for source docbook files
OR
b) for both source docbook files and resulting HTML files

Solution b) is strongly opposed by many people because
people are not used to browsing UTF-8 pages with their web
browser (Netscape or other), even if it already works. And
some browsers are not UTF-8 enabled yet. So I suggest we
switch to UTF-8 only for source docbook files, not for the
resulting HTML. At least for the moment.

In the documentation writing process, we will need to:
A - Convert existing docs from Latin and KOI and other
encodings to UTF-8
B - Edit the source docbook files once they are in UTF-8
C - Convert them to HTML
D - Convert them to Postscript

A - Convert from existing encodings to UTF-8.
==> To check (1): Do we have a tool to do the conversion
from ISO, KOI, etc. to UTF-8?
    "recode" will probably do the job.
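
Whatever tool we end up with, the conversion itself amounts
to decoding the bytes from the old encoding and re-encoding
them as UTF-8. A minimal sketch in Python, only to show the
principle (to_utf8 is a name made up for this example, not
an existing tool, and the source encoding has to be known in
advance because it cannot be reliably guessed from the
bytes):

```python
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8.

    source_encoding must be supplied by the caller, e.g. "latin-1"
    for French or German docs, "koi8_r" for Russian ones.
    """
    return data.decode(source_encoding).encode("utf-8")


# The same kind of text as it would be stored today.
latin1_bytes = "\u00e9t\u00e9".encode("latin-1")   # French "ete" with accents
koi8_bytes = "\u0434\u0430".encode("koi8_r")       # Russian "da"

print(to_utf8(latin1_bytes, "latin-1"))
print(to_utf8(koi8_bytes, "koi8_r"))
```

Each language team would have to run this with its own
source encoding; a file converted with the wrong one gives
valid UTF-8 containing the wrong characters, with no error
reported.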

B - Edit the source DocBook file directly in UTF-8.
We need a UTF-8 enabled text editor.
==> To check (2): Is at least *one* UTF-8 enabled editor
available? KWrite?
    (and because some people are allergic to Emacs and
others to Vi, we shouldn't take the command-line editors
into consideration)

C - Produce HTML documentation.
If I remember correctly, Jade can only produce Unicode
output if the input is Unicode as well. So we may need to
convert back to ordinary Latin-1 or KOI-8 (Russian), either
before or after the DocBook => HTML conversion.
==> To check (3): Is it true that if Jade input is UTF-8,
then output is UTF-8 too?
    ("transparent" behavior)
==> To check (4): If this is true, do we have a tool to
convert back from UTF-8 into ISO, KOI, etc.?
    Would "recode" work? Probably yes.
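
One thing to keep in mind for the way back: a UTF-8 file can
contain characters that the target encoding does not have. A
minimal sketch in Python of one way to handle that
(from_utf8 is a name made up for this example; replacing the
missing characters with numeric character references is my
assumption, and only makes sense when the conversion is done
on the HTML output, not on the DocBook source fed to
jadetex):

```python
def from_utf8(data: bytes, target_encoding: str) -> bytes:
    """Re-encode UTF-8 bytes into a legacy encoding.

    Characters missing from the target encoding are written as
    numeric character references (&#NNN;), which HTML browsers
    understand. This is an assumption made for the example.
    """
    text = data.decode("utf-8")
    return text.encode(target_encoding, errors="xmlcharrefreplace")


utf8_bytes = "\u00e9 \u03b8".encode("utf-8")   # e-acute, space, Greek theta

# e-acute exists in Latin-1; theta does not and becomes a reference.
print(from_utf8(utf8_bytes, "latin-1"))
```

For the conversion before jadetex, it would probably be
safer to fail with an error on such characters than to
replace them silently.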

D - Produce PostScript documentation.
If I remember correctly, TeX, LaTeX, babel or jadetex (or
all of them) are not Unicode-enabled yet, but people are
working on that. If this is true, we should convert the
UTF-8 docbook into a temporary ISO or KOI docbook file
*before* processing it with the jadetex utility.
==> To check (5): Is it true that TeX, LaTeX, babel or
jadetex are not Unicode-enabled?
==> To check (4): If it is true, do we have a tool to do the
conversion before jadetex processing?
    (this point has already been mentioned above)

Volunteers for checking points 1 to 5 could join the
kde-docbook mailing list so that we can distribute the
work. Frederik, would you help investigate too?

Any comments welcome.

Eric
-- 
 __________________________________________________
                                           \^o~_.
     .~.                           ______  /( __ )
     /V\         Toys story         \__  \/  (  V
   //   \\                            \__| (__=v
  /(     )\                        |\___/     )
    ^^-^^                           \_____(  )
     Tux                    Konqui         \__=v
 __________________________________________________
 Éric Bischoff   -   mailto:ebisch@cybercable.tm.fr
