
List:       kfm-devel
Subject:    Re: khtml and encodings
From:       Lars Knoll <Lars.Knoll () mpi-hd ! mpg ! de>
Date:       2000-06-19 13:11:55

On Mon, 19 Jun 2000, Yarick Rastrigin wrote:

> Hello !
> > If they don't use Qt's ability to read in the encodings, it's the
> > application's problem. I don't see why we should rebuild this inside KDE as
> Well, I agree. It's the app's problem. Not every app needs some
> nonstandard M$ feature/implementation. KOI-8 is basically a standard
> on Russian Unices, and it is really enough to have it done properly.
> (However, QT's implementation of KOI-8 locale detection made me laugh
> and cry at the same time :). But a browser, an e-mail/news client and
> an office package do need this functionality.

You can always set the QTextCodec you want to use by hand, as I do in
the Decoder class of khtml. By default Qt uses the locale you specify
in the environment.
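
For illustration, this is roughly what "setting the codec by hand"
looks like with the Qt 2.x API (the encoding name and the function are
just examples, not the actual khtml code):

    #include <qtextcodec.h>
    #include <qstring.h>

    // Decode a raw 8bit buffer with an explicitly chosen codec.
    QString decodeBuffer(const char *buffer, int length)
    {
        QTextCodec *codec = QTextCodec::codecForName("KOI8-R");
        if (!codec)
            codec = QTextCodec::codecForLocale(); // fall back to the locale
        // A QTextDecoder keeps state, so multibyte characters split
        // across two buffers are still decoded correctly:
        QTextDecoder *decoder = codec->makeDecoder();
        QString unicode = decoder->toUnicode(buffer, length);
        delete decoder;
        return unicode;
    }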

> > we have a great system with Qt's TextCodecs. I agree however, that we need
> Why do you think it is great? 'Cause it's written by you (nothing
> personal, really)?

Then don't get personal. No, it's not written by me. But I have used it
quite a lot, and I can tell you that I can't think of a better system.
Qt uses *only* Unicode internally, so you need some classes to convert
from and to locale-specific encodings. QTextCodecs do that very nicely,
and they are easy to use.
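
To show how easy: the whole round trip from a locale-encoded buffer to
Unicode and back is a few lines (a sketch, not code from khtml):

    #include <qtextcodec.h>
    #include <qcstring.h>
    #include <qstring.h>

    // Locale-encoded bytes -> Unicode -> locale-encoded bytes.
    QCString roundTrip(const char *rawBytes, int rawLength)
    {
        QTextCodec *codec = QTextCodec::codecForLocale();
        QString unicode = codec->toUnicode(rawBytes, rawLength);
        int len = unicode.length(); // chars in, bytes out
        return codec->fromUnicode(unicode, len);
    }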

> You're claiming you're fully unicode-at-the-heart-of, but that isn't
> entirely true. You could just use Unicode as one of the encodings.
> However, Decoder::Decoder() inits to iso-8859-1 (I have to change this
> to KOI-8, 'cause otherwise too many packages can't work properly with
> Russian text. Now I'm working on libkonq :)).

This is because I implemented the html4 specs. They specify that if no
<meta> tag declaring the charset is found, the encoding defaults to
8859-1. Anyway, I'll add a configure option to override the default, as
too many web pages omit the <meta> tag.
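
The lookup will then go: <meta> charset first, the configured default
second, 8859-1 last. A sketch (the config plumbing is left out, and the
function name is made up):

    #include <qtextcodec.h>

    // Choose the codec for a page. metaCharset comes from the <meta>
    // tag (0 if the page has none), userDefault from the new option.
    QTextCodec *codecForPage(const char *metaCharset, const char *userDefault)
    {
        QTextCodec *codec = 0;
        if (metaCharset)
            codec = QTextCodec::codecForName(metaCharset);
        if (!codec && userDefault)
            codec = QTextCodec::codecForName(userDefault);
        if (!codec)
            codec = QTextCodec::codecForName("ISO 8859-1"); // html4 default
        return codec;
    }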

> > to add support for a few more encodings in Qt (windows/dos codepages
> > mainly).
> Hmm. The more I think about all this encodings stuff, the more I see
> a scheme like this:
> All internal handling is unicode.

Exactly.

> There are input and output decoders. Every input string could (and
> should) be converted to Unicode according to the user's input settings
> or locale and stored as Unicode, so it can then be converted from
> Unicode according to the user's output settings or locale.

This is right for usual applications. For khtml, however, the input
conversion needs to be done according to the <meta> tag. The output
conversion needs to be done in a way that lets you display as many
pages as possible, not only pages encoded in your locale (there are
users who might want to read a Russian web page from time to time even
though they are using a German locale...)

> (Resembles QT's mechanism at some point :). The obvious lack of
> extensibility in QT's implementation leads to all sorts of problems.
> You can't add support for an encoding without changing and recompiling
> QT, and then you must edit (and recompile) at least one place in the
> KDE libs (I suppose there are more). Why not have one universal codec,
> using recoding tables (encoding-to-Unicode for input and
> Unicode-to-encoding for output)?

Not true. Read the Qt docs. You can write your own converters and register
them at run time in your application. I'm just saying that if you put them
into Qt, _all_ applications will get the functionality, not only the _one_
app you just wrote.
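
For the record, this is all it takes (Qt 2.x API; the identity mapping
here is a meaningless stub, a real codec carries a full 256-entry
table):

    #include <qtextcodec.h>
    #include <qstring.h>
    #include <qcstring.h>

    class MyCodec : public QTextCodec
    {
    public:
        const char *name() const { return "my-encoding"; }
        int mibEnum() const { return -1; } // no official MIB number

        // How likely is it that a buffer is in this encoding?
        int heuristicContentMatch(const char *, int len) const { return len; }

        QString toUnicode(const char *chars, int len) const
        {
            QString result;
            for (int i = 0; i < len; i++)
                // A real codec maps through its table here.
                result += QChar((uchar)chars[i]);
            return result;
        }

        QCString fromUnicode(const QString &uc, int &lenInOut) const
        {
            QCString result(lenInOut + 1);
            for (int i = 0; i < lenInOut; i++)
                // Reverse lookup in a real codec; '?' for characters
                // the encoding can't represent.
                result[i] = uc[i].row() ? '?' : (char)uc[i].cell();
            result[lenInOut] = '\0';
            return result;
        }
    };

    // Constructing the codec registers it; afterwards codecForName()
    // and every QTextCodec user in the application can find it:
    //     (void) new MyCodec;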

> Easily extensible through minimal config files. Ease of implementation
> and complete independence. Localisation is one of those places where

You have all that now.
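
If I remember the docs correctly, Qt can even build and register a
codec from a POSIX2 charmap file at run time, which is exactly the
"recoding table in a config file" idea (the path below is just an
example):

    #include <qtextcodec.h>

    void loadExtraCodec()
    {
        // Builds and registers a codec from a charmap definition.
        QTextCodec *codec =
            QTextCodec::loadCharmapFile("/usr/share/i18n/charmaps/KOI8-R");
        if (!codec)
            qWarning("could not load charmap");
    }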

> inheritance of mistakes is deadly for the project. You could easily
> write a graphical widget which reacts slightly weirdly, and inherit
> this behavior, and few will care; but if you use a weak text handling
> system, it matters to everybody who doesn't know English.

The QTextCodec system works quite nicely, and you're actually the first
one to complain about it.

> > The place to implement that *is* Qt. All that needs to be done is to add a
> > QTextCodec for the windows-1251 codepage. (the windows codepages are
> > btw about the only encodings still missing in Qt). After that, all KDE/Qt
> They're numerous, BTW.

I know that. Believe me, I've dealt with these issues already :-)

> > application (not only konqueror) will automatically get the ability to
> > read in cp1251 encoded pages. Hmmm... I wanted to add a few other
> Then you must decide which font registry to use, then add support for
> this font map - and you have QT-2.1.1-KDE-1.92patch right under
> 2 megs. :)

Not at all. If you use cp1251 for input, you can still use koi8 for
output. AFAIK, the characters in both registries are the same.
Otherwise you could use Unicode fonts (iso10646-1 registry). Anyway,
I'm currently rewriting the font system for khtml to take care of such
issues. khtml is more or less the only place where one can have a mix
of different 8bit registries in one page.

> > (windows) codepages to Qt anyway. I'll see if I can find some time to do
> > it on the weekend.
> This would be another quick hack. This way of hacks, and then
> "cleanups", has been going on for almost two years. Sometimes I
> think - why don't you do this correctly from the beginning?
> I'm very thankful for your efforts - but some places in the KDE code
> make me wonder.

What are you talking about? This is *not* a hack. Writing correct
converters from and to Unicode is the only thing one can do to get
proper handling of these encodings. Or do you want to go back to the
days when one used 8 bits to store characters? Unicode is the only way
to go here.
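
To make the point concrete: with Unicode inside, one string can even
hold text that came in through two different 8bit encodings, something
no single 8bit representation can store (a sketch; both codecs ship
with Qt):

    #include <qtextcodec.h>
    #include <qstring.h>

    QString mixed(const char *koi8Bytes, int koi8Len,
                  const char *latin1Bytes, int latin1Len)
    {
        QTextCodec *koi8 = QTextCodec::codecForName("KOI8-R");
        QTextCodec *latin1 = QTextCodec::codecForName("ISO 8859-1");
        // Once decoded, the source encoding no longer matters:
        return koi8->toUnicode(koi8Bytes, koi8Len)
             + latin1->toUnicode(latin1Bytes, latin1Len);
    }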

Lars

