List:       kde-devel
Subject:    Re: Mixing encodings with an HTML page
From:       Lars Knoll <lars () trolltech ! com>
Date:       2001-03-07 10:31:53

On Wednesday 07 March 2001 11:18, Brunet Eric wrote:
> Hello all,
>
> I already asked this question on this mailing list a couple of weeks
> ago and got no answer. Of course, this was just during the final freeze
> of kde 2.1, and everybody was busy fixing the few remaining bugs. Now I
> think that people have more time to discuss future improvements of
> konqueror...
>
> My problem is the following: suppose I have an HTML file which looks like
> this:
>
> ---------------------------------------------------------------------------
> <html>
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-7" />
> </head>
>
> <body>
>
> Ù<p> <!-- character 0xd9;  uppercase omega in latin-7 encoding -->
>
> &eacute;<p> <!-- this one is not in latin 7 -->
>
> &#1044; <!-- U+0414; CYRILLIC CAPITAL LETTER DE. Not in latin-7 -->
>
> </body>
> ---------------------------------------------------------------------------
>
> This is, I believe, a perfectly valid HTML file, but as far as I can tell,
> there is no way to have konqueror display it properly. There should be
> three lines: an uppercase omega (Greek), a small e with acute (Western
> European) and an uppercase de (Russian). If I leave the encoding on auto in
> konqueror, the omega is correct and the other two show up as question
> marks. If I choose a latin-1 encoding, then I get the small e with acute,
> but the omega looks like a capital u with grave and the de like a question
> mark. Finally, if I choose a utf-8 encoding, then both the small e with
> acute and the capital de are correct, but the omega is not there. (And it
> is even worse than that: while trying to interpret the 0xd9 as the start
> of a multi-byte sequence, the parser ``ate'' the <, and the result looks
> like [weird character]p>é...)
>
> So it looks like konqueror is not able to display a page by using
> characters from different fonts with different encodings.
>
> Is there any chance that in the near future, the best browser in the world
> will be able to handle such pages?

Unfortunately, the X11 font concept makes this exceedingly difficult to 
implement. There are a few ways to get this working. One is to use Unicode 
fonts for display. I removed this in KDE-2/2.1 because it caused quite a few 
problems for people with slower machines (and most people don't need the 
mixing). I could re-add it as a config option in the HTML settings dialog in 
2.2. It will work directly if you use the new antialiased fonts with Qt-2.3, 
because these are always Unicode fonts, and Qt just pretends they are 
something different.
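
To illustrate why a Unicode font sidesteps the problem, here is a rough
Python sketch (not khtml code). It assumes the page bytes are decoded with
the charset declared in the <meta> tag and that character references are
then resolved to Unicode code points; `render_8bit` is a hypothetical helper
that only mimics drawing with a font limited to one 8-bit charset.

```python
import html

# Body of the sample page, reduced to the three characters of interest.
# Assumption: the first byte is 0xD9, as in the Latin-1 rendering "Ù".
page_bytes = b"\xd9<p> &eacute;<p> &#1044;"
declared_charset = "iso-8859-7"  # taken from the <meta> tag in the sample

# Step 1: decode the raw bytes with the declared charset.
# Step 2: resolve character references; these always name Unicode code
# points, independent of the page's byte encoding.
text = html.unescape(page_bytes.decode(declared_charset))


def render_8bit(s: str, charset: str) -> str:
    """Hypothetical stand-in for an 8-bit font: any character the charset
    cannot represent comes out as a question mark."""
    return s.encode(charset, errors="replace").decode(charset)


print(text)                                  # what a Unicode font can show
print(render_8bit(text, declared_charset))   # what an ISO-8859-7 font can show
```

With a Unicode font all three characters survive; restricted to the page's
own 8-bit charset, only the omega does, which matches the "auto" result
described in the question.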

The real solution, however, will only come with Qt-3, where we get a really 
good abstraction of a font that hides all the ugliness (8-bit-ness) of the 
X11 font model.

Regards,
Lars
 