List: kde-devel
Subject: Re: Special chars in khtml
From: Bryce Nesbitt <bryce () obviously ! com>
Date: 2001-11-09 18:02:17
Lars Knoll wrote:
>
> On Friday 09 November 2001 18:41, Bryce Nesbitt wrote:
> > Shamus wrote:
> > > >On Wed, 07 Nov 2001, Shamus wrote:
> > > >> (left/right single/double quote, em/en dash) to a question mark? Is
> > > >> the problem in khtml, Qt, or X?
> > > >
> > > >The problem is your setup: your fonts do not have the characters
> > > >you've requested.
> > >
> > > Hmm. I set up TrueType fonts on my XFree setup (4.1.0) that I *know* have
> > > those characters, and still no luck. It could be that the encoding of
> > > those fonts is screwing things up (after all, they came from my Win box).
> > > Can you point me to a font that has the correct encoding/characters to
> > > test with (the ones bundled with XFree obviously don't make the cut)?
> > > Would it be possible to distribute such fonts with KDE 3?
> >
> > I've been working on this issue. There are a lot of players.
> > First try a test page:
> > http://www.obviously.com/browsers/iso-8859-1_unicode.html
> > http://www.obviously.com/browsers/windows-1252.html
> >
> > The basic problem is that you're working with characters that are usually
> > encoded illegally. They originate from Microsoft boxes, and if used at
> > all, should be labeled "charset=windows-1252".
> >
> > khtml has a kludge to notice some of these characters and substitute them
> > with ASCII:
> > case 0x82: (x) = ','; break; \
> > case 0x84: (x) = '"'; break; \
> > case 0x8b: (x) = '<'; break; \
> > case 0x9b: (x) = '>'; break; \
> > case 0x91: (x) = '\''; break; \
> > case 0x92: (x) = '\''; break; \
> > case 0x93: (x) = '"'; break; \
> > case 0x94: (x) = '"'; break; \
> > case 0x95: (x) = '*'; break; \
> > case 0x96: (x) = '-'; break; \
> > case 0x97: (x) = '-'; break; \
> > case 0x98: (x) = '~'; break; \
> > case 0xb7: (x) = '*'; break; \
>
> Yep. They are used too often in web pages to be able to ignore them. I don't
> think we should substitute them in every case, but only for latin1-encoded web
> pages (and maybe other 8859-x encodings).
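For what it's worth, restricting the kludge to pages labeled latin1 or
another 8859-x charset could look roughly like the sketch below. The name
shouldApplyCp1252Fixup is made up for illustration and is not an existing
khtml function; the only fact it relies on is that 0x80-0x9F hold no
printable characters in the ISO-8859 family.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not a khtml function): returns true when the page's
// declared charset is latin1 or another ISO-8859-x part, where a byte in
// 0x80-0x9F is almost certainly mislabeled windows-1252 text.
bool shouldApplyCp1252Fixup(std::string charset)
{
    // Charset labels are case-insensitive, so normalize to lower case.
    for (std::string::size_type i = 0; i < charset.size(); ++i)
        charset[i] = static_cast<char>(
            std::tolower(static_cast<unsigned char>(charset[i])));

    // "latin1" is a common alias for iso-8859-1.
    if (charset == "latin1")
        return true;

    // Matches iso-8859-1, iso-8859-15, and the other 8859-x parts.
    const std::string prefix = "iso-8859-";
    return charset.size() > prefix.size() &&
           charset.compare(0, prefix.size(), prefix) == 0;
}
```

A page that is honestly labeled windows-1252 or utf-8 would then skip the
kludge and go through the normal decoder.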
(TM) is the one I run into all the time, and it's not covered.
Also, the above substitutions convert these characters to ASCII
rather than to their true Unicode equivalents. For the official
Microsoft chart listing the Unicode substitutions, see:
http://www.obviously.com/browsers/windows-1252.html
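As a rough sketch of what mapping to the true Unicode equivalents could
look like: the function name cp1252ToUnicode is hypothetical (it is not
something in khtml), the values follow the standard windows-1252 table,
and (TM) is byte 0x99.

```cpp
// Sketch: map one windows-1252 byte to its Unicode code point.
// The function name is hypothetical; only the table values are standard.
unsigned short cp1252ToUnicode(unsigned char byte)
{
    // Index 0 corresponds to byte 0x80. A 0 entry marks a byte that
    // windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).
    static const unsigned short table[32] = {
        0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, // 0x80-0x87
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,      // 0x88-0x8F
        0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, // 0x90-0x97
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178  // 0x98-0x9F
    };

    // Outside 0x80-0x9F, windows-1252 agrees with ISO-8859-1, whose byte
    // values equal their Unicode code points.
    if (byte < 0x80 || byte > 0x9F)
        return byte;

    const unsigned short mapped = table[byte - 0x80];
    // Leave undefined bytes unchanged rather than guessing.
    return mapped != 0 ? mapped : byte;
}
```

With this, 0x99 comes out as U+2122 instead of being ignored, and the five
bytes that windows-1252 leaves undefined pass through untouched.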
> > khtml also messes with Unicode, in a way I'm sure is a bad idea. I
> > added the missing left quote, but think the whole approach is broken:
> > case 0x2013: (x) = '-'; break; \
> > case 0x2014: (x) = '-'; break; \
> > case 0x2018: (x) = '\''; break; \
> > case 0x2019: (x) = '\''; break; \
> > case 0x201c: (x) = '"'; break; \
> > case 0x201d: (x) = '"'; break; \
>
> This approach was a workaround for Qt-2, which did map these chars to boxes
> otherwise (for latin1 fonts, mostly used in khtml). With Qt3, the reasoning
> behind this is wrong, and we shouldn't do this anymore. If at all, Qt should
> provide a reasonable mapping for these chars, in case it can't find a Unicode
> (or other) font containing them.
>
> Lars
>
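A rough sketch of that kind of font-aware fallback, in plain C++ rather
than real Qt code. The names displayChar, HasGlyphFn, and latin1OnlyFont
are invented for illustration; a real version would ask the font system
whether the glyph exists instead of using a stand-in predicate.

```cpp
// Hypothetical callback: reports whether the current font has a glyph for
// the given code point. This is a stand-in, not a Qt or khtml API.
typedef bool (*HasGlyphFn)(unsigned short codePoint);

// Keep the real Unicode character whenever the font can draw it, and fall
// back to the ASCII look-alike only when it cannot.
unsigned short displayChar(unsigned short codePoint, HasGlyphFn hasGlyph)
{
    if (hasGlyph(codePoint))
        return codePoint;

    switch (codePoint) {
    case 0x2013: case 0x2014: return '-';   // en dash, em dash
    case 0x2018: case 0x2019: return '\'';  // single quotes
    case 0x201C: case 0x201D: return '"';   // double quotes
    default:                  return codePoint; // no fallback known
    }
}

// Example predicate: pretends the font covers only Latin-1 (U+0000-U+00FF).
bool latin1OnlyFont(unsigned short codePoint)
{
    return codePoint < 0x100;
}
```

Called as displayChar(0x2014, latin1OnlyFont), this yields '-', while a
font that does contain the em dash gets U+2014 back unchanged.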
> > If the characters end up getting converted to Unicode or start as Unicode,
> > Qt is supposed to fake up some symbols to match. I was able to fix €
> > quite easily. The others, especially TM, are trickier. Try kcharselect to
> > see what's up. All the stuff of interest is on page 32.
> >
> > If you have a REAL Unicode font (ClearlyU, Microsoft Arial Unicode MS)
> > then you should see the characters directly. If you don't, it's your
> > X font server's fault.
> >
> > Confused yet?
> >
> > -Bryce
> >
> > >> Visit http://mail.kde.org/mailman/listinfo/kde-devel#unsub to
> > >> unsubscribe <<
>
--
Hi! I'm a do-it-yourself virus... please delete 4 files at random
from your hard drive. Pass me on to all your friends.