
List:       kde-devel
Subject:    Re: Special chars in khtml
From:       Bryce Nesbitt <bryce () obviously ! com>
Date:       2001-11-09 18:02:17

Lars Knoll wrote:
> 
> On Friday 09 November 2001 18:41, Bryce Nesbitt wrote:
> > Shamus wrote:
> > > >On Mit, 07 Nov 2001, Shamus wrote:
> > > >> (left/right single/double quote, em/en dash) to a question mark? Is
> > > >> the problem in khtml, Qt, or X?
> > > >
> > > >The problem is your setup, fonts are not having the character you've
> > > >requested.
> > >
> > > Hmm. I set up TrueType fonts on my XFree setup (4.1.0) that I *know* have
> > > those characters, and still no luck. It could be that the encoding of
> > > those fonts is screwing things up (after all, they came from my Win box).
> > > Can you point me to a font that has the correct encoding/characters to
> > > test with (the ones bundled with XFree obviously don't make the cut)?
> > > Would it be possible to distribute such fonts with KDE 3?
> >
> > I've been working on this issue.  There are a lot of players.
> > First try a test page:
> >       http://www.obviously.com/browsers/iso-8859-1_unicode.html
> >       http://www.obviously.com/browsers/windows-1252.html
> >
> > The basic problem is that you're working with characters that are usually
> > encoded illegally.  They originate from Microsoft boxes, and if used at
> > all, should be labeled "charset=windows-1252".
> >
> > khtml has a kludge to notice some of these characters and substitute them
> > with ASCII:
> >                 case 0x82: (x) = ','; break; \
> >                 case 0x84: (x) = '"'; break; \
> >                 case 0x8b: (x) = '<'; break; \
> >                 case 0x9b: (x) = '>'; break; \
> >                 case 0x91: (x) = '\''; break; \
> >                 case 0x92: (x) = '\''; break; \
> >                 case 0x93: (x) = '"'; break; \
> >                 case 0x94: (x) = '"'; break; \
> >                 case 0x95: (x) = '*'; break; \
> >                 case 0x96: (x) = '-'; break; \
> >                 case 0x97: (x) = '-'; break; \
> >                 case 0x98: (x) = '~'; break; \
> >                 case 0xb7: (x) = '*'; break; \
> 
> Yep. They are used too often in web pages to be able to ignore them. I don't
> think we should substitute them in every case, though, only for latin1
> encoded web pages (and maybe other 8859-x encodings).

(TM) is the one I run into all the time, and it's not covered.
Also, the above substitutions convert these characters to ASCII
rather than to their true Unicode equivalents.  For the official
Microsoft chart listing the Unicode substitutions, see:
	http://www.obviously.com/browsers/windows-1252.html
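
To make the "true Unicode equivalents" point concrete, here is a rough
sketch of a table-driven mapping for the windows-1252 bytes 0x80..0x9F.
This is not the khtml code: the names kCp1252High, cp1252ToUnicode and
decodeCp1252 are made up for illustration, and passing the five undefined
bytes through unchanged is my own assumption about what to do with them.

```cpp
#include <string>

// Unicode code points for windows-1252 bytes 0x80..0x9F.
// A 0 entry marks a byte that windows-1252 leaves undefined.
static const char32_t kCp1252High[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178,
};

// Hypothetical helper: map one byte of a page that is really windows-1252.
char32_t cp1252ToUnicode(unsigned char byte) {
    // ASCII and 0xA0..0xFF are identical in ISO-8859-1 and Unicode.
    if (byte < 0x80 || byte > 0x9F)
        return static_cast<char32_t>(byte);
    const char32_t mapped = kCp1252High[byte - 0x80];
    // Assumption: undefined bytes pass through as C1 control codes.
    return mapped ? mapped : static_cast<char32_t>(byte);
}

// Hypothetical helper: decode a whole byte string.
std::u32string decodeCp1252(const std::string& bytes) {
    std::u32string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes)
        out.push_back(cp1252ToUnicode(b));
    return out;
}
```

With a table like this, 0x99 comes out as U+2122 (TM) and 0x93/0x94 as
U+201C/U+201D, instead of being flattened to an ASCII '"'.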



> > khtml also messes with unicode, in a way I'm sure is a bad idea.  I
> > added the missing left quote, but think the whole approach is broken:
> >                 case 0x2013: (x) = '-'; break; \
> >                 case 0x2014: (x) = '-'; break; \
> >                 case 0x2018: (x) = '\''; break; \
> >                 case 0x2019: (x) = '\''; break; \
> >                 case 0x201c: (x) = '"'; break; \
> >                 case 0x201d: (x) = '"'; break; \
> 
> This approach was a workaround for Qt-2, which did map these chars to boxes
> otherwise (for latin1 fonts, mostly used in khtml). With Qt3, the reasoning
> behind this is wrong, and we shouldn't do this anymore. If at all, Qt should
> provide a reasonable mapping for these chars, in case it can't find a Unicode
> (or other) font containing them.
> 
> Lars
> 
> > If the characters end up getting converted to Unicode or start as Unicode,
> > Qt is supposed to fake up some symbols to match.  I was able to fix &euro;
> > quite easily.  The others, especially TM, are trickier.  Try kcharselect to
> > see what's up.  All the stuff of interest is on page 32.
> >
> > If you have a REAL Unicode font (ClearlyU, Microsoft Arial Unicode MS)
> > then you should see the characters directly.  If you don't, it's your
> > X font server's fault.
> >
> > Confused yet?
> >
> >               -Bryce
> >
> > >> Visit http://mail.kde.org/mailman/listinfo/kde-devel#unsub to
> > >> unsubscribe <<

-- 
Hi! I'm a do-it-yourself virus... please delete 4 files at random
from your hard drive.  Pass me on to all your friends.
 
