List:       kfm-devel
Subject:    Re: using unicode in khtml
From:       Lars Knoll <Lars.Knoll () mpi-hd ! mpg ! de>
Date:       1999-05-17 14:04:58

On Mon, 17 May 1999, Nicolas Brodu wrote:

> On Mon, 17 May 1999, Waldo Bastian wrote :
> >
> >> The <body> tag can be completely missing.
> >> I decided now to make a new (and small) class dealing with the input
> >> stream khtml gets. The class will be called from the tokenizer, and do the
> >> transformation to unicode. This seems to work already.
> >
> >How does it find out which charset should be used? 
> >
> 
> Just an idea, (might have already been discussed, I'm jumping in this thread) :
> 
> How about having a small library of characters specific to (or most frequently
> found in) the various charsets? Then we can check a few lines of the document,
> or at least the title. This, plus a default choice based on the domain
> extension (.fr, .de, ...) when several charsets match, should give the right
> charset most of the time.

QTextCodec has a function that does exactly this. As far as I remember
it's called something like heuristicContentMatch(), but I don't know how
well it works. Perhaps one could use it if neither the server nor the
document specifies a charset. Then again, I would probably prefer to
fall back to Latin1 and let the user adjust the charset manually.
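
To make the fallback order concrete, here is a rough sketch in Python
(not khtml's C++, and not the actual khtml code). All names in it
(choose_charset, guess_utf8, to_unicode) are made up for illustration.
The precedence "user override, then server, then document" is an
assumption, and guess_utf8 is only a trivial stand-in for a real content
heuristic such as the QTextCodec one mentioned above.

```python
import codecs
from typing import Callable, Optional

DEFAULT_CHARSET = "latin-1"  # fallback when nothing else is known


def is_known(name: str) -> bool:
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def guess_utf8(data: bytes) -> Optional[str]:
    """Hypothetical heuristic: report UTF-8 only for non-ASCII data
    that decodes cleanly as UTF-8; otherwise give no opinion."""
    if data.isascii():
        return None
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "utf-8"


def choose_charset(
    server_charset: Optional[str],
    document_charset: Optional[str],
    data: bytes,
    guess: Optional[Callable[[bytes], Optional[str]]] = None,
    user_override: Optional[str] = None,
) -> str:
    """Pick a charset name, trying the most authoritative source first."""
    # Assumed precedence: explicit user choice, server header, document.
    for candidate in (user_override, server_charset, document_charset):
        if candidate and is_known(candidate):
            return candidate
    # Only consult the content heuristic when nothing was declared.
    if guess is not None:
        guessed = guess(data)
        if guessed and is_known(guessed):
            return guessed
    return DEFAULT_CHARSET


def to_unicode(data: bytes, **sources) -> str:
    """Decode the input stream to a Unicode string with the chosen charset."""
    charset = choose_charset(
        sources.get("server_charset"),
        sources.get("document_charset"),
        data,
        sources.get("guess"),
        sources.get("user_override"),
    )
    # Undecodable bytes become U+FFFD instead of raising an error.
    return data.decode(charset, errors="replace")
```

Latin1 is a convenient last resort here because every byte value maps
to some character, so decoding never fails; the user can still switch
charsets afterwards through something like user_override.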

Cheers,
Lars
