List: kfm-devel
Subject: Re: using unicode in khtml
From: Lars Knoll <Lars.Knoll () mpi-hd ! mpg ! de>
Date: 1999-05-17 14:04:58
On Mon, 17 May 1999, Nicolas Brodu wrote:
> On Mon, 17 May 1999, Waldo Bastian wrote :
> >
> >> The <body> tag can be completely missing.
> >> I have now decided to write a new (and small) class that handles the input
> >> stream khtml receives. The class will be called from the tokenizer and will do
> >> the transformation to Unicode. This already seems to work.
> >
> >How does it find out which charset should be used?
> >
>
> Just an idea (it might have been discussed already; I'm jumping into this thread):
>
> How about having a small library of the characters specific to (or most frequently
> found in) each charset? Then we could check a few lines of the document,
> or at least the title, against it. This, plus a default based on the domain
> extension (.fr, .de, ...) to break ties between several candidates, should pick
> the right charset most of the time.
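The character-frequency idea quoted above could be sketched roughly as follows. This is a minimal illustration, not khtml or Qt code: CharsetProfile, exampleProfiles() and guessCharset() are hypothetical names, and the two byte tables are small hand-picked examples rather than real statistics. Each candidate charset lists some high bytes (0x80-0xFF) common in text written in it, the sample (a few lines or the title) is scored against every list, and equal scores are resolved by the host's top-level domain.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Illustrative profile: a charset name, high bytes (0x80-0xFF) that often
// occur in text written in it, and top-level domains that should favour it
// when two charsets score the same.
struct CharsetProfile {
    std::string name;
    std::set<unsigned char> typicalBytes;
    std::vector<std::string> preferredTlds;
};

// Hypothetical example profiles; real ones would come from text statistics.
std::vector<CharsetProfile> exampleProfiles() {
    return {
        // ISO-8859-1 byte values of: ß à ä ç è é ê ö ü
        {"ISO-8859-1", {0xDF, 0xE0, 0xE4, 0xE7, 0xE8, 0xE9, 0xEA, 0xF6, 0xFC}, {"fr", "de"}},
        // ISO-8859-2 byte values of: ą ł š ž č é ę ě ř
        {"ISO-8859-2", {0xB1, 0xB3, 0xB9, 0xBE, 0xE8, 0xE9, 0xEA, 0xEC, 0xF8}, {"pl", "cz"}},
    };
}

// Score each profile by how many bytes of the sample it recognises.
// A tie goes to a profile whose preferred TLDs include `tld`. An empty
// result means no high byte matched, so the caller should use its default.
std::string guessCharset(const std::string& sample, const std::string& tld,
                         const std::vector<CharsetProfile>& profiles) {
    std::string best;
    int bestScore = 0;
    bool bestMatchesTld = false;
    for (const CharsetProfile& p : profiles) {
        int score = 0;
        for (unsigned char c : sample)
            if (p.typicalBytes.count(c) != 0) ++score;
        bool matchesTld = false;
        for (const std::string& t : p.preferredTlds)
            if (t == tld) matchesTld = true;
        bool better = score > bestScore ||
                      (score > 0 && score == bestScore && matchesTld && !bestMatchesTld);
        if (better) {
            best = p.name;
            bestScore = score;
            bestMatchesTld = matchesTld;
        }
    }
    return best;
}
```

With only a title's worth of text the scores stay small and charsets that share byte values (é is 0xE9 in both tables) tie often, so the domain tie-break does real work here.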
There's a function in QTextCodec that does exactly this. As far as I
remember, it's called something like heuristicContentMatch(), but I don't
know how well it works. Perhaps one could use it if neither the server nor
the document specifies a charset. Then again, I would probably prefer
to default to Latin1 and let the user adjust the charset
manually.
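The fallback order discussed here could look roughly like this. chooseCharset() and decodeLatin1() are hypothetical helpers, not khtml's actual input class, and letting a manual user choice win over everything else is an assumption of the sketch. The Latin1 decoding step itself is exact: ISO-8859-1 assigns each byte value 0x00-0xFF to the Unicode code point with the same value, so converting it to UTF-16 is a plain widening of each byte.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: choose a charset in the order discussed in the thread.
// Empty strings mean "not specified". Giving the user's manual choice the
// highest priority is an assumption of this sketch.
std::string chooseCharset(const std::string& userChoice,
                          const std::string& httpCharset,
                          const std::string& metaCharset,
                          const std::string& heuristicGuess) {
    if (!userChoice.empty()) return userChoice;          // manual adjustment
    if (!httpCharset.empty()) return httpCharset;        // charset sent by the server
    if (!metaCharset.empty()) return metaCharset;        // charset declared in the document
    if (!heuristicGuess.empty()) return heuristicGuess;  // optional guess; pass "" to skip it
    return "ISO-8859-1";                                 // Latin1 default
}

// ISO-8859-1 maps byte 0xNN to code point U+00NN, so decoding into
// UTF-16 only widens each byte to a char16_t.
std::u16string decodeLatin1(const std::string& bytes) {
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char c : bytes)
        out.push_back(static_cast<char16_t>(c));
    return out;
}
```

Passing an empty heuristicGuess gives the "just use Latin1" behaviour; passing a guess slots the heuristic in before the default.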
Cheers,
Lars