List:       kfm-devel
Subject:    Re: using unicode in khtml
From:       Lars Knoll <Lars.Knoll () mpi-hd ! mpg ! de>
Date:       1999-05-18 8:56:51

On Tue, 18 May 1999, Waldo Bastian wrote:
> And HTML4.0/B.3.2 Attribute values:
> 
> "When script or style data is the value of an attribute (either 
> style or the intrinsic event attributes), authors should escape 
> occurrences of the delimiting single or double quotation mark 
> within the value according to the script or style language 
> convention. Authors should also escape occurrences of "&" if 
> the "&" is not meant to be the beginning of a character reference. 
> 
>      '"' should be written as "&quot;" or "&#34;" 
>      '&' should be written as "&amp;" or "&#38;" 
> 
> Thus, for example, one could write:
> 
>    <INPUT name="num" value="0"
>    onchange="if (compare(this.value, &quot;help&quot;)) {gethelp()}">
> "

OK. You won ;-)
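
A minimal sketch of the escaping rule quoted above, assuming the attribute is delimited by double quotes. escapeAttributeValue is a hypothetical helper, not an existing khtml function, and it escapes every "&", including one that already starts a character reference:

```cpp
#include <string>

// Hypothetical helper (not part of khtml): escape '&' and '"' so that
// script text can sit inside a double-quoted HTML attribute value,
// following the HTML 4.0 B.3.2 recommendation.
// Simplification: every '&' is escaped, even one that already begins
// a character reference.
std::string escapeAttributeValue(const std::string &script)
{
    std::string out;
    out.reserve(script.size());
    for (char c : script) {
        switch (c) {
        case '&': out += "&amp;";  break;
        case '"': out += "&quot;"; break;
        default:  out += c;        break;
        }
    }
    return out;
}
```

Passing the script from the quoted example through this helper should give the onchange value shown in the spec text.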

> > The short answer is that you save a QChar: the high byte tells you it's
> > an attribute, the low byte tells you which one.
> 
> Ok.
> 
> > Do we really still need null-terminated strings? I thought of passing
> > tokens as QConstStrings (a QConstString is basically a pointer to a
> > QChar plus the length of the string, so it avoids copying the QChar
> > array). The advantage is that we can use all const members of
> > QString, which will make handling much easier.
> 
> But we should take care to only create a QConstString object for the
> single token we are processing. 

I agree.
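
A rough sketch of the idea, using std::u16string_view (C++17) as a stand-in for QConstString, since both amount to a pointer plus a length over someone else's buffer. nextToken and its whitespace splitting are assumptions for illustration only. The real tokenizer splits on markup, not spaces:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: return the next whitespace-delimited token that
// starts at or after `pos`, and advance `pos` past it. The result is a
// (pointer, length) view into `buffer`; nothing is copied, so the view
// is only valid while `buffer` is alive and unmodified.
// An empty view means the end of the input was reached.
std::u16string_view nextToken(std::u16string_view buffer, std::size_t &pos)
{
    auto isSpace = [](char16_t c) {
        return c == u' ' || c == u'\n' || c == u'\t';
    };
    // Skip leading whitespace.
    while (pos < buffer.size() && isSpace(buffer[pos]))
        ++pos;
    const std::size_t start = pos;
    // Consume the token itself.
    while (pos < buffer.size() && !isSpace(buffer[pos]))
        ++pos;
    return buffer.substr(start, pos - start);
}
```

Only one view exists at a time here, and it goes away before the next call, which matches the point about creating the object only for the token being processed.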

> > > If we want to use the 0xE000-0xEFFF range for internal use, we also
> > > have to make sure we won't get confused when these characters
> > > appear in the HTML itself.
> > 
> > You are right, we should make a check, but the chance of them appearing is
> > small, because using characters in that range is not portable.
> 
> It's just that we shouldn't crash on it.

Hmm... Do I have any other choice than agreeing to that???
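
A sketch of how the marker and the check could fit together. The exact layout is an assumption: 0xE0 in the high byte is taken to mean "attribute", with the attribute ID in the low byte, and document characters that fall in the reserved range are replaced with U+FFFD. All names are hypothetical:

```cpp
#include <cstdint>
#include <string>

// Assumed layout: 0xE000-0xEFFF (inside the Unicode Private Use Area)
// is reserved for internal markers, and a high byte of 0xE0 means
// "this code unit is an attribute marker".
constexpr char16_t kInternalFirst = 0xE000;
constexpr char16_t kInternalLast  = 0xEFFF;
constexpr char16_t kAttributeTag  = 0xE000;

// Pack an attribute ID into a single 16-bit character.
constexpr char16_t makeAttributeMarker(std::uint8_t id)
{
    return static_cast<char16_t>(kAttributeTag | id);
}

// High byte says whether it is an attribute, low byte says which one.
constexpr bool isAttributeMarker(char16_t c)
{
    return (c & 0xFF00) == kAttributeTag;
}

constexpr std::uint8_t attributeId(char16_t c)
{
    return static_cast<std::uint8_t>(c & 0x00FF);
}

constexpr bool isInternal(char16_t c)
{
    return c >= kInternalFirst && c <= kInternalLast;
}

// Run over the raw document text before tokenizing, so that characters
// from the HTML itself can never be mistaken for internal markers.
std::u16string sanitizeInput(std::u16string text)
{
    for (char16_t &c : text) {
        if (isInternal(c))
            c = u'\uFFFD'; // REPLACEMENT CHARACTER
    }
    return text;
}
```

Replacing is only one option. Dropping the character or remapping it elsewhere would also keep the parser from being confused.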

> > > (\r never appears in the HTML since we convert it to a \n,
> > > and &#10; is passed as an entity and converted in the parser,
> > > not in the tokenizer.)
> > 
> > That will change. I am now converting entities to QChars in the
> > tokenizer (IMO that's the place where it should be).
> 
> Yes.
> 
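
A small sketch of decoding at the tokenizer level, with char16_t standing in for QChar. decodeEntities is a hypothetical function that handles only decimal references such as &#10; and four named entities. Anything it does not recognise is copied through unchanged:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replace "&#NNN;" (decimal, BMP only) and a few
// named references with the character they stand for. Unknown or
// malformed references are left in the output untouched.
std::u16string decodeEntities(const std::u16string &in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] != u'&') { out += in[i++]; continue; }

        const std::size_t end = in.find(u';', i + 1);
        if (end == std::u16string::npos) { out += in[i++]; continue; }

        const std::u16string name = in.substr(i + 1, end - i - 1);
        char16_t decoded = 0;
        bool ok = false;

        if (name.size() > 1 && name[0] == u'#') {
            // Numeric reference: accumulate decimal digits, reject
            // anything that is not a digit or does not fit in 16 bits.
            unsigned long value = 0;
            ok = true;
            for (std::size_t k = 1; k < name.size() && ok; ++k) {
                if (name[k] < u'0' || name[k] > u'9') {
                    ok = false;
                } else {
                    value = value * 10 + (name[k] - u'0');
                    if (value > 0xFFFF)
                        ok = false;
                }
            }
            ok = ok && value != 0;
            decoded = static_cast<char16_t>(value);
        } else if (name == u"amp")  { decoded = u'&'; ok = true; }
        else if (name == u"quot")   { decoded = u'"'; ok = true; }
        else if (name == u"lt")     { decoded = u'<'; ok = true; }
        else if (name == u"gt")     { decoded = u'>'; ok = true; }

        if (ok) { out += decoded; i = end + 1; }
        else    { out += in[i++]; }
    }
    return out;
}
```

With this in the tokenizer, the parser only ever sees plain characters, so &#10; and a literal newline look the same to it.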

Cheers,
Lars
