'Re: Getting the position of a node in the input stream (using Neko)'


List:       xerces-j-user
Subject:    Re: Getting the position of a node in the input stream (using Neko)
From:       Andy Clark <andyc () apache ! org>
Date:       2002-08-25 18:07:32

Martin Jericho wrote:
>  > to implement the same thing in NekoHTML. But in neither
>  > case do we track "character offsets", which I think has
>  > limited usefulness but others disagree.
> Hopefully my arguments below will help to convince you of their usefulness.

I can understand the cases in which people would like to
be able to do this but I also realize what it would take
to implement it. ;)

The "limited usefulness" that I was referring to was the
fact that reporting character offsets only works if the
parsed source is already a character stream. If it's
anything else (say a byte stream in UTF-8 or Shift_JIS)
then the application can't map those offsets back to the
source without re-reading the file.
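
To make that concrete, here is a minimal sketch (not code from
NekoHTML or XNI; the class and method names are hypothetical, made
up for illustration). It assumes the application has the decoded
text and knows the source encoding. The same character offset lands
on different byte offsets depending on the encoding, and the only
general way to find the byte offset is to re-encode everything that
comes before it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class OffsetDemo {

    /**
     * Hypothetical helper: returns the byte offset in the encoded source
     * that corresponds to a character offset (counted in Java chars) in
     * the decoded text. It does so by encoding the whole prefix, which is
     * the "re-reading" cost. Assumes charOffset does not fall in the
     * middle of a surrogate pair.
     */
    static int byteOffset(String text, int charOffset, Charset charset) {
        return text.substring(0, charOffset).getBytes(charset).length;
    }

    public static void main(String[] args) {
        // "caf\u00e9" ends in e-acute: 1 byte in ISO-8859-1, 2 bytes in UTF-8.
        String html = "<p>caf\u00e9</p><b>x</b>";
        int charOffset = html.indexOf("<b>");

        System.out.println("char offset:       " + charOffset);
        System.out.println("ISO-8859-1 offset: "
                + byteOffset(html, charOffset, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8 offset:      "
                + byteOffset(html, charOffset, StandardCharsets.UTF_8));
    }
}
```

With a single non-ASCII character ahead of the `<b>` tag, the UTF-8
byte offset is already one past the character offset, and the gap
grows with every further multi-byte character in the document.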

>  > Because "no-change" has the potential of producing XML
>  > that is not well-formed. And the whole purpose of Neko-
>  > HTML is to parse HTML and make it appear as XML.
> So Neko has to do this because otherwise the underlying xerces parser 
> would not be able to parse it.  Is that right?  This would not be of any 

I don't use the Xerces2 parser to implement NekoHTML --
I only use the XNI framework and some utility classes.
The NekoHTML scanner is written completely from scratch
to be able to handle HTML.

> concern to me anyway if character positions were reported, I was just 
> using it as an example to demonstrate that you can't get Neko to output 
> the original source unchanged.

Ugh. ;)

>  > Please let me know more detail about these bugs so
>  > that I can fix them. Minimal sample files would be
>  > preferable.
>  
> I have attached the relevant files.

Thanks for attaching those! I fixed the problem and
have included your sample input file (albeit a little
bit more stripped down) in my set of regression tests.
So if I ever break it, I'll know right away. :)

I changed the behavior of <COL> to *not* automatically
insert a <COLGROUP> as its parent. Is this the behavior
you were expecting? Also, as a general question, do
you think that NekoHTML should insert a <TBODY> parent
for <TR> elements? I notice that Mozilla inserts one.
<aside>The DOM Inspector rules!</aside>

> I get the feeling that this would have to be implemented in the XNI 
> framework rather than as a Neko improvement.  I would love to get 

If it were to be added, the place would be in the
XNI interfaces which would then be implemented in
NekoHTML.

> buffer, not the original source.  In fact the only HTML parser I have 
> found which does it is the one in the swing package, although I still 
> haven't tested it properly.  If that doesn't do what I want, I might 
> even have to write my own from scratch.

Well, whatever you do, take advantage of all of the
code that's available. I hope NekoHTML can be of use
but if not then that's ok, too.

I'm trying to get a new version of NekoHTML posted
"real soon now". I will make an announcement when
it's ready.

-- 
Andy Clark * andyc@apache.org


