List: xerces-j-user
Subject: Re: Getting the position of a node in the input stream (using Neko)
From: Andy Clark <andyc () apache ! org>
Date: 2002-08-25 18:07:32
Martin Jericho wrote:
> > to implement the same thing in NekoHTML. But in neither
> > case do we track "character offsets", which I think has
> > limited usefulness but others disagree.
> Hopefully my arguments below will help to convince you of their usefulness.
I can understand the cases in which people would like to
be able to do this but I also realize what it would take
to implement it. ;)
The "limited usefulness" that I was referring to was the
fact that reporting character offsets only works if the
parsed source is already a character stream. If it's
anything else (say a byte stream in UTF-8 or Shift_JIS)
then the application can't map those offsets back to the
source without re-reading the file.
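To illustrate the mismatch, here is a minimal sketch in plain Java.
It is not NekoHTML or XNI code; the class name OffsetDemo and the
helper byteOffset are made up for this example. It shows that the
character offset of a tag and its byte offset in the UTF-8 encoded
source differ as soon as a non-ASCII character precedes it, and that
recovering the byte offset means re-encoding (in effect re-reading)
everything before that point.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Xerces or NekoHTML.
final class OffsetDemo {

    /**
     * Returns the byte offset, in the given encoding, of the character at
     * charOffset. The offset is counted in Java chars (UTF-16 code units)
     * and is assumed not to fall in the middle of a surrogate pair.
     */
    public static int byteOffset(String text, int charOffset, Charset encoding) {
        // The whole prefix has to be encoded again to find out how many
        // bytes it occupies in the original source.
        return text.substring(0, charOffset).getBytes(encoding).length;
    }

    public static void main(String[] args) {
        String html = "<p>caf\u00e9</p><b>x</b>";
        int charOffset = html.indexOf("<b>");
        System.out.println(charOffset);                                           // 11
        System.out.println(byteOffset(html, charOffset, StandardCharsets.UTF_8)); // 12
    }
}
```

With a multi-byte encoding the two numbers drift further apart with
every non-ASCII character, so a reported character offset alone does
not let an application seek into the original byte stream.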
> > Because "no-change" has the potential of producing XML
> > that is not well-formed. And the whole purpose of Neko-
> > HTML is to parse HTML and make it appear as XML.
> So Neko has to do this because otherwise the underlying xerces parser
> would not be able to parse it. Is that right? This would not be of any
I don't use the Xerces2 parser to implement NekoHTML --
I only use the XNI framework and some utility classes.
The NekoHTML scanner is written completely from scratch
to be able to handle HTML.
> concern to me anyway if character positions were reported, I was just
> using it as an example to demonstrate that you can't get Neko to output
> the original source unchanged.
Ugh. ;)
> > Please let me know more detail about these bugs so
> > that I can fix them. Minimal sample files would be
> > preferable.
>
> I have attached the relevant files.
Thanks for attaching those! I fixed the problem and
have included your sample input file (albeit a little
bit more stripped down) in my set of regression tests.
So if I ever break it, I'll know right away. :)
I changed the behavior of <COL> to *not* automatically
insert a <COLGROUP> as its parent. Is this the behavior
you were expecting? Also, as a general question, do
you think that NekoHTML should insert a <TBODY> parent
for <TR> elements? I notice that Mozilla inserts one.
<aside>The DOM Inspector rules!</aside>
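To make the <TBODY> question concrete, here is a rough sketch of
the tree rewrite it amounts to, written against the standard
org.w3c.dom API. This is not how NekoHTML does or would do it (the
scanner works on XNI events, not on a finished DOM); the class
TbodyWrapper and its methods are invented for illustration only.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of NekoHTML.
final class TbodyWrapper {

    /** Moves each run of TR children directly under the table into a new TBODY. */
    public static void wrapRows(Element table) {
        Document doc = table.getOwnerDocument();
        Element tbody = null;
        Node child = table.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // capture before the node is moved
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                if ("TR".equalsIgnoreCase(child.getNodeName())) {
                    if (tbody == null) {
                        tbody = doc.createElement("TBODY");
                        table.insertBefore(tbody, child);
                    }
                    tbody.appendChild(child); // appendChild moves the existing node
                } else {
                    tbody = null; // any other element ends the current run of rows
                }
            }
            child = next;
        }
    }

    /** Builds a TABLE element with the given number of bare TR children. */
    public static Element newTable(int rows) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element table = doc.createElement("TABLE");
            doc.appendChild(table);
            for (int i = 0; i < rows; i++) {
                table.appendChild(doc.createElement("TR"));
            }
            return table;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element table = newTable(2);
        wrapRows(table);
        System.out.println(table.getFirstChild().getNodeName()); // TBODY
    }
}
```

Text nodes between rows are left where they are in this sketch; a
real implementation would have to decide what to do with them.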
> I get the feeling that this would have to be implemented in the XNI
> framework rather than as a Neko improvement. I would love to get
If it were to be added, the place would be in the
XNI interfaces which would then be implemented in
NekoHTML.
> buffer, not the original source. In fact the only HTML parser I have
> found which does it is the one in the swing package, although I still
> haven't tested it properly. If that doesn't do what I want, I might
> even have to write my own from scratch.
Well, whatever you do, take advantage of all of the
code that's available. I hope NekoHTML can be of use
but if not then that's ok, too.
I'm trying to get a new version of NekoHTML posted
"real soon now". I will make an announcement when
it's ready.
--
Andy Clark * andyc@apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org