List:       jedit-devel
Subject:    [ jEdit-devel ] [ jedit-Plugin Bugs-2810050 ] XMLPlugin
From:       "SourceForge.net" <noreply () sourceforge ! net>
Date:       2010-08-26 6:50:48
Message-ID: E1OoWIS-0002T4-Ch () sfs-web-11 ! v29 ! ch3 ! sourceforge ! com

Plugin Bugs item #2810050, was opened at 2009-06-22 03:19
Message generated for change (Comment added) made by kerik-sf
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=565475&aid=2810050&group_id=588

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
> Status: Pending
> Resolution: Fixed
Priority: 5
Private: No
Submitted By: Greg Knittl (gknittl)
Assigned to: Eric Le Lay (kerik-sf)
Summary: XMLPlugin XercesParserImpl  null CompletionInfo

Initial Comment:
requirements:
-XML Plugin relax-ng branch
-the buffer is validated by an XML schema
-the buffer contains an error that prevents Xerces from parsing all the way
to the end, such as a fundamental syntax error like an unclosed element or
an incomplete entity
-the XML plugin has parsed the invalid buffer 
result:
the XML Plugin no longer displays schema related completions.

I recreate this on demand by adding a < as a new element or & and manually
invoking the XML parser.  I get into this state fairly often in natural use
by switching buffers in mid element. Enabling parsing on saving or by
keystroke will result in parsing invalid buffers quite easily too.

The XML plugin builds CompletionInfo each time it parses the buffer.
For buffers validated by XML Schemas, XercesParserImpl.endElement() builds
the completion info through Xerces. The plugin builds CompletionInfo when
endElement() receives the last end tag that Xerces parses. In the case of a
structural error, Xerces stops parsing. endElement() doesn't get the last
tag and the plugin builds no completion information.
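
The behaviour described above can be reproduced with any SAX parser. The
following is a minimal sketch using the JDK's built-in parser rather than
the plugin's own code; the class and method names are hypothetical. It only
reports whether the end tag of the root element was ever delivered.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the XML plugin.
final class RootEndDemo {

    /** Returns true if the parser delivered the end tag of the root element. */
    static boolean rootEndSeen(String xml) {
        final int[] depth = {0};
        final boolean[] seen = {false};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                depth[0]++;
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                // Depth returns to zero only on the root's end tag.
                if (--depth[0] == 0) {
                    seen[0] = true;
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Fatal well-formedness error: the parser stops here, so any
            // work deferred to the root's endElement() never happens.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

For a well-formed buffer the method returns true; for a buffer ending in a
stray < or containing a bare & it returns false, which is the situation in
which the plugin ends up without completion information.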

The schema could change while the buffer is loaded, so it makes sense to
reload it on each parse, or at least to check if it has changed and update
it if necessary. As time permits, I will investigate whether there is a more
robust way of capturing the schema, perhaps at startDocument.


----------------------------------------------------------------------

> Comment By: Eric Le Lay (kerik-sf)
Date: 2010-08-26 08:50

Message:
fixed in r18330, test case in r18415

I ended up implementing much of what we discussed :
- still use Xerces to parse XSD schemas, but load the schemas manually at
start of instance document, so you get Completion Info even if the document
is malformed ;
- cache CompletionInfo and Schemas to speed up sidekick parsing;
invalidate cache entries as necessary;
- make CompletionInfo namespace-aware : CompletionInfo may contain
elements from different namespaces, XMLParsedData contains schema to prefix
mappings.
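
The caching and invalidation mentioned in the list above could be sketched
roughly as follows. This is not the code from r18330; StampedCache and its
methods are hypothetical names, and the "stamp" is assumed to be something
like the schema file's last-modified time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; not the plugin's actual cache.
final class StampedCache<V> {

    private static final class Entry<V> {
        final long stamp;
        final V value;

        Entry(long stamp, V value) {
            this.stamp = stamp;
            this.value = value;
        }
    }

    private final Map<String, Entry<V>> entries = new HashMap<>();
    private int loads;

    /**
     * Returns the cached value for key, reloading it through loader when
     * there is no entry yet or the stamp differs from the cached one.
     */
    synchronized V get(String key, long stamp, Function<String, V> loader) {
        Entry<V> entry = entries.get(key);
        if (entry == null || entry.stamp != stamp) {
            entry = new Entry<>(stamp, loader.apply(key));
            entries.put(key, entry);
            loads++;
        }
        return entry.value;
    }

    /** Number of times a value had to be (re)loaded; useful for testing. */
    synchronized int loads() {
        return loads;
    }
}
```

Keying on the schema location and comparing a stamp keeps a sidekick parse
cheap when nothing changed, while still picking up a schema that was edited
while the buffer was open.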

----------------------------------------------------------------------

Comment By: Greg Knittl (gknittl)
Date: 2009-06-24 23:11

Message:
Hi Eric,

Yes, it would be good to parse the schema directly. Perhaps we could
repurpose some tool that creates a sample instance document from a schema.
The more I review W3C schemas the more I am reminded that they are so
complicated that my instinct would be to reuse specialized software of some
kind, if nothing else it might be better to use the root element trick to
generate the Xerces psvi rather than trying to code our own.

Seems to me that bringing the full tree to bear on completion requires
building the tree each keystroke since currently completion has no idea
what the user has typed since the last time the tree was parsed. And since
jEdit allows users to type anywhere they could have changed the overall
tree structure especially by adding a new element between parses. The jEdit
syntax highlighting is per key stroke and it's a pretty good approximation
of basic XML syntax, but it doesn't give the tree structure. Maybe for XML
we could replace the existing jEdit syntax highlighter with a simple-minded
XML parser that doesn't do validation but does basic syntax plus tree
structure.  This would be per keystroke. Then parsing the buffer would mean
doing full scale parsing and  validation and that would be at intervals
like the tree parse is today. Perhaps multi-core CPUs do have the
horsepower for keystroke parsing but I also want to run jEdit on my EEE
704! 
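
A "simple-minded" tracker of the kind described above might look like the
sketch below. It is an illustration only: LenientPath is a hypothetical
name, the regular expression ignores comments, CDATA sections and attribute
values containing >, and nothing here reflects how TagParser works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not part of jEdit or the XML plugin.
final class LenientPath {

    // group 1: "/" for an end tag, group 2: tag name,
    // group 3: "/" for an empty-element tag.
    private static final Pattern TAG =
        Pattern.compile("<(/?)([A-Za-z_][\\w.:-]*)[^<>]*?(/?)>");

    /** Returns the names of the elements still open at the end of the text. */
    static List<String> openElements(String textBeforeCaret) {
        List<String> stack = new ArrayList<>();
        Matcher m = TAG.matcher(textBeforeCaret);
        while (m.find()) {
            String name = m.group(2);
            if (!m.group(1).isEmpty()) {
                // End tag: close the nearest matching element and anything
                // left open inside it; ignore unmatched end tags.
                int i = stack.lastIndexOf(name);
                if (i >= 0) {
                    stack.subList(i, stack.size()).clear();
                }
            } else if (m.group(3).isEmpty()) {
                stack.add(name); // start tag that is not self-closing
            }
        }
        return stack;
    }
}
```

Because it never rejects input, a scan like this still yields an element
path when the buffer is malformed, at the cost of being wrong in the corner
cases listed above.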

XMLParser should currently be able to get a reasonable approximation of
the namespace of the current enclosing element by parsing backwards in the
code the way it does now to get the enclosing tag. If it's broken then
perhaps TagParser needs to support namespace prefixes. CompletionInfo could
also lack support for namespace. Exactly what information would be useful
in CompletionInfo is another interesting design discussion. It goes back a
long way and may have been a way to standardize completion info from dtds
and schemas in the early days of schema.

Schema being added dynamically was related to comments I made about
potentially caching and reusing CompletionInfo to improve performance. It's
more complicated than I thought. I should really keep it separate from this
bug.

----------------------------------------------------------------------

Comment By: Eric Le Lay (kerik-sf)
Date: 2009-06-24 22:02

Message:
The more I think, the more I like the idea of getting away from Xerces to
get the schema information.
Indeed, it has to be done in the case of Relax NG, as there is no type
information from the parser (ie no PSVI) . So maybe we could parse and
construct the completion info by ourselves as you suggested in the
beginning.

Re-parsing a root-only document is a nice trick though, and could be a
short term solution.

Something I would really like is completion taking into account (and that
would require a means to construct a sidekick tree even if the buffer
contains malformed XML) : 
  - the location in the tree, for Relax NG schemas which allow c in /a/b 
but not when  b is at the root
  - the current namespace, because now if 2 elements have the same name
but different namespaces, only the first one is used.
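
The namespace problem in the second point can be illustrated with a small
sketch. The class and method names are hypothetical and the maps merely
stand in for whatever CompletionInfo stores; the point is only the choice
of key.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.namespace.QName;

// Hypothetical illustration; not the plugin's CompletionInfo.
final class ElementIndexDemo {

    /** Keyed by local name: a second element with the same name is dropped. */
    static Map<String, QName> byLocalName(QName... elements) {
        Map<String, QName> index = new LinkedHashMap<>();
        for (QName element : elements) {
            index.putIfAbsent(element.getLocalPart(), element);
        }
        return index;
    }

    /** Keyed by namespace plus local name: both elements are kept. */
    static Map<QName, QName> byQName(QName... elements) {
        Map<QName, QName> index = new LinkedHashMap<>();
        for (QName element : elements) {
            index.putIfAbsent(element, element);
        }
        return index;
    }
}
```

QName compares both the namespace URI and the local part, so two elements
named title in different namespaces no longer collide.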

Sorry, but I don't understand what you say about a schema being
dynamically added to a document.

----------------------------------------------------------------------

Comment By: Greg Knittl (gknittl)
Date: 2009-06-24 08:39

Message:
Parsing a well formed root element with no contents regenerates full
CompletionInfo after parsing a mal-formed buffer has nulled out
CompletionInfo.  This implies a possible solution by parsing the full
document first and doing the additional root-only parse to generate
CompletionInfo if the full parse doesn't get to the end. 
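
One way to obtain such a root-only document is sketched below, assuming the
schema is declared on the root element and the root start tag itself is
well-formed. RootOnly is a hypothetical helper using the JDK's StAX reader,
not something the plugin contains.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper; not part of the XML plugin.
final class RootOnly {

    /** Returns the root start tag as an empty element, or null if unreadable. */
    static String rootOnly(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
            XMLStreamReader r =
                factory.createXMLStreamReader(new StringReader(xml));
            // Pull events only until the first start tag; later errors in
            // the buffer are never reached.
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    return emptyElement(r);
                }
            }
        } catch (XMLStreamException e) {
            // The prolog or the root start tag itself is malformed.
        }
        return null;
    }

    private static String emptyElement(XMLStreamReader r) {
        StringBuilder sb = new StringBuilder("<");
        sb.append(qualified(r.getPrefix(), r.getLocalName()));
        for (int i = 0; i < r.getNamespaceCount(); i++) {
            String p = r.getNamespacePrefix(i);
            sb.append(' ')
              .append(p == null || p.isEmpty() ? "xmlns" : "xmlns:" + p)
              .append("=\"").append(escape(r.getNamespaceURI(i))).append('"');
        }
        for (int i = 0; i < r.getAttributeCount(); i++) {
            sb.append(' ')
              .append(qualified(r.getAttributePrefix(i),
                                r.getAttributeLocalName(i)))
              .append("=\"").append(escape(r.getAttributeValue(i))).append('"');
        }
        return sb.append("/>").toString();
    }

    private static String qualified(String prefix, String local) {
        return prefix == null || prefix.isEmpty() ? local : prefix + ":" + local;
    }

    private static String escape(String value) {
        if (value == null) {
            return "";
        }
        return value.replace("&", "&amp;").replace("<", "&lt;")
                    .replace("\"", "&quot;");
    }
}
```

The resulting string keeps the root's namespace declarations and
attributes, including any schema location hints, so it could be fed to a
second, validating parse when the full parse does not reach the end.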

I would have to research further to check for possible complications. For
example, whether it's possible to declare schemas on non-root elements.

I suppose the plugin should handle the case where a schema is dynamically
added to a document or removed from a document. It also has to handle
dynamic changes to the underlying schema. Trying to optimize this seems
more complicated than I thought. So I will keep optimizing separate from
the original issue and pursue it elsewhere if needed.

----------------------------------------------------------------------

Comment By: Greg Knittl (gknittl)
Date: 2009-06-22 20:21

Message:
Hi Eric,

I suspect that CompletionInfo loses all the finer points of XML Schemas.
In the end XMLParser often uses CompletionInfo where the buffer is
malformed XML.
There is no schema info currently guaranteed available to XMLParser since
jEdit doesn't have an incremental parser. 
Even with an incremental parser/validator, suggesting completion info is
above and beyond validation.

The ideal ideal solution would be an incremental
parser/validator/completer. In the absence of an incremental
parser/validator I'm thinking the more practical ideal solution would be to
parse the schema(s) directly into completion info. This would only be
required once when that schema is loaded and could be reusable across
buffers that use that schema. There are complicating factors like buffers
governed by multiple schemas, but CompletionInfo may be a simplifying
bottleneck. It might be as simple as running an XSLT transform against the
schema(s) to boil it(them) down to CompletionInfo granularity.

Will investigate further as time permits.


----------------------------------------------------------------------

Comment By: Eric Le Lay (kerik-sf)
Date: 2009-06-22 19:33

Message:
Since we extract only the most basic information from the schema (sigh), I
conceive that we could as well parse the xsd files ourselves. But even so,
things can get tricky (see those abstract elements and substitution groups
which I had never heard of, but are nonetheless present in the docbook
schemas).

----------------------------------------------------------------------

Comment By: Eric Le Lay (kerik-sf)
Date: 2009-06-22 19:30

Message:
Greg,
it's not a bug, it's a feature ;-)

We do not reinvent the wheel and so are dependent on the Xerces parser to
do the job.
Now I see your point : as you already have explained, this is a feature
request for an incremental/lax parser instead of a fully validating
parser.

About the schema:  it's declared at the root element, and thus available
in the attributes during the first call to startElement(). But the
advantage of getting it from Xerces is that it's in a nice and digestible
way : no need to build our own schema parser and so on.

About CompletionInfo (which is what you call schema) caching, this could
indeed be a nice feature to have. Remember however that parsing the buffer
produces also the sidekick tree, and I would guess it's the most time
consuming.

----------------------------------------------------------------------

Comment By: Greg Knittl (gknittl)
Date: 2009-06-22 06:24

Message:
http://xerces.apache.org/xerces2-j/faq-xs.html states
"[schema information] property is only available on the endElement method
for the validation root."

----------------------------------------------------------------------

-- 
-----------------------------------------------
jEdit Developers' List
jEdit-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jedit-devel
