'Re: Problem round-tripping with xml.dom.minidom pretty-printer'


List:       python-list
Subject:    Re: Problem round-tripping with xml.dom.minidom pretty-printer
From:       Robert Bossy <Robert.Bossy () jouy ! inra ! fr>
Date:       2008-02-29 17:50:31
Message-ID: 47C845E7.7010001 () jouy ! inra ! fr

Ben Butler-Cole wrote:
>> An additional thing to keep in mind is that toprettyxml does not output
>> XML identical to the original DOM tree: it adds newlines and tabs.
>> When the output is parsed again, these whitespace characters are inserted
>> into the DOM tree as text nodes. If you pretty-print a document, parse the
>> result and pretty-print it again, the second pass will also add newlines
>> and tabs around the newlines and tabs added by the first. Since you repeat
>> this cycle indefinitely, it is expected that more and more whitespace appears.
>>     
>
> Right. That's the behaviour I'm asking about, which I consider to be
> problematic. I would expect a module providing a parser and a
> pretty-printer (and not just for XML) to be able to round-trip
> conservatively.
>
> As far as I can see (and your comments back this up) minidom doesn't
> have this property. Unless anyone knows how to get it to behave that
> way...
>   
minidom (or any DOM parser, btw) has no way of knowing which whitespace
characters are pretty-printing artefacts and which are genuine content
from the original XML.
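
Here is a quick sketch of the effect, assuming a small hypothetical
document with no whitespace between elements. After one
toprettyxml()/parseString() cycle the indentation comes back as
whitespace-only text nodes, and the next toprettyxml() indents those as
well, so the output keeps growing. The exact formatting differs between
Python versions.

```python
from xml.dom import minidom

# Hypothetical input with no whitespace between elements.
original = "<root><item>a</item><item>b</item></root>"

# First cycle: parse, then pretty-print (adds newlines and tabs).
once = minidom.parseString(original).toprettyxml()

# Second cycle: re-parse the pretty output and pretty-print it again.
reparsed = minidom.parseString(once)
twice = reparsed.toprettyxml()

# The indentation from the first pass is now ordinary text in the DOM.
blank_text_nodes = [
    n for n in reparsed.documentElement.childNodes
    if n.nodeType == n.TEXT_NODE and not n.data.strip()
]

print(len(blank_text_nodes))   # whitespace-only text nodes under <root>
print(len(once), len(twice))   # the second pass is longer than the first
```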

You could write a function that recursively strips whitespace-only text
nodes from the element tree. Before doing so, I suggest you take a look at
section 2.10 (White Space Handling) of http://www.w3.org/TR/REC-xml/.
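
A minimal sketch of such a function, under a few assumptions of my own:
strip_blank_text_nodes() and normalize_and_pretty() are hypothetical
names, "whitespace" means XML's space/tab/CR/LF, and everything is
treated as insignificant except where xml:space="preserve" applies
(that attribute is what section 2.10 defines). Be aware that in mixed
content such as <p>a <b>b</b> <i>c</i></p> the space between </b> and
<i> is only a text node and gets removed unless xml:space="preserve" is
set. The idempotent round-trip also relies on recent minidom versions,
which print an element with a single text child on one line; older
versions put the text on its own indented line, which is then no longer
whitespace-only.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# XML's own whitespace characters (the S production in XML 1.0);
# str.strip() with no argument would also remove other Unicode spaces.
XML_WHITESPACE = " \t\r\n"


def strip_blank_text_nodes(node, preserve=False):
    """Recursively remove whitespace-only text nodes below `node`.

    Honours xml:space="preserve" / "default" (XML 1.0, section 2.10);
    all other whitespace-only text is treated as insignificant.
    """
    # Iterate over a copy: removeChild() mutates node.childNodes.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE:
            if not preserve and not child.data.strip(XML_WHITESPACE):
                node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            space = child.getAttribute("xml:space")
            if space == "preserve":
                child_preserve = True
            elif space == "default":
                child_preserve = False
            else:
                child_preserve = preserve  # inherited from the parent
            strip_blank_text_nodes(child, child_preserve)
    return node


def normalize_and_pretty(xml_text):
    """Hypothetical helper: parse, strip blank text nodes, pretty-print."""
    doc = minidom.parseString(xml_text)
    # Start from the Document so the root element's xml:space is checked.
    strip_blank_text_nodes(doc)
    return doc.toprettyxml()


# Pretty-printing the pretty-printed output should now be a no-op.
first = normalize_and_pretty("<root><item>a</item><item>b</item></root>")
second = normalize_and_pretty(first)
print(first == second)
```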

RB
