'Re: a new library for traversing odf files and a new export filter'


List:       calligra-devel
Subject:    Re: a new library for traversing odf files and a new export filter
From:       matus.uzak () gmail ! com
Date:       2013-03-25 16:54:53
Message-ID: CAMt8kKVfZp9upMg2oxjDWQ+y1dq-4x7b8WV87QdtCVQ7ythO9Q () mail ! gmail ! com

Hi,

sorry for not joining the discussion earlier, but I did not have much free
time over the last two weeks.

I think we should continue the parser type discussion in order to also
improve the state of things in libmsooxml.  What we have there is a PULL
parser, and I have identified the following problems (it would be cool if
Lassi could check those):

1. OOXML sometimes requires us to run the parser twice over one element:
the first run collects selected information that is required to convert the
content of its child elements.

2. There are situations where converting the first child of the root
element requires information from the last child of the root element.

3. Interpretation of OOXML elements differs based on the namespace, and that
happens within the scope of a single filter implementation (the set of
namespaces is not limited to WordprocessingML, DrawingML and VML, to take
the docx filter as an example).  That forces us to maintain a context in
order to interpret attribute values properly.  The child elements might
also be totally different.  It's good that the namespace is always checked,
because that avoids creating invalid ODF, but it also means that an element
in an unexpected namespace is ignored.

4. Variations of 1, 2 and 3.
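
To make points 1 and 2 concrete, here is a minimal sketch in Python (not
libmsooxml code).  The input is a made-up toy document, not real OOXML, and
the names collect_styles and convert are hypothetical.  It only shows the
shape of the workaround: pull through the data once to collect, then a
second time to convert.

```python
import io
import xml.etree.ElementTree as ET

# Toy input, NOT real OOXML: the first child of the root refers to a
# style that is only defined in the last child of the root.
DOC = b"""<doc>
  <body><p style="s1">Hello</p></body>
  <styles><style id="s1" bold="true"/></styles>
</doc>"""


def collect_styles(data):
    """Pass 1: pull events and remember only the style definitions."""
    styles = {}
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "style":
            styles[elem.get("id")] = dict(elem.attrib)
    return styles


def convert(data, styles):
    """Pass 2: pull events again, now with the collected styles at hand."""
    out = []
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "p":
            bold = styles.get(elem.get("style"), {}).get("bold") == "true"
            out.append(("bold" if bold else "plain", elem.text))
    return out


result = convert(DOC, collect_styles(DOC))
print(result)
```

The second pass is only needed because the information arrives after the
element that depends on it.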

It sounds like we need to adopt some characteristics of a SAX parser in
order to solve point 3.  And the code becomes a bit fluffy when we try to
solve 1, 2 and 4, because handling those is not a strength of a PULL parser.
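
As one possible reading of the SAX idea for point 3, here is a small Python
sketch using the standard xml.sax module.  The namespace URIs, the element
name "pos" and the two interpretations are invented for illustration.  The
handler dispatches on (namespace, local name) and records elements from an
unexpected namespace instead of just dropping them.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

# Hypothetical namespace URIs, for illustration only.
NS_TEXT = "urn:example:text"
NS_DRAW = "urn:example:drawing"

DOC = b"""<t:doc xmlns:t="urn:example:text"
                 xmlns:d="urn:example:drawing"
                 xmlns:x="urn:example:other">
  <t:p><t:pos val="3"/></t:p>
  <d:shape><d:pos val="3"/></d:shape>
  <x:pos val="3"/>
</t:doc>"""


class Handler(ContentHandler):
    def __init__(self):
        super().__init__()
        self.seen = []     # (interpretation, value) pairs
        self.skipped = []  # elements found in an unexpected namespace

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        if local != "pos":
            return
        # Attributes without a prefix are keyed as (None, localname).
        val = attrs.get((None, "val"))
        if uri == NS_TEXT:
            self.seen.append(("char-offset", val))
        elif uri == NS_DRAW:
            self.seen.append(("x-coordinate", val))
        else:
            self.skipped.append(name)


parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)  # enable startElementNS events
handler = Handler()
parser.setContentHandler(handler)
parser.parse(io.BytesIO(DOC))
print(handler.seen, handler.skipped)
```

The same local name gets a different meaning per namespace, and the skipped
list makes the ignored elements visible.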

We will also need to fight with this when doing the ODF->OOXML conversion.
As Inge wrote, the current plan is to export text and simple formatting
into DOCX.  But I'm afraid we will hit one of these problems soon.

I have also read the comments from Jos about using XSLT to do the
conversion.  Do you think it would be easier to solve points 1, 2, 3 and 4
that way?  When I imagine the code in XSLT using XPath, it could be OK, but
not that OK in terms of performance.
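
This is not XSLT, but a rough Python stand-in for the XPath style of lookup
can be sketched with ElementTree and its limited XPath subset.  The toy
document and the helper is_bold are hypothetical.  The lookup from the
first child into the last child becomes a one-liner, at the price of
holding the whole tree in memory.

```python
import xml.etree.ElementTree as ET

# Toy input, NOT real OOXML: the paragraph in the first child refers to
# a style defined in the last child of the root.
DOC = """<doc>
  <body><p style="s1">Hello</p></body>
  <styles><style id="s1" bold="true"/></styles>
</doc>"""

# The whole document is parsed into memory before any conversion starts.
root = ET.fromstring(DOC)


def is_bold(paragraph):
    """Look up the paragraph's style anywhere in the tree by path."""
    style_id = paragraph.get("style")
    style = root.find("./styles/style[@id='%s']" % style_id)
    return style is not None and style.get("bold") == "true"


converted = [(is_bold(p), p.text) for p in root.iterfind("./body/p")]
print(converted)
```

No second pass is needed here, which is the appeal; the memory use and the
repeated path lookups are where I would expect the cost to show up.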

br,

Matus Uzak

