
List:       calligra-devel
Subject:    Re: a new library for traversing odf files and a new export filter
From:       Sebastian Sauer <mail () dipe ! org>
Date:       2013-03-26 9:52:59
Message-ID: 51516FFB.3010709 () dipe ! org


On 03/26/2013 04:32 PM, Sebastian Sauer wrote:
> On 03/26/2013 02:51 PM, Lassi Nieminen wrote:
>> Hola,
>>
>> On Mon, Mar 25, 2013 at 8:12 PM, Inge Wallin <inge@lysator.liu.se> wrote:
>>
>>     On Monday, March 25, 2013 17:54:53 matus.uzak@gmail.com wrote:
>>     > Hi,
>>     >
>>     > sorry for not discussing earlier, but I did not have much free
>>     > time the last two weeks.
>>     >
>>     > I think we should continue the parser type discussion in order
>>     > to also improve the state of things in libmsooxml.  What we have
>>     > there is a PULL parser. And I identified the following problems
>>     > (would be cool if Lassi could check those):
>>     >
>>     > 1. OOXML sometimes requires us to run the parser twice at one
>>     > element in order to first collect selected information required
>>     > to convert the content of child elements.
>>     >
>>     > 2. There are situations when conversion of the 1st child of the
>>     > root element requires information from the last child of the
>>     > root element.
>>
>>     It would be interesting to see some examples of these two issues.
>>
>>
>> As an example: in pptx files, in slides, there can be text which is
>> specified to use theme color lt1.
>>
>> Don't remember the exact syntax, but something like
>> <p>
>> <rPr "color" = "lt1"/>
>> <r>Hejsan</r>
>> </p>
>>
>> Then as the last element of that slide there may or may not be
>> <clrMap "lt1" = "bg1" ...../> // or something similar
>>
>> Which means that lt1 should be interpreted to be bg1 for this
>> particular slide.
>> Currently what we're doing is that we first read the slide once,
>> skipping everything except clrMap. Then we read the slide again (yay!)
>> and start the real conversion.
>>
>> There was something similar in xlsx filters too, if my memory serves
>> me correctly.
>>
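A minimal sketch of the two-pass reading described above, in Python
rather than the C++ the filters are written in. The element and
attribute names are simplified placeholders taken from the pseudo-XML,
not real PresentationML syntax, and read_color_map / convert_text are
hypothetical helpers, not libmsooxml functions:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified slide; real pptx markup looks different.
SLIDE = b"""<slide>
  <p><rPr color="lt1"/><r>Hejsan</r></p>
  <clrMap lt1="bg1"/>
</slide>"""


def read_color_map(data):
    """Pass 1: stream through the slide, keeping only clrMap attributes."""
    mapping = {}
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "clrMap":
            mapping.update(elem.attrib)
    return mapping


def convert_text(data, color_map):
    """Pass 2: stream through again and resolve colours via the map."""
    runs = []
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag != "p":
            continue
        props = elem.find("rPr")
        color = props.get("color") if props is not None else None
        # Fall back to the unmapped name when the slide has no override.
        resolved = color_map.get(color, color)
        for run in elem.findall("r"):
            runs.append((run.text, resolved))
    return runs


color_map = read_color_map(SLIDE)
runs = convert_text(SLIDE, color_map)
```

The cost is visible here: the slide is parsed twice just because the
mapping may sit at the very end.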
>
> See also somewhat related XmlWriteBuffer in 
> filters/libmsooxml/MsooXmlUtils.h which is used "when information that 
> has to be written in advance is based on XML elements parsed later.  
> In such case the information cannot be saved in one pass" for OOXML=>ODF.
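
The idea behind such a buffer can be sketched like this. It is only an
illustration in Python, not the actual XmlWriteBuffer API; the event
tuples, the convert function and the output element names are all
assumptions made up for the example:

```python
import io


def convert(events):
    """Write output whose header depends on things seen later in the input.

    `events` is an iterable of (kind, value) pairs from a hypothetical
    parser: "text" carries body text, "style" names a style that must be
    declared before the body in the output.
    """
    body = io.StringIO()  # body output is held back instead of written
    style_names = []
    for kind, value in events:
        if kind == "text":
            body.write(f"<p>{value}</p>")
        elif kind == "style":
            style_names.append(value)

    # Only now is everything known that has to come first in the output.
    out = io.StringIO()
    out.write("<styles>")
    for name in style_names:
        out.write(f'<style name="{name}"/>')
    out.write("</styles>")
    out.write(body.getvalue())  # flush the deferred body after the header
    return out.getvalue()


result = convert([("text", "Hejsan"), ("style", "bold")])
```

The input is read once; what gets delayed is the writing.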
>
> In the case of XSLT I also remember that there was a problem with 
> offset-references. That is, something like (pseudo-xml):
>
> <style>
>   <item>index 0</item>
>   <item>index 1</item>
>   <item>index 2</item>
> </style>
>
> <content>
>   <content withStyleIndex="1"/> // where 1 refers to the second 
> style item
> </content>
>
> XSLT does not, iirc, allow such index-based reference fetching, so you 
> have to for-loop with a counter over the <style> items every time 
> they are referenced. Super expensive, and iirc no caching is done (my 
> knowledge there is a few years old, so maybe that has changed). A 
> classic case where one would like to introduce a "caching concept": 
> read all the items at once, prepare them, and access them later on 
> directly by index from a style container/manager. OOXML makes quite a 
> lot of use of such index-based references, being a 1:1 port from 
> C/C++ to XML.
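
A rough sketch of that caching idea, in Python for brevity. The <doc>
wrapper and the load_styles / resolve_styles helpers are made up for
the example; the element names just follow the pseudo-xml above:

```python
import xml.etree.ElementTree as ET

# The pseudo-xml from above, wrapped in a hypothetical <doc> root.
DOC = """<doc>
  <style>
    <item>index 0</item>
    <item>index 1</item>
    <item>index 2</item>
  </style>
  <content>
    <content withStyleIndex="1"/>
  </content>
</doc>"""


def load_styles(root):
    """Read all style items once into a list, so lookup by index is direct."""
    return [item.text for item in root.find("style").findall("item")]


def resolve_styles(root, styles):
    """Resolve every index-based reference against the cached list."""
    resolved = []
    for node in root.iter("content"):
        index = node.get("withStyleIndex")
        if index is not None:
            resolved.append(styles[int(index)])
    return resolved


root = ET.fromstring(DOC)
styles = load_styles(root)
result = resolve_styles(root, styles)
```

Each reference is then a plain list access instead of another walk over
the style items.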

Also somewhat related: hard to say whether it was caused by ugly design 
decisions alone or driven by XSLT limitations (I would think both), but 
years ago, when the CleverAge OOXML=>ODF converter sponsored by Microsoft 
appeared during the OOXML ISO battle, I investigated that code (for my 
diploma thesis, which had OOXML<=>ODF as its subject). It has lots of 
intermediate steps (pre- and post-processing, multiple XSLT runs).

Code is still available at: 
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/
Readme: 
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/Readme.txt?revision=5309&view=markup
The main converter lib: 
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/Common/OdfConverterLib/
The xsl's: 
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/Common/OdfConverterLib/resources/oox2odf/

It wasn't that bad, but I can confirm what Rob Weir's blog said back then: 
the converter takes >10x longer than anything else and is a memory monster.

>
>
>
> _______________________________________________
> calligra-devel mailing list
> calligra-devel@kde.org
> https://mail.kde.org/mailman/listinfo/calligra-devel






