'Re: a new library for traversing odf files and a new export filter'

List:       calligra-devel
Subject:    Re: a new library for traversing odf files and a new export filter
From:       matus.uzak () gmail ! com
Date:       2013-03-26 11:07:06
Message-ID: CAMt8kKXyUhBWEvf18MOtb8VFpa2kYqK5hfUsGt3YFOe=f0P5dw () mail ! gmail ! com

>
> See also somewhat related XmlWriteBuffer in
> filters/libmsooxml/MsooXmlUtils.h which is used "when information that has
> to be written in advance is based on XML elements parsed later. In such
> case the information cannot be saved in one pass" for OOXML=>ODF.
>

The worst thing is relative line-height and font-size.  They must be mapped
into absolute values because each filter uses specific equations which were
reverse engineered.  Most of these equations include the max/min pair.  So
we first have to parse the paragraph content to calculate its
line-height/font-size and then update the text styles.

But I think this does not interfere with the Pull parser concept.  We
simply do some action on styles when coming out of the recursion
(backtracking).
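
A minimal sketch of that idea, in Python rather than the C++ of libmsooxml,
using the standard library's iterparse as the pull parser.  The element and
attribute names (p, r, sz) and the 1.2 line-height factor are made up for
illustration; the real filters use their own reverse-engineered equations.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input: paragraphs with runs of different font sizes (in pt).
DOC = ('<doc><p><r sz="10">small</r><r sz="18">big</r></p>'
       '<p><r sz="12">x</r></p></doc>')

def paragraph_styles(xml_text, factor=1.2):
    """Return one style dict per <p>, computed when the parser leaves the <p>."""
    styles = []
    sizes = []  # font sizes seen inside the paragraph that is currently open
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "p":
            sizes = []
        elif event == "end" and elem.tag == "r":
            sizes.append(float(elem.get("sz")))
        elif event == "end" and elem.tag == "p":
            # Coming out of the recursion: all children have been seen, so
            # the max/min pair is known and the style can be updated.
            styles.append({
                "min-font-size": min(sizes),
                "max-font-size": max(sizes),
                "line-height": round(max(sizes) * factor, 2),  # assumed formula
            })
    return styles

print(paragraph_styles(DOC))
```

The style update happens on the end event of the paragraph, so the input is
still read only once.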

On Tue, Mar 26, 2013 at 10:32 AM, Sebastian Sauer <mail@dipe.org> wrote:

>  On 03/26/2013 02:51 PM, Lassi Nieminen wrote:
>
> Hola,
>
> On Mon, Mar 25, 2013 at 8:12 PM, Inge Wallin <inge@lysator.liu.se> wrote:
>
>> On Monday, March 25, 2013 17:54:53 matus.uzak@gmail.com wrote:
>> > Hi,
>> >
>> > sorry for not joining the discussion earlier, but I have not had much
>> > free time for the last two weeks.
>> >
>> > I think we should continue the parser type discussion in order to also
>> > improve the state of things in libmsooxml.  What we have there is a PULL
>> > parser. And I identified the following problems (it would be cool if
>> > Lassi could check those):
>> >
>> > 1. OOXML sometimes requires us to run the parser twice at one element in
>> > order to first collect selected information required to convert the
>> > content of child elements.
>> >
>> > 2. There are situations when conversion of the 1st child of the root
>> > element requires information from the last child of the root element.
>>
>>  It would be interesting to see some examples of these two issues.
>
>
>  As an example: in pptx files, a slide can contain text that is specified
> to use the theme color lt1.
>
>  I don't remember the exact syntax, but something like
> <p>
> <rPr color="lt1"/>
> <r>Hejsan</r>
> </p>
>
>  Then as the last element of that slide there may or may not be
> <clrMap lt1="bg1" ...../> // or something similar
>
>  Which means that lt1 should be interpreted to be bg1 for this particular
> slide.
> Currently we first read the slide once, skipping everything except
> clrMap. Then we read the slide again (yay!) and start the real
> conversion.
>
>  There was something similar in xlsx filters too if my memory serves me
> correctly.
>
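
The two-pass reading described above could be sketched like this.  It is a
Python illustration only, not the libmsooxml code; the slide markup follows
the pseudo-syntax from the example and is not real PresentationML, and the
function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical slide, loosely following the pseudo-syntax above.
SLIDE = ('<slide><p><rPr color="lt1"/><r>Hejsan</r></p>'
         '<clrMap lt1="bg1"/></slide>')

def collect_color_map(xml_text):
    """Pass 1: look only at clrMap elements and skip everything else."""
    mapping = {}
    parser = ET.XMLPullParser(events=("start",))
    parser.feed(xml_text)
    parser.close()
    for _event, elem in parser.read_events():
        if elem.tag == "clrMap":
            mapping.update(elem.attrib)
    return mapping

def convert(xml_text, mapping):
    """Pass 2: the real conversion, with the color map already known."""
    out = []
    color = None
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(xml_text)
    parser.close()
    for event, elem in parser.read_events():
        if event == "start" and elem.tag == "rPr":
            name = elem.get("color")
            color = mapping.get(name, name)  # no override: keep theme name
        elif event == "end" and elem.tag == "r":
            out.append((elem.text, color))
    return out

print(convert(SLIDE, collect_color_map(SLIDE)))
```

The cost is that every slide is parsed twice, even when it has no clrMap.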
>
> See also somewhat related XmlWriteBuffer in
> filters/libmsooxml/MsooXmlUtils.h which is used "when information that has
> to be written in advance is based on XML elements parsed later.  In such
> case the information cannot be saved in one pass" for OOXML=>ODF.
>
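
The general idea of buffering output until later information is known can be
sketched as follows.  This is a Python analogy only and does not show the
XmlWriteBuffer API; the element and attribute names are invented, and no XML
escaping is done.

```python
import io

def write_paragraph(out, runs):
    """Write a <p> whose max-size attribute depends on its children."""
    body = io.StringIO()  # temporary buffer for the child elements
    max_size = 0
    for text, size in runs:
        body.write(f'<r size="{size}">{text}</r>')
        max_size = max(max_size, size)
    # Only now is the value for the opening tag known, so the start
    # element is written first and the buffered children follow it.
    out.write(f'<p max-size="{max_size}">')
    out.write(body.getvalue())
    out.write("</p>")

out = io.StringIO()
write_paragraph(out, [("a", 10), ("b", 18)])
print(out.getvalue())
```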
> In the case of XSLT I also remember that there was a problem with
> offset references, that is, something like (pseudo-xml):
>
> <style>
>   <item>index 0</item>
>   <item>index 1</item>
>   <item>index 2</item>
> </style>
>
> <content>
>   <content withStyleIndex="1"/> // where 1 refers to the second style item
> </content>
>
> IIRC, XSLT does not allow such index-based reference fetching, which makes
> it necessary to loop with a counter over the <style> items every time they
> are referenced. That is super expensive, and IIRC no caching is done (my
> knowledge there is a few years old, so maybe that has changed). It is a
> classic case where one would like to introduce a "caching concept": read
> all the items at once, prepare them, and access them later directly by
> index from a style container/manager. OOXML makes quite a lot of use of
> such index-based references, being a 1:1 port from C/C++ to XML.
>
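
The "caching concept" could look roughly like this.  It is a sketch in
Python over the pseudo-xml above (with the tags closed properly and a
hypothetical <doc> root added), not code from any existing filter.

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
  <style><item>index 0</item><item>index 1</item><item>index 2</item></style>
  <content><content withStyleIndex="1"/></content>
</doc>"""

def resolve_styles(xml_text):
    """Return the style item referenced by each withStyleIndex attribute."""
    root = ET.fromstring(xml_text)
    # Read and prepare all style items once; a list gives direct access
    # by index, so no loop with a counter is needed per reference.
    styles = [item.text for item in root.find("style")]
    resolved = []
    for node in root.iter("content"):
        index = node.get("withStyleIndex")
        if index is not None:
            resolved.append(styles[int(index)])
    return resolved

print(resolve_styles(DOC))
```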
>
> _______________________________________________
> calligra-devel mailing list
> calligra-devel@kde.org
> https://mail.kde.org/mailman/listinfo/calligra-devel
>
>

