List: koffice-devel
Subject: how to extend the new powerpoint parser
From: Jos van den Oever <jos.van.den.oever () kogmbh ! com>
Date: 2010-01-24 23:15:25
Message-ID: 201001250015.25759.jos.van.den.oever () kogmbh ! com
Hi all,
The conversion to the new parser is mostly done. The main remaining task is
fixing any regressions that turn up in the conversion.
The generated code provides a simple value-based API. Here is an excerpt:
class ExHyperlinkContainer : public StreamOffset {
public:
OfficeArtRecordHeader rh;
ExHyperlinkAtom exHyperlinkAtom;
QSharedPointer<FriendlyNameAtom> friendlyNameAtom;
QSharedPointer<TargetAtom> targetAtom;
QSharedPointer<LocationAtom> locationAtom;
ExHyperlinkContainer(void* /*dummy*/ = 0) {}
};
ExHyperlinkContainer has two obligatory members (rh and exHyperlinkAtom) and
three optional members (friendlyNameAtom, targetAtom and locationAtom).
To find out what these members mean, look in the documentation [1].
This was generated from this XML:
<struct name="ExHyperlinkContainer">
<type name="rh" type="OfficeArtRecordHeader">
<limitation name="recVer" value="0xF" />
<limitation name="recInstance" value="0" />
<limitation name="recType" value="0xFD7" />
</type>
<type name="exHyperlinkAtom" type="ExHyperlinkAtom" />
<type name="friendlyNameAtom" type="FriendlyNameAtom" optional="true" />
<type name="targetAtom" type="TargetAtom" optional="true" />
<type name="locationAtom" type="LocationAtom" optional="true" />
</struct>
You can see here again which members are optional. You also see instructions
on how to parse the structure. The member 'rh' has limitations on the values that
its members may have. These limitations are taken from the code.
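To make the two ideas above concrete, here is a minimal, self-contained sketch of how calling code might check the limitations on 'rh' and read an optional member. It is not the generated code: the structs are simplified stand-ins, std::shared_ptr replaces QSharedPointer so that it compiles without Qt, and the field names inside the atoms (friendlyName, exHyperlinkId, recLen) as well as both helper functions are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Simplified stand-ins for the generated types (assumed field layout).
struct OfficeArtRecordHeader {
    std::uint8_t  recVer;
    std::uint16_t recInstance;
    std::uint16_t recType;
    std::uint32_t recLen;
};
struct ExHyperlinkAtom  { std::uint32_t exHyperlinkId; };
struct FriendlyNameAtom { std::string friendlyName; };
struct TargetAtom       { std::string target; };
struct LocationAtom     { std::string location; };

struct ExHyperlinkContainer {
    OfficeArtRecordHeader rh;                            // obligatory
    ExHyperlinkAtom exHyperlinkAtom;                     // obligatory
    std::shared_ptr<FriendlyNameAtom> friendlyNameAtom;  // optional, may be null
    std::shared_ptr<TargetAtom> targetAtom;              // optional, may be null
    std::shared_ptr<LocationAtom> locationAtom;          // optional, may be null
};

// Hypothetical helper: the three <limitation> entries from the XML above,
// written out as a plain comparison.
bool headerMatchesExHyperlinkContainer(const OfficeArtRecordHeader& rh) {
    return rh.recVer == 0xF && rh.recInstance == 0 && rh.recType == 0xFD7;
}

// Hypothetical helper: optional members are pointers, so test them
// before dereferencing.
std::string friendlyNameOrEmpty(const ExHyperlinkContainer& c) {
    return c.friendlyNameAtom ? c.friendlyNameAtom->friendlyName
                              : std::string();
}

// Small usage example.
void exampleHyperlinkUsage() {
    ExHyperlinkContainer c{};
    c.rh = OfficeArtRecordHeader{0xF, 0, 0xFD7, 0};
    assert(headerMatchesExHyperlinkContainer(c.rh));
    assert(friendlyNameOrEmpty(c).empty());  // optional member is absent

    c.friendlyNameAtom = std::make_shared<FriendlyNameAtom>();
    c.friendlyNameAtom->friendlyName = "example";
    assert(friendlyNameOrEmpty(c) == "example");
}
```

The point of the sketch is only that obligatory members are plain values, while optional members have to be checked for null before use.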
If you need to get at a structure which has not yet been added to the parser,
you can do so yourself by describing the structure in mso.xml.
Check out msoscheme and build the generator:
# check out the code
git clone git://gitorious.org/msoscheme/msoscheme.git
# build and test (you need a java compiler and Apache Ant)
cd msoscheme && ant
# adapt the code in src/mso.xml and regenerate the parsers
ant generateParsers
# look at the new parsers in cpp/simpleParser.h and cpp/simpleParser.cpp
You can check the new mso.xml with the C++ code provided in the project:
mkdir build && cd build
cmake ../cpp
# build the tools (assuming the default Makefile generator)
make
# convert your file to xml
ppttoxml $yourfile
# print the structure of the file
pptstructureprinter $yourfile
If you find a file that is not parsed properly, you get output from the
koconverter like this:
95515 bytes left at the end of PowerPointStructs, so probably an error at position 7082
If you see this you should find out what structure is defined at position 7082,
which is the likely cause of the problem.
Use pptstructureprinter for this. It gives output like this:
...
1 15 0 7d0 1012 7054 DocInfoListContainer
2 15 1 3ff 20 7062 VBAInfoContainer
3 2 0 400 12 7070 VBAInfoAtom
2 15 0 3fa 103 7090 SlideViewInfoInstance
3 0 0 3fe 3 7098 SlideViewInfoAtom
...
The columns here are
1) nesting level
2) recVer
3) recInstance
4) recType
5) size
6) position
7) name of the structure
You see that the position (7082) reported as the likely cause of the error lies
within a VBAInfoAtom.
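If the listing is long, the lookup can be automated. The sketch below is not part of msoscheme. It parses lines in the format shown above and returns the most deeply nested record that contains a given position. It rests on assumptions inferred from the excerpt: recType is printed in hexadecimal and the other numbers in decimal, 'position' is the start of the record header, and each record occupies an 8-byte header followed by 'size' bytes. All names in it (RecordLine, parseStructure, deepestRecordAt) are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One line of pptstructureprinter output (column meaning as listed above).
struct RecordLine {
    int level;
    unsigned recVer;
    unsigned recInstance;
    unsigned recType;
    unsigned long size;
    unsigned long position;
    std::string name;
};

// Assumption: every record starts with an 8-byte header.
const unsigned long kHeaderSize = 8;

// Parse the listing; lines that do not match the format (e.g. "...")
// are skipped.
std::vector<RecordLine> parseStructure(const std::string& text) {
    std::vector<RecordLine> records;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream in(line);
        RecordLine r;
        in >> std::dec >> r.level >> r.recVer >> r.recInstance
           >> std::hex >> r.recType
           >> std::dec >> r.size >> r.position >> r.name;
        if (in) records.push_back(r);
    }
    return records;
}

// Return the most deeply nested record whose byte range contains offset,
// or nullptr if no record contains it.
const RecordLine* deepestRecordAt(const std::vector<RecordLine>& records,
                                  unsigned long offset) {
    const RecordLine* best = nullptr;
    for (const RecordLine& r : records) {
        const bool inside = offset >= r.position &&
                            offset < r.position + kHeaderSize + r.size;
        if (inside && (!best || r.level > best->level)) best = &r;
    }
    return best;
}

// Usage example with the excerpt from this mail.
void exampleStructureLookup() {
    const std::string listing =
        "...\n"
        "1 15 0 7d0 1012 7054 DocInfoListContainer\n"
        "2 15 1 3ff 20 7062 VBAInfoContainer\n"
        "3 2 0 400 12 7070 VBAInfoAtom\n"
        "2 15 0 3fa 103 7090 SlideViewInfoInstance\n"
        "3 0 0 3fe 3 7098 SlideViewInfoAtom\n"
        "...\n";
    const std::vector<RecordLine> records = parseStructure(listing);
    const RecordLine* hit = deepestRecordAt(records, 7082);
    assert(hit && hit->name == "VBAInfoAtom");
}
```

With the excerpt above, position 7082 falls in the range 7070 to 7089 of the VBAInfoAtom, which agrees with reading the listing by hand.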
The next step to fix this problem is to look up this structure in mso.xml and
compare it with the documentation. In this case the definition in mso.xml
matches that in the documentation. So either the description is incomplete or
the ppt file was not created in PowerPoint but e.g. in OpenOffice.
To see what is going on, put a breakpoint in the function parseVBAInfoAtom and
see where the parsing error occurs.
The most common error is a discrepancy between the documentation and mso.xml.
If the observed file does not match the documentation, add a remark in
mso.xml at the place where you add the exception.
Good luck,
Jos
[1] [MS-PPT].pdf and [MS-ODRAW].pdf
--
Jos van den Oever, software architect
+49 391 25 19 15 53
http://kogmbh.com/legal/