
List:       koffice-devel
Subject:    how to extend the new powerpoint parser
From:       Jos van den Oever <jos.van.den.oever () kogmbh ! com>
Date:       2010-01-24 23:15:25
Message-ID: 201001250015.25759.jos.van.den.oever () kogmbh ! com

Hi all,

The conversion to the new parser is mostly done. The main remaining task is 
fixing any regressions that turn up in the conversion.

The generated code provides a simple value-based API. Here is an excerpt:

class ExHyperlinkContainer : public StreamOffset {
public:
    OfficeArtRecordHeader rh;
    ExHyperlinkAtom exHyperlinkAtom;
    QSharedPointer<FriendlyNameAtom> friendlyNameAtom;
    QSharedPointer<TargetAtom> targetAtom;
    QSharedPointer<LocationAtom> locationAtom;
    ExHyperlinkContainer(void* /*dummy*/ = 0) {}
};

ExHyperlinkContainer has two obligatory members (rh and exHyperlinkAtom) and 
three optional members (friendlyNameAtom, targetAtom, locationAtom).
To know what these members are, look in the documentation [1].
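
As a rough illustration of how calling code might deal with the optional 
members, here is a small sketch. It does not use the generated classes: the 
structs below are cut-down stand-ins, std::shared_ptr takes the place of 
QSharedPointer so that the example compiles without Qt, and the 'text' field 
and the function describeHyperlink are invented for the example. Only the 
member names come from the excerpt above.

```cpp
#include <memory>
#include <string>

// Stand-ins for the generated types. The real ones are in simpleParser.h
// and look different; the 'text' field is hypothetical.
struct FriendlyNameAtom { std::string text; };
struct TargetAtom { std::string text; };
struct LocationAtom { std::string text; };

struct ExHyperlinkContainer {
    // obligatory members (rh, exHyperlinkAtom) left out for brevity
    std::shared_ptr<FriendlyNameAtom> friendlyNameAtom;  // optional
    std::shared_ptr<TargetAtom> targetAtom;              // optional
    std::shared_ptr<LocationAtom> locationAtom;          // optional
};

// Pick a label for the hyperlink: friendly name if present, else target,
// else location, else an empty string. An optional member that was not
// in the file is a null pointer, so it must be checked before use.
std::string describeHyperlink(const ExHyperlinkContainer& c) {
    if (c.friendlyNameAtom) return c.friendlyNameAtom->text;
    if (c.targetAtom) return c.targetAtom->text;
    if (c.locationAtom) return c.locationAtom->text;
    return std::string();
}
```

The point is only that an optional member has to be tested for null before it 
is dereferenced; the obligatory members are plain values and need no check.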

This was generated from this XML:
	<struct name="ExHyperlinkContainer">
		<type name="rh" type="OfficeArtRecordHeader">
			<limitation name="recVer" value="0xF" />
			<limitation name="recInstance" value="0" />
			<limitation name="recType" value="0xFD7" />
		</type>
		<type name="exHyperlinkAtom" type="ExHyperlinkAtom" />
		<type name="friendlyNameAtom" type="FriendlyNameAtom" optional="true" />
		<type name="targetAtom" type="TargetAtom" optional="true" />
		<type name="locationAtom" type="LocationAtom" optional="true" />
	</struct>

You can see here again which members are optional. You also see instructions 
on how to parse the code. The member 'rh' has limitations on the values that 
its members may have. These limitations are taken from the code.
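
To make the effect of the limitations concrete, here is a sketch of the kind 
of check they amount to. This is not the generated code. The struct 
RecordHeader and both functions are made up for the example, and the byte 
layout (recVer in the low 4 bits and recInstance in the high 12 bits of the 
first little-endian 16-bit word, then a 16-bit recType and a 32-bit recLen) is 
my reading of the record header description in [1].

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for OfficeArtRecordHeader.
struct RecordHeader {
    std::uint8_t recVer;        // 4 bits
    std::uint16_t recInstance;  // 12 bits
    std::uint16_t recType;      // 16 bits
    std::uint32_t recLen;       // 32 bits, length of the data after the header
};

// Decode an 8-byte little-endian record header.
RecordHeader readHeader(const std::array<std::uint8_t, 8>& b) {
    const std::uint16_t verInst =
        static_cast<std::uint16_t>(b[0] | (b[1] << 8));
    RecordHeader rh;
    rh.recVer = static_cast<std::uint8_t>(verInst & 0xF);
    rh.recInstance = static_cast<std::uint16_t>(verInst >> 4);
    rh.recType = static_cast<std::uint16_t>(b[2] | (b[3] << 8));
    rh.recLen = static_cast<std::uint32_t>(b[4])
              | (static_cast<std::uint32_t>(b[5]) << 8)
              | (static_cast<std::uint32_t>(b[6]) << 16)
              | (static_cast<std::uint32_t>(b[7]) << 24);
    return rh;
}

// The three <limitation> elements of ExHyperlinkContainer as one test.
bool isExHyperlinkContainerHeader(const RecordHeader& rh) {
    return rh.recVer == 0xF && rh.recInstance == 0 && rh.recType == 0xFD7;
}
```

A header that fails such a test means the bytes at that position are not an 
ExHyperlinkContainer, which is how a parser can decide whether an optional 
member is present.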

If you need to get at a structure which has not yet been added to the parser, 
you can do so yourself by describing the structure in mso.xml.

Check out msoscheme and build the generator:
   # check out the code
   git clone git://gitorious.org/msoscheme/msoscheme.git
   # build and test (you need a java compiler and Apache Ant)
   cd msoscheme && ant
   # adapt the code in src/mso.xml and regenerate the parsers
   ant generateParsers
   # look at the new parsers in cpp/simpleParser.h and cpp/simpleParser.cpp

You can check the new mso.xml with the C++ code provided in the project:
   mkdir build && cd build
   cmake ../cpp
   # build the tools (assuming the default Makefile generator)
   make
   # convert your file to xml
   ppttoxml $yourfile
   # print the structure of the file
   pptstructureprinter $yourfile

If you find a file that is not parsed properly, you get output from the 
koconverter like this:
   95515 bytes left at the end of PowerPointStructs, so probably an error at
        position  7082
If you see this you should find out what structure is defined at position 7082, 
which is the likely cause of the problem.
Use pptstructureprinter for this. It gives output like this:
...
1       15      0       7d0     1012    7054    DocInfoListContainer
2       15      1       3ff     20      7062    VBAInfoContainer
3       2       0       400     12      7070    VBAInfoAtom
2       15      0       3fa     103     7090    SlideViewInfoInstance
3       0       0       3fe     3       7098    SlideViewInfoAtom
...
The columns here are
 1) nesting level
 2) recVer
 3) recInstance
 4) recType
 5) size
 6) position
 7) name of the structure
You see that the position (7082) reported as the likely cause of the error 
lies inside a VBAInfoAtom.
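
For long listings, the lookup can be done mechanically. The sketch below is 
not part of msoscheme; the struct Row and the function deepestRecordAt are 
invented for the example. It assumes that every record starts with an 8-byte 
header and that the size column counts only the bytes after that header. The 
rows shown above are consistent with this (7070 + 8 + 12 = 7090, where the 
next record starts), but treat it as an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One line of pptstructureprinter output, reduced to the columns needed.
struct Row {
    int level;         // column 1, nesting level
    long size;         // column 5, assumed to exclude the header
    long position;     // column 6
    std::string name;  // column 7
};

// Return the index of the most deeply nested row whose byte range
// contains 'offset', or -1 if no row contains it.
int deepestRecordAt(const std::vector<Row>& rows, long offset) {
    const long headerSize = 8;  // assumed record header size
    int best = -1;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        const Row& r = rows[i];
        const long end = r.position + headerSize + r.size;  // one past the end
        const bool inside = offset >= r.position && offset < end;
        if (inside && (best < 0 ||
                       r.level >= rows[static_cast<std::size_t>(best)].level)) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Fed with the five rows above and the offset 7082, this returns the index of 
the VBAInfoAtom row: the enclosing DocInfoListContainer and VBAInfoContainer 
also contain the offset, but the atom is nested deepest.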

The next step to fix this problem is to look up this structure in mso.xml and 
compare it with the documentation. In this case the definition in mso.xml 
matches that in the documentation. So either the description is incomplete or 
the ppt file was not created in PowerPoint but e.g. in OpenOffice.

To see what is going on, put a breakpoint in the function parseVBAInfoAtom and 
see where the parsing error occurs.

The most common error is a discrepancy between the documentation and mso.xml. 
If the observed file does not match the documentation then add a remark in the 
mso.xml at the place where you add the exception.

Good luck,
Jos

[1] [MS-PPT].pdf and [MS-ODRAW].pdf

-- 
Jos van den Oever, software architect
+49 391 25 19 15 53
http://kogmbh.com/legal/
_______________________________________________
koffice-devel mailing list
koffice-devel@kde.org
https://mail.kde.org/mailman/listinfo/koffice-devel