-------------------------------------------------------------------------------
-                                                                             -
-                 KWord - XML File Format Description v0.0.3                  -
-                                                                             -
- by Werner, wtrobin@carinthia.com - last changes: 19.09.99                   -
- Please be so kind to report all the errors, typos,... you'll surely find!   -
- Beware: The KWord-format is moving quite fast these days (thanks to         -
- Reggie for his great work) so some information might be outdated!           -
-------------------------------------------------------------------------------

The KWord file format is a (more or less :) human-readable XML format. This
means it consists of HTML-like tags which define the document structure and
the contents.

The main structure of each KWord document is a header and a body. The header
stores things like the paper size, the author,... I'd like to start with an
example to explain the contents of the body. You might have noticed that you
can do nearly everything with the frames in your KWord document (e.g. move
them around, interlock them, let your text "flow" through them,...). To
achieve this flexibility KWord has to store the data in a very well defined
structure including framesets, frames, paragraphs, and so on.

As you might remember, you were able to choose between templates for "DTPing"
and simple "Wordprocessing" right at the beginning (after launching the
killer-app). The only difference is that KWord offers some help in managing
the layout of the first frame in Wordprocessing mode (in most cases you'll
only have one). As a result, "DTPing" offers more flexibility and
"Wordprocessing" works almost automagically :)

Some basic notes:
-----------------
- All kinds of numbers are stored like this: foo="1" (between " and " :)
- A rational number looks like this: width="1.03" (note: the '.'
  is used as the decimal separator)
- equals
- Text is stored as Unicode (UTF-8 encoded)
- Some special characters ('<', '>') are "escaped" ('&lt;', '&gt;')
- Please launch KWord and save an empty file - it is much easier to follow
  this documentation if you wade through an (almost empty) example document :)

The tags:
---------

    Each file starts with this tag.
    Note: You must not "close" this tag (i.e. don't put a closing tag for it
    at the end of the file!)

,
    Like . It opens/closes the whole doc. Therefore each of them is only
    used once.
    Modifiers for :
        author="Your Name"          Shouldn't be a problem :)
        email="you@home..."         Shouldn't be one either
        editor="KWord"              Name of the editor which has saved the
                                    file
        mime="application/x-kword"  Mimetype of the file (i.e. which app to
                                    launch if you click on the icon
                                    representing one of your documents)

,
    Is used to define the properties of the paper. Normally this is the
    first tag in the "header".
    Modifiers for :
        format="1"                  0...DIN A3
                                    1...DIN A4
                                    2...DIN A5
                                    3...US LETTER
                                    4...US LEGAL
                                    5...SCREEN (screen sized)
                                    6...CUSTOM (just enter your preferred
                                        size)
                                    7...DIN B5
                                    8...US EXECUTIVE
        ptWidth="595"               Width of the page in pt
        ptHeight="841"              Height of the page in pt
        mmWidth="210"               Same in mm
        mmHeight="297"              Same in mm
        inchWidth="8.26772"         Same in inch
        inchHeight="11.6929"        Same in inch
        orientation="0"             0...Portrait
                                    1...Landscape
        columns="1"                 Number of columns
        ptColumnspc="3"             Spacing between columns in pt
        mmColumnspc="1.05833"       Same in mm
        inchColumnspc="0.0416667"   Same in inch
        hType="0"                   0...On all pages (even/odd) the same
                                        headers
                                    1...Different header only on first page
                                    2...Different headers for even/odd pages
        fType="0"                   See hType, header -> footer :)
        ptHeadBody="9"              Distance between header and body in pt
        ptFootBody="9"              Distance between footer and body in pt
        mmHeadBody="3.5"            Same in mm
        mmFootBody="3.5"            Same in mm
        inchHeadBody="0.137795"     Same in inch
        inchFootBody="0.137795"     Same in inch

,
    Used to specify the borders of the . Should only be used within and !
    Modifiers for :
        mmLeft="0"                  This should be quite self explanatory :)
        mmTop="0"
        mmRight="0"
        mmBottom="0"
        ptLeft="0"
        ptTop="0"
        ptRight="0"
        ptBottom="0"
        inchLeft="0"
        inchTop="0"
        inchRight="0"
        inchBottom="0"

,
    Some basic settings
    Modifiers for :
        processing="1"              0..."Normal" document (Wordprocessing)
                                    1...DTP-document (DTPing)
        standardpage="1"            There can be only "1" :)
        hasHeader="0"               Is there a header? (0/1)
        hasFooter="0"               Is there a footer? (0/1)
        unit="mm"                   Basic unit for positioning, ruler,...

,
    Information for the Footnote-Manager
    Modifiers for : none

    This tag stores the value of the first footnote (e.g. "1" means that the
    first footnote looks like this: [1])
    Modifiers for :
        value="1"                   explained above

,
    Used to store the formatting options for the footnote.
    Note: This one must not be used outside the tags!
    Modifiers for :
        superscript="1"             [???]
        type="1"                    [???]

,
    Modifiers for :
        ref="(null)"                The name of the corresponding paragraph.

<FRAMESETS>, </FRAMESETS>
    With this tag you open/close the "frame-section". All your FRAMESETs
    (notice the small s!) are placed inside (and nowhere else!)
    Modifiers for <FRAMESETS>: none

<FRAMESET>, </FRAMESET>
    This tag defines one frameset. A frameset consists of (at least) one
    FRAME and one PARAGRAPH.
    Modifiers for <FRAMESET>:
        frameType="1"               0...Base frame (for internal use only!!!)
                                    1...Text frame
                                    2...Picture frame
                                    3...Part frame (e.g. KImage-Part)
        autoCreateNewFrame="1"      Whether KWord should create a new frame
                                    if there is no space left in the old
                                    one. 0/1
        frameInfo="0"               0...Body
                                    1...First header
                                    2...Odd header
                                    3...Even header
                                    4...First footer
                                    5...Odd footer
                                    6...Even footer
        grpMgr="grpmgr_0"           The name of the group manager for this
                                    table (i.e. if this frameset "belongs"
                                    to a table, the position and the size
                                    are controlled by a group manager, one
                                    for each table)
        row="0"                     Position in the table (only for
                                    "table-frames"). Index starts at 0.
        col="1"                     Just guess :)
        removeable="0"              Whether the header-frame is removable or
                                    not (notice the typo!). 0/1 [???]

<FRAME>, </FRAME>
    Describes the position, property,... of one FRAME.
    Note: The tag is used like this: i.e. there are no other tags in
    between...
    Modifiers for <FRAME>:
        left="28"                   Those four modifiers (left, top, right,
        top="42"                    bottom) describe the size and the
        right="566"                 position of the frame (absolute to the
        bottom="798"                paper). Note: measured in pt!
        runaround="1"               0...Don't run around frame
                                    1...Run around bounding rectangle
                                    2...Run around contour
        runaGapPT="2"               Run around with gap (in pt)
        runaGapMM="1"               Same in mm
        runaGapINCH="0.0393701"     Same in inch
        lWidth="1"  lRed="255"  lGreen="255"  lBlue="255"  lStyle="0"
        rWidth="1"  rRed="255"  rGreen="255"  rBlue="255"  rStyle="0"
        tWidth="1"  tRed="255"  tGreen="255"  tBlue="255"  tStyle="0"
        bWidth="1"  bRed="255"  bGreen="255"  bBlue="255"  bStyle="0"
            Note: Description for all borders (xWidth, xRed, xGreen,
            xBlue,...)
            xWidth...Width of border (pt)
            xRed, xGreen, xBlue...RGB triplet -> color of border
                                  (e.g. 255, 255, 255 -> white)
            xStyle...Style of the border-line:
                     0...Solid
                     1...Dash
                     2...Dot
                     3...Dash-Dot
                     4...Dash-Dot-Dot
            x==l -> left border
            x==r -> right border
            x==t -> top border
            x==b -> bottom border
        bkRed="255"                 RGB-triplet of background color
        bkGreen="255"
        bkBlue="255"
        bleftpt="0"                 Distance: left border - text/picture in
                                    pt
        bleftmm="0"                 Same in mm
        bleftinch="0"               Same in inch
        brightpt="0"                Distance: right border - text/picture in
                                    pt
        brightmm="0"                Same in mm
        brightinch="0"              Same in inch
        btoppt="0"                  Distance: top border - text/picture in pt
        btopmm="0"                  Same in mm
        btopinch="0"                Same in inch
        bbottompt="0"               Distance: bottom border - text/pic. in pt
        bbottommm="0"               Same in mm
        bbottominch="0"             Same in inch

<PARAGRAPH>, </PARAGRAPH>
    All the information for each paragraph (text, color(s), format(s),...)
    is stored between these two tags. Each FRAMESET may contain as many
    PARAGRAPH tags as you want.
    Modifiers for <PARAGRAPH>: none

,
    Just guess :) Currently the text is stored as UTF-8 encoded Unicode
    characters.
    Note: the format-tags navigate in the text using an index which starts
    at 0 and runs up to length-1.
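    The 0-based indexing described in the note can be sketched as follows.
    This is only an illustration, not KWord code: slice_runs and the
    (pos, length) tuples are hypothetical names modelled on the pos="0" and
    len="5" modifiers of the format-tags, and the sketch assumes the index
    counts characters rather than UTF-8 bytes, which this description does
    not state.

```python
from typing import List, Tuple

# Hypothetical representation of a format run: (pos, length),
# modelled on the pos="0" len="5" modifiers. Not a KWord API.
Run = Tuple[int, int]


def slice_runs(text: str, runs: List[Run]) -> List[str]:
    """Return the piece of text covered by each (pos, length) run."""
    pieces = []
    for pos, length in runs:
        # Valid indices go from 0 to len(text) - 1, so a run has to
        # start at or after 0 and end no later than len(text).
        if pos < 0 or length < 0 or pos + length > len(text):
            raise ValueError(f"run ({pos}, {length}) is outside the text")
        pieces.append(text[pos:pos + length])
    return pieces


if __name__ == "__main__":
    # Two runs over an 11-character paragraph.
    print(slice_runs("Hello KWord", [(0, 5), (6, 5)]))
```

    A run whose pos plus length reaches past the end of the text is
    rejected here; how KWord itself treats such a run is not covered by
    this description.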
    Also, the length of the text is used to express format information.
    Modifiers for : none

,
    The name of the paragraph, only used if it is a footnote (see ).
    Modifiers for :
        name="Footnote/Endnote_1"   Just the name of the paragraph the
                                    footnote belongs to

,
    Some paragraph information (will be extended?)
    Modifiers for :
        info="0"                    0...No "special" information
                                    1...Footnote (see )

,
    Normally the text "flows" through all the frames. Sometimes you want to
    define a hard break to the next frame (page); i.e. this and the
    following paragraphs start in a new frame, even if there is enough space
    in the last one.
    Modifiers for :
        frame="0"                   0...let it flow
                                    1...hard break -> next frame

,
    The text is stored plain. All the formatting is configured between these
    two tags.
    Modifiers for : none

,
    These tags describe "runs of text" which share the same formatting
    properties.
    Modifiers for :
        id="1"                      0...none (mustn't be in a file)
                                    1..."normal" text
                                    2...a picture
                                    3...tabulator
                                    4...a variable
                                    5...a footnote
        pos="0"                     position in the blabla stream (of course
                                    0-based :)
        len="5"                     length of the "run of text" which is
                                    formatted using this format (Note: not
                                    stored for picture formats (id="2"))

        a typical text FORMAT
            used to store the text-color
            guess :)
            once again...
            50 - normal, 75 - bold,...
            guess :) 0/1
            0...normal 1...sub 2...super

            Note: KWord stores only a link to the real picture! (This
            should/will change as soon as we use the new storage spec and
            the new Image Container class)
            Note: KWord stores a 0-char at the position of the image!

            Note: KWord stores a 0-char at this pos.!
            Note: There is no additional info.

        a variable (e.g. page number, date,...)
            Note: KWord stores a 0-char at this pos.!
            0...date (fix)
            1...date (variable)
            2...time (fix)
            3...time (variable)
            4...page number
            location of the variable (should be easy to "decode")
            just guess :)
            this is like the id="1" part for formatting the variable's text

        a footnote.
            Note: Once again a 0-char is stored at this position.
            [???]
            [???]
            guess
            name (reference) to describe the formatting

,
    KWord supports "Styles" and stores them at the end of the file (STYLES).
    Each paragraph also stores its own copy of the style, so that you can
    change the style of a single paragraph without changing the
    document-wide style it is based on.
    Modifiers for : none

,
    Name of the style this paragraph style is based on
    Modifiers for :
        value="Standard"            The name :)

,
    Name of the style of the following paragraph
    Modifiers for :
        name="Standard"             Once again - the name

,
    The alignment of this paragraph.
    Modifiers for :
        value="0"                   0...left
                                    1...right
                                    2...center
                                    3...block

,
    Distance to the previous paragraph
    Modifiers for :
        pt="0"
        mm="0"
        inch="0"

,
    Distance to the next paragraph
    Modifiers for :
        pt="0"
        mm="0"
        inch="0"

    Indent of the first line

    Left indent

    Spacing between the lines of the p.

,
    Describes the counter-type used for this p.
    Modifiers for :
        type="0"                    0...none
                                    1...arabic numbers (e.g. "1")
                                    2...low letter (e.g. "a")
                                    3...capital letter (e.g. "A")
                                    4...low letters roman number (e.g. "ii")
                                    5...cap. letters roman number (e.g. "IX")
                                    6...bullet (e.g. "-")
        depth="0"                   0..."1", 1..."1.1" 2..."1.1.1",...
        bullet="176"                bullet character
        start="1"                   value to start with
        numberingtype="1"           0...list
                                    1...chapter (added to doc-structure)
        lefttext=""                 special text left to counter
        righttext=""                special text right to counter (e.g. ".")
        bulletfont="times"          font for bullet-char

    All the borders :)
    Modifiers for <*BORDER>:
        red, green, blue            color
        style="0"                   linestyle
                                    0...solid
                                    1...dash
                                    2...dot
                                    3...dash-dot-dash-dot-...
                                    4...dash-dot-dot-dash-dot-dot-...
        width="0"                   width (in pt)

    The format of this "style" (see )

    Defines a tabulator at the specific position
    Modifiers for :
        type="0"                    0...left
                                    1...center
                                    2...right
                                    3...decimal point

    The STYLES are a region where the different style types are defined. The