-------------------------------------------------------------------------------
- -
- KWord - XML File Format Description v0.0.3 -
- -
- by Werner, wtrobin@carinthia.com - last changes: 19.09.99 -
- Please be so kind to report all the errors, typos,... you'll surely find! -
- Beware: The KWord-format is moving quite fast these days (thanks to -
- Reggie for his great work) so some information might be outdated! -
-------------------------------------------------------------------------------
The KWord file format is a (more or less :) human-readable XML format. This
means it consists of HTML-like tags which define the document structure and
the contents. The main structure of each KWord document is a header and a body.
In the header-part things like the paper size, the author,... are stored.
I'd like to start with an example to explain the contents of the body. You
might have noticed that you can do nearly everything with the frames in your
KWord document (e.g. move them around, interlock them, let your text "flow"
through them,...). To achieve this flexibility KWord has to store the data in
a very well defined structure including framesets, frames, paragraphs, and so
on.
As you might remember, you were able to choose between templates for "DTPing"
and simple "Wordprocessing" right at the beginning (after launching the
killer-app). The only difference is that in Wordprocessing-mode KWord helps
you manage the layout of the first frame (in most cases you'll only have one).
Because of that, "DTPing" offers more flexibility and "Wordprocessing" works
almost automagically :)
Some basic notes:
-----------------
- All kinds of numbers are stored like this: foo="1" (between " and " :)
- A rational number looks like this: width="1.03" (note: the '.' is used as
the decimal separator, never ',')
- equals
- Unicode characters (UTF-8 encoded) are used to store the text
- Some special characters ('<', '>') are "escaped" ('&lt;', '&gt;')
- Please launch KWord and save an empty file - it is much easier to follow this
documentation if you wade through an (almost empty) example document :)
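The notes above can be tried out with a few lines of Python. This is only a
minimal sketch, not KWord's own code: the helper names encode_text and
parse_number are made up for illustration, and it assumes that the standard
XML escaping from Python's xml.sax.saxutils (which also escapes '&') matches
what KWord writes.

```python
from xml.sax.saxutils import escape, unescape


def encode_text(text: str) -> bytes:
    """Escape markup characters, then encode the result as UTF-8."""
    # escape() turns '&', '<' and '>' into '&amp;', '&lt;' and '&gt;'.
    return escape(text).encode("utf-8")


def decode_text(stored: bytes) -> str:
    """Reverse of encode_text: decode UTF-8, then undo the escaping."""
    return unescape(stored.decode("utf-8"))


def parse_number(value: str) -> float:
    """Numbers are stored as quoted strings with '.' as decimal separator."""
    return float(value)


stored = encode_text("a < b > c")
print(stored)                 # the escaped, UTF-8 encoded bytes
print(decode_text(stored))    # the original text again
print(parse_number("1.03"))
```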
The tags:
---------
<?xml ...?> Each file starts with this tag (the XML declaration).
Note: You must not "close" this tag
(i.e. don't put a closing ?xml...> tag at the end of the
file!)
, Like . It opens/closes the whole doc.
Therefore each of them is only used once.
Modifiers for :
author="Your Name" Shouldn't be a problem :)
email="you@home..." Should neither be
editor="KWord" Name of the editor which has saved the file
mime="application/x-kword" Mimetype of the file (i.e. which app to
launch if you click on the icon
representing one of your documents)
, Is used to define the properties of the paper.
Normally this is the first tag in the "header".
Modifiers for :
format="1" 0...DIN A3
1...DIN A4
2...DIN A5
3...US LETTER
4...US LEGAL
5...SCREEN (screen sized)
6...CUSTOM (just enter your preferred size)
7...DIN B5
8...US EXECUTIVE
ptWidth="595" Width of the page in pt
ptHeight="841" Height of the page in pt
mmWidth ="210" Same in mm
mmHeight="297" Same in mm
inchWidth ="8.26772" Same in inch
inchHeight="11.6929" Same in inch
orientation="0" 0...Portrait
1...Landscape
columns="1" Number of columns
ptColumnspc="3" Spacing between columns in pt
mmColumnspc="1.05833" Same in mm
inchColumnspc="0.0416667" Same in inch
hType="0" 0...On all pages (even/odd) the same
headers
1...Different header only on first page
2...Different headers for even/odd pages
fType="0" See hType, header -> footer :)
ptHeadBody="9" Distance between header and body in pt
ptFootBody="9" Distance between footer and body in pt
mmHeadBody="3.5" Same in mm
mmFootBody="3.5" Same in mm
inchHeadBody="0.137795" Same in inch
inchFootBody="0.137795" Same in inch
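Every length above is stored three times (pt, mm, inch). The sample values are
consistent with 1 inch = 25.4 mm = 72 pt. The sketch below recomputes them; the
function names are hypothetical, and two details are assumptions read off the
sample values only: the pt values appear to be cut down to whole numbers
(210 mm is about 595.28 pt, stored as "595"), and the other values appear to be
written with six significant digits.

```python
MM_PER_INCH = 25.4
PT_PER_INCH = 72.0  # assumption: 1 pt = 1/72 inch


def mm_to_inch(mm: float) -> float:
    return mm / MM_PER_INCH


def mm_to_pt(mm: float) -> float:
    return mm / MM_PER_INCH * PT_PER_INCH


def pt_to_mm(pt: float) -> float:
    return pt / PT_PER_INCH * MM_PER_INCH


def pt_to_inch(pt: float) -> float:
    return pt / PT_PER_INCH


def fmt(value: float) -> str:
    """Format with six significant digits, as in the sample values."""
    return format(value, "g")


def paper_size_attributes(mm_width: float, mm_height: float) -> dict:
    """Build the six size attributes for a page given in mm."""
    return {
        "ptWidth": str(int(mm_to_pt(mm_width))),    # truncated (assumption)
        "ptHeight": str(int(mm_to_pt(mm_height))),  # truncated (assumption)
        "mmWidth": fmt(mm_width),
        "mmHeight": fmt(mm_height),
        "inchWidth": fmt(mm_to_inch(mm_width)),
        "inchHeight": fmt(mm_to_inch(mm_height)),
    }


print(paper_size_attributes(210, 297))   # DIN A4
print(fmt(pt_to_mm(3)), fmt(pt_to_inch(3)))  # column spacing of 3 pt
```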
, Used to specify the borders of the . Should
only be used within and !
Modifiers for :
mmLeft="0"       This should be quite self explanatory :)
mmTop="0"
mmRight="0"
mmBottom="0"
ptLeft="0"
ptTop="0"
ptRight="0"
ptBottom="0"
inchLeft="0"
inchTop="0"
inchRight="0"
inchBottom="0"
, Some basic settings
Modifiers for :
processing="1" 0..."Normal" document (Wordprocessing)
1...DTP-document (DTPing)
standardpage="1" There can be only "1" :)
hasHeader="0" Is there a header? (0/1)
hasFooter="0" Is there a footer? (0/1)
unit="mm" Basic unit for positioning, ruler,...
, Information for the Footnote-Manager
Modifiers for :
none
This tag stores the value of the first footnote
(e.g. "1" means that the first footnote looks
like that: [1])
Modifiers for :
value="1" explained above
, Used to store the formatting options for the
footnote. Note: This one must not be used outside
the tags!
Modifiers for :
superscript="1" [???]
type="1" [???]
,
Modifiers for :
ref="(null)" The name of the corresponding paragraph.
, With this tag you open/close the "frame-section".
All your FRAMESETs (notice the small s!) are
placed inside (and nowhere else!)
Modifiers for :
none
This tag defines one frameset. A frameset consists
of (at least) one FRAME and one PARAGRAPH.
Modifiers for :
frameType="1" 0...Base frame (for internal use only!!!)
1...Text frame
2...Picture frame
3...Part frame (e.g. KImage-Part)
autoCreateNewFrame="1" Whether KWord should create a new frame if
there is no space left in the old one. 0/1
frameInfo="0" 0...Body
1...First header
2...Odd header
3...Even header
4...First footer
5...Odd footer
6...Even footer
grpMgr="grpmgr_0" The name of the group manager for this
table. (i.e. If this frameset "belongs" to
a table the position and the size are
controlled by a group manager (one for each
table))
row="0" Position in the table (only for "table-
frames"). Index starts at 0.
col="1" Just guess :)
removeable="0" Whether the header-frame is removable or
not (notice the typo!). 0/1 [???]
, Describes the position, property,... of one FRAME.
Note: The tag is used like this:
i.e. there are no other tags
in between...
Modifiers for :
left="28"        Those four modifiers (left, top, right, bottom)
top="42"         describe the size and the position of the frame
right="566"      (absolute to the paper).
bottom="798"     Note: measured in pt!
runaround="1" 0...Don't run around frame
1...Run around bounding rectangle
2...Run around contour
runaGapPT="2" Run around with gap (in pt)
runaGapMM="1" Same in mm
runaGapINCH="0.0393701" Same in inch
lWidth="1"       Note: Description for all borders (xWidth,
lRed="255"       xRed, xGreen, xBlue,...)
lGreen="255"
lBlue="255"      xWidth...Width of border (pt)
lStyle="0"       xRed, xGreen, xBlue...RGB triplet -> color
rWidth="1"       of border (e.g. 255, 255, 255 -> white)
rRed="255"       xStyle...Style of the border-line:
rGreen="255"       0...Solid
rBlue="255"        1...Dash
rStyle="0"         2...Dot
tWidth="1"         3...Dash-Dot
tRed="255"         4...Dash-Dot-Dot
tGreen="255"
tBlue="255"      x==l -> left border
tStyle="0"       x==r -> right border
bWidth="1"       x==t -> top border
bRed="255"       x==b -> bottom border
bGreen="255"
bBlue="255"
bStyle="0"
bkRed="255" RGB-triplet of background color
bkGreen="255"
bkBlue="255"
bleftpt="0" Distance: left border - text/picture in pt
bleftmm="0" Same in mm
bleftinch="0" Same in inch
brightpt="0" Distance: right border - text/picture in pt
brightmm="0" Same in mm
brightinch="0" Same in inch
btoppt="0" Distance: top border - text/picture in pt
btopmm="0" Same in mm
btopinch="0" Same in inch
bbottompt="0" Distance: bottom border - text/pic. in pt
bbottommm="0" Same in mm
bbottominch="0" Same in inch
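To make the FRAME modifiers a bit more concrete, here is a small sketch that
reads one frame and derives its size and borders. It is not taken from KWord:
the sample element is hand-written from the values above (with only the left
border filled in), the function names are invented, and it assumes the tag is
written as a single empty element named FRAME.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; a real file carries many more modifiers.
SAMPLE = (
    '<FRAME left="28" top="42" right="566" bottom="798" '
    'lWidth="1" lRed="255" lGreen="255" lBlue="255" lStyle="0"/>'
)

BORDER_STYLES = {0: "solid", 1: "dash", 2: "dot",
                 3: "dash-dot", 4: "dash-dot-dot"}
SIDES = {"l": "left", "r": "right", "t": "top", "b": "bottom"}


def frame_rect(frame: ET.Element) -> dict:
    """Position and size in pt, derived from left/top/right/bottom."""
    left, top, right, bottom = (
        float(frame.get(name)) for name in ("left", "top", "right", "bottom")
    )
    return {"x": left, "y": top, "width": right - left, "height": bottom - top}


def frame_borders(frame: ET.Element) -> dict:
    """Collect xWidth, xRed, xGreen, xBlue, xStyle for x in l, r, t, b."""
    borders = {}
    for prefix, side in SIDES.items():
        if frame.get(prefix + "Width") is None:
            continue  # this side is missing in the sample
        borders[side] = {
            "width_pt": float(frame.get(prefix + "Width")),
            "rgb": tuple(int(frame.get(prefix + c))
                         for c in ("Red", "Green", "Blue")),
            "style": BORDER_STYLES[int(frame.get(prefix + "Style"))],
        }
    return borders


frame = ET.fromstring(SAMPLE)
print(frame_rect(frame))
print(frame_borders(frame))
```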
, All the information for each paragraph (text,
color(s), format(s),...) is stored between these
two tags. Each FRAMESET may contain as many
PARAGRAPH tags as you want to.
Modifiers for :
none
, Just guess :) Currently the text is stored as
UTF-8 encoded Unicode characters.
Note: the format-tags navigate in the text using
an index which starts at 0 and runs up till it
reaches length-1. Also the length of the text is
used to express format-information.
Modifiers for :
none
, The name of the paragraph only used if it is a
footnote (see ).
Modifiers for :
name="Footnote/Endnote_1" Just the name of the paragraph where the
footnote belongs to
, Some paragraph information (will be extended?)
Modifiers for :
info="0" 0...No "special" information
1...Footnote (see )
, Normally the text "flows" through all the frames.
Sometimes you want to define a hard break to the
next frame (page); i.e. this and the following
paragraphs start in a new frame, even if there is
enough space in the last one.
Modifiers for :
frame="0" 0...let it flow
1...hard break -> next frame
, The text is stored plain. All the formatting is
configured between these two tags.
Modifiers for :
none
, These tags describe "runs of text" which share the
same formatting properties.
Modifiers for :
id="1" 0...none (mustn't be in a file)
1..."normal" text
2...a picture
3...tabulator
4...a variable
5...a footnote
pos="0" position in the text of the paragraph
(of course 0-based :)
len="5" length of the "run of text" which is
formatted using this format (Note: len is not
stored for picture-formats (id="2"))
a typical text FORMAT
used to store the text-color
guess :)
once again...
50 - normal, 75 - bold,...
guess :) 0/1
0...normal
1...sub
2...super
Note: KWord stores only a link to the real
picture! (This should/will change as soon
as we use the new storage spec and the new
Image Container class)
Note: KWord stores a 0-char at the position
of the image!
Note: KWord stores a 0-char at this pos.!
Note: There is no additional info.
a variable (e.g. page number, date,...)
Note: KWord stores a 0-char at this pos.!
0...date (fix)
1...date (variable)
2...time (fix)
3...time (variable)
4...page number
location of the variable
(should be easy to "decode")
just guess :)
this is like the id="1" part
for formatting the variable's
text
a footnote. Note: Once again a 0-char is
stored at this position.
[???]
[???]
guess
name (reference)
to describe the formatting
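How the id, pos and len modifiers of a FORMAT select a "run of text" can be
sketched as follows. This is an illustration only: the wrapper element named
runs exists just to hold the hand-written sample, split_runs is a made-up
name, and treating a missing len (pictures) as a single placeholder position
is an assumption based on the notes above.

```python
import xml.etree.ElementTree as ET

FORMAT_KINDS = {1: "text", 2: "picture", 3: "tabulator",
                4: "variable", 5: "footnote"}

TEXT = "Hello KWord"
# Hypothetical wrapper <runs>; only the FORMAT modifiers come from the spec.
SAMPLE = (
    "<runs>"
    '<FORMAT id="1" pos="0" len="5"/>'
    '<FORMAT id="1" pos="5" len="6"/>'
    "</runs>"
)


def split_runs(text: str, formats) -> list:
    """Return (kind, substring) for every FORMAT, using 0-based pos and len."""
    runs = []
    for fmt in formats:
        kind = FORMAT_KINDS[int(fmt.get("id"))]
        pos = int(fmt.get("pos"))
        # len is not stored for pictures; assume one placeholder position.
        length = int(fmt.get("len", "1"))
        runs.append((kind, text[pos:pos + length]))
    return runs


for kind, chunk in split_runs(TEXT, ET.fromstring(SAMPLE)):
    print(kind, repr(chunk))
```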
, KWord supports "Styles" and stores them at the
end of the file (STYLES). Each paragraph also
stores its own copy of the style it uses. The
reason for that is that you can change the style
of one single paragraph without changing the
document-wide style the new paragraph-style is
based on.
Modifiers for :
none
, Name of the Style this paragraph style is based
on
Modifiers for :
value="Standard" The name :)
, Name of the style of the following paragraph
Modifiers for :
name="Standard" Once again - the name
, The alignment of this paragraph.
Modifiers for :
value="0" 0...left
1...right
2...center
3...block
, Distance to the last paragraph
Modifiers for :
pt="0"
mm="0"
inch="0"
, Distance to the next paragraph
Modifiers for :
pt="0"
mm="0"
inch="0"
Indent of the first Line
Left indent
Spacing between the lines of the p.
, Describes the counter-type used for this p.
Modifiers for :
type="0" 0...none
1...arabic numbers (e.g. "1")
2...lowercase letter (e.g. "a")
3...capital letter (e.g. "A")
4...lowercase roman number (e.g. "ii")
5...capital roman number (e.g. "IX")
6...bullet (e.g. "-")
depth="0" 0..."1", 1..."1.1" 2..."1.1.1",...
bullet="176" bullet character
start="1" value to start with
numberingtype="1" 0...list
1...chapter (added to doc-structure)
lefttext="" special text left to counter
righttext="" special text right to counter (e.g. ".")
bulletfont="times" font for bullet-char
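The counter types can be illustrated with a short sketch that turns a number
into the visible label. Nothing here is KWord's own code: counter_label and
its helpers are invented names, the handling of letters beyond "z" is a
guess, bullet is assumed to be a character code, and depth (the "1.1.1"
style) is left out.

```python
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
         (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
         (5, "V"), (4, "IV"), (1, "I")]


def to_roman(number: int) -> str:
    """Capital roman numeral for a positive integer."""
    result = ""
    for value, symbol in ROMAN:
        while number >= value:
            result += symbol
            number -= value
    return result


def to_letters(number: int) -> str:
    """1 -> 'a', 26 -> 'z', 27 -> 'aa' (continuation is an assumption)."""
    letters = ""
    while number > 0:
        number, rem = divmod(number - 1, 26)
        letters = chr(ord("a") + rem) + letters
    return letters


def counter_label(ctype: int, number: int, bullet: int = 176,
                  lefttext: str = "", righttext: str = "") -> str:
    """Build the counter text for one paragraph from the modifiers above."""
    if ctype == 0:
        return ""                      # 0...none
    core = {
        1: str(number),                # arabic numbers
        2: to_letters(number),         # lowercase letter
        3: to_letters(number).upper(), # capital letter
        4: to_roman(number).lower(),   # lowercase roman number
        5: to_roman(number),           # capital roman number
        6: chr(bullet),                # bullet (assumed to be a char code)
    }[ctype]
    return lefttext + core + righttext


print(counter_label(1, 3, righttext="."))
print(counter_label(4, 2), counter_label(5, 9))
```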
All the borders :)
Modifiers for <*BORDER>:
red, green, blue color
style="0" linestyle
0...solid
1...dash
2...dot
3...dash-dot-dash-dot-...
4...dash-dot-dot-dash-dot-dot-...
width="0" width (in pt)
The format of this "style" (see )
Defines a tabulator at the specific position
Modifiers for :
type="0" 0...left
1...center
2...right
3...decimal point
The STYLES are a region where the different style types are defined. The
tag
there can be and .