'[kapidox] src/kapidox: Unbreak .qch output'


List:       kde-commits
Subject:    [kapidox] src/kapidox: Unbreak .qch output
From:       Aurélien Gâteau <agateau () kde ! org>
Date:       2014-05-27 11:49:52
Message-ID: E1WpFtA-00060Z-TT () scm ! kde ! org

Git commit 764e0ce21b368fc8b0f895c3f2d9802e2d7e29f7 by Aurélien Gâteau.
Committed on 27/05/2014 at 11:46.
Pushed by gateau into branch 'master'.

Unbreak .qch output

header.html and footer.html must generate valid HTML, otherwise the content
of the .qch files looks very raw.

M  +2    -0    src/kapidox/data/footer.html
M  +8    -1    src/kapidox/data/header.html
M  +58   -27   src/kapidox/generator.py

http://commits.kde.org/kapidox/764e0ce21b368fc8b0f895c3f2d9802e2d7e29f7

diff --git a/src/kapidox/data/footer.html b/src/kapidox/data/footer.html
index e69de29..308b1d0 100644
--- a/src/kapidox/data/footer.html
+++ b/src/kapidox/data/footer.html
@@ -0,0 +1,2 @@
+</body>
+</html>
diff --git a/src/kapidox/data/header.html b/src/kapidox/data/header.html
index f913ef4..dec0650 100644
--- a/src/kapidox/data/header.html
+++ b/src/kapidox/data/header.html
@@ -1,6 +1,13 @@
+<!--
 projectname: $projectname
 title: $title
 doxygenversion: $doxygenversion
 datetime: $datetime
-----
+-->
+<html>
+<head>
+  <link rel="stylesheet" type="text/css" href="doxygen.css" />
+</head>
+<body>
+
 <div id="top"> <!-- Doxygen will close this div -->
diff --git a/src/kapidox/generator.py b/src/kapidox/generator.py
index d27397f..b8be6e2 100644
--- a/src/kapidox/generator.py
+++ b/src/kapidox/generator.py
@@ -303,37 +303,68 @@ def menu_items(htmldir, modulename):
             entries))
 
 
-def read_all(stream):
-    """Read all content of a stream, returns it as a string.
-
-    This should not be necessary: a plain stream.read() should be enough, but
-    there is a bug in Python < 2.7.7: if one opens a file with codecs.open(),
-    then read a line with readline(), then call read(), not all the content is
-    returned.
-    See http://bugs.python.org/issue8260
+def parse_dox_html(stream):
+    """Parse the HTML files produced by Doxygen, extract the key/value block we
+    add through header.html and return a dict ready for the Jinja template.
+
+    The HTML files produced by Doxygen with our custom header and footer files
+    look like this:
+
+        <!--
+        key1: value1
+        key2: value2
+        ...
+        -->
+        <html>
+        <head>
+        ...
+        </head>
+        <body>
+        ...
+        </body>
+        </html>
+
+
+    The parser fills the dict from the top key/value block, and adds the
+    content of the body to the dict using the "content" key.
+
+    We do not use an XML parser because the HTML file might not be well-formed,
+    for example if the documentation contains raw HTML.
+
+    The key/value block is kept in a comment so that it does not appear in Qt
+    Compressed Help output, which we do not postprocess ourselves.
     """
-    chunks = []
-    while True:
-        chunk = stream.read()
-        if chunk:
-            chunks.append(chunk)
+    dct = {}
+    body = []
+
+    def parse_key_value_block(line):
+        if line == "<!--":
+            return parse_key_value_block
+        if line == "-->":
+            return skip_head
+        key, value = line.split(': ', 1)
+        dct[key] = value
+        return parse_key_value_block
+
+    def skip_head(line):
+        if line == "<body>":
+            return extract_body
         else:
-            break
-    return ''.join(chunks)
+            return skip_head
 
+    def extract_body(line):
+        if line == "</body>":
+            return None
+        body.append(line)
+        return extract_body
 
-def parse_dox_html(stream):
-    """Parse html produced by Doxygen, extract the header fields we add through
-    header.html and return a dict ready for the Jinja template"""
-    dct = {}
-    while True:
-        line = stream.readline().strip()
-        if line == '----': # Must match header.html
-            dct['content'] = read_all(stream)
-            return dct
-        else:
-            key, value = line.split(': ', 1)
-            dct[key] = value
+    parser = parse_key_value_block
+    while parser is not None:
+        line = stream.readline().rstrip()
+        parser = parser(line)
+
+    dct['content'] = '\n'.join(body)
+    return dct
 
 
 def postprocess_internal(htmldir, tmpl, mapping):
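
For readers who want to try the new parser outside of kapidox, here is a minimal standalone sketch of the state machine introduced by this commit: each state is a function that takes one line and returns the function that should handle the next line, or None when parsing is done. The sample page, the "KArchive" and "Main Page" values and the end-of-file check are assumptions added for illustration; they are not part of the commit. As committed, the loop does not check for end of file, so a page without a <body> or </body> line would presumably never terminate.

```python
import io


def parse_dox_html(stream):
    """Parse a Doxygen page wrapped by the custom header.html/footer.html."""
    dct = {}
    body = []

    def parse_key_value_block(line):
        # Opening comment marker: stay in this state.
        if line == "<!--":
            return parse_key_value_block
        # Closing comment marker: the key/value block is over.
        if line == "-->":
            return skip_head
        key, value = line.split(": ", 1)
        dct[key] = value
        return parse_key_value_block

    def skip_head(line):
        # Ignore everything until the <body> line.
        return extract_body if line == "<body>" else skip_head

    def extract_body(line):
        if line == "</body>":
            return None
        body.append(line)
        return extract_body

    parser = parse_key_value_block
    while parser is not None:
        raw = stream.readline()
        if raw == "":
            # Assumption, not in the commit: treat a truncated page as an
            # error instead of reading empty lines forever.
            raise ValueError("unexpected end of file")
        parser = parser(raw.rstrip())

    dct["content"] = "\n".join(body)
    return dct


# Hypothetical page, shaped like the example in the docstring above.
SAMPLE = """\
<!--
projectname: KArchive
title: Main Page
-->
<html>
<head>
</head>
<body>

<div id="top">
</div>
</body>
</html>
"""

result = parse_dox_html(io.StringIO(SAMPLE))
print(result["projectname"])
print(result["content"])
```

Returning the next state from each handler keeps the main loop to a few lines and avoids mixing readline() and read() on the same stream, which is what the removed read_all() helper had to work around.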
