'HTML Output for LyX'

List:       lyx-devel
Subject:    HTML Output for LyX
From:       Richard Heck <rgheck () bobjweil ! com>
Date:       2009-04-30 14:13:35
Message-ID: 49F9B20F.4030609 () bobjweil ! com

Hi, Alex. This is going to seem critical, but it is going to end up 
being constructive. See below.

>> What's more problematic, to my mind, is that the framework isn't extensible.
>> Yes, I know that the programmer can add support for new layout types, etc,
>> but as things now stand even simple layouts like Theorem aren't supported.
>> Yes, those could be added. But LyX itself is extensible, as LaTeX is, and
>> that's a crucial thing about it. If I define some new character styles, or
>> new layouts of some other sort, then they're not handled and *can't* be
>> handled without making eLyXer aware of what the LaTeX commands that define
>> these styles mean. But then we're back to writing a LaTeX to HTML converter.
>> Similarly, I think it will be a challenge to provide proper support for math
>> macros, let alone for things people do using ERT.
>>     
>
> The framework is extensible in several respects. The first one is of
> course with code. But a much better way only requires CSS. Unknown
> layouts do not generate errors, but new HTML <div> classes. Adding
> support in the CSS is trivial. 
>
>   
This isn't quite right. You're assuming that whatever needs to be done 
to render the new environment can be done in CSS. Even for theorem 
environments, this is non-trivial. I suppose you can use advanced stuff, 
like the CSS content property and counters, but browser support for 
these is still in its infancy.

And that is a very simple case. Just consider the Endnotes module.

> Similarly, unknown insets generate
> <span> tags with a characteristic class. As to new macros, LaTeX
> commands and ERT, they are not supported at this point.
>
>   
Right. Which means that people who want to use eLyXer for output are 
restricted to a subset of LyX's functionality. And they are going to 
continue to be, since there is no way of solving this problem short of 
writing or borrowing a LaTeX to HTML converter. That is why I don't 
support including it within LyX. Anything that is part of LyX ought to 
support LyX's full functionality, or at least something close to it.

> If you would be so kind as to send me a sample I would be delighted to
> help make theorems work.
>
>   
See below again. But you can easily create a LyX document with some 
theorem environments, if you want to continue with this approach.

>> Similarly, at present, so far as I can tell, there is no support for BibTeX.
>> Is that right? In my conversions, I get little raised numbers, but they
>> don't link to anything.
>>
>>     
> BibTeX is not working at the moment. Again, a sample would be appreciated.
>
>   
You can easily create a LyX document with some BibTeX. And if you want 
to work on this, then you can probably use the python-bibtex package to 
parse the files. Figuring out how the bibliography is supposed to be 
rendered will be more difficult, though maybe there's not so much of a 
need to render it precisely as BibTeX would. Or maybe you could 
(optionally?) use the bbl file. But see below again.

>> Also, crossrefs appear as little arrows, which is nice, except that you
>> don't get the corresponding text, which makes things like "In [arrow], we
>> will discuss..." hard to read.
>>     
>
> A single-pass converter cannot possibly output the actual number of
> the linked section, or the text -- it will first find the reference
> and a few kb later (once that part of the document is parsed) the
> label. A second pass would be required to make labels work properly,
> and the second pass is still in the works. But this should work,
> eventually.
>
>   
There are other problems, too. We have to keep track of which counters 
are "linked" and which ones are supposed to be reset when. You could 
maybe try (optionally?) using the aux file, which of course has already 
dealt with all of that. But see below again.
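
To make the two-pass idea and the counter bookkeeping concrete, here is 
a minimal Python sketch. The (kind, payload) tuples, the RESETS table, 
and the function names are all hypothetical, invented for illustration; 
they are not eLyXer's or LyX's actual data structures. The first pass 
steps the counters (resetting dependent ones) and records what each 
label points at; the second pass substitutes the numbers.

```python
# Hypothetical sketch: two-pass numbering and cross-reference resolution.
# Each element is (kind, payload); this is not a real LyX or eLyXer format.

# Assumed rule: stepping a counter resets the counters listed for it.
RESETS = {"section": ["subsection"], "subsection": []}


def collect_labels(elements):
    """First pass: step counters and record the number each label points at."""
    counters = {name: 0 for name in RESETS}
    current = ""
    labels = {}
    for kind, payload in elements:
        if kind in counters:
            counters[kind] += 1
            for dependent in RESETS[kind]:
                counters[dependent] = 0
            if kind == "section":
                current = str(counters["section"])
            else:
                current = "%d.%d" % (counters["section"], counters["subsection"])
        elif kind == "label":
            labels[payload] = current
    return labels


def resolve(elements, labels):
    """Second pass: emit text, replacing each reference by its number."""
    out = []
    for kind, payload in elements:
        if kind == "ref":
            out.append(labels.get(payload, "??"))
        elif kind == "text":
            out.append(payload)
    return "".join(out)


# The reference comes before the label, as in the forward-reference case.
doc = [
    ("text", "In "), ("ref", "sec:math"), ("text", ", we will discuss math."),
    ("section", "Intro"),
    ("subsection", "Scope"),
    ("section", "Results"),
    ("subsection", "Math"), ("label", "sec:math"),
]
labels = collect_labels(doc)
print(resolve(doc, labels))
```

An unresolved reference comes out as "??" here, much as LaTeX does 
before the aux file is up to date.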

>> Let me just be clear about the nature of these criticisms. It may well be
>> that eLyXer will be a good program for use by people whose LyX files are
>> very basic, don't use much math, don't define custom styles, and the like.
>> If so, then so; there are plenty of people for whom that is plenty good
>> enough. But if this program is going to be included in LyX, then, in my
>> opinion, it needs to handle the LyX format, pretty much without exception.
>> Yes, there may be some special cases it doesn't quite handle, but, mostly,
>> it should do what LyX does, and it absolutely needs to handle math cleanly.
>> Right now, and with all due respect, it doesn't come close.
>>     
>
> Fair enough. My views are much more simplistic: LyX currently doesn't
> do what 99.9% of users need, which is output to different formats
> including HTML and/or something importable from within Word. 
>
>   
LyX will always output to plain text, and that's readily importable in 
Word. If one wants to output to a format that preserves a good bit of 
the formatting, then latex2rtf does a fine job, so long as you don't 
have too much math, etc. (I've used that for collaboration myself, so I 
know.) Properly configured, which is apparently a challenge on some 
operating systems, htlatex will do excellent conversion both to HTML and 
to ODT, and plastex does a very good job converting to HTML, though with 
some limits, including the fact that all the math is little pictures 
(though it does handle cross-references and BibTeX nicely). So there are 
lots of options. None of that means the world can't use a better 
mousetrap. See below.

> Thus many, many people don't know about an otherwise wonderful editor. The
> needs of these users (and of the majority of current users IMHO) would
> be adequately served by an HTML export tool which generates anything
> that does not look like garbage. Nobody is volunteering to write a
> tool that does what you want, so LyX could very well use what eLyXer
> offers. But perhaps as you say integration is not such a good idea.
> Separate packaging and distribution might be an enabler for people,
> especially if the most popular versions (Windows, Debian) do a joint
> distribution.
>
>   
See below.

>> My own view, for what it's worth, is that there is no stable way to go here
>> except to have LyX output the LaTeX, which it does well, and then convert
>> that. Otherwise, I don't see how you will ever get proper handling of math
>> macros, character styles, new layouts, and the like. And if I were going to
>> work on that, then I'd work on plastex, which seems to me generally to do a
>> pretty good job. At the very least, one can use the plastex parser and write
>> a new output routine.
>>
>>     
> Good luck with that. The problem is orders of magnitude harder than a
> LyX to HTML converter. 
>
>   
That depends on how much of LyX's source you care to convert. If you want 
to handle custom styles, then the problem is the same. Which was my point.

So the question is: What do we have to do if we're going to get really 
good HTML output for more than fairly simple LyX files, let alone for 
LyX's full functionality? I think there is now fairly widespread 
agreement that the answer is: You have to do it within LyX itself, i.e., 
in the C++ source, where we actually have access to the information we 
need. And once you start to think in those terms, then I think it 
becomes completely obvious that this is the way to go. If HTML is an 
output format, then the layout files themselves can contain appropriate 
information about how custom styles should be output as HTML, and indeed 
even about how standard insets should be output. If Footnote gets output 
as a span, then that can be configured in the InsetLayout for Footnote 
and even overridden by the user. E.g.:
    InsetLayout Footnote
    ...
       HTMLType       BeginEnd
       HTMLBegin      <span class="footnote">
       HTMLEnd         </span>
       HTMLPreamble
            .footnote { ... }
       EndPreamble
    ...
    EndLayout
or something along those lines. Similarly:
    Style Section
    ...
       HTMLType       BeginEnd
       HTMLBegin      <h2>
       HTMLEnd         </h2>
    ...
    End
Which can of course vary for different classes. E.g., you might want it 
that way in a book, but <h1> in an article.
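
As a rough illustration of what layout-driven output buys us, here is a 
small Python sketch (Python only for brevity; the real thing would live 
in the C++ source). The ARTICLE and BOOK tables stand in for whatever 
would be parsed from the HTMLBegin and HTMLEnd entries above, and 
render() is a made-up name. The fallback to a <div> with a class for 
unknown layouts is an assumption, modeled on what eLyXer is described as 
doing.

```python
# Hypothetical sketch: HTML output driven by layout definitions.
import html

# Stand-ins for values parsed from layout files (HTMLBegin / HTMLEnd).
ARTICLE = {
    "Section": ("<h1>", "</h1>"),
    "Footnote": ('<span class="footnote">', "</span>"),
}

# A book class overrides only what differs from the article class.
BOOK = dict(ARTICLE, Section=("<h2>", "</h2>"))


def render(layout_name, text, layouts):
    """Wrap escaped text in the begin/end strings for the given layout."""
    # Assumed fallback: unknown layouts become a <div> with a class.
    begin, end = layouts.get(
        layout_name, ('<div class="%s">' % layout_name.lower(), "</div>")
    )
    return begin + html.escape(text) + end


print(render("Section", "Results & Discussion", ARTICLE))
print(render("Section", "Results & Discussion", BOOK))
print(render("Theorem", "Every bounded sequence ...", ARTICLE))
```

The point is that the output routine itself knows nothing about 
Section or Footnote; a user who defines a new style only has to supply 
the begin and end strings in the layout.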

Getting something workable that does as much as eLyXer now does would be 
pretty easy, because we already have access to the complete structure of 
the document. Lots of the output code could almost be cut and pasted from 
the other output routines. The challenge will be to get good rendering 
of the math. Addressing other issues, like file splitting, would take 
some work, but not too much. Note that we can even get a good TOC this 
way. Dealing with cross-references and BibTeX becomes easy, too, because 
we have all the information we need ready to hand. (Of course, there 
will be issues, but you get my point.)

Alex, do you know C++? I'd be happy to help with this, once exams are over.

rh
