List:       wikien-l
Subject:    Re: [WikiEN-l] Templates/taxoboxes, or: why a converter isn't a parser
From:       Ævar_Arnfjörð_Bjarmason <avarab () gmail ! com>
Date:       2004-08-30 21:05:38
Message-ID: 51dd1af804083014057e2c3373 () mail ! gmail ! com

I already got it in your reply here:

Maybe you are underestimating the vast differences in implementation
between the current not-really-a-parser and what I am working on.

There is nothing wrong with using a group of templates together, but
there *is* something majorly wrong with patching together one object (a
table, in this case) using pieces from different places. It works with
the current not-really-a-parser because it takes the wiki source texts
from the templates, sticks them together somehow, and then converts them
to HTML. This kind of practice is exactly what leads to all the problems
with our current not-really-a-parser. A proper parser should parse each
template individually, and then use its parse tree in the processing of
the page that uses it.

It's great that you're working on a different way to do it that's not
just dumb text includes.

On Mon, 30 Aug 2004 16:55:01 +0100, Timwi <timwi@gmx.net> wrote:
> Ævar Arnfjörð Bjarmason wrote:
> 
> > Why would it ever break? I can see it getting slow because it cannot
> > be optimized, but not breaking; all it's doing is including one
> > thing after the other.
> >
> > {{a}} gets Template:A which contains "foo" and {{b}} gets Template:B
> > which contains "bar" hence
> >
> > {{a}}{{b}} = foobar
> 
> Of course, this simple example would still work. But picture this:
> 
> Template:A contains:         I ''li
> Template:B contains:         ke'' hamburgers
> 
> Currently, {{a}}{{b}} would yield "I <em>like</em> hamburgers", but only
> because it sticks the pieces together and then tries to make sense of it.
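
The difference can be made concrete with a minimal sketch. This is not the
actual MediaWiki code; the TEMPLATES dict, the function names and the
regexes are all assumptions made up for illustration, and only the ''
syntax is handled. It compares "include the raw text, then convert" with
"convert each template on its own, then include":

```python
import re

# Hypothetical template store; the contents follow the example above.
TEMPLATES = {"a": "I ''li", "b": "ke'' hamburgers"}

TEMPLATE_CALL = re.compile(r"\{\{(\w+)\}\}")


def convert_italics(wikitext):
    """Naive converter: rewrite every ''...'' pair as <em>...</em>."""
    return re.sub(r"''(.+?)''", r"<em>\1</em>", wikitext)


def include_then_convert(page):
    """Paste the raw template sources together, then convert the result."""
    expanded = TEMPLATE_CALL.sub(lambda m: TEMPLATES[m.group(1)], page)
    return convert_italics(expanded)


def convert_then_include(page):
    """Convert each template in isolation, then paste the results together."""
    return TEMPLATE_CALL.sub(
        lambda m: convert_italics(TEMPLATES[m.group(1)]), page
    )


print(include_then_convert("{{a}}{{b}}"))  # I <em>like</em> hamburgers
print(convert_then_include("{{a}}{{b}}"))  # I ''like'' hamburgers
```

Processed on its own, neither template contains a complete '' pair, so the
emphasis only appears when the pieces are glued together first.
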
> 
> Why is this bad? Picture this:
> 
> Template:A contains:
>         {|
>         | nowrap
> Template:B contains:
>         | Text
>         |}
> 
> Is the "nowrap" a table cell attribute or text in a separate cell? Does
> this change depending on whether there is a newline after "nowrap"? ...
> And this is just a simple example.
> 
> > Why would this break in whatever parser you plan to implement?
> 
> Because a parser is not a converter. The current not-really-a-parser is
> actually a converter: It looks out for particular syntax elements like
> ''these'' and turns them into <em>HTML tags</em>. This is bad because it
> means that several of these conversions can interfere with each other:
> 
>         I ''like [[hamburger|hamburgers'']]
> 
> produces invalid HTML. It gets even worse when it tries to locate
> {{template inclusions}} and replaces them with some other text, not
> knowing what it is or how it fits into the document structure.
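
The interference described here can be reproduced with two independent
substitutions. This is a sketch only: the regexes and the naive_convert
name are assumptions for illustration, not what the current code does
line for line.

```python
import re


def naive_convert(wikitext):
    """Apply two unrelated text substitutions, one after the other."""
    # Pass 1: ''...'' becomes <em>...</em>, with no idea that links exist.
    html = re.sub(r"''(.+?)''", r"<em>\1</em>", wikitext)
    # Pass 2: [[target|label]] becomes a link, with no idea that <em> exists.
    html = re.sub(r"\[\[([^|\]]+)\|(.+?)\]\]", r'<a href="\1">\2</a>', html)
    return html


print(naive_convert("I ''like [[hamburger|hamburgers'']]"))
# I <em>like <a href="hamburger">hamburgers</em></a>
```

The <em> is closed inside the <a> that was opened after it, so the tags
overlap instead of nesting. Each pass is correct in isolation; the
combination is not.
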
> 
> A real parser analyses the document's structure. It turns the wiki text
> into a data structure in memory that actually bears resemblance to the
> structure of the document. It creates a "heading" element where there is
> a heading, instead of turning some strategically-placed equals signs
> into <h#> tags.
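
As a rough sketch of that idea (the Heading and Paragraph classes, the
regex and the function names are assumptions for illustration, not Timwi's
parser), the parser first builds elements and HTML is only produced from
them afterwards:

```python
import re
from dataclasses import dataclass


@dataclass
class Heading:
    level: int
    text: str


@dataclass
class Paragraph:
    text: str


# A heading line: the same run of 1 to 6 equals signs on both sides.
HEADING = re.compile(r"^(={1,6})\s*(.+?)\s*\1\s*$")


def parse(wikitext):
    """Turn wiki text into a list of elements, one per non-blank line."""
    nodes = []
    for line in wikitext.splitlines():
        if not line.strip():
            continue
        match = HEADING.match(line)
        if match:
            nodes.append(Heading(len(match.group(1)), match.group(2)))
        else:
            nodes.append(Paragraph(line.strip()))
    return nodes


def render(nodes):
    """Produce HTML from the elements; this step never sees wiki syntax."""
    parts = []
    for node in nodes:
        if isinstance(node, Heading):
            parts.append(f"<h{node.level}>{node.text}</h{node.level}>")
        else:
            parts.append(f"<p>{node.text}</p>")
    return "\n".join(parts)


tree = parse("== History ==\nSome text")
print(tree)
print(render(tree))
```

Anything that wants to know "is there a heading here?" can ask the tree,
rather than searching the text for equals signs again.
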
> 
> > The only reason I can see why that would happen is if you were to
> > implement some auto-completion of the table syntax. Sort of like
> > tidy(html) for wikisyntax and do it before things get fetched from
> > Template: rather than after everything has been included.
> 
> Your terminology "auto-completion" reveals that you are thinking in
> terms of conversion. Don't think of it as auto-completion; for example,
> if a '' has no matching '', I can tell the parser what to do
> independently of what it does when there *is* a matching ''. There are
> several possibilities: make an italics element (what you would probably
> call auto-completion); make a text element (i.e. pretend the "''" was
> actually text); or bail out saying "syntax error". Of course, we don't
> want the latter. My parser currently does the second: It turns the ''
> into text. I did that because this is also how the current
> not-really-a-parser functions. However, I can easily change that.
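
The three possibilities can be sketched as a single switch. Everything
here is assumed for illustration (the Text and Italics classes, the
parse_inline name, the on_unmatched parameter), and bold ''' and nesting
are ignored:

```python
from dataclasses import dataclass


@dataclass
class Text:
    value: str


@dataclass
class Italics:
    value: str


def parse_inline(source, on_unmatched="text"):
    """Split source into Text and Italics nodes.

    on_unmatched decides what an opening '' without a partner becomes:
    "italics" (italicise the rest), "text" (keep the '' literally),
    or anything else (raise SyntaxError).
    """
    nodes = []
    pos = 0
    while pos < len(source):
        start = source.find("''", pos)
        if start == -1:
            nodes.append(Text(source[pos:]))
            break
        if start > pos:
            nodes.append(Text(source[pos:start]))
        end = source.find("''", start + 2)
        if end != -1:
            nodes.append(Italics(source[start + 2:end]))
            pos = end + 2
            continue
        # No matching '' was found: apply the chosen policy and stop.
        rest = source[start + 2:]
        if on_unmatched == "italics":
            nodes.append(Italics(rest))
        elif on_unmatched == "text":
            nodes.append(Text("''" + rest))
        else:
            raise SyntaxError("unmatched ''")
        break
    return nodes


print(parse_inline("I ''like", on_unmatched="text"))
print(parse_inline("I ''like", on_unmatched="italics"))
```

The matched case is handled by the same code whichever policy is chosen,
so switching policies does not affect well-formed input.
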
> 
> In our specific case, there would be a document (a template) that has a
> {| with no matching |}. What should it do? Unfortunately, none of the
> three options make it work the way you have come to expect from the
> current not-really-a-parser.
> 
> Timwi
> 
> _______________________________________________
> WikiEN-l mailing list
> WikiEN-l@Wikipedia.org
> http://mail.wikipedia.org/mailman/listinfo/wikien-l
>