'Re: Ideas'


List:       hypermail
Subject:    Re: Ideas
From:       "Byron C. Darrah" <bdarr () sse ! FU ! HAC ! COM>
Date:       1998-05-06 1:45:19


> Date: Tue, 5 May 1998 17:40:13 -0700 (PDT)
> From: Jared Reisinger <feety@hhhh.org>
> 
> On Mon, 4 May 1998, Byron C. Darrah wrote:
> 
> > I don't know if I understand this right: Does this suggest replacing
> > 0000.html, 0001.html, etc with a single file?  If so, then I think that
> > might be a mistake.  It would require a CGI program to extract each page on
> > demand and turn it into HTML.  Thus, the extraction/HTML conversion would
> > be done each time each message is read, instead of the current way which is
> > just once per message.  It would mean a much higher system load for busy
> > archives.
> 
> I agree that *busy* archives probably want to minimize request-time
> processing.  But not-so-busy archives, or archives on beefy servers may
> not have to worry about it quite as much.
> 
> Does anyone have any sense of how much overhead would be involved for
> on-the-fly generated pages using something like Hypermail's mail-to-HTML
> engine?

I admit the overhead might not be too bad if the message database is
designed well.  For one thing, such a system would need to distinguish
message bodies from attachments, so that reading a message that happens to
have large attachments does not cause tons of unnecessary disk access.  If
that is done, and the message bodies in the archive aren't too big (mail
message bodies are usually pretty small), then on-the-fly
retrieval/conversion could be very efficient - not much more overhead than
if the messages were put into an "ar" archive and extracted on the fly.
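
As a rough illustration of the body/attachment split described above, here
is a minimal sketch in Python.  It is not Hypermail code: the function
names, the in-memory index, and the record layout are all assumptions made
up for this example.  The point is only that an index of (offset, length)
pairs lets a reader seek straight to a message body without touching the
attachment bytes stored next to it.

```python
import io

def append_message(store, index, body, attachment=b""):
    """Append a body and its attachment to the single-file store.

    `store` is any seekable binary file object; `index` is a plain list
    (hypothetical layout) recording where each part lives.
    """
    store.seek(0, io.SEEK_END)
    body_off = store.tell()
    store.write(body)
    att_off = store.tell()
    store.write(attachment)
    index.append({
        "body": (body_off, len(body)),
        "attachment": (att_off, len(attachment)),
    })
    return len(index) - 1  # message id = position in the index

def read_body(store, index, msg_id):
    """Read only the body bytes; the attachment is never read."""
    offset, length = index[msg_id]["body"]
    store.seek(offset)
    return store.read(length)

# Usage: an in-memory file stands in for the on-disk archive.
store = io.BytesIO()
index = []
first = append_message(store, index, b"hello", b"X" * 1000)
second = append_message(store, index, b"world")
```

With a layout like this, the cost of serving a message depends on the size
of its body rather than on the size of its attachments.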

However, is there really any good reason to change over to this model?  I
don't really see any benefit, but I do see a couple of other disadvantages,
in addition to the overhead that I mentioned before:


1.  It would be less efficient to index such an archive with a standard web
search engine.  Most web search engines can index files that reside on a
local disk very efficiently, by reading them directly.  However, with the
monolithic system, any index/search program would have to fetch the entire
message base one message at a time, over a network connection, through the
proposed on-the-fly hypermail CGI converter.

2.  Someday in the future, it would sure be nice to add an administrative
tool that lets a site administrator delete / edit individual messages in a
hypermail archive.  Like (probably) most of you, I currently archive my
hypermail messages redundantly in an (editable) mbox and rebuild the
archive every time I need to do any such maintenance.  But it would sure be
great to do spot changes without having to rebuild.

If those messages are all bunched up together in a single file, this
feature will be much harder to implement, and somewhat inefficient -- it
will either have to leave "deleted" messages in the file (but marked off
with a "tombstone"), or it will have to rebuild the whole file and index
after deletions are done.  And editing a message will require an even more
sophisticated mechanism, since it could cause a message to grow in size.
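
The two deletion strategies mentioned above can be sketched roughly as
follows.  This is an illustration only, in Python, with made-up names
(`delete_message`, `compact`) and a made-up index layout; it does not
reflect any existing Hypermail feature.  Deleting just marks the index
entry with a tombstone, and a separate compaction pass rebuilds the whole
file and index.

```python
def delete_message(index, msg_id):
    """Tombstone a message: the bytes stay in the file, only the index changes."""
    index[msg_id]["deleted"] = True

def compact(data, index):
    """Rebuild the archive bytes and the index, dropping tombstoned messages.

    Every surviving message is copied, so the cost grows with the size of
    the whole archive rather than with the size of the deleted message.
    """
    new_data = bytearray()
    new_index = []
    for entry in index:
        if entry.get("deleted"):
            continue
        offset, length = entry["offset"], entry["length"]
        new_index.append({"offset": len(new_data), "length": length})
        new_data += data[offset:offset + length]
    return bytes(new_data), new_index

# Usage: three messages packed back to back in one blob.
data = b"aaabbbbcc"
index = [
    {"offset": 0, "length": 3},
    {"offset": 3, "length": 4},
    {"offset": 7, "length": 2},
]
delete_message(index, 1)                  # cheap: file is untouched
new_data, new_index = compact(data, index)  # expensive: full rewrite
```

Note that compaction renumbers the surviving messages, so anything that
refers to a message by its position in the index would also need updating.
With one file per message, a delete is just removing that file.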

> If the overhead is small enough compared to other stuff a web
> server has to do, it might be worth considering making the HTML generation
> into a request-time thing.  At the very least, it could be an option, so
> that the list administrator could decide whether or not to pre-generate
> the HTML.

Yes, I for one would appreciate at least keeping an option available for
not requiring on-the-fly processing.

> Part of the patch I sent to Kent was a change to make the remaining
> compile-time settings into config-file settings.  It would be even cooler
> if Hypermail (or a request-time Hypermail CGI utility) could support these
> options at request time.  This would allow *end-users* some control over
> how they view the archive.

This would be nice.  But it can be done just as easily and efficiently if
the messages reside in separate files, as compared to all stuck together in
a single file.

> Plus, it helps make a distinction between
> Hypermail's core data set (the mail archive and associated indexes) and
> the rendered output.
> 
>    -- Jared

Thanks for reading, it's just MHO,
--Byron Darrah
