'[sylpheed:25025] Re: plans for implementing new storage engines'


List:       sylpheed
Subject:    [sylpheed:25025] Re: plans for implementing new storage engines
From:       Stefaan A Eeckels <Stefaan.Eeckels () ecc ! lu>
Date:       2005-05-30 23:01:32
Message-ID: 20050531010132.07d60058.Stefaan.Eeckels () ecc ! lu

On Tue, 31 May 2005 02:19:53 +0400
Antony Dovgal <tony2001@phpclub.net> wrote:

> On Mon, 30 May 2005 23:25:28 +0200
> Stefaan A Eeckels <Stefaan.Eeckels@ecc.lu> wrote:
> 
> Well, I'm almost sure if I try on a workstation with a bit more of RAM and 
> a bit faster disks it would perform faster.
> But that's still annoying.

It's the _difference_ that's puzzling. In my case there is
no observable difference between opening a folder without
new messages, and one with them. In your case, new messages
cause a significant slowdown.

> > That's a particularly slow beast. MySQL is also amazingly slow
> > once you try to join two tables with 10,000+ records, etc. 
> 
> Try to use indexes for JOINs, you'll be surprised =)

I've real-world experience with MySQL, and indexes or not, it
still sucks rocks through two straws. But hey, it's cheap and
cheerful and it sure has its uses. I just wouldn't want to have
to install a database (any database) to run my MUA. 

> > If you go for a DB, use one of the non-SQL ones. 
> 
> FoxPro? =) Just kidding. 
> I see no reason to refuse SQL DBs just because *in theory* non-SQL DB 
> *may* be faster a bit.

No, but there's a better chance they're embeddable, and you
don't have to hassle with ODBC.  Berkeley DB runs on all Unix
platforms.

> That's exactly what happens when one uses some strange binary DB format.

When push comes to shove, all DBs use files in strange binary
DB formats.

> Right. But there are several solutions to emulate fulltext search.

Each of which is seriously more complex than grepping through
a bunch of files.
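
To make "grepping through a bunch of files" concrete, here is a
minimal Python sketch, assuming an MH-style folder with one message
per file. The function name grep_messages is made up for
illustration, and the match is a case-insensitive search of the raw
bytes, so text inside base64 or quoted-printable parts would not be
found.

```python
import os


def grep_messages(folder, needle):
    """Return the sorted names of files in `folder` whose raw bytes
    contain `needle` (case-insensitive)."""
    wanted = needle.lower().encode("utf-8")
    hits = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # skip sub-folders
        with open(path, "rb") as fh:
            if wanted in fh.read().lower():
                hits.append(name)
    return hits
```

That is a dozen lines with no index to build or keep in sync; the
price is a full read of every file on each search.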

> > Other problems that come to mind are:
> > - storing huge attachments
> 
> Well, we don't have to store them in the DB at all.
> I mean the files themselves.

No, but then you have a two-tiered storage system; that adds
complexity and the risk of a synchronisation problem between
the files in the file system and the database entries. 
 
> Just put the file in some directory (using an unique name) and write to the DB:
> a) original name
> b) CRC/MD5
> c) generated name
> d) unique id of the message

That looks like the sylpheed cache file to me :-)
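
For illustration, the scheme quoted above and the synchronisation
check it would need can be sketched in Python. This is an assumption
about how such a store might look, not anything Sylpheed does: the
record is a plain dict standing in for a database row, and the names
make_record and find_inconsistencies are invented for the example.

```python
import hashlib
import os


def make_record(store_dir, message_id, original_name, data):
    """Write the attachment under a generated name and return the
    row that would go into the database (a dict here)."""
    digest = hashlib.md5(data).hexdigest()
    generated = "%s-%s" % (message_id, digest)
    with open(os.path.join(store_dir, generated), "wb") as fh:
        fh.write(data)
    return {
        "original_name": original_name,
        "md5": digest,
        "generated_name": generated,
        "message_id": message_id,
    }


def find_inconsistencies(store_dir, records):
    """Return (generated_name, problem) pairs for rows whose file is
    missing or no longer matches the stored checksum."""
    problems = []
    for rec in records:
        path = os.path.join(store_dir, rec["generated_name"])
        if not os.path.isfile(path):
            problems.append((rec["generated_name"], "missing"))
            continue
        with open(path, "rb") as fh:
            if hashlib.md5(fh.read()).hexdigest() != rec["md5"]:
                problems.append((rec["generated_name"], "checksum mismatch"))
    return problems
```

The second function is the extra work a two-tiered store brings
along: something has to run it, and something has to decide what to
do with the result.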
> 
> > - checking signatures (you need the unmodified message to do that)
> 
> C'mon. That's easy too..

It should be; but one byte more or less and your signature is
off by miles. Checking the signature when the messages arrive
opens another can of worms. 
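
A small Python sketch of the "one byte more or less" problem. It
only shows that a digest changes when the stored bytes change (here
through line-ending normalisation); real OpenPGP or S/MIME
verification defines its own canonical form and does considerably
more than hash the file. The sample message is made up.

```python
import hashlib


def digest(raw):
    """Hex SHA-1 digest of the raw message bytes."""
    return hashlib.sha1(raw).hexdigest()


# The message as it arrived, with CRLF line endings.
original = b"Subject: test\r\n\r\nHello, world.\r\n"

# The same text after a store has "helpfully" converted CRLF to LF.
normalised = original.replace(b"\r\n", b"\n")

print(digest(original) == digest(normalised))  # prints False
```

Any storage engine that rewrites the message, however harmlessly,
has to be able to give back the exact bytes that were signed.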

> > - reconstituting the message for display
> 
> Full headers + message text = original message.
> Attachments are not so important, but it's still easy to implement.

Attachments are hugely important to me (and I guess quite a few
others too). Having access to the complete message is equally 
important. Plus, reconstituting the message makes the code more
complex and slower. I'd rather wait a few seconds when I activate
a folder than wait half a second when quickly reading through messages.

> > I can see lovely SQL to handle moving a message from one folder to
> > another when there are different tables for the attachments of
> > each folder :-)
> 
> Hm. What's the problem here ?
> Creating a global attachment list with links to the messages solves it in a second. 

Ah - but I responded to your suggestion to have a "database" 
per folder. 

Come to think of it, MIME messages do not have an identifiable
"body"; they're just a sequence of attachments (body parts in
X.400 parlance). Would you store these as individual files, in
their encoded format (required for signature purposes) or not?
How to handle encrypted messages? There's more than meets the eye.
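
To illustrate the "sequence of body parts" point, here is a short
Python sketch using the standard email package. The helper name
list_parts is made up, and treating a missing
Content-Transfer-Encoding header as "7bit" is the MIME default
rather than anything read from the message.

```python
from email import message_from_bytes


def list_parts(raw):
    """Return (content type, transfer encoding, filename) for each
    leaf body part of a raw message, in order of appearance."""
    msg = message_from_bytes(raw)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no content of their own
        parts.append((
            part.get_content_type(),
            part.get("Content-Transfer-Encoding", "7bit"),
            part.get_filename(),
        ))
    return parts
```

Nothing in the result singles out one part as "the body": the text
part and the attachment are just the first and second entries, each
with its own encoding that would have to be preserved.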

> > The message will have been written already through the filtering
> > process, so the time is spent either in scanning the directory 
> > for new messages, or re-creating the cache. 
> 
> The latter seems to be the most possible to me.

The cache doesn't get recreated when new messages are added
(just checked the code), the new messages get added and the
cache is marked dirty so it's written out when leaving the
folder. 
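
The behaviour described above is a plain dirty-flag cache. A minimal
Python sketch of the pattern follows; it is not Sylpheed's code or
cache format. The class name FolderCache, the JSON file and the
writes counter are assumptions made for the example.

```python
import json


class FolderCache:
    """In-memory list of message summaries, written back lazily."""

    def __init__(self, path):
        self.path = path
        self.entries = []
        self.dirty = False
        self.writes = 0  # number of times the file was written

    def add_message(self, summary):
        # Adding a message only touches memory and sets the flag.
        self.entries.append(summary)
        self.dirty = True

    def leave_folder(self):
        # The file is rewritten once, and only if something changed.
        if self.dirty:
            with open(self.path, "w", encoding="utf-8") as fh:
                json.dump(self.entries, fh)
            self.dirty = False
            self.writes += 1
```

With this pattern the cost of new messages is paid when leaving the
folder, not when opening it, so it would not by itself explain a
slowdown on opening.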

> > It might be that I don't notice a difference thanks to the Solaris
> > DNLC (Directory Name Lookup Cache).
> 
> No idea. It's quite possible that Solarises work with 
> their filesystem in totally different way than other OSes. 

Linux has quite a few caches surrounding file activity. Whether
the result matches Solaris UFS I don't know. 

I'll post information on folder scan times a bit later. Got a couple
of projects to finish first.

-- 
Stefaan
-- 
As complexity rises, precise statements lose meaning,
and meaningful statements lose precision. -- Lotfi Zadeh 
