'Re: [tux2-dev] tux2 patches'

List:       tux2-dev
Subject:    Re: [tux2-dev] tux2 patches
From:       <kinkie () kame ! usr ! dsi ! unimi ! it>
Date:       2001-07-10 20:07:03

On Tue, 10 Jul 2001, Daniel Phillips wrote:

> N.B., this is getting pretty far from Tux2 issues and more towards a
> defense of the basic Unix way of looking at files... so if that kind of
> thing gets you excited then read on.

Hehe :)

> On Monday 09 July 2001 22:57, kinkie@kame.usr.dsi.unimi.it wrote:
> > On Mon, 9 Jul 2001, Ian Wells wrote:
> > > > So to look at the various points:
> > > >
> > > >   - Permissions - checked in the vfs, not the filesystem
> > > >   - Directories - not really a problem any more, see my directory
> > > >     index work.
> > > >   - Filenames - why not, if it's fast.
> > > >   - Open by inode number - I can see why that might be useful.
> > > >     But what is the userland interface for that?
> > >
> > > In the squid case, taking the points above, we're basically saying
> > > that squid doesn't need protection or file naming (or, to add the
> > > other points that a kernel filesystem offers, multiple process
> > > access or executable files).
> >
> > Correct.
> > Squid has its own internal URL-to-filename mapping. Filenames in
> > squid are named 000000001 to FFFFFFFF (I might have the number of
> > digits wrong, but you get the idea).
>
> This gives 2^32 possible names.  I don't see how you would resolve this
> without using a hash table or similar mapping.  In other words, it
> walks and talks like a name lookup, so why not use the mechanism that
> exists?

Squid has an index of all the in-cache objects, in a hash table. I don't
think it's going to change soon (there was some experimental code,
named "butterfly", which avoided this by hashing and looking up on disk,
but it was more or less discarded because it was seen as non-optimal).
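For readers less familiar with the setup, here is a minimal sketch of the kind of in-memory index described above. It is hypothetical, not Squid's code: the class name `CacheIndex`, keying objects by the MD5 digest of the URL, and the naive sequential file-number allocator are all assumptions made for illustration. What it shows is that once the URL-to-number mapping lives in RAM, finding an object costs no disk access; only reading its file does.

```python
import hashlib
from typing import Dict, Optional


class CacheIndex:
    """Hypothetical in-memory index mapping cached URLs to file numbers.

    A Python dict is a hash table, so a lookup costs no disk I/O at all;
    only reading the object's file afterwards does.
    """

    def __init__(self) -> None:
        self._index: Dict[bytes, int] = {}
        self._next_filn = 0  # naive allocator; a real cache would reuse freed numbers

    @staticmethod
    def key(url: str) -> bytes:
        # Assumption for illustration: key each object by the MD5 digest of its URL.
        return hashlib.md5(url.encode("utf-8")).digest()

    def add(self, url: str) -> int:
        """Assign the next free file number to url (replacing any old entry)."""
        filn = self._next_filn
        self._next_filn += 1
        self._index[self.key(url)] = filn
        return filn

    def lookup(self, url: str) -> Optional[int]:
        """Return url's file number, or None on a cache miss."""
        return self._index.get(self.key(url))

    def file_name(self, url: str) -> Optional[str]:
        """Return url's on-disk name as 8 uppercase hex digits, or None."""
        filn = self.lookup(url)
        return None if filn is None else f"{filn:08X}"

    def remove(self, url: str) -> None:
        """Forget url; removing an unknown URL is a no-op."""
        self._index.pop(self.key(url), None)
```

With this sketch, `idx.add(url)` followed by `idx.file_name(url)` yields a name such as `00000000`, and a miss simply returns `None` without touching the disk.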

> > And you're right, it doesn't need
> > multiple process access (in fact, if it was doing it, it would be
> > fatal for the squid cache coherency algorithms),
>
> So how are you going to design SMP squid?

It is not planned, except for very limited purposes. For instance, squid
right now can use threads or pipe-driven helper processes to access the
disk (thus avoiding blocking operations). Too much in squid is
non-reentrant to change this. Besides, since Squid is extremely memory- and I/O-driven,
it is entirely likely that lock contention and management
would be a killer (right now one of the biggest CPU hogs in squid is poll
fdset maintenance). That is to say, Squid isn't going SMP any time soon, if
ever, except for very limited tasks (currently: disk I/O, user
authentication, redirection).

> > nor executable files.
>
> I'll give you that, but what does it cost to keep the bit around and
> never use it?

I'm not worrying about wasting one bit on disk, but on the (admittedly
few) cycles spent in checking it :)

> > If the fs itself doesn't protect the data, it needs to be
> > protected in upper layers, i.e. by mounting it in a tree not
> > accessible to the general public.
>
> There's no question about it, any filesystem in the Linux tree is going
> to have to play by the security rules.
>
> > > Sounds to me like it doesn't need a kernel filesystem at all, since
> > > we've just discarded the advantages granted a filesystem by being
> > > in a kernel.
>
> Um, the page cache?  Block layer?  LRU page replacement?  And the ones
> above, name resolution (sure the name is a 32 bit integer, it's about
> the same amount of work) and security (no, I don't want to run Squid if

I'm not really sure about that.
Keep in mind that Squid's disk usage patterns are (or at least should be)
terrible for LRU-based caching systems. Hot objects (i.e. those accessed very
frequently) are kept in RAM, so they are not fetched from the disk. The disk
has a very big working set, which is accessed very sparsely since hot
objects have already been served from memory. In fact, it is entirely likely that
future releases of Squid will use O_DIRECT, effectively bypassing the page
cache: Squid has more metadata about its objects than the kernel does, so it can cache more
effectively.
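A toy model of the point above, under simplifying assumptions that are mine and not Squid's: hot objects 0-9 are always served from the application's RAM, cold objects are swept cyclically over a set much larger than the page cache, and the page cache is a plain fixed-size LRU. All names (`LRUCache`, `interleaved_requests`, `disk_hit_rate`) are hypothetical. Real workloads and the real page cache are messier; this only illustrates the textbook LRU weakness with access sweeps larger than the cache.

```python
from collections import OrderedDict
from typing import Iterable, Iterator, Set


class LRUCache:
    """Minimal LRU cache of block numbers that just counts hits and misses."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[int, None]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, block: int) -> bool:
        """Touch block; return True on a hit, False on a miss."""
        if block in self._entries:
            self._entries.move_to_end(block)  # now the most recently used
            self.hits += 1
            return True
        self.misses += 1
        self._entries[block] = None
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used
        return False


def interleaved_requests(n: int, hot: int = 10, cold: int = 1000) -> Iterator[int]:
    """Alternate a hot object (0..hot-1) with a cyclic sweep over cold ones."""
    for i in range(n):
        yield i % hot
        yield hot + (i % cold)


def disk_hit_rate(requests: Iterable[int], ram_hot: Set[int], page_cache_blocks: int) -> float:
    """Hit rate of an LRU page cache that only sees requests the
    application could not serve from its own RAM (the ram_hot set)."""
    cache = LRUCache(page_cache_blocks)
    for obj in requests:
        if obj in ram_hot:
            continue  # served from application memory; the disk never sees it
        cache.access(obj)
    total = cache.hits + cache.misses
    return cache.hits / total if total else 0.0


if __name__ == "__main__":
    reqs = list(interleaved_requests(5000))
    print(disk_hit_rate(reqs, ram_hot=set(), page_cache_blocks=100))           # 0.499
    print(disk_hit_rate(reqs, ram_hot=set(range(10)), page_cache_blocks=100))  # 0.0
```

In this model the page cache only scores hits (about half of all requests) while the hot objects can still reach it; once the application's RAM absorbs them, the disk-level hit rate drops to zero, and the page cache is pure overhead.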

> somebody can use it to root my box).
>
> All that being said, I do think that the whole Squid cache mechanism
> could conceivably be moved into user space, but some of the mechanism
> for doing it isn't in place yet.  (AIO, zero-copy disk access, the
> first two I thought of.)  This would be a worthwhile experiment, though
> it's no one-day hack.

Squid 3.0 is being planned right now. It will be a very different beast
from Squid as it is now. It will have every kind of AIO you can think of, and it
will have a very efficient and modularized backend. Unfortunately it will
take a long time to get there :)

> [...]
> > > I suppose this *is* a troll, of sorts, but it's worth remembering
> > > that adapting privileged code to suit non-privileged code is
> > > sometimes a case of the tail wagging the dog.
> >
> > It might very well be the case, and I don't consider this a troll at
> > all. You're perfectly right, but it is extremely unlikely that squid
> > will get its own transactional filesystem any time soon, or get as
> > low-level as to implement elevator algorithms or other
> > hardware-related techniques. Tux2 might very well improve squid's
> > performance already; the point of my request is that maybe we can
> > squeeze out a few more percent if we "trim the fat".
>
> The big improvement will come from using my directory index patch.  I
> seriously doubt that it will go faster than Ext2 (+index).  I'll be
> happy if it doesn't measurably slow it down.  Oh, there may be some
> advantage to be gained by optimizing the placement of inodes, again we
> will have to wait and see.

Sure :) In fact, I can't wait.
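A toy comparison of an unindexed directory scan against a hashed index, to show where the win comes from when one directory holds many files (as a squid cache directory does). This is not the directory index patch's on-disk format: the bucket count, the MD5-based bucket function and all names (`linear_lookup`, `HashedDirectory`) are assumptions for illustration. Each lookup returns the inode plus the number of entries it had to examine.

```python
import hashlib
from typing import List, Optional, Tuple

Entry = Tuple[str, int]  # (file name, inode number)


def linear_lookup(entries: List[Entry], name: str) -> Tuple[Optional[int], int]:
    """Unindexed directory: scan every entry in order.
    Returns (inode or None, number of entries examined)."""
    for examined, (entry_name, ino) in enumerate(entries, start=1):
        if entry_name == name:
            return ino, examined
    return None, len(entries)


class HashedDirectory:
    """Toy hashed index over directory entries (not a real on-disk format)."""

    def __init__(self, entries: List[Entry], buckets: int = 256) -> None:
        self.buckets: List[List[Entry]] = [[] for _ in range(buckets)]
        for entry in entries:
            self.buckets[self._bucket(entry[0])].append(entry)

    def _bucket(self, name: str) -> int:
        # Stable hash (unlike Python's per-process randomized hash()),
        # which is what an on-disk index would need.
        digest = hashlib.md5(name.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "little") % len(self.buckets)

    def lookup(self, name: str) -> Tuple[Optional[int], int]:
        """Hash to one bucket, then scan only that bucket."""
        return linear_lookup(self.buckets[self._bucket(name)], name)


if __name__ == "__main__":
    entries = [(f"{i:08X}", i) for i in range(10000)]
    index = HashedDirectory(entries)
    print(linear_lookup(entries, "0000270F"))  # (9999, 10000)
    print(index.lookup("0000270F")[0])         # 9999
```

For the last of 10,000 entries the linear scan examines all 10,000, while the hashed lookup only scans a single bucket of roughly 10,000/256 entries.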

> > The NetCache products (owned
> > by Network Appliance) stem from Harvest Cache (which is
> > squid's ancestor), and their software exploits and controls ALL of the
> > system. For storage, they use WAFL (or was it WAFS?) directly on the
> > disks.
>
> WAFL is just a filesystem.
>
> OK, here is my suggestion: go to your source tree and comment out all
> the security stuff and whatever other disposable parts you can find.
> Then make yourself a new system call that treats the first few name
> characters as a hex inode number.  Run some benchmark comparisons, post
> a table and end the speculation.  We are talking mini-micro hack here.

Mini-micro and totally out of my league.
Also, it calls for a proper benchmark setup (Polygraph), which is beyond
my means (lack of time, and it's a somewhat complex setup).
Since I can't show code, I'll just shut up and wait for a release of the
full filesystem, then we'll see :)
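For what it's worth, the name-parsing rule behind the hack suggested above could look something like this, written in Python rather than kernel C for brevity. Everything here is hypothetical: the 8-digit prefix length, `inode_from_name`, `resolve`, and the dictionaries standing in for a directory and the inode table. A real in-kernel version would still have to go through the normal permission checks.

```python
import string
from typing import Dict, Optional

INODE_PREFIX_DIGITS = 8  # hypothetical: the "first few name characters"


def inode_from_name(name: str, digits: int = INODE_PREFIX_DIGITS) -> Optional[int]:
    """Return the inode number encoded as hex in the first `digits`
    characters of name, or None if the name does not start that way."""
    prefix = name[:digits]
    if len(prefix) != digits or not all(c in string.hexdigits for c in prefix):
        return None
    return int(prefix, 16)


def resolve(name: str, directory: Dict[str, int], inode_table: Dict[int, bytes]) -> Optional[int]:
    """Fast path: use the inode number in the name directly, skipping the
    directory search; otherwise fall back to an ordinary name lookup.
    (A real kernel version must still apply the usual permission checks.)"""
    ino = inode_from_name(name)
    if ino is not None and ino in inode_table:
        return ino
    return directory.get(name)
```

Names that do not start with a valid, existing inode number simply take the ordinary lookup path, so normal files keep working alongside the fast path.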

-- 
  /Kinkie

If the box says "For Windows 95 and higher", it should
run under Linux, right?

_______________________________________________
tux2-dev mailing list
tux2-dev@list.innominate.org
http://innominate.org/mailman/listinfo/tux2-dev
