I don't weigh in too often on these threads, but I'd like to make a few
points (feel free to flame :) ).

In general, metadata-based filesystems are not terribly user-friendly;
worse, adding a database into the mix can produce a disaster.

First and most important point:

- Metadata does not scale well. It's like having a desk in an office and
filling it with pieces of paper. It works fine if you have a limited
quantity of pieces of paper, but for business and for regular home use over
a period of time, you do need organisation. In the real world, this takes
the form Room->Filing Cabinet/Box->Folder->Document. It's a normal
hierarchical file system. That's not to say that a relational look-up can't
help with a few things, but it can't be used for everything.

Critical considerations:

- Metadata searching is very, very expensive in processing terms, and it
gets worse as the document set grows. Moreover, you are limited by I/O
throughput.

- Using a database for metadata means that you have to have the service
running constantly in the background. Security considerations should be
inserted here. Moreover, it is taking up cycles and a very significant chunk
of memory. The more files, the worse the footprint. SQL dbs are huge,
really, especially when you add the interpretation stage for SQL on top of
the myriad of different functions they have to perform. You could write a
completely fresh, light-weight DB for this, if you liked, but it is hardly
ideal, and we'd spend two or three years fixing bugs and security holes in
it.

- Namespace collisions. Ever run across a problem where you had to work
with documents with the same name, one for one purpose and one with a few
changes for another, in another folder (clearly marked as the other purpose,
hopefully!)? The system will have to display many results and give the user
a clear indication of where each file is located. Moreover, often users
will identify files based upon other files in the same location. Metadata
can compensate for this, but it isn't easy.

- The user or applications must specify metadata for different files
constantly and consistently. We can't get MIME types to always work
effectively, so how do we get all applications to play nicely with this?

- Users are used to thinking in a hierarchical manner. They don't especially
want to Google for their data and then have to manually sift through the
results for the right one.
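
A minimal sketch of the namespace-collision point above, in Python. The file
paths, the in-memory index and the helper names (build_name_index,
find_by_name) are all hypothetical and only there for illustration; this is
not how any existing KDE or filesystem component works. It shows that a
lookup by name alone returns several candidates, and that the containing
folder is what tells them apart.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical file set: two reports share a name but live in different folders.
paths = [
    "/home/user/clients/acme/report.doc",
    "/home/user/clients/initech/report.doc",
    "/home/user/personal/budget.xls",
]


def build_name_index(paths):
    """Map each bare file name to every full path that carries that name."""
    index = defaultdict(list)
    for p in paths:
        index[PurePosixPath(p).name].append(p)
    return index


def find_by_name(index, name):
    """Return all candidate paths for a name, sorted; may be more than one."""
    return sorted(index.get(name, []))


index = build_name_index(paths)
matches = find_by_name(index, "report.doc")
for m in matches:
    # The parent folder is the only thing distinguishing the two hits.
    print(PurePosixPath(m).parent, "->", PurePosixPath(m).name)
```

With a flat, name-only view both report.doc files come back, so any search
front end still has to show the hierarchy to make the result usable.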

With all of that having been said (and there are other arguments), I do agree
that metadata can be valuable for augmenting existing search facilities. In
order to do this effectively and within a reasonable time, however, it needs
to be done at the filesystem level, not by a high-level system embedded in
KDE. ReiserFS 4 should allow us to do these things with relatively little
system slowdown, which is why I look forward to it so avidly. Smarter
filesystems are the key, really, but not SQL database-based FSs.

I have been thinking that someone might float an idea like this for a while
now, and it shouldn't surprise me that the Gnome folks decided to blink
first and give it a go. It was evident as soon as Microsoft mooted it as a
feature in Longhorn, and it was also evident that the problem was
non-trivial and that there were a large number of very critical issues.

The point most worth making, however, is that Windows is stuck at a point
where they seemingly can't develop further in application or DE
usability/functionality and they're going for frills. To be frank, though,
most users don't have a difficult time finding their files (assuming they
understand the concept of directories, that is). They spend 99% of their
time sitting staring at an application trying to do work of some kind. I
believe that there is significant innovation taking place in other aspects
of the KDE project and applications, and that efforts are better served
focusing there. When Reiser 4 turns up, this stuff will be trivial to
implement, but until then, it's duplication of that effort.

To be honest, there are a number of areas where genuine progress can be made
to make a real difference, almost immediately. Merging of instant messaging
with PIM components is a start. Extending the KDE Notes system so that notes
can be attached to any document in any application, and are reopened when
the same program opens that document again, would be a great improvement to
workflow.

If you really want to try something radical, implement this idea I floated
with the Slicker folks (they seem to be dead atm):

http://www.linuxcomment.com/slicker.htm

KDE is already very strong in network transparency, but adding in
collections of documents, contacts, messages, services/web services and
other data on a project basis would be a powerful productivity tool.

At the moment, metadata searching really isn't ready for primetime, and the
benefits are dubious. Don't let me stop you, but beware! :)

-Luke

----- Original Message ----- 
From: "Stefano Borini" <munehiro@ferrara.linux.it>
To: <kde-devel@kde.org>
Sent: Sunday, September 07, 2003 12:06 PM
Subject: Re: Storage implementation in KDE


> On Sat, Sep 06, 2003 at 12:41:13PM -0700, David wrote:
> > Someone has to categorize these files.
>
> the user, of course, but also some metadata already present in the
> inserted file, such as the id3 for mp3 or the text itself for documents
> of whatever nature.
>
> > If the user isn't already categorizing their files, a system that
> > expects them to do so every time a file is created or acquired isn't
> > going to help.
>
> let the user insert keywords instead of filenames. Searching files in
> this filesystem should become similar to searching with a search engine.
> If there are too many matches, the ioslave should categorize them with
> the other, unselected keywords, or by alphabetical index. i.e. i search
> "love songs" and the system finds 600 files. Then it displays on the
> screen "love songs (A)" "... (B)" etc... or better "love songs from
> Artist1" "... from Artist2" etc...
>
> About the keyword, they should be left totally arbitrary, in the way
> similar to ldap. The filename is not a name, but a bunch of metadata
> information.
>
> Problems arise to give the user the ability to understand that this
> filesystem and the traditional filesystem (which is needed, at least at
> the system level) are different and behave in a very different way.
>
> > I think a better way to approach this is mapping a filesystem API on top
> > of a traditional database. A dbfs in other words. The latter has the
> > advantage of categorizing your mp3 collection without being intrusive.
> > It could also be compatible with storage, should that be necessary.
>
> i'm strongly for this approach, and since i have a deep study of
> postgres features i propose myself as a contributor if help is needed.
>
> > (heck, what we really need are OS/2 HPFS metadata attributes).
>
> i have no infos about this, since i've never experienced OS2. can you
> provide some link ?
>
>
>
>
> >> Visit http://mail.kde.org/mailman/listinfo/kde-devel#unsub to unsubscribe <<

 