'Re: Adding improved file search to KDE'


List:       kde-devel
Subject:    Re: Adding improved file search to KDE
From:       Wolfgang Müller <Wolfgang.Mueller2 () uni-bayreuth ! de>
Date:       2004-06-16 7:24:25
Message-ID: 200406160724.25816.wolfgang.mueller2 () uni-bayreuth ! de

Last things first:

> > Any thoughts?

Sounds a lot like the wishlist of the fer-de-lance project, which has not taken 
off so far and which brought you kmrml ;-) (i.e. Carsten did).
I have thought quite a lot about what I'd do in your place ;-), as I wanted to 
be part of it if fer-de-lance took off. However, my job has taken me 
elsewhere. In any case, I would like to take part in your discussion.

In fact, context-based search (e.g. 
http://www.bradleyrhodes.com/Papers/remembrance.html) is quite an ambitious 
area of research. So is content-based multimedia retrieval :-). So what 
you're suggesting is not unheard of, but it is hard to do. However, if KDE did 
this, it would be a Really Kool Thing (TM).

> > Now for talk of today. Right now, the only tool that does any file
> > indexing is "locate". It does a terrible job at that. It only indexes
> > file names. It also doesn't update but once a night.

I think there are quite a few :-), but their integration is really bad.

> > Indexes will be maintained by a PostgreSQL database instance. At first,
> > we will index only the files in the user's home directory. Then we can
> > branch out and index every file everywhere. How do we keep an accurate
> > and timely index? There are two ways:

In fact, PostgreSQL might be a good place to start, and it is useful for 
metadata (e.g. a few keywords). However, if you want fast full-text or 
content-based indexing, I think it would be better to write your own server. 
Maybe this idea gives you the pip, but things will be much faster, and they 
will (generally) be easier to install.

You don't need transactions. You don't want full ACID semantics; you want an 
index structure that works, that's all. Things will be much faster if you can 
read the necessary data sequentially instead of having to seek after every 
block read.
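To make the sequential-read point concrete, here is a minimal Python sketch of an inverted index in which each term's posting list is stored as one contiguous run on disk, so a one-term query costs a single seek plus one sequential read. The on-disk layout (little-endian 32-bit document ids, a term directory kept in memory) and all function names are assumptions for illustration only; this is not GIFT's actual format.

```python
import os
import struct
import tempfile
from collections import defaultdict
from typing import Dict, List, Tuple

def build_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Map each term to the sorted ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def write_index(index: Dict[str, List[int]], path: str) -> Dict[str, Tuple[int, int]]:
    """Write each posting list as one contiguous run of 32-bit ints.

    Returns the term directory (term -> (byte offset, count)); a real server
    would keep this small table in memory.
    """
    directory = {}
    with open(path, "wb") as f:
        for term in sorted(index):
            ids = index[term]
            directory[term] = (f.tell(), len(ids))
            f.write(struct.pack(f"<{len(ids)}I", *ids))
    return directory

def lookup(path: str, directory: Dict[str, Tuple[int, int]], term: str) -> List[int]:
    """Answer a one-term query with a single seek and one sequential read."""
    if term not in directory:
        return []
    offset, count = directory[term]
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(4 * count)
    return list(struct.unpack(f"<{count}I", data))

# Demo: index three tiny "documents" and query one term.
demo_path = os.path.join(tempfile.mkdtemp(), "postings.bin")
demo_dir = write_index(build_index({1: "kde file search", 2: "file index", 3: "image search"}), demo_path)
print(lookup(demo_path, demo_dir, "search"))  # [1, 3]
```

A real server would also compress the posting lists and merge several of them for multi-term queries; the sketch only shows that no per-block seek is needed once the data for a term sits together on disk.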

[Anyone who wants to build on, generalize or reimplement the GNU Image Finding 
Tool to this end is very welcome to contact me and/or 
mailto:help-gift@gnu.org]. 

> > (1) We get notified when a file is modified. There is no way to do this
> > at the kernel level (yet!) but the KDE API allows us to put our own hooks
> > in. Most applications will use the KDE API to load and store files.

I think this is a good idea. It would be a great first step if KDE forwarded 
every document created by a KDE app to some indexer. The indexer could then 
use one of your great KDE-style plugin architectures to accommodate diverse 
data/media types and diverse kinds of metadata.
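A minimal Python sketch of such a save hook with per-MIME-type plugin dispatch. Everything here is hypothetical: `Indexer`, `document_saved()` and the `register()` decorator are illustrative names, not an existing KDE API, and the plain-text plugin only counts words as a stand-in for real metadata extraction.

```python
import os
import tempfile
from typing import Callable, Dict, List

# A plugin takes a file path and returns metadata as key/value strings.
Extractor = Callable[[str], Dict[str, str]]

class Indexer:
    """Toy indexer that hands saved documents to per-MIME-type plugins."""

    def __init__(self) -> None:
        self._plugins: Dict[str, List[Extractor]] = {}
        self.records: Dict[str, Dict[str, str]] = {}

    def register(self, mime_type: str) -> Callable[[Extractor], Extractor]:
        """Decorator that registers an extractor plugin for one MIME type."""
        def decorator(func: Extractor) -> Extractor:
            self._plugins.setdefault(mime_type, []).append(func)
            return func
        return decorator

    def document_saved(self, path: str, mime_type: str) -> None:
        """Hypothetical hook a KDE application would call after saving a file."""
        metadata = {"mime_type": mime_type}
        for extract in self._plugins.get(mime_type, []):
            metadata.update(extract(path))
        self.records[path] = metadata  # replaces any older record for this path

indexer = Indexer()

@indexer.register("text/plain")
def plain_text_words(path: str) -> Dict[str, str]:
    """Stand-in plugin: count whitespace-separated words."""
    with open(path, encoding="utf-8") as f:
        return {"word_count": str(len(f.read().split()))}

# Demo: "save" a text file and let the indexer pick it up.
demo = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(demo, "w", encoding="utf-8") as f:
    f.write("improved file search for KDE")
indexer.document_saved(demo, "text/plain")
print(indexer.records[demo])  # {'mime_type': 'text/plain', 'word_count': '5'}
```

Keying plugins by MIME type means a new media type only needs a new registered extractor; the hook that applications call stays the same.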

What KDE/Qt might consider is adding context information to the whole GUI. 
Knowing which application saved which file is only the first step. It might 
also be interesting to know which other files were open, which operations the 
user wants to perform, etc.
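One way to picture this is a context record stored with each save event, from which related files can be looked up later. The fields (`application`, `other_open_files`, `user_action`) and the `related_files()` helper are hypothetical, chosen only to mirror the examples in the paragraph above; the application and file names in the demo are illustrative.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class SaveContext:
    """Hypothetical context recorded each time an application saves a file."""
    path: str
    application: str
    other_open_files: List[str] = field(default_factory=list)
    user_action: str = ""
    saved_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def related_files(events: List[SaveContext], path: str) -> List[str]:
    """Files that were open together with `path` in any recorded save event."""
    related = set()
    for event in events:
        if event.path == path:
            related.update(event.other_open_files)
        elif path in event.other_open_files:
            related.add(event.path)
    related.discard(path)
    return sorted(related)

events = [
    SaveContext("report.kwd", "kword", ["data.csv", "chart.png"], "save"),
    SaveContext("data.csv", "kspread", ["report.kwd"], "export"),
]
print(related_files(events, "report.kwd"))  # ['chart.png', 'data.csv']
```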

> > (2) We browse the filesystem, keeping resource usage to a minimum,
> > looking for modified files.

Yes, this could be a cron job.
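A minimal Python sketch of such a cron job, assuming the indexer keeps a hypothetical stamp file holding the time of its last pass. It walks a directory tree, prunes hidden directories to limit work, and reports regular files whose mtime is newer than that stamp. The stamp file location and function names are assumptions.

```python
import os
import stat
import tempfile
import time
from typing import Iterator, List

def modified_since(root: str, since: float) -> Iterator[str]:
    """Yield regular files under `root` whose mtime is newer than `since`."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories (caches, config) in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished between listing and stat
            if stat.S_ISREG(st.st_mode) and st.st_mtime > since:
                yield path

def run_scan(root: str, stamp_file: str) -> List[str]:
    """One cron-job pass: list changed files, then record this pass's start time."""
    started = time.time()
    try:
        with open(stamp_file) as f:
            since = float(f.read())
    except (OSError, ValueError):
        since = 0.0  # first run (or unreadable stamp): treat everything as changed
    changed = sorted(modified_since(root, since))
    with open(stamp_file, "w") as f:
        f.write(repr(started))
    return changed

# Demo on a throwaway tree; the stamp file lives outside the scanned root.
demo_root = tempfile.mkdtemp()
demo_stamp = os.path.join(tempfile.mkdtemp(), "last-run")
new_file = os.path.join(demo_root, "notes.txt")
open(new_file, "w").close()
first_pass = run_scan(demo_root, demo_stamp)
print(first_pass)
```

Recording the start time (rather than the end time) of each pass means that files modified while a scan is running are picked up again on the next pass. Scheduling the job from cron under `nice` would help keep its resource usage down.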

I hope you'll be successful.

Cheers,
Wolfgang
