
List:       kde-usability
Subject:    Re: Easier Searching in KDE
From:       Manuel Amador <rudd-o () amautacorp ! com>
Date:       2004-06-09 18:38:00
Message-ID: 1086806279.30180.5.camel () localhost ! localdomain


On Mon, 2004-06-07 at 18:11, Gustavo Sverzut Barbieri wrote:

> > I really don't like the way the windows search works: It generates results
> > slowly, and you don't really have an idea of how much is left to search
> > over. I want results right away, like Google. I want the results catalogued
> > and listed by relevance and user preferences.
> 
> The problem with Google-style caching is:
> 	- we don't have a cluster that can keep updating indexes frequently
> 	  without impacting normal use of the computer
> 	- loss of sync between the cache and the real system. It really matters
> 	  when you are working in your home dir, where you may
> 	  create/delete/move files really fast.

I have done some calculations with some code. Assuming you have a file
indexing daemon that uses a notification system and extracts metadata
(plus a small portion of the data, say, for a full-text index), you'd
be looking at one or two seconds of indexing per file at a niceness
level of 20.
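
A minimal sketch of the per-file work described above, assuming the
daemon only needs stat() metadata, a guessed MIME type and the first few
kilobytes of text. The names extract_record, lower_priority and
SNIPPET_BYTES are made up for illustration, and the notification system
that would call extract_record is not shown.

```python
import mimetypes
import os

SNIPPET_BYTES = 4096  # assumed size of the "small portion of data"


def extract_record(path):
    """Collect cheap metadata plus a short text snippet for one file."""
    st = os.stat(path)
    mime, _encoding = mimetypes.guess_type(path)
    with open(path, "rb") as fh:
        # Decode leniently: binary files simply yield replacement characters.
        snippet = fh.read(SNIPPET_BYTES).decode("utf-8", errors="replace")
    return {
        "path": os.path.abspath(path),
        "size": st.st_size,
        "mtime": st.st_mtime,
        "mime": mime or "application/octet-stream",
        "snippet": snippet,
    }


def lower_priority():
    """Raise this process's niceness so indexing yields to other work.

    os.nice() adds the increment to the current niceness and the kernel
    clamps the result to the highest value it supports.
    """
    if hasattr(os, "nice"):  # not available on every platform
        os.nice(19)
```

A daemon along these lines would call lower_priority() once at startup
and extract_record() for each file the notification system reports.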

That ain't slow.  Coupled with a register/notification
(observer/observable pattern for queries) system for a hypothetical
search daemon, you could add a word "Casaperrogato" to one of your
OpenOffice documents, and see it appear in a search window in approx. 10
seconds.

That's much more immediate than Windows search, and it is possible.
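
To make the register/notification idea concrete, here is a toy
in-memory sketch. SearchDaemon, register and file_indexed are
hypothetical names, the "index" is just a word-to-paths dictionary, and
removing words that disappear from a file is left out.

```python
from collections import defaultdict


class SearchDaemon:
    """Toy search daemon with standing (registered) queries."""

    def __init__(self):
        self.index = defaultdict(set)       # word -> set of paths
        self.observers = defaultdict(list)  # word -> list of callbacks

    def register(self, word, callback):
        """Register a standing query and return the current matches."""
        key = word.lower()
        self.observers[key].append(callback)
        return sorted(self.index.get(key, ()))

    def file_indexed(self, path, text):
        """Called by the indexer after it has (re)read a file."""
        for word in {w.lower() for w in text.split()}:
            is_new = path not in self.index[word]
            self.index[word].add(path)
            if is_new:
                # Notify only on new matches, so reindexing an unchanged
                # file does not produce duplicate results.
                for callback in self.observers.get(word, ()):
                    callback(word, path)
```

A search window would call register("Casaperrogato", show_result) once
and then receive a callback as soon as the indexer reports a file that
contains the word, without polling or rerunning the query.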

For the problem of moving files, ideally you'd store an MD5 hash of
each file as the primary key in the index, so that files which get moved
around are recognized instead of reindexed.  Alternatively (or as the
primary mechanism), the indexing daemon could attach an extended
attribute named, say, "File ID" to each file.  It would act as an object
ID, so the search daemon would recognize the file anywhere on your FS
(even over NFS), regardless of its contents.
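
A rough sketch of the hash-as-primary-key variant, with hypothetical
names (file_digest, HashKeyedIndex, observe). One known limitation: two
files with identical contents share a key here, which a per-file ID
stored in an extended attribute would avoid. The extended-attribute
variant is not shown.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


class HashKeyedIndex:
    """Index keyed by content digest, so a moved file keeps its entry."""

    def __init__(self):
        self.by_digest = {}  # digest -> {"path": ..., "record": ...}

    def observe(self, path, build_record):
        """Handle a file seen at path.

        Returns "indexed" for new content, "moved" when known content
        shows up under a new path, and "known" when nothing changed.
        build_record is only called for new content.
        """
        digest = file_digest(path)
        entry = self.by_digest.get(digest)
        if entry is None:
            self.by_digest[digest] = {
                "path": path,
                "record": build_record(path),
            }
            return "indexed"
        if entry["path"] != path:
            entry["path"] = path  # update the location, skip reindexing
            return "moved"
        return "known"
```

With this layout a rename costs one pass over the file to hash it, but
the expensive metadata and text extraction is skipped.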

This could actually spawn a completely new filing paradigm, based on
objects and categories/contents, instead of subdirectories.

> 
> 
> > > 		- cache previous results, maybe they're used again soon since users
> > > often refine their search
> >
> > Unfortunately, this doesn't help the first search. The first search is
> > always more important than subsequent searches, in my opinion.
> 
> I agree... but it's better than having it slow every time.
> 
> 
> > > 		- change kio_slaves to update a db every time a file is modified...
> > > with that we can have something fast for user and better sync'ed than
> > > slocate. The problem is other apps, like gnome or openoffice.
> >
> > This may slow down the speed at which updates happen.
> 
> Do you think so?
> I'm saying that when you save your files or delete them, the index must be 
> updated. But that only helps KDE apps.
> 
> 
> > > 	All of them have problems. The real problem I see is with home dir...
> > > it's the part of the system that changes most and in short periods of
> > > time, probably the user changes and then search... that common case makes
> > > life difficult and should be optimized.
> >
> > I have been doing some work with Materialized Views in PostgreSQL. Here is
> > what I think will work with KDE.
> >
> > Google type search. Everything on disk is indexed. With an 80GB hard drive,
> > it's not a problem to have everything indexed in multiple ways. The search
> > should take a couple of seconds, max. Let the OS worry about in-memory
> > caching, etc... If there isn't enough room for the index, then we should
> > provide a weaker system like Windows where the search is done in real time,
> > only the most important files are indexed, and recent search results are
> > stored.
> 
> Fair.
> 
> 
> > File Importance. Files in the home directory, files modified or viewed
> > frequently by the user, files in the favorites list, etc, are more
> > important than system files or log files or cache files. They should be
> > listed first and indexed first.
> >
> > Creating the index is the problem. A backend process should be constantly
> > running at a low priority. Initially, it indexes all the files. Then, it
> > begins to index files as they change. It always keeps a fresh index of the
> > most important files, and gets around to less important files when it has
> > the time.
> 
> That doesn't work.
> 	- a process accessing the disk all the time will interfere with the
> 	  OS disk cache.
> 	- IMHO, one may want to find the last file he/she accessed/saved/created
> 	  rather than anything else. So these will be the most important... and
> 	  probably will not be indexed yet.
> 
> 
> > When a search is made, it is run completely against the index. Sure, the
> > index may be out-of-date, but the most important files will be indexed
> > almost as soon as they are modified, and the least important files may
> > never get indexed, so the results are usually relevant.
> >
> > Metadata is critical. How do we accumulate the meta data? How is it stored?
> > I don't have the slightest idea in this department. Already, we have
> > permissions, location, access/modification times, and MIME type that give
> > us good meta data. But I would like additional meta-data specified by the
> > user, like "project" or "author" or "comments". Importance is another piece
> > of meta-data that is useful.
> >
> > I don't think a real database (PostgreSQL or MySQL) will be appropriate for
> > the database part of the tool. The requirements for the index cache are
> > very different than what PostgreSQL provides. I do believe that we can
> > steal a lot of ideas from the database community on how to search vast
> > indexes efficiently. Perhaps early implementations will rely on a database,
> > but that should be temporary.
> 
> I also don't think so.
-- 
	Manuel Amador
	Head of R&D                         +593 (9) 847-7372
	Amauta                     http://www.amautacorp.com/
	GNU Privacy Guard key ID: 0xC1033CAD at keyserver.net


_______________________________________________
kde-usability mailing list
kde-usability@kde.org
https://mail.kde.org/mailman/listinfo/kde-usability

