From kde-usability Wed Jun 09 18:38:00 2004
From: Manuel Amador
Date: Wed, 09 Jun 2004 18:38:00 +0000
To: kde-usability
Subject: Re: Easier Searching in KDE
Message-Id: <1086806279.30180.5.camel () localhost ! localdomain>
X-MARC-Message: https://marc.info/?l=kde-usability&m=108681043602458

On Mon, 07-06-2004 at 18:11, Gustavo Sverzut Barbieri wrote:
> > I really don't like the way the Windows search works: It generates results
> > slowly, and you don't really have an idea of how much is left to search
> > over. I want results right away, like Google. I want the results catalogued
> > and listed by relevance and user preferences.
>
> Problem with Google caching is:
> - we don't have a cluster to be frequently updating indexes without impacting
> computer usage
> - loss of sync between the cache and the real system. It really matters when
> you come to messing with your home dir, when you may create/delete/move files
> really fast.

I have done some calculations with some code, and, assuming you have a file indexing daemon that uses a notification system and extracts metadata (plus a small portion of the data, say, for a full-text indexing database), you'd be looking at one or two seconds of indexing per file, even at the lowest CPU priority (a nice value of 19). That isn't slow. Coupled with a register/notification system for a hypothetical search daemon (the observer/observable pattern, applied to queries), you could add the word "Casaperrogato" to one of your OpenOffice documents and see it appear in a search window in approximately 10 seconds. That's much more immediate than Windows search, and it is possible.
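A minimal sketch of the register/notification idea, assuming a toy in-memory word index: queries register as observers, and whenever a (hypothetical) change-notification layer reports an edited file, the daemon reindexes that file and calls back every standing query the file now newly matches. The class `SearchDaemon`, its methods, and the whitespace tokenizer are illustrative assumptions, not an existing KDE API; a real daemon would receive events from a kernel notification mechanism and extract text with per-format filters.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

# Hypothetical callback type: receives the path of a newly matching file.
Callback = Callable[[str], None]


class SearchDaemon:
    """Toy full-text index whose standing queries are notified of new matches."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[str]] = defaultdict(set)  # word -> paths
        self.words_of: Dict[str, Set[str]] = {}             # path -> words
        self.queries: List[Tuple[FrozenSet[str], Callback]] = []

    def register(self, query: str, callback: Callback) -> Set[str]:
        """Register a standing query and return its current matches."""
        terms = frozenset(query.lower().split())
        self.queries.append((terms, callback))
        return self.match(terms)

    def match(self, terms: FrozenSet[str]) -> Set[str]:
        """AND semantics: a file matches only if it contains every term."""
        if not terms:
            return set()
        return set.intersection(*(self.index.get(t, set()) for t in terms))

    def on_file_changed(self, path: str, text: str) -> None:
        """Entry point for the (hypothetical) file-change notification layer."""
        old = self.words_of.get(path, set())
        new = set(text.lower().split())
        # Drop index entries for words that disappeared from the file.
        for word in old - new:
            self.index[word].discard(path)
        for word in new:
            self.index[word].add(path)
        self.words_of[path] = new
        # Observer step: notify queries this file satisfies now but did not before.
        for terms, callback in self.queries:
            if terms and terms <= new and not terms <= old:
                callback(path)
```

For example, registering `"casaperrogato"` and then reporting an edited document containing that word invokes the callback once with the document's path; saving the file again without removing the word does not notify a second time.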
For the problem of moving files, ideally you'd store an MD5 hash of each file as a primary key in the index, so that when files get moved around they'd be recognized instead of reindexed. Alternatively (or primarily), the indexing daemon could attach an extended attribute named, say, "File ID", which would act like an object identifier so the search daemon would recognize the file anywhere on your filesystem (even over NFS), regardless of its contents. This could actually spawn a completely new filing paradigm, based on objects and categories/contents instead of subdirectories.

>
> > > - cache previous results, maybe they're used again soon since users
> > > often refine their search
> >
> > Unfortunately, this doesn't help the first search. The first search is
> > always more important than subsequent searches, in my opinion.
>
> I agree... but it's better than having it slow every time.
>
> > > - change kio_slaves to update a db every time a file is modified...
> > > with that we can have something fast for the user and better synced than
> > > slocate. The problem is other apps, like GNOME or OpenOffice.
> >
> > This may slow down the speed at which updates happen.
>
> Do you think so?
> I'm saying that when you save your files or delete them, the index must be
> updated. But that only helps KDE apps.
>
> > > All of them have problems. The real problem I see is with the home dir...
> > > it's the part of the system that changes most and in short periods of
> > > time; probably the user changes something and then searches... that
> > > common case makes life difficult and should be optimized.
> >
> > I have been doing some work with Materialized Views in PostgreSQL. Here is
> > what I think will work with KDE.
> >
> > Google-type search. Everything on disk is indexed. With an 80GB hard drive,
> > it's not a problem to have everything indexed in multiple ways. The search
> > should take a couple of seconds, max.
> > Let the OS worry about in-memory caching, etc... If there isn't enough
> > room for the index, then we should provide a weaker system like Windows,
> > where the search is done in real time, only the most important files are
> > indexed, and recent search results are stored.
>
> Fair.
>
> > File Importance. Files in the home directory, files modified or viewed
> > frequently by the user, files in the favorites list, etc., are more
> > important than system files or log files or cache files. They should be
> > listed first and indexed first.
> >
> > Creating the index is the problem. A backend process should be constantly
> > running at a low priority. Initially, it indexes all the files. Then, it
> > begins to index files as they change. It always keeps a fresh index of the
> > most important files, and gets around to less important files when it has
> > the time.
>
> That doesn't work.
> - a process accessing the disk all the time will interfere with OS disk caching.
> - IMHO, one may want to find the last file he/she accessed/saved/created
> more than anything else. So these will be the most important... and probably
> will not be indexed.
>
> > When a search is made, it is run completely against the index. Sure, the
> > index may be out-of-date, but the most important files will be indexed
> > almost as soon as they are modified, and the least important files may
> > never get indexed, so the results are usually relevant.
> >
> > Metadata is critical. How do we accumulate the metadata? How is it stored?
> > I don't have the slightest idea in this department. Already, we have
> > permissions, location, access/modification times, and MIME type that give
> > us good metadata. But I would like additional metadata specified by the
> > user, like "project" or "author" or "comments". Importance is another piece
> > of metadata that is useful.
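The importance-first indexing order proposed in the quoted "File Importance" and "Creating the index" paragraphs could be sketched with a priority queue. This is a minimal sketch under stated assumptions: the `importance()` weights (home directory, favorites, recency, logs/caches) and the `IndexQueue` class are hypothetical, chosen only to illustrate the ordering. Weighting recent modifications heavily also speaks to the objection that the last file saved is usually the one being looked for; it does not address the disk-cache concern.

```python
import heapq
import itertools
from typing import List, Optional, Set, Tuple


def importance(path: str, mtime: float, favorites: Set[str], now: float) -> float:
    """Hypothetical importance score; a higher score means index sooner."""
    score = 0.0
    if path.startswith("/home/"):
        score += 10.0  # user files beat system files
    if path in favorites:
        score += 5.0  # explicitly marked by the user
    age_hours = max(0.0, (now - mtime) / 3600.0)
    score += 10.0 / (1.0 + age_hours)  # recently modified files rank higher
    if path.startswith(("/var/log/", "/tmp/")) or "/.cache/" in path:
        score -= 10.0  # logs and caches come last
    return score


class IndexQueue:
    """Max-priority queue of files waiting to be (re)indexed."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def push(self, path: str, score: float) -> None:
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, next(self._counter), path))

    def pop(self) -> Optional[str]:
        """Return the most important pending file, or None when idle."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

With these weights, a document saved a moment ago is popped before a favorite edited two hours earlier, which in turn precedes an old home-directory file and, last, a system log. A real daemon would also coalesce duplicate entries when the same file changes repeatedly.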
> >
> > I don't think a real database (PostgreSQL or MySQL) will be appropriate for
> > the database part of the tool. The requirements for the index cache are
> > very different than what PostgreSQL provides. I do believe that we can
> > steal a lot of ideas from the database community on how to search vast
> > indexes efficiently. Perhaps early implementations will rely on a database,
> > but that should be temporary.
>
> I also don't think so.

-- 
Manuel Amador
Head of R&D
+593 (9) 847-7372
Amauta
http://www.amautacorp.com/
GNU Privacy Guard key ID: 0xC1033CAD at keyserver.net

_______________________________________________
kde-usability mailing list
kde-usability@kde.org
https://mail.kde.org/mailman/listinfo/kde-usability