
List:       kde-usability
Subject:    Re: Easier Searching in KDE
From:       Jonathan Gardner <jgardner () jonathangardner ! net>
Date:       2004-06-01 22:05:42
Message-ID: 200406011505.42598.jgardner () jonathangardner ! net

On Monday 24 May 2004 07:47 am, Gustavo Sverzut Barbieri wrote:
> The only point I see against this is the slowness... google is fully
> indexed and does this all within mseconds. Searching the web or even
> search for files takes a lot of time in computer these days... slocate is
> good, but it generally is desync. Searching for contents is even more
> slow.
> 	Maybe we could come with a way to solve this:
> 		- the match list should first search local files in indexed mode, then
> check if they exists (avoid show non-existent), then proceed with
> something like "find", then with web, ... This may present first results
> really quick, but the other will still be delayed

I really don't like the way the Windows search works: it generates results 
slowly, and you have no real idea how much is left to search. I want 
results right away, like Google. I want the results catalogued and listed 
by relevance and user preferences.

> 		- cache previous results, maybe they're used again soon since users
> often refine their search

Unfortunately, this doesn't help the first search. The first search is 
always more important than subsequent searches, in my opinion.

> 		- change kio_slaves to update a db everytime a file is modified... 
> with that we can have something fast for user and better sync'ed than
> slocate. The problem is other apps, like gnome or openoffice.
>

This may slow down file updates, since every write would also have to 
update the db.

> 	All of them have problems. The real problem I see is with home dir...
> it's the part of the system that changes most and in short periods of
> time, probably the user changes and then search... that common case makes
> life difficult and should be optimized.
>

I have been doing some work with Materialized Views in PostgreSQL. Here is 
what I think will work with KDE.

Google-type search. Everything on disk is indexed. With an 80GB hard drive, 
it's not a problem to have everything indexed in multiple ways. The search 
should take a couple of seconds, max. Let the OS worry about in-memory 
caching, etc... If there isn't enough room for the index, then we should 
provide a weaker system like Windows where the search is done in real time, 
only the most important files are indexed, and recent search results are 
stored.

File Importance. Files in the home directory, files modified or viewed 
frequently by the user, files in the favorites list, etc, are more 
important than system files or log files or cache files. They should be 
listed first and indexed first.
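
To make the ranking idea concrete, here is a rough Python sketch. The 
weights, the list of low-value directories, and the function name are all 
made up for illustration; none of them come from an existing KDE component, 
and the real numbers would need tuning.

```python
# Hypothetical weights, chosen only to illustrate the ordering.
HOME_BONUS = 50.0
FAVORITE_BONUS = 30.0
LOW_VALUE_PENALTY = 40.0
LOW_VALUE_DIRS = ("/var/log", "/var/cache", "/tmp")


def importance(path, home, favorites, access_count, mtime, now):
    """Return a score; higher means list first and index first."""
    score = 0.0
    # Files under the user's home directory matter most.
    if path == home or path.startswith(home.rstrip("/") + "/"):
        score += HOME_BONUS
    # Files the user put in the favorites list.
    if path in favorites:
        score += FAVORITE_BONUS
    # One point per recorded access, capped at 20.
    score += min(access_count, 20)
    # Up to 20 points for recent modification, fading to 0 over 30 days.
    age_days = max(0.0, (now - mtime) / 86400.0)
    score += max(0.0, 20.0 * (1.0 - age_days / 30.0))
    # Log and cache files are pushed down the list.
    if any(path.startswith(d + "/") for d in LOW_VALUE_DIRS):
        score -= LOW_VALUE_PENALTY
    return score
```

With these weights a freshly edited file in the home directory outranks a 
freshly written log file, which is the ordering described above.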

Creating the index is the problem. A backend process should be constantly 
running at a low priority. Initially, it indexes all the files. Then, it 
begins to index files as they change. It always keeps a fresh index of the 
most important files, and gets around to less important files when it has 
the time.
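
A minimal sketch of that backend loop, in Python. It assumes each pending 
file already has an importance score (passed in as a plain number), and 
that index_file is whatever routine actually reads and indexes a file. 
IndexQueue, run_indexer and budget are names I made up for the example.

```python
import heapq


class IndexQueue:
    """Files waiting to be indexed, most important first."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker so equal scores stay first-in, first-out

    def push(self, path, score):
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, self._counter, path))
        self._counter += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def run_indexer(queue, index_file, budget):
    """Index at most `budget` files, then return so the process can yield."""
    done = []
    while len(queue) > 0 and len(done) < budget:
        path = queue.pop()
        index_file(path)
        done.append(path)
    return done
```

The budget is one way to keep the process polite: it handles a few files, 
sleeps, and picks up again, so less important files are only reached when 
nothing more important is waiting. A real daemon would also lower its own 
scheduling priority.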

When a search is made, it is run completely against the index. Sure, the 
index may be out-of-date, but the most important files will be indexed 
almost as soon as they are modified, and the least important files may 
never get indexed, so the results are usually relevant.
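
Roughly, a search that never touches the disk could look like the 
following Python sketch. It is a plain in-memory inverted index, not a 
proposal for the real storage format; the Index class and its methods are 
hypothetical, and importance scores are passed in as plain numbers.

```python
import re
from collections import defaultdict


class Index:
    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of paths containing it
        self.scores = {}                  # path -> importance score

    def remove(self, path):
        """Forget everything known about a path."""
        if path in self.scores:
            del self.scores[path]
            for paths in self.postings.values():
                paths.discard(path)

    def add(self, path, text, score):
        """Index (or re-index) one file's text."""
        self.remove(path)
        self.scores[path] = score
        for word in set(re.findall(r"\w+", text.lower())):
            self.postings[word].add(path)

    def search(self, query):
        """Return paths containing every query word, most important first."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits, key=lambda p: (-self.scores[p], p))
```

A file that has not been indexed yet simply does not show up, which is the 
trade-off described above: stale for unimportant files, fast for everything.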

Metadata is critical. How do we accumulate the metadata? How is it stored? 
I don't have the slightest idea in this department. Already, we have 
permissions, location, access/modification times, and MIME type that give 
us good metadata. But I would like additional metadata specified by the 
user, like "project" or "author" or "comments". Importance is another piece 
of metadata that is useful.
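
I don't know how this should be stored, but as a sketch of what one record 
might hold, here is a Python version. FileMeta and collect are hypothetical 
names; the file-system fields come from os.stat, the MIME type is guessed 
from the file name only, and the user-supplied fields are just a free-form 
dictionary.

```python
import mimetypes
import os
from dataclasses import dataclass, field


@dataclass
class FileMeta:
    path: str                  # location
    mode: int                  # permission bits
    atime: float               # last access time
    mtime: float               # last modification time
    mime_type: str
    importance: float = 0.0    # filled in by whatever ranks files
    user: dict = field(default_factory=dict)  # e.g. project, author, comments


def collect(path, **user_fields):
    """Gather the metadata the file system already gives us, plus user fields."""
    st = os.stat(path)
    mime, _encoding = mimetypes.guess_type(path)
    return FileMeta(
        path=path,
        mode=st.st_mode & 0o7777,
        atime=st.st_atime,
        mtime=st.st_mtime,
        mime_type=mime or "application/octet-stream",
        user=dict(user_fields),
    )
```

For example, collect("notes.txt", project="kde-search", author="jg") would 
carry the two user fields alongside the permissions and times.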

I don't think a real database (PostgreSQL or MySQL) will be appropriate for 
the database part of the tool. The requirements for the index cache are 
very different from what PostgreSQL provides. I do believe that we can 
steal a lot of ideas from the database community on how to search vast 
indexes efficiently. Perhaps early implementations will rely on a database, 
but that should be temporary.
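
As an illustration of what such a temporary, database-backed prototype 
might look like, here is a Python sketch using SQLite as a stand-in (it is 
not PostgreSQL or MySQL, just the easiest thing to try). The table layout 
and function names are assumptions made up for the example.

```python
import sqlite3


def open_index(db_path=":memory:"):
    """Create the two tables the prototype needs and return the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY, importance REAL NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings ("
        " word TEXT NOT NULL, path TEXT NOT NULL, PRIMARY KEY (word, path))"
    )
    return conn


def add_file(conn, path, words, importance):
    """Replace whatever was indexed for this path, in one transaction."""
    with conn:
        conn.execute("DELETE FROM postings WHERE path = ?", (path,))
        conn.execute(
            "INSERT OR REPLACE INTO files VALUES (?, ?)", (path, importance)
        )
        conn.executemany(
            "INSERT OR IGNORE INTO postings VALUES (?, ?)",
            [(w, path) for w in set(words)],
        )


def search(conn, word):
    """Return paths containing the word, most important first."""
    rows = conn.execute(
        "SELECT f.path FROM postings p JOIN files f ON f.path = p.path"
        " WHERE p.word = ? ORDER BY f.importance DESC, f.path",
        (word,),
    )
    return [row[0] for row in rows]
```

This is only meant to get something searchable quickly; the query and 
update patterns of a desktop index would still want their own store later.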

-- 
Jonathan Gardner
jgardner@jonathangardner.net
_______________________________________________
kde-usability mailing list
kde-usability@kde.org
https://mail.kde.org/mailman/listinfo/kde-usability