List: kde-devel
Subject: Adding improved file search to KDE
From: Jonathan Gardner <jgardner () jonathangardner ! net>
Date: 2004-06-14 23:11:08
Message-ID: 200406141611.08762.jgardner () jonathangardner ! net
On the KDE usability lists, one of our illustrious imaginators wanted to
push for better file searching. I've done a lot of work with PostgreSQL and
I feel I have a good grasp on database topics and how they can tie into
search.
Let me break this up into two parts. The first part is the "big picture". This
is where we can be one day, in the not too distant future. The second
part is the "next step". This is what we can do next to move towards the
"big picture".
Files should exist independent of the actual filesystem. In other words, the
filesystem should be one way of many to find a file.
Human memory stores thoughts by context. We store a ton of metadata for each memory or
idea, and we access them by index on these metadata. We do a search, "Show
me that time I went to Hawaii and we were on the beach last year" and we
get back a good candidate result with an option to think harder and come up
with something else, or to change our search parameters "No, it was when I
was with my wife."
Certain files shouldn't be readily accessible to the user: cache files and
configuration files, for instance. Other files should score higher on the
search results independent of the search parameters: Files that are
modified frequently and/or recently. Files that are viewed often or
recently. Files that are owned by the user or user's group. Other files
should score lower: Files that are frequently passed up in favor of other
results, files that are never accessed or modified, or files that are
modified but not by the user (like log files).
There should be a personality profile kept for each user. When I search for
"log", I expect to see Apache and PostgreSQL log files. But my next-door
neighbor may want to see what is happening in the logging industry.
The file search dialog will become a ubiquitous replacement for the file open
dialog. Metadata like "project" or "event" will keep files grouped
together. Users will never see the underlying filesystem, if any.
I am sure you can come up with many metadata attributes that I can't even
imagine. I am sure each individual user will have their tastes as well.
However, actually collecting the metadata is difficult. For this purpose,
we will have to keep a running context so that when a new file is created
or edited in that context, the context's metadata is transferred. For
instance, when I am working on a particular project, the system should
realize this. When I go to edit a file, it will add that file as part of
the project, unless there is strong evidence not to. We certainly can't
expect the user to type "This is the picture of me and Sally in Hawaii on
North Shore" for every single picture as they download them. Maybe metadata
can be sparse at first and eventually grow as context is added.
Disk space will be abundant in the future. It will be cheap to get terabyte
drives, which will be more than enough to store a movie of most of your
life. The remaining space will be used as a cache and as an index to every
file on the disk. The system disk will also keep a running index of CDs and
other media. Perhaps it will even index websites and FTP sites that the
user has visited or will likely visit one day.
Processor power and abundant memory will make compiling all this
information into a meaningful and efficient index easy. Keeping it up to
date won't be an issue. Combining all the known indexes plus the user's
preferences and personality will yield superior search results
instantaneously.
Now for talk of today. Right now, the only tool that does any file indexing
is "locate". It does a terrible job at that. It only indexes file names. It
also updates only once a night.
I propose we begin a project to start indexes on all running instances of
KDE. At first, we will index the local filesystem. Later, we will add
indexes of CDs that the user has accessed. Maybe in the future we can index
sites the user has visited.
The only metadata we should store are the file's name, title (if
applicable), permissions, ownership, and last access "score". Also, file
type and category will be recorded.
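
To make the list of fields concrete, here is a rough sketch of one index
record in Python. The names FileRecord and record_for are hypothetical, the
title is left empty because extracting it depends on the file format, and
the type and category fields are filled with placeholder defaults. It is
only meant to show that most of this metadata is available from a single
stat call.

```python
import os
import stat
from dataclasses import dataclass


@dataclass
class FileRecord:
    """One row of the proposed index (field names are assumptions)."""
    name: str            # file name without its directory
    title: str           # document title, empty if not applicable
    permissions: str     # e.g. "-rw-r--r--"
    owner_uid: int
    owner_gid: int
    access_score: float  # decaying score, starts at zero
    mime_type: str       # placeholder until the type is detected
    category: str        # placeholder until the file is classified


def record_for(path: str) -> FileRecord:
    """Build a record from what the filesystem already knows."""
    st = os.stat(path)
    return FileRecord(
        name=os.path.basename(path),
        title="",
        permissions=stat.filemode(st.st_mode),
        owner_uid=st.st_uid,
        owner_gid=st.st_gid,
        access_score=0.0,
        mime_type="application/octet-stream",
        category="data files",
    )
```

In a real index each record would become a row in the database rather than
an in-memory object.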
The access score is a decaying number. Every time the user accesses the file,
the score increases. Every time the user modifies it, it increases more. This way,
frequently accessed items and recently accessed items will be given the
highest score.
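
One possible way to get this behavior is an exponential decay with a fixed
half-life, sketched below in Python. The half-life and the two bonus values
are assumptions chosen for illustration, and the function names are
hypothetical. The only requirements are that the score shrinks over time and
that a modification counts for more than an access.

```python
HALF_LIFE_DAYS = 30.0  # assumption: the score halves after 30 idle days
ACCESS_BONUS = 1.0     # assumption: added when the file is opened
MODIFY_BONUS = 3.0     # assumption: added when the file is changed


def decayed(score: float, days_elapsed: float) -> float:
    """Return the score after days_elapsed days without any activity."""
    return score * 0.5 ** (days_elapsed / HALF_LIFE_DAYS)


def bump(score: float, days_since_last_event: float, modified: bool) -> float:
    """Decay the old score up to now, then add the bonus for this event."""
    bonus = MODIFY_BONUS if modified else ACCESS_BONUS
    return decayed(score, days_since_last_event) + bonus
```

Only the score and the time of the last event need to be stored per file.
The decay is applied lazily, when the file is touched again or when search
results are ranked.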
The file "type" is its MIME type. This should be easy to determine. The
"file" tool can assist. Category is a broader description. We'll have "user
files", maybe broken up into "documents", "music", "movies"... We'll also
have "system files": "log files", "configuration files", "data files",
"cache", and "code files". Every file should fall into one of these broad
categories.
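
As a rough illustration, the type and category could be derived along these
lines in Python. This sketch guesses the MIME type from the file name with
the standard mimetypes module instead of inspecting content the way "file"
does. The suffix table and the cache directory rule are assumptions, and
classify is a hypothetical name.

```python
import mimetypes
import os

# Assumption: a small suffix table for the "system files" categories.
SUFFIX_CATEGORIES = {
    ".log": "log files",
    ".conf": "configuration files",
    ".ini": "configuration files",
    ".c": "code files",
    ".cpp": "code files",
    ".h": "code files",
    ".py": "code files",
}


def classify(path: str) -> tuple[str, str]:
    """Return (mime_type, category) for a path, judging by name only."""
    mime_type, _encoding = mimetypes.guess_type(path)
    mime_type = mime_type or "application/octet-stream"
    parts = path.replace("\\", "/").lower().split("/")
    suffix = os.path.splitext(path)[1].lower()

    if "cache" in parts or ".cache" in parts:
        category = "cache"
    elif suffix in SUFFIX_CATEGORIES:
        category = SUFFIX_CATEGORIES[suffix]
    elif mime_type.startswith("audio/"):
        category = "music"
    elif mime_type.startswith("video/"):
        category = "movies"
    elif mime_type.startswith("text/") or mime_type == "application/pdf":
        category = "documents"
    else:
        category = "data files"  # fallback so every file gets a category
    return mime_type, category
```

The path and suffix rules run before the MIME rules, so that a plain-text
log file lands in "log files" rather than "documents".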
Indexes will be maintained by a PostgreSQL database instance. At first, we
will index only the files in the user's home directory. Then we can branch
out and index every file everywhere. How do we keep an accurate and timely
index? There are two ways:
(1) We get notified when a file is modified. There is no way to do this at
the kernel level (yet!) but the KDE API allows us to put our own hooks in.
Most applications will use the KDE API to load and store files.
(2) We browse the filesystem, keeping resource usage to a minimum, looking
for modified files.
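
Option (2) can be sketched in a few lines of Python. This is only an
illustration under some assumptions: a plain dictionary of path to
modification time stands in for the PostgreSQL index, scan_for_changes is a
hypothetical name, and the optional pause between directories is one crude
way to keep resource usage low.

```python
import os
import time


def scan_for_changes(root, index, pause_seconds=0.0):
    """Walk root and bring index ({path: mtime}) up to date.

    Returns (changed, removed): paths that are new or modified since the
    last scan, and paths that were indexed but no longer exist.
    """
    changed = []
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable, skip it
            seen.add(path)
            if index.get(path) != mtime:
                index[path] = mtime
                changed.append(path)
        if pause_seconds:
            time.sleep(pause_seconds)  # yield to other work between directories
    removed = [p for p in index if p not in seen]
    for p in removed:
        del index[p]
    return changed, removed
```

A background process could call this periodically on the user's home
directory and re-read metadata only for the paths it reports as changed.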
Any thoughts?
--
Jonathan Gardner
jgardner@jonathangardner.net