'Re: A search service for KDE'


List:       kde-devel
Subject:    Re: A search service for KDE
From:       Manuel Amador <rudd-o () amautacorp ! com>
Date:       2005-02-23 23:28:25
Message-ID: 1109201305.30814.63.camel () master ! amauta

On Tue, 22-02-2005 at 19:03 +0000, Daniel Roe wrote:

> 
> But I think given that Manuel's engine seems quite close to completion, we 
> should at the very least check it out before rejecting its language. I speak 
> for others, I'm sure, when I say that I'm eager to look over the preview 
> release he speaks of.

Thanks, but don't get your hopes up too much just yet.  Here's what is
done:

- file plugin interface (with KDE's KFile abstraction, for plugin
writers)
- index database interface, and two incomplete but well-advanced plugins:
PostgreSQL and ZODB (I'm leaning towards ZODB, since it lets me persist
everything transparently instead of pickling and unpickling by hand, and
it can store object references, which I cannot do with PG; I also think
I've found the solution to the scalability problems)
- Filesystem crawler with companion per-volume crawlers
- indexer process, which consumes files produced by the FS crawler and
adds them to the database
- XML-RPC-based interfaces (socket and TCP) exposing search and metadata
querying methods, plus various administrative functions for the
superuser
- a rudimentary command-line search tool (more of a testing tool, really)
- a command-line tool to test file plugins
- automatic throttling based on the one-minute load average (the first
of the three load figures printed by uptime(1))
- an event logging interface which can output to syslog, stderr or a
file
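
To make the throttling item concrete, here is a minimal sketch of
load-average throttling in Python.  It is not the engine's actual code:
the function name throttle(), the MAX_LOADAVG threshold and the polling
interval are all hypothetical.  It relies on os.getloadavg(), which is
Unix-only and returns the same 1-, 5- and 15-minute averages uptime(1)
shows.

```python
import os
import time

# Hypothetical threshold: the post does not say what limit the engine uses.
MAX_LOADAVG = 1.5

def throttle(max_load=MAX_LOADAVG, poll_interval=5.0,
             sleep=time.sleep, getloadavg=None):
    """Block until the one-minute load average is at or below max_load.

    os.getloadavg() (Unix only) returns the 1-, 5- and 15-minute averages,
    the same three numbers uptime(1) prints; index [0] is the one-minute one.
    Returns how many times it had to wait, which is handy for logging.
    """
    getloadavg = getloadavg or os.getloadavg
    waits = 0
    while getloadavg()[0] > max_load:
        waits += 1
        sleep(poll_interval)
    return waits

if __name__ == "__main__":
    # A hypothetical indexer loop would call throttle() before each file.
    print("waited", throttle(max_load=100.0), "times")
```

The sleep and getloadavg parameters are injectable only so the loop can
be exercised without a genuinely busy machine.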

Currently, with either database backend, indexing takes 0.02 seconds
per file on average, and extracting contents and all metadata takes 0.5
seconds on average (MP3s take 0.2 seconds, text files 0.5, and some
large HTML files up to 45 seconds, due to the unoptimized regexps I'm
using in some places).  Searching for "The Beatles" (with 13,000 songs
indexed) takes 0.5 seconds on average, according to time(1).
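
A rough worked estimate from the averages quoted above: if every MP3 is
extracted (0.2 s) and then indexed (0.02 s) one after the other, a
13,000-song collection takes about 13000 * 0.22 = 2860 s, roughly 48
minutes.  This assumes purely serial processing.  It ignores overlap
between the crawler and indexer processes and any throttling pauses, and
the helper name is hypothetical.

```python
def estimate_crawl_seconds(n_files, extract_s, index_s=0.02):
    """Serial back-of-envelope estimate: each file is extracted, then indexed."""
    return n_files * (extract_s + index_s)

if __name__ == "__main__":
    # 13,000 MP3s at the quoted 0.2 s extraction plus 0.02 s indexing per file.
    total = estimate_crawl_seconds(13000, extract_s=0.2)
    print(f"{total:.0f} s, about {total / 60:.0f} minutes")  # 2860 s, about 48 minutes
```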

I'm working on the scalability issues.  With PostgreSQL, I finally have
memory usage pinned down by using finite-sized (bounded) queues between
processes.  Next I'll work on stabilizing memory usage with ZODB (which
could enable so much more functionality in the future!), because when I
unleashed the indexer on my entire hard disk, the memory usage of the
metadata service process shot up to 400 MB (I only have half a gig of RAM,
and recovering from that situation took me a couple of minutes).
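
Bounded queues cap memory because the producer blocks once the queue is
full instead of piling up work.  The sketch below illustrates the idea
with threads and queue.Queue so it runs anywhere.  Between real
processes, multiprocessing.Queue(maxsize=N) gives the same blocking
put().  The function names, the DONE sentinel and QUEUE_SIZE are
hypothetical; the post only says the queues are finite-sized.

```python
import queue
import threading

# Hypothetical bound; the post only says the queues are "finite-sized".
QUEUE_SIZE = 100
DONE = object()  # sentinel telling the indexer the crawl has finished

def crawl(paths, q):
    """Producer: q.put() blocks while the queue is full, so the crawler can
    never get more than q.maxsize items ahead of the indexer."""
    for path in paths:
        q.put(path)
    q.put(DONE)

def index(q, indexed):
    """Consumer standing in for the indexer that adds files to the database."""
    while True:
        path = q.get()
        if path is DONE:
            return
        indexed.append(path)  # a real indexer would extract and store metadata here

def run(paths, queue_size=QUEUE_SIZE):
    q = queue.Queue(maxsize=queue_size)
    indexed = []
    consumer = threading.Thread(target=index, args=(q, indexed))
    consumer.start()
    crawl(paths, q)  # the producer runs in the calling thread
    consumer.join()
    return indexed
```

However many files the crawler finds, at most queue_size of them are
waiting in memory at any moment.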

Next on my list:
- incremental indexing (with inotify).  I first need to reboot into my
inotify-enabled patched kernel.  I don't wanna!
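
inotify is not in Python's standard library and needs kernel support, so
here is a portable stand-in for incremental indexing.  It is explicitly
not inotify.  It polls by comparing two snapshots of file modification
times and reports what was added, modified or deleted since the last
pass.  snapshot() and diff() are hypothetical helper names.

```python
import os

def snapshot(root):
    """Map every regular file under root to its modification time (ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                state[path] = os.stat(path).st_mtime_ns
            except FileNotFoundError:
                pass  # the file vanished between listing and stat
    return state

def diff(old, new):
    """Return sorted (added, modified, deleted) paths between two snapshots."""
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return added, modified, deleted
```

Only the three returned lists would need to go back through the indexer.
Unlike inotify, this still walks the whole tree on every pass.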

Sorry, no GUI search tool yet =(  I tried KDevelop, but all I got was a
headache (I still cannot wrap my head around C++, or the other way
around; maybe I'm spoiled by Python).  Anyway, any app that wants to
query the database can make a simple, already "standardized" XML-RPC
call, and the server replies with the results.  But ideally, that tool
should expose a standard DCOP interface for all KDE apps to use and
rely on, right?
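
A client-side XML-RPC query can look like the sketch below, which uses
Python's standard xmlrpc modules over TCP.  The method name search, its
single string argument and the in-memory FAKE_INDEX are hypothetical,
because the post does not list the engine's real method names.  It also
skips the Unix socket transport and DCOP.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical in-memory "index"; the real service queries PostgreSQL or ZODB.
FAKE_INDEX = {
    "/music/beatles/help.mp3": "The Beatles - Help!",
    "/music/kinks/lola.mp3": "The Kinks - Lola",
}

def search(query):
    """Hypothetical search method: return paths whose metadata contains query."""
    q = query.lower()
    return sorted(path for path, meta in FAKE_INDEX.items() if q in meta.lower())

def start_server():
    # Port 0 lets the OS pick a free TCP port; logRequests=False keeps stderr quiet.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(search, "search")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    with ServerProxy(f"http://{host}:{port}/") as proxy:
        print(proxy.search("beatles"))  # any XML-RPC-capable app could do the same
    server.shutdown()
    server.server_close()
```

Since the whole contract is "method name plus arguments in, array out",
a future DCOP front end could simply forward its calls to the same
search method.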

> 
> Regards,
> Daniel
>  >> Visit http://mail.kde.org/mailman/listinfo/kde-devel#unsub to unsubscribe <<
-- 
Manuel Amador <rudd-o@amautacorp.com>
Amauta

