List:       kde-devel
Subject:    Re: Status on indexing system
From:       Matteo Merli <matteo.merli () gmail ! com>
Date:       2005-02-24 11:33:04
Message-ID: e215b2d805022403336f220484 () mail ! gmail ! com

On Wed, 23 Feb 2005 16:33:12 -0500, Manuel Amador <rudd-o@amautacorp.com> wrote:
> Hi,
> 
> I'm advancing, slowly, with the project I proposed to the list a few
> days ago.  I'm struggling with scalability/database issues right now,
> but as soon as I get over them, I'll start working on a simple frontend
> a la Beagle.  For what it's worth, searching songs by The Beatles on
> 70.000 files now takes under 0.4 seconds.

Hi, I have been working for some time on a (not yet released)
information retrieval system very similar to yours. I used basically
the same approach: Python + ZODB + BerkeleyDB, which I settled on after
many attempts with other systems.
I ran into the same problems as you with index performance and index
size, and came up with a few solutions.
Maybe we can share ideas and code.

The main features already implemented in my system are:
- Directory Scanner
- Plugins to handle different MIME types
- Very fast text tokenizer and indexer
- Full Unicode support (all text is converted to Unicode)
- Query parser (handles logical queries with AND, OR and parentheses)
- Basic web interface with Twisted/Nevow
- SOAP interface
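
To give a rough idea of how the tokenizer, indexer and query parser
items fit together, here is a minimal sketch in plain Python. It is only
an illustration, not the code of the actual system: the names tokenize
and InvertedIndex are hypothetical, the index lives in an in-memory dict
rather than in ZODB/BerkeleyDB, and the grammar (AND binding tighter
than OR, operators written in upper case) is an assumption.

```python
import re
import unicodedata
from collections import defaultdict

TOKEN_RE = re.compile(r"\w+", re.UNICODE)


def tokenize(text):
    """Return lowercase, NFKC-normalised word tokens (hypothetical helper)."""
    if isinstance(text, bytes):
        # Assumption: undecoded input is UTF-8; bad bytes are replaced.
        text = text.decode("utf-8", errors="replace")
    text = unicodedata.normalize("NFKC", text)
    return [t.lower() for t in TOKEN_RE.findall(text)]


class InvertedIndex:
    """Toy in-memory inverted index: token -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for token in tokenize(text):
            self.postings[token].add(doc_id)

    def lookup(self, word):
        # Copy so callers can never mutate the stored posting set.
        return set(self.postings.get(word.lower(), ()))

    def search(self, query):
        """Evaluate a query such as: beatles AND (help OR yesterday)."""
        tokens = re.findall(r"\(|\)|\w+", query)
        result, pos = self._parse_or(tokens, 0)
        if pos != len(tokens):
            raise ValueError("unexpected token: %r" % tokens[pos])
        return result

    # Recursive descent: or_expr := and_expr ("OR" and_expr)*
    def _parse_or(self, tokens, pos):
        left, pos = self._parse_and(tokens, pos)
        while pos < len(tokens) and tokens[pos] == "OR":
            right, pos = self._parse_and(tokens, pos + 1)
            left = left | right
        return left, pos

    # and_expr := term ("AND" term)*
    def _parse_and(self, tokens, pos):
        left, pos = self._parse_term(tokens, pos)
        while pos < len(tokens) and tokens[pos] == "AND":
            right, pos = self._parse_term(tokens, pos + 1)
            left = left & right
        return left, pos

    # term := "(" or_expr ")" | word
    def _parse_term(self, tokens, pos):
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        tok = tokens[pos]
        if tok == "(":
            result, pos = self._parse_or(tokens, pos + 1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            return result, pos + 1
        if tok in (")", "AND", "OR"):
            raise ValueError("unexpected token: %r" % tok)
        return self.lookup(tok), pos + 1


index = InvertedIndex()
index.add("a.mp3", "The Beatles - Help!")
index.add("b.mp3", "The Beatles - Yesterday")
index.add("c.mp3", "The Rolling Stones - Angie")
print(sorted(index.search("beatles AND (help OR yesterday)")))
```

Keeping each posting list as a set makes AND and OR simple set
intersection and union; a persistent index would need a more compact
on-disk representation than this.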

I haven't released anything yet because there is no documentation and I
am still rationalising the package structure.

Best Regards
Matteo Merli

-- 
Matteo Merli
<matteo.merli@gmail.com>
 