List:       kde-core-devel
Subject:    Strigi splitting? was Re: Proposal: Integration of libKMetaData into
From:       "Jos van den Oever" <jvdoever () gmail ! com>
Date:       2006-12-18 21:12:09
Message-ID: c2dbc4260612181312w4e9d922bha4e7a245ffaea4d4 () mail ! gmail ! com

2006/12/18, Aaron J. Seigo <aseigo@kde.org>:
> On Monday 18 December 2006 9:44, Sebastian Trüg wrote:
> > last Friday I discussed the new libkmetadata with Aaron Seigo. We agreed
> > that it would be a good idea to have it in kdelibs so all KDE applications
> > could benefit from it.
>
> the plan we discussed is something like this:
>
> - put qrdf into kdesupport. it has no kde dependencies (only Qt) and still has
> API work left to be done on it anyways.
> - put kmetadata in kdelibs/ .. this would include (Sebastian: correct me if
> i'm wrong here):
>     - backbone
>     - kmetadata
>     - strigiindexer (optional component relying on strigi being installed)
>     - tests
> - move the example apps over time to appropriate places, e.g. extragear/search
> or whatever
>
> the goals are:
>
>  - have a consumable API that ships with KDE 4.0
>  - work on apps consuming that API for KDE 4.1
>
> if the second goal is also hit for 4.0, great. but my recommendation is to set
> that as a 4.1 goal. i personally feel that's more realistic if the goal is to
> include a broad set of applications. i assume a few, or even a good number,
> of apps will use it in 4.0 but it'll really take off post-4.0. but for that
> to happen, we need the API in a usable form in 4.0.
>
> whether or not there'll be a BC guarantee in 4.0 for it is something that's
> left to discussion at this point, but in a Good World there would be. there's
> still time enough for that, particularly if we get the API in sooner (e.g.
> now) rather than later (e.g. N months from now) so we can kick its tires
> thoroughly with our own applications.
>
> the "why" is pretty simple: to give kde4 the ability to grow into a fully
> searchable, fully linkable, fully semantic desktop. just like how network
> awareness was a goal of kde 2.0 that became realized more and more over time
> (years, actually), we can set a similar goal for search and semantics.


Related to this, I'd like to move Strigi into something more mature
than playground and add it as a dependency to KDE4, since parts of its
functionality are interesting for more than just desktop search as
I've shown with the stream-based kioslave for reading embedded files.
Some of you might think requiring all of Strigi for KDE4 might be a
bit much. So I'd like to discuss splitting it up into 2 or 3 parts.
Keeping all of it in one folder and simply using the compile time
options for disabling parts as desired is however a valid option too.

The three parts are libstreams, libstreamindexer and the strigidaemon.
Splitting up in 3 parts would allow KDE to exclude the strigidaemon.

These are the three parts we might split Strigi into:

== libstreams ==

A library that defines two abstract stream interfaces: StreamBase<T>
and SubStreamProvider. The first is an input stream that is very
efficient because it avoids copying data. A typical read operation
proceeds thus:
  const char* data;
  int32_t nread = input->read(data, min, max);
Instead of copying the data into a user-provided buffer, a pointer to
the data is returned. This avoids both memory allocation and data
copying. When reading loads of data as Strigi does, this really makes
a difference, especially when reading substreams/embedded files.
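
To make the pointer-returning read concrete, here is a minimal sketch
of the idea. MemoryStream is a hypothetical stand-in invented for
illustration, not the real Strigi class, and it assumes all data is
already in memory so that min never has to trigger a refill:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for a zero-copy input stream. This is NOT Strigi's
// real class; it only mirrors the read(data, min, max) call shape.
class MemoryStream {
public:
    explicit MemoryStream(std::string bytes)
        : buffer_(std::move(bytes)), pos_(0) {}

    // Points `data` into the internal buffer and returns the number of bytes
    // readable there (at most `max`), or -1 once the stream is exhausted.
    // Assumption: everything is already in memory, so `min` never forces a
    // refill and is only checked for sanity.
    std::int32_t read(const char*& data, std::int32_t min, std::int32_t max) {
        assert(min >= 0 && max >= min && max > 0);
        if (pos_ >= buffer_.size()) return -1;
        const std::size_t remaining = buffer_.size() - pos_;
        const std::size_t n =
            std::min(remaining, static_cast<std::size_t>(max));
        data = buffer_.data() + pos_;  // no copy: the caller sees our bytes
        pos_ += n;
        return static_cast<std::int32_t>(n);
    }

private:
    std::string buffer_;
    std::size_t pos_;
};
```

The caller never allocates a buffer; it only receives a pointer and a
length. In this sketch the pointer stays valid for the lifetime of the
stream; a stream that refills its buffer would presumably only
guarantee it until the next read.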

Which brings me to the next interface: SubStreamProvider. This
abstract class takes a stream as input and allows iteration over the
substreams:
  MailInputStream mail(input);
  InputStream* substream = mail.nextEntry();
  while (substream) {
      ....
      substream = mail.nextEntry();
  }
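
The same loop shape can be tried without any real archive format.
The sketch below uses a made-up provider, LineEntryProvider, that
treats every newline-separated line of the parent as one "embedded
entry". Both the class and the SubStream struct are assumptions for
illustration only and are not part of Strigi:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical substream: a read-only window onto the parent's bytes.
struct SubStream {
    const char* data;
    std::size_t size;
};

// Hypothetical provider: each '\n'-separated line of the parent is one entry.
// It keeps a reference to the parent, so the parent must outlive it.
class LineEntryProvider {
public:
    explicit LineEntryProvider(const std::string& parent)
        : parent_(parent), pos_(0) {}

    // Returns the next entry, or nullptr when the parent is exhausted.
    // The returned pointer is only valid until the next call.
    const SubStream* nextEntry() {
        if (pos_ >= parent_.size()) return nullptr;
        std::size_t end = parent_.find('\n', pos_);
        if (end == std::string::npos) end = parent_.size();
        assert(end >= pos_);  // the window never runs backwards
        current_ = SubStream{parent_.data() + pos_, end - pos_};
        pos_ = end + 1;  // skip the separator
        return &current_;
    }

private:
    const std::string& parent_;
    std::size_t pos_;
    SubStream current_{nullptr, 0};
};

// Walks all entries with the while-loop pattern shown above.
std::vector<std::string> listEntries(const std::string& parent) {
    LineEntryProvider provider(parent);
    std::vector<std::string> out;
    const SubStream* substream = provider.nextEntry();
    while (substream) {
        out.emplace_back(substream->data, substream->size);
        substream = provider.nextEntry();
    }
    return out;
}
```

Because each entry is only a window onto the parent's bytes, iterating
does not copy the embedded data until the caller decides to keep it.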

To top it off, this part of Strigi provides an ArchiveReader which
facilitates reading of substreams from parent streams by opening the
substreams with the right SubStreamProvider for rpm, zip, email,
etc. and by caching the filenames to speed this up. This is the class
that is used in the jstream:// kioslave.

== libstreamindexer ==

This is the part of Strigi that extracts text and metadata from
streams. It defines the interfaces for the analyzer (plugins). This
part of Strigi could be used as a KFileMetaInfo replacement. It would
allow nifty things like extracting metadata while a file is
downloading for any supported filetype. This would require porting of
the current metadata extractors from KDE.
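
To illustrate what extracting metadata during a download could look
like, here is a small sketch of an analyzer that is fed data chunk by
chunk. The ChunkAnalyzer interface, PngSizeAnalyzer and
analyzeInChunks are all hypothetical names made up for this example;
the actual analyzer plugin interface in Strigi may look quite
different. The only hard fact used is the PNG layout: an 8-byte
signature followed by the IHDR chunk, with big-endian width and height
at byte offsets 16 and 20.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>

// Hypothetical analyzer interface: the stream arrives in chunks.
class ChunkAnalyzer {
public:
    virtual ~ChunkAnalyzer() = default;
    virtual void feed(const char* data, std::size_t size) = 0;
    virtual const std::map<std::string, std::string>& metadata() const = 0;
};

// Reports PNG width/height as soon as the first 24 bytes have arrived.
class PngSizeAnalyzer : public ChunkAnalyzer {
public:
    void feed(const char* data, std::size_t size) override {
        if (done_) return;  // header already examined, ignore the rest
        const std::size_t want = kHeaderSize - header_.size();
        header_.append(data, size < want ? size : want);
        if (header_.size() < kHeaderSize) return;  // need more data
        done_ = true;
        static const char sig[] = "\x89PNG\r\n\x1a\n";
        if (std::memcmp(header_.data(), sig, 8) != 0) return;
        if (std::memcmp(header_.data() + 12, "IHDR", 4) != 0) return;
        meta_["width"] = std::to_string(be32(16));
        meta_["height"] = std::to_string(be32(20));
    }

    const std::map<std::string, std::string>& metadata() const override {
        return meta_;
    }

private:
    static constexpr std::size_t kHeaderSize = 24;

    // Decodes a big-endian 32-bit value from the buffered header.
    std::uint32_t be32(std::size_t off) const {
        std::uint32_t v = 0;
        for (std::size_t i = 0; i < 4; ++i)
            v = (v << 8) | static_cast<unsigned char>(header_[off + i]);
        return v;
    }

    std::string header_;
    std::map<std::string, std::string> meta_;
    bool done_ = false;
};

// Simulates a download by handing the bytes over `chunk` bytes at a time.
std::map<std::string, std::string> analyzeInChunks(const std::string& bytes,
                                                   std::size_t chunk) {
    assert(chunk > 0);
    PngSizeAnalyzer analyzer;
    for (std::size_t i = 0; i < bytes.size(); i += chunk) {
        const std::size_t left = bytes.size() - i;
        analyzer.feed(bytes.data() + i, left < chunk ? left : chunk);
    }
    return analyzer.metadata();
}
```

The point of the design is that the analyzer never needs the whole
file: once enough of the stream has passed through, the metadata is
available while the remainder is still arriving.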

Closely related to this part of Strigi are the deepfind and deepgrep
executables and I would keep them with this part of Strigi. These
programs also form a good fallback for searching in folders on which
no index is available such as removable media.

libstreamindexer depends on libstreams

== strigidaemon and storage backends ==

This is the part of Strigi that implements the DBus interface to do
fast searching. It creates an index on the data it can extract with
libstreamindexer and this is what most of you will associate with
Strigi. This part would be way less powerful without the other two
parts.

I've attached a schematic picture of the interactions between the
various parts that should help the discussion. You can just grab the
image and change it if you want. Sebastian, the nepomuk backbone and
kmetadata fall into one block in the drawing, if you want you can add
some detail there.

Cheers,
Jos

["kdestrigi.odp" (application/vnd.oasis.opendocument.presentation)]
["kdestrigi.png" (image/png)]