'Re: Nepomuk in 4.13 and beyond'

List:       kde-devel
Subject:    Re: Nepomuk in 4.13 and beyond
From:       François_K. <daitheflu () free ! fr>
Date:       2013-12-19 17:20:59
Message-ID: 1153790423.262172383.1387473659336.JavaMail.root () zimbra76-e14 ! priv ! proxad ! net

Hi,

Thanks for your answer.

> > * What are the plans to store tags? On OSX, tags are stored in file xattrs,
> > which is -IMHO- very nice:
> > - Metadata live and die with the file;
> > - No "store" query when you move or copy a file;
> > - You don't rely on a "store" to tag files;
> > - You also don't end up with a huge store full of useless things, like it
> > used to happen with Nepomuk some time ago (no offense);
> > - You can easily back up the metadata (at least file metadata): you just
> > have to use a decent backup tool that handles xattrs;
> > - It's CLI-friendly;
> > - ...
> 
> +1
> 
> I'm leaning towards this as well.

While researching extended attributes, I found this page, which may contain interesting information and ideas:
http://web.archive.org/web/20120204170632/http://www.freedesktop.org/wiki/CommonExtendedAttributes

Anyway, I think it's pretty interesting to see that FreeDesktop.org was also looking at xattrs.
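
To make the idea a bit more concrete, here is a minimal sketch of what storing tags in xattrs could look like from Python on Linux. It assumes the attribute name `user.xdg.tags` and a comma-separated value, which is how I read the draft linked above; the helper names are mine and purely hypothetical. `os.setxattr` and `os.getxattr` are Linux-only and need a filesystem that supports user xattrs.

```python
import os

# Assumption: attribute name and comma-separated format, as I read the
# FreeDesktop.org draft; not something the new indexer is known to use.
TAG_ATTR = "user.xdg.tags"


def encode_tags(tags):
    """Serialize tags as a sorted, de-duplicated, comma-separated UTF-8 value."""
    cleaned = sorted({t.strip() for t in tags if t.strip()})
    return ",".join(cleaned).encode("utf-8")


def decode_tags(raw):
    """Parse the attribute value back into a list of tags."""
    text = raw.decode("utf-8")
    return [t for t in (part.strip() for part in text.split(",")) if t]


def write_tags(path, tags):
    """Store the tags on the file itself (Linux only)."""
    os.setxattr(path, TAG_ATTR, encode_tags(tags))


def read_tags(path):
    """Return the stored tags, or [] if the attribute is missing or unsupported."""
    try:
        return decode_tags(os.getxattr(path, TAG_ATTR))
    except OSError:
        return []
```

With something like this, `write_tags("report.pdf", ["work", "2013"])` followed by a `mv` on the same filesystem would leave the tags attached to the file, with no store to update.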

> > * What are the plans to store indexes? Again, with OSX (sorry, I work a lot
> > with Macs -maybe too much-), the system builds an index per volume. This is
> > quite nice because when you connect a volume that has already been indexed,
> > the system gets the information and can immediately search the volume
> > index. Let's take an example: let's say you have some remote storage (NAS
> > or whatever) at home with your media. You mount this remote volume and let
> > the indexers do their stuff. Then you mount the volume from another device
> > and *tadaaa*, you're able to query the previously-built index. Wouldn't
> > that be awesome? If you disconnect the volume, the index for this volume
> > isn't available anymore and you don't get results for it. This also means
> > that if one index gets corrupted, you don't have to scan and index every
> > volume again. I think this would also solve Ignacio's issue.
> > 
> 
> This is exactly what I'm aiming for. We're currently using Xapian to store
> the indexes. Its engine allows multiple databases to be queried easily.

Great! I'm glad to read that :)
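
Just to illustrate the behaviour I have in mind, here is a toy sketch in plain Python. It does not use Xapian at all; the class and its methods are hypothetical, and each "index" is only a dict mapping a term to a list of paths.

```python
class VolumeIndexes:
    """Toy model: one term -> paths index per volume, searchable only while mounted."""

    def __init__(self):
        self._mounted = {}

    def mount(self, volume, index):
        """Make a previously built index available as soon as the volume appears."""
        self._mounted[volume] = index

    def unmount(self, volume):
        """Drop the volume's index; its results disappear from searches."""
        self._mounted.pop(volume, None)

    def search(self, term):
        """Query every mounted volume's index and merge the hits."""
        hits = []
        for volume in sorted(self._mounted):
            hits.extend(self._mounted[volume].get(term, []))
        return hits
```

A corrupted index would then only mean rebuilding the dict for that one volume, not for all of them.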

> > * You probably already know it, but an SQLite DB might have some problems
> > when stored on remote filesystems (see: http://www.sqlite.org/wal.html and
> > especially "All processes using a database must be on the same host
> > computer; WAL does not work over a network filesystem."). So if you plan to
> > store each index on its volume (as previously suggested), SQLite might not
> > be the (best) solution.
> > 
> 
> Nah.
> 
> SQLite is used to map file URLs to unique identifiers. We need unique
> identifiers for files since the URL can change on rename/move.
>
> This unique identifier (an unsigned integer) is then used in Xapian to
> uniquely identify the file.

I'm not sure I understand this correctly.
I see that the URL can't be used as an identifier, since it can change.

But if SQLite maps the URL to the id, we have to update the mapping each time the file URL changes, right? Then what value does the mapping add?
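
To check my understanding, here is a minimal sketch of how I picture the mapping. The table layout, the function names and the dict standing in for the Xapian side are all hypothetical; I haven't looked at the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL)")


def add_file(url):
    """Register a URL and return its numeric id."""
    cur = conn.execute("INSERT INTO files (url) VALUES (?)", (url,))
    return cur.lastrowid


def rename_file(old_url, new_url):
    """A move/rename only rewrites this one row; the id is untouched."""
    conn.execute("UPDATE files SET url = ? WHERE url = ?", (new_url, old_url))


def id_for(url):
    """Look up the id for a URL, or None if the URL is unknown."""
    row = conn.execute("SELECT id FROM files WHERE url = ?", (url,)).fetchone()
    return row[0] if row else None


# Stand-in for the index side: data keyed by id (hypothetical structure).
index = {}

doc_id = add_file("file:///home/me/report.pdf")
index[doc_id] = {"terms": ["nepomuk", "xattr"]}
rename_file("file:///home/me/report.pdf", "file:///home/me/docs/report.pdf")
```

If I read the explanation right, the point would be that a rename updates one row in the mapping, while everything keyed by the id in the index stays as it is.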

Wouldn't it be possible to:
  * drop SQLite,
  * build a unique identifier from the file URL (md5, UUID, or whatever is suitable) and update it directly in Xapian whenever the file URL changes?
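
Roughly what I mean, as a sketch: md5 is only an example, and the dict is again a hypothetical stand-in for the index.

```python
import hashlib


def url_id(url):
    """Derive an identifier from the URL alone (md5 used only as an example)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


# Stand-in for the index: data keyed by the URL-derived id (hypothetical).
index = {}


def index_file(url, data):
    """Store the data under the id derived from the URL."""
    index[url_id(url)] = data


def move_file(old_url, new_url):
    """Re-key the entry, since a new URL yields a new identifier."""
    index[url_id(new_url)] = index.pop(url_id(old_url))
```

Writing it down makes one consequence visible: the identifier itself changes on every move, so the entry has to be re-keyed in the index each time.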

Also, wouldn't it be possible to use the file's inode number as the id? If I remember correctly, the inode number doesn't change when you mv a file (as long as it stays on the same filesystem). I'm not really sure about this, as I'm far from being a FS expert! I just thought it might be an elegant solution.
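
A quick way to check this is a small sketch using `os.stat`. The function names are mine; I pair the inode with the device number because an inode number is only meaningful within one filesystem.

```python
import os
import tempfile


def file_key(path):
    """Return (device, inode), which identifies a file within one mounted filesystem."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def key_survives_rename():
    """Create a file, rename it within the same directory, and compare keys."""
    with tempfile.TemporaryDirectory() as tmp:
        old = os.path.join(tmp, "a.txt")
        new = os.path.join(tmp, "b.txt")
        with open(old, "w") as f:
            f.write("hello")
        before = file_key(old)
        os.rename(old, new)
        return before == file_key(new)
```

The limits I can see: a copy, or a move to another filesystem, creates a new inode, and an inode number can be reused once the original file has been deleted.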

> > * Will there be several separate indexers (one for PDF files, one for video
> > files, ...) or just one that takes care of everything? I was thinking
> > about the ability to add indexers that could retrieve stuff from the
> > Internet. For example, have an indexer that could retrieve movie
> > information from TheMovieDB.org.
> > 
> 
> There are separate indexers for each file format, as was the case with
> Nepomuk. Please have a look at kfilemetadata [1].

Okay, great. I think the most important part is to have something modular (being able to add, remove, and update indexers).
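
By "modular" I mean something along these lines. This is a generic sketch of a registry keyed by MIME type, not how kfilemetadata actually works; every name in it is hypothetical.

```python
# Registry of extractor functions, keyed by MIME type (hypothetical design).
EXTRACTORS = {}


def register(mimetype):
    """Decorator that adds (or replaces) the extractor for a MIME type."""
    def decorator(func):
        EXTRACTORS[mimetype] = func
        return func
    return decorator


def unregister(mimetype):
    """Remove the extractor for a MIME type, if any."""
    EXTRACTORS.pop(mimetype, None)


@register("text/plain")
def extract_text(path):
    """Example extractor: count the words in a plain-text file."""
    with open(path, encoding="utf-8") as f:
        return {"wordCount": len(f.read().split())}


def extract(path, mimetype):
    """Run the matching extractor, or return no metadata if none is registered."""
    func = EXTRACTORS.get(mimetype)
    return func(path) if func else {}
```

An extractor that fetches data from the Internet could then be registered and removed like any other, without touching the rest.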

> For web extractors, I still haven't figured out how we would approach that.
> Another Nepomuk developer, Jorg, has similar ideas. Maybe we should start a
> thread about it and discuss it?

Yep, why not? :)

> 
> > * I hope there will be a nice query API? Dealing with SPARQL was a
> > nightmare for me!
> > 
> 
> There is one right now. Perhaps you could take a look and give some feedback?

Okay, I'll try to take some time to have a look at it, and maybe some more to dive into the code and see if I can give feedback on the whole project. I'm not sure I have the knowledge needed to say anything interesting, but we'll see.

> > * Will it come with a QML DataEngine?
> > 
> 
> Can't say. It will have QML bindings, but I'm not sure about a DataEngine.
> Let's see.

Okay! Then I'll be patient and see what comes :)

Best regards,

-- 
François

