List: nepomuk
Subject: Re: [Nepomuk] Why store file urls?
From: Vishesh Handa <me () vhanda ! in>
Date: 2012-12-10 13:30:05
Message-ID: CAOPTMKBfhJNwOdTg12hSefmixFYdDHtk2oYes_iRQSNT5wjcmg () mail ! gmail ! com
Quick update -
Right now the plan is to implement this for 4.11.
On Tue, Nov 27, 2012 at 1:53 AM, Sebastian Trüg <trueg@kde.org> wrote:
> On 11/23/2012 11:17 AM, Vishesh Handa wrote:
>
> >
> >
> >
> > On Fri, Nov 23, 2012 at 3:30 PM, Jörg Ehrichs <Joerg.Ehrichs@gmx.de> wrote:
> >
> > 2012/11/23 Marco Martin <notmart@gmail.com>:
> >
> > > On Friday 23 November 2012, Vishesh Handa wrote:
> > >
> > > > <nepomuk:/res/23161f9c-8839-4de3-bba0-affdd6d654ef>
> > > > rdf:type
> > > > nmm:MusicPiece
> > > > rdf:type
> > > > nfo:FileDataObject
> > > > rdf:type
> > > > nfo:Audio
> > > > rdf:type
> > > > nie:InformationElement
> > > > nie:url
> > > > file:///home/vishesh/Music/where_does_the_good_go.mp3
> > > >
> > > > Storing this URL makes accessing file resources quite convenient. But I
> > > > fear it has been a terrible design decision. By storing the url we face the
> > > > following problems -
> > >
> > > uhm, that's probably right, keeping the full file url consistent is a
> > > mess, however...
> > >
> > > a very common use case in the C++ code is calling
> > > Nepomuk2::Resource(file path)
> > >
> > > when you need a fast way to obtain the resource associated with a
> > > particular file
> > > (like in https://bugs.kde.org/show_bug.cgi?id=310525)
> > >
> > > otherwise, how could we quickly get the metadata of a file given we
> > > have the file, and the other way around?
> >
> >
> > It would be slightly more expensive, but not too hard. One would have to
> > retrieve the resource for each path component, starting from the root
> > element. So if you give me something like this
> > Resource("/home/vishesh/kde/src/file.cpp")
> >
> > I'll have to do either multiple queries -
> >
> > select ?r where { ?r nfo:filename "home" ; nie:isPartOf <rootElement> . } -> homeRes
> > select ?r where { ?r nfo:filename "vishesh" ; nie:isPartOf <homeRes> . } -> visheshRes
> > ..
> > ..
> > or maybe it can be done in one query?
> >
>
> I think so:
>
> select ?r where { ?r nfo:filename "file.cpp" ; nie:isPartOf [ nfo:filename "src" ; nie:isPartOf [ nfo:filename "kde" ... ] ] }
>
> I am, however, not sure which is faster.
>
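A rough sketch of how such a nested query could be generated from a path, in Python just for illustration. The function names are made up for this example, <rootElement> is the same placeholder as in the queries above, and the nfo:/nie: prefixes are assumed to be declared elsewhere:

```python
def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def nested_lookup_query(path: str, root: str = "<rootElement>") -> str:
    """Build one SPARQL query that matches every path component at once.

    Hypothetical helper: `root` is a placeholder for whatever resource
    represents the top of the file hierarchy.
    """
    parts = [p for p in path.split("/") if p]
    if not parts:
        raise ValueError("path has no components")
    # Start at the top-most directory and wrap each level in a blank node
    # that points at its parent via nie:isPartOf.
    parent = root
    for name in parts[:-1]:
        parent = f'[ nfo:filename "{escape_literal(name)}" ; nie:isPartOf {parent} ]'
    leaf = escape_literal(parts[-1])
    return f'select ?r where {{ ?r nfo:filename "{leaf}" ; nie:isPartOf {parent} . }}'


if __name__ == "__main__":
    print(nested_lookup_query("/home/vishesh/kde/src/file.cpp"))
```

Whether this is faster than one query per component would still need to be measured.
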
> In general I like the idea of getting rid of the file URL, a lot actually. This
> could even mean that you get rid of nie:url altogether. In the end there
> is really no need to use nie:url for http or any other remote resource...
>
> As for your (3): that should actually be fairly simple. I wrote the code,
> which feels very hacky (not the code itself, but the need for its
> existence) and it could easily be adapted to only update nfo:filename and
> nie:isPartOf. Much simpler in the end.
>
> All in all: +10 from me if you can get the direct file resource access
> fast.
>
> Cheers,
> Sebastian
>
>
> > You get the gist. These all could be cached in memory so it shouldn't be
> > a big problem. This is actually quite analogous to what the kernel does
> > in the file system layer, except that it matches inodes to their
> > filenames. We will be matching resource URIs.
> >
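To make the caching idea concrete, here is a minimal sketch in Python. PathResolver and the lookup callback are hypothetical names for this example; lookup(name, parent) stands in for one of the per-component queries shown earlier and is not an existing Nepomuk API:

```python
from typing import Callable, Dict, Optional, Tuple

# lookup(name, parent) -> resource URI of the child, or None if unknown.
Lookup = Callable[[str, str], Optional[str]]


class PathResolver:
    """Resolve a file path to a resource URI one component at a time."""

    def __init__(self, lookup: Lookup, root: str = "rootElement") -> None:
        self._lookup = lookup
        self._root = root
        # (parent URI, filename) -> child URI, kept in memory.
        self._cache: Dict[Tuple[str, str], str] = {}

    def resolve(self, path: str) -> Optional[str]:
        current = self._root
        for name in (p for p in path.split("/") if p):
            key = (current, name)
            child = self._cache.get(key)
            if child is None:
                # Cache miss: this is where a real query would run.
                child = self._lookup(name, current)
                if child is None:
                    return None  # no resource for this component
                self._cache[key] = child
            current = child
        return current
```

With this shape, two files in the same directory share the cached lookups for all their parent directories, so only the last component costs a query the second time. Cache invalidation on moves and renames is left out of the sketch.
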
> > I'd say retrieving metadata from a file is a "one-time" job of the
> > file-indexer.
> > Afterwards, we should rely on the data inside Nepomuk and only get
> > more once this fails.
> >
> > In addition, the nepomuk-core part could offer a convenient method to
> > create the file url for the end-user and also cache this information
> > for a while to speed up the query. I assume it's faster to check
> > QFile::exists() than creating the url with every query again.
> >
> >
> > Of course. This all should be transparently handled in the resource class.
> >
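Going the other way, the file url could be rebuilt by following nie:isPartOf up to the root. A small sketch in Python, assuming two hypothetical callbacks: filename_of(resource) returns the nfo:filename value and parent_of(resource) returns the nie:isPartOf target. Both would be backed by queries or a cache in practice:

```python
from typing import Callable, List, Optional
from urllib.parse import quote


def file_url_for(
    resource: str,
    filename_of: Callable[[str], str],
    parent_of: Callable[[str], Optional[str]],
    root: str = "rootElement",
) -> str:
    """Rebuild a file:// URL by walking from a resource up to the root."""
    parts: List[str] = []
    current = resource
    while current != root:
        parts.append(filename_of(current))
        parent = parent_of(current)
        if parent is None:
            raise LookupError(f"{current} is not attached to the root")
        current = parent
    # Components were collected leaf-first, so reverse them, and
    # percent-encode each one so spaces etc. are valid in a URL.
    return "file:///" + "/".join(quote(p, safe="") for p in reversed(parts))
```

The result could then be cached for a while, as suggested above, and re-checked with something like QFile::exists() before it is handed out.
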
> > Other than that, I like the idea. It seems there are several problems
> > with removable media, which don't seem to get solved by the
> > current approach.
> >
> >
> > Yeah. I think so as well.
> >
> > But it's a BIG change. All the previous data will first need to be ported.
> >
> > _______________________________________________
> > Nepomuk mailing list
> > Nepomuk@kde.org
> >
> > https://mail.kde.org/mailman/listinfo/nepomuk
> >
> >
> >
> >
> > --
> > Vishesh Handa
> >
>
--
Vishesh Handa