'Re: [Nepomuk] Why store file urls?'

List:       nepomuk
Subject:    Re: [Nepomuk] Why store file urls?
From:       Vishesh Handa <me () vhanda ! in>
Date:       2012-12-10 13:30:05
Message-ID: CAOPTMKBfhJNwOdTg12hSefmixFYdDHtk2oYes_iRQSNT5wjcmg () mail ! gmail ! com

Quick update -

Right now the plan is to implement this for 4.11.

On Tue, Nov 27, 2012 at 1:53 AM, Sebastian Trüg <trueg@kde.org> wrote:

> On 11/23/2012 11:17 AM, Vishesh Handa wrote:
> 
> > On Fri, Nov 23, 2012 at 3:30 PM, Jörg Ehrichs <Joerg.Ehrichs@gmx.de> wrote:
> > 
> > 2012/11/23 Marco Martin <notmart@gmail.com>:
> > 
> > > On Friday 23 November 2012, Vishesh Handa wrote:
> > > 
> > > > <nepomuk:/res/23161f9c-8839-4de3-bba0-affdd6d654ef>
> > > >     rdf:type  nmm:MusicPiece
> > > >     rdf:type  nfo:FileDataObject
> > > >     rdf:type  nfo:Audio
> > > >     rdf:type  nie:InformationElement
> > > >     nie:url   file:///home/vishesh/Music/where_does_the_good_go.mp3
> > > > 
> > > > Storing this URL makes accessing file resources quite convenient. But I
> > > > fear it has been a terrible design decision. By storing the url we face
> > > > the following problems -
> > > 
> > > Hm, that is probably right: keeping the full file url consistent is a
> > > mess. However...
> > > 
> > > A very common use case in the C++ code is calling
> > > Nepomuk2::Resource(file path)
> > > 
> > > which needs a fast way to obtain the resource associated with a
> > > particular file
> > > (like in https://bugs.kde.org/show_bug.cgi?id=310525)
> > > 
> > > Otherwise, how could we quickly get the metadata of a file given the
> > > file, and the other way around?
> > 
> > 
> > It would be slightly more expensive, but not too hard. One would have to
> > retrieve the resource for each path component, starting from the root
> > element. So if you give me something like this
> > Resource("/home/vishesh/kde/src/file.cpp")
> > 
> > I'll have to do either multiple queries -
> > 
> > select ?r where { ?r nfo:filename "home" ; nie:isPartOf <rootElement> . }
> > -> homeRes
> > select ?r where { ?r nfo:filename "vishesh" ; nie:isPartOf <homeRes> . }
> > -> visheshRes
> > ..
> > ..
> > or maybe it can be done in one query?
> > 
> 
> I think so:
> 
> select ?r where { ?r nfo:filename "file.cpp" ; nie:isPartOf [ nfo:filename
> "src" ; nie:isPartOf [ nfo:filename "kde" ... ] ] }
> 
> I am, however, not sure which is faster.
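
A minimal sketch of how the nested single-query form above could be generated from a file path. It uses only the C++ standard library; splitPath, escapeLiteral and buildNestedQuery are hypothetical helper names, not nepomuk-core API, and the property names and the <rootElement> placeholder are copied from the thread as written.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split "/home/vishesh/kde/src/file.cpp" into its components,
// skipping empty segments produced by leading or doubled slashes.
std::vector<std::string> splitPath(const std::string& path) {
    std::vector<std::string> parts;
    std::string part;
    std::istringstream in(path);
    while (std::getline(in, part, '/')) {
        if (!part.empty()) parts.push_back(part);
    }
    return parts;
}

// Escape backslashes and double quotes so a file name can be
// embedded in a double-quoted SPARQL string literal.
std::string escapeLiteral(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Build one query that matches the whole chain of nie:isPartOf links.
// rootUri is a placeholder for whatever resource represents "/".
// Returns an empty string if the path has no components.
std::string buildNestedQuery(const std::string& path, const std::string& rootUri) {
    const std::vector<std::string> parts = splitPath(path);
    if (parts.empty()) return "";

    // Start at the root and wrap one blank node per directory,
    // so the top-level directory ends up innermost.
    std::string parent = "<" + rootUri + ">";
    for (std::size_t i = 0; i + 1 < parts.size(); ++i) {
        parent = "[ nfo:filename \"" + escapeLiteral(parts[i]) +
                 "\" ; nie:isPartOf " + parent + " ]";
    }
    return "select ?r where { ?r nfo:filename \"" + escapeLiteral(parts.back()) +
           "\" ; nie:isPartOf " + parent + " . }";
}
```

Whether this is faster than one query per component would still have to be measured; the sketch only shows that the query text is cheap to produce.
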
> 
> In general I like the idea of getting rid of the file URL, a lot actually.
> This could even mean that you get rid of nie:url altogether. In the end
> there is really no need to use nie:url for http or any other remote
> resource...
> 
> As for your (3): that should actually be fairly simple. I wrote the code,
> which feels very hacky (not the code itself, but the need for its
> existence), and it could easily be adapted to only update nfo:filename and
> nie:isPartOf. Much simpler in the end.
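
To illustrate why a move becomes simpler under this scheme, here is a small sketch with an in-memory map standing in for the store. FileNode, Store and moveResource are made-up names for illustration only. The point is that moving or renaming a directory touches exactly one resource, and nothing stored on its children has to change.

```cpp
#include <map>
#include <string>

// One file or folder resource: its own name plus the URI of its parent.
// This mirrors nfo:filename and nie:isPartOf as used in the thread.
struct FileNode {
    std::string filename;
    std::string parentUri;
};

// Hypothetical stand-in for the store: resource URI -> node.
using Store = std::map<std::string, FileNode>;

// Move and/or rename a resource by updating only its own two properties.
// Returns false if the resource is unknown.
bool moveResource(Store& store, const std::string& uri,
                  const std::string& newParentUri, const std::string& newName) {
    auto it = store.find(uri);
    if (it == store.end()) return false;
    it->second.filename = newName;
    it->second.parentUri = newParentUri;
    return true;
}
```

With stored full URLs, the same move would mean rewriting nie:url on every file below the moved directory.
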
> 
> All in all: +10 from me if you can get the direct file resource access
> fast.
> 
> Cheers,
> Sebastian
> 
> 
> > You get the gist. These could all be cached in memory so it shouldn't be
> > a big problem. This is actually quite analogous to what the kernel does
> > in the file system layer, except that it matches filenames to inodes.
> > We will be matching them to resource uris.
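
A rough sketch of the component-by-component lookup with an in-memory cache, similar in spirit to a kernel's directory entry cache. PathResolver and ChildLookup are hypothetical names; the lookup callback stands in for one of the per-component queries shown earlier and is an assumption, not existing API. It needs C++17 for std::optional.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical callback: given a name and the URI of the parent resource,
// return the URI of the child, or nothing if there is no such resource.
// In a real system this would run one query per call.
using ChildLookup = std::function<std::optional<std::string>(
    const std::string& name, const std::string& parentUri)>;

class PathResolver {
public:
    PathResolver(std::string rootUri, ChildLookup lookup)
        : m_rootUri(std::move(rootUri)), m_lookup(std::move(lookup)) {}

    // Walk the path from the root, one component at a time.
    // Each (parent, name) pair is looked up once and then served from cache.
    std::optional<std::string> resolve(const std::string& path) {
        std::string current = m_rootUri;
        std::istringstream in(path);
        std::string name;
        while (std::getline(in, name, '/')) {
            if (name.empty()) continue;  // leading or doubled slash
            const auto key = std::make_pair(current, name);
            auto hit = m_cache.find(key);
            if (hit == m_cache.end()) {
                std::optional<std::string> child = m_lookup(name, current);
                if (!child) return std::nullopt;  // no resource for this component
                hit = m_cache.emplace(key, *child).first;
            }
            current = hit->second;
        }
        return current;
    }

private:
    std::string m_rootUri;
    ChildLookup m_lookup;
    // (parent URI, file name) -> child URI
    std::map<std::pair<std::string, std::string>, std::string> m_cache;
};
```

The sketch never invalidates cache entries; a real implementation would have to drop them when files are moved or deleted.
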
> > 
> > I'd say retrieving metadata from a file is a "one-time" job of the
> > file-indexer.
> > Afterwards, we should rely on the data inside Nepomuk and only get
> > more once this fails.
> > 
> > In addition, the nepomuk-core part could offer a convenient method to
> > create the file url for the end-user and also cache this information
> > for a while to speed up the query. I assume it's faster to check
> > QFile::exists() than to create the url with every query again.
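
For the opposite direction, a sketch of what such a convenience method might do: rebuild the path by following the parent links up to the root. FileNode and buildPath are illustrative names and the map is a stand-in for the real store. Caching the result, as suggested above, is left out to keep the example short. It needs C++17 for std::optional.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// One file or folder resource: its name and the URI of its parent,
// mirroring nfo:filename and nie:isPartOf as used in the thread.
struct FileNode {
    std::string filename;
    std::string parentUri;
};

// Rebuild the absolute path of a resource by walking up to rootUri.
// Returns nothing if the chain is broken or loops back on itself.
std::optional<std::string> buildPath(const std::map<std::string, FileNode>& nodes,
                                     const std::string& rootUri,
                                     const std::string& uri) {
    std::string path;
    std::string current = uri;
    std::size_t steps = 0;
    while (current != rootUri) {
        auto it = nodes.find(current);
        // A valid chain can never be longer than the number of nodes.
        if (it == nodes.end() || ++steps > nodes.size()) return std::nullopt;
        path = "/" + it->second.filename + path;
        current = it->second.parentUri;
    }
    return path.empty() ? std::string("/") : path;
}
```
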
> > 
> > 
> > Of course. This all should be transparently handled in the resource class.
> > 
> > Other than that, I like the idea. It seems there are several problems
> > with removable media that don't seem to get solved with the
> > current approach.
> > 
> > 
> > Yeah. I think so as well.
> > 
> > But it's a BIG change. All the previous data will first need to be ported.
> > 
> > _______________________________________________
> > Nepomuk mailing list
> > Nepomuk@kde.org
> > https://mail.kde.org/mailman/listinfo/nepomuk
> > 
> > --
> > Vishesh Handa
> > 

-- 
Vishesh Handa

