
List:       kde-frameworks-devel
Subject:    Re: Review Request 125762: External extractor plugin support for KFileMetaData
From:       "Vishesh Handa" <me () vhanda ! in>
Date:       2015-12-06 12:02:10
Message-ID: 20151206120210.29696.65003 () mimi ! kde ! org


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/125762/#review89174
-----------------------------------------------------------


If you're planning on pushing this, please push it to a testing branch. Until we
actually have some plugins using this, we can't be sure of the API. My points
below are quite negative. I was hoping to write a more positive email about the
different ways to go about this, but I have been procrastinating and have not
written it. Sorry.

My main objections to this:

1. One cannot choose between external plugins, since all of them sit behind a single
plugin (ExternalExtractor) and all their mimetypes are just combined. This matters
less in the case of Baloo, but when someone else wants to choose specific
extractors, it's impossible.

2. Extraction of plain text is simply not supported.

3. There is no way to differentiate between plugin implementations. We may get
plugins for the same mimetype using different technologies, so perhaps we need a way
to identify which one to choose. Maybe this will never be a problem, but I'm skeptical.

4. Contamination of the PATH. Maybe we could put them in a different place, which
would make it obvious that users should never execute them.

5. Getting the list of extractors now means running a possibly large number of
executables with possibly bad implementations. They could just get stuck, and all
other plugins would suffer. Perhaps the list of supported mimetypes could be in a
desktop file?
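
To make point 5 concrete, here is a rough sketch of what guarding the `--mimetypes` query with a timeout could look like. It is written in Python for illustration only and is not the code in the patch; the function name `query_mimetypes` and the 2-second default are assumptions.

```python
import subprocess


def query_mimetypes(executable, timeout_s=2.0):
    """Run `executable --mimetypes` and return the mimetypes it prints.

    Hypothetical helper: returns an empty list if the extractor hangs,
    exits with an error, or cannot be started, so that one bad extractor
    does not stall the discovery of the others.
    """
    try:
        result = subprocess.run(
            [executable, "--mimetypes"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # assumed limit; the child is killed when it expires
            check=True,         # a non-zero exit status counts as a failure
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError):
        return []
    # One mimetype per line, as described in the review request.
    lines = result.stdout.decode("utf-8", errors="replace").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

A timeout only bounds the damage; every extractor still has to be started once, which is the cost a desktop file listing the mimetypes would avoid.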

- Vishesh Handa


On Oct. 24, 2015, 12:19 p.m., Boudhayan Gupta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/125762/
> -----------------------------------------------------------
> 
> (Updated Oct. 24, 2015, 12:19 p.m.)
> 
> 
> Review request for Baloo, KDE Frameworks, Pinak Ahuja, and Vishesh Handa.
> 
> 
> Repository: kfilemetadata
> 
> 
> Description
> -------
> 
> This patch introduces support for external metadata extractors in KFileMetaData.
> 
> The external extractors themselves can be written in any language, provided that
> each one can be run as a standalone executable (compiled, or a script with a
> hashbang), accepts command line arguments, and can write data to stdout.
> 
> The extractors are executed like so:
> 
> * `extractor --mimetypes` - outputs a list of mimetypes supported by the extractor,
>   one per line.
> * `extractor filename` - outputs a JSON document with the metadata. The keys are
>   such that they can be directly used with PropertyInfo::fromName().
> 
> At the KFileMetaData end, an additional internal plugin (ExternalExtractor) is
> provided that forms a conduit between external extractors and the internal API.
> This plugin looks for executables called kfilemetadata_extractor_<something> in
> /usr/bin to find external extractors, and executes them with the --mimetypes arg to
> find the list of mimetypes each extractor supports. ExternalExtractor then claims
> to support all of these mimetypes, and delegates to the extractor executable
> when doing the actual extraction.
> 
> Diffs
> -----
> 
> README.md 19b1a26 
> src/extractors/CMakeLists.txt 5dd223e 
> src/extractors/externalextractor.h PRE-CREATION 
> src/extractors/externalextractor.cpp PRE-CREATION 
> 
> Diff: https://git.reviewboard.kde.org/r/125762/diff/
> 
> 
> Testing
> -------
> 
> Tested with the sample executable file extractor (as attached, written in Python)
> with the dump manual test in KFileMetaData. Works.
> 
> File Attachments
> ----------------
> 
> kfilemetadata_extractor_executable
> https://git.reviewboard.kde.org/media/uploaded/files/2015/10/23/146b657f-31d9-4117-a82f-ef966a6339d4__kfilemetadata_extractor_executable
>  
> 
> Thanks,
> 
> Boudhayan Gupta
> 
> 
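
For reference, the two invocations described in the quoted review request (`extractor --mimetypes` and `extractor filename`) could be served by a script along the following lines. This is only a sketch and not the attached sample: the mimetype `application/x-executable`, the `title` key, and the helper names `extract` and `respond` are assumptions made for illustration. A real extractor would have to emit keys that PropertyInfo::fromName() actually accepts.

```python
#!/usr/bin/env python3
import json
import os

# Assumption: the only mimetype this illustrative extractor claims.
SUPPORTED_MIMETYPES = ["application/x-executable"]


def extract(path):
    """Return a metadata dict for `path`.

    The "title" key is a placeholder; real keys must match names that
    PropertyInfo::fromName() understands.
    """
    return {"title": os.path.basename(path)}


def respond(args):
    """Return the text to print on stdout for the given arguments."""
    if len(args) != 1:
        return "usage: extractor --mimetypes | extractor FILENAME"
    if args[0] == "--mimetypes":
        # One mimetype per line.
        return "\n".join(SUPPORTED_MIMETYPES)
    # Any other single argument is treated as the file to inspect.
    return json.dumps(extract(args[0]))


# An installed script would finish with:
#     import sys; print(respond(sys.argv[1:]))
```

Saved as an executable named kfilemetadata_extractor_<something>, a script of this shape would answer both calls the ExternalExtractor plugin is described as making.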





_______________________________________________
Kde-frameworks-devel mailing list
Kde-frameworks-devel@kde.org
https://mail.kde.org/mailman/listinfo/kde-frameworks-devel

