List:       kde-devel
Subject:    KFile plugins
From:       Thomas Kadauke <tkadauke () gmx ! de>
Date:       2005-10-13 15:59:06
Message-ID: 200510131759.06106.tkadauke () gmx ! de

Hello list,

This is my first post to this list, so I want to introduce myself: My name is 
Thomas Kadauke, I'm a compsci student in Tübingen, Germany (yeah, where 
Matthias Ettrich studied :)

Recently, I uploaded several KFile plugins to kde-apps.org:

- http://www.kde-apps.org/content/show.php?content=30112
  BibTeX-plugin: calculates total number of references, number of 
book/article/other references

- http://www.kde-apps.org/content/show.php?content=30113
  .kdevelop project file plugin: extracts author, e-mail, version, language 
and some keywords

- http://www.kde-apps.org/content/show.php?content=30114
  LaTeX-plugin: extracts title, author, date and calculates number of
chapters, sections, paragraphs, words, commands, footnotes and comments

- http://www.kde-apps.org/content/show.php?content=30115
  M3U playlist plugin: calculates number of tracks, total length, number of 
local files/streams

- http://www.kde-apps.org/content/show.php?content=30116
  MIDI-plugin: extracts number of tracks, instruments and length

I got several requests to include these plugins in the main KDE
distribution. However, I'm new to KDE development and don't even have an SVN
account for that. So if you think these plugins are useful, feel free to
include them in KDE SVN.

While implementing these plugins, I encountered several problems/bugs:

- Documentation. To be honest, the documentation for KFilePlugin et al. sucks.
Most of the interesting methods are not documented at all. This wouldn't
really be a problem, were there a decent tutorial about KFile plugins. This
is *really* needed, because a KFile-plugin is *the* opportunity for
KDE/Qt-newbies to produce something useful quickly without digging too deep
into kdelibs. But since complaining isn't helping anyone, I volunteer to
update/complete the relevant documentation.

- Four of the five plugins deal with text files. I'm planning on writing even 
more kfile-plugins, among them: kate-project, python source, quanta webprj, 
icalendar, docbook, rtf, java, vcalendar (if not already there) and vcard (if 
not already there). All these are actually (more or less) human-readable text 
files. So these could benefit from the meta information of the generic 
kfile_txt plugin that extracts line count, word count, etc. However, the 
current KFile/KDE API does not permit "mimetype-specialization", which would 
be needed to e.g. declare a text/x-latex file to be a specialization of 
text/plain. This would also solve the file association problem when e.g. a 
new text editor is installed and you want to update all text-based formats to 
use this new editor.

- What happens if two mimetypes contain the same filename pattern? AFAICS,
this is handled on a first-come, first-served basis. This, however, is not
satisfactory, as e.g. the types text/x-tex and text/x-latex contain the same 
pattern (*.tex), but are completely different in nature. I'm proposing to use 
the filename pattern only as a hint, and determine the actual filetype based 
on the content.

- Konqueror (in KDE 3.4.1) does not show any meta-information of a text-based 
mimetype, if there is no KFile plugin for that mimetype. Specialization would 
help here.

- Some of the mimetypes have to be updated (if possible):
  - text/x-tex includes the patterns *.tex (good) and *.ltx (bad). *.ltx
stands for LaTeX files, which generally have little in common with plain
TeX files. Also, I think the patterns *.sty and *.cls should get their own 
mimetypes.
  - text/x-c++hdr does not include the pattern "*.h", obviously because this 
is reserved by text/x-chdr. This, too, would be solved by determining the 
mimetype based on content.
  - text/x-objchdr does not have any filename pattern.

- I guess Tenor in KDE4 will use KFile-plugins (or whatever there will be for 
KDE4) to extract meta information from files. However, the current API is not 
sufficient for that. Say that I'm a Java programmer who uses Javadoc and wants
to use the KDE search tool to look for a certain Java method. I know that I
could just use the full-text search on text files, but that will most
probably return a lot of noise when the generated documentation is searched.
So here the kfile-plugins should be able to extract a list of Java methods
from a Java file (it would be cool to even extract the method signature :)
and assign a high priority to that information, since it's more relevant than
the same words in the documentation. Right now, they are restricted to
extracting only a summary of a file's content (such as the number of methods
in a Java file, which is rather uninteresting).

- I haven't yet started to write KFile-plugins for programming language source
files such as Java or Python, because I think a full-blown parser (which is
needed for that) is too much for just extracting the method count and such. 
It would be great to reuse existing parsers (maybe from kdevelop) for that 
task and to extract all useful information from the source file. Right now, 
at least the text-based kfile-plugins are QRegExp-based hacks. A parser which 
really understands what it's reading there would bring benefit to the 
accuracy of the extraction of meta-information.

- The MIDI-plugin links to the somewhat broken and long-time-not-updated 
libkmid in kdelibs. Is it true that this library gets thrown out for KDE4? If 
not, is anyone going to fix it? (see the source of kfile_midi for a short 
description of what is broken)
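
To make the *.tex collision described above concrete, here is a minimal
sketch of "pattern as a hint, content decides". It is plain Python, not KDE
code: the pattern table, the LATEX_MARKER heuristic and the function names
are all hypothetical and only illustrate the idea.

```python
import fnmatch
import re

# Hypothetical pattern table: both types claim *.tex, as in the problem above.
PATTERNS = {
    "text/x-tex": ["*.tex"],
    "text/x-latex": ["*.tex", "*.ltx"],
}

# Assumed heuristic: an uncommented \documentclass or \documentstyle
# at the start of a line's code marks a LaTeX document.
LATEX_MARKER = re.compile(
    r"^[^%\n]*\\(documentclass|documentstyle)\b", re.MULTILINE
)

# Content checks ("sniffers") per mimetype; types without one never win
# a collision by content, they are only used as a fallback.
SNIFFERS = {
    "text/x-latex": lambda content: LATEX_MARKER.search(content) is not None,
}


def candidates(filename):
    """Return every mimetype whose patterns match the filename."""
    return [
        mimetype
        for mimetype, patterns in PATTERNS.items()
        if any(fnmatch.fnmatchcase(filename, p) for p in patterns)
    ]


def detect(filename, content):
    """Use the filename as a hint and the content to break ties."""
    cands = candidates(filename)
    if not cands:
        return None
    if len(cands) == 1:
        return cands[0]
    for mimetype in cands:
        sniffer = SNIFFERS.get(mimetype)
        if sniffer is not None and sniffer(content):
            return mimetype
    # No content check matched: fall back to the first candidate.
    return cands[0]
```

With this table, a *.tex file that contains \documentclass is reported as
text/x-latex, while a *.tex file without it falls back to text/x-tex. A real
implementation would need better heuristics than a single regular expression.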

So I'm proposing the following (for the upcoming KDE4, obviously):

- use the filename pattern as a hint for the mimetype, especially when no 
context is given (e.g. in Konqueror). When a context IS given (e.g. in Krita,
you're most probably dealing with image files), but the pattern is unknown, 
try a context-specific list of mimetypes based on the file contents.

- use a KFile-plugin to determine if the contents of a file match the 
mimetype, regardless of the filename pattern.

- allow mimetypes to be a specialization of another mimetype (I think there is 
no need to allow "multiple inheritance"). The benefit here will most 
obviously be in text files. But it would also help to extract meta 
information from all these XML-based file types out there. The KFile-plugin
for a more specialized mimetype must extract all information that the 
KFile-plugin for the less specialized mimetype claims to extract.

- allow kfile-plugins to extract lists of meta-information (see above). This 
could be generalized to the full text extraction: the kfile_txt plugin could 
extract the meta-field "words" which contains the list of words in the
document. Besides, you get the word count for free.

- allow a priority to be assigned to meta information. The more specialized a
mimetype (and therefore a kfile-plugin) is, the more important the
extracted information tends to be.
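
The last three points (specialization, list-valued fields, priorities) could
fit together roughly like this. Again this is only a sketch in plain Python
under my own assumptions: the PARENT table, the extractor functions and the
"depth in the chain = priority" rule are hypothetical, not an existing API.

```python
import re

# Hypothetical single-inheritance table: child mimetype -> parent mimetype.
PARENT = {"text/x-latex": "text/plain"}


def extract_plain(text):
    """Generic text fields; "words" is a list, the count comes for free."""
    words = text.split()
    return {
        "lines": len(text.splitlines()),
        "words": words,
        "word_count": len(words),
    }


def extract_latex(text):
    """LaTeX-specific field: the list of section titles."""
    return {"sections": re.findall(r"\\section\{([^}]*)\}", text)}


EXTRACTORS = {"text/plain": extract_plain, "text/x-latex": extract_latex}


def chain(mimetype):
    """Return the mimetype followed by its ancestors, most specialized first."""
    result = []
    while mimetype is not None:
        result.append(mimetype)
        mimetype = PARENT.get(mimetype)
    return result


def extract(mimetype, text):
    """Collect fields from the whole chain as {name: (priority, value)}.

    Priority is the depth in the chain (0 = most generic), so fields from
    a more specialized extractor get a higher number and override
    same-named fields from its ancestors.
    """
    info = {}
    for depth, current in enumerate(reversed(chain(mimetype))):
        extractor = EXTRACTORS.get(current)
        if extractor is None:
            continue
        for name, value in extractor(text).items():
            info[name] = (depth, value)
    return info
```

Here a text/x-latex file automatically gets the line and word fields of
text/plain, because the lookup walks the whole chain. That is one way to
guarantee that the specialized type provides everything the generic one
claims to extract, without each plugin reimplementing it.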

OK, I guess that's it for now. Please tell me what you think. And thanks for
your patience :)

--Thomas
 