Hello Mr Schabell,

Thank you for your e-mail and your interest in document conversion.

For information, KOffice plans to switch to the OpenOffice file format once KOffice-1.3 is out (i.e. NOT for 1.3, but for the next release after that). This should reduce the number of file formats that need to be supported a little bit :)

On the other hand, several of the existing KOffice filters could probably be very useful to your project.

David.

----------------------------------------------------------------------------------------
Forwarded message from: "drs. Eric D. Schabell" (PRONIR)
To: Abiword project, Open Office project, KOffice project, KOffice devel, JXTA discussion mailinglists, Google Api project, hans.bossenbroek@luminis.nl, p.jones@edmond.nl, erica@wizwise.nl, woody@dstc.edu.au, andrewg@dstc.edu.au, sparky@cs.kun.nl, PRONIR@NIC.SURFNET.NL, Mark@Overmeer.net, hoppie@uvt.nl
CC: pronir-conversion@cs.kun.nl
Date: Today 15:54:23

Hello everyone,

As lead developer on the PRONIR Document Conversion project, I wanted to get back to you on the progress of our research in the area of document conversion. This is practical research, aimed at producing a usable tool and an easy-to-use API for external access from existing projects.

At the bottom of this e-mail you will find our original 'call for interest', submitted to you for review over a year ago. In the time since, we have been busy researching some of the possibilities for document conversion and creating the tools/projects listed below (all relevant phases of the global PRONIR project can be followed in the Scientific Programmers Workshop at http://www.pronir.nl/pub/spws).

First off, the PRONIR Conversion Clearinghouse has been set up online at:

http://dubyas.sci.kun.nl/cgi-bin/clearinghouse

The goal is to have a central place of storage for the various existing conversion tools, to be made available to the (currently under development) DocConversion tool.
We have set this up so that it is easy for users to submit conversion tools and routines that we might not yet have in our database. Feel free to browse and use the tools we have listed there. Also see the technical report published about the clearinghouse at:

http://infolab.uvt.nl/people/erics/docs/tech_clearinghouse.pdf

Secondly, we are currently in the beginning stages of our DocConversion project. We have set up a project site and uploaded the initial framework for our DocConversion tool, called 'docconversion', on Sourceforge:

http://docconversion.sourceforge.net

Here you can take a look at it, participate and suggest improvements as we go along. Currently at version 0.2, it only works on the local host where it is installed. The goals are to make use of the Conversion Clearinghouse's database of conversions, to create a smart Broker that can broker document conversions for clients, and to create distributed servers for managing broker requests for document conversions. Feel free to submit comments and feature requests to the project site.

We hope you enjoy the results so far. We will try to keep everyone informed, as we believe that our DocConversion tool will be easy to integrate into many projects that currently deal with document conversion only in passing.

Interested Parties

Please feel free to contact the authors via the DocConversion project site or at: pronir-conversion@cs.kun.nl for further information and collaboration possibilities.

--
Mvg/Regards,

/**
 * drs. Eric D. Schabell
 * Scientific Programmer - (PRONIR)
 * CentER Applied Research - Tilburg, The Netherlands
 *
 * e-mail        : erics@cs.kun.nl
 * Mobile        : +31 (0)6 543 613 15
 * PRONIR        : http://www.pronir.nl
 * DocConversion : http://docconversion.sf.net
 **/

##################################

Document Conversion Systems
---------------------------
A call for interest to the Open Source Software community

H.A. Proper and E.D.
Schabell
May 17, 2002

Introduction
============
Imagine yourself sitting at your computer in the near future, working on your latest document in your favorite editor. You decide to save the document in a different format than the standard format used by the editor. You try to 'save as...' another format, but this new format does not exist in your current editor's conversion list. You try 'export' from the options menu and enter the format you wish to convert to. You receive a message from your editor that this new format is not available locally, but that it might be able to search the Internet for an algorithm that could make the conversion for you. Since you have a connection to the Internet and feel that it might be time for a coffee, you answer in the affirmative and the search begins.

By the time you return from getting that cup of coffee, the editor has popped up a message that the conversion algorithm has been found and applied. It also asks whether you would like to have this new algorithm added to your local library of conversion tools. Of course, you think, and after giving the go-ahead your editor reports that the document you were working on has been converted and the new algorithm has been added to your local library of conversion tools. That coffee is tasting even better now that you can further expand on your document without conversion troubles!

Call for Interest
=================
The above scenario might seem a bit far-fetched. However, at the moment we are starting up a research project and associated prototype, in which a core part will provide the functionality sketched above. In the research project, the seamless conversion functionality will be used to research & develop information retrieval systems for heterogeneous data sources. However, as sketched in the above scenario, such functionality can be used for many, many other purposes.

As a first step we are therefore interested in implementing an open & distributed system for conversions between data objects.
We aim to set this up as an Open Source Software project environment, since we feel that:
 1: this kind of functionality will be useful to applications other than information retrieval,
 2: other research groups in information retrieval are likely to be faced with similar challenges.

We are, therefore, looking for interested parties that would like to participate in developing, using and/or testing such a system as described above.

Vision
======
At the moment we envision an open distributed system that uses a peer to peer (p2p) communication strategy as, for example, is used in gnutella. The system should distinguish between:
 - the definition of conversions from one data type to another data type
 - the actual implementations of these conversions
 - a suitable execution environment for these conversions

This would make the conversion system fairly platform (OS, CPU, Memory) independent. Searches for appropriate services can be conducted using a p2p approach.

The conversions we aim to include in the system are virtually endless. They may include:
 - `simple' conversions such as: text to postscript, postscript to text, word to text, XML to word, latex to postscript, GIF to tiff, bmp to GIF, wav to mp3, etc.
 - whole-part selection, for instance, splitting a mailbox into its constituent mails, or splitting a mail into subject, header, body or attachment-set, etc.
 - aspect conversions, such as: a document's full-content to an abstract, a document's full-content to a set of keywords, etc.

Conversions may be composed as well. For example, a Word->Text conversion may be combined with a Full-Text->Abstract-Text conversion to derive an abstract from a word document. The system should be able to figure out such combinations automatically.

As you may expect, a powerful typing mechanism is needed. We are considering using the Typed Object Model (TOM) from http://tom.library.upenn.edu/sw/index.html as a starting point.
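
To make the idea of automatically composed conversions a little more concrete, here is a minimal sketch in Python. It is not code from PRONIR, DocConversion or TOM; the registry contents, the type labels and the function name find_chain are all assumptions made for illustration. It treats each registered conversion as an edge between two types and uses a breadth-first search to find the shortest chain of conversions from a source type to a target type.

```python
from collections import deque

# Hypothetical registry of single-step conversions: (source type, target type).
# The type names are illustrative labels, not identifiers from PRONIR or TOM.
CONVERSIONS = [
    ("word", "text"),
    ("text", "abstract"),
    ("text", "postscript"),
    ("latex", "postscript"),
    ("postscript", "text"),
]


def find_chain(source, target, conversions=CONVERSIONS):
    """Return the shortest list of (from, to) steps from source to target, or None."""
    # Build an adjacency map: type -> types reachable with one conversion.
    graph = {}
    for src, dst in conversions:
        graph.setdefault(src, []).append(dst)

    # Breadth-first search, so the first chain found uses the fewest steps.
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, steps = queue.popleft()
        if current == target:
            return steps
        for nxt in graph.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + [(current, nxt)]))
    return None


if __name__ == "__main__":
    # Derive an abstract from a word document by chaining two conversions.
    print(find_chain("word", "abstract"))
```

In this sketch a real system would attach an implementation and an execution environment to each step, and a richer type model would decide which types count as compatible. The search itself only needs to know which single-step conversions exist.
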
On top of the conversion infrastructure, a host of plug-ins for editors may be developed that would allow for seamless import/export in different formats.

Possible Components
===================
Some existing Open Source Software projects may be integrated into the planned system.

The p2p infrastructure may be provided by:
 * JXTA project, which allows for the concept of providing "services".

Pre-existing conversion routines which may be entered into the system as conversions:
 * a2ps, ghostscript, wv, xpdf, psutils, etc...
 * openjade, kea, etc...

Interested Parties

Please feel free to contact the authors at: pronir-conversion@cs.kun.nl for further information and collaboration possibilities.

_______________________________________________
koffice-devel mailing list
koffice-devel@mail.kde.org
http://mail.kde.org/mailman/listinfo/koffice-devel