From kde-devel Wed Mar 14 09:00:48 2012
From: José Manuel Santamaría Lema
Date: Wed, 14 Mar 2012 09:00:48 +0000
To: kde-devel
Subject: Re: GSoC idea: improving scanning and OCR in KDE (skanlite/kooka)
Message-Id: <201203141006.59622.panfaust () gmail ! com>
X-MARC-Message: https://marc.info/?l=kde-devel&m=133171619124848

Thank you Kåre and Klaas for your replies. I had some time to dig a bit
more into this:

Kåre Särs
> [snip]
>
> 1) Create a non-GUI Qt/KDE library that can take an (Q)image and generate
> output suitable for djvu/PDF/ODF. Maybe even generate djvu/PDF/ODF files.
>
> 2) Make a simple GUI around the library to test the functionality.
>
> 3) Add the OCR part to the KScan plugin ksaneplugin. (kdegraphics)
>
> 4) Create a Kipi-plugin for use in Gwenview,Digikam,....
>
> 5) Standalone document scanning application that is specialized for
> multipage scanning to PDF/djvu/ODT.
>
>
> I'm not familiar with the ocropus API, so I'm not sure how much work it
> would be. I'm not sure one GSOC would be enough for all 5 points ;)
>
> Regards,
> Kåre

First, I have just realized that gocr can produce output saying where
the characters/words are located (see the gocr man page; I checked how
"-f XML" works with a sample image, and it looks like what I need). So
adding ocropus support right now would not be mandatory; it would be
nice to have, but optional.
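
To illustrate the idea of getting word positions out of "gocr -f XML",
here is a minimal Python sketch. It is only a sketch under assumptions:
the sample document, the element names (page, line, box, space) and the
attribute names (x, y, dx, dy, value) are an assumed shape for the XML,
not output copied from a real gocr run, and the helper names are
hypothetical. It groups per-character boxes into words and computes one
bounding box per word, which is the kind of data a text layer would need.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. The tag and attribute names below are an ASSUMPTION
# about the shape of "gocr -f XML" output, not a verified transcript.
SAMPLE = """<page x="0" y="0" dx="200" dy="100">
  <line x="10" y="20" dx="60" dy="12">
    <box x="10" y="20" dx="8" dy="12" value="H"/>
    <box x="20" y="20" dx="8" dy="12" value="i"/>
    <space x="28" y="20" dx="6" dy="12"/>
    <box x="34" y="20" dx="8" dy="12" value="K"/>
    <box x="44" y="20" dx="8" dy="12" value="D"/>
    <box x="54" y="20" dx="8" dy="12" value="E"/>
  </line>
</page>"""


def glyph_groups(line):
    """Split the children of a <line> into runs of <box> glyphs.

    Any element that is not a <box> (for example <space>) ends the
    current run, so each run corresponds to one word.
    """
    groups, current = [], []
    for element in line:
        if element.tag == "box":
            current.append(element)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def word_with_bbox(glyphs):
    """Return (text, (x0, y0, x1, y1)) covering all glyphs of one word."""
    x0 = min(int(g.get("x")) for g in glyphs)
    y0 = min(int(g.get("y")) for g in glyphs)
    x1 = max(int(g.get("x")) + int(g.get("dx")) for g in glyphs)
    y1 = max(int(g.get("y")) + int(g.get("dy")) for g in glyphs)
    text = "".join(g.get("value", "") for g in glyphs)
    return text, (x0, y0, x1, y1)


def extract_words(xml_text):
    """Parse the XML and return a list of (word, bounding box) tuples."""
    root = ET.fromstring(xml_text)
    words = []
    for line in root.iter("line"):
        for glyphs in glyph_groups(line):
            words.append(word_with_bbox(glyphs))
    return words


if __name__ == "__main__":
    for text, bbox in extract_words(SAMPLE):
        print(text, bbox)
```

The coordinates are passed through unchanged; whether they need to be
converted for a particular output format would have to be checked
against that format's documentation.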

Second, and just FYI, I have a ~12-year-old scanner. I tested both
skanlite and kooka: skanlite worked fine, but kooka doesn't work _for_
_me_. Fortunately, I think I can still provide a djvu generator
supporting OCR with kooka, even if I don't port it to libksane; see
below.

About Kåre's task list: I think I would split the first item like this:

1a) Create a non-GUI Qt/KDE library able to open and generate djvu
documents without a text layer. (libkdjvu)
1b) Create a non-GUI Qt/KDE library that can take an (Q)image and
generate output suitable for djvu/PDF/ODF. (libkocr)
1c) Add support to the libkdjvu library for including the data
retrieved with libkocr as a text layer.

Note that a djvu file may or may not have a text layer. Also note that
getting text with OCR and creating djvu files that join several
images/texts are very different jobs. Those are the reasons for
splitting the first item like that. That being said, here are some
other remarks and questions:

About my 1a): Perhaps I could reuse some code from okular; I need to
investigate this further.

About my 1b): There is already some code in kooka that does something
like that; see these classes: OcrGocrEngine, OcrEngine and KookaImage.
So this task would mainly consist of hacking on OcrGocrEngine to make
it produce output suitable for my new libkdjvu library (by processing
the output of "gocr -f XML"), and taking all the OCR-related kooka
classes and putting them together in a shared library (libkocr).

It looks like most kooka files are licensed GPLv2-only with a couple of
special exceptions. Klaas, could we please change that license to GPLv2
or later, with the same couple of special exceptions?
See: http://techbase.kde.org/Projects/KDE_Relicensing

About 2) and 5): I'm open to other ideas, but right now I tend to think
that both the "simple GUI" mentioned in 2) and the "Standalone document
scanning application" mentioned in 5) will be a new tab in kooka that
behaves as a djvu editor. I did a quick mockup; this GUI would be able
to open existing djvu documents as well as create new ones:
http://alioth.debian.org/~santa-guest/gsoc2012/mockup.png

About 3) and 4): If I create the libkocr library, these should be easy
to do. However, I want to understand better how these plugins would
work from a user's point of view. For instance, say I open a png file
in gwenview and there is a menu item called "Process image with OCR" in
the "Plugins" menu. What would happen if I click that item? Would it
open a text editor with the OCR result, or what?