Hi,

I'm considering applying to GSoC this year, and if I do, I would like to
improve the state of scanning and optical character recognition in KDE.
More specifically:

What I want to achieve
----------------------

A few years ago I had to study electronics at my university from class
notes that were only available on paper. I was annoyed because I couldn't
use ctrl+F on paper, so I looked into OCR a bit and found the open DjVu
file format[1].

So I tried to produce a DjVu document with KDE and my free operating
system; it was (very difficult|impossible). If I recall correctly, I did
something like this: I worked out a workflow that scanned a couple of pages
as 2 JPEG files, then I tried to join them into a multipage DjVu document
using shell commands, and succeeded. However, I couldn't find out how to do
the OCR part. IIRC I tried a couple of free OCR programs (I didn't try
ocropus; I don't remember whether that program didn't exist at the time or
I just didn't know about it), but their output was just the text, without
the coordinates where the text is located, which are needed to produce a
proper text layer in the DjVu document.

So I gave up, rebooted into Windows and used proprietary software to
produce the document. It worked quite well: I just fed the papers into my
scanning device and produced a multipage document; when done, I just
clicked a menu item labelled something like "process the document using
OCR", and that was it. I don't remember the name of the software very well,
but I'd swear it was "Document Express"[2].
The result was excellent, and you can download the produced document here:

http://alioth.debian.org/~santa-guest/gsoc2012/apuntes_te.djvu

As you can see, the size of the document is reasonable (only 2.4M), and you
can press ctrl+F, type "zener" and read about zener diodes.

So... to sum up: it was/is easier to produce good DjVu documents with
proprietary software. I want a KDE'ish program to replace the expensive
"Document Express".

Some technical details
----------------------

Currently we have a couple of KDE programs to scan documents: skanlite and
kooka.

skanlite is quite simple (it doesn't do OCR), uses the modern libksane
library, lives in extragear and works fine.

kooka provided more functionality in the old KDE 3 days than skanlite does
today (it seems it was able to do some basic OCR), uses its obsolete
libkscan library and lives in playground. I don't know whether it works,
because I don't have a scanning device right now, but at least it builds
properly.

So... it looks like the tasks needed to achieve my goal would be:

1. If needed, extend libksane so that it becomes a good replacement for the
   old libkscan.
2. Port kooka to the modern libksane.
3. Add ocropus support to kooka (I heard that with ocropus you can get the
   coordinates of the text, but I don't know for sure yet).
4. Code something in kooka to produce DjVu documents.

[1] http://en.wikipedia.org/wiki/DjVu
[2] https://www.caminova.net/en/shop/item.aspx?itemid=3
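To make tasks 3 and 4 a bit more concrete, here is a minimal sketch of why
the word coordinates matter. It assumes the DjVuLibre command-line tools
(c44, djvm, djvused) for the "shell commands" step, which is only one
possible choice, and it assumes an OCR engine that reports word boxes in
pixels measured from the top-left corner of the page. The Word type and the
function names are hypothetical and exist only for illustration. The sketch
turns such boxes into the text-layer description that djvused accepts,
flipping the vertical axis because DjVu measures from the bottom-left
corner.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """One OCR word (hypothetical structure, top-left pixel origin)."""
    text: str
    x0: int  # left edge
    y0: int  # top edge
    x1: int  # right edge
    y1: int  # bottom edge


def quote(text: str) -> str:
    """Wrap text in double quotes, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def djvu_hidden_text(lines: List[List[Word]], width: int, height: int) -> str:
    """Build a page > line > word text description for one page."""
    out = [f"(page 0 0 {width} {height}"]
    for line in lines:
        if not line:
            continue
        # Flip y: DjVu counts from the bottom edge, the OCR boxes from the top.
        boxes = [(w.x0, height - w.y1, w.x1, height - w.y0) for w in line]
        # The line box is the union of its word boxes.
        lx0 = min(b[0] for b in boxes)
        ly0 = min(b[1] for b in boxes)
        lx1 = max(b[2] for b in boxes)
        ly1 = max(b[3] for b in boxes)
        out.append(f" (line {lx0} {ly0} {lx1} {ly1}")
        for w, b in zip(line, boxes):
            out.append(f"  (word {b[0]} {b[1]} {b[2]} {b[3]} {quote(w.text)})")
        out[-1] += ")"  # close the line
    out[-1] += ")"      # close the page
    return "\n".join(out)


def djvulibre_commands(jpegs: List[str], output: str) -> List[List[str]]:
    """Return (not run) the commands that encode and bundle the pages."""
    pages = [name.rsplit(".", 1)[0] + ".djvu" for name in jpegs]
    cmds = [["c44", jpeg, page] for jpeg, page in zip(jpegs, pages)]
    cmds.append(["djvm", "-c", output, *pages])
    return cmds


if __name__ == "__main__":
    line = [Word("zener", 100, 50, 200, 80), Word("diode", 210, 50, 300, 80)]
    print(djvu_hidden_text([line], 1000, 2000))
    for cmd in djvulibre_commands(["page1.jpg", "page2.jpg"], "notes.djvu"):
        print(" ".join(cmd))
```

If the generated description is saved as page1.txt, something like
"djvused notes.djvu -e 'select 1; set-txt page1.txt' -s" should attach it
to the first page. An OCR program that outputs only plain text gives
nothing to put in the word boxes, which is exactly the problem described
above.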