List: aspell-user
Subject: [Aspell-user] digit behavior
From: Michael Howard <michael () uforlife ! com>
Date: 2010-05-01 17:57:29
Message-ID: q2o9d22313c1005011057q90e07baau2465ef7187386466 () mail ! gmail ! com
I am investigating aspell for use on a large set of scanned pages with
text that was generated through OCR.
I searched through the mailing list archive and found
http://lists.gnu.org/archive/html/aspell-user/2002-07/msg00003.html
wherein Kevin Atkinson explains that aspell was not designed for
OCR-type errors.
Nevertheless, I chose to proceed a bit ... primarily because I was
unable to find anything open source that was better. Unfortunately I
did not get very far.
aspell seems to ignore any words with digits in them, and my OCR text
has plenty of digit/character confusion. I was unable to find any
options to control behavior with digits.
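To show the kind of token I mean, here is a minimal sketch in Python that pulls out words mixing letters and digits so they can be inspected separately. It does not use aspell at all; the name `mixed_tokens` and the token pattern are assumptions made up for illustration, and real OCR text would likely need a broader pattern.

```python
import re

# Hypothetical definition of a "word" for OCR output: runs of ASCII letters and digits.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")

def mixed_tokens(text):
    """Return tokens that contain at least one letter and at least one digit."""
    found = []
    for tok in TOKEN_RE.findall(text):
        has_digit = any(c.isdigit() for c in tok)
        has_alpha = any(c.isalpha() for c in tok)
        if has_digit and has_alpha:
            found.append(tok)
    return found

if __name__ == "__main__":
    sample = "The qu1ck brown f0x saw 42 dogs in 2010"
    print(mixed_tokens(sample))
```

Pure numbers such as 42 are left out on purpose, since only the digit/character confusions are of interest here.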
Searching the mailing list again I found
http://lists.gnu.org/archive/html/aspell-user/2006-08/msg00013.html
wherein Thomas Güttler suggested modifying the cset table so that
additional characters could be treated as word characters. I tried
copying the .cset file, modifying it to turn the Digits into Letters,
and specifying my cset using --encoding on the command line. However,
the behavior did not change ... words with digits in them were still
ignored and did not show up with --list.
Any comments/suggestions/advice appreciated.
Michael