List:       aspell-user
Subject:    [Aspell-user] digit behavior
From:       Michael Howard <michael () uforlife ! com>
Date:       2010-05-01 17:57:29
Message-ID: q2o9d22313c1005011057q90e07baau2465ef7187386466 () mail ! gmail ! com

I am investigating aspell for use on a large set of scanned pages with
text that was generated through OCR.

I searched through the mailing list archive and found
  http://lists.gnu.org/archive/html/aspell-user/2002-07/msg00003.html
wherein Kevin Atkinson explains that aspell was not designed for
OCR-type errors.

Nevertheless, I chose to proceed a bit ... primarily because I was
unable to find anything open source that was better. Unfortunately I
did not get very far.

aspell seems to ignore any words with digits in them, and my OCR text
has plenty of digit/character confusion. I was unable to find any
options to control behavior with digits.
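
To make the digit/character confusion concrete, here is a minimal
Python sketch, independent of aspell and not using any of its options,
that pulls out tokens mixing letters and digits. These are the kind of
words one would expect a checker to flag in OCR text. The function
name, the regular expression, and the sample sentence are illustrative
assumptions, not anything taken from aspell itself.

```python
import re

# A token is a run of ASCII letters and digits that contains at least
# one letter AND at least one digit (e.g. "f0x", but not "42" or "fox").
MIXED = re.compile(
    r"\b(?=[A-Za-z0-9]*[A-Za-z])(?=[A-Za-z0-9]*[0-9])[A-Za-z0-9]+\b"
)

def mixed_tokens(text):
    """Return tokens containing both a letter and a digit, in order of appearance."""
    return MIXED.findall(text)

# Hypothetical OCR output with digit/letter confusion.
sample = "The qu1ck brown f0x jumped over 42 lazy d0gs in 2010."
print(mixed_tokens(sample))  # ['qu1ck', 'f0x', 'd0gs']
```

Pure numbers such as "42" and "2010" are deliberately left out, since
they are not candidates for spelling correction.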

Searching the mailing list again I found
  http://lists.gnu.org/archive/html/aspell-user/2006-08/msg00013.html
wherein Thomas Güttler suggested modifying the cset table so that
additional characters could be treated as word characters. I tried
copying the .cset file, modifying it to turn the Digits into Letters,
and specifying my cset using --encoding on the command line. However,
the behavior did not change ... words with digits in them were still
ignored and did not show up with --list.

Any comments/suggestions/advice appreciated.


Michael


