List:       aspell-user
Subject:    Re: [aspell-user] Aspell and OCR-generated text
From:       Kevin Atkinson <kevin () atkinson ! dhs ! org>
Date:       2002-07-12 16:10:23
Message-ID: Pine.LNX.4.44.0207121852000.2202-100000 () kevin-pc ! atkinson ! dhs ! org

On Thu, 11 Jul 2002 Peter.Binkley@ualberta.ca wrote:

> ... It would be much better, though, if aspell's
> algorithms were oriented toward the kinds of mistakes OCR engines make
> rather than the kinds made by human typists. 

Aspell's algorithms are not really tuned for the type of mistakes made by
typists.  Rather, they are tuned for the type of mistakes humans
(especially me) tend to make when trying to spell a word.  The typo
analysis in Aspell biases the results slightly, but it generally doesn't
make a huge difference.

> I can see how you might do this
> by working with the translation tables for the phonetic code, the keyboard
> files, etc. 

You will probably get better results by turning the soundslike analysis
off altogether.  Modifying the keyboard file will also help.  However,
the best results will probably come from modifying the weights in
TypoEditDistanceWeights, found in util/typo_editdist.hh.  Doing so will
require modifying the code a bit.  The code that fills in the weights can
be found in SuggestParms::fill_distance_lookup in lib/suggest.cc.  Most of
the code should be self-explanatory.

You really need to understand how edit distance works in order to know 
what to modify.  The comments in the util/*editdist* files should give you 
enough information for this understanding.
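
As a rough illustration of what "modifying the weights" means, here is a
minimal sketch of a weighted edit distance in which substitutions between
look-alike characters cost less than other substitutions.  This is not
Aspell's code: OcrWeights, its fields, the list of confusable pairs, and
weighted_edit_distance are all hypothetical names and values made up for
the example, and the real TypoEditDistanceWeights structure may be laid
out quite differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical weights for illustration only; NOT Aspell's
// TypoEditDistanceWeights.  All costs are arbitrary example values.
struct OcrWeights {
    int insert_cost = 100;      // cost of inserting one character
    int delete_cost = 100;      // cost of deleting one character
    int substitute_cost = 100;  // cost of a generic substitution
    int confusable_cost = 30;   // cost of swapping look-alike glyphs

    // Assumed look-alike pairs; a real table would be built from
    // observed OCR errors.
    bool confusable(char a, char b) const {
        static const std::vector<std::pair<char, char>> pairs = {
            {'l', '1'}, {'O', '0'}, {'e', 'c'}, {'u', 'v'}, {'h', 'b'}};
        for (const auto& p : pairs) {
            if ((p.first == a && p.second == b) ||
                (p.first == b && p.second == a)) {
                return true;
            }
        }
        return false;
    }

    int substitution(char a, char b) const {
        if (a == b) return 0;
        return confusable(a, b) ? confusable_cost : substitute_cost;
    }
};

// Classic dynamic-programming edit distance, with per-operation weights.
// d[i][j] is the cheapest way to turn the first i characters of `a`
// into the first j characters of `b`.
int weighted_edit_distance(const std::string& a, const std::string& b,
                           const OcrWeights& w) {
    assert(w.insert_cost >= 0 && w.delete_cost >= 0);
    const std::size_t n = a.size();
    const std::size_t m = b.size();
    std::vector<std::vector<int>> d(n + 1, std::vector<int>(m + 1, 0));

    for (std::size_t i = 1; i <= n; ++i)
        d[i][0] = static_cast<int>(i) * w.delete_cost;
    for (std::size_t j = 1; j <= m; ++j)
        d[0][j] = static_cast<int>(j) * w.insert_cost;

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const int del = d[i - 1][j] + w.delete_cost;
            const int ins = d[i][j - 1] + w.insert_cost;
            const int sub =
                d[i - 1][j - 1] + w.substitution(a[i - 1], b[j - 1]);
            d[i][j] = std::min({del, ins, sub});
        }
    }
    return d[n][m];
}
```

With these example weights, "c1ear" is much closer to "clear" than "cat"
is to "cut", because '1' and 'l' are listed as look-alikes while 'a' and
'u' are not.  Tuning for OCR amounts to choosing such a table and its
costs to match the errors your OCR engine actually makes.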

--- 
http://kevin.atkinson.dhs.org


