List: aspell-user
Subject: Re: International support
From: Samphan Raruenrom <samphan () thai ! com>
Date: 1999-01-11 14:46:29
Message-ID: LYR14511-8633-1999.01.11-09.45.14--kevinatk#home.com () franklin ! oit ! unc ! edu
Kevin Atkinson wrote:
> Samphan Raruenrom wrote:
> > In Thai, we don't put spaces between words at all, so
> > the same situation happens naturally.
> > Typical Thai word-segmentation algorithms (which usually
> > also do spell checking) use a maximal-match backtracking
> > algorithm with trie word list(s).
> > My implementation is at http://www.thai.net/libinthai/
> > IBM Classes for Unicode implementation is at
> > http://www.ibm.com/java/education/boundaries/boundaries.html
> OK, so how do you detect the boundaries of unknown or misspelled words?
IBM ICU's algorithm, described at the URL above, is:
: If we exhausted our possibilities without finding
: a valid sequence of words, it either means there's
: an error in the text, or the text includes a word
: that isn't in the dictionary. In either case, we restore
: the set of break positions that matched the most
: characters, advance one character past where the
: mismatch occurred in that sequence, and start over
: from there. This works pretty well: usually only
: one or two boundary positions around the error
: are in the wrong place.
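A minimal sketch of how the pieces above might fit together: a trie word list, a longest-word-first (maximal-match) backtracking search, and the ICU-style recovery step (keep the breaks that covered the most characters, skip one character past the mismatch, start over). The names build_trie, word_ends and segment are hypothetical; this is not the libinthai or ICU API. Gluing consecutive unmatched characters into one chunk is my own choice for readability and is not part of the ICU description.

```python
from typing import Dict, List, Set


class TrieNode:
    """A trie node; `is_word` marks the end of a dictionary word."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


def build_trie(words: List[str]) -> TrieNode:
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def word_ends(root: TrieNode, text: str, pos: int) -> List[int]:
    """End offsets of every dictionary word starting at `pos`, longest first."""
    ends: List[int] = []
    node = root
    for i in range(pos, len(text)):
        node = node.children.get(text[i])
        if node is None:
            break
        if node.is_word:
            ends.append(i + 1)
    return ends[::-1]  # maximal match: try the longest candidate first


def _search(root: TrieNode, text: str, pos: int, path: List[int],
            best: list, dead: Set[int]) -> bool:
    """Backtracking search for a full segmentation of text[pos:].

    `path` holds the break positions chosen so far; `best` is
    [copy of the path that covered the most characters, furthest offset].
    `dead` caches positions already known not to lead to a full segmentation.
    """
    if pos == len(text):
        return True
    if pos in dead:
        return False
    for end in word_ends(root, text, pos):
        path.append(end)
        if end > best[1]:
            best[0], best[1] = list(path), end
        if _search(root, text, end, path, best, dead):
            return True
        path.pop()  # backtrack: try the next-shorter word
    dead.add(pos)
    return False


def segment(text: str, root: TrieNode) -> List[str]:
    """Split `text` into dictionary words, recovering from unknown text ICU-style."""
    tokens: List[str] = []
    pending = ""  # consecutive unmatched characters, emitted as one chunk
    start = 0
    while start < len(text):
        path: List[int] = []
        best = [[], start]
        if _search(root, text, start, path, best, set()):
            breaks, stop = path, len(text)
        else:
            # No valid sequence: restore the breaks that matched the most
            # characters and advance one character past the mismatch.
            breaks, stop = best[0], best[1] + 1
        prev = start
        for b in breaks:
            if pending:
                tokens.append(pending)
                pending = ""
            tokens.append(text[prev:b])
            prev = b
        if stop > prev:
            pending += text[prev:stop]
        start = stop
    if pending:
        tokens.append(pending)
    return tokens
```

For example, with the word list ["the", "them", "cat", "sat", "on", "mat"], segment("thecatsatonthemat", ...) first tries "them", fails on "at", backtracks to "the" + "mat", and returns ["the", "cat", "sat", "on", "the", "mat"]; segment("thecatxsat", ...) returns ["the", "cat", "x", "sat"]. The search is recursive for clarity, so very long unspaced input could hit Python's recursion limit.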