
List:       aspell-user
Subject:    Re: [Aspell-user] small bug: two following non alpha characters
From:       Kevin Atkinson <kevina () gnu ! org>
Date:       2005-11-02 2:03:38
Message-ID: 20051101185836.H52113 () bas ! flux ! utah ! edu

On Wed, 26 Oct 2005, Gary Setter wrote:

> Back in August I was trying to make my program work with
> Unicode and the koi8-r character set. One of the problems was
> tokenizing the text into words. It seemed aspell was treating all
> character sets as ASCII.

Could you be more specific?

> The speller object does have a language
> member and the language member does have a sense of the
> characteristics of each character in the characterset. What are
> the characteristics of the ampersand and dash in your
> characterset? Might aspell make use of those characterset specific
> characteristics to tokenize "hait-l'-ovraedje" as one word?

Yes, it might make sense, but I do not have support for it.  The ' and -
are treated as special characters.  They can only be part of a word if
they have normal letters on both sides; otherwise things "like--this"
would also be treated as a single word.  For this reason, special care
is needed when handling these special characters.
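
To make the rule concrete, here is a rough sketch in Python.  It is not
Aspell's actual tokenizer: the names SPECIAL, is_letter and tokenize are
made up for illustration, and str.isalpha() only stands in for the
per-language character table.

```python
# Hypothetical sketch of the rule described above, not Aspell's real code.
SPECIAL = {"'", "-"}  # assumed set of special characters


def is_letter(ch):
    # Assumption: str.isalpha() stands in for the language's character table.
    return ch.isalpha()


def tokenize(text):
    """Split text into words; a special character is kept only when it
    has a normal letter immediately on both sides."""
    words = []
    current = []
    n = len(text)
    for i, ch in enumerate(text):
        if is_letter(ch):
            current.append(ch)
        elif (ch in SPECIAL
              and i > 0 and is_letter(text[i - 1])
              and i + 1 < n and is_letter(text[i + 1])):
            # Letter on both sides: the special character stays in the word.
            current.append(ch)
        else:
            # Anything else ends the current word, if there is one.
            if current:
                words.append("".join(current))
                current = []
    if current:
        words.append("".join(current))
    return words


print(tokenize("like--this"))
print(tokenize("hait-l'-ovraedje"))
```

Under this rule "like--this" comes out as two words, and so does
"hait-l'-ovraedje" (as "hait-l" and "ovraedje"), because the ' and the
second - sit next to each other rather than between two letters.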



