List: aspell-user
Subject: Re: [Aspell-user] small bug: two following non alpha characters
From: Kevin Atkinson <kevina () gnu ! org>
Date: 2005-11-02 2:03:38
Message-ID: 20051101185836.H52113 () bas ! flux ! utah ! edu
On Wed, 26 Oct 2005, Gary Setter wrote:
> Back in August I was trying to make my program work with
> Unicode and the koi8-r character set. One of the problems was
> tokenizing the text into words. It seemed aspell was treating all
> character sets as ASCII.
Could you be more specific?
> The speller object does have a language
> member and the language member does have a sense of the
> characteristics of each character in the characterset. What are
> the characteristics of the ampersand and dash in your
> characterset? Might aspell make use of those characterset specific
> characteristics to tokenize "hait-l'-ovraedje" as one word?
Yes, it might make sense, but I do not have support for it. The ' and - are
treated as special characters. They can only be part of a word if they
have normal letters on both sides; otherwise things "like--this" will
also be treated as a single word. For this reason special care needs to
be taken when treating these special characters.
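
To illustrate the rule described above, here is a minimal Python sketch. It
is not Aspell's actual tokenizer code; the names SPECIAL and tokenize are
hypothetical, and "normal letter" is approximated with str.isalpha(), which
is an assumption made only for this example.

```python
# Assumption: only the two special characters named in this message.
SPECIAL = {"'", "-"}

def tokenize(text):
    """Split text into words. A special character is kept inside a word
    only when it has a letter immediately on both sides."""
    words = []
    current = []
    for i, ch in enumerate(text):
        if ch.isalpha():
            current.append(ch)
        elif (ch in SPECIAL and current
              and text[i - 1].isalpha()
              and i + 1 < len(text) and text[i + 1].isalpha()):
            # Letter on the left and on the right: part of the word.
            current.append(ch)
        else:
            # Anything else ends the current word, if there is one.
            if current:
                words.append("".join(current))
                current = []
    if current:
        words.append("".join(current))
    return words

print(tokenize("like--this"))
print(tokenize("hait-l'-ovraedje"))
```

Under this sketch "like--this" splits into two words, and
"hait-l'-ovraedje" is also split, because the ' and - in the middle sit
next to each other rather than next to letters on both sides.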