
List:       aspell-user
Subject:    Re: [Aspell-user] What letters belongs to a word? (issu
From:       Kevin Atkinson <kevina () gnu ! org>
Date:       2011-09-10 23:14:26
Message-ID: alpine.BSF.2.00.1109101659090.13702 () bas ! flux ! utah ! edu


On Sat, 10 Sep 2011, Daniel wrote:

> Kevin Atkinson <kevina <at> gnu.org> writes:
>> More or less.  Please see
>> http://aspell.net/man-html/Notes-on-8_002dbit-Characters.html.  That being
>> said the problem you are facing is not just because the dictionary is
>> 8-bit, but also because I convert the document to the 8-bit encoding
>> before I tokenize it.  The latter is something I plan to eventually fix.
>> If you really want to be able to recognize Turkish words when using the
>> English dictionary then you can try the attached special character set.
>> Unzip the contents in `aspell config data-dir` then change  "charset
>> iso8859-1" to "charset iso8859-1-u" in en.dat.
>
> Yes, since I have the knowledge to correctly spell non-English place names
> etc, then I really want to do that. Thanks Kevin, that text explained quite
> well why Aspell functions the way it does, and it now comes through as less
> silly... The alternative charset worked fine!
>
>> However, even if Aspell did recognize the word correctly it would be
>> unlikely to do what you want when using the English dictionary because
>> special rules are needed to handle the Turkish ı when changing case.
>
> Certainly. But now I can at least easily add whole words to the private
> dictionary, even if I have to add them once per case.

Note that the personal dictionary is likely to be saved in this internal 
encoding.  This is unlikely to be what you want, since only Aspell can read 
the encoding.  Once the personal dictionary is created you can convert it to 
UTF-8 using the following command:

   aspell conv iso8859-1-u utf-8 < OLD > NEW

then edit NEW and append the string " utf-8" to the first line.  See 
(http://aspell.net/man-html/Format-of-the-Personal-and-Replacement-Dictionaries.html).
You can have Aspell use UTF-8 by default by adding the line:

   data-encoding utf-8

to "en.dat"; see (http://aspell.net/man-html/The-Language-Data-File.html).
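
The conversion and the header edit described above could be scripted 
roughly as follows.  This is only a sketch: the helper name 
add_encoding_to_header and the file name NEW.tmp are made up for 
illustration, and it assumes the first line of the personal dictionary is 
a header of the form "personal_ws-1.1 en N" as described in the manual 
page linked above.

```shell
# Hypothetical helper (not part of Aspell): append an encoding name to the
# header (first line) of an already-converted personal dictionary and write
# the result to stdout.
#   $1 = path to the converted dictionary
#   $2 = encoding name (defaults to utf-8)
add_encoding_to_header() {
    sed "1s/\$/ ${2:-utf-8}/" "$1"
}

# Typical use, assuming OLD is the existing personal dictionary:
#   aspell conv iso8859-1-u utf-8 < OLD > NEW.tmp
#   add_encoding_to_header NEW.tmp > NEW
```

Keep OLD around until you have checked that Aspell reads NEW correctly.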

Saving the dictionary in the same encoding that the language uses is done 
for historical reasons.  I hope to eventually have the encoding Aspell 
uses for everything except its internal use default to UTF-8.

