
List:       aspell-user
Subject:    [Aspell-user] aspell-<LANG>: Invalid UTF-8 sequence at position...
From:       "Martin Swift" <martin.swift () gmail ! com>
Date:       2007-02-21 15:40:43
Message-ID: a63aeed10702210740n5315ce97o3827ee80e10468fe () mail ! gmail ! com


Dear aspell community,

For a while now, I've been unable to properly install the Icelandic and
German aspell dictionaries. I get a torrent of warnings like these:
  Warning: The string "Grímseyjar" is invalid. Invalid UTF-8 sequence at position 3. Skipping string.
  Warning: The string "Grímsson" is invalid. Invalid UTF-8 sequence at position 3. Skipping string.
  Warning: The string "Grímssonar" is invalid. Invalid UTF-8 sequence at position 3. Skipping string.
  Warning: The string "Grímssyni" is invalid. Invalid UTF-8 sequence at position 3. Skipping string.

The problem seems to arise in every string that contains a non-ASCII
character. For languages like Icelandic and German, which use a lot of such
characters, the resulting dictionary is practically useless.
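A possible reading of the warnings, offered as a hypothesis rather than a confirmed diagnosis: "í" is U+00ED. In an 8-bit charset such as ISO-8859-1 it is stored as the single byte 0xED, which UTF-8 treats as the lead byte of a three-byte sequence. The next byte is a plain "m" (0x6D), not a continuation byte, so decoding fails on the third byte of "Grímseyjar". The Python sketch below reproduces this with a hypothetical helper, `first_invalid_utf8_position`, assuming that aspell's "position 3" counts from 1.

```python
def first_invalid_utf8_position(data: bytes):
    """Return the 1-based position of the first byte that breaks UTF-8
    decoding, or None if the whole byte string is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the 0-based offset of the offending byte
        return err.start + 1
    return None


word = "Grímseyjar"
latin1_bytes = word.encode("iso-8859-1")  # 'í' becomes the single byte 0xED
utf8_bytes = word.encode("utf-8")         # 'í' becomes the two bytes 0xC3 0xAD

print(latin1_bytes[:4])                           # b'Gr\xedm'
print(first_invalid_utf8_position(latin1_bytes))  # 3
print(first_invalid_utf8_position(utf8_bytes))    # None
```

If this is what is happening, the words themselves are fine and only the encoding that the reader assumes differs from the one the file was written in.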

A while ago I tried everything I could think of: I grabbed the source files
and converted some of them between encodings, in both directions, with iconv,
but without luck. Eventually I gave up, figuring that my spelling wasn't
*that* bad. Coming back to it now, I still can't find anything on this, and
the Gentoo community doesn't seem to know where the issue lies.
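One way an iconv conversion can go wrong without reporting any error: if a file that is already UTF-8 is converted "from ISO-8859-1 to UTF-8", every byte maps to some Latin-1 character, so the result decodes cleanly as UTF-8 but contains mojibake. The Python sketch below illustrates that pitfall and a guarded conversion that only re-encodes input which is not already valid UTF-8. The `to_utf8` helper is hypothetical, and ISO-8859-1 as the fallback charset is an assumption about how the source files might be encoded.

```python
def to_utf8(data: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return data as UTF-8: unchanged if it already decodes as UTF-8,
    otherwise decoded with the fallback 8-bit charset and re-encoded."""
    try:
        data.decode("utf-8")
        return data  # already valid UTF-8; converting again would double-encode
    except UnicodeDecodeError:
        return data.decode(fallback).encode("utf-8")


utf8_word = "Grímsson".encode("utf-8")
latin1_word = "Grímsson".encode("iso-8859-1")

# Treating UTF-8 bytes as Latin-1 yields mojibake that is still *valid* UTF-8
double_encoded = utf8_word.decode("iso-8859-1").encode("utf-8")
print(double_encoded)  # b'Gr\xc3\x83\xc2\xadmsson'

# The guarded conversion leaves UTF-8 alone and fixes Latin-1 input
print(to_utf8(utf8_word) == utf8_word)    # True
print(to_utf8(latin1_word) == utf8_word)  # True
```

"Decodes as UTF-8" is only a heuristic: a short 8-bit string can occasionally be valid UTF-8 by coincidence, although that is rare for ordinary words like these.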

The archives for this list also seem rather silent on the subject, so I
wanted to ask whether anyone has advice on how to fix the problem, or at
least diagnose it. If not, at least it's on record for the next person
looking...
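As a first diagnostic step, it may help to check which lines of a word list actually fail to decode as UTF-8 before aspell ever reads them. This is a minimal Python sketch assuming a plain-text, one-word-per-line file (a compressed word list would need decompressing first). The script and the `invalid_utf8_lines` helper are hypothetical and not part of aspell.

```python
import sys
from pathlib import Path


def invalid_utf8_lines(data: bytes):
    """Yield (line_number, column) for every line that is not valid UTF-8.
    Both numbers are 1-based; the column counts bytes, not characters."""
    for lineno, line in enumerate(data.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            yield lineno, err.start + 1


if __name__ == "__main__":
    # Hypothetical usage: python check_utf8.py wordlist.txt
    for path in sys.argv[1:]:
        bad = list(invalid_utf8_lines(Path(path).read_bytes()))
        print(f"{path}: {len(bad)} invalid line(s)")
        for lineno, col in bad[:10]:  # show only the first few offenders
            print(f"  line {lineno}, byte {col}")
```

If every line containing a non-ASCII character is flagged, the file is most likely in an 8-bit encoding. If none are, the mismatch may lie elsewhere, for example in the encoding aspell assumes for its input.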

Thanks,
Martin

$ aspell --version
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.4)

$ locale
LANG=is_IS.utf8
LC_CTYPE="is_IS.utf8"
LC_NUMERIC="is_IS.utf8"
LC_TIME="is_IS.utf8"
LC_COLLATE="is_IS.utf8"
LC_MONETARY="is_IS.utf8"
LC_MESSAGES=en_GB.utf8
LC_PAPER="is_IS.utf8"
LC_NAME="is_IS.utf8"
LC_ADDRESS="is_IS.utf8"
LC_TELEPHONE="is_IS.utf8"
LC_MEASUREMENT="is_IS.utf8"
LC_IDENTIFICATION="is_IS.utf8"
LC_ALL=

$ grep UTF /etc/locale.gen
en_GB.UTF-8 UTF-8
en_CA.UTF-8 UTF-8
en_US.UTF-8 UTF-8
is_IS.UTF-8 UTF-8
de_DE.UTF-8 UTF-8
ja_JP.UTF-8 UTF-8
