List:       aspell-user
Subject:    [Aspell-user] Anybody working on Turkish?
From:       "Ethan Bradford" <ethanb () google ! com>
Date:       2006-06-29 0:34:28
Message-ID: 6b327d700606281734p423a6268y9a5833fdec31562d () mail ! google ! com

Unless somebody else is nearly there for Turkish, Gokalp and I will probably
be working on improving Aspell for Turkish (that is, we have been working on
it, and are just awaiting some administrivia to start working on it again).

We'd love to collaborate with anybody else interested in it, or to get
feedback on our approach.


Here's some background, and then our approach, if you are interested.

Turkish is an "agglutinative" language, like Finnish, Estonian, Hungarian,
Japanese, and Korean.  That means that suffixes convey a lot more
information than in Indo-European languages, and that any complete list of
"surface forms" of words has to be enormously longer.  Though the suffix
trees are big, they're quite regular, so Turkish fits reasonably well into
Aspell's structure (it would fit better into Hunspell, but for various
reasons we can't go there).  There's a good implementation of Aspell for
Finnish which proves the concept.

We hope to take the existing Turkish Aspell word list, or maybe even a
longer word list, if we have time to generate it, and apply a stemmer to it
to come up with a list of the stem forms it represents.  We'll connect those
up with tables of suffixes we've collected from the web.
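
To make the plan concrete, here is a rough sketch of the stem-extraction
step.  It is only an illustration, not our actual code: the suffix table
is a tiny hypothetical sample, and the "stemmer" is naive longest-suffix
stripping that ignores vowel harmony and consonant changes, which a real
Turkish stemmer would have to handle.  The names split_stem and
group_by_stem are made up for this example.

```python
from collections import defaultdict

# Hypothetical, tiny suffix table for illustration only; a real Turkish
# table would be far larger and organized by suffix class.
SUFFIXES = ["lerde", "larda", "ler", "lar", "de", "da"]


def split_stem(word, suffixes=SUFFIXES, min_stem=2):
    """Strip the longest matching suffix and return (stem, suffix).

    Words with no matching suffix, or whose remaining stem would be
    shorter than min_stem, are returned unchanged with an empty suffix.
    """
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)], suffix
    return word, ""


def group_by_stem(words):
    """Map each stem to the set of suffixes seen with it in the word list."""
    groups = defaultdict(set)
    for word in words:
        stem, suffix = split_stem(word)
        groups[stem].add(suffix)
    return dict(groups)


# Sample surface forms of "ev" (house) and "okul" (school).
words = ["ev", "evler", "evde", "evlerde", "okul", "okullar", "okulda"]
stems = group_by_stem(words)
```

The resulting stem-to-suffix map is what we would then reconcile with the
suffix tables, so that each stem is stored once with flags instead of
once per surface form.  Naive stripping like this will mis-split some
words (anything that merely happens to end in "da", for example), which
is one reason to use a proper stemmer.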

Does that sound like it will work?
