'[aspell-devel] more hints'


List:       aspell-devel
Subject:    [aspell-devel] more hints
From:       "Dennis R. Crosby" <crosby () tiscali ! nl>
Date:       2010-05-18 21:34:20
Message-ID: 003a01caf6d1$e20a6410$a61f2c30$ () nl

I've done some more link jumping from your site and read other people's
comments as well.

I'm afraid everyone is using some kind of word-based approach. Words aren't
basic enough.

Everything is based on morphemes, units of meaning, that might correspond to
words, but also correspond to inflectional affixes (and sound shifts).

Starting from English is a bad start. Practical applications are possible
short term, but will be too unwieldy for other languages (some languages
will break the infrastructure immediately).

Given time, even English-based systems will cease working, or require
increasingly complex additional layers of 'correction'.

 

All languages utilize morphemes and their allomorphs. Even polysynthetic
languages (where a "word" in isolation is a meaningless concept and the
utterance (sentence) is as small as you're going to get) are composed of
strings of morphemes in functional-contextual allomorphic variants.

 

Also: besides starting with your database of morphemes and conditional
rules, you need to identify morphemes by context frequencies: how often does
this morpheme indicate plural, how often possession, how often the
abbreviated allomorph of 3rd person singular "to be"? Take the case of "s"
(and its allomorphic variations "es", "ren", and null, as well as "z", which
is pronounced, though not written).
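
To make the context-frequency idea concrete, here is a rough sketch in
Python. It is only an illustration under stated assumptions: the
observations, the context labels ("after_noun", "after_pronoun") and the
function labels are all invented for the example, and nothing here is part of
aspell.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged observations: (surface form, context, function).
# The labels are assumptions made up for this sketch.
OBSERVATIONS = [
    ("s", "after_noun", "plural"),
    ("s", "after_noun", "plural"),
    ("s", "after_noun", "possession"),
    ("s", "after_pronoun", "to_be_3sg"),
    ("es", "after_noun", "plural"),
    ("ren", "after_noun", "plural"),
]


def build_context_index(observations):
    """Count how often each (form, context) pair carries each function."""
    index = defaultdict(Counter)
    for form, context, function in observations:
        index[(form, context)][function] += 1
    return index


def function_frequencies(index, form, context):
    """Return the relative frequency of each function for a form in a context."""
    counts = index.get((form, context), Counter())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {function: n / total for function, n in counts.items()}


index = build_context_index(OBSERVATIONS)
frequencies = function_frequencies(index, "s", "after_noun")
```

With this toy data, "s" after a noun comes out as plural two times out of
three and possession one time out of three. A real index would be built from
a tagged corpus rather than a hand-written list.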

 

Context indexes need to include etymological identifiers as well as
functional identifiers. Why? Because where a word comes from originally, as
well as when it was adopted into English and sometimes even the route it took
(i.e. Latin root imported via French or Spanish or technological revolution
born necessity [or should I say technological revolution *borne* necessity?
Neither is wrong, depending on what I want to emphasize. Either choice is
wrong if I mean to emphasize the other meaning]): these factors actually
DETERMINE the set of rules that apply to spelling variation as well as usage
in English. They are the 'why' that we scratch our heads about but go on
applying consistently, knowing something is wrong when we try to do
otherwise. Morphemes are packages, units of meaning with variants that are
just as bound to their lineage as they are to the meanings they carry and
the phonemes of which they are composed (which in turn cause them to be
subject to another set of rules governing contextual sound variations). That
form/meaning package of variants has a set of rules that govern it - a
"citizenship" with rights and obligations, if you will. A twin morpheme
(homophone), a "citizen" of another country, has other duties and other
rights-based expectations.
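
A rough sketch of what "citizenship" could look like in data, again in
Python and again only as an illustration: the Morpheme record, the lineage
labels ("latin", "old_english_ren", "old_english") and the rule table are all
hypothetical names invented for this example, and the rules are far too crude
for real use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    form: str      # written form of the stem
    function: str  # functional identifier, e.g. "noun_stem"
    lineage: str   # etymological identifier (hypothetical label)


def latin_plural(stem):
    """Latin-style plural: "-us" becomes "-i"; otherwise fall back to "-s"."""
    return stem[:-2] + "i" if stem.endswith("us") else stem + "s"


def ren_plural(stem):
    """The "ren" allomorph of the plural, as in "children"."""
    return stem + "ren"


def default_plural(stem):
    """Fallback when a lineage carries no special plural rule."""
    return stem + "s"


# Hypothetical rule sets keyed by lineage: the "manual" borrowed with the word.
PLURAL_RULES = {
    "latin": latin_plural,
    "old_english_ren": ren_plural,
}


def pluralize(morpheme):
    """Pick the plural rule from the morpheme's lineage, not from its spelling."""
    rule = PLURAL_RULES.get(morpheme.lineage, default_plural)
    return rule(morpheme.form)


cactus = Morpheme("cactus", "noun_stem", "latin")
child = Morpheme("child", "noun_stem", "old_english_ren")
cat = Morpheme("cat", "noun_stem", "old_english")
plurals = [pluralize(m) for m in (cactus, child, cat)]
```

The point of the design is that the lookup goes through the lineage tag, so
two morphemes with the same written form but different lineage could be given
different rule sets without touching the rest of the table.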

And the rules governing contextual sound shifts MAY BE DIFFERENT. Lineage is
the reason. Sometimes we didn't just borrow words. Sometimes we borrowed the
manual that went with the word. Sometimes we didn't read the manual, or read
it well, or it's been so long - how did that go? Maybe the manual got lost.

 

This stupid analogy illustrates how many things can be involved in "proper"
spelling. Periodically, the culture gets tired of learning to apply all
these rules whose reasons nobody can remember and which people usually get
wrong, and a wave of simplification sweeps through the language.

 

English has suffered from domination by French-speaking Danish descendants
for 200 years, followed by pretensions of loyalty and proclamations of
convictions - which were enormous factors in word choice, spelling and
grammar. English doesn't look at all like Icelandic, but it used to.

 

We didn't bother to educate the slaves or their descendants, so they never
quite got around to imitating our usage exactly, and sometimes never quite
abandoned grammatical transformations that (quite conveniently) expressed
day-to-day sameness in a way 'standard' usage ignored completely. Lo and
behold, 200 years later every good white Anglo-Saxon descendant in the US
knows exactly "what they be talkin' 'bout". Those lineage-based forms (in
the latter case, concepts bound in rules that apply to categories - verbs)
are INTERNALIZED IN OUR WHOLE CULTURE. How many people do you know who can
explain why?

 

Sorry, this was meant to be a short letter. I can blame the medicine partly,
but it's rooted in basic nature, fed by much thinking and bottled up due to
a worldwide lack of interest in the subject.

 

I really hope you can glean some useful points out of my rantings. My points
are important, but I'm afraid I'm burying them too deeply. Sorry.

