
List:       kde-i18n-doc
Subject:    WordNet in kde
From:       "Eric Zak" <eric.zak () gmail ! com>
Date:       2008-08-23 17:37:38
Message-ID: dd498bee0808231037v79a9a7w3b4d26acd950edb9 () mail ! gmail ! com

Howdy to everyone out there interested in localization. So far I've never
attempted to learn another spoken language, despite ample opportunities. I've
always been interested in learning a foreign language, but apparently never
interested enough. Computers and I, on the other hand, click. My broadening
view of computers has returned me to this latent desire to learn a new
language, sort of. While planning an extension to Nepomuk, it has become
apparent that the feature I wish to implement could have profound effects on
translation efforts. I would appreciate your time and constructive comments,
but first bear with me as I explain this still-fluid concept.

From http://wordnet.princeton.edu/:
"WordNet(R) is a large lexical database of English, developed under the
direction of George A. Miller. Nouns, verbs, adjectives and adverbs are
grouped into sets of cognitive synonyms (synsets), each expressing a
distinct concept. Synsets are interlinked by means of conceptual-semantic
and lexical relations. The resulting network of meaningfully related words
and concepts can be navigated with the browser. WordNet is also freely and
publicly available for download. WordNet's structure makes it a useful tool
for computational linguistics and natural language processing."
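
To make the synset idea concrete, here is a minimal Python sketch. The tiny
lemma-to-synset table below is invented for illustration only; in a real
program the data would come from the WordNet database itself (for example via
NLTK's wordnet corpus reader).

```python
# Minimal illustration of WordNet-style synsets: words are grouped into
# sets of synonyms (synsets), each naming one distinct concept.
# The table below is a tiny hand-made toy, not real WordNet data.

SYNSETS = {
    "car.n.01": {"car", "auto", "automobile", "motorcar"},
    "car.n.02": {"car", "railcar", "railway car"},  # one word, several synsets
    "big.a.01": {"big", "large"},
}

def synsets_of(word):
    """Return the ids of all synsets containing the given word."""
    return sorted(sid for sid, lemmas in SYNSETS.items() if word in lemmas)

def synonyms_of(word):
    """Return every lemma sharing at least one synset with the word."""
    result = set()
    for sid in synsets_of(word):
        result |= SYNSETS[sid]
    return result - {word}
```

Note how "car" belongs to two synsets (vehicle vs. rail car): that ambiguity
is exactly the contextual problem discussed further below.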

I've been very interested in WordNet since becoming aware of this amazing
software. Of the software I'm aware of, few programs come close to utilizing
WordNet to its potential; the Natural Language Toolkit
(http://nltk.sourceforge.net/index.php/Screenshots) is the exception.
Considering this, it seems evident that such functionality would have a
profound impact if implemented in KDE, making lexical utilities easily
available to a broad spectrum of people. Such a utility would enable useful
features, some of which I will list in increasing order of implementation
complexity:

1) A reduced index size for "Desktop Search", while increasing "connections"
   to key word sets extracted from documents, through the use of synonyms.
2) An improved dictionary application; as a quick example, imagine hooking
   one of these up to Wikipedia:
   http://kde-look.org/CONTENT/content-m1/m87173-1.png or
   http://www.visualthesaurus.com/landing
3) Improved tools for translation and knowledge representation. It may be
   that these tools expand to become the letters defining tomorrow's
   paragraph: the egg coming before the chicken.
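
Feature 1 can be sketched quickly: if the search index is keyed by canonical
synset ids rather than raw words, synonyms collapse into a single entry
(shrinking the index) while a query for any synonym still matches. The CANON
mapping below is a hypothetical stand-in for a real WordNet lookup.

```python
# Sketch of synonym-aware desktop search: indexing by synset id means
# "auto" and "car" share one index entry, and a query for either finds both.
# CANON is a hypothetical stand-in for a real word -> synset-id lookup.

CANON = {"car": "car.n.01", "auto": "car.n.01", "automobile": "car.n.01",
         "big": "big.a.01", "large": "big.a.01"}

def build_index(docs):
    """Map canonical synset ids (or raw words) to documents containing them."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            key = CANON.get(word, word)   # fall back to the raw word
            index.setdefault(key, set()).add(doc_id)
    return index

def search(index, word):
    """Look up a query word under its canonical synset id."""
    return index.get(CANON.get(word, word), set())

docs = {1: "a big car", 2: "my auto", 3: "a large dog"}
index = build_index(docs)
```

A query for "automobile" now finds documents 1 and 2 even though neither
contains that word, which is the "increased connections" half of the feature.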

One becomes aware of the contextual complexity of language when attempting to
learn a specialized concept, or when conversing with a person who has been
subjected to a different set of circumstances. This complexity is intensely
magnified when two different languages are compared with this in mind. Almost
everyone is able to use language without much conscious effort (even as a
child), which interestingly indicates that written language lacks much of the
information conveyed in spoken language (even with the most masterful
literary devices). Communication between people who are not physically
present has been a determining factor in the size of mankind's stride,
justifying further improvements as the contextual scope increases. It seems
that WordNet was a gigantic leap in the right direction, but such leaps are
unlikely to have smooth landings. The time is now right to extend the WordNet
lexical database by allowing users to refine it and have those refinements
rated by others, in context. Such an improvement would be even more versatile
if done in a decentralized manner, allowing the comparison of different
{interests, regions, languages} to strengthen our communication by conveying
information that was once not there.
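
One way the "refinements rated by context" idea might be represented; every
name and field here is my own invention, just to make the shape of the data
discussable.

```python
# Hypothetical sketch of a user-contributed refinement to the lexical
# database, carrying the context it was made in and ratings from others.
from dataclasses import dataclass, field

@dataclass
class Refinement:
    synset_id: str          # synset being refined, e.g. "car.n.01"
    proposal: str           # e.g. a new lemma or a gloss correction
    context: dict           # e.g. {"interest": ..., "region": ..., "language": ...}
    ratings: list = field(default_factory=list)  # scores from other users

    def score(self):
        """Average rating; unrated proposals score 0."""
        return sum(self.ratings) / len(self.ratings) if self.ratings else 0.0

def best_for_context(refinements, **context):
    """Pick the highest-rated refinement matching every given context key."""
    matching = [r for r in refinements
                if all(r.context.get(k) == v for k, v in context.items())]
    return max(matching, key=Refinement.score, default=None)
```

The point of keeping the context as explicit keys is that the same synset can
carry different, separately rated refinements per region or interest group.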

"Translations as Semantic Mirrors: From Parallel Corpus to Wordnet" by Helge
Dyvik (www.hf.uib.no/i/LiLi/SLF/Dyvik/ICAMEpaper.pdf) explains how existing
parallel-corpus data can be used to produce translations of WordNet, while
also noting that verifying those translations is no small task. I'm assuming
this means that the general objects that make up a sentence (nouns, verbs,
adjectives, and adverbs) apply to all languages, just under different sets of
ordering and rules. An infrastructure should be put in place, via Nepomuk, to
make semantic data more modular. One such solution involves lexical servers
(services) which can communicate with other servers, gathering information as
needed or cross-referencing datasets. The clients of these servers, perhaps
Lexikal, connect to retrieve information, to submit changes (which would be
rated, preferably indirectly, by other users), and to link objects/concepts
together. By integrating the features of "Lexikal" we could improve
translation efforts immeasurably, in addition to improving data quality.
Since I am personally most interested in improving data quality, the
perspective of people who know more than one language would be much
appreciated. I'm also curious what kind of server would be best suited for
this, although that topic is outside the scope of this list.
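
To show roughly what I mean by the server/client split, here is a toy sketch.
"Lexikal" and the whole API are my own invention; both sides are plain
in-process objects here, whereas a real deployment would speak over a network
protocol (D-Bus, HTTP, or similar).

```python
# Hypothetical sketch of the lexical server / Lexikal client split.
# Servers hold synset data and can fall back to peer servers; the client
# retrieves information and submits changes, as described above.

class LexicalServer:
    def __init__(self, peers=()):
        self.synsets = {}          # synset_id -> set of lemmas
        self.peers = list(peers)   # other servers to cross-reference

    def lookup(self, synset_id):
        """Answer from local data, else ask peer servers in turn."""
        if synset_id in self.synsets:
            return self.synsets[synset_id]
        for peer in self.peers:
            found = peer.lookup(synset_id)
            if found:
                return found
        return set()

    def submit(self, synset_id, lemmas):
        """Accept a user-submitted refinement (rating is out of scope here)."""
        self.synsets.setdefault(synset_id, set()).update(lemmas)

class LexikalClient:
    def __init__(self, server):
        self.server = server

    def synonyms(self, synset_id):
        return self.server.lookup(synset_id)

    def propose(self, synset_id, lemmas):
        self.server.submit(synset_id, lemmas)
```

The peer fallback in lookup() is the decentralized part: a local server for
one {interest, region, language} can answer from its own refinements and
defer to others for everything else.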
