List:       kde-core-devel
Subject:    Re: Natural language processing tech for the desktop!
From:       "Jordi Polo" <mumismo () gmail ! com>
Date:       2008-10-23 8:17:21
Message-ID: a4162420810230117vf00dbe8n3f50743dd45ce565 () mail ! gmail ! com

Making link-grammar usable for KDE would be a good idea, but it is certainly
not a 1-year project. Also, adding support for more languages is ... not
really interesting.

I was thinking about automatic data extraction: extract the data, tag it and
add it to Nepomuk. But most of the text and web pages I have on my own
computer are pretty random...

Another idea may be to take data from the Konqueror history, the Akregator
history or other data sources and create suggestions, etc. But I am not sure
that the fact that an artist you have a lot of music files of has a concert
coming up soon should really appear somewhere in your KDE desktop.

Also, here they are supposed to be interested in dialog management and
topics like that, but anything that resembles the Office paper clip
assistant scares people ...


On Thu, Oct 23, 2008 at 7:26 AM, Alexander Dymo <dymo@ukrpost.ua> wrote:

> > > I don't even think that spell and grammar checking should be separated
> > > very much, since a spell checker should ideally know about the sentence
> > > structure, too.
> >
> > Well, AFAIK (but checking is not my speciality), you can do a really good
> > and fast spell checker with simple statistical techniques and a simple
> > edit distance. For a grammar checker, you must have a full
> > natural-language syntax parser and other techniques to find what the
> > errors are and suggest solutions. But it's true that a good grammar
> > checker must also be robust to spelling errors.
>
> That's why I like the idea of using the Link parser:
> http://www.abisource.com/projects/link-grammar/
> http://www.link.cs.cmu.edu/link/
>
> It's quite convenient to use. It has both word dictionaries and grammar
> rules, and when you try to build the graph (of links between words), it
> will figure out the morphology and syntax simultaneously.
>
> If the sentence is not correct, it will either leave words as not
> recognized morphologically (spelling error) or it will leave words outside
> the sentence link graph (syntax error). The algorithm to do that is IIRC
> O(n^3), which is great.
>
> I know AbiWord uses that now, but I don't know how they do error reporting
> for syntax errors (which is quite an interesting question in itself).
>
> The only problem is that only the English grammar is complete atm. There
> are Italian and German grammars, but they don't look mature yet. There's
> also quite a good Russian grammar, but it's unfortunately proprietary.
>
>
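
To make the quoted point about spell checking concrete, here is a minimal
sketch of the "simple statistics plus edit distance" idea. It is only an
illustration: the function names, the toy corpus and the distance threshold
are my own assumptions, not code from any existing KDE or link-grammar
component. Candidates are ranked by Levenshtein distance first and by corpus
frequency second.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(word, freq, max_dist=2, limit=3):
    """Return up to `limit` known words close to `word`.

    `freq` maps known words to corpus counts (hypothetical data).
    Closer words come first; ties are broken by higher frequency.
    """
    if word in freq:
        return [word]  # known word, nothing to correct
    scored = []
    for cand, count in freq.items():
        dist = edit_distance(word, cand)
        if dist <= max_dist:
            scored.append((dist, -count, cand))
    scored.sort()
    return [cand for _, _, cand in scored[:limit]]


# Toy corpus standing in for real word statistics (assumption).
CORPUS = "the grammar checker and the spell checker share the parser".split()
FREQ = Counter(CORPUS)

if __name__ == "__main__":
    print(suggest("grammer", FREQ))
```

Scanning the whole dictionary for every unknown word is fine for a sketch
but too slow for a real word list; note also that this works on isolated
words and knows nothing about sentence structure, which is exactly the
limitation discussed in the quoted thread.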


-- 
Jordi Polo Carres
NLP laboratory - NAIST
http://www.bahasara.org
