List:       koffice-devel
Subject:    Re: Patch: thesaurus using wordnet
From:       David Faure <david () mandrakesoft ! com>
Date:       2001-05-19 15:53:31

On Sunday 06 May 2001 16:46, Daniel Naber wrote:
> Hi,

Hi,
sorry for taking so long to answer this mail...

> here's a very unfinished patch that adds a thesaurus to KWord. Several 
> things need to be decided before this can go in. Also I cannot do this all 
> alone, but I need quite some help. Anyway, I think it's such a nice 
> feature that people will surely support me :-)

Yup ;-)

> You need Wordnet to use this:
> http://www.cogsci.princeton.edu/~wn/ (it's ~13 MB big!)
>
> Issues:
> 
> 1. Wordnet 1.6 has a bug that makes it crash often. It is very easy to fix 
> (see their release notes), but it is in the release and a new release is 
> not in sight. Working around the bug is possible but not nice (involves 
> parsing text output).

Really ?
I'm a very happy Wordnet-1.6 user. This seems to be a very old release btw,
I've been using Wordnet-1.6 for ages. 

WordNet 1.6 Copyright 1997 by Princeton University.

1997 ! :)

Oh I see. Your patch uses the library provided by WordNet, not the wn command-line tool.
Hmm, I wouldn't bet on any chance of a fix, if there has been no release for 4 years ?

"Parsing text output", I guess you mean using wn's command line tool instead
of the library ? This might be a good way to go anyway (see below).....

> 2. AFAIK wordnet is not part of any Linux distribution. The license looks 
> very liberal to me (someone please check it), so this might change once 
> KWord can use it?

Maybe not so liberal, I'd say :

"Permission to use, copy, modify and distribute this software and
database and its documentation for any purpose and without fee or
royalty is hereby granted, [...]"

Does this apply to distribution CDs ?
I can probably get someone at Mandrake to check up that license,
or we could ask Andreas Pour.

> 3. I don't know anything about configure, so the patch works on my system, 
> but probably not on other people's configurations.
:)

> 4. Integration into KWord doesn't work, because I don't know such simple 
> things like "get the word the cursor is over" - well, I never worked with 
> KWord before...
Well, that's more like qrichtext stuff now.
Something like, get the QTextCursor from the current KWFrameSetEdit, 
make a copy of it, and then use gotoWordLeft() (or write your own loop).
Then for each non-space char, go right and add up. A bit like is done
for Ctrl+delete (delete the word on the right). So this would give:

            // Work on a copy so the real cursor position is left untouched.
            QTextCursor cursor = *currentTextEdit()->getCursor();
            cursor.gotoWordLeft(); // TODO: skip this if already at the beginning of a word
            QString word;
            // Collect characters up to the next space or the end of the paragraph.
            while ( !cursor.atParagEnd() ) {
                QChar ch = cursor.parag()->at( cursor.index() )->c;
                if ( ch.isSpace() )
                    break;
                word += ch;
                cursor.gotoRight();
            }

The QRT classes are quite self explanatory, but not documented at all
(I know, this is contradictory. I mean that they are all well named and well
designed, so this in itself is self explanatory ;-)

Replacing the word with another is what we already do for spellchecking,
so it's what's in KWView::spellCheckerCorrected (basically, removing
the selected word and calling KWTextFrameSet::insert)

Tell me if you need more pointers, or if you want me to write more code
dealing with that, that's no problem.

> 5. Several fixme's in the code. Especially the current results are 
> incomplete because I didn't manage to iterate over a "char **" list 
> without crashes. I need help with this (i.e. a fix, I already tried 
> everything I can think of). Also the patch needs a cleanup.

Ok, more on that later, see end of mail.
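
In the meantime, here is a minimal sketch of the usual way to walk such a
list. It is plain C++ and not taken from the WordNet headers. It assumes the
"char **" array is terminated by a null pointer, which I have not checked
against the wn library, and toStringList is just a name made up for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copies a null-terminated array of C strings
// into a vector of std::string.
// Assumption: the array ends with a null pointer sentinel. If the
// library reports an element count instead, loop on the count.
std::vector<std::string> toStringList(const char* const* list)
{
    std::vector<std::string> result;
    if (!list)                  // the list itself may be missing
        return result;
    for (const char* const* p = list; *p != 0; ++p)
        result.push_back(*p);   // copy each string into our own storage
    return result;
}
```

Copying the strings right away also avoids keeping pointers into buffers
that the library might reuse on the next call.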

> 6. Currently only English is supported and you cannot just "translate" 
> Wordnet. However, there are data files for other languages, but with other 
> licenses.

Hmm, what kind of licenses ?

> 7. Maybe this should not be part of KWord but part of kdelibs, similar to 
> KSpell? It could even be used as a stand alone app.

Hmm, the point is that WordNet might not be available. If we call an external
kwn app or whatever, we still have to know beforehand that wordnet isn't there
and that the feature should be disabled. With a library we can ask it if the feature
is available.

I think this would make much sense as a separate library. That would make it easier
for another app to use it (dunno which yet but that's not the point :), and it would
also help encapsulate the feature so that the library can provide a no-op
mode in case WN isn't available.
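
To make the "no-op mode" idea concrete, here is a rough sketch of what such a
library interface could look like. The class and method names (Thesaurus,
NullThesaurus, isAvailable, synonyms) are invented for this example. They are
not existing KDE or WordNet classes.

```cpp
#include <string>
#include <vector>

// Hypothetical interface: callers ask the library whether the feature
// works instead of probing for WordNet themselves.
class Thesaurus
{
public:
    virtual ~Thesaurus() {}
    virtual bool isAvailable() const = 0;
    virtual std::vector<std::string> synonyms(const std::string& word) const = 0;
};

// No-op implementation returned when WordNet is not installed.
class NullThesaurus : public Thesaurus
{
public:
    bool isAvailable() const override { return false; }
    std::vector<std::string> synonyms(const std::string&) const override
    {
        return std::vector<std::string>();  // never any result
    }
};
```

The application would then enable or disable its menu entry based on
isAvailable(), without linking to wn itself.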

This could even be a "koffice data tool" once I reintroduce that in kword....
(see KoDataTool in kofficecore, I just committed some docu to it).
Hmm, the problem is, what info do we give the tool, and what info do we
get from it. KSpread gives/gets the current cell's text contents...
Your tool would need the word under the cursor, but other tools (like the
bibliography app that was mentioned here) would need the whole text.

I guess we could have flags in the tool desktop file for that, so that
it can say what it's interested in (e.g. word, line, parag, frameset, complete doc...).
Or we could (mis?)use dataType() for that.

Anyway, a KoDataTool sounds like a good way to me, as it only adds generic 
code to KWord (tool handling), and allows all the tool-specific code to be separate 
from KWord itself. I'll try to add that when I'm done with view modes.

Doing this the way this patch suggests (directly linking wn's lib from KWord)
is a big no-no anyway, IMHO. If a distributor compiles KWord with wn installed, 
it would force all users to install (the quite big) wn. 
On the other hand, if the code is separate (whether as a dlopened service available via 
a kde library, like the scan stuff, or as a kodatatool), then a separate 
package can be made, and users can choose to install it or not.

I'm not sure that the library + optionally-present service solution (like kscan)
is worth it... We could go with a datatool only, to start with. Tell me what you think.

Hmm, of course, if the crash in the wn library can neither be worked around nor fixed,
then we'll have to parse wn's output, and the library dependency problem will go
away. So I guess we should start from there... Any idea if they'll make a bugfix
release after all this time ? :}
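
For the record, parsing wn's output could be sketched roughly like this.
It assumes the text has already been captured from the wn process, and that
each "Sense N" header line is followed by one line of comma-separated
synonyms. That layout is an assumption from memory and should be checked
against the real output. parseSynonyms and trim are made-up names.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Strips leading and trailing whitespace.
static std::string trim(const std::string& s)
{
    const char* ws = " \t\r\n";
    std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos)
        return std::string();
    std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Hypothetical parser. Assumed layout: a line starting with "Sense "
// is followed by one line of comma-separated synonyms. All other
// lines are ignored. Duplicates across senses are kept.
std::vector<std::string> parseSynonyms(const std::string& output)
{
    std::vector<std::string> result;
    std::istringstream in(output);
    std::string line;
    bool afterSenseHeader = false;
    while (std::getline(in, line)) {
        if (line.compare(0, 6, "Sense ") == 0) {
            afterSenseHeader = true;
            continue;
        }
        if (!afterSenseHeader)
            continue;
        afterSenseHeader = false;
        // Split the synonym line on commas.
        std::string::size_type start = 0;
        while (start < line.size()) {
            std::string::size_type comma = line.find(',', start);
            if (comma == std::string::npos)
                comma = line.size();
            std::string word = trim(line.substr(start, comma - start));
            if (!word.empty())
                result.push_back(word);
            start = comma + 1;
        }
    }
    return result;
}
```

Keeping the parser separate from the code that runs wn means it can be
tested with canned text, without WordNet installed.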

-- 
David FAURE, david@mandrakesoft.com, faure@kde.org
http://perso.mandrakesoft.com/~david/, http://www.konqueror.org/
KDE, Making The Future of Computing Available Today
_______________________________________________
Koffice-devel mailing list
Koffice-devel@master.kde.org
http://master.kde.org/mailman/listinfo/koffice-devel
