
List:       abiword-user
Subject:    OTS - Help in translations is needed!
From:       Nadav Rotem <nadavrotem () mail ! ru>
Date:       2003-07-10 10:51:54

Hi,

At the moment Open Text Summarizer can summarize documents in English,
Hebrew, Portuguese, Dutch and Norwegian Nynorsk. Enabling AbiWord to
summarize documents in your language is easy and fun! All you have to do
is create a short text file that has about 200 special words in it.


Here is how it's done:

Name your file (LangCode).dic (for example, en.dic for English).
In that file you need to put words that are common in your language
but are NOT the subject of any article. For example, the word "the" in
English is very common but is not an "important" word. In other words,
we can find the word "the" in almost every sentence, and it tells us
nothing about the sentence. Another example is the word "such", which
is redundant (for this purpose).
I know it's a little strange, but it works.
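
To make the idea concrete, here is a rough Python sketch (not OTS code) of
what ignoring common words does to a word count. The STOP_WORDS set, the
function name and the sample sentence are all made up for illustration; a
real dictionary file would hold about 200 words.

```python
import re
from collections import Counter

# Hypothetical, tiny stand-in for the contents of a (LangCode).dic file.
STOP_WORDS = {"the", "such", "to", "a", "is", "in"}

def content_word_counts(text, stop_words):
    """Count the words in text, skipping every word listed in stop_words."""
    words = re.findall(r"\w+", text.lower())
    return Counter(w for w in words if w not in stop_words)

sample = "The cat is in the garden. Such a cat loves the garden."
counts = content_word_counts(sample, STOP_WORDS)

# Only words that say something about the subject are left.
for word, count in counts.most_common():
    print(count, word)
```

Without the stop list, "the" would be the most frequent word in the sample
even though it says nothing about cats or gardens.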


Here is what I do. I take a UTF-8 text file (it has to be Unicode) and
ask OTS to tell me which words it thinks are keywords in the article,
like this:


 ots letter.txt --dic=he --keywords | more 


where "he" is the "Hebrew" dictionary file and letter.txt is the text 
file. 


Here is an example of the output (in English this time):
Word[15][to] 
Word[8][the] 
Word[6][a] 
Word[5][love] 
Word[5][Becky] 
Word[5][October] 
Word[5][north] 
... 
... 
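
If you want to sort through a long list like this with a script, here is a
small Python sketch that reads lines of the form Word[count][word] as shown
above. It is only a sketch: the function name is made up, and it assumes the
output always looks exactly like the example.

```python
import re

# Matches lines such as "Word[15][to]": a count, then the word itself.
LINE_RE = re.compile(r"^Word\[(\d+)\]\[(.+)\]$")

def parse_keywords(output):
    """Return (word, count) pairs for every line that matches the pattern."""
    pairs = []
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not fit the pattern are skipped
            pairs.append((match.group(2), int(match.group(1))))
    return pairs

example = """Word[15][to]
Word[8][the]
Word[6][a]
Word[5][love]
Word[5][Becky]
"""

for word, count in parse_keywords(example):
    print(count, word)
```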


As you can see, the word "to" appears 15 times in the text. "To" is not a
keyword, so we need to place it in our dictionary file. The same goes
for "the" and "a". Translating doc/en.dic would work for most Germanic
languages. Just play with it until you feel you have it right.
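
If you have no dictionary to translate from, one way to get a first list is
to count the most frequent words in a pile of text in your language and then
go over them by hand. The Python sketch below does only the counting; the
function name and the limit of 200 are my own choices, and you still have to
throw out any real subject words (like "love" or "Becky" above) yourself.

```python
import re
from collections import Counter

def dictionary_candidates(text, limit=200):
    """Return up to `limit` of the most frequent words in text, most common first."""
    words = re.findall(r"\w+", text.lower())
    return [word for word, _ in Counter(words).most_common(limit)]

# Stand-in text; in practice you would read a large UTF-8 file instead.
sample = "to be or not to be, that is the question to ask"
for word in dictionary_candidates(sample, limit=3):
    print(word)
```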


For more info, see http://www.abisource.com/lxr/source/ots/README


Other OTS related news: 
* OTS made it into Gentoo! To get OTS 0.2.0 under Gentoo, type "emerge
ots".



