From: Nadav Rotem
Date: Tue, 25 Nov 2003 13:48:13 +0000
To: abiword-dev
Subject: NLP Inner Product Using OTS
X-MARC-Message: https://marc.info/?l=abiword-dev&m=106976821517301

The inner product of two texts is defined as the number of topics they
share. One of my professors is doing research in this field and needed a
matrix of the inner products of chunks of text. Here is a quick example,
in a Bash script, of how to use OTS to extract the topics of two texts
and count how many of them the texts have in common.

Usage of the script:

[nadav@gringo articles]$ ./inner.sh sacbee1.txt sacbee2.txt
<sacbee1.txt,sacbee2.txt>= 0
[nadav@gringo articles]$ ./inner.sh test1.txt test2.txt
<test1.txt,test2.txt>= 3

From your C code you can get the list of topics through this call:

    word = ots_word_in_list(Doc->ImpWords, i);

-Nadav

[Attachment: inner.sh]

#!/bin/bash
# Usage: ./inner.sh fileA fileB   (prints <fileA,fileB>= N)
# Take the text between the first pair of double quotes in the
# "ots --about" output (the topic list) and turn its commas into spaces.
KeyA=$(ots --about "$1" | cut -f2 -d'"' | sed -e 's/,/ /g')
KeyB=$(ots --about "$2" | cut -f2 -d'"' | sed -e 's/,/ /g')

# Count every pair of equal topics; $KeyA and $KeyB are left unquoted
# on purpose so that they split into one word per topic.
C=0
for wordA in $KeyA; do
    for wordB in $KeyB; do
        if [ "$wordA" = "$wordB" ]; then
            C=$((C + 1))
        fi
    done
done
echo "<$1,$2>= $C"
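The professor's goal was a whole matrix of inner products rather than a single pair. A minimal sketch of that step, assuming the comma-separated topic lists have already been extracted (for example with the same `ots --about | cut` pipeline that inner.sh uses); the function name `inner_product` and the sample topic lists are hypothetical, chosen only to illustrate the counting.

```bash
#!/bin/bash
# Hypothetical helper: inner product of two comma-separated topic lists,
# counted the same way inner.sh does (every equal pair adds one).
inner_product() {
    local a b c=0
    # ${1//,/ } replaces every comma with a space; the unquoted expansion
    # then splits into one word per topic.
    for a in ${1//,/ }; do
        for b in ${2//,/ }; do
            if [ "$a" = "$b" ]; then
                c=$((c + 1))
            fi
        done
    done
    echo "$c"
}

# Hypothetical topic lists for three chunks of text.
topics=("linux,kernel,driver" "kernel,driver,patch" "recipe,cake")

# Print the full matrix: row i, column j holds <chunk i, chunk j>.
n=${#topics[@]}
for ((i = 0; i < n; i++)); do
    row=""
    for ((j = 0; j < n; j++)); do
        row+="$(inner_product "${topics[i]}" "${topics[j]}") "
    done
    echo "${row% }"   # drop the trailing space
done
```

Run as-is, it prints the rows `3 2 0`, `2 3 0` and `0 0 2`. Like inner.sh, a topic repeated within one list is counted once per match. The matrix is symmetric, so only the cells with j >= i actually need computing; the sketch fills in every cell for clarity.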