
List:       nepomuk
Subject:    Re: [Nepomuk] word lists - strigi? nepomuk?
From:       Sebastian_Trüg <sebastian () trueg ! de>
Date:       2012-07-18 18:10:59
Message-ID: 5006FC33.2010902 () trueg ! de

The word list is internal to Virtuoso, so it is not directly accessible.
You do have access to the plain-text content of files via the
nie:plainTextContent property.
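
Once you have that plain text for a document, building the frequency
distribution yourself is cheap and does not need a second indexing pass.
A minimal sketch in Python, assuming the text has already been fetched
from nie:plainTextContent into a string (the function name and the sample
text are made up for illustration; the fetching step is not shown):

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lower-cased word to its count.

    A "word" here is simply a run of letters; digits, punctuation and
    underscores act as separators. This is an assumption for the sketch,
    not how Virtuoso tokenizes text internally.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words)


if __name__ == "__main__":
    # Stand-in for a value read from nie:plainTextContent.
    sample = "The quick fox. The lazy dog; the end."
    freq = word_frequencies(sample)
    for word, count in freq.most_common(3):
        print(word, count)
```

Note that this naive tokenizer splits contractions such as "don't" into
two tokens and does no stemming or stop-word removal, so you would
probably want to refine it before using the counts for categorisation.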

On 07/18/2012 02:42 PM, Dean Perry wrote:
> Hi,
>
> I originally posted this here:
> <http://forum.kde.org/viewtopic.php?f=43&t=106919>
>
> but the forum admin said I should try you directly... if you feel like
> answering, post to the forum or mail me and I'll copy it there; I can't
> be the only one who has wondered about this:
>
> I have an idea for an application to automatically categorise and tag
> documents based on their contents.
>
>
> To do this I need a frequency distribution of the words in the document.
>
> I have played around with the nepomuk examples and have a few clues
> about the tagging and rdf storage.
>
> I can't find much info on a per-document word list though - nepsak,
> nepoogle don't appear to show it, so maybe it's not stored in virtuoso?
>
> Is there a word list stored (e.g. an inverted vector index)? How does the
> full text search in Dolphin do its thing?
>
> Do I need to produce this list myself using libstreamanalyzer? I'd
> prefer not to do a second indexing pass.
>
>
>
> _______________________________________________
> Nepomuk mailing list
> Nepomuk@kde.org
> https://mail.kde.org/mailman/listinfo/nepomuk