'Re: Content based recommender using lucene/solr'


List:       lucene-user
Subject:    Re: Content based recommender using lucene/solr
From:       Lance Norskog <goksron () gmail ! com>
Date:       2013-06-30 0:50:39
Message-ID: 51CF80DF.7060307 () gmail ! com

Solr/Lucene has two features for this:
1) the MoreLikeThis code, and
2) the clustering project in solr/contrib.

Lance
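
For the first option, a minimal sketch of what a MoreLikeThis request could look like. The parameters `mlt`, `mlt.fl`, `mlt.count`, `mlt.mintf` and `mlt.mindf` belong to Solr's MoreLikeThis component; the host, the core name `items`, the document id and the field names `title` and `description` are assumptions made up for illustration, not taken from this thread.

```python
from urllib.parse import urlencode


def build_mlt_url(base_url, doc_id, fields, count=10):
    """Build a Solr /select URL that asks the MoreLikeThis component
    for documents similar to the one matched by the main query."""
    params = {
        "q": f"id:{doc_id}",         # the item the user is currently viewing
        "mlt": "true",               # switch the MoreLikeThis component on
        "mlt.fl": ",".join(fields),  # fields used to pick "interesting" terms
        "mlt.count": count,          # number of similar docs to return
        "mlt.mintf": 1,              # minimum term frequency in the source doc
        "mlt.mindf": 1,              # minimum document frequency in the index
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host, core and field names.
url = build_mlt_url("http://localhost:8983/solr/items", "42",
                    ["title", "description"])
print(url)
```

The sketch only builds the request URL; sending it and reading the similar documents out of the response is left to whatever HTTP client is already in use.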

On 06/28/2013 11:15 AM, Luis Carlos Guerrero Covo wrote:
> I only have about a million docs right now so scaling is not a big issue.
> I'm looking to provide a quick implementation and then worry about scale
> when I get around to implementing a more robust recommender. I'm looking at
> a content based approach because we are not tracking users and items viewed
> by users. I was thinking of using morelikethis like walter mentioned, but
> wanted some feedback on the nuances required for a proper implementation
> like having a similarity based on euclidean distance, normalizing numerical
> field values and computing collection wide stats like mean and variance.
> Thank you for the link Otis, I will watch it right away.
> 
> 
> On Fri, Jun 28, 2013 at 1:12 PM, Otis Gospodnetic <
> otis.gospodnetic@gmail.com> wrote:
> 
> > Hi,
> > 
> > It doesn't have to be one or the other.  In the past I've built a news
> > recommender engine based on CF (Mahout) and combined it with Content
> > Similarity-based engine (wasn't Solr/Lucene, but something custom that
> > worked with ngrams, but it may as well have been Lucene/Solr/ES).  It
> > worked well.  If you haven't worked with Mahout before I'd suggest the
> > approach in that video and going from there to Mahout only if it's
> > limiting.
> > 
> > See Ted's stuff on this topic, too:
> > http://www.slideshare.net/tdunning/search-as-recommendation +
> > http://berlinbuzzwords.de/sessions/multi-modal-recommendation-algorithms
> > (note: Mahout, Solr, Pig)
> > 
> > Otis
> > --
> > Solr & ElasticSearch Support -- http://sematext.com/
> > Performance Monitoring -- http://sematext.com/spm
> > 
> > 
> > 
> > On Fri, Jun 28, 2013 at 2:07 PM, Saikat Kanjilal <sxk1969@hotmail.com>
> > wrote:
> > > You could build a custom recommender in mahout to accomplish this. Also,
> > > just out of curiosity, why the content based approach as opposed to
> > > building a recommender based on co-occurrence? One other thing: what is
> > > your data size? Are you looking at a scale where you need something like
> > > hadoop?
> > > > From: lcguerrerocovo@gmail.com
> > > > Date: Fri, 28 Jun 2013 13:02:00 -0500
> > > > Subject: Re: Content based recommender using lucene/solr
> > > > To: solr-user@lucene.apache.org
> > > > CC: java-user@lucene.apache.org
> > > > 
> > > > Hey saikat, thanks for your suggestion. I've looked into mahout and other
> > > > alternatives for computing k nearest neighbors. I would have to run a job
> > > > to compute the k nearest neighbors and track them in the index for
> > > > retrieval. I wanted to see if this was something I could do with lucene
> > > > using lucene's scoring function and solr's morelikethis component. The job
> > > > you specifically mention is for item based recommendation, which would
> > > > require me to track the different items users have viewed. I'm looking for
> > > > a content based approach where I would use a distance measure to establish
> > > > how near items are (how similar) and have some kind of training phase to
> > > > adjust weights.
> > > > 
> > > > 
> > > > On Fri, Jun 28, 2013 at 12:42 PM, Saikat Kanjilal <sxk1969@hotmail.com>
> > > > wrote:
> > > > > Why not just use mahout to do this, there is an item similarity
> > > > > algorithm in mahout that does exactly this :)
> > > > > 
> > > > > https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html
> > > > > 
> > > > > You can use mahout in distributed and non-distributed mode as well.
> > > > > 
> > > > > > From: lcguerrerocovo@gmail.com
> > > > > > Date: Fri, 28 Jun 2013 12:16:57 -0500
> > > > > > Subject: Content based recommender using lucene/solr
> > > > > > To: solr-user@lucene.apache.org; java-user@lucene.apache.org
> > > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > I'm using lucene and solr right now in a production environment
> > > > > > with an index of about a million docs. I'm working on a
> > > > > > recommender that basically would list the n most similar items to
> > > > > > the user based on the current item he is viewing.
> > > > > > 
> > > > > > I've been thinking of using solr/lucene since I already have all
> > > > > > docs available and I want a quick version that can be deployed
> > > > > > while we work on a more robust recommender. How about overriding
> > > > > > the default similarity so that it scores documents based on the
> > > > > > euclidean distance of normalized item attributes and then using a
> > > > > > morelikethis component to pass in the attributes of the item for
> > > > > > which I want to generate recommendations? I know it has its issues
> > > > > > like recomputing scores/normalization/weight application at query
> > > > > > time which could make this idea unfeasible/impractical. I'm at a
> > > > > > very preliminary stage right now with this and would love some
> > > > > > suggestions from experienced users.
> > > > > > 
> > > > > > thank you,
> > > > > > 
> > > > > > Luis Guerrero
> > > > > 
> > > > 
> > > > 
> > > > --
> > > > Luis Carlos Guerrero Covo
> > > > M.S. Computer Engineering
> > > > (57) 3183542047
> 
> 
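
The euclidean-distance idea raised in the quoted messages (normalize numeric fields with collection-wide mean and variance, then rank items by weighted distance) can be sketched outside of Lucene roughly as follows. This is an illustrative brute-force version, not a Lucene Similarity implementation; the function names, the optional `weights` argument and the sample data are all assumptions.

```python
import math


def zscore_columns(rows):
    """Normalize each numeric column using collection-wide mean and std."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in rows]


def nearest(rows, query_index, n, weights=None):
    """Return indices of the n items closest to rows[query_index],
    by weighted euclidean distance over the normalized attributes."""
    normalized = zscore_columns(rows)
    weights = weights or [1.0] * len(rows[0])
    query = normalized[query_index]
    scored = []
    for i, row in enumerate(normalized):
        if i == query_index:
            continue  # do not recommend the item being viewed
        dist = math.sqrt(sum(w * (a - b) ** 2
                             for w, a, b in zip(weights, query, row)))
        scored.append((dist, i))
    scored.sort()
    return [i for _, i in scored[:n]]


# Hypothetical items described by two numeric attributes.
items = [[100.0, 2.0], [110.0, 2.0], [500.0, 5.0], [105.0, 3.0]]
similar = nearest(items, 0, 2)
print(similar)
```

Scanning every item this way is linear per query, which is the query-time cost concern mentioned in the original question; precomputing the neighbors in a batch job and storing them in the index is the alternative discussed in the thread.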


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

