List: solr-user
Subject: Re: How might one search for dupe IDs other than faceting on the ID field?
From: Mikhail Khludnev <mkhludnev () griddynamics ! com>
Date: 2013-07-31 10:11:46
Message-ID: CANGii8fQ-6dTSbuWWxPb7DYp46GDQ6NJkS8f2Rj+cpLKqdKWvw () mail ! gmail ! com
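FWIW,
this code won't catch uncommitted duplicates: the lookup runs against an
already-opened searcher, which only sees committed documents, so two adds
of the same ID between commits both pass the check.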
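
To make the gap concrete, here is a toy model in plain Java, with no Solr classes involved. ToyIndex and its methods are made up for this sketch; the only assumption carried over from the quoted handler is that the existence check consults what a searcher can see, i.e. committed documents only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UncommittedDupes {

    // Hypothetical stand-in for an index: lookups see committed IDs only.
    static final class ToyIndex {
        private final Set<String> visible = new HashSet<>();    // what a searcher sees
        private final List<String> pending = new ArrayList<>(); // added, not yet committed
        private final List<String> stored = new ArrayList<>();  // all committed docs, dupes included

        // Same shape as the quoted addDoc: skip only if a *visible* doc has this id.
        boolean addIfAbsent(String id) {
            if (visible.contains(id)) {
                return false;
            }
            pending.add(id);
            return true;
        }

        // A commit makes every pending document visible to later lookups.
        void commit() {
            stored.addAll(pending);
            visible.addAll(pending);
            pending.clear();
        }

        long count(String id) {
            return stored.stream().filter(id::equals).count();
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addIfAbsent("doc-1");
        index.addIfAbsent("doc-1"); // no commit in between, so the check passes again
        index.commit();
        System.out.println("copies of doc-1 after commit: " + index.count("doc-1"));
        System.out.println("third add accepted: " + index.addIfAbsent("doc-1"));
    }
}
```

Both adds get through because neither is visible when the other is checked; only after the commit does the check start rejecting that ID.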
On Wed, Jul 31, 2013 at 9:41 AM, Dotan Cohen <dotancohen@gmail.com> wrote:
> On Tue, Jul 30, 2013 at 11:14 PM, Jack Krupansky
> <jack@basetechnology.com> wrote:
> > The Solr SignatureUpdateProcessorFactory is designed to facilitate
> > dedupe...
> > any particular reason you did not use it?
> >
> > See:
> > http://wiki.apache.org/solr/Deduplication
> >
> > and
> >
> > https://cwiki.apache.org/confluence/display/solr/De-Duplication
> >
>
> Actually, the guy who made the changes (a coworker) did in fact write
> an alternative UpdateHandler. I've just noticed that there are a bunch
> of dupes right now, though.
>
> import java.io.IOException;
>
> import org.apache.lucene.index.Term;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.TermQuery;
> import org.apache.lucene.search.TopDocs;
> import org.apache.solr.core.SolrCore;
> import org.apache.solr.search.SolrIndexSearcher;
> import org.apache.solr.update.AddUpdateCommand;
> import org.apache.solr.update.DirectUpdateHandler2;
> import org.apache.solr.util.RefCounted;
>
> public class DiscoAPIUpdateHandler extends DirectUpdateHandler2 {
>
>     public DiscoAPIUpdateHandler(SolrCore core) {
>         super(core);
>     }
>
>     @Override
>     public int addDoc(AddUpdateCommand cmd) throws IOException {
>
>         // If overwrite is false, fall through to the stock
>         // DirectUpdateHandler2. This is for debugging, to insert
>         // duplicates into Solr on purpose.
>         if (!cmd.overwrite) return super.addDoc(cmd);
>
>         // With ref-counted objects you must decrement the ref count
>         // when you're done; the finally block below does that.
>         RefCounted<SolrIndexSearcher> indexSearcher =
>                 this.core.getNewestSearcher(false);
>         try {
>             // Run an internal Lucene query to check whether that id
>             // already exists.
>             Term updateTerm;
>             if (cmd.updateTerm != null) {
>                 updateTerm = cmd.updateTerm;
>             } else {
>                 updateTerm = new Term("id", cmd.getIndexedId());
>             }
>
>             Query query = new TermQuery(updateTerm);
>             TopDocs docs = indexSearcher.get().search(query, 2);
>
>             if (docs.totalHits > 0) {
>                 // Don't add the new document.
>                 return 0;
>             }
>         } finally {
>             // The index searcher is no longer needed.
>             indexSearcher.decref();
>         }
>
>         // Reaching this point means it's a new document.
>         return super.addDoc(cmd);
>     }
> }
>
>
> > And I give a bunch of examples in my book.
> >
>
> I anticipate the book with esteem!
>
> --
> Dotan Cohen
>
> http://gibberish.co.il
> http://what-is-what.com
>
--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
<http://www.griddynamics.com>
<mkhludnev@griddynamics.com>