List:       solr-user
Subject:    Re: How might one search for dupe IDs other than faceting on the ID field?
From:       "Jack Krupansky" <jack () basetechnology ! com>
Date:       2013-07-31 13:12:13
Message-ID: 4D57735395C846FA8368D3F179CE7B0B () JackKrupansky

Good to note!

But... any "search" will not detect dupe IDs for uncommitted documents.

-- Jack Krupansky

-----Original Message----- 
From: Mikhail Khludnev
Sent: Wednesday, July 31, 2013 6:11 AM
To: solr-user
Subject: Re: How might one search for dupe IDs other than faceting on the ID field?

fwiw,

this code won't capture uncommitted duplicates.
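
To make that concrete, here is a minimal toy sketch in plain Java. It is not Solr code: ToyIndex, addIfAbsent, commit and countCommitted are hypothetical names made up for illustration. It assumes only that the existence check reads a snapshot of committed documents, which is roughly how a check against an open searcher behaves.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model, NOT Solr: the "searcher" only sees documents that were
 * committed, while new adds sit in a pending buffer until commit().
 */
final class ToyIndex {
    // IDs visible to searches (the committed snapshot).
    private final List<String> committedIds = new ArrayList<>();
    // IDs added since the last commit; invisible to searches.
    private final List<String> pendingIds = new ArrayList<>();

    /** Adds the ID unless a search of the committed snapshot finds it. */
    boolean addIfAbsent(String id) {
        if (committedIds.contains(id)) {
            return false; // duplicate detected, document skipped
        }
        // Pending documents are never consulted, so a second add of the
        // same ID before commit() slips through.
        pendingIds.add(id);
        return true;
    }

    /** Makes all pending documents visible to searches. */
    void commit() {
        committedIds.addAll(pendingIds);
        pendingIds.clear();
    }

    /** Counts committed documents carrying the given ID. */
    int countCommitted(String id) {
        int count = 0;
        for (String committed : committedIds) {
            if (committed.equals(id)) {
                count++;
            }
        }
        return count;
    }
}
```

Adding the same ID twice before calling commit() succeeds both times in this model, and only after the commit does the check start rejecting it.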


On Wed, Jul 31, 2013 at 9:41 AM, Dotan Cohen <dotancohen@gmail.com> wrote:

> On Tue, Jul 30, 2013 at 11:14 PM, Jack Krupansky
> <jack@basetechnology.com> wrote:
> > The Solr SignatureUpdateProcessorFactory is designed to facilitate
> > dedupe... any particular reason you did not use it?
> >
> > See:
> > http://wiki.apache.org/solr/Deduplication
> >
> > and
> >
> > https://cwiki.apache.org/confluence/display/solr/De-Duplication
> >
>
> Actually, the guy who made the changes (a coworker) did in fact write
> an alternative UpdateHandler. I've just noticed that there are a bunch
> of dupes right now, though.
>
> import java.io.IOException;
>
> import org.apache.lucene.index.Term;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.TermQuery;
> import org.apache.lucene.search.TopDocs;
> import org.apache.solr.core.SolrCore;
> import org.apache.solr.search.SolrIndexSearcher;
> import org.apache.solr.update.AddUpdateCommand;
> import org.apache.solr.update.DirectUpdateHandler2;
> import org.apache.solr.util.RefCounted;
>
> public class DiscoAPIUpdateHandler extends DirectUpdateHandler2 {
>
>     public DiscoAPIUpdateHandler(SolrCore core) {
>         super(core);
>     }
>
>     @Override
>     public int addDoc(AddUpdateCommand cmd) throws IOException {
>
>         // If overwrite is set to false, fall back to the stock
>         // DirectUpdateHandler2; this is done for debugging, to insert
>         // duplicates into Solr.
>         if (!cmd.overwrite) return super.addDoc(cmd);
>
>         // When using ref-counted objects you have to decrement the
>         // ref count when you're done.
>         RefCounted<SolrIndexSearcher> indexSearcher =
>                 this.core.getNewestSearcher(false);
>
>         try {
>             // The idea: run an internal Lucene query and check whether
>             // that ID already exists.
>             Term updateTerm;
>             if (cmd.updateTerm != null) {
>                 updateTerm = cmd.updateTerm;
>             } else {
>                 updateTerm = new Term("id", cmd.getIndexedId());
>             }
>
>             Query query = new TermQuery(updateTerm);
>             TopDocs docs = indexSearcher.get().search(query, 2);
>
>             if (docs.totalHits > 0) {
>                 // Don't add the new document.
>                 return 0;
>             }
>         } finally {
>             // The index searcher is no longer needed; release it on
>             // every path, including when the search throws.
>             indexSearcher.decref();
>         }
>
>         // If we get here, it's a new document.
>         return super.addDoc(cmd);
>     }
> }
>
>
> > And I give a bunch of examples in my book.
> >
>
> I anticipate the book with esteem!
>
> --
> Dotan Cohen
>
> http://gibberish.co.il
> http://what-is-what.com
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhludnev@griddynamics.com> 
