List: solr-user
Subject: Re: How might one search for dupe IDs other than faceting on the ID field?
From: "Jack Krupansky" <jack () basetechnology ! com>
Date: 2013-07-31 13:12:13
Message-ID: 4D57735395C846FA8368D3F179CE7B0B () JackKrupansky
Good to note!
But... any "search" will not detect dupe IDs for uncommitted documents.
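To make the point concrete, here is a minimal sketch of the check-then-add pattern from the quoted addDoc override, run against a toy in-memory index. This is not Solr or Lucene code: ToyIndex, addIfAbsent, searchCount and commit are hypothetical names, and the model assumes only that a search sees committed documents and nothing else. It also ignores Solr's overwrite handling.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Solr or Lucene code. All names are hypothetical.
class ToyIndex {
    private final List<String> pending = new ArrayList<>();   // added, not yet committed
    private final List<String> committed = new ArrayList<>(); // visible to searches

    // Mimics a search: only committed documents are visible.
    int searchCount(String id) {
        int n = 0;
        for (String d : committed) {
            if (d.equals(id)) n++;
        }
        return n;
    }

    // Check-then-add: search for the id first, add only if nothing is found.
    boolean addIfAbsent(String id) {
        if (searchCount(id) > 0) return false; // already visible, skip
        pending.add(id);                       // pending docs are invisible to the check
        return true;
    }

    // Makes all pending documents visible to searches.
    void commit() {
        committed.addAll(pending);
        pending.clear();
    }
}

class UncommittedDupeDemo {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addIfAbsent("doc-1");
        index.addIfAbsent("doc-1"); // same id, still before any commit
        System.out.println("hits before commit: " + index.searchCount("doc-1"));
        index.commit();
        System.out.println("hits after commit: " + index.searchCount("doc-1"));
        System.out.println("third add accepted: " + index.addIfAbsent("doc-1"));
    }
}
```

In this toy model both adds pass the check because neither is visible yet, so the duplicate only shows up in a search after the commit.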
-- Jack Krupansky
-----Original Message-----
From: Mikhail Khludnev
Sent: Wednesday, July 31, 2013 6:11 AM
To: solr-user
Subject: Re: How might one search for dupe IDs other than faceting on the ID field?
fwiw,
this code won't capture uncommitted duplicates.
On Wed, Jul 31, 2013 at 9:41 AM, Dotan Cohen <dotancohen@gmail.com> wrote:
> On Tue, Jul 30, 2013 at 11:14 PM, Jack Krupansky
> <jack@basetechnology.com> wrote:
> > The Solr SignatureUpdateProcessorFactory is designed to facilitate
> > dedupe... any particular reason you did not use it?
> >
> > See:
> > http://wiki.apache.org/solr/Deduplication
> >
> > and
> >
> > https://cwiki.apache.org/confluence/display/solr/De-Duplication
> >
>
> Actually, the guy who made the changes (a coworker) did in fact write
> an alternative UpdateHandler. I've just noticed that there are a bunch
> of dupes right now, though.
>
> import java.io.IOException;
>
> import org.apache.lucene.index.Term;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.TermQuery;
> import org.apache.lucene.search.TopDocs;
> import org.apache.solr.core.SolrCore;
> import org.apache.solr.search.SolrIndexSearcher;
> import org.apache.solr.update.AddUpdateCommand;
> import org.apache.solr.update.DirectUpdateHandler2;
> import org.apache.solr.util.RefCounted;
>
> public class DiscoAPIUpdateHandler extends DirectUpdateHandler2 {
>
>     public DiscoAPIUpdateHandler(SolrCore core) {
>         super(core);
>     }
>
>     @Override
>     public int addDoc(AddUpdateCommand cmd) throws IOException {
>
>         // If overwrite is set to false, fall through to the parent
>         // DirectUpdateHandler2. This is done for debugging, to insert
>         // duplicates into Solr.
>         if (!cmd.overwrite) return super.addDoc(cmd);
>
>         // When using ref-counted objects you must decrement the ref
>         // count when you're done, so decref() lives in a finally block.
>         RefCounted<SolrIndexSearcher> indexSearcher =
>                 this.core.getNewestSearcher(false);
>         try {
>             // The idea: run an internal Lucene query and check whether
>             // that id already exists.
>             Term updateTerm;
>             if (cmd.updateTerm != null) {
>                 updateTerm = cmd.updateTerm;
>             } else {
>                 updateTerm = new Term("id", cmd.getIndexedId());
>             }
>
>             Query query = new TermQuery(updateTerm);
>             TopDocs docs = indexSearcher.get().search(query, 2);
>
>             if (docs.totalHits > 0) {
>                 // Don't add the new document.
>                 return 0;
>             }
>         } finally {
>             // The index searcher is no longer needed.
>             indexSearcher.decref();
>         }
>
>         // If we get here, it's a new document.
>         return super.addDoc(cmd);
>     }
> }
>
>
> > And I give a bunch of examples in my book.
> >
>
> I anticipate the book with esteem!
>
> --
> Dotan Cohen
>
> http://gibberish.co.il
> http://what-is-what.com
>
--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
<http://www.griddynamics.com>
<mkhludnev@griddynamics.com>