List:       solr-user
Subject:    Re: How might one search for dupe IDs other than faceting on the ID field?
From:       Dotan Cohen <dotancohen () gmail ! com>
Date:       2013-07-31 5:41:32
Message-ID: CAKDXFkOQw4YZL-pRRe4Wd36nyEbNZnZNLLGFXX5Q_mbqxVNa2A () mail ! gmail ! com

On Tue, Jul 30, 2013 at 11:14 PM, Jack Krupansky
<jack@basetechnology.com> wrote:
> The Solr SignatureUpdateProcessorFactory is designed to facilitate dedupe...
> any particular reason you did not use it?
>
> See:
> http://wiki.apache.org/solr/Deduplication
>
> and
>
> https://cwiki.apache.org/confluence/display/solr/De-Duplication
>

Actually, the guy who made the changes (a coworker) did in fact write
an alternative UpdateHandler. I've just noticed that there are a bunch
of dupes right now, though.

import java.io.IOException;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.solr.core.SolrCore;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.DirectUpdateHandler2;
import org.apache.solr.util.RefCounted;

public class DiscoAPIUpdateHandler extends DirectUpdateHandler2 {

    public DiscoAPIUpdateHandler(SolrCore core) {
        super(core);
    }

    @Override
    public int addDoc(AddUpdateCommand cmd) throws IOException {

        // If overwrite is set to false we fall through to
        // DirectUpdateHandler2. This is done for debugging, to insert
        // duplicates into Solr.
        if (!cmd.overwrite) return super.addDoc(cmd);

        // The idea: make an internal Lucene query and check whether
        // that id already exists.
        Term updateTerm;
        if (cmd.updateTerm != null) {
            updateTerm = cmd.updateTerm;
        } else {
            updateTerm = new Term("id", cmd.getIndexedId());
        }
        Query query = new TermQuery(updateTerm);

        // When using ref-counted objects you have to decrement the
        // ref count when you're done, hence the try/finally.
        // Note: this searcher only sees documents as of its last open.
        RefCounted<SolrIndexSearcher> indexSearcher =
                this.core.getNewestSearcher(false);
        try {
            TopDocs docs = indexSearcher.get().search(query, 2);
            if (docs.totalHits > 0) {
                // don't add the new document
                return 0;
            }
        } finally {
            // index searcher is no longer needed
            indexSearcher.decref();
        }

        // if I'm here then it's a new document
        return super.addDoc(cmd);
    }

}
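
To reason about where dupes can come from with a handler like the one above, here is a toy model in plain Java. It is not Solr code. The class name, the two lists, and the commit() method are all made up for illustration. It rests on two assumptions: the searcher only sees documents as of the last commit, and super.addDoc() with overwrite=true replaces any existing document with the same id.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the check-then-add logic; hypothetical, not Solr API. */
class CheckThenAddSketch {
    private final List<String> committed = new ArrayList<>(); // ids the searcher can see
    private final List<String> pending = new ArrayList<>();   // ids written since the last commit

    /** Returns true if the document was written, false if it was skipped. */
    boolean addDoc(String id, boolean overwrite) {
        if (!overwrite) {
            pending.add(id);            // debugging path: plain add, duplicates allowed
            return true;
        }
        if (committed.contains(id)) {
            return false;               // the searcher sees this id, so skip the add
        }
        pending.removeIf(id::equals);   // assumed overwrite semantics: replace uncommitted copy
        pending.add(id);
        return true;
    }

    /** Makes pending documents visible, like a commit that opens a new searcher. */
    void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    /** Number of visible copies of the given id. */
    long count(String id) {
        return committed.stream().filter(id::equals).count();
    }

    public static void main(String[] args) {
        CheckThenAddSketch index = new CheckThenAddSketch();

        // Same id twice before a commit: the check cannot see the first add,
        // so the second add is not skipped; it replaces the first instead.
        index.addDoc("a", true);
        index.addDoc("a", true);

        // overwrite=false bypasses the check entirely.
        index.addDoc("b", false);
        index.addDoc("b", false);

        index.commit();
        System.out.println("copies of a: " + index.count("a")); // 1
        System.out.println("copies of b: " + index.count("b")); // 2
        System.out.println("a added again? " + index.addDoc("a", true)); // false
    }
}
```

Under those assumptions, the only path that leaves two visible copies of an id is the overwrite=false one, so that is probably the first place to look. The model also shows that "first document wins" only holds across commits, not within one.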


> And I give a bunch of examples in my book.
>

I anticipate the book with esteem!

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com