
List:       solr-user
Subject:    Re: How might one search for dupe IDs other than faceting on the ID field?
From:       Michael Della Bitta <michael.della.bitta () appinions ! com>
Date:       2013-07-30 18:23:00
Message-ID: CAPe6Lt0qVstA_krzW-ghnryX=ajvDwmGwQu86HeTmMr107-dNQ () mail ! gmail ! com


Are you talking about the document's ID field?

If so, you can't have duplicates: a later document with the same ID
overwrites the earlier one.

If not, sorry for asking irrelevant questions. :)
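
If the field being checked is not the unique key, one way to narrow the
facet request from the original message is to ask only for values that
occur more than once. Below is a minimal Python sketch that builds such a
query string. The helper name dupe_facet_query and the default limit of 100
are made up for illustration; facet.mincount and facet.limit are standard
Solr facet parameters. I have not verified that this avoids the
OutOfMemoryError, since Solr still has to facet over the whole field.

```python
from urllib.parse import urlencode


def dupe_facet_query(field="id", limit=100):
    """Build a select query string that facets on `field` and only
    returns values seen at least twice (hypothetical helper)."""
    params = [
        ("q", "*:*"),            # match every document
        ("rows", 0),             # no documents needed, only facet counts
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", 2),   # skip values that occur only once
        ("facet.limit", limit),  # cap the number of facet values returned
    ]
    return "select?" + urlencode(params)


if __name__ == "__main__":
    # Append the printed string to the core's base URL to run the query.
    print(dupe_facet_query())
```

An empty facet list in the response would mean no value of the field
appears in more than one document.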

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions <https://twitter.com/Appinions> | g+: plus.google.com/appinions <https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>

w: appinions.com <http://www.appinions.com/>


On Tue, Jul 30, 2013 at 2:16 PM, Dotan Cohen <dotancohen@gmail.com> wrote:

> To search for duplicate IDs, I am running the following query:
> select?q=*:*&facet=true&facet.field=id&rows=0
> 
> However, since upgrading from Solr 4.1 to Solr 4.3 I am receiving
> OutOfMemoryError errors instead of the desired facet:
> 
> <response><lst name="error"><str
> name="msg">java.lang.OutOfMemoryError: Java heap space</str><str
> name="trace">java.lang.RuntimeException: java.lang.OutOfMemoryError:
> Java heap space
> at
> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:670)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
> at ...
> 
> Might there be a less resource-intensive way to get this information?
> This is Solr 4.3 running on Ubuntu Server 12.04 in Jetty. The index
> has over 100,000,000 small records, for a total of about 95 GiB of
> disk space, with Solr running on its own disk. Actually, the 'disk'
> is an Amazon Web Service EBS volume.
> 
> --
> Dotan Cohen
> 
> http://gibberish.co.il
> http://what-is-what.com
> 


