List: solr-user
Subject: Re: How might one search for dupe IDs other than faceting on the ID field?
From: Mikhail Khludnev <mkhludnev () griddynamics ! com>
Date: 2013-07-30 20:00:07
Message-ID: CANGii8c9+DESgAUvczhQnPzJ2-CC6iUYMNeQzG-yMH9wEO-uSg () mail ! gmail ! com
Dotan,
Could you please provide more lines of the stack trace?
I have no idea why it got worse in 4.3. I know that 4.3 can use facets
backed by DocValues, which are modest on the heap. But from what I saw
(I may be wrong), that is disabled for numeric facets. Hence, I can suggest
reindexing id as a string field with DocValues and hoping that helps.
However, it's questionable to reindex everything without strong guarantees.
Also, I checked the source code of
http://wiki.apache.org/solr/TermsComponent and found that it can be
really memory modest (i.e. without sort or limit).
Be aware that the df-s (document frequencies) returned by that component
are unaware of deleted documents, hence run expungeDeletes beforehand.
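A rough sketch of the TermsComponent approach, in case it helps. It is only an illustration under assumptions: the base URL, the core name "collection1" and the presence of a /terms request handler are guesses about your setup, and the helper names are made up. It also assumes id is indexed as a plain string; for a numeric (trie) field the raw terms may not be directly readable. terms.mincount=2 keeps only terms whose df is at least 2, i.e. the candidate dupes.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: adjust host, port and core name to the actual installation.
SOLR = "http://localhost:8983/solr/collection1"


def expunge_url(base=SOLR):
    # A commit with expungeDeletes=true merges away deleted documents,
    # so that document frequencies no longer count them.
    return base + "/update?" + urlencode(
        {"commit": "true", "expungeDeletes": "true"})


def terms_url(base=SOLR, field="id"):
    params = {
        "terms": "true",
        "terms.fl": field,
        "terms.mincount": 2,    # only terms that occur in 2+ documents
        "terms.limit": -1,      # no limit
        "terms.sort": "index",  # index order, i.e. no sorting by count
        "wt": "json",
    }
    return base + "/terms?" + urlencode(params)


def duplicates(response, field="id"):
    # The default JSON layout is a flat list: [term, df, term, df, ...]
    flat = response["terms"][field]
    return dict(zip(flat[0::2], flat[1::2]))


def find_duplicate_ids(base=SOLR, field="id"):
    # Hypothetical driver: expunge deletes first, then read the terms.
    urlopen(expunge_url(base)).read()
    with urlopen(terms_url(base, field)) as resp:
        return duplicates(json.load(resp), field)
```

Calling find_duplicate_ids() should then give a mapping from each repeated id to its document count. Note that expungeDeletes can be an expensive merge on an index of this size.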
On Tue, Jul 30, 2013 at 10:16 PM, Dotan Cohen <dotancohen@gmail.com> wrote:
> To search for duplicate IDs, I am running the following query:
> select?q=*:*&facet=true&facet.field=id&rows=0
>
> However, since upgrading from Solr 4.1 to Solr 4.3 I am receiving
> OutOfMemoryError errors instead of the desired facet:
>
> <response><lst name="error"><str
> name="msg">java.lang.OutOfMemoryError: Java heap space</str><str
> name="trace">java.lang.RuntimeException: java.lang.OutOfMemoryError:
> Java heap space
> at
> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:670)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
> at ...
>
> Might there be a less resource-intensive way to get this information?
> This is Solr 4.3 running on Ubuntu Server 12.04 in Jetty. The index
> has over 100,000,000 small records, for a total of about 95 GiB of
> disk space, with Solr running on its own disk. Actually, the 'disk'
> is an Amazon Web Service EBS volume.
>
> --
> Dotan Cohen
>
> http://gibberish.co.il
> http://what-is-what.com
>
--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
<http://www.griddynamics.com>
<mkhludnev@griddynamics.com>