List: solr-user
Subject: Re: Solr Memory Usage
From: Toke Eskildsen <te () statsbiblioteket ! dk>
Date: 2014-10-30 9:27:08
Message-ID: 1414661228.2707.7.camel () te-prime
On Wed, 2014-10-29 at 23:37 +0100, Will Martin wrote:
> This command only touches OS level caches that hold pages destined for (or
> not) the swap cache. Its use means that disk will be hit on future requests,
> but in many instances the pages were headed for ejection anyway.
>
> It does not have anything whatsoever to do with Solr caches.
If you re-read my post, you will see "the OS had to spend a lot of
resources just bookkeeping memory". OS, not JVM.
> It also is not fragmentation related; it is a result of the kernel
> managing virtual pages in an "as designed manner". The proper command
> is
>
> #sync; echo 3 >/proc/sys/vm/drop_caches.
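For reference, proc(5) documents what each value written to drop_caches frees; the sync first makes sure no dirty data is lost (writing to drop_caches requires root):

```shell
# Flush dirty pages to disk first, so nothing unwritten is lost
# when the caches are dropped.
sync
# Values accepted by drop_caches, per proc(5):
#   1 = free the page cache
#   2 = free dentries and inodes
#   3 = free the page cache plus dentries and inodes
echo 3 > /proc/sys/vm/drop_caches   # requires root
```

Note that this only discards clean cached data at the OS level; it does not touch the JVM heap or Solr's own caches.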
I just talked with a Systems guy to verify what happened when we had
the problem:
- The machine spawned Xmx1g JVMs with Tika, each instance processing a
single 100M ARC file, sending the result to a shared Solr instance
and shutting down. 40 instances were running at all times, each
instance living for a little less than 3 minutes.
Besides taking ~40GB of RAM in total, this also meant that about 10GB
of RAM was released and re-requested from the system each minute.
I don't know how the memory mapping in Solr works with regard to
re-use of existing allocations, so I can't say whether Solr added to
that number or not.
- The indexing speed deteriorated after some days, grinding down to
(loose guess) something like 1/4 of the initial speed.
- Running top showed that the majority of time was spent in the kernel.
- Running "echo 3 >/proc/sys/vm/drop_caches" (I asked Systems explicitly
about the integer and it was '3') brought the speed back to the
initial level. The temporary patch was to run it once every hour.
- Running top with the patch showed the vast majority of time was spent
in user space.
- Systems investigated and determined that "huge pages" were
automatically requested by processes on the machine, leading to
(virtual) memory fragmentation on the OS level. They used a tool in
'sysfsutils' (just relaying what they said here) to change the default
from huge pages to small pages (or whatever the default is named).
- The disabling of huge pages made the problem go away and we no longer
use the drop_caches-trick.
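For anyone wanting to try the same fix: whatever tool Systems used, it ultimately flips a sysfs knob. A sketch of the direct route (the path varies between distributions, e.g. older RHEL kernels use redhat_transparent_hugepage instead; writing requires root):

```shell
# Show the current transparent huge page policy; the active mode is
# the one in brackets, e.g. "always [madvise] never".
cat /sys/kernel/mm/transparent_hugepage/enabled

# Disable transparent huge pages until the next reboot (requires root).
echo never > /sys/kernel/mm/transparent_hugepage/enabled
```

This only lasts until reboot; to make it permanent, the kernel boot parameter transparent_hugepage=never does the same thing.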
> http://linux.die.net/man/5/proc
>
> I have encountered resistance on the use of this on long-running processes
> for years ... from people who don't even research the matter.
The resistance is natural: Although dropping caches might work, as it
did for us, it is still symptom treatment. Until the cause has been
isolated and determined to be practically unresolvable, the drop_caches
is a red flag.
Your undetermined core problem might not be the same as ours, but it is
simple to check: Watch kernel time percentage. If it rises over time,
try disabling huge pages.
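A minimal sketch of that check, sampling the aggregate CPU counters in /proc/stat twice (field order user/nice/system/idle per proc(5); iowait and irq fields are ignored here, so treat the percentage as rough):

```shell
# Read the first "cpu" line of /proc/stat twice, one second apart.
# read consumes only the first line of the file.
read -r _ u1 n1 s1 i1 _ < /proc/stat
sleep 1
read -r _ u2 n2 s2 i2 _ < /proc/stat

# Deltas over the interval; "system" is time spent in the kernel.
total=$(( (u2 + n2 + s2 + i2) - (u1 + n1 + s1 + i1) ))
sys=$(( s2 - s1 ))
echo "kernel time: $(( 100 * sys / total ))%"
```

If that number climbs steadily over days of indexing, the huge-page theory is worth testing.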
- Toke Eskildsen, State and University Library, Denmark