
List:       openjdk-hotspot-gc-use
Subject:    Re: G1 out of memory behaviour
From:       Thomas Schatzl <thomas.schatzl@oracle.com>
Date:       2013-10-31 13:20:42
Message-ID: 1383225642.2892.101.camel@cirrus

Hi,

On Thu, 2013-10-31 at 13:53 +0100, Wolfgang Pedot wrote:
> Hi,
> 
> thanks for your explanations and effort, see my additional comments below.
> 
> >
> >> What puzzles me is why there has not been a single visible
> >> OutOfMemoryError during the whole time while there are a whole bunch of
> >> different exceptions in the log. If the problem was a single thread, an
> >> OOM could have terminated it. This application has been running for
> >> years (several weeks since the last update) and there has only been the
> >> one OOM situation before.
> >
> > The cause for this behavior is likely the large object/LOB.
> >
> > So the application allocates this LOB, does something, and additional
> > allocations trigger the full gc because the heap is completely full.
> >
> > This full gc can reclaim some space (there's no log output after the
> > full gc).
> >
> > This reclaimed space is large enough for G1 to continue for a little
> > while (i.e. the GC thinks everything is "okay"), however with only a
> > very small young gen, so these young GCs likely follow very closely upon
> > each other (explaining the high gc overhead of 92%), but making some
> > progress at least.
> 
> As I read the logs, the young gen is actually 0B and the CSet in those 
> collections consists of 0 regions, so they do not seem to help much. There 
> is some progress during the full GCs, but because the values are in GB 
> it's not possible to get exact numbers. I can see up to ~20 
> young collections per second over quite a long time, and the GC overhead 
> reaches values above 99.5%.

Looking at this line from the young GC output you gave:

    [Eden: 0.0B(744.0M)->0.0B(744.0M) Survivors: 0.0B->0.0B Heap:
14.6G(14.6G)->14.6G(14.6G)]

This means that the eden capacity is 744M (i.e. there is eden space
available), but there is nothing in it. Apart from that, the heap is
full (14.6G of 14.6G used). There is also no CSet and no survivors, but
that is not surprising given that the occupancy of the eden regions
before the collection is zero bytes.

Another attempt at an explanation (given that I do not have enough log
information): G1 tries to allocate a LOB, but fails (as seen from the
failing expansion requests). It then starts a young GC in the hope that
the young GC frees a large enough contiguous memory region. That does
not work out since the heap is full anyway.

For some reason the LOB also cannot be placed into the regions reserved
for the young gen (likely because of fragmentation of the young gen).
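
For context, this is just how G1 works and not something visible in
your log: an allocation of at least half a region size is "humongous"
and must be placed into a contiguous run of free regions. A rough
sketch of the arithmetic, with assumed sizes purely for illustration:

    class HumongousCheck {
        public static void main(String[] args) {
            long regionSize = 8L * 1024 * 1024;  // assumed -XX:G1HeapRegionSize=8m
            long lobSize = 100L * 1024 * 1024;   // assumed 100M large object

            boolean humongous = lobSize >= regionSize / 2;
            // humongous objects need this many *contiguous* free regions:
            long regionsNeeded = (lobSize + regionSize - 1) / regionSize;
            System.out.println(humongous + ", regions needed: " + regionsNeeded);
        }
    }

So even with 744M of free space in young regions, a large enough LOB
cannot be placed if the free regions are not adjacent.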

After that a full GC starts. My guess is that it manages to free enough
memory: after all, there are the 744M of the young gen, and as the heap
gets compacted you get a contiguous area of memory. Otherwise you would
get an OOME after a few unsuccessful full GCs in a row.

So the application basically seems to allocate a LOB in a tight loop:
allocate, do something with it, drop it, and repeat.
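
A purely hypothetical sketch of such a pattern (the class name, sizes
and the process() helper are made up, not taken from your application):

    public class LobLoop {
        public static void main(String[] args) {
            while (true) {
                // a large byte[]; for G1 this is a humongous allocation
                byte[] lob = new byte[256 * 1024 * 1024];
                process(lob);
                // the reference is dropped here; the next iteration
                // again needs a fresh contiguous range of free regions
            }
        }

        static void process(byte[] data) {
            // stand-in for whatever the application does with the LOB
        }
    }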

If you think that might still be wrong, please provide a complete
sequence of log messages showing both young and full GCs.
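
In case you do not have them enabled already, something like

    java -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps ...

(with your command line otherwise unchanged) should give enough detail.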

> Would it be feasible to use the GC overhead to decide that it's time for 
> an OOM error?

What do the others think? It seems reasonable under the right
conditions.
Maybe you can file a request for enhancement on
bugs.openjdk.java.net/bugs.sun.com?
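
For comparison, the throughput collector already implements such a
heuristic: with -XX:+UseGCOverheadLimit (on by default) it throws an
OutOfMemoryError ("GC overhead limit exceeded") when more than
-XX:GCTimeLimit percent of total time (default 98) is spent in GC while
less than -XX:GCHeapFreeLimit percent of the heap (default 2) is
reclaimed, e.g.:

    java -XX:+UseGCOverheadLimit -XX:GCTimeLimit=98 -XX:GCHeapFreeLimit=2 ...

As far as I know G1 does not implement this check, so the RFE could ask
for equivalent behavior there.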

Thomas


_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
