
List:       openjdk-hotspot-gc-dev
Subject:    Re: Request for Review (xs) : 8038928 - gc/g1/TestGCLogMessages.java fail with "[Evacuation Failure'
From:       Jon Masamitsu <jon.masamitsu () oracle ! com>
Date:       2014-04-29 21:25:25
Message-ID: 536018C5.6080203 () oracle ! com

On 04/28/2014 11:43 PM, Bengt Rutisson wrote:
>
> Hi Jon,
>
> On 4/28/14 11:17 PM, Jon Masamitsu wrote:
>> The requirement that an evacuation failure not happen during this
>> test is based on the expected behavior of the GC and is not a
>> required behavior.  In some instances the evacuation failure will
>> happen, but it is not a GC failure if it does; it is only an
>> unexpected path being followed.
>>
>> The test is not reliable but before removing it, I've made
>> some changes to try and save it.  I've modified the
>> test to slow down the allocations and changed the allocation to
>> allocate smaller objects (which also has a side effect of slowing
>> allocations).   The goal is to detect gross breakages of
>> evacuation failure while risking only very, very rare spurious
>> failures.
>>
>> I had reproduced the failure with the unmodified test and it
>> would fail within 30 minutes.  With the modifications, I haven't
>> seen the failure in a day of testing.
>>
>> If the modifications don't work, I'll remove the test.
>>
>> http://cr.openjdk.java.net/~jmasa/8038928/webrev.00/
>>
>> https://bugs.openjdk.java.net/browse/JDK-8038928
>
> Slowing down the test does not seem like a stable solution, just as
> you point out.
>
> What do you think about this instead?
>
> The original code does:
>
> // create 128MB of garbage. This should result in at least one GC
> for (int i = 0; i < 1024; i++) {
>   garbage = new byte[128 * 1024];
> }
>
> We run with -Xmx10M but no -Xmn set. We should only ever promote one
> object each GC, so I assume that what happens when we get an
> evacuation failure is that we get so many GCs that they fill up the
> old space.
>
> How about specifying -Xmn and allocating only enough to fill the young
> gen a few times? Instead of allocating 128MB we could maybe run with
> -Xmn2M and allocate 8MB worth of objects. That should be enough to get 
> a few GCs but not enough to fill the old gen up. If you want to be 
> really safe you could also increase -Xmx to something like 128M.
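
For reference, the sizing suggested above could be sketched roughly as follows. This is only an illustration of the arithmetic, not code from the webrev: the class name SmallGarbage and the helper allocationCount() are made up, and the command line in the comment is one assumed way to apply the suggested flags.

```java
// Hypothetical sketch of the suggested sizing; not the code from the webrev.
// Assumed invocation: java -XX:+UseG1GC -Xmn2M -Xmx128M SmallGarbage
public class SmallGarbage {
    static final int OBJECT_SIZE = 128 * 1024;        // 128KB per array, as in the original test
    static final long TOTAL_BYTES = 8L * 1024 * 1024; // 8MB in total, roughly four times a 2MB young gen

    static byte[] garbage; // static field so the allocations stay observable

    // Number of arrays needed to reach the target amount of garbage.
    static int allocationCount() {
        return (int) (TOTAL_BYTES / OBJECT_SIZE);
    }

    public static void main(String[] args) {
        System.out.println("Creating garbage");
        for (int i = 0; i < allocationCount(); i++) {
            garbage = new byte[OBJECT_SIZE];
        }
        System.out.println("Done");
    }
}
```

With these numbers the loop allocates 64 arrays instead of 1024, which is where the "a few GCs" expectation comes from.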

The new code for GCTest is

     public static void main(String [] args) {
       System.out.println("Creating garbage");
       // create 128MB of garbage. This should result in at least one GC
       for (int i = 0; i < 1024; i++) {
         work(i);
         for (int k = 0; k < 1024; k++) {
           garbage = new byte[128];
         }
       }
       System.out.println("Done");
     }
   }

This still does as many young GCs but promotes less, since each object
is only about 128 bytes.  I think that puts less pressure on the old gen,
much the way doing fewer GCs would, as you suggest.  I ran this test
(without the work() method) for about half a day without any failures,
and then I got an evacuation failure.  I looked at the heap and the old
gen was full.  I thought then that the allocation rate was just too high.
I tried lowering the initiating occupancy but ran into another bug.
I then added the work() method and it's been running (product and
fastdebug) for a couple of days.
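
To make the numbers concrete, here is a self-contained sketch of the throttled loop. The body of work() is not shown in this message, so the busy loop below is purely an assumption used to illustrate slowing the allocation rate; the class name ThrottledGarbage, the payloadBytes() helper, and the iteration count are likewise invented for illustration.

```java
// Hypothetical sketch: work() here is an assumed stand-in, not the method from the webrev.
public class ThrottledGarbage {
    static final int OUTER = 1024;       // outer iterations, one work() call each
    static final int INNER = 1024;       // allocations per outer iteration
    static final int OBJECT_SIZE = 128;  // payload bytes per array

    static byte[] garbage;
    static volatile long sink; // volatile write keeps the busy loop from being optimized away

    // Payload bytes allocated by the nested loops, ignoring array headers.
    static long payloadBytes() {
        return (long) OUTER * INNER * OBJECT_SIZE;
    }

    // Assumed throttle: burn some CPU between bursts of allocation.
    static void work(int seed) {
        long acc = seed;
        for (int j = 0; j < 100_000; j++) {
            acc = acc * 31 + j;
        }
        sink = acc;
    }

    public static void main(String[] args) {
        System.out.println("Creating garbage");
        for (int i = 0; i < OUTER; i++) {
            work(i);
            for (int k = 0; k < INNER; k++) {
                garbage = new byte[OBJECT_SIZE];
            }
        }
        System.out.println("Done");
    }
}
```

The total is still 1024 * 1024 * 128 bytes = 128MB of payload, the same as the original loop, but it arrives as about a million small arrays with a pause after every 1024 of them.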

I could reshape the heap as you say and avoid the evacuation
failure, but I don't know how much value that has.  It might be the
same as removing the requirement for no evacuation failure.
I settled on this because I thought I was close to an evacuation
failure (if something in G1 changed, like slower mixed collections
or the concurrent marking cycle starting way too late, this might
catch it) but not too close (so that it happened very rarely
if it was just happenstance).

My first thought was just to remove the requirement that no
evacuation failure happened, but I vaguely felt it might be worth trying
to save.

Jon

>
> Thanks,
> Bengt
>
>>
>> Thanks.
>>
>> Jon
>




