[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-hotspot-gc-dev
Subject:    Re: Understanding High Object Copy Times
From:       "yu.zhang () oracle ! com" <yu ! zhang () oracle ! com>
Date:       2017-04-28 22:44:05
Message-ID: 968aa9ca-7323-05bd-2b94-81e41bc2b313 () oracle ! com
[Download RAW message or body]

Hi, Nezih,

I agree with your estimation. There is not much promotion going on.

Though I do not see high system cpu, if you are on a Linux system, can 
you verify that the OS Transparent Huge Page is off?

Remember set is the data structure that keep track of pointers from 
other regions to the region we are operating on. If there are pointers 
from old regions to the objects in the young regions, we can not collect 
that object.
After a gc pause, the live objects are evacuated to either the survivor 
or old gen regions. Then we need to update the Remember set. That is 
what I was referring to.
It is part of object copy.

Like Kirk mentioned, the parallelism of the gc threads seems fine. 
Charlie.Hunt suggested maybe the object graph is very deep so that the 
gc threads can not steal work and end up spinning.

Some things we can try:
-XX:MaxGCPauseMillis=<200> Maybe increase it to 400
Maybe try to reduce the -XX:ParallelGCThreads

|-XX:+G1SummarizeRSetStats -XX:G1SummarizeRSetStatsPeriod=10|


It is very expensive to print those statistics, for now we know there is 
a lot of coarsening. We can make G1SumarizeRSetStatsPeriod=100

The RSet footprint is so big,  increasing -XX:G1RSetRegionEntries will 
get rid of coarsening but make the memory footprint bigger. We can delay 
this for now.

What is your hardware/software configuration? cpu/memory/cores? There is 
no swapping, right?

Thanks
Jenny

On 04/27/2017 08:52 PM, nezih yigitbasi wrote:
> My first question is incorrect actually, so I am giving a better 
> example to rephrase my question. At time 
> "2017-04-26T03:12:24.259-0700"  there is a young GC that took 35.47 s, 
> where object copy took 28983.4 ms. In that event I see the following log:
>
>    [Eden: 7168.0M(7168.0M)->0.0B(7168.0M) Survivors: 1024.0M->1024.0M 
> Heap: 153.4G(160.0G)->149.4G(160.0G)]
>
> My interpretation for this is, ~4GB of garbage was collected from heap 
> in total and we see that eden usage goes down by ~7GB, this means ~3GB 
> of the eden was live objects. Is this interpretation correct? If it 
> is, how come copying over 3GB takes ~29s? In your answer you said 
> "most of the object copy time is dealing with the Remember Set", can 
> you please give some details about what kind of operations on rsets 
> are done during the object copy phase, and can we see that from these 
> logs?
>
> Thanks again,
> Nezih
>
>
> 2017-04-27 17:20 GMT-07:00 nezih yigitbasi <nezihyigitbasi@gmail.com 
> <mailto:nezihyigitbasi@gmail.com>>:
>
>     Thanks for the suggestions. We use the default pause time. And
>     here is our entire set of JVM args:
>     https://gist.github.com/nezihyigitbasi/04f5fdb9c32ac56097011819e20602d8
>     <https://gist.github.com/nezihyigitbasi/04f5fdb9c32ac56097011819e20602d8>
>
>
>     I have some followup questions:
>
>     - In some case the object copy took 39406.8 ms, even if the
>     remembered set is ~30GB isn't this too slow (that's <1GB/s of data)?
>     - Is there any way to reduce the rset overhead?
>     - My initial thought when I saw the high object copy times was
>     there may be some sort of contention to have such a low throughput
>     during the copy. Although it may not be the case here, I just
>     wonder whether there is a way to see the amount of contention from
>     the gc logs?
>
>     Nezih
>
>
>     2017-04-27 16:58 GMT-07:00 Jenny Zhang <yu.zhang@oracle.com
>     <mailto:yu.zhang@oracle.com>>:
>
>         Hi, Hezih,
>
>         It seems this workload is very heavy on Remember Set. It has
>         about 31G native memory for RSet (old gen) and still with
>         coarsening.
>
>         What is you pause time goal? The default (200ms) might be too
>         small for you.  Can you increase that so G1 can increase the
>         young gen size? Since there is not much promotion, I guess
>         most of the object copy time is dealing with the Remember Set.
>
>         There are other things you can try, like increase the
>         G1RSetReginEntries, but the memory footprint will be bigger.
>
>         So if you can, I suggest increase the pause time goal first.
>
>         Thanks
>
>         Jenny
>
>
>
>
>         On 4/27/2017 9:22 AM, nezih yigitbasi wrote:
>
>             Hi,
>             We see huge object copy times (and relatively high
>             termination times) during young GCs in our production
>             system running on Java 1.8.0_112-b15. You can find the GC
>             logs here:
>             https://gist.github.com/nezihyigitbasi/1f7a92da7860908a611cb1197bd8626b
>             <https://gist.github.com/nezihyigitbasi/1f7a92da7860908a611cb1197bd8626b>
>
>             The young GC times start going high after the timestamp
>             "2017-04-26T03:07:22.164-0700".
>
>             I will appreciate if you can give some details about:
>             - what goes into the "Object Copy" phase during young GCs
>             and how we can reduce it.
>             - why we see high Termination times and what we can do
>             about it
>
>             Thanks,
>             Nezih
>
>
>
>


[Attachment #3 (text/html)]

<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p>Hi, Nezih,</p>
    I agree with your estimation. There is not much promotion going on.<br>
    <br>
    Though I do not see high system cpu, if you are on a Linux system,
    can you verify that the OS Transparent Huge Page is off?<br>
    <br>
    Remember set is the data structure that keep track of pointers from
    other regions to the region we are operating on. If there are
    pointers from old regions to the objects in the young regions, we
    can not collect that object.<br>
    After a gc pause, the live objects are evacuated to either the
    survivor or old gen regions. Then we need to update the Remember
    set. That is what I was referring to.<br>
    It is part of object copy.<br>
    <br>
    Like Kirk mentioned, the parallelism of the gc threads seems fine.
    Charlie.Hunt suggested maybe the object graph is very deep so that
    the gc threads can not steal work and end up spinning.<br>
    <br>
    Some things we can try:<br>
    -XX:MaxGCPauseMillis=&lt;200&gt; Maybe increase it to 400<br>
    Maybe try to reduce the -XX:ParallelGCThreads<br>
    <br>
    <meta http-equiv="content-type" content="text/html; charset=utf-8">
    <pre style="box-sizing: border-box; font-family: SFMono-Regular, Consolas, \
&quot;Liberation Mono&quot;, Menlo, Courier, monospace; font-size: 13.6px; \
margin-top: 0px !important; margin-bottom: 0px !important; font-style: normal; \
font-variant: normal; font-weight: normal; font-stretch: normal; line-height: 1.45; \
word-wrap: normal; padding: 16px; overflow: auto; background-color: rgb(246, 248, \
250); border-radius: 3px; color: rgb(36, 41, 46); letter-spacing: normal; orphans: 2; \
text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: \
0px; -webkit-text-stroke-width: 0px;"><code style="box-sizing: border-box; \
font-family: SFMono-Regular, Consolas, &quot;Liberation Mono&quot;, Menlo, Courier, \
monospace; font-size: 13.6px; padding: 0px; margin: 0px; background: transparent; \
border-radius: 3px; word-break: normal; white-space: pre; border: 0px; display: \
inline; overflow: visible; line-height: inherit; word-wrap: \
                normal;">-XX:+G1SummarizeRSetStats
-XX:G1SummarizeRSetStatsPeriod=10</code></pre>
    <br>
    It is very expensive to print those statistics, for now we know
    there is a lot of coarsening. We can make
    G1SumarizeRSetStatsPeriod=100<br>
    <br>
    The RSet footprint is so big,   increasing <span style="font-family:
      Georgia; font-size: 18px; font-style: normal; font-variant-caps:
      normal; font-weight: normal; letter-spacing: normal; text-align:
      start; text-indent: 0px; text-transform: none; white-space:
      normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float:
      none; display: inline !important;" class="">-XX:G1RSetRegionEntries
      will get rid of coarsening but make the memory footprint bigger.
      We can delay this for now.<br>
      <br>
      What is your hardware/software configuration? cpu/memory/cores?
      There is no swapping, right?<br>
      <br>
      Thanks<br>
      Jenny<br>
      <br>
    </span>
    <div class="moz-cite-prefix">On 04/27/2017 08:52 PM, nezih yigitbasi
      wrote:<br>
    </div>
    <blockquote
cite="mid:CALur2hV1rRjqi0oAh17nNeJjKgz8sa7fqpWU8XPjZsA5weJfDw@mail.gmail.com"
      type="cite">
      <div dir="ltr">My first question is incorrect actually, so I am
        giving a better example to rephrase my question. At time
        "2017-04-26T03:12:24.259-0700"   there is a young GC that
        took  35.47 s, where object copy took 28983.4 ms. In that event I
        see the following log:
        <div><br>
        </div>
        <div>
          <div>     [Eden: 7168.0M(7168.0M)-&gt;0.0B(7168.0M) Survivors:
            1024.0M-&gt;1024.0M Heap: 153.4G(160.0G)-&gt;149.4G(160.0G)]</div>
          <div><br>
          </div>
          <div>My interpretation for this is, ~4GB of garbage was
            collected from heap in total and we see that eden usage goes
            down by ~7GB, this means ~3GB of the eden was live objects.
            Is this interpretation correct? If it is, how come copying
            over 3GB takes ~29s? In your answer you said "most of the
            object copy time is dealing with the Remember Set", can you
            please give some details about what kind of operations on
            rsets are done during the object copy phase, and can we see
            that from these logs?</div>
          <div><br>
          </div>
          <div>Thanks again,</div>
          <div>Nezih</div>
          <div><br>
          </div>
        </div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">2017-04-27 17:20 GMT-07:00 nezih
          yigitbasi <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:nezihyigitbasi@gmail.com" \
target="_blank">nezihyigitbasi@gmail.com</a>&gt;</span>:<br>  <blockquote \
                class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div dir="ltr">Thanks for the suggestions. We use the
              default pause time. And here is our entire set of JVM
              args:  <a moz-do-not-send="true"
href="https://gist.github.com/nezihyigitbasi/04f5fdb9c32ac56097011819e20602d8"
                target="_blank">https://gist.github.com/<wbr>nezihyigitbasi/<wbr>04f5fdb9c32ac56097011819e20602<wbr>d8</a>
  <div><br>
              </div>
              <div>I have some followup questions:<br>
                <div>
                  <div><br>
                  </div>
                  <div>- In some case the object copy took 39406.8 ms,
                    even if the remembered set is ~30GB isn't this too
                    slow (that's &lt;1GB/s of data)?</div>
                  <div>- Is there any way to reduce the rset overhead?</div>
                  <div>- My initial thought when I saw the high object
                    copy times was there may be some sort of contention
                    to have such a low throughput during the copy.
                    Although it may not be the case here, I just wonder
                    whether there is a way to see the amount of
                    contention from the gc logs?</div>
                  <span class="HOEnZb"><font color="#888888">
                      <div><br>
                      </div>
                      <div>Nezih</div>
                    </font></span>
                  <div>
                    <div class="h5">
                      <div><br>
                      </div>
                      <div>
                        <div>
                          <div class="gmail_extra"><br>
                            <div class="gmail_quote">2017-04-27 16:58
                              GMT-07:00 Jenny Zhang <span dir="ltr">&lt;<a
                                  moz-do-not-send="true"
                                  href="mailto:yu.zhang@oracle.com"
                                  \
target="_blank">yu.zhang@oracle.com</a>&gt;</span>:<br>  <blockquote \
class="gmail_quote"  style="margin:0px 0px 0px
                                0.8ex;border-left:1px solid
                                rgb(204,204,204);padding-left:1ex">Hi,
                                Hezih,<br>
                                <br>
                                It seems this workload is very heavy on
                                Remember Set. It has about 31G native
                                memory for RSet (old gen) and still with
                                coarsening.<br>
                                <br>
                                What is you pause time goal? The default
                                (200ms) might be too small for you.   Can
                                you increase that so G1 can increase the
                                young gen size? Since there is not much
                                promotion, I guess most of the object
                                copy time is dealing with the Remember
                                Set.<br>
                                <br>
                                There are other things you can try, like
                                increase the G1RSetReginEntries, but the
                                memory footprint will be bigger.<br>
                                <br>
                                So if you can, I suggest increase the
                                pause time goal first.<br>
                                <br>
                                Thanks<span
                                  class="m_8770557251507851110gmail-HOEnZb"><font
                                    color="#888888"><br>
                                    <br>
                                    Jenny</font></span>
                                <div
                                  class="m_8770557251507851110gmail-HOEnZb">
                                  <div
                                    class="m_8770557251507851110gmail-h5"><br>
                                    <br>
                                    <br>
                                    <br>
                                    On 4/27/2017 9:22 AM, nezih
                                    yigitbasi wrote:<br>
                                    <blockquote class="gmail_quote"
                                      style="margin:0px 0px 0px
                                      0.8ex;border-left:1px solid
                                      rgb(204,204,204);padding-left:1ex">
                                      Hi,<br>
                                      We see huge object copy times (and
                                      relatively high termination times)
                                      during young GCs in our production
                                      system running on Java
                                      1.8.0_112-b15. You can find the GC
                                      logs here: <a
                                        moz-do-not-send="true"
href="https://gist.github.com/nezihyigitbasi/1f7a92da7860908a611cb1197bd8626b"
                                        rel="noreferrer" \
target="_blank">https://gist.github.com/nezihy<wbr>igitbasi/1f7a92da7860908a611cb<wbr>1197bd8626b</a><br>
  <br>
                                      The young GC times start going
                                      high after the timestamp
                                      "2017-04-26T03:07:22.164-0700"<wbr>.<br>
                                      <br>
                                      I will appreciate if you can give
                                      some details about:<br>
                                      - what goes into the "Object Copy"
                                      phase during young GCs and how we
                                      can reduce it.<br>
                                      - why we see high Termination
                                      times and what we can do about it<br>
                                      <br>
                                      Thanks,<br>
                                      Nezih<br>
                                    </blockquote>
                                    <br>
                                  </div>
                                </div>
                              </blockquote>
                            </div>
                            <br>
                          </div>
                        </div>
                      </div>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </body>
</html>



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic