List: openjdk-hotspot-gc-dev
Subject: Re: how to tune gc for tomcat server on large machine that uses almost all old generation smallish objects
From: "Michal Frajt" <michal@frajt.eu>
Date: 2017-12-14 16:34:00
Message-ID: P0YM0P$8638B4C73253E0D5D9FBABAC30E5100F@frajt.eu
Hi Andy,
How many ConcurrentHashMap instances do you actually have in your 16 gig heap? I'm
not sure I understand your map structure correctly - "But the first char of the key
takes you to the second tier of ConcurrentHashMaps and so". Could you provide a
histogram of your application when running full (before you start LRU sweeping)? Do
you need the ConcurrentHashMaps if you have several tiers which already act as
concurrent segments? Did you consider open-addressing maps (Trove, Koloboke), which
eliminate the need for map nodes (there would be some trade-off when removing)? Did
you consider storing a char or even byte array instead of the String instance? Do
you remove a ConcurrentHashMap tier when it becomes completely empty after the LRU
sweep? All this might significantly reduce the heap requirement, shortening the GC time.
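As a rough illustration of the byte-array idea (the class and method names here are purely illustrative, not taken from Andy's code), wrapping the raw bytes in a ByteBuffer gives content-based equals()/hashCode() so they can serve as map keys:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: store ASCII keys/values as byte arrays instead
// of String instances. A byte[] cannot be a map key directly (identity
// equals/hashCode), but a wrapping ByteBuffer compares by content.
public class ByteKeyCache {
    private final ConcurrentHashMap<ByteBuffer, byte[]> map = new ConcurrentHashMap<>();

    public void put(String key, String value) {
        map.put(ByteBuffer.wrap(key.getBytes(StandardCharsets.US_ASCII)),
                value.getBytes(StandardCharsets.US_ASCII));
    }

    public String get(String key) {
        byte[] v = map.get(ByteBuffer.wrap(key.getBytes(StandardCharsets.US_ASCII)));
        return v == null ? null : new String(v, StandardCharsets.US_ASCII);
    }

    public int size() {
        return map.size();
    }
}
```

This assumes keys and values are pure ASCII; for arbitrary Unicode you would pick UTF-8 instead, with a slightly different size trade-off.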
Regards,
Michal
From: "hotspot-gc-dev" hotspot-gc-dev-bounces@openjdk.java.net
To: "Andy Nuss" andrew_nuss@yahoo.com
Cc: "hotspot-gc-dev@openjdk.java.net openjdk.java.net" hotspot-gc-dev@openjdk.java.net
Date: Thu, 14 Dec 2017 08:19:21 +0100
Subject: Re: how to tune gc for tomcat server on large machine that uses almost all
old generation smallish objects
Hi Andy,
What you are describing is fairly routine caching behavior, with a small twist in
that the objects being held in this case are quite regular in size. Again, I wouldn't
design with the collector in mind, whereas I certainly do design with memory
efficiency as a reasonable goal.

As for GC, in the JVM there are two basic strategies, which I tend to label
evacuating and in-place. G1 is completely evacuating, and consequently the cost (aka
pause duration) is (in most cases) a function of the number of live objects. The
trigger for a young generational collection is when you have consumed all of the Eden
regions; thus the frequency is the size of Eden divided by your allocation rate. The
trigger for a Concurrent Mark of tenured is when it consumes 45% of available heap;
thus your Concurrent Mark frequency is 45% of the heap size divided by your promotion
rate. Additionally, G1 keeps some memory on reserve to avoid painting the collector
into a Full GC corner.

Issues specific to caching are: very large live sets that result in inflated copy
costs as data flows from Eden through survivor and finally into tenured space. In
these cases I've found that it's better to slow down the frequency of collections, as
this will result in you experiencing the same pause time but less frequently. Another
tactic that I've found helpful on occasion is to lower the Initiating Heap Occupancy
Percent (aka IHOP) from its default value of 45% to a value that sits consistently
inside the live set, meaning you'll run back-to-back concurrent cycles. And I've got
a bag of other tactics that I've used with varying degrees of success. Which one
would work for you? I've no idea.
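By way of illustration only (the flag values below are placeholders; the right numbers can only come from your own GC logs), the IHOP tactic translates to something like:

```shell
# Hypothetical example for a 16 gig heap, Java 8 era flags. The IHOP value
# of 35 is a placeholder, not a recommendation; derive it from a GC log.
# Lowering it from the default 45% starts the concurrent mark cycle earlier.
java -Xms16g -Xmx16g \
     -XX:+UseG1GC \
     -XX:InitiatingHeapOccupancyPercent=35 \
     -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log \
     -jar myapp.jar
```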
Tuning a collector isn't something you can do after reading a few tips from
StackOverflow. GC behavior is an emergent reaction to the workload that you place on
it, meaning the only way to really understand how it's all going to work is to run
production-like experiments (or better yet, run in production) and look at a GC log.
(Shameless plug: Censum, my GC log visualization tooling, helps.)

I understand your concerns in wanting to avoid the dreaded GC pause, but I'd also
look at your efforts in two ways. First, it's an opportunity to get a better
understanding of GC; secondly, recognize that this feels like a premature
optimization, as you're trying to solve a problem that you (well, none of us, to be
fair and honest) fully understand and may not actually have. Let me recommend some
names that have written about how G1 works: Charlie Hunt in his performance tuning
book, Poonam Parhar in her blog entries, Monica Beckwith in a number of different
places, Simone Bordet in a number of places. I should add that
hotspot-gc-use@openjdk.java.net is a more appropriate list for these types of
questions. We also have a number of GC related discussions on our mailing list,
friends@jclarity.com. I've also recorded a session with Dr. Heinz Kabutz on his
https://javaspecialists.teachable.com/ site. I'll get an exact link if you email me
offline.

Kind regards,
Kirk Pepperdine

On Dec 13, 2017, at 9:55 PM, Andy
Nuss <andrew_nuss@yahoo.com> wrote:

Let me try to explain. On a 16 gig heap, I anticipate almost 97% of the heap in use
at any given moment is ~30 and ~100 char strings. The rest is small pointer objects
in the ConcurrentHashMap, also long-held, and tomcat's nio stuff. So at any moment in
time, most of the in-use heap (and I will keep about 20% unused to aid gc) is a huge
number of long-held strings. Over time, as the single servlet receives requests to
cache newly accessed key/val pairs, the number of strings grows to the maximum I
allow. At that point, a background thread sweeps away half of the LRU key/value pairs
(30/100 char strings). Now they are unreferenced and sweepable. That's all I do. Then
the servlet keeps receiving requests to put more key/val pairs, as well as handle get
requests. At the point in time where I clear all the LRU pairs, which might take
minutes to iterate, G1 can start doing its thing, not that it will know to do so
immediately. I'm worried that whenever G1 does its thing, because the sweepable stuff
is 100% small oldgen objects, servlet threads will time out on the client side. Not
that this happens several times a day, but if G1 does take a long time to sweep a
massive heap with all oldgen objects that are small, the *only* concern is that
servlet requests will time out during this period.

Realize I know nothing about GC, except that periodically, Eclipse hangs due to gc
and then crashes on me, i.e. after 4 hours of editing. And all the blogs I found
talked about newgen and TLAB and other things assuming typical ephemeral usage, which
is not at all the case on this particular machine instance. Again, all long-held
small strings, growing and growing over time steadily, until suddenly half are freed
reference-wise by me.

If there are no GC settings that make that sweepable stuff happen in a non-blocking
thread, and tomcat's servlets could all hang once every other day for many many
seconds on this 16 gig machine (the so-called long gc-pause that people blog about),
that might motivate me to abandon this and use the memcached product.
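The cache lifecycle described above could be sketched, very roughly, like this (a simplified single-tier sketch with illustrative names, not Andy's actual tiered code): entries carry a last-access stamp, and a sweep drops the least-recently-used half, leaving those strings unreferenced for GC.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of the half-sweep idea; not production code. A real
// multi-threaded sweep would need care around entries touched mid-sweep.
public class HalfSweepCache {
    static final class Entry {
        final String value;
        volatile long stamp;          // logical last-access time
        Entry(String value, long stamp) { this.value = value; this.stamp = stamp; }
    }

    private final ConcurrentHashMap<String, Entry> map = new ConcurrentHashMap<>();
    private final AtomicLong clock = new AtomicLong();

    public void put(String key, String value) {
        map.put(key, new Entry(value, clock.incrementAndGet()));
    }

    public String get(String key) {
        Entry e = map.get(key);
        if (e == null) return null;
        e.stamp = clock.incrementAndGet();   // refresh recency on access
        return e.value;
    }

    /** Drop the least-recently-used half; the removed strings become garbage. */
    public void sweepHalf() {
        long cutoff = map.values().stream()
                .map(e -> e.stamp)
                .sorted()
                .skip(map.size() / 2)        // median stamp survives
                .findFirst().orElse(Long.MAX_VALUE);
        map.entrySet().removeIf(e -> e.getValue().stamp < cutoff);
    }

    public int size() {
        return map.size();
    }
}
```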
On Wednesday, December 13, 2017, 12:15:38 PM PST, Kirk Pepperdine
<kirk@kodewerk.com> wrote:
Hi Andy,
On Dec 13, 2017, at 8:34 PM, Andy Nuss <andrew_nuss@yahoo.com> wrote:
Thanks Kirk,
The array is just a temporary buffer held onto that has its entries cleared to null
after my LRU sweep. The references that are freed to GC are in the
ConcurrentHashMaps, and are all 30 char and 100 char strings, key/vals (but not
precisely), so I assume that when I do my LRU sweep when needed, it's freeing a ton
of small strings,

which G1 has to reallocate into bigger chunks, and mark freed, and so,
Not sure I understand this bit. Can you explain what you mean by this?
so that I can in the future add new such strings to the LRU cache. The concern was
whether this sweep of old gen strings scattered all over the huge heap would cause
tomcat nio-based threads to "hang", i.e. not respond quickly, or whether G1 would do
things less pre-emptively. Are you basically saying, "no, tomcat servlet response
time won't be significantly affected by a G1 sweep"?
I'm not sure what your goal is here. I would say, design as needed and let the
collector do its thing. That said, temporary humongous allocations are not well
managed by G1. Better to create up front and cache it for future downstream use.

As for a sweep… what I think you're asking about is object copy costs. These costs
should and typically do dominate pause time. Object copy cost is proportional to the
number of live objects in the collection set (CSet). Strings are dedup'ed after age 5,
so with most heap configurations, duplicate Strings will be dedup'ed before they hit
tenured.
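Note that G1's string deduplication is opt-in; a minimal illustration of enabling it (the statistics flag lets you confirm on your own workload that it is actually firing):

```shell
# String deduplication is off by default and requires G1 (Java 8u20+).
# The statistics flag prints per-cycle dedup savings to the GC log.
java -XX:+UseG1GC \
     -XX:+UseStringDeduplication \
     -XX:+PrintStringDeduplicationStatistics \
     -jar myapp.jar
```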
Also, I was wondering: does anyone know how memcached works, and why it is used in
preference to a custom design such as mine, which seems a lot simpler? I.e. it seems
that with "memcached" you have to worry about "slabs" and memcached's own heap
management, and waste a lot of memory.
I'm the wrong person to defend the use of memcached. It certainly does serve a
purpose… that said, using it to offload temp objects means you end up creating your
own garbage collector, and as you can see by the effort GC engineers put into each
implementation, it's a non-trivial undertaking.

Kind regards,
Kirk