
List:       openjdk-hotspot-gc-dev
Subject:    Re: how to tune gc for tomcat server on large machine that uses almost all old generation smallish objects
From:       "Michal Frajt" <michal@frajt.eu>
Date:       2017-12-14 16:34:00
Message-ID: P0YM0P$8638B4C73253E0D5D9FBABAC30E5100F@frajt.eu
[Download RAW message or body]

Hi Andy,

How many ConcurrentHashMap instances do you actually have in your 16 gig heap? I'm not sure I understand your map structure correctly - "But the first char of the key takes you to the second tier of ConcurrentHashMaps and so on". Could you provide a histogram of your application when running full (before you start LRU sweeping)? Do you need the ConcurrentHashMaps at all if you have several tiers which already act as concurrent segments? Did you consider open-addressing maps (Trove, Koloboke), which eliminate the need for map nodes (there would be some trade-off when removing)? Did you consider storing a char or even byte array instead of the String instance? Do you remove a ConcurrentHashMap tier when it becomes completely empty after the LRU sweep? All of this might significantly reduce the heap requirement, shortening the GC time.
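The byte-array idea above needs one piece of plumbing: Java arrays use identity equality, so a raw byte[] cannot serve as a hash map key directly. A minimal sketch of a wrapper that restores value equality (the class name ByteKey and the UTF-8 choice are my assumptions, not from this thread):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical wrapper so byte[] keys get value equality/hashing in a
// ConcurrentHashMap. Storing mostly-ASCII keys as byte[] roughly halves
// per-key memory versus String on JVMs without compact strings.
final class ByteKey {
    private final byte[] bytes;
    private final int hash; // precomputed, keys are immutable

    ByteKey(String s) {
        this.bytes = s.getBytes(StandardCharsets.UTF_8);
        this.hash = Arrays.hashCode(bytes);
    }

    @Override public boolean equals(Object o) {
        return o instanceof ByteKey && Arrays.equals(bytes, ((ByteKey) o).bytes);
    }
    @Override public int hashCode() { return hash; }
}

public class ByteKeyDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<ByteKey, byte[]> cache = new ConcurrentHashMap<>();
        cache.put(new ByteKey("user:123"), "some value".getBytes(StandardCharsets.UTF_8));
        // Distinct ByteKey instances with the same content hit the same entry.
        System.out.println(cache.containsKey(new ByteKey("user:123"))); // true
    }
}
```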

Regards,  
Michal  
  


From: "hotspot-gc-dev" hotspot-gc-dev-bounces@openjdk.java.net
To: "Andy Nuss" andrew_nuss@yahoo.com
Cc: "hotspot-gc-dev@openjdk.java.net openjdk.java.net" hotspot-gc-dev@openjdk.java.net
Date: Thu, 14 Dec 2017 08:19:21 +0100
Subject: Re: how to tune gc for tomcat server on large machine that uses almost all old generation smallish objects


Hi Andy,
What you are describing is fairly routine caching behavior with a small twist, in that the objects being held in this case are quite regular in size. Again, I wouldn't design with the collector in mind, whereas I certainly do design with memory efficiency as a reasonable goal.

As for GC, in the JVM there are two basic strategies, which I tend to label evacuating and in-place. G1 is completely evacuating, and consequently the cost (aka pause duration) is, in most cases, a function of the number of live objects. The trigger for a young generational collection is when you have consumed all of the Eden regions; thus the frequency is the size of Eden divided by your allocation rate. The trigger for a Concurrent Mark of tenured is when it consumes 45% of the available heap; thus your Concurrent Mark frequency is 45% of the heap size divided by your promotion rate. Additionally, G1 keeps some memory on reserve to avoid painting the collector into a Full GC corner.

Issues specific to caching are: very large live sets that result in inflated copy costs as data flows from Eden through survivor and finally into tenured space. In these cases I've found that it's better to slow down the frequency of collections, as this will result in you experiencing the same pause time but less frequently. Another tactic I've found helpful on occasion is to lower the Initiating Heap Occupancy Percent (aka IHOP) from its default value of 45% to a value that sits consistently inside the live set, meaning you'll run back-to-back concurrent cycles. And I've got a bag of other tactics that I've used with varying degrees of success. Which one would work for you? I've no idea. Tuning a collector isn't something you can do after reading a few tips from StackOverflow. GC behavior is an emergent reaction to the workload you place on it, meaning the only way to really understand how it's all going to work is to run production-like experiments (or better yet, run in production) and look at a GC log. (Shameless plug: Censum, my GC log visualization tooling, helps.)

I understand your concerns in wanting to avoid the dreaded GC pause, but I'd also look at your efforts in two ways. First, it's an opportunity to get a better understanding of GC; secondly, recognize that this feels like a premature optimization, as you're trying to solve a problem that you (well, none of us, to be fair and honest) don't fully understand and may not actually have. Let me recommend some names that have written about how G1 works: Charlie Hunt in his performance tuning book, Poonam Parhar in her blog entries, Monica Beckwith in a number of different places, Simone Bordet in a number of places. I should add that hotspot-gc-use@openjdk.java.net is a more appropriate list for these types of questions. We also have a number of GC-related discussions on our mailing list, friends@jclarity.com. I've also recorded a session with Dr. Heinz Kabutz on his https://javaspecialists.teachable.com/ site. I'll get an exact link if you email me offline.

Kind regards,
Kirk Pepperdine

On Dec 13, 2017, at 9:55 PM, Andy Nuss <andrew_nuss@yahoo.com> wrote:

Let me try to explain. On a 16 gig heap, I
anticipate that almost 97% of the heap in use at any given moment is ~30 and ~100 char strings. The rest is small pointer objects in the ConcurrentHashMap, also long-held, and tomcat's nio stuff. So at any moment in time, most of the in-use heap (and I will keep about 20% unused to aid gc) is a huge number of long-held strings. Over time, as the single servlet receives requests to cache newly accessed key/val pairs, the number of strings grows to the maximum I allow. At that point, a background thread sweeps away half of the LRU key/value pairs (30/100 char strings). Now they are unreferenced and sweepable. That's all I do. Then the servlet keeps receiving requests to put more key/val pairs, as well as handle get requests. At the point in time where I clear all the LRU pairs, which might take minutes to iterate, G1 can start doing its thing, not that it will know to do so immediately. I'm worried that whenever G1 does its thing, because the sweepable stuff is 100% small oldgen objects, servlet threads will time out on the client side. Not that this happens several times a day, but if G1 does take a long time to sweep a massive heap full of small oldgen objects, the *only* concern is that servlet requests will time out during this period.

Realize I know nothing about GC, except that periodically eclipse hangs due to gc and then crashes on me, i.e. after 4 hours of editing. And all the blogs I found talked about newgen and TLAB and other things assuming typical ephemeral usage, which is not at all the case on this particular machine instance. Again: all long-held small strings, growing steadily over time, until suddenly half are freed, reference-wise, by me.

If there are no GC settings that make that sweepable stuff happen in a non-blocking thread, and tomcat's servlets could all hang once every other day for many, many seconds on this 16 gig machine (the so-called long gc-pause that people blog about), that might motivate me to abandon this and use the memcached product.
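The cache Andy describes (a ConcurrentHashMap plus a background sweep of the least-recently-used half) could be sketched roughly like this. All names, and the logical-clock timestamp scheme, are my guesses, not his actual code:

```java
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Rough sketch of the described design: entries carry a last-access tick,
// and once a size cap is exceeded the oldest half is removed, leaving the
// dropped strings unreferenced and collectible.
public class LruSweepCache {
    private static final class Holder {
        final String value;
        volatile long lastAccess;
        Holder(String value, long tick) { this.value = value; this.lastAccess = tick; }
    }

    private final ConcurrentHashMap<String, Holder> map = new ConcurrentHashMap<>();
    private final AtomicLong clock = new AtomicLong(); // logical time, not wall time
    private final int maxEntries;

    public LruSweepCache(int maxEntries) { this.maxEntries = maxEntries; }

    public void put(String key, String value) {
        map.put(key, new Holder(value, clock.incrementAndGet()));
        if (map.size() > maxEntries) sweepOldestHalf(); // inline here; a background thread in the real design
    }

    public String get(String key) {
        Holder h = map.get(key);
        if (h == null) return null;
        h.lastAccess = clock.incrementAndGet(); // refresh recency
        return h.value;
    }

    // Remove the least-recently-used half of the entries.
    void sweepOldestHalf() {
        map.entrySet().stream()
           .sorted(Comparator.comparingLong((Map.Entry<String, Holder> e) -> e.getValue().lastAccess))
           .limit(map.size() / 2)
           .map(Map.Entry::getKey)
           .forEach(map::remove);
    }
}
```

This inlines the sweep on the writing thread for brevity; the design in the thread runs it on a separate thread, which only changes who pays the iteration cost, not what becomes garbage.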

            


            
            
                
                    
                    
On Wednesday, December 13, 2017, 12:15:38 PM PST, Kirk Pepperdine <kirk@kodewerk.com> wrote:
                    

                    

                    Hi Andy,
On Dec 13, 2017, at 8:34 PM, Andy Nuss <andrew_nuss@yahoo.com> wrote:
Thanks Kirk,
The array is just a temporary buffer held onto that has its entries cleared to null after my LRU sweep. The references that are freed to GC are in the ConcurrentHashMaps, and are all 30 char and 100 char strings, key/vals (but not precisely), so I assume that when I do my LRU sweep, it's freeing a ton of small strings,

which G1 has to reallocate into bigger chunks, and mark freed, and so,
Not sure I understand this bit. Can you explain what you mean by this?
so that I can in the future add new such strings to the LRU cache. The concern was whether this sweep of old gen strings scattered all over the huge heap would cause tomcat nio-based threads to "hang" and not respond quickly, or whether G1 would do things less pre-emptively. Are you basically saying, "no, tomcat servlet response time won't be significantly affected by G1 sweep"?

I'm not sure what your goal is here. I would say, design as needed and let the collector do its thing. That said, temporary humongous allocations are not well managed by G1; better to create them up front and cache them for future downstream use. As for a sweep… what I think you're asking about is object copy costs. These costs should, and typically do, dominate pause time. Object copy cost is proportional to the number of live objects in the collection set (CSet). Strings are dedup'ed after age 5, so with most heap configurations, duplicate Strings will be dedup'ed before they hit tenured.

Also, I was wondering: does anyone know how memcached works, and why it is used in preference to a custom design such as mine, which seems a lot simpler? I.e. it seems that with "memcached" you have to worry about "slabs" and memcached's own heap management, and waste a lot of memory.

I'm the wrong person to defend the use of memcached. It certainly does serve a purpose.. that said, using it to offload temp objects means you end up creating your own garbage collector… and as you can see by the efforts GC engineers put into each implementation, it's a non-trivial undertaking.

Kind regards,
Kirk
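The knobs discussed in this thread (lowering IHOP toward the live set, string deduplication at age 5) map onto real HotSpot flags. A sketch of an invocation; the numeric values and the jar name are illustrative guesses to be checked against your own GC logs, not a recommendation:

```shell
# Illustrative G1 flags for a 16 gig cache-heavy heap.
# -XX:InitiatingHeapOccupancyPercent lowers IHOP from its 45% default
#   toward the observed live set (value below is a placeholder);
# -XX:+UseStringDeduplication (G1 only) dedups Strings, by default once
#   they reach age 5 (-XX:StringDeduplicationAgeThreshold);
# -Xlog:gc* is the Java 9+ unified GC logging syntax.
java -Xms16g -Xmx16g \
     -XX:+UseG1GC \
     -XX:InitiatingHeapOccupancyPercent=35 \
     -XX:+UseStringDeduplication \
     -XX:StringDeduplicationAgeThreshold=5 \
     -Xlog:gc*:file=gc.log \
     -jar myapp.jar
```

As Kirk says above, whether any of these helps is only decidable from a GC log of the actual workload.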

                
            




