
List:       openjdk-hotspot-gc-use
Subject:    CMS & DefaultMaxTenuringThreshold/SurvivorRatio
From:       Y.S.Ramakrishna@Sun.COM (Y. Srinivas Ramakrishna)
Date:       2010-01-15 21:31:01
Message-ID: 4B50DE95.4060901@Sun.COM

Hi Shaun --

On 01/15/10 12:02, Shaun Hennessy wrote:
> Yes thanks I see the difference with the PrintTenuringDistribution --
> the _max_ is 4 while the _actual_ threshold varies from 1-4.
> 
> 
> I'd like to make sure I'm on the right track here....
> here are 2 runs, one with no survivor space and one with survivor spaces
> (everything else the same; loads should be pretty close, but not identical)
> 
> 1) No survivor, with a code fix; uptime 3h31 (211 min)
> MINOR 322 collections; 3m52s (232 seconds)
> (91.56 collections/hour) (0.72 seconds/collection)
> = 65.9 sec/hour on minors = 1.8% of total time
> MAJOR 18 collections; 1m5s
> (5.12 collections/hour) (3.61 seconds/collection)
> = 18.48 sec/hour = 0.51% of total time
> 
> 2) Survivor, with a code fix; uptime 3h14 (194 min)
> MINOR 346 collections; 3m40s (220 seconds)
> (107.01 collections/hour) (1.57 seconds/collection)
> = 167.99 sec/hour = 4.66% of total time
> MAJOR 8 collections; 16.7 sec
> (2.47 collections/hour) (2.08 seconds/collection)
> = 5.12 sec/hour = 0.14% of total time
> 
> So by using survivor spaces we've reduced the frequency and duration of
> major collections, which was our goal, and that is the general expected
> result when going from no-survivor to survivor, as we'll be kicking less
> up to the tenured space. Additionally we've increased the frequency of
> our minor collections (again as expected, since our Eden shrank from 4GB
> to 3GB, with 1GB for survivor), and our duration has also increased
> because now we're doing more copying around between survivor spaces --
> again everything is as expected.
> Everything I've said so far correct?

Correct.

> 
> 
> Throwing in one more scenario, I now remove the code fix.  The code fix
> improved a method so that it no longer allocated memory on every
> invocation, instead re-using a static thread-local buffer.
> This was one of the top methods allocating useless memory, memory that I
> expect would have died very quickly.
> Now we have the following; again all parameters are the same, just with
> the fix removed.  Loads should be pretty close, but not exact.
> 
> 3) Survivor, *NO Code Fix*; uptime 3h28 (208 min)
> MINOR 432 collections, 4m17s (257 seconds)
> (123.61 collections/hour) (0.59 seconds/collection)
> = 72.92 sec/hour on minors = 2.02% of total time
> 
> MAJOR 12 collections, 25.387s
> (3.46 collections/hour) (2.11 seconds/collection)
> = 7.32 sec/hour = 0.20% of total time
> 
> So comparing 2) and 3):
> Alright, so now I'm having more minor collections without the code fix
> than with it -- which would be expected, since with the fix we aren't
> allocating every time we hit the method.  BUT these minor collections
> are much quicker without the code fix than with it -- presumably because
> the fix results in a greater % of objects that are still living at each
> collection and must be copied around a few times?

Possibly, depending on the lifetime of these objects.
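
(Purely for illustration: a reuse fix of the kind you describe above
might look roughly like the sketch below. The class, method and date
format are invented here, not taken from your code.)

   import java.text.SimpleDateFormat;
   import java.util.Date;

   class TimestampFormatter {
       // Before such a fix: a new SimpleDateFormat is allocated on every
       // invocation and dies almost immediately.
       // After: each thread reuses one instance, so the formatter itself
       // is no longer allocated per call.
       private static final ThreadLocal<SimpleDateFormat> FMT =
           new ThreadLocal<SimpleDateFormat>() {
               @Override protected SimpleDateFormat initialValue() {
                   return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
               }
           };

       static String timestamp(Date d) {
           return FMT.get().format(d);
       }
   }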

> 
> It's nice that the fix has resulted in less frequent majors - I am
> guessing that's because filling up the young generation less quickly
> means more time between collections, which means more objects have a
> chance to die -- but it seems to be quite a hit on the minor collection
> time to achieve this.

One is basically trading off copying between survivor spaces
versus the cost of dealing with the garbage in the old generation
(or of premature tenuring causing other objects to stick around
as floating garbage).

> 
> I've written a DTrace script which tracks all our methods' memory
> allocation, and we were going to address some of the top hitters as we
> did with the first one.  I guess my ultimate question is: if we start
> eliminating some of the low-hanging-fruit memory allocations, most of
> which are likely to be memory that would have died quickly -- how should
> we be tuning?
> Do we need to "lower" our tenuring threshold or possibly go back to not
> using survivor space?

Usually, getting rid of short-lived object allocation is unlikely to
provide big benefits because those are easiest to collect with a
copying collector.

Reducing long- and medium-lived objects might provide greater
benefits.
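
(A hypothetical illustration of the difference; the class and names
below are invented.)

   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;

   class LifetimeSketch {
       // Medium-lived: entries stay reachable for the seconds a request
       // is in flight, so they survive several scavenges and get copied
       // between the survivor spaces (or tenured) before they die.
       private static final Map<String, byte[]> inFlight =
           new ConcurrentHashMap<String, byte[]>();

       static String accept(String id, byte[] payload) {
           // Short-lived: this buffer is garbage before the next scavenge,
           // so a copying young-gen collector reclaims it almost for free.
           StringBuilder tmp = new StringBuilder();
           tmp.append("id=").append(id).append(" bytes=").append(payload.length);

           inFlight.put(id, payload);
           return tmp.toString();
       }

       static void complete(String id) {
           inFlight.remove(id);  // only now does the payload become garbage
       }
   }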

> It almost seems like we may have to choose between using survivor space
> OR trying to stop allocating as much memory -- but trying to do both may
> be counterproductive / make minor collection times (and more importantly
> the throughput of the application) unacceptable?

Not really. Allocating fewer objects will always be superior to
allocating more, no matter their lifetimes. But when you do that,
adjusting your tenuring threshold so as not to cause useless
copying of medium-lived objects between the survivor spaces
is important, especially if there is not much pressure on the
old generation collections.
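
For concreteness, the knobs involved look something like the command
line below; the heap sizes and threshold are purely illustrative, not
a recommendation for your workload:

   java -Xms6g -Xmx6g -Xmn4g \
        -XX:+UseConcMarkSweepGC \
        -XX:SurvivorRatio=6 \
        -XX:MaxTenuringThreshold=4 \
        -XX:+PrintTenuringDistribution \
        -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
        ...

(With -Xmn4g and SurvivorRatio=6, Eden is 3GB and each of the two
survivor spaces is 512MB, i.e. the 3GB + 1GB split you described.)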

> The original goal was to eliminate long major pauses, but we can't 
> completely ignore throughput....

I do not see this as the kind of choice you describe above.
Rather it comes down to setting a suitable tenuring threshold.
It is true though that if you have eliminated almost all medium-lived
objects, then setting MaxTenuringThreshold=1 will give the best
performance. (In most cases I have seen, completely doing away
with survivor spaces and using MaxTenuringThreshold=0 does not
seem to work as well.)

+PrintTenuringDistribution should let you find the "knee" of the
curve, which will tell you what your optimal MaxTenuringThreshold
would be (i.e. the age beyond which copying between the survivor
spaces yields no benefit).
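
For example (the numbers below are made up), if the distribution looks
roughly like

   Desired survivor size 268435456 bytes, new threshold 4 (max 4)
   - age   1:  123456789 bytes,  123456789 total
   - age   2:   20345678 bytes,  143802467 total
   - age   3:   19876543 bytes,  163679010 total
   - age   4:   19555555 bytes,  183234565 total

then the sharp drop from age 1 to age 2, followed by the nearly flat
tail, suggests that copying objects beyond age 2 or so buys little:
most of what survives to age 2 is going to be tenured anyway.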

cheers.
-- ramki

> 
> 
> thanks,
> Shaun
> 
> Y. Srinivas Ramakrishna wrote:
>> Hi Shaun --
>>
>> Jon Masamitsu wrote:
>>>
>>> Only the throughput collector has GC ergonomics implemented.  That's
>>> the feature that would vary eden vs survivor sizes and the tenuring
>>> threshold.
>>
>> Just to clarify, that should read "_max_ tenuring threshold" above.
>> The tenuring threshold itself is indeed adaptively varied from
>> one scavenge to the next (based on survivor space size and object
>> survival demographics, using Ungar's adaptive tenuring algorithm)
>> by CMS and the serial collector. A different scheme determines the
>> tenuring threshold used per scavenge by Parallel GC.
>> Ask again if you want to know the difference between per-scavenge
>> tenuring threshold (which is adaptively varied) and
>> _max_ tenuring threshold (which is spec'd on the command-line).
>>
>> But yes the rest of the things, heap size, shape, and _max_ tenuring 
>> threshold
>> would need to be manually tuned for optimal performance of CMS. Read 
>> the GC
>> tuning guide for how you might tune the survivor space size and
>> max tenuring threshold for your application using 
>> PrintTenuringDistribution data.
>>
>>>
>>>> Also still curious if -XX:ParallelGCThreads should be set to 16
>>>> (#cpus)-- if my desire is to minimize time spent
>>>> in STW GC time?
>>>
>>>
>>> Yes, 16 on average will minimize the STW GC pauses, but occasionally
>>> (pretty rare, actually) there can be some interaction between the GC
>>> using
>>> all the hardware threads and the OS needing one.
>>
>> I have occasionally found that unless you have really large Eden
>> sizes, fewer GC threads than CPUs often give you the best results. But
>> yes, you are in the right ballpark in growing the number of GC threads
>> as your CPU count, cache size per CPU and heap size increase. With a
>> 4GB young gen as you have, I'd try 8 through 16 GC threads to see what
>> works best.
>>
>> -- ramki
> 

