List:       openjdk-hotspot-gc-use
Subject:    Perplexing GC Time Growth
From:       jason@mugfu.com (Jason Vasquez)
Date:       2007-12-31 18:21:08
Message-ID: C1645320-A2CC-4E58-9DF9-93E657976E68@mugfu.com

Hi all,

I'm having a perplexing problem -- the garbage collector appears to be
functioning well, with a healthy object/garbage lifecycle, yet minor GC
times inexplicably increase over the life of the process.  This
application drives telephony hardware, so keeping GC pauses very low is
quite important.  (Staying well below 100 ms would be ideal.)

Here is the current configuration we are using:

-server \
-Xloggc:garbage.log \
-XX:+PrintGCDetails \
-Dsun.rmi.dgc.server.gcInterval=3600000 \
-Dsun.rmi.dgc.client.gcInterval=3600000 \
-XX:ParallelGCThreads=8 \
-XX:+UseParNewGC \
-XX:+UseConcMarkSweepGC \
-XX:+PrintGCTimeStamps \
-XX:-TraceClassUnloading \
-XX:+AggressiveOpts \
-Xmx512M \
-Xms512M \
-Xmn128M \
-XX:MaxTenuringThreshold=6 \
-XX:+ExplicitGCInvokesConcurrent
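
If additional data would help, I can rerun with extra GC diagnostics
enabled.  As far as I know these are standard HotSpot flags on 1.6; they
are listed here only as a sketch of what I could add on top of the
existing logging to get per-age survivor occupancy, total stopped time,
and the heap layout around each collection:

-XX:+PrintTenuringDistribution \
-XX:+PrintGCApplicationStoppedTime \
-XX:+PrintHeapAtGC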

Many of our larger objects live for approximately 4-5 minutes, hence the
relatively large young generation and tenuring threshold.  This seems to
be successful in filtering out most objects before they reach the
tenured generation.  (The host is an 8-core x86 server running
1.6.0_03-b05 on 32-bit Linux, kernel 2.6.18.)
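
As a rough sanity check on that sizing (a back-of-the-envelope sketch,
assuming eden fills roughly every 70 seconds as the jstat output further
down shows, and taking -XX:MaxTenuringThreshold=6 from the flags above;
the class name is just a throwaway):

// Estimates how long an object must stay reachable before it is promoted
// to the tenured generation, using the numbers from this post.
public class TenuringEstimate {
    public static void main(String[] args) {
        int minorGcIntervalSec = 70;  // approximate eden fill / minor GC interval
        int tenuringThreshold  = 6;   // minor GCs an object must survive (-XX:MaxTenuringThreshold)
        int secondsToTenure    = minorGcIntervalSec * tenuringThreshold;
        System.out.printf("Promotion after roughly %d s (~%d min)%n",
                secondsToTenure, secondsToTenure / 60);
        // ~420 s (~7 min), which is longer than the 4-5 minute object
        // lifetime, so most of these objects should die in the survivor
        // spaces rather than being promoted.
    }
}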

Here is a representative snippet of our garbage log:

487.135: [GC 487.135: [ParNew: 112726K->7290K(118016K), 0.0218110 secs] 134494K->29058K(511232K), 0.0220520 secs]
557.294: [GC 557.294: [ParNew: 112250K->7976K(118016K), 0.0204220 secs] 134018K->29744K(511232K), 0.0206690 secs]
607.025: [GC 607.025: [ParNew: 112936K->7831K(118016K), 0.0231230 secs] 134704K->30003K(511232K), 0.0233670 secs]
672.522: [GC 672.522: [ParNew: 112791K->7361K(118016K), 0.0253620 secs] 134963K->29533K(511232K), 0.0256080 secs]
...
4006.635: [GC 4006.635: [ParNew: 112983K->7386K(118016K), 0.0385960 secs] 141969K->36608K(511232K), 0.0388460 secs]
4083.066: [GC 4083.066: [ParNew: 112346K->8439K(118016K), 0.0365940 secs] 141568K->37661K(511232K), 0.0368340 secs]
4158.457: [GC 4158.457: [ParNew: 113399K->7152K(118016K), 0.0360540 secs] 142621K->36374K(511232K), 0.0362990 secs]
4228.312: [GC 4228.313: [ParNew: 112112K->8738K(118016K), 0.0362510 secs] 141334K->38083K(511232K), 0.0365050 secs]
4293.800: [GC 4293.800: [ParNew: 113698K->8294K(118016K), 0.0368700 secs] 143043K->37917K(511232K), 0.0371160 secs]
...
10489.555: [GC 10489.556: [ParNew: 112701K->7770K(118016K), 0.0639770 secs] 151966K->47156K(511232K), 0.0642210 secs]
10562.544: [GC 10562.544: [ParNew: 112730K->9267K(118016K), 0.0625900 secs] 152116K->48772K(511232K), 0.0628470 secs]
10622.558: [GC 10622.558: [ParNew: 114227K->8361K(118016K), 0.0675730 secs] 153732K->48381K(511232K), 0.0678220 secs]
10678.842: [GC 10678.842: [ParNew: 113056K->7214K(118016K), 0.0669330 secs] 153076K->47234K(511232K), 0.0671800 secs]
...
177939.062: [GC 177939.062: [ParNew: 112608K->8620K(118016K), 0.7681440 secs] 466132K->362144K(511232K), 0.7684030 secs]
178005.483: [GC 178005.483: [ParNew: 113449K->7731K(118016K), 0.7677300 secs] 466973K->361893K(511232K), 0.7679890 secs]
178069.658: [GC 178069.658: [ParNew: 112670K->6814K(118016K), 0.7700020 secs] 466832K->360976K(511232K), 0.7702590 secs]
178133.513: [GC 178133.513: [ParNew: 111717K->7920K(118016K), 0.7702920 secs] 465879K->362082K(511232K), 0.7705560 secs]

As you can see, the GC times continue to increase over time, on the
order of 10-20 ms per hour.  CMS runs are spaced very far apart; in
fact, since most objects die before reaching the tenured generation,
CMS is triggered more by the RMI DGC runs than by heap growth.  (We
were seeing serial full GCs, apparently caused by RMI DGC, before
adding -XX:+ExplicitGCInvokesConcurrent.)
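
In case it is useful, here is how I am pulling the pause times out of
the log to quantify that growth (a quick throwaway sketch, nothing more;
it assumes the log format shown above):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Prints "uptime-hours  parnew-pause-ms" for every minor collection found
// in the -Xloggc / -XX:+PrintGCDetails output (garbage.log), which makes
// the pause growth easy to plot or eyeball.
public class ParNewPauses {
    // Matches e.g.
    // 487.135: [GC 487.135: [ParNew: 112726K->7290K(118016K), 0.0218110 secs] ...
    private static final Pattern LINE = Pattern.compile(
            "^(\\d+\\.\\d+): \\[GC .*?\\[ParNew: .*?, (\\d+\\.\\d+) secs\\]");

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = in.readLine()) != null) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                double uptimeHours = Double.parseDouble(m.group(1)) / 3600.0;
                double pauseMillis = Double.parseDouble(m.group(2)) * 1000.0;
                System.out.printf("%8.2f h  %7.1f ms%n", uptimeHours, pauseMillis);
            }
        }
        in.close();
    }
}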

Here's some representative output from `jstat -gcutil -t -h10 <pid> 2s`:

Timestamp     S0     S1      E      O      P   YGC    YGCT   FGC   FGCT     GCT
  11067.6  55.74   0.00  89.32   9.73  59.84   168   7.471     8  0.280   7.751
  11069.6  55.74   0.00  93.65   9.73  59.84   168   7.471     8  0.280   7.751
  11071.6  55.74   0.00  99.34   9.73  59.84   168   7.471     8  0.280   7.751
  11073.5   0.00  62.22   2.89   9.76  59.84   169   7.537     8  0.280   7.816
  11075.6   0.00  62.22   4.80   9.76  59.84   169   7.537     8  0.280   7.816
Survivor spaces continue to sit at about 50-65% occupancy, which seems  
fairly good to my eye.  Eden fills approximately every 70 seconds,  
triggering minor GCs.
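
For what it's worth, those numbers appear internally consistent with the
young generation sizing (a rough sketch, assuming the default
-XX:SurvivorRatio=8, which we have not overridden):

// Rough consistency check on the survivor-space numbers above.
public class SurvivorSizing {
    public static void main(String[] args) {
        long youngGenKb = 128L * 1024;           // -Xmn128M
        long survivorKb = youngGenKb / (8 + 2);  // eden:s0:s1 = 8:1:1 -> ~13107K each
        long edenKb     = youngGenKb - 2 * survivorKb;
        System.out.printf("eden ~%dK, each survivor ~%dK, ParNew capacity ~%dK%n",
                edenKb, survivorKb, edenKb + survivorKb);
        // ParNew capacity (eden + one survivor) comes out to ~117965K, close to
        // the 118016K reported in the GC log, and 50-65% of a ~13 MB survivor
        // space is the ~7-9 MB that survives each minor collection.
    }
}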


Any ideas?  This is becoming quite frustrating for us -- our
application uptime is suffering badly from the overly frequent
scheduled restarts we have been forced to adopt as a workaround.

Thanks for any assistance you might be able to offer,
-Jason Vasquez

