[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-serviceability-dev
Subject:    Review request: 7072527 CMS: JMM GC counters overcount in some cases
From:       kevin.walls () oracle ! com (Kevin Walls)
Date:       2011-07-29 23:13:17
Message-ID: 4E333E8D.7070005 () oracle ! com
[Download RAW message or body]


Actually I missed the other comment that we could go further and remove 
the unneeded constructor.

http://cr.openjdk.java.net/~kevinw/7072527/webrev.01/

Thanks
Kevin

On 29/07/11 17:35, Kevin Walls wrote:
> Hi --
>
> I got this ready in a webrev:
>
> http://cr.openjdk.java.net/~kevinw/7072527/webrev.00/
>
> The new test is passing, as are the old ones. 8-)
>
> Also, this change means that a CMS cycle ending in concurrent mode 
> failure now counts as one collection.  One could argue that either way 
> (should it be 2 collections?) - but I'm thinking if we are counting 
> "completed" collections then we are now counting correctly!
>
> Thanks
> Kevin
>
>
> On 29/07/11 02:08, Y. S. Ramakrishna wrote:
>> I filed: 7072527 CMS: JMM GC counters overcount in some cases
>>
>> On 07/28/11 17:29, Y. S. Ramakrishna wrote:
>>> Hi Kevin --
>>>
>>> thanks for jumping on this! More inline below ...
>>>
>>> On 07/28/11 09:33, Krystal Mok wrote:
>>>> Hi Kevin,
>>>>
>>>> Thank you for taking care of this, and it's good to see the problem 
>>>> is verified.
>>>>
>>>> I think whether or not the suggested fix is sufficient depends on 
>>>> what paths can reach CMSCollector::do_compaction_work(). If all 
>>>> paths that can reach CMSCollector::do_compaction_work() come from 
>>>> GenCollectedHeap::do_collection(), then the fix should be good to 
>>>> go. Otherwise it'll need a better workaround.
>>>>
>>>> I believe all concurrent mode failures/interrupts (which includes 
>>>> the System.gc() case) does come from 
>>>> GenCollectedHeap::do_collection(), but I'm not exactly sure about 
>>>> this, could anybody please clarify on it?
>>>
>>> Yes, i believe this is indeed the case, and my browsing of the
>>> code using cscope seemed to confirm that belief.
>>>
>>> More below ...
>>>
>>>>
>>>> Regards,
>>>> Kris Mok
>>>>
>>>> On Thu, Jul 28, 2011 at 10:12 PM, Kevin Walls 
>>>> <kevin.walls at oracle.com <mailto:kevin.walls at oracle.com>> wrote:
>>>>
>>>>     __
>>>>     Hi --
>>>>
>>>>     6580448 was marked as a duplicate of 6581734, which fixed the fact
>>>>     that CMS collections were just not counted at all - with CMS, 
>>>> only a
>>>>     stop the world full gc would be counted in the stats.
>>>>
>>>>     But looks like you're right...  Here is a quick variation of the
>>>>     testcase from 6581734 which shows the same thing, and this 
>>>> verifies
>>>>     the same, and is solved by ExplicitGCInvokesConcurrent.  If 
>>>> there is
>>>>     no other feedback I can test if the removal of the
>>>>     TraceCMSMemoryManagerStats() call in
>>>>     CMSCollector::do_compaction_work is all we need...
>>>
>>>
>>> Kevin, yes, it would be great if you could verify this and push the 
>>> fix.
>>> I am not sure if the push would need to wait for the signing of OCA
>>> from Kris, but best to check with Those Who Would Know Such Things.
>>>
>>> Since the original CR has been closed, i'll open one momentarily and
>>> can make you RE (if that's OK with you). I'll be happy to serve as
>>> reviewer of the change.
>>>
>>> As regards the jstat counter reporting two pauses per concurrent
>>> CMS cycle, I am of two minds on what the original intention
>>> was. I'd have originally regarded the double increment as a
>>> bug, but as you state it is really two pauses, even if part of
>>> a single cycle. And it makes sense to count them as two. I
>>> agree that this should be documented and left alone, given
>>> how long we have had this behaviour, and the alternative
>>> (of counting cycles, rather than pauses) may be no better
>>> (or arguably worse). There's actually an open CR for this
>>> which we can redirect into a CR to update the relevant documentation.
>>>
>>> -- ramki
>>>
>>>>
>>>>     Regards
>>>>     Kevin
>>>>
>>>>
>>>>     /*
>>>>      * Copyright (c) 2011, Oracle and/or its affiliates. All rights
>>>>     reserved.
>>>>      * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>>>>      *
>>>>      * This code is free software; you can redistribute it and/or 
>>>> modify it
>>>>      * under the terms of the GNU General Public License version 2 
>>>> only, as
>>>>      * published by the Free Software Foundation.
>>>>      *
>>>>      * This code is distributed in the hope that it will be useful, 
>>>> but
>>>>     WITHOUT
>>>>      * ANY WARRANTY; without even the implied warranty of 
>>>> MERCHANTABILITY or
>>>>      * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General 
>>>> Public License
>>>>      * version 2 for more details (a copy is included in the LICENSE
>>>>     file that
>>>>      * accompanied this code).
>>>>      *
>>>>      * You should have received a copy of the GNU General Public 
>>>> License
>>>>     version
>>>>      * 2 along with this work; if not, write to the Free Software
>>>>     Foundation,
>>>>      * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
>>>>      *
>>>>      * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA
>>>>     94065 USA
>>>>      * or visit www.oracle.com <http://www.oracle.com> if you need
>>>>     additional information or have any
>>>>      * questions.
>>>>      */
>>>>
>>>>     /*
>>>>      * @test TestFullGCount.java
>>>>      * @bug
>>>>      * @summary
>>>>      * @run main/othervm -XX:+UseConcMarkSweepGC TestFullGCCount
>>>>      *
>>>>      */
>>>>     import java.util.*;
>>>>     import java.lang.management.*;
>>>>
>>>>
>>>>
>>>>     public class TestFullGCCount {
>>>>
>>>>     private String poolName = "CMS";
>>>>     private String collectorName = "ConcurrentMarkSweep";
>>>>
>>>>     public static void main(String [] args) {
>>>>
>>>>         TestFullGCCount t = null;
>>>>         if (args.length==2) {
>>>>             t = new TestFullGCCount(args[0], args[1]);
>>>>         } else {
>>>>             System.out.println("Defaulting to monitor CMS pool and
>>>>     collector.");
>>>>             t = new TestFullGCCount();
>>>>         }
>>>>         t.run();
>>>>
>>>>     }
>>>>
>>>>     public TestFullGCCount(String pool, String collector) {
>>>>         poolName = pool;
>>>>         collectorName = collector;
>>>>     }
>>>>     public TestFullGCCount() {
>>>>     }
>>>>
>>>>     public void run() {
>>>>
>>>>         int count = 0;
>>>>         int iterations = 20;
>>>>         long counts[] = new long[iterations];
>>>>         boolean diffAlways2 = true; // assume we will fail
>>>>
>>>>         for (int i=0; i<iterations; i++) {
>>>>             System.gc();
>>>>             counts[i] = checkStats();
>>>>             if (i>0) {
>>>>                 if (counts[i] - counts[i-1] != 2) {
>>>>                     diffAlways2 = false;
>>>>                 }
>>>>             }
>>>>         }
>>>>         if (diffAlways2) {
>>>>             throw new RuntimeException("FAILED: difference in count is
>>>>     always 2.");
>>>>         }
>>>>         System.out.println("Passed.");
>>>>     }
>>>>
>>>>     private long checkStats() {
>>>>         long count = 0;
>>>>         List<MemoryPoolMXBean> pools =
>>>>     ManagementFactory.getMemoryPoolMXBeans();
>>>>         List<GarbageCollectorMXBean> collectors =
>>>>     ManagementFactory.getGarbageCollectorMXBeans();
>>>>         for (int i=0; i<collectors.size(); i++) {
>>>>             GarbageCollectorMXBean collector = collectors.get(i);
>>>>             String name = collector.getName();
>>>>             if (name.contains(collectorName)) {
>>>>                 System.out.println(name + ": collection count = "
>>>>                                    + collector.getCollectionCount());
>>>>                 count = collector.getCollectionCount();
>>>>             }
>>>>         }
>>>>         return count;
>>>>
>>>>     }
>>>>
>>>>     }
>>>>
>>>>
>>>>     On 27/07/11 17:12, Krystal Mok wrote:
>>>>>     Hi all,
>>>>>
>>>>>     I've been looking at a strange inconsistency of full GC count
>>>>>     recorded by jvmstat and JMM counters. I'd like to know which ones
>>>>>     of the following behaviors are by design, which ones are bugs, 
>>>>> and
>>>>>     which ones are just my misunderstanding. I apologize for making a
>>>>>     short story long...
>>>>>
>>>>>     =====================================================
>>>>>
>>>>>     The counters involved:
>>>>>
>>>>>     * A jvmstat counter named "sun.gc.collector.1.invocations" keeps
>>>>>     track of the number of pauses occured as a result of a major
>>>>>     collection. It is used by utilities such as jstat as the 
>>>>> source of
>>>>>     "FGC" (full collection count), and the old gen collection 
>>>>> count in
>>>>>     Visual GC. It's updated by an TraceCollectorStats object.
>>>>>     * A JMM counter, GCMemoryManager::_num_collections, keeps 
>>>>> track of
>>>>>     the number of collections that have ended. This counter is 
>>>>> used as
>>>>>     HotSpot's implementation of the JMX
>>>>>     GarbageCollectorMXBean.getCollectionCount(). It's updated by
>>>>>     either a TraceMemoryManagerStats object or a
>>>>>     TraceCMSMemoryManagerStats object.
>>>>>
>>>>>     To show the situation, I've made a screenshot of a VisualVM and a
>>>>>     JConsole running side by side, both are monitoring the VisualVM's
>>>>>     GC stats:
>>>>>     
>>>>> http://dl.iteye.com/upload/attachment/524811/913cb0e1-7add-3ac0-a718-24ca705cad22.png 
>>>>>
>>>>>     (I'll upload the screenshot to somewhere else if anybody can't 
>>>>> see it)
>>>>>     The VisualVM instance is running on JDK6u26, with ParNew+CMS.
>>>>>     In the screenshot, Visual GC reports that the old gen collection
>>>>>     count is 20, while JConsole reports 10.
>>>>>
>>>>>     I see that there was this bug:
>>>>>     6580448: CMS: Full GC collection count mismatch between
>>>>>     GarbageCollectorMXBean and jvmstat (VisualGC)
>>>>>     I don't think the current implementation has a bug in the sense
>>>>>     that the two counters don't report the same number.
>>>>>
>>>>>     This behavior seems reasonable, but the naming of the value in
>>>>>     these tools are confusing: both tools say "collections", but
>>>>>     apparently the number in Visual GC means "number of pauses" where
>>>>>     as the number in JConsole means "number of collection cycles".
>>>>>     It'd be great if the difference could be documented somewhere, if
>>>>>     that's the intended behavior.
>>>>>
>>>>>     And then the buggy behavior. Code demo posted on gist:
>>>>>     https://gist.github.com/1106263
>>>>>     Starting from JDK6u23, when using CMS without
>>>>>     ExplicitGCInvokesConcurrent, System.gc() (or
>>>>>     Runtime.getRuntime().gc(), or MemoryMXBean.gc() via JMX) would
>>>>>     make the JMM GC counter increment by 2 per invocation, while the
>>>>>     jvmstat counter is only incremented by 1. I believe the latter is
>>>>>     correct and the former needs some fixing.
>>>>>
>>>>>     =====================================================
>>>>>
>>>>>     My understanding of the behavior shown above:
>>>>>
>>>>>     1. The concurrent GC part:
>>>>>
>>>>>     There are 2 pauses in a CMS concurrent GC cycle, one in the
>>>>>     initial mark phase, and one in the final remark phase.
>>>>>     To trigger a concurrent GC cycle, the CMS thread wakes up
>>>>>     periodically to see if it shouldConcurrentCollect(), and 
>>>>> trigger a
>>>>>     cycle when the predicate returned true, or goes back to sleep if
>>>>>     the predicate returned false. The whole concurrent GC cycle
>>>>>     doesn't go through GenCollectedHeap::do_collection().
>>>>>
>>>>>     The jvmstat counter for old gen pauses is updated in
>>>>>     CMSCollector::do_CMS_operation(CMS_op_type op), which exactly
>>>>>     covers both pause phases.
>>>>>
>>>>>     The JMM counter, however, is updated in the concurrent sweep
>>>>>     phase, CMSCollector::sweep(bool asynch), if there was no
>>>>>     concurrent mode failure; or it is updated in
>>>>>     CMSCollector::do_compaction_work(bool clear_all_soft_refs) in 
>>>>> case
>>>>>     of a bailout due to concurrent mode failure (advertised as so in
>>>>>     the code comments). So that's an increment by 1 per concurrent GC
>>>>>     cycle, which does reflect the "number of collection cycles", fair
>>>>>     enough.
>>>>>
>>>>>     So far so good.
>>>>>
>>>>>     2. The System.gc() part:
>>>>>
>>>>>     Without ExplicitGCInvokesConcurrent set, System.gc() does a
>>>>>     stop-the-world full GC, which consists of only one pause, so
>>>>>     "number of pauses" would equal "number of collections" in this 
>>>>> case.
>>>>>     It will go into GenCollectedHeap::do_collection(); both the
>>>>>     jvmstat and the JMM GC counter gets incremented by 1 here,
>>>>>
>>>>>     TraceCollectorStats tcs(_gens[i]->counters());
>>>>>     TraceMemoryManagerStats tmms(_gens[i]->kind());
>>>>>
>>>>>     But, drilling down into:
>>>>>     _gens[i]->collect(full, do_clear_all_soft_refs, size, is_tlab);
>>>>>
>>>>>     That'll eventually go into:
>>>>>     CMSCollector::acquire_control_and_collect(bool full, bool
>>>>>     clear_all_soft_refs)
>>>>>
>>>>>     System.gc() is user requested so that'll go further into
>>>>>     mark-sweep-compact:
>>>>>     CMSCollector::do_compaction_work(bool clear_all_soft_refs)
>>>>>     And here, it increments the JMM GC counter again (remember it was
>>>>>     in the concurrent GC path too, to handle bailouts), even though
>>>>>     this is still in the same collection. This leads to the "buggy
>>>>>     behavior" mentioned earlier.
>>>>>
>>>>>     The JMM GC counter wasn't added to CMS until this fix got in:
>>>>>     6581734: CMS Old Gen's collection usage is zero after GC which is
>>>>>     incorrect
>>>>>
>>>>>     The code added to CMSCollector::do_compaction_work() works 
>>>>> fine in
>>>>>     the concurrent GC path, but interacts badly with the original
>>>>>     logic in GenCollectedHeap::do_collection().
>>>>>
>>>>>     =====================================================
>>>>>
>>>>>     I thought all concurrent mode failures/interrupts come from
>>>>>     GenCollectedHeap::do_collection(). If that's the case, then it
>>>>>     seems unnecessary to update the JMM GC counter in
>>>>>     CMSCollector::do_compaction_work(), simply removing it should fix
>>>>>     the problem.
>>>>>
>>>>>     With that, I'd purpose the following (XS) change: (diff 
>>>>> against HS20)
>>>>>
>>>>>     diff -r f0f676c5a2c6
>>>>>     
>>>>> src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp 
>>>>>
>>>>>     ---
>>>>>     
>>>>> a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp 
>>>>>
>>>>>          Tue Mar 15 19:30:16 2011 -0700
>>>>>     +++
>>>>>     
>>>>> b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp 
>>>>>
>>>>>          Thu Jul 28 00:02:41 2011 +0800
>>>>>     @@ -2022,9 +2022,6 @@
>>>>>                                                     
>>>>> _intra_sweep_estimate.padded_average());
>>>>>        }
>>>>>
>>>>>     -  {
>>>>>     -    TraceCMSMemoryManagerStats();
>>>>>     -  }
>>>>>        GenMarkSweep::invoke_at_safepoint(_cmsGen->level(),
>>>>>          ref_processor(), clear_all_soft_refs);
>>>>>        #ifdef ASSERT
>>>>>
>>>>>     The same goes for the changes in:
>>>>>     7036199: Adding a notification to the implementation of
>>>>>     GarbageCollectorMXBeans
>>>>>
>>>>>     =====================================================
>>>>>
>>>>>     P.S. Is there an "official" name for the counters that I referred
>>>>>     to as "jvmstat counters" above? Is it just jvmstat, or 
>>>>> PerfData or
>>>>>     HSPERFDATA?
>>>>
>>>>
>


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic