List:       openjdk-serviceability-dev
Subject:    Re: Low-Overhead Heap Profiling
From:       JC Beyler <jcbeyler@google.com>
Date:       2017-06-30 5:01:30
Message-ID: CAF9BGBzbf6e809jNQKh1C-zco9VR-Z003Qu6dKcr7yM4a_Zjbg@mail.gmail.com

Thanks Robbin,

This seems to have worked. When I have the next webrev ready, we will find
out but I'm fairly confident it will work!

Thanks again!
Jc

On Wed, Jun 28, 2017 at 11:46 PM, Robbin Ehn <robbin.ehn@oracle.com> wrote:

> Hi JC,
>
> On 06/29/2017 12:15 AM, JC Beyler wrote:
>
>> B) Incremental changes
>>
>
> I guess the most common workflow here is using mq:
> hg qnew fix_v1
> edit files
> hg qrefresh
> hg qnew fix_v2
> edit files
> hg qrefresh
>
> if you do hg log you will see 2 commits
>
> webrev.ksh -r -2 -o my_inc_v1_v2
> webrev.ksh -o my_full_v2
>
>
> In  your .hgrc you might need:
> [extensions]
> mq =
>
> /Robbin
>
>
>> Again another newbie question here...
>>
>> For showing the incremental changes, is there a link that explains how to
>> do that? I apologize for my newbie questions all the time :)
>>
>> Right now, I do:
>>
>>   ksh ../webrev.ksh -m -N
>>
>> That generates a webrev.zip, which I send to Chuck Rasbold. He then uploads
>> it to a new webrev.
>>
>> I tried committing my change and adding a small change. Then if I just do
>> ksh ../webrev.ksh without any options, it seems to produce a similar page
>> but now with only the changes I had (so the 06-07 comparison you were
>> talking about) and a changeset that has it all. I imagine that is what you
>> meant.
>>
>> Which means that my workflow would become:
>>
>> 1) Make changes
>> 2) Make a webrev without any options to show just the differences with
>> the tip
>> 3) Amend my changes to my local commit so that I have it done with
>> 4) Go to 1
>>
>> Does that seem correct to you?
>>
>> Note that when I do this, I only see the full change of a file in the
>> full change set (Side note here: now the page says change set and not
>> patch, which is maybe why Serguei was having issues?).
>>
>> Thanks!
>> Jc
>>
>>
>>
>> On Wed, Jun 28, 2017 at 1:12 AM, Robbin Ehn <robbin.ehn@oracle.com
>> <mailto:robbin.ehn@oracle.com>> wrote:
>>
>>     Hi,
>>
>>     On 06/28/2017 12:04 AM, JC Beyler wrote:
>>
>>         Dear Thomas et al,
>>
>>         Here is the newest webrev:
>>         http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/
>>
>>
>>
>>     You have some more bits to do in there but generally this looks good and
>> really nice with more tests.
>>     I'll do a deep dive and re-test this when I get back from my long
>> vacation with whatever patch version you have then.
>>
>>     Also I think it's time you provide incremental (v06->07 changes) as
>> well as complete change-sets.
>>
>>     Thanks, Robbin
>>
>>
>>
>>
>>         Thomas, I "think" I have answered all your remarks. The summary
>> is:
>>
>>         - The statistic system is up and provides insight on what the
>> heap sampler is doing
>>              - I've noticed that, though the sampling rate is at the
>> right mean, we are missing some samples; I have not yet tracked down why
>> (details below)
>>
>>         - I've run a tiny benchmark that is the worst case: it is a very
>> tight loop that allocates a small array
>>              - In this case, I see no overhead when the system is off so
>> that is a good start :)
>>              - I see right now a high overhead in this case when sampling
>> is on. This is not really too surprising but I'm going to see if this is
>> consistent with our
>>         internal implementation. The benchmark is really allocation
>> stressful so I'm not too surprised but I want to do the due diligence.
>>
>>            - The statistic system is up and I have a new test:
>>         http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch
>>               - I did a bit of a study about the random generator here,
>> more details are below but basically it seems to work well
>>
>>            - I added a capability but since this is the first time doing
>> this, I was not sure I did it right
>>              - I did add a test though for it and the test seems to do
>> what I expect (all methods are failing with the
>> JVMTI_ERROR_MUST_POSSESS_CAPABILITY error).
>>                  - http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapabilityTest.java.patch
>>
>>            - I still need to figure out what to do about the multi-agent
>> vs single-agent issue
>>
>>            - As far as measurements, it seems I still need to look at:
>>              - Why we do the 20 random calls first, are they necessary?
>>              - Look at the mean of the sampling rate that the random
>> generator does and also what is actually sampled
>>              - What is the overhead in terms of memory/performance when
>> on?
>>
>>         I have inlined my answers, I think I got them all in the new
>> webrev, let me know your thoughts.
>>
>>         Thanks again!
>>         Jc
>>
>>
>>         On Fri, Jun 23, 2017 at 3:52 AM, Thomas Schatzl
>> <thomas.schatzl@oracle.com> wrote:
>>
>>              Hi,
>>
>>              On Wed, 2017-06-21 at 13:45 -0700, JC Beyler wrote:
>>              > Hi all,
>>              >
>>              > First off: Thanks again to Robbin and Thomas for their
>> reviews :)
>>              >
>>              > Next, I've uploaded a new webrev:
>>              > http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/
>>
>>              >
>>              > Here is an update:
>>              >
>>              > - @Robbin, I forgot to say that yes I need to look at
>> implementing
>>              > this for the other architectures and testing it before it
>> is all
>>              > ready to go. Is it common to have it working on all
>> possible
>>              > combinations or is there a subset that I should be doing
>> first and we
>>              > can do the others later?
>>              > - I've tested slowdebug, built and ran the JTreg tests I
>> wrote with
>>              > slowdebug and fixed a few more issues
>>              > - I've refactored a bit of the code following Thomas'
>> comments
>>              >    - I think I've handled all the comments from Thomas (I
>> put
>>              > comments inline below for the specifics)
>>
>>              Thanks for handling all those.
>>
>>              > - Following Thomas' comments on statistics, I want to add
>> some
>>              > quality assurance tests and find that the easiest way
>> would be to
>>              > have a few counters of what is happening in the sampler
>> and expose
>>              > that to the user.
>>              >    - I'll be adding that in the next version if no one
>> sees any
>>              > objections to that.
>>              >    - This will allow me to add a sanity test in JTreg
>> about number of
>>              > samples and average of sampling rate
>>              >
>>              > @Thomas: I had a few questions that I inlined below but I
>> will
>>              > summarize the "bigger ones" here:
>>              >    - You mentioned constants are not using the right
>> conventions, I
>>              > looked around and didn't see any convention except normal
>> naming then
>>              > for static constants. Is that right?
>>
>>              I looked through
>> https://wiki.openjdk.java.net/display/HotSpot/StyleGuide
>>              and the rule is to "follow an existing pattern and must
>> have a
>>              distinct appearance from other names". Which does not help a
>> lot I
>>              guess :/ The GC team started using upper camel case, e.g.
>>              SomeOtherConstant, but very likely this is probably not
>> applied
>>              consistently throughout. So I am fine with not adding
>> another style
>>              (like kMaxStackDepth with the "k" in front with some unknown
>> meaning).
>>
>>              (Chances are you will find that style somewhere used anyway
>> too,
>>              apologies if so :/)
>>
>>
>>         Thanks for that link, now I know where to look. I used the upper
>> camel case in my code as well then :) I should have gotten them all.
>>
>>
>>               > PS: I've also inlined my answers to Thomas below:
>>               >
>>               > On Tue, Jun 13, 2017 at 8:03 AM, Thomas Schatzl
>>               > <thomas.schatzl@oracle.com> wrote:
>>               > > Hi all,
>>               > >
>>               > > On Mon, 2017-06-12 at 11:11 -0700, JC Beyler wrote:
>>               > > > Dear all,
>>               > > >
>>               > > > I've continued working on this and have done the
>> following
>>               > > webrev:
>>               > > > http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/
>>
>>               > >
>>               > > [...]
>>               > > > Things I still need to do:
>>               > > >    - Have to fix that TLAB case for the FastTLABRefill
>>               > > >    - Have to start looking at the data to see that it
>> is
>>               > > consistent and does gather the right samples, right
>> frequency, etc.
>>               > > >    - Have to check the GC elements and what that
>> produces
>>               > > >    - Run a slowdebug run and ensure I fixed all those
>> issues you
>>               > > saw > Robbin
>>               > > >
>>               > > > Thanks for looking at the webrev and have a great
>> week!
>>               > >
>>               > >   scratching a bit on the surface of this change, so
>> apologies for
>>               > > rather shallow comments:
>>               > >
>>               > > - macroAssembler_x86.cpp:5604: while this is compiler
>> code, and I
>>               > > am not sure this is final, please avoid littering the
>> code with
>>               > > TODO remarks :) They tend to be candidates for later
>> wtf moments
>>               > > only.
>>               > >
>>               > > Just file a CR for that.
>>               > >
>>               > Newcomer question: what is a CR? And I am not sure I have the
>> rights to do
>>               > that yet ? :)
>>
>>              Apologies. CR is a change request, this suggests to file a
>> bug in the
>>              bug tracker. And you are right, you can't just create a new
>> account in
>>              the OpenJDK JIRA yourselves. :(
>>
>>
>>         Ok good to know, I'll continue with my own todo list but I'll
>> work hard on not letting it slip in the webrevs anymore :)
>>
>>
>>              I was mostly referring to the "... but it is a TODO" part of
>> that
>>              comment in macroassembler_x86.cpp. Comments about the why of
>> the code
>>              are appreciated.
>>
>>              [Note that I now understand that this is to some degree
>> still work in
>>              progress. As long as the final changeset does not contain
>> TODOs I am
>>              fine (and it's not a hard objection, rather their use in
>> "final" code
>>              is typically limited in my experience)]
>>
>>              5603   // Currently, if this happens, just set back the
>> actual end to
>>              where it was.
>>              5604   // We miss a chance to sample here.
>>
>>              Would be okay, if explaining "this" and the "why" of missing
>> a chance
>>              to sample here would be best.
>>
>>              Like maybe:
>>
>>              // If we needed to refill TLABs, just set the actual end
>> point to
>>              // the end of the TLAB again. We do not sample here although
>> we could.
>>
>>         Done with your comment, it works well in my mind.
>>
>>              I am not sure whether "miss a chance to sample" meant "we
>> could, but
>>              consciously don't because it's not that useful" or "it would
>> be
>>              necessary but don't because it's too complicated to do.".
>>
>>              Looking at the original comment once more, I am also not
>> sure if that
>>              comment shouldn't be referring to the "end" variable (not
>> actual_end)
>>              because that's the variable that is responsible for taking
>> the sampling
>>              path? (Going from the member description of
>> ThreadLocalAllocBuffer).
>>
>>
>>         I've moved this code and it no longer shows up here but the
>> rationale and answer was:
>>
>>         So.. Yes, end is the variable provoking the sampling. Actual end
>> is the actual end of the TLAB.
>>
>>         What was happening here is that the code is resetting _end to
>> point towards the end of the new TLAB. Because we now have the end for
>> sampling and _actual_end for
>>         the actual end, we need to update the actual_end as well.
>>
>>         Normally, were we to do the real work here, we would calculate
>> the (end - start) offset, then do:
>>
>>         - Set the new end to: start + (old_end - old_start)
>>         - Set the actual end like we do here now, because it is
>> the actual end (see the sketch below).
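>>
>>         As an illustration only, a minimal sketch of that bookkeeping (the
>>         method and field names below are placeholders, not the webrev code):
>>
>>             // Hypothetical sketch: after a TLAB refill, keep the sampling
>>             // "end" offset and restore the real end separately.
>>             void ThreadLocalAllocBuffer::reset_after_refill(HeapWord* new_start,
>>                                                             HeapWord* new_hard_end) {
>>               size_t sampling_offset = _end - _start;    // distance to the next sample point
>>               _start      = new_start;
>>               _end        = new_start + sampling_offset; // "end" triggers the slow path
>>               _actual_end = new_hard_end;                // the real end of the new TLAB
>>             }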
>>
>>         Why is this not done here now anymore?
>>             - I was still debating which path to take:
>>                - Do it in the fast refill code, it has its perks:
>>                    - In a world where fast refills are happening all the
>> time or a lot, we can augment there the code to do the sampling
>>                - Remember what we had as an end before leaving the
>> slowpath and check on return
>>                    - This is what I'm doing now, it removes the need to
>> go fix up all fast refill paths but if you remain in fast refill paths, you
>> won't get sampling. I
>>         have to think of the consequences of that, maybe a future change
>> later on?
>>                       - I have the statistics now so I'm going to study
>> that
>>                          -> By the way, though my statistics are showing
>> I'm missing some samples, if I turn off FastTlabRefill, it is the same loss
>> so for now, it seems
>>         this does not occur in my simple test.
>>
>>
>>
>>              But maybe I am only confused and it's best to just leave the
>> comment
>>              away. :)
>>
>>              Thinking about it some more, doesn't this not-sampling in
>> this case
>>              mean that sampling does not work in any collector that does
>> inline TLAB
>>              allocation at the moment? (Or is inline TLAB alloc
>> automatically
>>              disabled with sampling somehow?)
>>
>>              That would indeed be a bigger TODO then :)
>>
>>
>>         Agreed, this remark made me think that perhaps as a first step
>> the new way of doing it is better but I did have to:
>>            - Remove the const of the ThreadLocalBuffer remaining and
>> hard_end methods
>>            - Move hard_end out of the header file to have a bit more
>> logic there
>>
>>         Please let me know what you think of that and if you prefer it
>> this way or changing the fast refills. (I prefer this way now because it is
>> more incremental).
>>
>>
>>              > > - calling HeapMonitoring::do_weak_oops() (which should
>> probably be
>>              > > called weak_oops_do() like other similar methods) only
>> if string
>>              > > deduplication is enabled (in g1CollectedHeap.cpp:4511)
>> seems wrong.
>>              >
>>              > The call should be at least around 6 lines up outside the
>> if.
>>              >
>>              > Preferentially in a method like
>> process_weak_jni_handles(), including
>>              > additional logging. (No new (G1) gc phase without minimal
>> logging
>>              > :)).
>>              > Done but really not sure because:
>>              >
>>              > I put for logging:
>>              >   log_develop_trace(gc, freelist)("G1ConcRegionFreeing
>> [other] : heap
>>              > monitoring");
>>
>>              I would think that "gc, ref" would be more appropriate log
>> tags for
>>              this similar to jni handles.
>>              (I am also not sure what weak reference handling has to do
>> with
>>              G1ConcRegionFreeing, so I am a bit puzzled)
>>
>>
>>         I was not sure what to put for the tags or really as the message.
>> I cleaned it up a bit now to:
>>              log_develop_trace(gc, ref)("HeapSampling [other] : heap monitoring processing");
>>
>>
>>
>>              > Since weak_jni_handles didn't have logging for me to be
>> inspired
>>              > from, I did that but unconvinced this is what should be
>> done.
>>
>>              The JNI handle processing does have logging, but only in
>>              ReferenceProcessor::process_discovered_references(). In
>>              process_weak_jni_handles() only overall time is measured (in
>> a G1
>>              specific way, since only G1 supports disabling reference
>> processing) :/
>>
>>              The code in ReferenceProcessor prints both time taken
>>              referenceProcessor.cpp:254, as well as the count, but
>> strangely only in
>>              debug VMs.
>>
>>              I have no idea why this logging is deemed so unimportant that
>> it is only printed
>>              in a debug VM. However there are reviews out for changing
>> this area a
>>              bit, so it might be useful to wait for that (JDK-8173335).
>>
>>
>>         I cleaned it up a bit anyway and now it returns the count of
>> objects that are in the system.
>>
>>
>>              > > - the change doubles the size of
>>              > > CollectedHeap::allocate_from_tlab_slow() above the
>> "small and nice"
>>              > > threshold. Maybe it could be refactored a bit.
>>              > Done I think, it looks better to me :).
>>
>>              In ThreadLocalAllocBuffer::handle_sample() I think the
>>              set_back_actual_end()/pick_next_sample() calls could be
>> hoisted out of
>>              the "if" :)
>>
>>
>>         Done!
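>>
>>         A rough sketch of the hoist (the condition and the sampling call
>>         below are placeholders; only the structure is the point):
>>
>>             // Hypothetical sketch of ThreadLocalAllocBuffer::handle_sample():
>>             // the two calls now run unconditionally, after the branch.
>>             void ThreadLocalAllocBuffer::handle_sample(Thread* thread,
>>                                                        HeapWord* result, size_t size) {
>>               if (should_sample(size)) {      // placeholder condition
>>                 HeapMonitoring::object_alloc_do_sample(thread, result, size);  // placeholder
>>               }
>>               set_back_actual_end();          // hoisted out of the "if"
>>               pick_next_sample();
>>             }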
>>
>>
>>              > > - referenceProcessor.cpp:261: the change should add
>> logging about
>>              > > the number of references encountered, maybe after the
>> corresponding
>>              > > "JNI weak reference count" log message.
>>              > Just to double check, are you saying that you'd like to
>> have the heap
>>              > sampler to keep in store how many sampled objects were
>> encountered in
>>              > the HeapMonitoring::weak_oops_do?
>>              >    - Would a return of the method with the number of
>> handled
>>              > references and logging that work?
>>
>>              Yes, it's fine if HeapMonitoring::weak_oops_do() only
>> returned the
>>              number of processed weak oops.
>>
>>
>>         Done also (but I admit I have not tested the output yet) :)
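>>
>>         As a hedged illustration (the container and field names below are
>>         guesses and the signature is simplified, so this is not the actual
>>         webrev code), the counting could look like:
>>
>>             // Hypothetical sketch: count the sampled references that are still live.
>>             size_t StackTraceStorage::weak_oops_do(BoolObjectClosure* is_alive,
>>                                                    OopClosure* f) {
>>               size_t count = 0;
>>               for (int i = 0; i < _stack_trace_count; i++) {   // _stack_trace_count is a guess
>>                 StackTraceData* trace = &_stack_traces[i];     // _stack_traces is a guess
>>                 if (trace->obj != NULL && is_alive->do_object_b(trace->obj)) {
>>                   f->do_oop(&trace->obj);                      // keep the referent up to date
>>                   count++;
>>                 }
>>               }
>>               return count;
>>             }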
>>
>>
>>              >    - Additionally, would you prefer it in a separate block
>> with its
>>              > GCTraceTime?
>>
>>              Yes. Both kinds of information are interesting: while the
>> time taken is
>>              typically more important, the next question would be why,
>> and the
>>              number of references typically goes a long way there.
>>
>>              See above though, it is probably best to wait a bit.
>>
>>
>>         Agreed that I "could" wait but, if it's ok, I'll just
>> refactor/remove this when we get closer to something final. Either
>> JDK-8173335
>>         will have gone in by then and I will notice it, or it will go in
>> soon after and I can change it then.
>>
>>
>>              > > - threadLocalAllocBuffer.cpp:331: one more "TODO"
>>              > Removed it and added it to my personal todos to look at.
>>              > > - threadLocalAllocBuffer.hpp: ThreadLocalAllocBuffer
>> class
>>              > > documentation should be updated about the sampling
>> additions. I
>>              > > would have no clue what the difference between
>> "actual_end" and
>>              > > "end" would be from the given information.
>>              > If you are talking about the comments in this file, I made
>> them more
>>              > clear I hope in the new webrev. If it was somewhere else,
>> let me know
>>              > where to change.
>>
>>              Thanks, that's much better. Maybe a note in the comment of
>> the class
>>              that ThreadLocalBuffer provides some sampling facility by
>> modifying the
>> end() of the TLAB to cause "frequent" calls into the runtime where
>> call where
>>              actual sampling takes place.
>>
>>
>>         Done, I think it's better now. Added something about the
>> slow_path_end as well.
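>>
>>         Purely as an illustration (the wording below is mine, not the
>>         webrev's), such a class comment could read:
>>
>>             // In addition to bump-pointer allocation, a ThreadLocalAllocBuffer
>>             // provides a sampling facility: "end" may be set below the real end
>>             // of the buffer so that allocations "frequently" take the slow path
>>             // into the runtime, where the actual sampling happens; the real end
>>             // is kept separately (actual_end / slow_path_end).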
>>
>>
>>              > > - in heapMonitoring.hpp: there are some random comments
>> about some
>>              > > code that has been grabbed from
>> "util/math/fastmath.[h|cc]". I
>>              > > can't tell whether this is code that can be used but I
>> assume that
>>              > > Noam Shazeer is okay with that (i.e. that's all Google
>> code).
>>              > Jeremy and I double checked and we can release that as I
>> thought. I
>>              > removed the comment from that piece of code entirely.
>>
>>              Thanks.
>>
>>              > > - heapMonitoring.hpp/cpp static constant naming does not
>> correspond
>>              > > to Hotspot's. Additionally, in Hotspot static methods
>> are cased
>>              > > like other methods.
>>              > I think I fixed the methods to be cased the same way as
>> all other
>>              > methods. For static constants, I was not sure. I fixed a
>> few other
>>              > variables but I could not seem to really see a consistent
>> trend for
>>              > constants. I made them as variables but I'm not sure now.
>>
>>              Sorry again, style is a kind of mess. The goal of my
>> suggestions here
>>              is only to prevent yet another style creeping in.
>>
>>              > > - in heapMonitoring.cpp there are a few cryptic comments
>> at the top
>>              > > that seem to refer to internal stuff that should
>> probably be
>>              > > removed.
>>              > Sorry about that! My personal todos not cleared out.
>>
>>              I am happy about comments, but I simply did not understand
>> any of that
>>              and I do not know about other readers as well.
>>
>>              If you think you will remember to remove/update them by the
>> review
>>              proper, that's fine (I misunderstood the review situation a
>> little, it seems).
>>
>>              > > I did not think through the impact of the TLAB changes
>> on collector
>>              > > behavior yet (if there are). Also I did not check for
>> problems with
>>              > > concurrent mark and SATB/G1 (if there are).
>>              > I would love to know your thoughts on this, I think this
>> is fine. I
>>
>>              I think so too now. No objects are made live out of thin air
>> :)
>>
>>              > see issues with multiple threads right now hitting the
>> stack storage
>>              > instance. Previous webrevs had a mutex lock here but we
>> took it out
>> for simplicity (and only for now).
>>
>>              :) When looking at this after some thinking I now assume for
>> this
>>              review that this code is not MT safe at all. There seems to
>> be more
>>              synchronization missing than just the one for the
>> StackTraceStorage. So
>>              no comments about this here.
>>
>>
>>         I double checked a bit (quickly I admit) but it seems that
>> synchronization in StackTraceStorage is really all you need (all methods
>> lead to a StackTraceStorage one
>>         and can be multithreaded outside of that).
>>         There is a question about the initialization where the method
>> HeapMonitoring::initialize_profiling is not thread safe.
>>         It would work (famous last words) and not crash if there was a
>> race but we could add a synchronization point there as well (and therefore
>> on the stop as well).
>>
>>         But anyway I will really check and do this once we add back
>> synchronization.
>>
>>
>>              Also, this would require some kind of specification of what
>> is allowed
>>              to be called when and where.
>>
>>
>>         Would we specify this with the methods in the jvmti.xml file? We
>> could start by specifying in each that they are not thread safe but I saw
>> no mention of that for
>>         other methods.
>>
>>
>>              One potentially relevant observation about locking here:
>> depending on
>>              sampling frequency, StackTraceStore::add_trace() may be
>> rather
>>              frequently called. I assume that you are going to do
>> measurements :)
>>
>>
>>         Though we don't have the TLAB implementation in our code, the
>> compiler-generated sampler has about 2% overhead with a 512k sampling rate. I
>> can do real measurements
>>         when the code settles and we can see how costly this is as a TLAB
>> implementation.
>>         However, my theory is that if the rate is 512k, the
>> memory/performance overhead should be minimal since it is what we saw with
>> our code/workloads (though not called
>>         the same way, we call it essentially at the same rate).
>>         If you have a benchmark you'd like me to test, let me know!
>>
>>         Right now, with my really small test, this does use a bit of
>> overhead even for a 512k sample size. I don't know yet why, I'm going to
>> see what is going on.
>>
>>         Finally, I think it is not reasonable to suppose the overhead to
>> be negligible if the sampling rate used is too low. The user should know
>> that the lower the rate,
>>         the higher the overhead (documentation TODO?).
>>
>>
>>              I am not sure what the expected usage of the API is, but
>>              StackTraceStore::add_trace() seems to be able to grow
>> without bounds.
>>              Only a GC truncates them to the live ones. That in itself
>> seems to be
>>              problematic (GCs can be *wide* apart), and of course some of
>> the API
>>              methods add to that because they duplicate that unbounded
>> array. Do you
>>              have any concerns/measurements about this?
>>
>>
>>         So, the theory is that yes, add_trace can grow without
>> bounds but it grows at a sample per 512k of allocated space. The stacks it
>> gathers are currently
>>         maxed at 64 (I'd like to expand that to an option to the user
>> though at some point). So I have no concerns because:
>>
>>         - If really this is taking a lot of space, that means the job is
>> keeping a lot of objects in memory as well, therefore the entire heap is
>> getting huge
>>         - If this is the case, you will be triggering a GC at some point
>> anyway.
>>
>>         (I'm putting under the rug the issue of "What if we set the rate
>> to 1 for example" because as you lower the sampling rate, we cannot
>> guarantee low overhead; the
>>         idea behind this feature is to have a means of having meaningful
>> allocated samples at a low overhead)
>>
>>         I have no measurements really right now but since I now have some
>> statistics I can poll, I will look a bit more at this question.
>>
>>         I have the same last sentence as above: the user should expect
>> this to happen if the sampling rate is too small. That probably can be
>> reflected in the
>>         StartHeapSampling documentation as a note: careful, this might impact your
>> performance.
>>
>>
>>              Also, these stack traces might hold on to huge arrays. Any
>>              consideration of that? Particularly it might be the cause
>> for OOMEs in
>>              tight memory situations.
>>
>>
>>         There is a stack size maximum that is set to 64 so it should not
>> hold huge arrays. I don't think this is an issue but I can double check
>> with a test or two.
>>
>>
>>              - please consider adding a safepoint check in
>>              HeapMonitoring::weak_oops_do to prevent accidental misuse.
>>
>>              - in struct StackTraceStorage, the public fields may also
>> need
>>              underscores. At least some files in the runtime directory
>> have structs
>>              with underscored public members (and some don't). The
>> runtime team
>>              should probably comment on that.
>>
>>
>>         Agreed, I did not know. I looked around and a lot of structs did
>> not seem to have them, so I left it as is. I will happily change it if
>> someone prefers (I was not
>>         sure if you really preferred it or not; your sentence seemed to be
>> more a note of "this might need to change but I don't know if the runtime
>> team enforces that", let
>>         me know if I read that wrongly).
>>
>>
>>              - In StackTraceStorage::weak_oops_do(), when examining the
>>              StackTraceData, maybe it is useful to consider having a
>> non-NULL
>>              reference outside of the heap's reserved space an error.
>> There should
>>              be no oop outside of the heap's reserved space ever.
>>
>>              Unless you allow storing random values in
>> StackTraceData::obj, which I
>>              would not encourage.
>>
>>
>>         I suppose you are talking about this part:
>>         if ((value != NULL && Universe::heap()->is_in_reserved(value)) &&
>>                     (is_alive == NULL || is_alive->do_object_b(value))) {
>>
>>         What you are saying is that I could have something like:
>>         if (value != my_non_null_reference &&
>>                     (is_alive == NULL || is_alive->do_object_b(value))) {
>>
>>         Is that what you meant? Is there really a reason to do so? When I
>> look at the code, is_in_reserved seems like an O(1) method call. I'm not
>> even sure we can have a
>>         NULL value to be honest. I might have to study that to see if
>> this was not a paranoid test to begin with.
>>
>>         The is_alive code has now morphed due to the comment below.
>>
>>
>>
>>              - HeapMonitoring::weak_oops_do() does not seem to use the
>>              passed AbstractRefProcTaskExecutor.
>>
>>
>>         It did use it:
>>            size_t HeapMonitoring::weak_oops_do(
>>               AbstractRefProcTaskExecutor *task_executor,
>>               BoolObjectClosure* is_alive,
>>               OopClosure *f,
>>               VoidClosure *complete_gc) {
>>             assert(SafepointSynchronize::is_at_safepoint(), "must be at
>> safepoint");
>>
>>             if (task_executor != NULL) {
>>               task_executor->set_single_threaded_mode();
>>             }
>>             return StackTraceStorage::storage()->weak_oops_do(is_alive,
>> f, complete_gc);
>>         }
>>
>>         But due to the comment below, I refactored this, so this is no
>> longer here. Now I have an always true closure that is passed.
>>
>>
>>              - I do not understand allowing to call this method with a
>> NULL
>>              complete_gc closure. This would mean that objects referenced
>> from the
>>              object that is referenced by the StackTraceData are not
>> pulled, meaning
>>              they would get stale.
>>
>>              - same with is_alive parameter value of NULL
>>
>>
>>         So these questions made me look a bit closer at this code. This
>> code I think was written this way to have a very small impact on the file
>> but you are right, there
>>         is no reason for this here. I've simplified the code by adding a
>> process_HeapSampling method in referenceProcessor.cpp that handles
>> everything there.
>>
>>         The code allowed NULLs because it depended on where you were
>> coming from and how the code was being called.
>>
>>         - I added a static always_true variable and pass that now to be
>> more consistent with the rest of the code.
>>         - I moved the complete_gc into process_phaseHeapSampling now (new
>> method) and handle the task_executor and the complete_gc there
>>              - Newbie question: in our code we did a
>> set_single_threaded_mode but I see that process_phaseJNI does it right
>> before its call, do I need to do it for the
>>         process_phaseHeapSample?
>>         That API is much cleaner (in my mind) and is consistent with what
>> is done around it (again in my mind).
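>>
>>         For what it's worth, a minimal sketch of the always-true closure
>>         pattern (the class name here is a guess; HotSpot has similar
>>         closures elsewhere):
>>
>>             // Hypothetical sketch: a closure that treats every object as live,
>>             // passed instead of a NULL is_alive closure.
>>             class AlwaysTrueClosure : public BoolObjectClosure {
>>              public:
>>               bool do_object_b(oop p) { return true; }
>>             };
>>             static AlwaysTrueClosure always_true;
>>             ...
>>             StackTraceStorage::storage()->weak_oops_do(&always_true, f, complete_gc);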
>>
>>
>>              - heapMonitoring.cpp:590: I do not completely understand the
>> purpose of
>>              this code: in the end this results in a fixed value directly
>> dependent
>>              on the Thread address anyway?
>>              IOW, what is special about exactly 20 rounds?
>>
>>
>>         So we really want a fast random number generator that has a
>> specific mean (512k is the default we use). The code uses the thread
>> address as the start number of the
>>         sequence (why not, it is random enough, is the rationale). Then
>> instead of just starting there, we prime the sequence and really only start
>> at the 21st number; that is
>>         arbitrary and I have not done a study to see if we could do more
>> or less of that.
>>
>>         As I have the statistics of the system up and running, I'll run
>> some experiments to see if this is needed, is 20 good, or not.
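>>
>>         For illustration, a hedged sketch of the priming idea (the generator
>>         below is a stand-in, not the code in heapMonitoring.cpp):
>>
>>             // Hypothetical sketch: seed a generator with the thread address and
>>             // discard the first 20 outputs to "warm it up". In practice the
>>             // state would live per thread.
>>             static uint64_t next_random(uint64_t* state) {
>>               *state ^= *state << 13;   // xorshift step as a placeholder
>>               *state ^= *state >> 7;
>>               *state ^= *state << 17;
>>               return *state;
>>             }
>>
>>             static uint64_t init_rng(Thread* thread) {
>>               uint64_t state = reinterpret_cast<uintptr_t>(thread);
>>               for (int i = 0; i < 20; i++) {
>>                 next_random(&state);    // prime: throw away the first 20 values
>>               }
>>               return state;
>>             }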
>>
>>
>>              - also I would consider stripping a few bits of the threads'
>> address as
>>              initialization value for your rng. The last three bits (and
>> probably
>>              more, check whether the Thread object is allocated on special
>>              boundaries) are always zero for them.
>>              Not sure if the given "random" value is random enough
>> before/after
>>              this method, so just skip that comment if you think this is
>> not
>>              required.
>>
>>
>>         I don't know is the honest answer. I think what is important is
>> that we tend towards a mean and it is random "enough" to not fall in
>> pitfalls of only sampling a
>>         subset of objects due to their allocation order. I added that as a
>> test to do, to see if it changes the mean in any way for the 512k default
>> value and/or if the first
>>         1000 elements look better.
>>
>>
>>              Some more random nits I did not find a place to put anywhere:
>>
>>              - ThreadLocalAllocBuffer::_extra_space does not seem to be
>> used
>>              anywhere?
>>
>>
>>         Good catch :).
>>
>>
>>              - Maybe indent the declaration of
>> ThreadLocalAllocBuffer::_bytes_until_sample to align below the other
>> members of that group.
>>
>>
>>         Done, moved it up a bit to have non-static members together and
>> static ones separate.
>>
>>              Thanks,
>>                 Thomas
>>
>>
>>         Thanks for your review!
>>         Jc
>>
>>
>>

[Attachment #3 (text/html)]

<div dir="ltr">Thanks Robbin,<div><br></div><div>This seems to have worked. When I \
have the next webrev ready, we will find out but I&#39;m fairly confident it will \
work!</div><div><br></div><div>Thanks agian!</div><div>Jc</div></div><div \
class="gmail_extra"><br><div class="gmail_quote">On Wed, Jun 28, 2017 at 11:46 PM, \
Robbin Ehn <span dir="ltr">&lt;<a href="mailto:robbin.ehn@oracle.com" \
target="_blank">robbin.ehn@oracle.com</a>&gt;</span> wrote:<br><blockquote \
class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc \
solid;padding-left:1ex">Hi JC,<br> <br>
On 06/29/2017 12:15 AM, JC Beyler wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc \
solid;padding-left:1ex"> B) Incremental changes<br>
</blockquote>
<br>
I guess the most common work flow here is using mq :<br>
hg qnew fix_v1<br>
edit files<br>
hg qrefresh<br>
hg qnew fix_v2<br>
edit files<br>
hg qrefresh<br>
<br>
if you do hg log you will see 2 commits<br>
<br>
webrev.ksh -r -2 -o my_inc_v1_v2<br>
webrev.ksh -o my_full_v2<br>
<br>
<br>
In   your .hgrc you might need:<br>
[extensions]<br>
mq =<br>
<br>
/Robbin<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc \
solid;padding-left:1ex"><span class=""> <br>
Again another newbiew question here...<br>
<br>
For showing the incremental changes, is there a link that explains how to do that? I \
apologize for my newbie questions all the time :)<br> <br>
Right now, I do:<br>
<br>
   ksh ../webrev.ksh -m -N<br>
<br>
That generates a webrev.zip and send it to Chuck Rasbold. He then uploads it to a new \
webrev.<br> <br>
I tried commiting my change and adding a small change. Then if I just do ksh \
../webrev.ksh without any options, it seems to produce a similar page but now with \
only the changes I had (so the 06-07 comparison you were talking about) and a \
changeset that has it all. I imagine that is what you meant.<br> <br>
Which means that my workflow would become:<br>
<br>
1) Make changes<br>
2) Make a webrev without any options to show just the differences with the tip<br>
3) Amend my changes to my local commit so that I have it done with<br>
4) Go to 1<br>
<br>
Does that seem correct to you?<br>
<br>
Note that when I do this, I only see the full change of a file in the full change set \
(Side note here: now the page says change set and not patch, which is maybe why \
Serguei was having issues?).<br> <br>
Thanks!<br>
Jc<br>
<br>
<br>
<br></span><span class="">
On Wed, Jun 28, 2017 at 1:12 AM, Robbin Ehn &lt;<a \
href="mailto:robbin.ehn@oracle.com" target="_blank">robbin.ehn@oracle.com</a> \
&lt;mailto:<a href="mailto:robbin.ehn@oracle.com" \
target="_blank">robbin.ehn@oracle.com</a>&gt;<wbr>&gt; wrote:<br> <br>
      Hi,<br>
<br>
      On 06/28/2017 12:04 AM, JC Beyler wrote:<br>
<br>
            Dear Thomas et al,<br>
<br>
            Here is the newest webrev:<br></span>
            <a href="http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/" \
rel="noreferrer" target="_blank">http://cr.openjdk.java.net/~ra<wbr>sbold/8171119/webrev.07/</a> \
&lt;<a href="http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/" rel="noreferrer" \
target="_blank">http://cr.openjdk.java.net/~r<wbr>asbold/8171119/webrev.07/</a>&gt;<div><div \
class="h5"><br> <br>
<br>
      You have some more bits to in there but generally this looks good and really \
                nice with more tests.<br>
      I&#39;ll do and deep dive and re-test this when I get back from my long \
vacation with whatever patch version you have then.<br> <br>
      Also I think it&#39;s time you provide incremental (v06-&gt;07 changes) as well \
as complete change-sets.<br> <br>
      Thanks, Robbin<br>
<br>
<br>
<br>
<br>
            Thomas, I &quot;think&quot; I have answered all your remarks. The summary \
is:<br> <br>
            - The statistic system is up and provides insight on what the heap \
                sampler is doing<br>
                    - I&#39;ve noticed that, though the sampling rate is at the right \
mean, we are missing some samples, I have not yet tracked out why (details below)<br> \
                <br>
            - I&#39;ve run a tiny benchmark that is the worse case: it is a very \
                tight loop and allocated a small array<br>
                    - In this case, I see no overhead when the system is off so that \
                is a good start :)<br>
                    - I see right now a high overhead in this case when sampling is \
on. This is not a really too surprising but I&#39;m going to see if this is \
                consistent with our<br>
            internal implementation. The benchmark is really allocation stressful so \
I&#39;m not too surprised but I want to do the due diligence.<br> <br>
                 - The statistic system up is up and I have a new test<br>
            <a href="http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch" \
rel="noreferrer" target="_blank">http://cr.openjdk.java.net/~ra<wbr>sbold/8171119/webr \
ev.07/test/<wbr>serviceability/jvmti/HeapMonit<wbr>or/MyPackage/HeapMonitorStatTe<wbr>st.java.patch</a><br>
  &lt;<a href="http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch" \
rel="noreferrer" target="_blank">http://cr.openjdk.java.net/~r<wbr>asbold/8171119/webr \
ev.07/test/<wbr>serviceability/jvmti/HeapMonit<wbr>or/MyPackage/HeapMonitorStatTe<wbr>st.java.patch</a>&gt;<br>
                
                     - I did a bit of a study about the random generator here, more \
details are below but basically it seems to work well<br> <br>
                 - I added a capability but since this is the first time doing this, \
                I was not sure I did it right<br>
                    - I did add a test though for it and the test seems to do what I \
expect (all methods are failing with the JVMTI_ERROR_MUST_POSSESS_CAPAB<wbr>ILITY \
                error).<br>
                          - <a \
href="http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapabilityTest.java.patch" \
rel="noreferrer" target="_blank">http://cr.openjdk.java.net/~ra<wbr>sbold/8171119/webr \
ev.07/test/<wbr>serviceability/jvmti/HeapMonit<wbr>or/MyPackage/HeapMonitorNoCapa<wbr>bilityTest.java.patch</a><br>
  &lt;<a href="http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapabilityTest.java.patch" \
rel="noreferrer" target="_blank">http://cr.openjdk.java.net/~r<wbr>asbold/8171119/webr \
ev.07/test/<wbr>serviceability/jvmti/HeapMonit<wbr>or/MyPackage/HeapMonitorNoCapa<wbr>bilityTest.java.patch</a>&gt;<br>
 <br>
                 - I still need to figure out what to do about the multi-agent vs \
single-agent issue<br> <br>
                 - As far as measurements, it seems I still need to look at:<br>
                    - Why we do the 20 random calls first, are they necessary?<br>
                    - Look at the mean of the sampling rate that the random generator \
                does and also what is actually sampled<br>
                    - What is the overhead in terms of memory/performance when \
on?<br> <br>
            I have inlined my answers, I think I got them all in the new webrev, let \
me know your thoughts.<br> <br>
            Thanks again!<br>
            Jc<br>
<br>
<br></div></div>
            On Fri, Jun 23, 2017 at 3:52 AM, Thomas Schatzl &lt;<a \
href="mailto:thomas.schatzl@oracle.com" target="_blank">thomas.schatzl@oracle.com</a> \
&lt;mailto:<a href="mailto:thomas.schatzl@oracle.com" \
target="_blank">thomas.schatzl@oracle.<wbr>com</a>&gt; &lt;mailto:<a \
href="mailto:thomas.schatzl@oracle.com" \
                target="_blank">thomas.schatzl@oracle.<wbr>com</a><div><div \
                class="h5"><br>
            &lt;mailto:<a href="mailto:thomas.schatzl@oracle.com" \
target="_blank">thomas.schatzl@oracle.<wbr>com</a>&gt;&gt;&gt; wrote:<br> <br>
                    Hi,<br>
<br>
                    On Wed, 2017-06-21 at 13:45 -0700, JC Beyler wrote:<br>
                    &gt; Hi all,<br>
                    &gt;<br>
                    &gt; First off: Thanks again to Robbin and Thomas for their \
reviews :)<br>  &gt;<br>
                    &gt; Next, I&#39;ve uploaded a new webrev:<br>
                    &gt; <a \
href="http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/" rel="noreferrer" \
target="_blank">http://cr.openjdk.java.net/~ra<wbr>sbold/8171119/webrev.06/</a> \
&lt;<a href="http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/" rel="noreferrer" \
target="_blank">http://cr.openjdk.java.net/~r<wbr>asbold/8171119/webrev.06/</a>&gt;<br>
  &lt;<a href="http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/" \
rel="noreferrer" target="_blank">http://cr.openjdk.java.net/~r<wbr>asbold/8171119/webrev.06/</a> \
&lt;<a href="http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/" rel="noreferrer" \
target="_blank">http://cr.openjdk.java.net/~r<wbr>asbold/8171119/webrev.06/</a>&gt;&gt;<br>
 <br>
                    &gt;<br>
                    &gt; Here is an update:<br>
                    &gt;<br>
                    &gt; - @Robbin, I forgot to say that yes I need to look at \
                implementing<br>
                    &gt; this for the other architectures and testing it before it is \
                all<br>
                    &gt; ready to go. Is it common to have it working on all \
                possible<br>
                    &gt; combinations or is there a subset that I should be doing \
first and we<br>  &gt; can do the others later?<br>
                    &gt; - I&#39;ve tested slowdebug, built and ran the JTreg tests I \
wrote with<br>  &gt; slowdebug and fixed a few more issues<br>
                    &gt; - I&#39;ve refactored a bit of the code following \
                Thomas&#39; comments<br>
                    &gt;      - I think I&#39;ve handled all the comments from Thomas \
(I put<br>  &gt; comments inline below for the specifics)<br>
<br>
                    Thanks for handling all those.<br>
<br>
                    &gt; - Following Thomas&#39; comments on statistics, I want to \
                add some<br>
                    &gt; quality assurance tests and find that the easiest way would \
                be to<br>
                    &gt; have a few counters of what is happening in the sampler and \
expose<br>  &gt; that to the user.<br>
                    &gt;      - I&#39;ll be adding that in the next version if no one \
sees any<br>  &gt; objections to that.<br>
                    &gt;      - This will allow me to add a sanity test in JTreg \
about number of<br>  &gt; samples and average of sampling rate<br>
                    &gt;<br>
                    &gt; @Thomas: I had a few questions that I inlined below but I \
will<br>  &gt; summarize the &quot;bigger ones&quot; here:<br>
                    &gt;      - You mentioned constants are not using the right \
                conventions, I<br>
                    &gt; looked around and didn&#39;t see any convention except \
normal naming then<br>  &gt; for static constants. Is that right?<br>
<br>
                    I looked through <a \
href="https://wiki.openjdk.java.net/display/HotSpot/StyleGui" rel="noreferrer" \
target="_blank">https://wiki.openjdk.java.net/<wbr>display/HotSpot/StyleGui</a> \
&lt;<a href="https://wiki.openjdk.java.net/display/HotSpot/StyleGui" rel="noreferrer" \
target="_blank">https://wiki.openjdk.java.net<wbr>/display/HotSpot/StyleGui</a>&gt;<br>
  &lt;<a href="https://wiki.openjdk.java.net/display/HotSpot/StyleGui" \
rel="noreferrer" target="_blank">https://wiki.openjdk.java.net<wbr>/display/HotSpot/StyleGui</a> \
&lt;<a href="https://wiki.openjdk.java.net/display/HotSpot/StyleGui" rel="noreferrer" \
target="_blank">https://wiki.openjdk.java.net<wbr>/display/HotSpot/StyleGui</a>&gt;&gt;<br>
                
                    de and the rule is to &quot;follow an existing pattern and must \
                have a<br>
                    distinct appearance from other names&quot;. Which does not help a \
                lot I<br>
                    guess :/ The GC team started using upper camel case, e.g.<br>
                    SomeOtherConstant, but very likely this is probably not \
                applied<br>
                    consistently throughout. So I am fine with not adding another \
                style<br>
                    (like kMaxStackDepth with the &quot;k&quot; in front with some \
unknown meaning)<br>  is fine.<br>
<br>
                    (Chances are you will find that style somewhere used anyway \
too,<br>  apologies if so :/)<br>
<br>
<br>
            Thanks for that link, now I know where to look. I used the upper camel \
case in my code as well then :) I should have gotten them all.<br> <br>
<br>
                     &gt; PS: I&#39;ve also inlined my answers to Thomas below:<br>
                     &gt;<br>
                     &gt; On Tue, Jun 13, 2017 at 8:03 AM, Thomas Schatzl \
&lt;thomas.schatzl@oracl<br></div></div><div><div class="h5">  &gt; <a \
href="http://e.com" rel="noreferrer" target="_blank">e.com</a> &lt;<a \
href="http://e.com" rel="noreferrer" target="_blank">http://e.com</a>&gt; &lt;<a \
href="http://e.com" rel="noreferrer" target="_blank">http://e.com</a>&gt;&gt; \
wrote:<br>  &gt; &gt; Hi all,<br>
                     &gt; &gt;<br>
                     &gt; &gt; On Mon, 2017-06-12 at 11:11 -0700, JC Beyler \
wrote:<br>  &gt; &gt; &gt; Dear all,<br>
                     &gt; &gt; &gt;<br>
                     &gt; &gt; &gt; I&#39;ve continued working on this and have done \
the following<br>  &gt; &gt; webrev:<br>
                     &gt; &gt; &gt; <a \
href="http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/" rel="noreferrer" \
target="_blank">http://cr.openjdk.java.net/~ra<wbr>sbold/8171119/webrev.05/</a> \
&lt;<a href="http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/" rel="noreferrer" \
target="_blank">http://cr.openjdk.java.net/~r<wbr>asbold/8171119/webrev.05/</a>&gt;<br>
  &lt;<a href="http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/" \
rel="noreferrer" target="_blank">http://cr.openjdk.java.net/~r<wbr>asbold/8171119/webrev.05/</a> \
&lt;<a href="http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/" rel="noreferrer" \
target="_blank">http://cr.openjdk.java.net/~r<wbr>asbold/8171119/webrev.05/</a>&gt;&gt;<br>
 <br>
                     &gt; &gt;<br>
                     &gt; &gt; [...]<br>
                     &gt; &gt; &gt; Things I still need to do:<br>
                     &gt; &gt; &gt;      - Have to fix that TLAB case for the \
                FastTLABRefill<br>
                     &gt; &gt; &gt;      - Have to start looking at the data to see \
                that it is<br>
                     &gt; &gt; consistent and does gather the right samples, right \
                frequency, etc.<br>
                     &gt; &gt; &gt;      - Have to check the GC elements and what \
                that produces<br>
                     &gt; &gt; &gt;      - Run a slowdebug run and ensure I fixed all \
those issues you<br>  &gt; &gt; saw &gt; Robbin<br>
                     &gt; &gt; &gt;<br>
                     &gt; &gt; &gt; Thanks for looking at the webrev and have a great \
week!<br>  &gt; &gt;<br>
                     &gt; &gt;     scratching a bit on the surface of this change, so \
apologies for<br>  &gt; &gt; rather shallow comments:<br>
                     &gt; &gt;<br>
                     &gt; &gt; - macroAssembler_x86.cpp:5604: while this is compiler \
                code, and I<br>
                     &gt; &gt; am not sure this is final, please avoid littering the \
                code with<br>
                     &gt; &gt; TODO remarks :) They tend to be candidates for later \
wtf moments<br>  &gt; &gt; only.<br>
                     &gt; &gt;<br>
                     &gt; &gt; Just file a CR for that.<br>
                     &gt; &gt;<br>
                     &gt; Newcomer question: what is a CR and not sure I have the \
rights to do<br>  &gt; that yet ? :)<br>
<br>
                    Apologies. CR is a change request, this suggests to file a bug in \
                the<br>
                    bug tracker. And you are right, you can&#39;t just create a new \
account in<br>  the OpenJDK JIRA yourselves. :(<br>
<br>
<br>
            Ok good to know, I&#39;ll continue with my own todo list but I&#39;ll \
work hard on not letting it slip in the webrevs anymore :)<br> <br>
<br>
                    I was mostly referring to the &quot;... but it is a TODO&quot; \
                part of that<br>
                    comment in macroassembler_x86.cpp. Comments about the why of the \
code<br>  are appreciated.<br>
<br>
                    [Note that I now understand that this is to some degree still \
                work in<br>
                    progress. As long as the final changeset does no contain \
                TODO&#39;s I am<br>
                    fine (and it&#39;s not a hard objection, rather their use in \
&quot;final&quot; code<br>  is typically limited in my experience)]<br>
<br>
                    5603     // Currently, if this happens, just set back the actual \
end to<br>  where it was.<br>
                    5604     // We miss a chance to sample here.<br>
<br>
                    Would be okay, if explaining &quot;this&quot; and the \
&quot;why&quot; of missing a chance<br>  to sample here would be best.<br>
<br>
                    Like maybe:<br>
<br>
                    // If we needed to refill TLABs, just set the actual end point \
                to<br>
                    // the end of the TLAB again. We do not sample here although we \
could.<br> <br>
            Done with your comment, it works well in my mind.<br>
<br>
                    I am not sure whether &quot;miss a chance to sample&quot; meant \
                &quot;we could, but<br>
                    consciously don&#39;t because it&#39;s not that useful&quot; or \
                &quot;it would be<br>
                    necessary but don&#39;t because it&#39;s too complicated to \
do.&quot;.<br> <br>
                    Looking at the original comment once more, I am also not sure if \
                that<br>
                    comment shouldn&#39;t referring to the &quot;end&quot; variable \
                (not actual_end)<br>
                    because that&#39;s the variable that is responsible for taking \
                the sampling<br>
                    path? (Going from the member description of \
ThreadLocalAllocBuffer).<br> <br>

            I've moved this code and it no longer shows up here, but the rationale and answer was:

            So... Yes, end is the variable provoking the sampling; actual_end is the actual end of the TLAB.

            What was happening here is that the code was resetting _end to point to the end of the new TLAB.
            Because we now have end for the sampling and _actual_end for the actual end, we need to update
            _actual_end as well.

            Normally, were we to do the real work here, we would calculate the (end - start) offset, then do:

            - Set the new end to: start + (old_end - old_start)
            - Set the actual end like we do here now, because it is the actual end.

            Why is this not done here anymore?
                  - I was still debating which path to take:
                       - Do it in the fast refill code, which has its perks:
                             - In a world where fast refills are happening all the time or a lot, we could
                               augment the code there to do the sampling
                       - Remember what we had as an end before leaving the slow path and check on return
                             - This is what I'm doing now; it removes the need to go fix up all fast refill
                               paths, but if you remain in fast refill paths, you won't get sampling. I have
                               to think about the consequences of that, maybe a future change later on?
                                 - I have the statistics now, so I'm going to study that
                                      -> By the way, though my statistics show that I'm missing some samples,
                                         the loss is the same if I turn off FastTLABRefill, so for now it
                                         seems this does not occur in my simple test.


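            For readers following along, here is a minimal standalone sketch of the end/actual_end split
            described above (the names and structure are assumptions for illustration, not the actual
            webrev code):

                #include <cstddef>

                // Sketch: the fast path sees _end, which may be smaller than the real
                // limit _actual_end, so allocation traps into the slow path at each
                // sample point.
                struct TlabSketch {
                  char*  _start;              // next free byte
                  char*  _end;                // limit seen by the fast path (<= _actual_end)
                  char*  _actual_end;         // real end of the TLAB
                  size_t _bytes_until_sample; // distance to the next sample point

                  // Fast path: bump-pointer allocation until _end is reached.
                  char* allocate(size_t size) {
                    if (_start + size <= _end) {
                      char* obj = _start;
                      _start += size;
                      return obj;
                    }
                    return allocate_slow(size);
                  }

                  // Slow path: reaching _end may mean "take a sample", not "TLAB is full".
                  char* allocate_slow(size_t size) {
                    if (_start + size > _actual_end) {
                      return nullptr;              // genuinely out of TLAB space: refill
                    }
                    take_sample();                 // record this allocation for the profiler
                    char* obj = _start;
                    _start += size;
                    _bytes_until_sample = next_sample_interval();
                    pick_next_sample();            // advance _end to the next sample point
                    return obj;
                  }

                  void take_sample() { /* record stack trace, allocation size, ... */ }

                  size_t next_sample_interval() { return 512 * 1024; /* mean interval */ }

                  void pick_next_sample() {
                    char* next = _start + _bytes_until_sample;
                    _end = (next < _actual_end) ? next : _actual_end;
                  }

                  // Refill paths that do not sample simply restore the real limit.
                  void set_back_actual_end() { _end = _actual_end; }
                };
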
                    But maybe I am only confused and it's best to just leave the comment
                    away. :)

                    Thinking about it some more, doesn't this not-sampling in this case
                    mean that sampling does not work in any collector that does inline TLAB
                    allocation at the moment? (Or is inline TLAB alloc automatically
                    disabled with sampling somehow?)

                    That would indeed be a bigger TODO then :)


            Agreed; this remark made me think that perhaps, as a first step, the new way of doing it is
            better, but I did have to:
                 - Remove the const from the ThreadLocalAllocBuffer remaining and hard_end methods
                 - Move hard_end out of the header file to have a bit more logic there

            Please let me know what you think of that and whether you prefer it this way or changing the
            fast refills. (I prefer this way now because it is more incremental.)

                    > > - calling HeapMonitoring::do_weak_oops() (which should probably be
                    > > called weak_oops_do() like other similar methods) only if string
                    > > deduplication is enabled (in g1CollectedHeap.cpp:4511) seems wrong.
                    >
                    > The call should be at least around 6 lines up, outside the if.
                    >
                    > Preferentially in a method like process_weak_jni_handles(), including
                    > additional logging. (No new (G1) gc phase without minimal logging :)).
                    > Done but really not sure because:
                    >
                    > I put for logging:
                    >     log_develop_trace(gc, freelist)("G1ConcRegionFreeing [other] : heap monitoring");

                    I would think that "gc, ref" would be more appropriate log tags for
                    this, similar to jni handles.
                    (I am also not sure what weak reference handling has to do with
                    G1ConcRegionFreeing, so I am a bit puzzled.)



            I was not sure what to put for the tags or really as the message. I cleaned it up a bit now to:
                    log_develop_trace(gc, ref)("HeapSampling [other] : heap monitoring processing");


                    > Since weak_jni_handles didn't have logging for me to be inspired
                    > from, I did that, but I am unconvinced this is what should be done.

                    The JNI handle processing does have logging, but only in
                    ReferenceProcessor::process_discovered_references(). In
                    process_weak_jni_handles() only the overall time is measured (in a G1
                    specific way, since only G1 supports disabling reference processing) :/

                    The code in ReferenceProcessor prints both the time taken
                    (referenceProcessor.cpp:254) and the count, but strangely only in
                    debug VMs.

                    I have no idea why this logging is considered so unimportant that it is
                    only printed in a debug VM. However there are reviews out for changing
                    this area a bit, so it might be useful to wait for that (JDK-8173335).


            I cleaned it up a bit anyway and now it returns the count of objects that are in the system.

                    > > - the change doubles the size of
                    > > CollectedHeap::allocate_from_tlab_slow() above the "small and nice"
                    > > threshold. Maybe it could be refactored a bit.
                    > Done I think, it looks better to me :).

                    In ThreadLocalAllocBuffer::handle_sample() I think the
                    set_back_actual_end()/pick_next_sample() calls could be hoisted out of
                    the "if" :)


            Done!

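            For illustration, the shape of that hoisting might look like the following sketch (the helper
            names are taken from the thread, the surrounding logic is an assumption):

                // Stand-ins for the real TLAB bookkeeping helpers.
                void take_sample();
                void set_back_actual_end();
                void pick_next_sample();

                // Before: the bookkeeping calls appear in both branches.
                void handle_sample_before(bool should_sample) {
                  if (should_sample) {
                    take_sample();
                    set_back_actual_end();
                    pick_next_sample();
                  } else {
                    set_back_actual_end();
                    pick_next_sample();
                  }
                }

                // After: the common calls are hoisted out of the "if".
                void handle_sample_after(bool should_sample) {
                  if (should_sample) {
                    take_sample();
                  }
                  set_back_actual_end();
                  pick_next_sample();
                }
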
                    > > - referenceProcessor.cpp:261: the change should add logging about
                    > > the number of references encountered, maybe after the corresponding
                    > > "JNI weak reference count" log message.
                    > Just to double check, are you saying that you'd like the heap
                    > sampler to keep track of how many sampled objects were encountered in
                    > HeapMonitoring::weak_oops_do?
                    >      - Would returning the number of handled references from the
                    > method and logging that work?

                    Yes, it's fine if HeapMonitoring::weak_oops_do() only returned the
                    number of processed weak oops.


            Done also (but I admit I have not tested the output yet) :)

                    >      - Additionally, would you prefer it in a separate block with its
                    > GCTraceTime?

                    Yes. Both kinds of information are interesting: while the time taken is
                    typically more important, the next question would be why, and the
                    number of references typically goes a long way there.

                    See above though, it is probably best to wait a bit.


            Agreed that I "could" wait but, if it's ok, I'll just refactor/remove this when we get closer to
            something final. Either JDK-8173335 has gone in and I will notice it now, or it will go in soon
            and I can change it then.

                    > > - threadLocalAllocBuffer.cpp:331: one more "TODO"
                    > Removed it and added it to my personal todos to look at.
                    >
                    > > - threadLocalAllocBuffer.hpp: ThreadLocalAllocBuffer class
                    > > documentation should be updated about the sampling additions. I
                    > > would have no clue what the difference between "actual_end" and
                    > > "end" would be from the given information.
                    > If you are talking about the comments in this file, I made them more
                    > clear I hope in the new webrev. If it was somewhere else, let me know
                    > where to change.


                    Thanks, that's much better. Maybe add a note in the comment of the class
                    that ThreadLocalAllocBuffer provides some sampling facility by modifying
                    the end() of the TLAB to cause "frequent" calls into the runtime, where
                    the actual sampling takes place.



            Done, I think it's better now. I added something about the slow_path_end as well.

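            As an illustration only, such a class comment could read roughly like this (my wording, not the
            webrev's):

                // A ThreadLocalAllocBuffer is a thread-private region used for bump-pointer
                // allocation. To support heap sampling, end() may be set to a value smaller
                // than the real limit of the buffer so that allocation "runs out" early and
                // calls into the runtime, where the sampling bookkeeping is performed. The
                // real limit is kept separately (actual_end / slow_path_end), and end() is
                // restored or advanced to the next sample point on the way back.
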
                    > > - in heapMonitoring.hpp: there are some random comments about some
                    > > code that has been grabbed from "util/math/fastmath.[h|cc]". I
                    > > can't tell whether this is code that can be used, but I assume that
                    > > Noam Shazeer is okay with that (i.e. that's all Google code).
                    > Jeremy and I double checked and we can release that as I thought. I
                    > removed the comment from that piece of code entirely.

                    Thanks.

                    > > - heapMonitoring.hpp/cpp static constant naming does not correspond
                    > > to Hotspot's. Additionally, in Hotspot static methods are cased
                    > > like other methods.
                    > I think I fixed the methods to be cased the same way as all other
                    > methods. For static constants, I was not sure. I fixed a few other
                    > variables but I could not seem to really see a consistent trend for
                    > constants. I made them as variables but I'm not sure now.

                    Sorry again, style is a kind of mess. The goal of my suggestions here
                    is only to prevent yet another style creeping in.

                    > > - in heapMonitoring.cpp there are a few cryptic comments at the top
                    > > that seem to refer to internal stuff that should probably be
                    > > removed.
                    > Sorry about that! My personal todos not cleared out.

                    I am happy about comments, but I simply did not understand any of that,
                    and I do not know whether other readers would either.

                    It is fine if you just remember to remove/update them by the review
                    proper (I misunderstood the review situation a little, it seems).

                    > > I did not think through the impact of the TLAB changes on collector
                    > > behavior yet (if there are). Also I did not check for problems with
                    > > concurrent mark and SATB/G1 (if there are).
                    > I would love to know your thoughts on this, I think this is fine. I

                    I think so too now. No objects are made live out of thin air :)

                    > see issues with multiple threads right now hitting the stack storage
                    > instance. Previous webrevs had a mutex lock here but we took it out
                    > for simplicity (and only for now).

                    :) When looking at this after some thinking, I now assume for this
                    review that this code is not MT safe at all. There seems to be more
                    synchronization missing than just the one for the StackTraceStorage. So
                    no comments about this here.



            I double checked a bit (quickly, I admit) but it seems that synchronization in StackTraceStorage
            is really all you need (all methods lead to a StackTraceStorage one and can be multithreaded
            outside of that).
            There is a question about initialization, where the method HeapMonitoring::initialize_profiling
            is not thread safe. It would work (famous last words) and not crash if there was a race, but we
            could add a synchronization point there as well (and therefore on the stop as well).

            But anyway, I will really check and do this once we add back synchronization.

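            A minimal sketch of what such a synchronization point could look like, using standard C++
            primitives purely for illustration (HotSpot would use its own Mutex/Atomic facilities, and all
            names here are assumptions):

                #include <mutex>

                class HeapMonitoringSketch {
                 public:
                  static void initialize_profiling(int sample_interval_bytes) {
                    std::lock_guard<std::mutex> guard(_lock);
                    if (_enabled) return;           // a racing second call becomes a no-op
                    _sample_interval_bytes = sample_interval_bytes;
                    // ... allocate StackTraceStorage, reset statistics, etc. ...
                    _enabled = true;
                  }

                  static void stop_profiling() {
                    std::lock_guard<std::mutex> guard(_lock);
                    if (!_enabled) return;
                    // ... flush/release storage ...
                    _enabled = false;
                  }

                 private:
                  static std::mutex _lock;
                  static bool       _enabled;
                  static int        _sample_interval_bytes;
                };

                std::mutex HeapMonitoringSketch::_lock;
                bool       HeapMonitoringSketch::_enabled = false;
                int        HeapMonitoringSketch::_sample_interval_bytes = 0;
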

                    Also, this would require some kind of specification of what is allowed
                    to be called when and where.


            Would we specify this with the methods in the jvmti.xml file? We could start by specifying in
            each that they are not thread safe, but I saw no mention of that for other methods.



                    One potentially relevant observation about locking here: depending on
                    sampling frequency, StackTraceStore::add_trace() may be rather
                    frequently called. I assume that you are going to do measurements :)



            Though we don't have the TLAB implementation in our code, the compiler-generated sampler uses
            2% of overhead with a 512k sampling rate. I can do real measurements when the code settles and
            we can see how costly this is as a TLAB implementation.
            However, my theory is that if the rate is 512k, the memory/performance overhead should be
            minimal, since that is what we saw with our code/workloads (though not called the same way, we
            call it at essentially the same rate).
            If you have a benchmark you'd like me to test, let me know!

            Right now, with my really small test, this does use a bit of overhead even for a 512k sample
            size. I don't know yet why; I'm going to see what is going on.

            Finally, I think it is not reasonable to suppose the overhead to be negligible if the sampling
            rate used is too low. The user should know that the lower the rate, the higher the overhead
            (documentation TODO?).
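
            For intuition only, a back-of-the-envelope estimate (the allocation rate is a made-up number,
            not a measurement): with a 512 KiB sampling interval, an application allocating 100 MiB/s hits
            the sampling slow path roughly 100 MiB / 512 KiB, i.e. about 200 times per second, so the total
            cost is about 200 times the cost of one sample (stack walk plus storage) per second. Halving the
            interval doubles that, which is why the overhead grows as the sampling interval shrinks.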

                    I am not sure what the expected usage of the API is, but
                    StackTraceStore::add_trace() seems to be able to grow without bounds.
                    Only a GC truncates them to the live ones. That in itself seems to be
                    problematic (GCs can be *wide* apart), and of course some of the API
                    methods add to that because they duplicate that unbounded array. Do you
                    have any concerns/measurements about this?



            So, the theory is that yes, add_trace can grow without bounds, but it grows at one sample per
            512k of allocated space. The stacks it gathers are currently maxed at 64 frames (I'd like to
            turn that into a user option at some point). So I have no concerns because:

            - If this is really taking a lot of space, that means the job is keeping a lot of objects in
              memory as well, therefore the entire heap is getting huge
            - If that is the case, you will be triggering a GC at some point anyway.

            (I'm putting under the rug the issue of "what if we set the rate to 1, for example", because as
            you lower the sampling rate, we cannot guarantee low overhead; the idea behind this feature is
            to have a means of getting meaningful allocation samples at a low overhead.)

            I have no measurements really right now, but since I now have some statistics I can poll, I will
            look a bit more at this question.

            I have the same last sentence as above: the user should expect this to happen if the sampling
            rate is too small. That probably can be reflected in StartHeapSampling as a note: careful, this
            might impact your performance.
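
            As a rough bound on that growth (using only the numbers above plus an assumed per-frame size): a
            workload allocating 10 GiB between two GCs accumulates about 10 GiB / 512 KiB, i.e. roughly
            20,480 traces; at up to 64 frames each and, say, 8 bytes per frame, that is on the order of
            10 MiB of trace data, which only becomes noticeable when the allocation volume between GCs is
            itself very large.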

                    Also, these stack traces might hold on to huge arrays. Any
                    consideration of that? Particularly it might be the cause for OOMEs in
                    tight memory situations.


            There is a stack size maximum that is set to 64, so it should not hold huge arrays. I don't
            think this is an issue, but I can double check with a test or two.


                    - please consider adding a safepoint check in
                    HeapMonitoring::weak_oops_do to prevent accidental misuse.

                    - in struct StackTraceStorage, the public fields may also need
                    underscores. At least some files in the runtime directory have structs
                    with underscored public members (and some don't). The runtime team
                    should probably comment on that.


            Agreed, I did not know. I looked around and a lot of structs did not seem to have them, so I
            left it as is. I will happily change it if someone prefers it (I was not sure if you really
            preferred it or not; your sentence seemed to be more a note of "this might need to change but I
            don't know if the runtime team enforces that". Let me know if I read that wrongly).



                    - In StackTraceStorage::weak_oops_do(), when examining the
                    StackTraceData, maybe it is useful to consider having a non-NULL
                    reference outside of the heap's reserved space an error. There should
                    be no oop outside of the heap's reserved space ever.

                    Unless you allow storing random values in StackTraceData::obj, which I
                    would not encourage.


            I suppose you are talking about this part:
            if ((value != NULL && Universe::heap()->is_in_reserved(value)) &&
                              (is_alive == NULL || is_alive->do_object_b(value))) {

            What you are saying is that I could have something like:
            if (value != my_non_null_reference &&
                              (is_alive == NULL || is_alive->do_object_b(value))) {

            Is that what you meant? Is there really a reason to do so? When I look at the code,
            is_in_reserved seems like an O(1) method call. I'm not even sure we can have a NULL value, to be
            honest. I might have to study that to see whether this was not a paranoid test to begin with.

            The is_alive code has now morphed due to the comment below.



                    - HeapMonitoring::weak_oops_do() does not seem to use the
                    passed AbstractRefProcTaskExecutor.


            It did use it:
                 size_t HeapMonitoring::weak_oops_do(
                     AbstractRefProcTaskExecutor *task_executor,
                     BoolObjectClosure* is_alive,
                     OopClosure *f,
                     VoidClosure *complete_gc) {
                   assert(SafepointSynchronize::is_at_safepoint(), "must be at safepoint");

                   if (task_executor != NULL) {
                     task_executor->set_single_threaded_mode();
                   }
                   return StackTraceStorage::storage()->weak_oops_do(is_alive, f, complete_gc);
                 }

            But due to the comment below, I refactored this, so it is no longer here. Now I pass an
            always-true closure instead.


                    - I do not understand allowing this method to be called with a NULL
                    complete_gc closure. This would mean that objects referenced from the
                    object that is referenced by the StackTraceData are not pulled, meaning
                    they would get stale.

                    - same with an is_alive parameter value of NULL


            So these questions made me look a bit closer at this code. This code, I think, was written this
            way to have a very small impact on the file, but you are right: there is no reason for this
            here. I've simplified the code by adding a process_HeapSampling method in referenceProcessor.cpp
            that handles everything there.

            The code allowed NULLs because it depended on where you were coming from and how the code was
            being called.

            - I added a static always_true variable and pass that now, to be more consistent with the rest
              of the code.
            - I moved the complete_gc into process_phaseHeapSampling now (new method) and handle the
              task_executor and the complete_gc there
                    - Newbie question: in our code we did a set_single_threaded_mode, but I see that
                      process_phaseJNI does it right before its call; do I need to do it for
                      process_phaseHeapSample?
            That API is much cleaner (in my mind) and is consistent with what is done around it (again, in
            my mind).
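
            For reference, the "always true" closure idea usually looks something like the following
            standalone sketch (the closure interface below is a stand-in for the real BoolObjectClosure,
            and all names are assumptions):

                class oopDesc;
                typedef oopDesc* oop;

                class BoolObjectClosureSketch {
                 public:
                  virtual bool do_object_b(oop obj) = 0;
                };

                // Treats every object as live, so callers no longer need to accept a
                // NULL is_alive closure and special-case it.
                class AlwaysTrueClosureSketch : public BoolObjectClosureSketch {
                 public:
                  bool do_object_b(oop /*obj*/) override { return true; }
                };

                static AlwaysTrueClosureSketch always_true;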

                    - heapMonitoring.cpp:590: I do not completely understand the purpose of
                    this code: in the end this results in a fixed value directly dependent
                    on the Thread address anyway?
                    IOW, what is special about exactly 20 rounds?


            So we really want a fast random number generator that has a specific mean (512k is the default
            we use). The code uses the thread address as the start number of the sequence (why not, it is
            random enough, is the rationale). Then, instead of just starting there, we prime the sequence
            and really only start at the 21st number; it is arbitrary and I have not done a study to see if
            we could do more or less of that.

            As I have the statistics of the system up and running, I'll run some experiments to see if this
            is needed, whether 20 is good, or not.


                    - also I would consider stripping a few bits of the threads' address as
                    the initialization value for your rng. The last three bits (and probably
                    more, check whether the Thread object is allocated on special
                    boundaries) are always zero for them.
                    Not sure if the given "random" value is random enough before/after
                    this method, so just skip that comment if you think this is not
                    required.


            I don't know, is the honest answer. I think what is important is that we tend towards a mean and
            that it is random "enough" to not fall into the pitfall of only sampling a subset of objects due
            to their allocation order. I added a test to do, to see whether it changes the mean in any way
            for the 512k default value and/or whether the first 1000 elements look better.
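
            Purely as an illustration of the scheme being discussed (seed from the thread address with the
            low, always-zero bits stripped, discard the first 20 outputs, then draw intervals with a chosen
            mean), here is a standalone sketch; it is not the webrev's generator and the distribution choice
            is an assumption:

                #include <cstdint>

                class SamplingRngSketch {
                 public:
                  explicit SamplingRngSketch(uintptr_t thread_address) {
                    // Low bits of the address are typically zero (alignment), so shift
                    // them out; OR with 1 so xorshift never sees an all-zero state.
                    _state = (static_cast<uint64_t>(thread_address) >> 3) | 1;
                    for (int i = 0; i < 20; i++) {   // prime the sequence: skip 20 rounds
                      next();
                    }
                  }

                  // xorshift64: cheap and good enough for picking sample intervals.
                  uint64_t next() {
                    _state ^= _state << 13;
                    _state ^= _state >> 7;
                    _state ^= _state << 17;
                    return _state;
                  }

                  // Pick the next sample interval with a given mean (e.g. 512 * 1024).
                  // Uniform in [1, 2 * mean] has the desired mean; the real code may use
                  // an exponential distribution instead.
                  uint64_t next_interval(uint64_t mean_bytes) {
                    return (next() % (2 * mean_bytes)) + 1;
                  }

                 private:
                  uint64_t _state;
                };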


                    Some more random nits I did not find a place to put anywhere:

                    - ThreadLocalAllocBuffer::_extra_space does not seem to be used
                    anywhere?


            Good catch :).


                    - Maybe indent the declaration of ThreadLocalAllocBuffer::_bytes_until_sample to
                    align below the other members of that group.


            Done, moved it up a bit to have non-static members together and static separate.

                    Thanks,
                        Thomas


            Thanks for your review!
            Jc



