List: fedora-directory-devel
Subject: [389-devel] Re: System tap performance results
From: William Brown <william@blackhats.net.au>
Date: 2019-12-11 23:19:57
Message-ID: 4FF5B016-94D1-40CC-8172-924EFE2B7D01@blackhats.net.au
> On 11 Dec 2019, at 20:29, thierry bordaz <tbordaz@redhat.com> wrote:
>
>
>
> On 12/11/19 1:21 AM, William Brown wrote:
> >
> > > On 10 Dec 2019, at 19:15, thierry bordaz <tbordaz@redhat.com> wrote:
> > >
> > > Hi William,
> > >
> > > Thanks for these very interesting results.
> > > It would help if you could provide the stap scripts, so we can be sure of what you are counting as latency.
> > Yes, I plan to put them into a PR soon, once I have done a bit more data collection and polishing of the script setup.
> > > Also just to be sure is latency a synonym for response time ?
> > Yep, here I mean the execution time of a single operation.
> >
> > > Regarding the comparison of tails at 1 client vs 16 clients: it looks to me that the tails are equivalent. The more clients we have, the more long latencies we have, so as a first approach I would eliminate a contention effect.
> > I disagree: the tail is much more pronounced in the 16-client version, and there is a significant shift of response times from the 32768 bucket to 65536, indicating that many operations are now being "held up".
>
> You are probably right.
> There is a response time shift, but I would expect a major contention effect to shift more and flatten the graph toward higher response times. Whatever the answer is, I think an important point is to agree on the method, and that the reports show something suspicious.
In a highly concurrent system like DS we shouldn't be seeing tail latency or spikes; we should be seeing all responses within a really small window, especially when the searches are all the "same query" for a single uid. That's why I'm looking at the spikes first.
>
> >
> > > Regarding the ratio of shorter to longer latencies (assuming the searches are equivalent), it is interesting to know why we have such an effect. One possible cause I was thinking of is the impact of the DB threads (checkpointing or trickling). But even if a similar long tail exists in ldbm_back, its absolute value is much lower than in opshare_search: in short, ldbm_back accounted for at most 4ms while opshared accounted for 67ms. So there is something else (aci, network, frontend..).
> > > Regarding USDT, I think it is a very good idea. However, just to share some recent stap experience, I found it intrusive: in short, I did not get the same throughput with and without it. In my case that was not a problem, as I wanted to investigate a reproducible contention. But if we want support/users/customers to use it, the performance impact in production will be a valid concern.
> > I haven't noticed any "intrusiveness" from USDT so far? How were you running the stap scripts?
> I did not add probes to DS. I was just using stap to gather per-operation durations of specific functions (like plugins, the backend, or the core).
> I do not recall the level of "intrusiveness", as I was looking more for contention data than for performance figures.
Hmmm, okay. I've found stap extremely fast and it doesn't get in the way, so unless I can see your stap scripts I'm not sure...
>
> >
> > > best regards
> > > thierry
> > >
> > >
> > >
> > > On 12/9/19 6:16 AM, William Brown wrote:
> > > > Hi all,
> > > >
> > > > Following last week's flamegraph runs, I wanted to find more detail on exactly what was happening. While the flamegraphs highlighted that a change existed between the single- and multi-threaded servers, they did not help to isolate where the change was occurring.
> > > >
> > > > To understand this I have started to work on a set of systemtap scripts that we can use to profile 389ds. These will be included in a future PR.
> > > > Here are the histograms from an initial test of profiling "do_search".
> > > >
> > > > 1 thread
> > > >
> > > > stap test_do_search.stp
> > > > ^CDistribution of latencies (in nanoseconds) for 441148 samples
> > > > max/avg/min: 35911542/85694/38927
> > > > value |-------------------------------------------------- count
> > > > 8192 | 0
> > > > 16384 | 0
> > > > 32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 167285
> > > > 65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 239280
> > > > 131072 |@@@@@ 25788
> > > > 262144 |@ 8530
> > > > 524288 | 252
> > > > 1048576 | 6
> > > > 2097152 | 1
> > > > 4194304 | 3
> > > > 8388608 | 0
> > > > 16777216 | 2
> > > > 33554432 | 1
> > > > 67108864 | 0
> > > > 134217728 | 0
> > > >
> > > > 16 thread
> > > >
> > > > stap test_do_search.stp
> > > > ^CDistribution of latencies (in nanoseconds) for 407806 samples
> > > > max/avg/min: 100315928/112407/39368
> > > > value |-------------------------------------------------- count
> > > > 8192 | 0
> > > > 16384 | 0
> > > > 32768 |@@@@@@@@@@@@@@@@@@@@ 100281
> > > > 65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 249656
> > > > 131072 |@@@@@@@ 37837
> > > > 262144 |@@@ 18322
> > > > 524288 | 1171
> > > > 1048576 | 203
> > > > 2097152 | 90
> > > > 4194304 | 74
> > > > 8388608 | 83
> > > > 16777216 | 58
> > > > 33554432 | 21
> > > > 67108864 | 10
> > > > 134217728 | 0
> > > > 268435456 | 0
> > > >
> > > >
> > > > It's interesting to note the tail latency here: in the 16-thread version we see 67000 fewer samples in the 32768 bucket, shifting mostly into the 131072 and 262144 buckets, as well as a much greater number of calls in the tail. With 1 thread no operation reached 67108864, but 10 did with 16 threads, along with ~200 more above 1048576 and ~1500 more above 524288. This kind of tailing means we have "spikes" of latency throughout the execution, which then have a minor flow-on effect, causing other operations' latency to increase.
> > > > These are all in nanoseconds, so most operations are in do_search for 0.000131072 seconds or less, while there are spikes of operations taking between nearly 0.001048576 and 0.067108864 seconds to complete (an 8x to 512x increase in execution time).
> > > > From this point I took two measurements, of back_ldbm and the access log, knowing these were easy targets for potential areas of latency. Both were performed against the 16-thread server; I didn't need to repeat them on the 1-thread server because I'm now looking for tail latency, not comparisons. Because we know there is an increase of ~1500 high-latency events, we are looking for similar numbers to appear in other tail latencies.
> > > >
> > > > ^CDistribution of back_ldbm latencies (in nanoseconds) for 364223 samples
> > > > max/avg/min: 4757744/31192/13622
> > > > value |-------------------------------------------------- count
> > > > 2048 | 0
> > > > 4096 | 0
> > > > 8192 |@@ 14318
> > > > 16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 261265
> > > > 32768 |@@@@@@@@@@@@ 63533
> > > > 65536 |@@@ 20624
> > > > 131072 | 4062
> > > > 262144 | 304
> > > > 524288 | 68
> > > > 1048576 | 33
> > > > 2097152 | 15
> > > > 4194304 | 1
> > > > 8388608 | 0
> > > > 16777216 | 0
> > > >
> > > > This shows some tail latency in back_ldbm; however, it's infrequent, only a handful of spikes, and they taper off quickly as they approach 2097152. Certainly these should be looked at, as they will likely add up to affect do_search.
> > > >
> > > > stap -v test_access_log.stp
> > > > ^CDistribution of slapi_log_access latencies (in nanoseconds) for 1284350 samples
> > > > max/avg/min: 23843992/5138/1036
> > > > value |-------------------------------------------------- count
> > > > 256 | 0
> > > > 512 | 0
> > > > 1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 379710
> > > > 2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 361243
> > > > 4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 412480
> > > > 8192 |@@@@@@@@@@ 83815
> > > > 16384 |@@@@ 40087
> > > > 32768 | 5076
> > > > 65536 | 1253
> > > > 131072 | 406
> > > > 262144 | 130
> > > > 524288 | 72
> > > > 1048576 | 33
> > > > 2097152 | 16
> > > > 4194304 | 19
> > > > 8388608 | 1
> > > > 16777216 | 9
> > > > 33554432 | 0
> > > > 67108864 | 0
> > > >
> > > >
> > > > From this, I can see that while some tail latency exists in some conditions of back_ldbm, there is a much longer and more pronounced tail in the access log, with nearly 2000 operations at or exceeding the 65536 mark - remember, in do_search most operations as a whole take 65536, so we have latency spikes in slapi_log_access that are as large as, or larger than, entire search operations.
> > > > Now, we can see "most" operations are firmly below 16384. This is what we want to see, but it's interesting that a slapi_log_access call can take up to a quarter of a whole operation's processing in the average case.
> > > > What's truly telling here is the tail out to 16777216, with ~250 operations exceeding 262144 ns. This shows that within slapi_log_access we have large spiked delays in the behaviour, which will affect the tail latency of do_search as a whole, but also cause other slapi_log_access operations to "hold up".
> > > > My next steps from here will be:
> > > >
> > > > * To add USDT probes to the logs and back_ldbm to get better, finer detail about what is causing the excessive latency.
> > > >   * These probes are also needed to resolve what appears to be a symbol resolution issue with systemtap when optimisations are enabled.
> > > > * To add probes in other parts of the code base to get better visualisation of where, and for how long, time is spent through an operation.
> > > > * To run a lock contention profile (I was unable to do this today due to bugs in systemtap).
> > > > * To document the setup procedures.
> > > > * To commit all of these in a PR.
> > > > * To document some actionable improvements we can make to improve the server performance.
> > > >
> > > > Hope that is interesting to everyone,
> > > >
> > > >
> > > > --
> > > > Sincerely,
> > > >
> > > > William
> > --
> > Sincerely,
> >
> > William
> >
--
Sincerely,
William
_______________________________________________
389-devel mailing list -- 389-devel@lists.fedoraproject.org
To unsubscribe send an email to 389-devel-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject.org