List: fedora-directory-devel
Subject: [389-devel] Re: System tap performance results
From: William Brown <william@blackhats.net.au>
Date: 2019-12-11 23:19:57
Message-ID: 4FF5B016-94D1-40CC-8172-924EFE2B7D01@blackhats.net.au
> On 11 Dec 2019, at 20:29, thierry bordaz <tbordaz@redhat.com> wrote:
>
>
>
> On 12/11/19 1:21 AM, William Brown wrote:
> >
> > > On 10 Dec 2019, at 19:15, thierry bordaz <tbordaz@redhat.com> wrote:
> > >
> > > Hi William,
> > >
> > > Thanks for these very interesting results.
> > > It would help if you could provide the stap scripts, so we can be sure of what you are counting as latency.
> > Yes, I plan to put them into a PR soon, once I have done a bit more data collection and polishing of the script setup.
> > > Also just to be sure is latency a synonym for response time ?
> > Yep, here I mean the execution time of a single operation.
> >
> > > Regarding the comparison of tails at 1 client vs 16 clients: it looks to me that the tails are equivalent. The more clients we have, the more long latencies we have, so as a first approach I would eliminate a contention effect.
> > I disagree: the tail is much more pronounced in the 16-client version, and there is a significant shift of response times from the 32768 bucket to 65536, indicating that many operations are now being "held up".
>
> You are probably right.
> There is a response time shift, but I would expect a major contention effect to shift more and flatten the graph toward higher response times. Whatever the answer is, I think an important point is to agree on the method, and that the reports show something suspicious.
In a highly concurrent system like DS we shouldn't be seeing tail latency or spikes; we should be seeing all responses within a really small window, especially when the searches are all the "same query" for a single uid. That's why I'm looking at the spikes first.
>
> >
> > > Regarding the ratio of shorter to longer latencies (assuming the searches are equivalent), it is interesting to know why we have such an effect. One possible cause I was thinking of is the impact of the DB threads (checkpointing or trickling). But even if a similar long tail exists in ldbm_back, its absolute value is much lower than in opshare_search: in short, ldbm_back accounted for at most 4ms while opshared accounted for 67ms. So there is something else (aci, network, frontend..).
> > > Regarding USDT, I think it is a very good idea. However, just to share some recent stap experience, I found it intrusive: in short, I did not get the same throughput with and without it. In my case that was not a problem, as I wanted to investigate a reproducible contention. But if we want support/users/customers to use it, the performance impact in production will be a valid concern.
> > I haven't noticed any "intrusiveness" from USDT so far? How were you running the stap scripts?
> I did not add probes to DS. I was just using stap to gather per-operation durations of specific functions (like plugins, the backend, or the core).
> I do not recall the level of "intrusiveness", as I was looking more for contention data than for performance figures.
Hmmm, okay. I've found stap extremely fast and it doesn't get in the way, so unless I can see your stap scripts I'm not sure...
>
> >
> > > best regards
> > > thierry
> > >
> > >
> > >
> > > On 12/9/19 6:16 AM, William Brown wrote:
> > > > Hi all,
> > > >
> > > > Following last week's flamegraph runs, I wanted to find more detail on exactly what was happening. While the flamegraphs highlighted that a change existed between the single- and multi-threaded servers, they did not help to isolate where the change was occurring.
> > > >
> > > > To understand this I have started to work on a set of systemtap scripts that we can use to profile 389ds. These will be included in a future PR.
> > > > Here are the histograms from an initial test of profiling "do_search".
> > > >
> > > > 1 thread
> > > >
> > > > stap test_do_search.stp
> > > > ^CDistribution of latencies (in nanoseconds) for 441148 samples
> > > > max/avg/min: 35911542/85694/38927
> > > > value |-------------------------------------------------- count
> > > > 8192 | 0
> > > > 16384 | 0
> > > > 32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 167285
> > > > 65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 239280
> > > > 131072 |@@@@@ 25788
> > > > 262144 |@ 8530
> > > > 524288 | 252
> > > > 1048576 | 6
> > > > 2097152 | 1
> > > > 4194304 | 3
> > > > 8388608 | 0
> > > > 16777216 | 2
> > > > 33554432 | 1
> > > > 67108864 | 0
> > > > 134217728 | 0
> > > >
> > > > 16 thread
> > > >
> > > > stap test_do_search.stp
> > > > ^CDistribution of latencies (in nanoseconds) for 407806 samples
> > > > max/avg/min: 100315928/112407/39368
> > > > value |-------------------------------------------------- count
> > > > 8192 | 0
> > > > 16384 | 0
> > > > 32768 |@@@@@@@@@@@@@@@@@@@@ 100281
> > > > 65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 249656
> > > > 131072 |@@@@@@@ 37837
> > > > 262144 |@@@ 18322
> > > > 524288 | 1171
> > > > 1048576 | 203
> > > > 2097152 | 90
> > > > 4194304 | 74
> > > > 8388608 | 83
> > > > 16777216 | 58
> > > > 33554432 | 21
> > > > 67108864 | 10
> > > > 134217728 | 0
> > > > 268435456 | 0
> > > >
> > > >
> > > > It's interesting to note the tail latency here: in the 16-thread version we see 67000 fewer samples in the 32768 bucket, shifting mostly into the 131072 and 262144 buckets, as well as a much greater number of calls in the tail. With 1 thread no operation reached 67108864, but 10 did with 16 threads, along with ~200 more above 1048576 and ~1500 more above 524288. This kind of tailing means we have "spikes" of latency throughout the execution, which then have a minor flow-on effect, causing other operations' latency to increase.
> > > > These are all in nanoseconds, so most operations are in do_search for 0.000131072 seconds or less, while there are spikes of operations taking between nearly 0.001048576 and 0.067108864 seconds to complete (an 8x to 512x increase in execution time).
> > > > From this point I took two measurements, of back_ldbm and the access log, knowing these were easy targets for potential areas of latency. Both were performed against the 16-thread server; I didn't need to repeat them on the 1-thread server because I'm now looking for tail latency, not comparisons. Because we know there is an increase of ~1500 high-latency events, we are looking for similar numbers to appear in other tail latencies.
> > > >
> > > > ^CDistribution of back_ldbm latencies (in nanoseconds) for 364223 samples
> > > > max/avg/min: 4757744/31192/13622
> > > > value |-------------------------------------------------- count
> > > > 2048 | 0
> > > > 4096 | 0
> > > > 8192 |@@ 14318
> > > > 16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 261265
> > > > 32768 |@@@@@@@@@@@@ 63533
> > > > 65536 |@@@ 20624
> > > > 131072 | 4062
> > > > 262144 | 304
> > > > 524288 | 68
> > > > 1048576 | 33
> > > > 2097152 | 15
> > > > 4194304 | 1
> > > > 8388608 | 0
> > > > 16777216 | 0
> > > >
> > > > This shows some tail latency in back_ldbm; however, it's infrequent, only a handful of spikes, and they taper off quickly as they approach 2097152. Certainly these should be looked at, as they will likely add up to affect do_search.
> > > >
> > > > stap -v test_access_log.stp
> > > > ^CDistribution of slapi_log_access latencies (in nanoseconds) for 1284350 samples
> > > > max/avg/min: 23843992/5138/1036
> > > > value |-------------------------------------------------- count
> > > > 256 | 0
> > > > 512 | 0
> > > > 1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 379710
> > > > 2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 361243
> > > > 4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 412480
> > > > 8192 |@@@@@@@@@@ 83815
> > > > 16384 |@@@@ 40087
> > > > 32768 | 5076
> > > > 65536 | 1253
> > > > 131072 | 406
> > > > 262144 | 130
> > > > 524288 | 72
> > > > 1048576 | 33
> > > > 2097152 | 16
> > > > 4194304 | 19
> > > > 8388608 | 1
> > > > 16777216 | 9
> > > > 33554432 | 0
> > > > 67108864 | 0
> > > >
> > > >
> > > > From this, I can see that while some tail latency exists in some conditions of back_ldbm, there is a much longer and more pronounced tail in the access log, with nearly 2000 operations at or exceeding the 65536 mark - remember, in do_search most operations as a whole take 65536, so we have latency spikes in slapi_log_access that are as large as, or larger than, entire search operations.
> > > > Now, we can see "most" operations are firmly below 16384. This is what we want to see, but it's interesting that a slapi_log_access call can take up to a quarter of a whole operation's processing in the average case.
> > > > What's truly telling here is the tail out to 16777216, with ~250 operations exceeding 262144 ns. This shows that within slapi_log_access we have large spiked delays in the behaviour, which will affect the tail latency of do_search as a whole, but also cause other slapi_log_access operations to "hold up".
> > > > My next steps from here will be:
> > > >
> > > > * To add USDT probes to the logs and back_ldbm to get better, finer detail about what is causing the excessive latency.
> > > >   * These probes are also needed to resolve what appears to be a symbol resolution issue with systemtap when optimisations are enabled.
> > > > * To add probes in other parts of the code base to get better visualisation of where, and for how long, time is spent through an operation.
> > > > * To run a lock contention profile (I was unable to do this today due to bugs in systemtap).
> > > > * To document the setup procedures.
> > > > * To commit all of these in a PR.
> > > > * To document some actionable improvements we can make to improve the server performance.
> > > >
> > > > Hope that is interesting to everyone,
> > > >
> > > >
> > > > --
> > > > Sincerely,
> > > >
> > > > William
> > --
> > Sincerely,
> >
> > William
> >
--
Sincerely,
William
_______________________________________________
389-devel mailing list -- 389-devel@lists.fedoraproject.org
To unsubscribe send an email to 389-devel-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject.org