List:       jmeter-user
Subject:    Re: Coordinated Omission (CO) - possible strategies
From:       Kirk Pepperdine <kirk.pepperdine () gmail ! com>
Date:       2013-10-19 14:26:04
Message-ID: 62CD9F76-16F4-4662-B01C-549B4C3F4669 () gmail ! com


On 2013-10-19, at 9:56 AM, Gil Tene <gil@azulsystems.com> wrote:

> To focus on the "how to deal with Coordinated Omission" part:
> 
> There are two main ways to deal with CO in your actual executed behavior:
> 
> 1. Change the behavior to avoid CO to begin with. 
> 
> 2. Detect it and correct it.

I'll add "detect and report." I believe there is value beyond "you can't believe
the data": it's telling you that there is a condition you need to eliminate from
your test.
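For the "detect it and correct it" path, Gil's HdrHistogram can back-fill the
samples a stalled thread never issued, given the intended interval between
requests. A minimal sketch (the 10 ms interval and 250 ms stall are made-up
numbers, not from any real run):

    import org.HdrHistogram.Histogram;

    public class CoCorrection {
        public static void main(String[] args) {
            // track latencies up to one hour, 3 significant digits
            Histogram h = new Histogram(3_600_000_000_000L, 3);
            long expectedIntervalNs = 10_000_000L;  // planned: 1 request / 10 ms
            long observedLatencyNs = 250_000_000L;  // one 250 ms stall observed
            // records the observed value plus the samples the stalled thread
            // never issued, so percentiles reflect the intended request rate
            h.recordValueWithExpectedInterval(observedLatencyNs, expectedIntervalNs);
            System.out.println(h.getTotalCount()); // > 1: corrected samples added
        }
    }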
> 
> There is a "detect it and report it" one too, but I don't think it is of any
> real use, as detection without correction will just tell you your data can't be
> believed at all, but won't tell you anything about what can be. Since CO can
> move percentile magnitudes and position by literal multiple orders of magnitude
> (I have multiple measured real-world production behaviors that show this),
> "hoping it is not too bad" when you know it is there amounts to burying your
> head in the sand.
> 
> So Kirk, is the random behavior you need one of random timing, or random
> operation sequencing (or both)?

I need operations to occur at a random interval. That said, the interval is
"random" to the server and *not* to JMeter. JMeter can pre-calculate when certain
events should occur and then detect when it misses that target. The easiest way
to do this is to build an event (sampler??) queue that understands when things
such as the next HTTP sampler should be fired.
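Roughly what I have in mind, as a sketch only (not JMeter code; the class name,
the Poisson pacing, and the ~10/s rate are illustrative assumptions):

    import java.util.Random;

    public class PacedFiring {
        public static void main(String[] args) throws InterruptedException {
            Random rnd = new Random();
            long nextNs = System.nanoTime();
            for (int i = 0; i < 100; i++) {
                // exponential inter-arrival times ~ Poisson arrivals, mean 100 ms
                nextNs += (long) (-Math.log(1.0 - rnd.nextDouble()) * 100_000_000L);
                long lateNs = System.nanoTime() - nextNs;
                if (lateNs > 0) {
                    // missed the pre-calculated fire time: without detection,
                    // this gap silently becomes coordinated omission
                    System.out.printf("missed target by %.1f ms%n", lateNs / 1e6);
                } else {
                    Thread.sleep(-lateNs / 1_000_000L, (int) (-lateNs % 1_000_000L));
                }
                fireSampler(); // stand-in for, e.g., an HTTP sampler
            }
        }

        static void fireSampler() { /* issue the request here */ }
    }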

Regards,
Kirk
 
> 
> Sent from my iPad
> 
> On Oct 18, 2013, at 10:48 PM, "Kirk Pepperdine" <kirk.pepperdine@gmail.com> wrote:
> 
> > 
> > On 2013-10-19, at 1:33 AM, Gil Tene <gil@azulsystems.com> wrote:
> > 
> > > I guess we look at human response back pressure in different ways. It's a
> > > question of whether or not you consider the humans to be part of the system
> > > you are testing, and what you think your stats are supposed to represent.
> > 
> > You've seen my presentations, and so you know that I do believe that human
> > and non-human actors are definitively part of the system. They provide the
> > dynamics for the system being tested. A change in how that layer in my model
> > works can and does make a huge difference in how the other layers work to
> > support the overall system.
> > > 
> > > Some people will take the "forgiving" approach, which considers the client
> > > behavior as part of the overall system behavior. In such an approach, if a
> > > human responded to slow behavior by not asking any more questions for a
> > > while, that's simply what the overall system did, and the stats reported
> > > should reflect only the actual attempts that actual humans would have made,
> > > including their slowing down their requests in response to slow reaction
> > > times.
> > 
> > Sort of. I want to know that a user was inhibited from making forward
> > progress because the previous step in their workflow blew stated tolerances.
> > In some cases I'd like to have that user abandon. I'm not sure I'd call this
> > forgiving, though I am looking to see what the overall system can do to
> > answer the question: is it good enough, and if not, why not?
> > 
> > I'm not going to suggest your view is incorrect. I think it's quite valid. I
> > don't believe the two views are orthogonal; there are elements of both in
> > each. The question here, in more practical terms, is: what needs to be done
> > to reduce the level of CO that currently occurs in JMeter, and how should we
> > react to it? Throwing out entire datasets from runs seems like an academic
> > answer to a more practical question: will our application stand up under
> > load? From my point of view, the goal is for JMeter to better answer that
> > question.
> > > 
> > > A web site being completely down for 5 minutes an hour would generate a
> > > lot of human back pressure response. It may even slow down request rates
> > > so much during the outage that 99%+ of the overall actual requests by end
> > > users during an hour that included such a 5 minute outage would still be
> > > very good. Reporting on those (actual requests by humans) would be very
> > > different from reporting on what would have happened without human back
> > > pressure. But it's easy to examine which of the two reporting methods
> > > would be accepted by a reader of such reports.
> > 
> > But then that 5 minute outage is going to show up somewhere, and if you bury
> > it in how you report... that would seem to be a problem. This whole argument
> > suggests that what you want is a better regime for the treatment of the
> > data. If that is what you're saying, we're in complete agreement. The 5
> > minute pause should not be filtered out of the data!
> > 
> > IMHO, the first thing to do is eliminate or reduce the known sources of CO
> > from JMeter. I'm not sure that tackling the CTT is the best way to go. In
> > fact I'd prefer a combination of approaches that includes things like how
> > jHiccup works with a GC STW detector. As you've mentioned before, even with
> > a fix to the threading model in JMeter, CO will still occur.
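> > 
> > To make the jHiccup reference concrete, a minimal sketch of that idea (not
> > jHiccup's actual code): a thread naps for a fixed interval and records any
> > oversleep, since a stall long enough to delay this thread would have delayed
> > every virtual user in the same way:
> > 
> >     import org.HdrHistogram.Histogram;
> > 
> >     public class HiccupMeter {
> >         public static void main(String[] args) throws InterruptedException {
> >             Histogram hiccups = new Histogram(3_600_000_000_000L, 3);
> >             long intervalNs = 1_000_000L; // intended 1 ms nap
> >             while (!Thread.currentThread().isInterrupted()) {
> >                 long start = System.nanoTime();
> >                 Thread.sleep(1);
> >                 long overNs = (System.nanoTime() - start) - intervalNs;
> >                 if (overNs > 0) {
> >                     // oversleep = a platform stall every thread also felt
> >                     hiccups.recordValue(overNs);
> >                 }
> >             }
> >         }
> >     }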
> > Regards,
> > Kirk
> > 


