List: shibboleth-users
Subject: Re: General Guidance on IdP Environment Sizing
From: "Cantor, Scott" <cantor.2 () osu ! edu>
Date: 2018-09-28 17:37:23
Message-ID: C75C6C63-8D00-442B-BEF8-017668AC0EA8 () osu ! edu
On 9/28/18, 1:09 PM, "users on behalf of Paul Fardy" <users-bounces@shibboleth.net on behalf of paul.fardy@utoronto.ca> wrote:
> Could you expand on this?
With the caveat that I consider every second spent thinking about this sort of thing to be wasted time, and that I know very little about it: the IdP scales linearly. So get a decent load balancer, add servers, declare victory, and move on. Time is expensive; servers aren't.
> This isn't our experience. Our IdP spends most of its time awaiting LDAP and
> Kerberos queries. CPU utilization is low: consistently less than 10% usage of a
> 4-CPU VM, with a few peaks of 20%. We increased CPUs for the load. We can't
> determine if there was any benefit.
You can probably expand your pool sizes then, or your CPUs are incredibly powerful and you should see great performance. My VM is constantly pegged because it's so underpowered. A loaded IdP handling lots of traffic is going to be at 100% across any CPUs it can hit, and if it's not pegged, it's throttled by other limitations. My physical box is never loaded, but it spikes routinely to 500-600%.
> The time taken to serve the request, in seconds.
SSO requests should finish in the aggregate in under a second unless your back-end and overall architecture is too slow. There are noisy exceptions in the data, but a monthly 95th percentile should certainly be under 1 second.
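As a rough illustration of that yardstick (the function name and sample numbers are mine, not from the IdP), a nearest-rank 95th percentile over logged request durations can be computed like this:

```python
def p95(durations):
    """Nearest-rank 95th percentile of request durations in seconds."""
    ordered = sorted(durations)
    idx = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[idx]

# 90% fast requests, a small slow band, and a few pathological outliers:
sample = [0.2] * 90 + [0.8] * 5 + [3.0] * 5
print(p95(sample))  # 0.8 -- the worst 5% sits above the 95th percentile
```

If the number this reports creeps over a second, the noisy outliers aren't the problem; the typical request path is.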
If it's not busy with CPU and is not getting in and out, then your pools are too small and it's stuck waiting for connections, or your back end is too slow.
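For reference, in IdP 3 the LDAP connection pool is sized in conf/ldap.properties. The property names below match the distributed defaults, but the values are purely illustrative, not recommendations:

```properties
# conf/ldap.properties -- illustrative values only
idp.pool.LDAP.minSize = 5
idp.pool.LDAP.maxSize = 20
idp.pool.LDAP.validateOnCheckout = false
idp.pool.LDAP.validatePeriodically = true
idp.pool.LDAP.validatePeriod = 300
```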
> This has helped find SLOW responses, though it cannot help tune for fast responses.
> The granularity is seconds. It would be great if we could log LDAP latency to the
> IdP logs.
You can use the metrics support to take all sorts of timings that would at least approximate specific LDAP behavior, at least for attribute resolution. I've never had to bother, but it's there.
> Initially, we weren't even re-using LDAP connections. So every authn and every
> attribute query included connect, bind, search, close.
That's certainly a killer. The complexity of pooling authentication in LDAP is one big reason I use the Kerberos protocol with AD whenever I can.
> I think the IdP software's internal issue would be JVM multi-threading. Could
> context switching be slow?
If it saturates, then it will thrash, and that's usually easy to see when it happens: performance falls off a cliff. I haven't heard of that in a long time.
> Can we determine if or when it's overloaded?
You said it's not overloaded.
> I increased CPUs. But that's really speculating that the JVM needs help. If
> it's mostly waiting, it doesn't need more hardware. Our system isn't using it. I
> just wonder if it might need more.
If it's waiting for anything, it needs a new architecture to get its work done, and no, hardware won't help.
> Sizing memory is also an issue. We increased our RAM before a student registration
> load day and we tuned Tomcat: we increased the thread pool, ... which immediately
> increased memory usage, but was it usage or wastage?
Memory is virtually all for metadata until people stop using huge batches, and then it's pretty irrelevant.
> setting stack size to 384K
That seems incredibly oversized, but I haven't ever looked at it.
> We found our bottlenecks in LDAP latency. One box runs well for us.
That sounds like an LDAP problem to me, which fits everything else you're saying.
-- Scott
--
For Consortium Member technical support, see https://wiki.shibboleth.net/confluence/x/coFAAg
To unsubscribe from this list send an email to users-unsubscribe@shibboleth.net