List: shibboleth-users
Subject: Re: General Guidance on IdP Environment Sizing
From: "Cantor, Scott" <cantor.2 () osu ! edu>
Date: 2018-09-28 17:37:23
Message-ID: C75C6C63-8D00-442B-BEF8-017668AC0EA8 () osu ! edu
On 9/28/18, 1:09 PM, "users on behalf of Paul Fardy" <users-bounces@shibboleth.net on behalf of paul.fardy@utoronto.ca> wrote:
> Could you expand on this?
With the caveat that I consider every second spent thinking about this sort of thing to be wasted time, and that I know very little about it: the IdP scales linearly. So get a decent load balancer, add servers, declare victory, and move on. Time is expensive; servers aren't.
> This isn't our experience. Our IdP spends most of its time awaiting LDAP and
> Kerberos queries. CPU utilization is low: consistently less than 10% usage of a
> 4-CPU VM, with a few peaks of 20%. We increased CPUs for the load. We can't
> determine if there was any benefit.
You can probably expand your pool sizes then, or your CPUs are incredibly powerful and you should see great performance. My VM is constantly pegged because it's so underpowered. A loaded IdP handling lots of traffic is going to be at 100% across any CPUs it can hit, and if it's not pegged, it's throttled by other limitations. My physical box is never loaded, but it spikes routinely to 500-600%.
> The time taken to serve the request, in seconds.
SSO requests should finish in the aggregate in under a second unless your back-end and overall architecture is too slow. There are noisy exceptions in the data, but a monthly 95th percentile should certainly be under 1 second.
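As a rough illustration of that yardstick (the function name and sample numbers are mine, not from the IdP), a nearest-rank 95th percentile over logged request durations can be computed like this:

```python
def p95(durations):
    """Nearest-rank 95th percentile of request durations in seconds."""
    ordered = sorted(durations)
    idx = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[idx]

# 90% fast requests, a small slow band, and a few pathological outliers:
sample = [0.2] * 90 + [0.8] * 5 + [3.0] * 5
print(p95(sample))  # 0.8 -- the worst 5% sits above the 95th percentile
```

If the number this reports creeps over a second, the noisy outliers aren't the problem; the typical request path is.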
If it's not busy with CPU and is not getting in and out, then your pools are too small and it's stuck waiting for connections, or your back end is too slow.
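For reference, in IdP 3 the LDAP connection pool is sized in conf/ldap.properties. The property names below match the distributed defaults, but the values are purely illustrative, not recommendations:

```properties
# conf/ldap.properties -- illustrative values only
idp.pool.LDAP.minSize = 5
idp.pool.LDAP.maxSize = 20
idp.pool.LDAP.validateOnCheckout = false
idp.pool.LDAP.validatePeriodically = true
idp.pool.LDAP.validatePeriod = 300
```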
> This has helped find SLOW responses, though it cannot help tune for fast responses.
> The granularity is seconds. It would be great if we could log LDAP latency to the
> IdP logs.
You can use the metrics support to take all sorts of timings that would at least approximate specific LDAP behavior, at least for attribute resolution. I've never had to bother, but it's there.
> Initially, we weren't even re-using LDAP connections. So every authn and every
> attribute query included connect, bind, search, close.
That's certainly a killer. The complexity of pooling authentication in LDAP is one big reason I use the Kerberos protocol with AD whenever I can.
> I think the IdP software's internal issue would be JVM multi-threading. Could
> context switching be slow?
If it saturates, then it will thrash, and that's usually easy to see when it happens: performance falls off a cliff. I haven't heard of that in a long time.
> Can we determine if or when it's overloaded?
You said it's not overloaded.
> I increased CPUs. But that's really speculating that the JVM needs help. If
> it's mostly waiting, it doesn't need more hardware. Our system isn't using it. I
> just wonder if it might need more.
If it's waiting for anything, it needs a new architecture to get its work done, and no, hardware won't help.
> Sizing memory is also an issue. We increased our RAM before a student registration
> load day and we tuned Tomcat: we increased the thread pool, ... which immediately
> increased memory usage, but was it usage or wastage?
Memory is virtually all for metadata until people stop using huge batches, and then it's pretty irrelevant.
> setting stack size to 384K
That seems incredibly oversized, but I haven't ever looked at it.
> We found our bottlenecks in LDAP latency. One box runs well for us.
That sounds like an LDAP problem to me, which fits everything else you're saying.
-- Scott
--
For Consortium Member technical support, see https://wiki.shibboleth.net/confluence/x/coFAAg
To unsubscribe from this list send an email to users-unsubscribe@shibboleth.net