'RE: Weird IdP Hanging Issue'


List:       shibboleth-users
Subject:    RE: Weird IdP Hanging Issue
From:       "Royder, Kyle D" <kroyder@austin.utexas.edu>
Date:       2013-03-26 20:30:17
Message-ID: FCB0A8ABF711AA479956F6AC25A01D4D4BA378E3@EXMBX03.austin.utexas.edu

I just wanted to follow up on our solution for this.  Still being new to the admin
role, I hadn't yet performed a thread dump of Tomcat using kill -3 on the process.
After doing so and looking at catalina.out, I was able to determine that the
mechanism configured in logging.conf to send emails whenever errors were logged was
failing.  As a result, SMTPAppender threads were being created and never cleared,
which caused Tomcat to max out on open threads.

-Kyle
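
For anyone repeating this diagnosis, here is a minimal Python sketch of the
"look through the thread dump" step. It assumes the dump uses the usual HotSpot
layout, where each thread starts with a line beginning with its quoted name. The
thread names in the comments are hypothetical examples, not names taken from the
actual dump, and count_thread_groups is just an illustrative helper.

```python
import re
from collections import Counter

# Matches the header line of each thread in a HotSpot-style thread dump, e.g.
#   "http-8080-12" daemon prio=10 tid=0x... nid=0x... runnable [0x...]
# (assumed layout; stack-frame lines are indented and do not start with a quote)
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def count_thread_groups(dump_text):
    """Count threads per name group, ignoring one trailing numeric suffix."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = THREAD_HEADER.match(line)
        if match:
            # Hypothetical example: "SMTPAppender-17" and "SMTPAppender-18"
            # both land in the group "SMTPAppender".
            group = re.sub(r'[-#\s]*\d+$', '', match.group("name"))
            counts[group] += 1
    return counts
```

Feeding it the contents of catalina.out and printing counts.most_common(10)
should make a leaking group stand out, since its count keeps growing between
successive dumps while the others stay roughly flat.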


-----Original Message-----
From: users-bounces@shibboleth.net [mailto:users-bounces@shibboleth.net] On Behalf Of Russell Beall
Sent: Friday, March 22, 2013 1:35 PM
To: Shib Users
Subject: Re: Weird IdP Hanging Issue

Yep, looks like you have more than enough memory.

In days long past when our LDAP servers were occasionally overloaded, the
Shibboleth nodes would start to backlog requests and this would make it look like
it was hanging.  Are your data servers overloaded?

I haven't played with the connection pool.  The only setting we have for it is set
directly in the data connector, and it just sets the pool limit to 100.

Regards,
Russ.

On Mar 22, 2013, at 10:58 AM, "Royder, Kyle D" <kroyder@austin.utexas.edu>
 wrote:

> Thanks Russ.  
> 
> We're currently running with "-Xms512m -Xmx2048m -XX:+DisableExplicitGC
> -XX:MaxPermSize=1024m".  We have plenty of free memory so I could try doubling
> this, but I'll check the garbage collection with debugging on.  Thanks for the
> tip there.  It would be nice to see something wrong show up so I know what I'm
> dealing with.
>
> One change I have made is adding connection pool options as described on
> https://wiki.shibboleth.net/confluence/display/SHIB2/ResolverLDAPDataConnector.
>
> The thing that has me concerned is the mention of the default options:
>
> blockWhenEmpty: Whether to wait for an available connection when the entire
> pool is in use, default is true. If set to false then the number of connections
> can grow beyond maxPoolSize.
>
> blockWaitTime: Length of time to wait, given in XML duration notation, on the
> pool if blockWhenEmpty is true and the pool is empty. Default value is to wait
> indefinitely.
>
> I'm wondering if there are connections that get tied up and then everything gets
> held up waiting on one of the 3 default connections to become available.  I've
> made a change to test it out by setting blockWhenEmpty to false.  There's
> probably a better way to set up the connection pool options but I'm wondering if
> this has anything to do with it.  It's only been a couple of hours since I made
> this change but so far so good.  That doesn't really mean anything though. :)
>
> Does anyone know if the connection pool defaults are all in place even if you
> are explicitly using the <ConnectionPool /> definition?
>
> Thanks,
> Kyle
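
To make the concern about blockWhenEmpty and blockWaitTime concrete, here is a toy
Python model of a bounded pool. It is only a sketch of the semantics described in
the wiki text quoted above, not the IdP's actual LDAP pool code. ToyPool and its
method names are hypothetical, and the wait time is given in seconds rather than
XML duration notation.

```python
import queue


class ToyPool:
    """Toy bounded pool; NOT the IdP's real LDAP connection pool."""

    def __init__(self, max_pool_size, block_when_empty=True, block_wait_time=None):
        self.block_when_empty = block_when_empty
        self.block_wait_time = block_wait_time  # seconds; None means wait forever
        self.created = max_pool_size            # total connections ever created
        self._idle = queue.Queue()
        for i in range(max_pool_size):
            self._idle.put(f"conn-{i}")

    def check_out(self):
        try:
            if not self.block_when_empty:
                return self._idle.get_nowait()
            # With block_wait_time=None this blocks until someone checks a
            # connection back in; if nobody ever does, the caller hangs.
            return self._idle.get(timeout=self.block_wait_time)
        except queue.Empty:
            if self.block_when_empty:
                raise TimeoutError("no connection became available in time")
            # Non-blocking mode: grow past max_pool_size instead of waiting.
            self.created += 1
            return f"conn-{self.created - 1}"

    def check_in(self, conn):
        self._idle.put(conn)
```

With ToyPool(3) and three connections that are never checked back in, a fourth
check_out() waits forever, which from the outside would look like a hang. Passing
a finite block_wait_time turns that into a visible error, and
block_when_empty=False trades the wait for unbounded growth.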
> 
> -----Original Message-----
> From: users-bounces@shibboleth.net [mailto:users-bounces@shibboleth.net] On Behalf Of Russell Beall
> Sent: Friday, March 22, 2013 12:29 PM
> To: Shib Users
> Subject: Re: Weird IdP Hanging Issue
> 
> I'm just wondering if you provided any extra memory to the Tomcat process.  You
> might be hitting your memory limits and causing Tomcat to go into regular Full GC
> cycles.  This is usually what I have seen cause the behavior you described.  It
> would be useful to add debug logging to the Tomcat process which will print
> garbage collection details.  For instance, I use these in my JAVA_OPTS:
>
> -verbose:gc 
> -XX:+PrintGCTimeStamps 
> -XX:-TraceClassUnloading 
> 
> Also, regarding the resource reloading polling frequency, I have those set at one
> minute, but the reload is never invoked unless there is a change.  It should be
> safe to maintain a short interval on that reload check.  If you remove the reload
> interval, then I believe it won't ever check and you would have to restart the
> IdP to get it to load the change.  That might be fine for the relying-party.xml
> file which Chad said shouldn't really be reloaded on the fly, but others that get
> changed more frequently and are safe should be reloadable, such as the
> attribute-filter.xml that was discussed already.
>
> Regards,
> Russ.
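
The "poll often, reload only on change" behavior described above can be sketched in
Python as follows. This assumes the change is detected by comparing the file's
modification time, which is a guess for illustration and not something confirmed
in this thread about the IdP. ReloadableConfig and loader are hypothetical names.

```python
import os


class ReloadableConfig:
    """Reloads a file only when its modification time has changed."""

    def __init__(self, path, loader):
        self.path = path
        self.loader = loader        # callable that parses the file at `path`
        self.value = None
        self.reload_count = 0
        self._last_mtime = None
        self.poll()                 # initial load

    def poll(self):
        """Run once per polling interval; returns True if a reload happened."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self._last_mtime:
            return False            # unchanged: the check costs one stat() call
        self.value = self.loader(self.path)
        self._last_mtime = mtime
        self.reload_count += 1
        return True
```

Under that assumption a one-minute interval is cheap, because an unchanged file
costs only the stat() call and the parse happens just once per actual edit.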
> 
> On Mar 21, 2013, at 11:11 AM, "Royder, Kyle D" <kroyder@austin.utexas.edu> wrote:
> 
> > Thanks for the help!  I'll turn up our LDAP logging and check the LDAP logs as
> > well, and turn off configuration reloading.  We have a pool of IdPs and like to
> > reboot them one at a time anyway, so that as a safety precaution we can bring
> > the one with changes back down if there is an issue and keep the service up.
> > If this is our change policy, I don't think configuration reloading will gain
> > us anything the way we are currently doing things.
> >
> > Thanks,
> > Kyle
> > 
> > -----Original Message-----
> > From: users-bounces@shibboleth.net [mailto:users-bounces@shibboleth.net] On Behalf Of Cantor, Scott
> > Sent: Thursday, March 21, 2013 1:04 PM
> > To: Shib Users
> > Subject: RE: Weird IdP Hanging Issue
> > 
> > > I haven't had to mess with this before, but reading about it on the wiki, I'm
> > > assuming you're referring to the configurationResourcePollingFrequency
> > > attribute added to one of the four reloadable services in service.xml?
> > 
> > Yes.
> > 
> > > If so, it does look like this was set up to reload all four every 1 minute?
> > 
> > Yikes.
> > 
> > > I don't want this turned on so I'm going to remove the
> > > configurationResourcePollingFrequency attribute from all four of these.
> > 
> > You may well want the filter policy reloading, but probably not every minute.
> > 
> > > Hopefully this will resolve this issue.  The weird thing is that none of these
> > > configs have been changed in a couple of weeks and this just started over the
> > > past couple of days.
> > 
> > Agreed, I didn't necessarily think it was the cause, but there's an explicit
> > bug in relying-party reloading, though I think that actually hangs hard.
> >
> > I would have to think your data connectors are the underlying cause here. If
> > it's LDAP, I'd probably suggest logging more there.
> >
> > -- Scott
> > 
> > 
> > --
> > To unsubscribe from this list send an email to users-unsubscribe@shibboleth.net
> > 
> 
> 
> 



