List:       forgerock-opendj
Subject:    [Opendj] Debugging LDAP Client Timeouts
From:       mavricknzwork () yahoo ! com (Matt Stevenson)
Date:       2011-09-21 20:08:05
Message-ID: 1316635685.14606.YahooMailNeo () web161805 ! mail ! bf1 ! yahoo ! com

Some ideas.

> Still wrestling with this. I have determined that when this locks up
> the Solaris LDAP client, the packet trace shows an LDAP query packet
> being retransmitted 10 times to the server with no ack.

I think TCP listen-queue overflows really do show up as drops, with no ACK from the server.

On the LDAP server, raise the OS listen-queue limit. On Solaris, use ndd to set tcp_conn_req_max_q0 and tcp_conn_req_max_q; on Linux:

echo 2048 > /proc/sys/net/core/somaxconn  # then add to system startup
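Why the kernel setting matters as well as the OpenDJ one: on Linux, listen(2) silently caps any requested backlog at net.core.somaxconn, so ds-cfg-accept-backlog: 2048 only takes full effect once somaxconn is at least that high. A minimal Python sketch of that check; the function names are illustrative, not part of OpenDJ or any tool.

```python
from pathlib import Path

# Linux-only path; on Solaris the equivalent limits are the ndd tunables
# tcp_conn_req_max_q / tcp_conn_req_max_q0 mentioned above.
SOMAXCONN_PATH = Path("/proc/sys/net/core/somaxconn")


def read_somaxconn(path=SOMAXCONN_PATH):
    """Return the kernel's cap on listen() backlogs, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def effective_backlog(requested, somaxconn):
    """Linux silently truncates a listen() backlog to net.core.somaxconn."""
    if somaxconn is None:  # unknown cap: report what was asked for
        return requested
    return min(requested, somaxconn)
```

For example, effective_backlog(2048, read_somaxconn()) shows what a 2048 accept backlog (matching ds-cfg-accept-backlog: 2048 in the OpenDJ settings) actually becomes on the running box.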

In OpenDJ set...

dn: cn=LDAP Connection Handler,cn=Connection Handlers,cn=config
ds-cfg-accept-backlog: 2048
# Not sure this helps much with 2 CPUs:
ds-cfg-num-request-handlers: 4


dn: cn=LDAPS Connection Handler,cn=Connection Handlers,cn=config
ds-cfg-accept-backlog: 2048
ds-cfg-num-request-handlers: 4
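The same two properties can also be set online with dsconfig, the tool used elsewhere in this thread for db-cache-percent. A hedged Python sketch that only builds the dsconfig argument lists (and runs them when dry_run=False); the admin port 4444, the Directory Manager bind DN and the password-file path are assumptions about a typical install, and the helper names are hypothetical.

```python
import shlex
import subprocess

# Handler names as they appear in the cn=config DNs above.
HANDLERS = ("LDAP Connection Handler", "LDAPS Connection Handler")


def backlog_command(handler, backlog=2048, request_handlers=4,
                    host="localhost", admin_port=4444,
                    bind_dn="cn=Directory Manager",
                    password_file="/path/to/dm-password"):
    """Build (but do not run) a dsconfig call mirroring the LDIF settings."""
    return [
        "dsconfig", "set-connection-handler-prop",
        "--hostname", host,
        "--port", str(admin_port),            # assumed admin connector port
        "--bindDN", bind_dn,
        "--bindPasswordFile", password_file,  # hypothetical path
        "--trustAll",                         # skips cert checks: lab use only
        "--no-prompt",
        "--handler-name", handler,
        "--set", f"accept-backlog:{backlog}",
        "--set", f"num-request-handlers:{request_handlers}",
    ]


def apply_all(dry_run=True, **kwargs):
    """Print (dry run) or execute the command for every handler."""
    for handler in HANDLERS:
        cmd = backlog_command(handler, **kwargs)
        if dry_run:
            print(shlex.join(cmd))  # shell-quoted, since names contain spaces
        else:
            subprocess.run(cmd, check=True)
```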

Make sure the max file descriptor limit is high, as each socket is a file, as is each
open DB file, library, etc. You can check usage with lsof or in /proc/<pid>/fd.
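A minimal Python sketch of the /proc check, assuming Linux and permission to read the server's /proc entries: it counts the entries in /proc/<pid>/fd and compares them with the soft "Max open files" limit the kernel reports in /proc/<pid>/limits. Pass OpenDJ's PID (e.g. from pgrep); the helper names are illustrative.

```python
import os


def open_fd_count(pid):
    """Open descriptors (sockets, DB files, jars, ...) via /proc/<pid>/fd."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def max_open_files(pid):
    """Soft 'Max open files' limit from /proc/<pid>/limits (None = unlimited)."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                soft = line.split()[3]  # columns: name(3 words), soft, hard, units
                return None if soft == "unlimited" else int(soft)
    raise ValueError("no 'Max open files' line found")


def fd_headroom(pid):
    """Return (used, limit, fraction used); limit None means unlimited."""
    used = open_fd_count(pid)
    limit = max_open_files(pid)
    return used, limit, (used / limit if limit else 0.0)
```

A fraction creeping towards 1.0 under load would point at the descriptor limit rather than the network.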

> Java version is 1.6.0_24.

Less important: _24 has a bug in the CMS collector that causes promotion failures
after a while (maybe a week). Each one causes a long full GC, then things are normal
again for a while (a week or so, depending on load and tuning). Fixed in _26.
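A small Python sketch for checking a Java 6 version string against the fix mentioned above. It assumes every 1.6.0 update before _26 is affected, which goes beyond what is stated here (only _24 is named), so treat a True result as "worth upgrading" rather than proof of the bug.

```python
import re

# Per this thread: the CMS promotion-failure bug is in 1.6.0_24, fixed in _26.
FIXED_UPDATE = 26


def parse_java6_update(version):
    """Update number from '1.6.0_24' or '1.6.0_24-b07', else None."""
    m = re.fullmatch(r"1\.6\.0_(\d+)(?:-\S+)?", version.strip())
    return int(m.group(1)) if m else None


def has_cms_promotion_bug(version):
    """True for 1.6.0 updates before _26 (assumption), None if not Java 6."""
    update = parse_java6_update(version)
    if update is None:
        return None
    return update < FIXED_UPDATE
```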

> -Dsun.security.ssl.allowLegacyHelloMessages=true
> -Dsun.security.ssl.allowUnsafeRenegotiation=true

These let old SSL stacks do renegotiation, which has some security implications.
You'll have to read up on this one ;)

Regards
Matt

________________________________
From: Jason J. W. Williams <jasonjwwilliams at gmail.com>
To: Matt Stevenson <mavricknzwork at yahoo.com>; OpenDJ discussion list <opendj at forgerock.org>
Sent: Wednesday, September 21, 2011 8:36 PM
Subject: Re: [Opendj] Debugging LDAP Client Timeouts

Hi Matt,


> I don't know if you got to the bottom of this but a few suggestions.

Still wrestling with this. I have determined that when this locks up
the Solaris LDAP client, the packet trace shows an LDAP query packet
being retransmitted 10 times to the server with no ack. However, the
NICs on the client, server and switch show no errors so I'm a bit
befuddled as to what is causing this intermittent packet loss. The
irony being if it reconnects to the same LDAP server after this event
everything is hunky dory again.

> Make sure OpenDJ can use a lot of file descriptors (ulimit -Hn in bash);
> something like 65k is fine (overkill), 1024 is not.

That may be an issue here. We always ran OpenLDAP on the same box
without touching the ulimit, but it has crossed my mind that this could
be the problem. How it would stop the OS network stack from returning
an ACK doesn't make sense to me, though.
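The ulimit in question is a per-process resource limit: an unprivileged process may raise its soft RLIMIT_NOFILE only up to the hard limit (what ulimit -Hn shows), and children inherit the limits across fork/exec. A minimal Python sketch of that rule using the standard resource module; raise_nofile_limit is a hypothetical helper, and the 65536 default simply echoes the "65k" figure above.

```python
import resource


def raise_nofile_limit(target=65536):
    """Raise this process's soft RLIMIT_NOFILE toward `target`.

    Never lowers the limit and never exceeds the hard limit. Returns the
    (soft, hard) pair in effect afterwards.
    """
    inf = resource.RLIM_INFINITY
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = target if hard == inf else min(target, hard)
    if soft == inf or new_soft <= soft:
        return soft, hard  # already high enough
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    except (ValueError, OSError):
        pass  # some platforms cap the soft limit below the hard one
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

A start-up wrapper could call this and then exec the server, which would inherit the raised limit; going past the hard limit still needs root or a system-level change.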

> 
> I don't know if you tuned the JVM, but make sure it's the latest 1.6 and turn GC
> debugging on. Something like the below should be OK for a "reasonable" size
> directory (maybe ~100K entries, NIS/Unix).

Directory is pretty small (<2000 entries), but no I haven't tuned it.
Java version is 1.6.0_24.

-J
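Once GC debugging is on, the CMS symptoms Matt describes can be spotted in the log. A rough Python sketch, assuming logging was enabled with standard HotSpot flags such as -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:<file>; the marker strings are what CMS typically prints, but the exact format varies between JVM releases, so the counts are only a rough signal.

```python
import sys
from collections import Counter

# Substrings the CMS collector typically logs; matched loosely on purpose.
MARKERS = ("promotion failed", "concurrent mode failure", "Full GC")


def scan_gc_log(lines):
    """Count how many lines contain each marker."""
    counts = Counter()
    for line in lines:
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log:  # the file given to -Xloggc
        for marker, n in scan_gc_log(log).items():
            print(f"{marker}: {n}")
```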

> -Dsun.security.ssl.allowLegacyHelloMessages=true
> -Dsun.security.ssl.allowUnsafeRenegotiation=true

What do these do? This issue isn't plaguing the Solaris clients still
running Nexenta and the only diff in their config is that they are NOT
using SSL (the new ones are).


> Make sure the DB cache is 50-60% of the JVM heap, e.g.
> dsconfig -n set-backend-prop --backend-name userRoot --set db-cache-percent:50

I'll check.
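To see what db-cache-percent means in bytes: the percentage is taken of the JVM's maximum heap. A hedged Python sketch; parse_xmx and db_cache_bytes are illustrative helpers, and the result is approximate because the JVM's reported maximum memory can be a little below -Xmx.

```python
import re

_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_xmx(value):
    """Parse a heap size such as '-Xmx512m' or '2g' into bytes."""
    m = re.fullmatch(r"(?:-Xmx)?(\d+)([kKmMgG]?)", value.strip())
    if not m:
        raise ValueError(f"unrecognised heap size: {value!r}")
    return int(m.group(1)) * _UNITS[m.group(2).lower()]


def db_cache_bytes(max_heap, percent=50):
    """Approximate DB cache size for a given db-cache-percent."""
    return parse_xmx(max_heap) * percent // 100
```

With -Xmx1g and db-cache-percent:50, db_cache_bytes("1g", 50) gives 536870912 bytes (512 MiB).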

> What are your client timeouts? 5 sec bind and 20 sec search are common (it's
> unusual to hit them).

Defaults. The only timeout we have set is idletimeout (500s) on the Linux
clients.

-J


> Do you have lots of netgroups in the passwd file?

No.

Thank you so much for your help. It is really appreciated.

-J

