[prev in list] [next in list] [prev in thread] [next in thread]
List: james-user
Subject: Re: James Running Out Of Connections
From: Robert J Taylor <rjamestaylor () cox ! net>
Date: 2004-02-25 1:53:12
Message-ID: 403C0008.8060003 () cox ! net
[Download RAW message or body]
What was the resolution of this problem?
I'm running James 2.1.2 on:
Linux pjemail1.jazel.net 2.4.21-4.0.1.ELsmp #1 SMP Thu Oct 23 01:21:42
EDT 2003 i686 athlon i386 GNU/Linux
Java:
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)
Thanks,
Robert Taylor
Cesar Bonadio wrote:
>
> java version "1.4.2_03"
> Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
> Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)
>
> And Linux 2.4
>
>
> On Tue, 2004-02-17 at 17:02, Michael Nestler wrote:
>
>
>>Phew - at least I am not the only one who has this problem ;-) Are you
>>using Linux 2.4 as well?
>>
>>I also did a load test before deploying James into production and I was
>>able to deliver about 100,000 emails within 8 hours without any issue.
>>The issue seems to occur only sporadically - sometimes it takes only a
>>few hours, sometimes a few days. But once it hangs, it does not come
>>back in my case (at least not within 6 hours).
>>
>>I also had the issue of James suddenly "partially" halting before I
>>upgraded from file to db persistence. I don't know though if this was
>>the same problem.
>>
>>My connections log level was set to DEBUG when the log was made.
>>
>>I did not have any ESTABLISHED smtp connection while James was hanging -
>>almost all had status TIME_WAIT.
>>
>>Michael
>>
>>
>>Cesar Bonadio wrote:
>>
>>
>>>I think you are not alone :)
>>>
>>>In the last five days I have found several messages in the connection
>>>log with the same
>>>problem my server stops responding but after a while it comes back and
>>>if I run a netstat when
>>>I am receiving the error I really dont have more than 20 ESTABLISED
>>>connections.
>>>
>>>This is interesting problem because in our test server running the same
>>>version of James and java
>>>a manage to deliver more than 20.000 messages with smtp-source and the
>>>connections works just fine
>>>if I go over it complains and releases the connections after use.
>>>
>>>I will put my connection log in debug mode to see if I get something
>>>
>>>
>>>[]s
>>>
>>>
>>>
>>>
>>>On Tue, 2004-02-17 at 11:35, Michael Nestler wrote:
>>>
>>>
>>>
>>>
>>>>OK, James stopped again last night at about 1:50 am. I did not have the
>>>>time yet to run postal or smtp-source. But this time I have a log
>>>>showing smtp sockets with a 10 minute resolution. Since the last stop I
>>>>increased the maximum number of connections to 120 (thread number to 150).
>>>>
>>>>At 1:50 James reported in connections*.log:
>>>>
>>>>17/02/04 01:50:00 WARN connections: Maximum number of open connections
>>>>exceeded - refusing connection. Current number of connections is 120
>>>>
>>>>repeatedly until 8:29 am when I restarted James. My smtpconn.log (cron
>>>>job "netstat | grep smtp") shows for the same time:
>>>>
>>>>Tue Feb 17 01:50:00 EST 2004
>>>>tcp 0 0 email1.whoglue.net:smtp DSL217-1:fyre-messanger
>>>>SYN_RECV
>>>>tcp 0 0 192.168.168.5:smtp vm211.vm:cruise-swroute
>>>>ESTABLISHED
>>>>tcp 1 0 192.168.168.5:smtp 220.127.103.78:idmaps
>>>>CLOSE_WAIT
>>>>tcp 0 0 192.168.168.5:smtp mta.email.contine:44071
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp custd530.innerhost:4074
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp pro109.profontis.:59902
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp smtp01.1800contact:4220
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp mta.email.contine:42345
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp mta.email.contine:25694
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp stdout-mv-01.inbo:44652
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp 128.121.26.2:31574
>>>>ESTABLISHED
>>>>tcp 1 0 192.168.168.5:smtp ool-435798e8.dy:evb-elm
>>>>CLOSE_WAIT
>>>>tcp 1 0 192.168.168.5:smtp lsanca1-ar5:notify_srvr
>>>>CLOSE_WAIT
>>>>tcp 0 0 192.168.168.5:smtp mta113.cheetahmai:52853
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp 66.42.142.98:49221
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp server611.colonize:3926
>>>>ESTABLISHED
>>>>tcp 1 0 192.168.168.5:smtp 211-56-243-3.inter:linx
>>>>CLOSE_WAIT
>>>>tcp 0 0 192.168.168.5:smtp wchicon1.qvc.com:4611
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp vm125.vmadmin.com:8381
>>>>ESTABLISHED
>>>>tcp 7 0 192.168.168.5:smtp 66.28.60.195:58491
>>>>CLOSE_WAIT
>>>>tcp 0 0 192.168.168.5:smtp imo-r10.mx.aol.co:61761
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp bonaire.server1.o:52001
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp bonaire.server1.o:52272
>>>>ESTABLISHED
>>>>tcp 1 0 192.168.168.5:smtp 202.Red-213-97-41:pipes
>>>>CLOSE_WAIT
>>>>tcp 0 0 192.168.168.5:smtp n25.grp.scd.yahoo:47649
>>>>ESTABLISHED
>>>>tcp 1 0 192.168.168.5:smtp 218.98.97.183:4612
>>>>CLOSE_WAIT
>>>>tcp 1 0 192.168.168.5:smtp mail.contactel.cz:43724
>>>>CLOSE_WAIT
>>>>tcp 0 0 192.168.168.5:smtp mail.contactel.cz:43794
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp mail.contactel.cz:43801
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp 212.143.236.10:hylafax
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp cust54-224.netcabo:4416
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp ana216.anotherday:40710
>>>>ESTABLISHED
>>>>tcp 1 0 192.168.168.5:smtp 218.232.228.130:vsixml
>>>>CLOSE_WAIT
>>>>
>>>>
>>>>And at the next time the netstat snapshot was made:
>>>>
>>>>Tue Feb 17 02:00:00 EST 2004
>>>>tcp 0 0 192.168.168.5:smtp mta.email.contine:13622
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp imo-m23.mx.aol.co:63699
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp modemcable:conclave-cpp
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp 219.144.194.18:43349
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp 219.144.194.18:43281
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp c-24-7-95-19.clien:4669
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp web60804.mail.yah:30519
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp web60804.mail.yah:30518
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp relay.quickhire.c:34089
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp ACC1B832.:ns-cfg-server
>>>>TIME_WAIT
>>>>
>>>>
>>>>From there on the smtp sockets log file basically shows only TIME_WAIT
>>>>sockets (there are a few SYN_RECV) until I restarted James at 8:29 am.
>>>>Never netstat shows 120 open smtp connections (what James seems to
>>>>believe). As a strange aside the default*.log contains a database
>>>>connection exception (it contained the same exception at the last time
>>>>James stopped 6 days ago). I don't know if this means anything - could
>>>>this exception have caused the problem? spoolmanager*.log and
>>>>mailet*.log stopped recording at 1:32 am.
>>>>
>>>>To summarize the events from my point of view:
>>>>o James was running for about 6 days without any problem, delivering
>>>>about 5,000-10,000 emails a day
>>>>o some event at a few minutes before 1:50 am stopped James from
>>>>returning used smtp connection socket objects to the pool starting at
>>>>that time
>>>>o as a consequence new connections were not accepted and the service
>>>>stopped effectively at 1:32 am
>>>>o it never recovered from that situation until I relieved it by
>>>>restarting phoenix at 8:29 am
>>>>
>>>>From my point of view it seems to be likely that the mentioned
>>>>exception in default*.log caused the problem, since it happened so close
>>>>to the time James stopped:
>>>>
>>>>17/02/04 01:33:11 INFO database-connections.maildb: ***** connection 1
>>>>is way too old: 63408 > 60000
>>>>17/02/04 01:33:11 INFO database-connections.maildb: java.lang.Throwable
>>>> at
>>>>org.apache.james.util.mordred.PoolConnEntry.lock(PoolConnEntry.java:135)
>>>> at
>>>>org.apache.james.util.mordred.JdbcDataSource.getConnection(JdbcDataSource.java:178)
>>>>
>>>> at
>>>>org.apache.james.mailrepository.JDBCMailRepository.store(JDBCMailRepository.java:484)
>>>>
>>>> at
>>>>org.apache.james.mailrepository.JDBCSpoolRepository.store(JDBCSpoolRepository.java:243)
>>>>
>>>> at
>>>>org.apache.james.transport.mailets.RemoteDelivery.service(RemoteDelivery.java:649)
>>>>
>>>> at
>>>>org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:413)
>>>>
>>>> at
>>>>org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:436)
>>>>
>>>> at
>>>>org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:366)
>>>>
>>>> at
>>>>org.apache.avalon.excalibur.thread.impl.ExecutableRunnable.execute(ExecutableRunnable.java:47)
>>>>
>>>> at
>>>>org.apache.avalon.excalibur.thread.impl.WorkerThread.run(WorkerThread.java:80)
>>>>
>>>>
>>>>On the other hand it could be just a side effect. I attached the
>>>>mentioned log files since midnight to this email in zipped format.
>>>>
>>>>I really hope somebody on this list can make any sense out of this and
>>>>knows what's going on here and how I can fix it. Let me know if I should
>>>>run postal or smtp-source first and I will do it.
>>>>
>>>>All help is greatly appreciated.
>>>>
>>>>Best regards,
>>>>Michael
>>>>
>>>>P.S. I was not able to attach a ZIP file to this email. mail.apache.org
>>>>bounced it saying "552 we don't accept email with executable content
>>>>(#5.3.4)" ? I try it again as TGZ.
>>>>
>>>>
>>>>
>>>>Noel J. Bergman wrote:
>>>>
>>>>
>>>>
>>>>
>>>>>Run netstat and check to see if they are real. I have tested the connection
>>>>>handling with linux kernel 2.2.22, 2.4.20-18 and 2.4.20-24 without problem,
>>>>>amongst other versions.
>>>>>
>>>>>What version of James?
>>>>>
>>>>> --- Noel
>>>>>
>>>>>
>>>>>
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
>>For additional commands, e-mail: server-user-help@james.apache.org
>>
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic