[prev in list] [next in list] [prev in thread] [next in thread] 

List:       james-user
Subject:    Re: James Running Out Of Connections
From:       Robert J Taylor <rjamestaylor () cox ! net>
Date:       2004-02-25 1:53:12
Message-ID: 403C0008.8060003 () cox ! net
[Download RAW message or body]

What was the resolution of this problem?

I'm running James 2.1.2 on:
Linux pjemail1.jazel.net 2.4.21-4.0.1.ELsmp #1 SMP Thu Oct 23 01:21:42 
EDT 2003 i686 athlon i386 GNU/Linux

Java:
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)


Thanks,

Robert Taylor

Cesar Bonadio wrote:
> 
> java version "1.4.2_03"
> Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
> Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)
> 
> And Linux 2.4
> 
> 
> On Tue, 2004-02-17 at 17:02, Michael Nestler wrote:
> 
> 
>>Phew - at least I am not the only one who has this problem ;-) Are you 
>>using Linux 2.4 as well?
>>
>>I also did a load test before deploying James into production and I was 
>>able to deliver about 100,000 emails within 8 hours without any issue. 
>>The issue seems to occur only sporadically - sometimes it takes only a 
>>few hours, sometimes a few days. But once it hangs, it does not come 
>>back in my case (at least not within 6 hours).
>>
>>I also had the issue of James suddenly "partially" halting before I 
>>upgraded from file to db persistence. I don't know though if this was 
>>the same problem.
>>
>>My connections log level was set to DEBUG when the log was made.
>>
>>I did not have any ESTABLISHED smtp connection while James was hanging - 
>>almost all had status TIME_WAIT.
>>
>>Michael
>>
>>
>>Cesar Bonadio wrote:
>>
>>
>>>I think you are not alone :) 
>>>
>>>In the last five days I have found several messages in the connection
>>>log with the same
>>>problem my server stops responding but after a while it comes back and
>>>if I run a netstat when
>>>I am receiving the error I really dont have more than 20 ESTABLISED
>>>connections.
>>>
>>>This is interesting problem because in our test server running the same
>>>version of James and java
>>>a manage to deliver more than 20.000 messages with smtp-source and the
>>>connections works just fine
>>>if I go over it complains and releases the connections after use.
>>>
>>>I will put my connection log in debug mode to see if I get something
>>>
>>>
>>>[]s
>>>
>>>
>>>
>>>
>>>On Tue, 2004-02-17 at 11:35, Michael Nestler wrote:
>>>
>>> 
>>>
>>>
>>>>OK, James stopped again last night at about 1:50 am. I did not have the 
>>>>time yet to run postal or smtp-source. But this time I have a log 
>>>>showing smtp sockets with a 10 minute resolution. Since the last stop I 
>>>>increased the maximum number of connections to 120 (thread number to 150).
>>>>
>>>>At 1:50 James reported in connections*.log:
>>>>
>>>>17/02/04 01:50:00 WARN  connections: Maximum number of open connections 
>>>>exceeded - refusing connection.  Current number of connections is 120
>>>>
>>>>repeatedly until 8:29 am when I restarted James. My smtpconn.log (cron 
>>>>job "netstat | grep smtp") shows for the same time:
>>>>
>>>>Tue Feb 17 01:50:00 EST 2004
>>>>tcp        0      0 email1.whoglue.net:smtp DSL217-1:fyre-messanger 
>>>>SYN_RECV   
>>>>tcp        0      0 192.168.168.5:smtp      vm211.vm:cruise-swroute 
>>>>ESTABLISHED
>>>>tcp        1      0 192.168.168.5:smtp      220.127.103.78:idmaps   
>>>>CLOSE_WAIT 
>>>>tcp        0      0 192.168.168.5:smtp      mta.email.contine:44071 
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      custd530.innerhost:4074 
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      pro109.profontis.:59902 
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      smtp01.1800contact:4220 
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      mta.email.contine:42345 
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      mta.email.contine:25694 
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      stdout-mv-01.inbo:44652 
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      128.121.26.2:31574      
>>>>ESTABLISHED
>>>>tcp        1      0 192.168.168.5:smtp      ool-435798e8.dy:evb-elm 
>>>>CLOSE_WAIT 
>>>>tcp        1      0 192.168.168.5:smtp      lsanca1-ar5:notify_srvr 
>>>>CLOSE_WAIT 
>>>>tcp        0      0 192.168.168.5:smtp      mta113.cheetahmai:52853 
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      66.42.142.98:49221      
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      server611.colonize:3926 
>>>>ESTABLISHED
>>>>tcp        1      0 192.168.168.5:smtp      211-56-243-3.inter:linx 
>>>>CLOSE_WAIT 
>>>>tcp        0      0 192.168.168.5:smtp      wchicon1.qvc.com:4611   
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      vm125.vmadmin.com:8381  
>>>>ESTABLISHED
>>>>tcp        7      0 192.168.168.5:smtp      66.28.60.195:58491      
>>>>CLOSE_WAIT 
>>>>tcp        0      0 192.168.168.5:smtp      imo-r10.mx.aol.co:61761 
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      bonaire.server1.o:52001 
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      bonaire.server1.o:52272 
>>>>ESTABLISHED
>>>>tcp        1      0 192.168.168.5:smtp      202.Red-213-97-41:pipes 
>>>>CLOSE_WAIT 
>>>>tcp        0      0 192.168.168.5:smtp      n25.grp.scd.yahoo:47649 
>>>>ESTABLISHED
>>>>tcp        1      0 192.168.168.5:smtp      218.98.97.183:4612      
>>>>CLOSE_WAIT 
>>>>tcp        1      0 192.168.168.5:smtp      mail.contactel.cz:43724 
>>>>CLOSE_WAIT 
>>>>tcp        0      0 192.168.168.5:smtp      mail.contactel.cz:43794 
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      mail.contactel.cz:43801 
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      212.143.236.10:hylafax  
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      cust54-224.netcabo:4416 
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      ana216.anotherday:40710 
>>>>ESTABLISHED
>>>>tcp        1      0 192.168.168.5:smtp      218.232.228.130:vsixml  
>>>>CLOSE_WAIT 
>>>>
>>>>
>>>>And at the next time the netstat snapshot was made:
>>>>
>>>>Tue Feb 17 02:00:00 EST 2004
>>>>tcp        0      0 192.168.168.5:smtp      mta.email.contine:13622 
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      imo-m23.mx.aol.co:63699 
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      modemcable:conclave-cpp 
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      219.144.194.18:43349    
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      219.144.194.18:43281    
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      c-24-7-95-19.clien:4669 
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      web60804.mail.yah:30519 
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      web60804.mail.yah:30518 
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      relay.quickhire.c:34089 
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      ACC1B832.:ns-cfg-server 
>>>>TIME_WAIT  
>>>>
>>>>
>>>>From there on the smtp sockets log file basically shows only TIME_WAIT 
>>>>sockets (there are a few SYN_RECV) until I restarted James at 8:29 am. 
>>>>Never netstat shows 120 open smtp connections (what James seems to 
>>>>believe). As a strange aside the default*.log contains a database 
>>>>connection exception (it contained the same exception at the last time 
>>>>James stopped 6 days ago). I don't know if this means anything - could 
>>>>this exception have caused the problem? spoolmanager*.log and 
>>>>mailet*.log stopped recording at 1:32 am.
>>>>
>>>>To summarize the events from my point of view:
>>>>o James was running for about 6 days without any problem, delivering 
>>>>about 5,000-10,000 emails a day
>>>>o some event at a few minutes before 1:50 am stopped James from 
>>>>returning used smtp connection socket objects to the pool starting at 
>>>>that time
>>>>o as a consequence new connections were not accepted and the service 
>>>>stopped effectively at 1:32 am
>>>>o it never recovered from that situation until I relieved it by 
>>>>restarting phoenix at 8:29 am
>>>>
>>>>From my point of view it seems to be likely that the mentioned 
>>>>exception in default*.log caused the problem, since it happened so close 
>>>>to the time James stopped:
>>>>
>>>>17/02/04 01:33:11 INFO  database-connections.maildb:  ***** connection 1 
>>>>is way too old: 63408 > 60000
>>>>17/02/04 01:33:11 INFO  database-connections.maildb: java.lang.Throwable
>>>>  at 
>>>>org.apache.james.util.mordred.PoolConnEntry.lock(PoolConnEntry.java:135)
>>>>  at 
>>>>org.apache.james.util.mordred.JdbcDataSource.getConnection(JdbcDataSource.java:178) 
>>>>
>>>>  at 
>>>>org.apache.james.mailrepository.JDBCMailRepository.store(JDBCMailRepository.java:484) 
>>>>
>>>>  at 
>>>>org.apache.james.mailrepository.JDBCSpoolRepository.store(JDBCSpoolRepository.java:243) 
>>>>
>>>>  at 
>>>>org.apache.james.transport.mailets.RemoteDelivery.service(RemoteDelivery.java:649) 
>>>>
>>>>  at 
>>>>org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:413) 
>>>>
>>>>  at 
>>>>org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:436) 
>>>>
>>>>  at 
>>>>org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:366) 
>>>>
>>>>  at 
>>>>org.apache.avalon.excalibur.thread.impl.ExecutableRunnable.execute(ExecutableRunnable.java:47) 
>>>>
>>>>  at 
>>>>org.apache.avalon.excalibur.thread.impl.WorkerThread.run(WorkerThread.java:80) 
>>>>
>>>>
>>>>On the other hand it could be just a side effect. I attached the 
>>>>mentioned log files since midnight to this email in zipped format.
>>>>
>>>>I really hope somebody on this list can make any sense out of this and 
>>>>knows what's going on here and how I can fix it. Let me know if I should 
>>>>run postal or smtp-source first and I will do it.
>>>>
>>>>All help is greatly appreciated.
>>>>
>>>>Best regards,
>>>>Michael
>>>>
>>>>P.S. I was not able to attach a ZIP file to this email. mail.apache.org 
>>>>bounced it saying "552 we don't accept email with executable content 
>>>>(#5.3.4)" ? I try it again as TGZ.
>>>>
>>>>
>>>>
>>>>Noel J. Bergman wrote:
>>>>
>>>>   
>>>>
>>>>
>>>>>Run netstat and check to see if they are real.  I have tested the connection
>>>>>handling with linux kernel 2.2.22, 2.4.20-18 and 2.4.20-24 without problem,
>>>>>amongst other versions.
>>>>>
>>>>>What version of James?
>>>>>
>>>>>	--- Noel
>>>>>
>>>>>     
>>>>>
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
>>For additional commands, e-mail: server-user-help@james.apache.org
>>
> 
> 




---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic