
List:       avro-dev
Subject:    [jira] [Commented] (AVRO-1407) NettyTransceiver can cause an infinite loop when slow to connect
From:       "Hudson (JIRA)" <jira () apache ! org>
Date:       2014-11-26 20:56:14
Message-ID: JIRA.12681865.1385722840000.29908.1417035374728 () Atlassian ! JIRA


    [ https://issues.apache.org/jira/browse/AVRO-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226804#comment-14226804 ]

Hudson commented on AVRO-1407:
------------------------------

SUCCESS: Integrated in AvroJava #502 (See [https://builds.apache.org/job/AvroJava/502/])
AVRO-1407: Java: Fix infinite loop on slow connect in NettyTransceiver. Contributed by Gareth Davis. (cutting: rev 1641894)
* /avro/trunk/CHANGES.txt
* /avro/trunk/lang/java/ipc/src/main/java/org/apache/avro/ipc/NettyTransceiver.java
* /avro/trunk/lang/java/ipc/src/test/java/org/apache/avro/ipc/NettyTransceiverWhenFailsToConnect.java



> NettyTransceiver can cause an infinite loop when slow to connect
> ----------------------------------------------------------------
> 
> Key: AVRO-1407
> URL: https://issues.apache.org/jira/browse/AVRO-1407
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.5, 1.7.6
> Reporter: Gareth Davis
> Assignee: Gareth Davis
> Fix For: 1.7.8
> 
> Attachments: AVRO-1407-1.patch, AVRO-1407-2.patch, AVRO-1407-testcase.patch
> 
> 
> When a new {{NettyTransceiver}} is created it forces the channel to be allocated and connected to the remote host, waiting for connectTimeout ms on the [connect channel future|https://github.com/apache/avro/blob/1579ab1ac95731630af58fc303a07c9bf28541d6/lang/java/ipc/src/main/java/org/apache/avro/ipc/NettyTransceiver.java#L271]. This is obviously a good thing; the trouble is that on an unsuccessful connect, i.e. {{!channelFuture.isSuccess()}}, an exception is thrown and the constructor fails with an {{IOException}}, but it can leave an active channel associated with the {{ChannelFactory}}. A Netty {{NioClientSocketChannelFactory}} will not shut down while active channels remain, so if you supplied the {{ChannelFactory}} to the {{NettyTransceiver}} yourself, you will not be able to stop it by calling {{ChannelFactory.releaseExternalResources()}} as the [Flume Avro RPC client does|https://github.com/apache/flume/blob/b8cf789b8509b1e5be05dd0b0b16c5d9af9698ae/flume-ng-sdk/src/main/java/org/apache/flume/api/NettyAvroRpcClient.java#L158].
> To reproduce this you need a very laggy network, where the connect attempt takes longer than the connect timeout but does eventually succeed. That is very hard to arrange in a test case, although I have a test setup using Vagrant VMs that recreates it every time, using the Flume RPC client and server. The following stack is from a production system; it will never recover until the channel is disconnected (by forcing a disconnect at the remote host) or the JVM is restarted.
> {noformat:title=Production stack trace}
> "TLOG-0" daemon prio=10 tid=0x00007f581c7be800 nid=0x39a1 waiting on condition [0x00007f57ef9f2000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000007218b16e0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
>     at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1253)
>     at org.jboss.netty.util.internal.ExecutorUtil.terminate(ExecutorUtil.java:103)
>     at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.releaseExternalResources(AbstractNioWorkerPool.java:80)
>     at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.releaseExternalResources(NioClientSocketChannelFactory.java:181)
>     at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:142)
>     at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:101)
>     at org.apache.flume.api.NettyAvroRpcClient.configure(NettyAvroRpcClient.java:564)
>     - locked <0x00000006c30ae7b0> (a org.apache.flume.api.NettyAvroRpcClient)
>     at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:88)
>     at org.apache.flume.api.LoadBalancingRpcClient.createClient(LoadBalancingRpcClient.java:214)
>     at org.apache.flume.api.LoadBalancingRpcClient.getClient(LoadBalancingRpcClient.java:205)
>     - locked <0x00000006a97b18e8> (a org.apache.flume.api.LoadBalancingRpcClient)
>     at org.apache.flume.api.LoadBalancingRpcClient.appendBatch(LoadBalancingRpcClient.java:95)
>     at com.ean.platform.components.tlog.client.service.AvroRpcEventRouter$1.call(AvroRpcEventRouter.java:45)
>     at com.ean.platform.components.tlog.client.service.AvroRpcEventRouter$1.call(AvroRpcEventRouter.java:43)
> {noformat}
> The solution is very simple, and a patch should be along in a moment.
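The failure mode described in the issue (a connect that outlives its timeout and leaves a live resource behind, which then blocks shutdown) can be sketched with plain java.util.concurrent. This is an illustrative analogue, not Avro's or Netty's actual API: {{slowConnect}}, the timings, and the printed messages are all made up for the example; the essential step is cancelling the pending connect on timeout instead of only throwing.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ConnectTimeoutSketch {

    /** Simulates a connect that succeeds, but only after the caller's timeout. */
    static Future<String> slowConnect(ExecutorService pool) {
        return pool.submit(() -> {
            Thread.sleep(200); // network is laggier than the connect timeout
            return "channel";  // ...so the "channel" does come up later
        });
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> connect = slowConnect(pool);
        try {
            connect.get(50, TimeUnit.MILLISECONDS); // connect timeout
        } catch (TimeoutException e) {
            // The essential step: on timeout, tear down the pending connect
            // instead of only throwing an exception. Skipping this leaves a
            // live channel behind, and releaseExternalResources() then blocks
            // forever, as in the production stack above.
            connect.cancel(true);
            System.out.println("connect timed out; pending channel closed");
        }
        pool.shutdown();
        System.out.println("factory shut down cleanly: "
                + pool.awaitTermination(1, TimeUnit.SECONDS));
    }
}
```

With the cancel in place the executor (standing in for the channel factory) terminates promptly; comment out {{connect.cancel(true)}} and the shutdown only completes because the simulated connect finishes, which a real leaked channel never does.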



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


