[prev in list] [next in list] [prev in thread] [next in thread] 

List:       activemq-dev
Subject:    [jira] [Commented] (AMQ-4160) DiscoveryNetworkConnector can lose track of active bridges, resulting
From:       "Stirling Chow (JIRA)" <jira () apache ! org>
Date:       2012-11-29 23:22:58
Message-ID: 1987913246.42862.1354231378721.JavaMail.jiratomcat () arcas
[Download RAW message or body]


    [ https://issues.apache.org/jira/browse/AMQ-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506927#comment-13506927 \
] 

Stirling Chow commented on AMQ-4160:
------------------------------------

Great to hear, Timothy.  Thanks for working through this with me.
                
> DiscoveryNetworkConnector can lose track of active bridges, resulting in permanent \
>                 bridge failure or continued attempts to re-connect existing bridges
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>  
> Key: AMQ-4160
> URL: https://issues.apache.org/jira/browse/AMQ-4160
> Project: ActiveMQ
> Issue Type: Bug
> Reporter: Stirling Chow
> Assignee: Timothy Bish
> Priority: Critical
> Fix For: 5.8.0
> 
> Attachments: AMQ4160_2.patch, AMQ4160.patch, AMQ4160Test.java
> 
> 
> Symptom
> =======
> {{DiscoveryNetworkConnector}} is not thread-safe, and as a result, race conditions \
> allow the {{bridges}} data structure to become corrupt and not accurately represent \
> the bridges that exist. Because {{bridges}} is used to control whether a discovery \
> event will result in a bridge creation attempt, if it is not accurate, two results \
> are possible: # A discovery event will be ignored even though its corresponding \
> bridge is not active; as a result, the bridge will never be established # A \
> discovery event will be processed even though its corresponding bridge is active; \
> as a result, the bridge creation will fail (because of duplicate connections), and \
> be indefinitely retried, generating many WARN/ERROR log messages Cause
> =====
> {{DiscoveryNetworkConnector}} updates a {{bridges}} hashmap whenever a bridge is \
> created or removed: {code:title=DiscoveryNetworkConnector.java}
> public void onServiceAdd(DiscoveryEvent event) {
> ...
> // Should we try to connect to that URI?
> synchronized (bridges) {
> if( bridges.containsKey(uri) ) {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Discovery agent generated a duplicate onServiceAdd event for: "+uri );
> }
> return;
> }
> }
> ...
> NetworkBridge bridge = createBridge(localTransport, remoteTransport, event);
> try {
> bridge.start();
> synchronized (bridges) {
> bridges.put(uri, bridge);
> }
> ...
> }
> public void onServiceRemove(DiscoveryEvent event) {
> String url = event.getServiceName();
> if (url != null) {
> URI uri;
> try {
> uri = new URI(url);
> } catch (URISyntaxException e) {
> LOG.warn("Could not connect to remote URI: " + url + " due to bad URI syntax: " + \
> e, e); return;
> }
> synchronized (bridges) {
> bridges.remove(uri);
> }
> }
> }
> {code}
> There are a number of problems:
> # The check-and-set for adding an entry {{bridges}} is not atomic.  As a result, \
> concurrent attempts to add a bridge to the same URL can be allowed to proceed to \
> bridge creation.  Only one of the bridges will be established (the other will fail \
> when it attempts to add duplicate connections); however, the failure of the second \
> bridge will result in a call to {{onServiceRemove(...)}} that will remove the \
> single/shared {{bridges}} entry. # The bridge is started before an entry is added \
> to {{bridges}}.  Since bridges are multi-threaded, it is possible that an exception \
> will be handled by a different thread at some time between {{bridge.start()}} and \
> {{bridges.put(uri, bridge}}.  In processing this exception, the thread will call \
> {{onServiceRemove(...}}, which will remove the (non-existent) {{bridges}} entry.  \
> Subseqently, {{bridges.put(uri, bridge)}} is executed, and adds the entry to \
> {{bridges}} even though it is now stale and represents a failed bridge.  Subsequent \
> attempts to restore the bridge will be ignored, and no active bridge will be \
> created. The lack of thread-safety in {{DiscoveryNetworkConnector}} is exacerbated \
> by AMQ-4159, which can result in many concurrent attempts to establish a bridge to \
> the same URL.  AMQ-4159 also described how multiple calls can be made to \
> {{onServiceRemove(...)}}, which can result in mal-behaviour similar to the second \
> case described above (i.e., unexpected removal of a bridge that is active). \
> Solution ========
> The attached patch attempts to restore some thread-safety to \
> {{DiscoveryNetworkConnector}} by making the check-and-set atomic and adding the \
> entry to {{bridges}} prior to starting the bridge.  An additional structure, \
> {{activeEvents}}, keeps track of the event that represents the current attempt to \
> establish a bridge to a given remote URL --- this is used to prevent multiple calls \
> to {{onServiceRemove(...)}} from removing a {{bridges}} entry that was *not* added \
> by the corresponding discovery event.   This patch is a band aid to the design \
> flaws in {{DiscoveryNetworkConnector}}, and a refactoring of the connector should \
> be considered.  In particular, there is a tight-coupling between \
> {{DiscoveryNetworkConnector}} and {{SimpleDiscoveryAgent}} that is not evident \
> through their interfaces.  For example, {{DiscoveryNetworkConnector}} assumes that \
> the call to {{discoveryAgent.serviceFailed(...)}} will result in a call back to \
> {{DiscoveryNetworkConnector.onServiceRemove(...)}}.  The call to \
> {{onServiceRemove(...)}} is necessary to clean up the resources used by the bridge, \
> but that requirement is not explicit, and therefore easily missed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic