[prev in list] [next in list] [prev in thread] [next in thread] 

List:       asterisk-dev
Subject:    Re: [asterisk-dev] [Code Review] 2961: Add channel locking for calls that result in channel snapshot
From:       "rmudgett" <reviewboard () asterisk ! org>
Date:       2013-11-26 18:49:13
Message-ID: 20131126184913.25214.38034 () sonic ! digium ! api
[Download RAW message or body]

[Attachment #2 (multipart/alternative)]


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/2961/#review10285
-----------------------------------------------------------

Ship it!


Ship It!

- rmudgett


On Nov. 26, 2013, 6:13 p.m., Mark Michelson wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/2961/
> -----------------------------------------------------------
> 
> (Updated Nov. 26, 2013, 6:13 p.m.)
> 
> 
> Review request for Asterisk Developers.
> 
> 
> Bugs: ASTERISK-22709
> https://issues.asterisk.org/jira/browse/ASTERISK-22709
> 
> 
> Repository: Asterisk
> 
> 
> Description
> -------
> 
> John Bigelow discovered that if the newly-created atxfer_threeway_nominal test is \
> run in a loop, eventually a test run will cause the UUT instance of Asterisk to \
> crash. Investigation of the crash showed that the crash was being caused while \
> trying to get the string representation of a channel's translation paths while \
> creating a channel snapshot. The attended transfer manager thread was creating the \
> channel snapshot as part of the attended transfer stasis message while in another \
> thread (in this case, the bridge channel thread) the translation paths were being \
> altered to make the channel compatible with others in the bridge. 
> It's clear that the issue is that the channel's contents are not lock protected \
> while creating a channel snapshot. Luckily, the channel's lock is held already when \
> updating translation paths. By adding locking, I was able to run the test over 450 \
> times straight with no negative consequences. 
> When adding locking, there were two strategies to follow:
> 
> 1) Take advantage of the fact that channel locks are recursive, and just add a \
> single SCOPED_CHANNELLOCK() call to ast_channel_snapshot_create() and call it a \
> day. 2) Make an audit of channel-related calls that end up creating a channel \
> snapshot and require locks be held prior to making those calls. 
> I like approach 2 more, even though it's harder to pull off. In investigating what \
> paths result in channel snapshots being created, I found that channel locks should \
> be required before calling the following functions: 
> * ast_channel_snapshot_create()
> * ast_channel_blob_create()
> * ast_channel_publish_snapshot()
> * ast_publish_channel_state()
> * ast_channel_publish_blob()
> * ast_channel_publish_varset()
> * ast_channel_amaflags_set()
> * ast_channel_callid_set()
> * ast_channel_whentohangup_set()
> * ast_channel_callgroup_set()
> * ast_channel_pickupgroup_set()
> * ast_channel_internal_bridge_set()
> * ast_channel_language_set()
> * ast_channel_accountcode_set()
> * ast_channel_peeraccount_set()
> * ast_channel_linkedid_set()
> * ast_channel_stage_snapshot()
> * ast_setstate()
> * ast_aoc_manager_event()
> * ast_channel_setwhentohangup_tv()
> * ast_channel_setwhentohangup()
> * ast_set_variables()
> 
> All of these functions' documentation have been updated to indicate a precondition \
> of having the channel locked. 
> In addition, there are some more complex functions that create channel snapshots \
> but could not easily be made to have the precondition of having the channel locked. \
> These are: 
> * ast_bridge_blob_create()
> * ast_publish_attended_transfer_fail()
> * ast_publish_attended_transfer_bridge_merge()
> * ast_publish_attended_transfer_threeway()
> * ast_publish_attended_transfer_app()
> * ast_publish_attended_transfer_link()
> * ast_bridge_publish_enter()
> * ast_bridge_publish_leave()
> * ast_bridge_publish_blind_transfer()
> 
> These functions' documentation now have a precondition of having no channels or \
> bridges locked. 
> This review seeks only to make sure that channels are properly locked prior to \
> creating channel snapshots. A similar effort is likely needed to ensure that \
> bridges are locked during bridge snapshot creation. 
> 
> Diffs
> -----
> 
> /branches/12/tests/test_stasis_channels.c 402818 
> /branches/12/tests/test_cel.c 402818 
> /branches/12/tests/test_cdr.c 402818 
> /branches/12/res/res_stasis.c 402818 
> /branches/12/res/res_pjsip_refer.c 402818 
> /branches/12/res/res_agi.c 402818 
> /branches/12/res/parking/parking_manager.c 402818 
> /branches/12/res/parking/parking_bridge_features.c 402818 
> /branches/12/pbx/pbx_realtime.c 402818 
> /branches/12/main/stasis_channels.c 402818 
> /branches/12/main/stasis_bridges.c 402818 
> /branches/12/main/pickup.c 402818 
> /branches/12/main/pbx.c 402818 
> /branches/12/main/endpoints.c 402818 
> /branches/12/main/dial.c 402818 
> /branches/12/main/core_unreal.c 402818 
> /branches/12/main/core_local.c 402818 
> /branches/12/main/channel.c 402818 
> /branches/12/main/cel.c 402818 
> /branches/12/main/bridge_channel.c 402818 
> /branches/12/main/bridge.c 402818 
> /branches/12/include/asterisk/stasis_channels.h 402818 
> /branches/12/include/asterisk/stasis_bridges.h 402818 
> /branches/12/include/asterisk/channelstate.h 402818 
> /branches/12/include/asterisk/channel.h 402818 
> /branches/12/include/asterisk/aoc.h 402818 
> /branches/12/funcs/func_timeout.c 402818 
> /branches/12/channels/sig_pri.c 402818 
> /branches/12/channels/sig_analog.c 402818 
> /branches/12/channels/chan_vpb.cc 402818 
> /branches/12/channels/chan_unistim.c 402818 
> /branches/12/channels/chan_skinny.c 402818 
> /branches/12/channels/chan_sip.c 402818 
> /branches/12/channels/chan_pjsip.c 402818 
> /branches/12/channels/chan_phone.c 402818 
> /branches/12/channels/chan_oss.c 402818 
> /branches/12/channels/chan_nbs.c 402818 
> /branches/12/channels/chan_motif.c 402818 
> /branches/12/channels/chan_misdn.c 402818 
> /branches/12/channels/chan_mgcp.c 402818 
> /branches/12/channels/chan_jingle.c 402818 
> /branches/12/channels/chan_iax2.c 402818 
> /branches/12/channels/chan_h323.c 402818 
> /branches/12/channels/chan_gtalk.c 402818 
> /branches/12/channels/chan_dahdi.c 402818 
> /branches/12/channels/chan_console.c 402818 
> /branches/12/channels/chan_alsa.c 402818 
> /branches/12/apps/app_voicemail.c 402818 
> /branches/12/apps/app_userevent.c 402818 
> /branches/12/apps/app_queue.c 402818 
> /branches/12/apps/app_meetme.c 402818 
> /branches/12/apps/app_disa.c 402818 
> /branches/12/apps/app_dial.c 402818 
> /branches/12/apps/app_confbridge.c 402818 
> /branches/12/apps/app_agent_pool.c 402818 
> /branches/12/addons/chan_ooh323.c 402818 
> /branches/12/addons/chan_mobile.c 402818 
> 
> Diff: https://reviewboard.asterisk.org/r/2961/diff/
> 
> 
> Testing
> -------
> 
> As stated in the description, I ran the atxfer_threeway_nominal test in a loop. \
> After approximately 450 test runs, there were no negative consequences. Prior to \
> this changeset, running the test 20-50 times would result in a crash. 
> 
> Thanks,
> 
> Mark Michelson
> 
> 


[Attachment #5 (text/html)]

<html>
 <body>
  <div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
   <table bgcolor="#f9f3c9" width="100%" cellpadding="8" style="border: 1px #c9c399 \
solid;">  <tr>
     <td>
      This is an automatically generated e-mail. To reply, visit:
      <a href="https://reviewboard.asterisk.org/r/2961/">https://reviewboard.asterisk.org/r/2961/</a>
  </td>
    </tr>
   </table>
   <br />



 <p>Ship it!</p>



 <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Ship It!</pre>  <br />









<p>- rmudgett</p>


<br />
<p>On November 26th, 2013, 6:13 p.m. UTC, Mark Michelson wrote:</p>








<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="8" \
style="background-image: \
url('https://reviewboard.asterisk.org/static/rb/images/review_request_box_top_bg.ab6f3b1072c9.png'); \
background-position: left top; background-repeat: repeat-x; border: 1px black \
solid;">  <tr>
  <td>

<div>Review request for Asterisk Developers.</div>
<div>By Mark Michelson.</div>


<p style="color: grey;"><i>Updated Nov. 26, 2013, 6:13 p.m.</i></p>







<div style="margin-top: 1.5em;">
 <b style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Bugs: </b>


 <a href="https://issues.asterisk.org/jira/browse/ASTERISK-22709">ASTERISK-22709</a>


</div>



<div style="margin-top: 1.5em;">
 <b style="color: #575012; font-size: 10pt;">Repository: </b>
Asterisk
</div>


<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>
 <table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" \
style="border: 1px solid #b8b5a0">  <tr>
  <td>
   <pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">John Bigelow discovered that if the newly-created \
atxfer_threeway_nominal test is run in a loop, eventually a test run will cause the \
UUT instance of Asterisk to crash. Investigation of the crash showed that the crash \
was being caused while trying to get the string representation of a channel&#39;s \
translation paths while creating a channel snapshot. The attended transfer manager \
thread was creating the channel snapshot as part of the attended transfer stasis \
message while in another thread (in this case, the bridge channel thread) the \
translation paths were being altered to make the channel compatible with others in \
the bridge.

It&#39;s clear that the issue is that the channel&#39;s contents are not lock \
protected while creating a channel snapshot. Luckily, the channel&#39;s lock is held \
already when updating translation paths. By adding locking, I was able to run the \
test over 450 times straight with no negative consequences.

When adding locking, there were two strategies to follow:

1) Take advantage of the fact that channel locks are recursive, and just add a single \
SCOPED_CHANNELLOCK() call to ast_channel_snapshot_create() and call it a day. 2) Make \
an audit of channel-related calls that end up creating a channel snapshot and require \
locks be held prior to making those calls.

I like approach 2 more, even though it&#39;s harder to pull off. In investigating \
what paths result in channel snapshots being created, I found that channel locks \
should be required before calling the following functions:

* ast_channel_snapshot_create()
* ast_channel_blob_create()
* ast_channel_publish_snapshot()
* ast_publish_channel_state()
* ast_channel_publish_blob()
* ast_channel_publish_varset()
* ast_channel_amaflags_set()
* ast_channel_callid_set()
* ast_channel_whentohangup_set()
* ast_channel_callgroup_set()
* ast_channel_pickupgroup_set()
* ast_channel_internal_bridge_set()
* ast_channel_language_set()
* ast_channel_accountcode_set()
* ast_channel_peeraccount_set()
* ast_channel_linkedid_set()
* ast_channel_stage_snapshot()
* ast_setstate()
* ast_aoc_manager_event()
* ast_channel_setwhentohangup_tv()
* ast_channel_setwhentohangup()
* ast_set_variables()

All of these functions&#39; documentation have been updated to indicate a \
precondition of having the channel locked.

In addition, there are some more complex functions that create channel snapshots but \
could not easily be made to have the precondition of having the channel locked. These \
are:

* ast_bridge_blob_create()
* ast_publish_attended_transfer_fail()
* ast_publish_attended_transfer_bridge_merge()
* ast_publish_attended_transfer_threeway()
* ast_publish_attended_transfer_app()
* ast_publish_attended_transfer_link()
* ast_bridge_publish_enter()
* ast_bridge_publish_leave()
* ast_bridge_publish_blind_transfer()

These functions&#39; documentation now have a precondition of having no channels or \
bridges locked.

This review seeks only to make sure that channels are properly locked prior to \
creating channel snapshots. A similar effort is likely needed to ensure that bridges \
are locked during bridge snapshot creation.</pre>  </td>
 </tr>
</table>


<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Testing </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: \
1px solid #b8b5a0">  <tr>
  <td>
   <pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">As stated in the description, I ran the atxfer_threeway_nominal test in \
a loop. After approximately 450 test runs, there were no negative consequences. Prior \
to this changeset, running the test 20-50 times would result in a crash.</pre>  </td>
 </tr>
</table>


<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs</b> </h1>
<ul style="margin-left: 3em; padding-left: 0;">

 <li>/branches/12/tests/test_stasis_channels.c <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/tests/test_cel.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/tests/test_cdr.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/res/res_stasis.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/res/res_pjsip_refer.c <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/res/res_agi.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/res/parking/parking_manager.c <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/res/parking/parking_bridge_features.c <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/pbx/pbx_realtime.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/main/stasis_channels.c <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/main/stasis_bridges.c <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/main/pickup.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/main/pbx.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/main/endpoints.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/main/dial.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/main/core_unreal.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/main/core_local.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/main/channel.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/main/cel.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/main/bridge_channel.c <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/main/bridge.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/include/asterisk/stasis_channels.h <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/include/asterisk/stasis_bridges.h <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/include/asterisk/channelstate.h <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/include/asterisk/channel.h <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/include/asterisk/aoc.h <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/funcs/func_timeout.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/channels/sig_pri.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/channels/sig_analog.c <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/channels/chan_vpb.cc <span style="color: grey">(402818)</span></li>

 <li>/branches/12/channels/chan_unistim.c <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/channels/chan_skinny.c <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/channels/chan_sip.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/channels/chan_pjsip.c <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/channels/chan_phone.c <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/channels/chan_oss.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/channels/chan_nbs.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/channels/chan_motif.c <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/channels/chan_misdn.c <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/channels/chan_mgcp.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/channels/chan_jingle.c <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/channels/chan_iax2.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/channels/chan_h323.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/channels/chan_gtalk.c <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/channels/chan_dahdi.c <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/channels/chan_console.c <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/channels/chan_alsa.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/apps/app_voicemail.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/apps/app_userevent.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/apps/app_queue.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/apps/app_meetme.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/apps/app_disa.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/apps/app_dial.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/apps/app_confbridge.c <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/apps/app_agent_pool.c <span style="color: \
grey">(402818)</span></li>

 <li>/branches/12/addons/chan_ooh323.c <span style="color: grey">(402818)</span></li>

 <li>/branches/12/addons/chan_mobile.c <span style="color: grey">(402818)</span></li>

</ul>

<p><a href="https://reviewboard.asterisk.org/r/2961/diff/" style="margin-left: \
3em;">View Diff</a></p>







  </td>
 </tr>
</table>








  </div>
 </body>
</html>



-- 
_____________________________________________________________________
-- Bandwidth and Colocation Provided by http://www.api-digital.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic