[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-nio-dev
Subject:    Re: A race problem about select in a small time window
From:       Rob McKenna <rob.mckenna () oracle ! com>
Date:       2013-02-25 22:18:51
Message-ID: 512BE34B.2090105 () oracle ! com
[Download RAW message or body]

Apologies for the delay Sean,

Just to be clear: it appears we're neglecting to add the channel to the 
EPollArrayWrapper idle set as we're depending on ch.isOpen() to 
determine whether we need to or not.

It makes sense to me that anything with u.events == 0 should indeed be 
put into the idle list regardless of whether it "isOpen". It will always 
be removed later by release, and both release and updateRegistrations 
are synced on updateList.

I'll try to get ahold of Alan tomorrow just to run it past him (in any 
case, you'll need him to review this as I'm not a reviewer) and to 
figure out if I'm missing something with the ch.isOpen().

     -Rob

On 19/02/13 08:30, Sean Chou wrote:
> Hi Rob,
> 
> Is there any progress ?
> 
> 
> On Tue, Jan 15, 2013 at 5:10 AM, Rob McKenna <rob.mckenna@oracle.com 
> <mailto:rob.mckenna@oracle.com>> wrote:
> 
> Apologies folks, I managed to overlook this completely. Sean, its
> on my radar and I'll get back to you soon.
> 
> -Rob
> 
> 
> On 21/12/12 15:54, Alan Bateman wrote:
> 
> 
> I don't have cycles to look at this one (too much going on for
> M6) but Rob McKenna (cc'ed) might.
> 
> On 17/12/2012 08:56, Sean Chou wrote:
> 
> Hello ,
> 
> This is the detail problem, there is a small time window
> in which a 3 threads race makes select() always return 0
> without blocking.
> 
> I wrote a
> testcase(http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev0.2/
> <http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev0.2/>
> <http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev0.2/>) which
> needs to modify the lib code to reproduce, because the
> time windows is small.
> 
> The reproduce scenario is described in follow, use Tx for
> thread x:
> 
> 1. T1 (the user code) is selecting a channel(suppose C),
> it just returns from native select function, and niolib
> select method is checking if the returned channel is
> interested in the event, then 2 happens;
> 2. T2 is closing channel C, it just set the open variable
> to false but not yet closed the channel actually, and then
> 3 happens;
> 3. T3 set the interedOps of the channel to 0. // 0 means
> the channel is not interested in anything, the channel
> will be put into cancel list normally.
> 
> In this senario, T1 returns from select, and return 0
> which means no channel is selected(because the channel C
> returned from native invocation has nothing insterested
> in, it is not returned to application). Then T1 goes to
> invoke select again(usually in a loop, this is how select
> is designed to be used). In normal case, select method
> checks if any channels those should be cancelled and
> remove them from the set to be selected. Then, goes to
> native select function.
> 
> The problem is: select method first checks if the channel
> is closed, if it is closed, select method doesn't put it
> into cancel list.
> 
> In above senario, channel C is in close state, but not
> closed indeed, and setInteredOps to 0(which means cancel).
> So select method doesn't put C into cancel list(due to the
> problem) which means the native select set still contains
> channel C . So the native select always return C and nio
> select always return 0. Until the channel is finally closed.
> 
> 
> The testcase:
> http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev0.2/
> <http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev0.2/>
> <http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev0.2/>
> 
> A working fix:
> http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev_fix/
> <http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev_fix/> \
> <http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev_fix/> 
> 
> Please have a look.
> 
> 
> 
> 
> 
> 
> 
> -- 
> Best Regards,
> Sean Chou


[Attachment #3 (text/html)]

<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Apologies for the delay Sean,<br>
    <br>
    Just to be clear: it appears we're neglecting to add the channel to
    the EPollArrayWrapper idle set as we're depending on ch.isOpen() to
    determine whether we need to or not.<br>
    <br>
    It makes sense to me that anything with u.events == 0 should indeed
    be put into the idle list regardless of whether it "isOpen". It will
    always be removed later by release, and both release and
    updateRegistrations are synced on updateList. <br>
    <br>
    I'll try to get ahold of Alan tomorrow just to run it past him (in
    any case, you'll need him to review this as I'm not a reviewer) and
    to figure out if I'm missing something with the ch.isOpen(). <br>
    <br>
    &nbsp;&nbsp;&nbsp; -Rob<br>
    <br>
    <div class="moz-cite-prefix">On 19/02/13 08:30, Sean Chou wrote:<br>
    </div>
    <blockquote
cite="mid:CAOo=rxU_imzantAptjtDB2tE--ZvsDGwrC-mHQCOn86YBkGDHg@mail.gmail.com"
      type="cite">
      <div dir="ltr">Hi Rob,
        <div><br>
        </div>
        <div style="">Is there any progress ?</div>
      </div>
      <div class="gmail_extra"><br>
        <br>
        <div class="gmail_quote">On Tue, Jan 15, 2013 at 5:10 AM, Rob
          McKenna <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:rob.mckenna@oracle.com" \
target="_blank">rob.mckenna@oracle.com</a>&gt;</span>  wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">Apologies
            folks, I managed to overlook this completely. Sean, its on
            my radar and I'll get back to you soon.<span class="HOEnZb"><font
                color="#888888"><br>
                <br>
                &nbsp; &nbsp; -Rob</font></span>
            <div class="HOEnZb">
              <div class="h5"><br>
                <br>
                On 21/12/12 15:54, Alan Bateman wrote:<br>
                <blockquote class="gmail_quote" style="margin:0 0 0
                  .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <br>
                  I don't have cycles to look at this one (too much
                  going on for M6) but Rob McKenna (cc'ed) might.<br>
                  <br>
                  On 17/12/2012 08:56, Sean Chou wrote:<br>
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    Hello ,<br>
                    <br>
                    This is the detail problem, there is a small time
                    window in which a 3 threads race makes select()
                    always return 0 without blocking.<br>
                    <br>
                    I wrote a testcase(<a moz-do-not-send="true"
                      href="http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev0.2/"
                      \
target="_blank">http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev0.2/</a>  &lt;<a \
                moz-do-not-send="true"
                      href="http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev0.2/"
                      \
target="_blank">http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev0.2/</a>&gt;)  \
which needs to modify the lib code to reproduce,  because the time windows is \
small.<br>  <br>
                    The reproduce scenario is described in follow, use
                    Tx for thread x:<br>
                    <br>
                    1. T1 (the user code) is selecting a channel(suppose
                    C), it just returns from native select function, and
                    niolib select method is checking if the returned
                    channel is interested in the event, then 2 happens;<br>
                    2. T2 is closing channel C, it just set the open
                    variable to false but not yet closed the channel
                    actually, and then 3 happens;<br>
                    3. T3 set the interedOps of the channel to 0. // 0
                    means the channel is not interested in anything, the
                    channel will be put into cancel list normally.<br>
                    <br>
                    In this senario, T1 returns from select, and return
                    0 which means no channel is selected(because the
                    channel C returned from native invocation has
                    nothing insterested in, it is not returned to
                    application). Then T1 goes to invoke select
                    again(usually in a loop, this is how select is
                    designed to be used). In normal case, select method
                    checks if any channels those should be cancelled and
                    remove them from the set to be selected. Then, goes
                    to native select function.<br>
                    <br>
                    The problem is: select method first checks if the
                    channel is closed, if it is closed, select method
                    doesn't put it into cancel list.<br>
                    <br>
                    In above senario, channel C is in close state, but
                    not closed indeed, and setInteredOps to 0(which
                    means cancel). So select method doesn't put C into
                    cancel list(due to the problem) which means the
                    native select set still contains channel C . So the
                    native select always return C and nio select always
                    return 0. Until the channel is finally closed.<br>
                    <br>
                    <br>
                    The testcase: <a moz-do-not-send="true"
                      href="http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev0.2/"
                      \
target="_blank">http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev0.2/</a>  &lt;<a \
                moz-do-not-send="true"
                      href="http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev0.2/"
                      \
target="_blank">http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev0.2/</a>&gt;<br>  \
<br>  A working fix: <a moz-do-not-send="true"
                      \
                href="http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev_fix/"
                      \
target="_blank">http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev_fix/</a>  &lt;<a \
                moz-do-not-send="true"
                      \
                href="http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev_fix/"
                      \
target="_blank">http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev_fix/</a>&gt;<br> \
<br>  <br>
                    Please have a look.<br>
                    <br>
                    <br>
                  </blockquote>
                  <br>
                </blockquote>
                <br>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
        <br clear="all">
        <div><br>
        </div>
        -- <br>
        Best Regards,<br>
        Sean Chou<br>
      </div>
    </blockquote>
    <br>
  </body>
</html>



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic