[prev in list] [next in list] [prev in thread] [next in thread]
List: openjdk-nio-dev
Subject: Re: A race problem about select in a small time window
From: Rob McKenna <rob.mckenna () oracle ! com>
Date: 2013-02-25 22:18:51
Message-ID: 512BE34B.2090105 () oracle ! com
[Download RAW message or body]
Apologies for the delay Sean,
Just to be clear: it appears we're neglecting to add the channel to the
EPollArrayWrapper idle set as we're depending on ch.isOpen() to
determine whether we need to or not.
It makes sense to me that anything with u.events == 0 should indeed be
put into the idle list regardless of whether it "isOpen". It will always
be removed later by release, and both release and updateRegistrations
are synced on updateList.
I'll try to get ahold of Alan tomorrow just to run it past him (in any
case, you'll need him to review this as I'm not a reviewer) and to
figure out if I'm missing something with the ch.isOpen().
-Rob
On 19/02/13 08:30, Sean Chou wrote:
> Hi Rob,
>
> Is there any progress ?
>
>
> On Tue, Jan 15, 2013 at 5:10 AM, Rob McKenna <rob.mckenna@oracle.com
> <mailto:rob.mckenna@oracle.com>> wrote:
>
> Apologies folks, I managed to overlook this completely. Sean, its
> on my radar and I'll get back to you soon.
>
> -Rob
>
>
> On 21/12/12 15:54, Alan Bateman wrote:
>
>
> I don't have cycles to look at this one (too much going on for
> M6) but Rob McKenna (cc'ed) might.
>
> On 17/12/2012 08:56, Sean Chou wrote:
>
> Hello ,
>
> This is the detail problem, there is a small time window
> in which a 3 threads race makes select() always return 0
> without blocking.
>
> I wrote a
> testcase(http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev0.2/
> <http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev0.2/>
> <http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev0.2/>) which
> needs to modify the lib code to reproduce, because the
> time windows is small.
>
> The reproduce scenario is described in follow, use Tx for
> thread x:
>
> 1. T1 (the user code) is selecting a channel(suppose C),
> it just returns from native select function, and niolib
> select method is checking if the returned channel is
> interested in the event, then 2 happens;
> 2. T2 is closing channel C, it just set the open variable
> to false but not yet closed the channel actually, and then
> 3 happens;
> 3. T3 set the interedOps of the channel to 0. // 0 means
> the channel is not interested in anything, the channel
> will be put into cancel list normally.
>
> In this senario, T1 returns from select, and return 0
> which means no channel is selected(because the channel C
> returned from native invocation has nothing insterested
> in, it is not returned to application). Then T1 goes to
> invoke select again(usually in a loop, this is how select
> is designed to be used). In normal case, select method
> checks if any channels those should be cancelled and
> remove them from the set to be selected. Then, goes to
> native select function.
>
> The problem is: select method first checks if the channel
> is closed, if it is closed, select method doesn't put it
> into cancel list.
>
> In above senario, channel C is in close state, but not
> closed indeed, and setInteredOps to 0(which means cancel).
> So select method doesn't put C into cancel list(due to the
> problem) which means the native select set still contains
> channel C . So the native select always return C and nio
> select always return 0. Until the channel is finally closed.
>
>
> The testcase:
> http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev0.2/
> <http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev0.2/>
> <http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev0.2/>
>
> A working fix:
> http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev_fix/
> <http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev_fix/> \
> <http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev_fix/>
>
> Please have a look.
>
>
>
>
>
>
>
> --
> Best Regards,
> Sean Chou
[Attachment #3 (text/html)]
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Apologies for the delay Sean,<br>
<br>
Just to be clear: it appears we're neglecting to add the channel to
the EPollArrayWrapper idle set as we're depending on ch.isOpen() to
determine whether we need to or not.<br>
<br>
It makes sense to me that anything with u.events == 0 should indeed
be put into the idle list regardless of whether it "isOpen". It will
always be removed later by release, and both release and
updateRegistrations are synced on updateList. <br>
<br>
I'll try to get ahold of Alan tomorrow just to run it past him (in
any case, you'll need him to review this as I'm not a reviewer) and
to figure out if I'm missing something with the ch.isOpen(). <br>
<br>
-Rob<br>
<br>
<div class="moz-cite-prefix">On 19/02/13 08:30, Sean Chou wrote:<br>
</div>
<blockquote
cite="mid:CAOo=rxU_imzantAptjtDB2tE--ZvsDGwrC-mHQCOn86YBkGDHg@mail.gmail.com"
type="cite">
<div dir="ltr">Hi Rob,
<div><br>
</div>
<div style="">Is there any progress ?</div>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Tue, Jan 15, 2013 at 5:10 AM, Rob
McKenna <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:rob.mckenna@oracle.com" \
target="_blank">rob.mckenna@oracle.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Apologies
folks, I managed to overlook this completely. Sean, its on
my radar and I'll get back to you soon.<span class="HOEnZb"><font
color="#888888"><br>
<br>
-Rob</font></span>
<div class="HOEnZb">
<div class="h5"><br>
<br>
On 21/12/12 15:54, Alan Bateman wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
I don't have cycles to look at this one (too much
going on for M6) but Rob McKenna (cc'ed) might.<br>
<br>
On 17/12/2012 08:56, Sean Chou wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
Hello ,<br>
<br>
This is the detail problem, there is a small time
window in which a 3 threads race makes select()
always return 0 without blocking.<br>
<br>
I wrote a testcase(<a moz-do-not-send="true"
href="http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev0.2/"
\
target="_blank">http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev0.2/</a> <<a \
moz-do-not-send="true"
href="http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev0.2/"
\
target="_blank">http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev0.2/</a>>) \
which needs to modify the lib code to reproduce, because the time windows is \
small.<br> <br>
The reproduce scenario is described in follow, use
Tx for thread x:<br>
<br>
1. T1 (the user code) is selecting a channel(suppose
C), it just returns from native select function, and
niolib select method is checking if the returned
channel is interested in the event, then 2 happens;<br>
2. T2 is closing channel C, it just set the open
variable to false but not yet closed the channel
actually, and then 3 happens;<br>
3. T3 set the interedOps of the channel to 0. // 0
means the channel is not interested in anything, the
channel will be put into cancel list normally.<br>
<br>
In this senario, T1 returns from select, and return
0 which means no channel is selected(because the
channel C returned from native invocation has
nothing insterested in, it is not returned to
application). Then T1 goes to invoke select
again(usually in a loop, this is how select is
designed to be used). In normal case, select method
checks if any channels those should be cancelled and
remove them from the set to be selected. Then, goes
to native select function.<br>
<br>
The problem is: select method first checks if the
channel is closed, if it is closed, select method
doesn't put it into cancel list.<br>
<br>
In above senario, channel C is in close state, but
not closed indeed, and setInteredOps to 0(which
means cancel). So select method doesn't put C into
cancel list(due to the problem) which means the
native select set still contains channel C . So the
native select always return C and nio select always
return 0. Until the channel is finally closed.<br>
<br>
<br>
The testcase: <a moz-do-not-send="true"
href="http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev0.2/"
\
target="_blank">http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev0.2/</a> <<a \
moz-do-not-send="true"
href="http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev0.2/"
\
target="_blank">http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev0.2/</a>><br> \
<br> A working fix: <a moz-do-not-send="true"
\
href="http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev_fix/"
\
target="_blank">http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev_fix/</a> <<a \
moz-do-not-send="true"
\
href="http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev_fix/"
\
target="_blank">http://cr.openjdk.java.net/%7Ezhouyx/OJDK-714/webrev_fix/</a>><br> \
<br> <br>
Please have a look.<br>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
Best Regards,<br>
Sean Chou<br>
</div>
</blockquote>
<br>
</body>
</html>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic