[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-core-libs-dev
Subject:    Re: RFR(s): 8228580: DnsClient TCP socket timeout
From:       Pavel Rappo <pavel.rappo () oracle ! com>
Date:       2019-09-24 22:05:03
Message-ID: B476CC38-35CA-4A1D-B8F2-6D8E118AEB1D () oracle ! com
[Download RAW message or body]

Milan,

I satisfied myself by running the final version of the test some 8k times and then \
pushed the change. Thanks for your patience and persistence. I saw your question on \
the net-dev and nio-dev mailing lists. Thanks.

-Pavel

> On 24 Sep 2019, at 13:41, Milan Mimica <milan.mimica@gmail.com> wrote:
> 
> Pavel,
> 
> Deal. Handling early returns too:
> http://cr.openjdk.java.net/~mmimica/8228580/webrev.05/
> I will ask there about socket timeout semantics.
> 
> On Tue, 24 Sep 2019 at 12:51, Pavel Rappo <pavel.rappo@oracle.com> wrote:
> > 
> > Milan,
> > 
> > Thanks for looking into this. I think you should ask a question on the expected \
> > timing semantics and guarantees on net-dev (with maybe a cc to nio-dev). As for \
> > our test. I agree with you that we should simply work a possibility of early \
> > returns into the check. 
> > ...
> > 
> > /* The acceptable variation of early returns from timed socket operations. */
> > private static final long PREMATURE_RETURN = adjustTimeout(100);
> > 
> > ...
> > 
> > long elapsed = NANOSECONDS.toMillis(System.nanoTime() - startNanos);
> > if (elapsed < timeout - PREMATURE_RETURN || elapsed > timeout + TOLERANCE) {
> > String msg = String.format(
> > "elapsed=%s, timeout=%s, TOLERANCE=%s, PREMATURE_RETURN=%s",
> > timeout, timeout, TOLERANCE, PREMATURE_RETURN);
> > throw new RuntimeException(msg);
> > }
> > 
> > ...
> > 
> > Thoughts?
> > 
> > -Pavel
> > 
> > > On 24 Sep 2019, at 09:12, Milan Mimica <milan.mimica@gmail.com> wrote:
> > > 
> > > Hi Pavel
> > > 
> > > Wow, I find this awesome. I don't have a Windows machine to play with,
> > > but I think I may have found something.
> > > The difference is how Java_sun_nio_ch_Net_poll is implemented. On unix
> > > it uses poll(2), on Windows it uses select(2). Regarding timeouts,
> > > poll() has "wait at least" semantics and overruns by design[1], while
> > > select() on windows has "waits at most" semantics, or how they put
> > > it[2]: "specifies the maximum time that select should wait before
> > > returning.". It returns early by design! Is this a known thing? I
> > > don't think there is much one can do here. It probably makes no sense
> > > to loop it and wait for time remainder.
> > > Java's soTimeout does not specify[3] should it wait at least or at
> > > most the specified timeout, so it's fine I guess. Old, "plain" socket
> > > impl are not much different.
> > > 
> > > If the above is correct, should I just add a tolerance for the lower bound?
> > > 
> > > [1] http://man7.org/linux/man-pages/man2/poll.2.html
> > > [2] https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select
> > >  [3] https://docs.oracle.com/javase/9/docs/api/java/net/Socket.html#setSoTimeout-int-
> > >  
> > > On Mon, 23 Sep 2019 at 16:15, Pavel Rappo <pavel.rappo@oracle.com> wrote:
> > > > 
> > > > Milan,
> > > > 
> > > > I'm observing the latest version (.04) of this test failing quite frequently \
> > > > (4/100) on Windows (Windows Server 2012 R2 6.3 (amd64)) machines. The test \
> > > > passes fine on macOS and Linux. Here's the typical output I see in the logs: 
> > > > java.lang.RuntimeException: Query took 4997 ms. . Timeout value is 5000
> > > > java.lang.RuntimeException: Query took 4999 ms. . Timeout value is 5000
> > > > java.lang.RuntimeException: Query took 4995 ms. . Timeout value is 5000
> > > > java.lang.RuntimeException: Query took 4998 ms. . Timeout value is 5000
> > > > ...
> > > > 
> > > > Now, there might be many reasons for that. One of which would be that the \
> > > > DnsClient code is buggy. The other reason would be that the accuracy \
> > > > guaranteed by Windows implementation of `read` is not what we would expect. \
> > > > Would you be able to investigate that? 
> > > > P.S. The good news is that the CSR has been approved:
> > > > 
> > > > https://bugs.openjdk.java.net/browse/JDK-8230965
> > > > 
> > > > > On 23 Sep 2019, at 14:20, Milan Mimica <milan.mimica@gmail.com> wrote:
> > > > > 
> > > > > Got it. Thanks Pavel!
> > > > > 
> > > > > 
> > > > > On Mon, 23 Sep 2019 at 13:37, Pavel Rappo <pavel.rappo@oracle.com> wrote:
> > > > > > 
> > > > > > Milan,
> > > > > > 
> > > > > > How do you check which tests are run? That's what I see in the \
> > > > > > /test-support/jtreg_open_test_jdk_com_sun_jndi_dns_ConfigTests_TcpTimeout_java/com/sun/jndi/dns/ConfigTests/TcpTimeout.jtr \
> > > > > > file after I have run the test locally on my machine: 
> > > > > > ----------messages:(5/233)----------
> > > > > > command: main TcpTimeout
> > > > > > reason: User specified action: run main TcpTimeout
> > > > > > Mode: othervm
> > > > > > Additional options from @modules: --add-modules java.base --add-exports \
> > > > > > java.base/sun.security.util=ALL-UNNAMED elapsed time (seconds): 1.751
> > > > > > 
> > > > > > ...
> > > > > > 
> > > > > > ----------messages:(5/313)----------
> > > > > > command: main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000
> > > > > > reason: User specified action: run main TcpTimeout \
> > > > > >                 -Dcom.sun.jndi.dns.timeout.initial=5000
> > > > > > Mode: othervm
> > > > > > Additional options from @modules: --add-modules java.base --add-exports \
> > > > > > java.base/sun.security.util=ALL-UNNAMED elapsed time (seconds): 5.498
> > > > > > 
> > > > > > ------------------------------------
> > > > > > 
> > > > > > Which is consistent with what I would expect given the timeout values.
> > > > > > 
> > > > > > The following output does not tell the full story, just the name of the \
> > > > > > test: 
> > > > > > ==============================
> > > > > > Test summary
> > > > > > ==============================
> > > > > > TEST                                              TOTAL  PASS  FAIL ERROR
> > > > > > jtreg:open/test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java
> > > > > > 1     1     0     0
> > > > > > ==============================
> > > > > > TEST SUCCESS
> > > > > > 
> > > > > > -Pavel
> > > > > > 
> > > > > > > On 20 Sep 2019, at 15:42, Milan Mimica <milan.mimica@gmail.com> wrote:
> > > > > > > 
> > > > > > > Pavel,
> > > > > > > 
> > > > > > > Here it is: http://cr.openjdk.java.net/~mmimica/8228580/webrev.04/
> > > > > > > I don't see the test is run twice when I execute "make test
> > > > > > > TEST=jtreg:test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java". Am
> > > > > > > I missing something?
> > > > > > > 
> > > > > 
> > > > > 
> > > > > --
> > > > > Milan Mimica
> > > > 
> > > 
> > > 
> > > --
> > > Milan Mimica
> > 
> 
> 
> -- 
> Milan Mimica


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic