'Re: RFR 8240902: JDI shared memory connector can use already closed Handles'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-serviceability-dev
Subject:    Re: RFR 8240902: JDI shared memory connector can use already closed Handles
From:       "serguei.spitsyn () oracle ! com" <serguei ! spitsyn () oracle ! com>
Date:       2020-03-23 18:22:38
Message-ID: 68e2e091-7bfd-8501-f8bf-60b55d64b9af () oracle ! com
[Download RAW message or body]

<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <div class="moz-cite-prefix">Hi Patricio,<br>
      <br>
      The update looks good.<br>
      <br>
      Thanks,<br>
      Serguei<br>
      <br>
      <br>
      On 3/18/20 23:18, Patricio Chilano wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:e1a2d907-3907-0f0e-31f3-221338549a5e@oracle.com"> Hi
      David,<br>
      <br>
      <div class="moz-cite-prefix">On 3/18/20 8:10 PM, David Holmes
        wrote:<br>
      </div>
      <blockquote type="cite"
        cite="mid:8928590e-6516-051e-aa84-098a6fdc9d45@oracle.com">Hi
        Patricio, <br>
        <br>
        On 19/03/2020 6:44 am, Patricio Chilano wrote: <br>
        <blockquote type="cite">Hi David, <br>
          <br>
          On 3/18/20 4:27 AM, David Holmes wrote: <br>
          <blockquote type="cite">Hi Patricio, <br>
            <br>
            On 18/03/2020 6:14 am, Patricio Chilano wrote: <br>
            <blockquote type="cite">Hi all, <br>
              <br>
              Please review the following patch: <br>
              <br>
              Bug: <a class="moz-txt-link-freetext"
                href="https://bugs.openjdk.java.net/browse/JDK-8240902"
                moz-do-not-send="true">https://bugs.openjdk.java.net/browse/JDK-8240902</a>
  <br>
              Webrev: <a class="moz-txt-link-freetext"
                href="http://cr.openjdk.java.net/~pchilanomate/8240902/v1/webrev/"
                moz-do-not-send="true">http://cr.openjdk.java.net/~pchilanomate/8240902/v1/webrev/</a>
  <br>
              <br>
              Calling closeConnection() on an already created/opened
              connection includes calls to CloseHandle() on objects that
              can still be used by other threads. This can lead to
              either undefined behavior or, as detailed in the bug
              comments, changes of state of unrelated objects. </blockquote>
            <br>
            This was a really great find! <br>
          </blockquote>
          Thanks!  : ) <br>
          <br>
          <blockquote type="cite">
            <blockquote type="cite">This issue was found while debugging
              the reason behind some jshell test failures seen after
              pushing 8230594. Not as important, but there are also
              calls to closeStream() from createStream()/openStream()
              when failing to create/open a stream that will return
              after executing "CHECK_ERROR(enterMutex(stream, NULL));"
              without closing the intended resources. Then, calling
              closeConnection() could assert if the reason of the
              previous failure was that the stream's mutex failed to be
              created/opened. These patch aims to address these issues
              too. <br>
            </blockquote>
            <br>
            Patch looks good in general. The internal reference count
            guards deletion of the internal resources, and is itself
            safe because never actually delete the connection. Thanks
            for adding the comment about this aspect. <br>
            <br>
            A few items: <br>
            <br>
            Please update copyright year before pushing. <br>
          </blockquote>
          Done. <br>
          <br>
          <blockquote type="cite">Please align
            ENTER_CONNECTION/LEAVE_CONNECTION macros the same way as
            STREAM_INVARIANT. <br>
          </blockquote>
          Done. <br>
          <br>
          <blockquote type="cite"> 170 unsigned int refcount; <br>
             171     jint state; <br>
            <br>
            I'm unclear about the use of stream-&gt;state and
            connection-&gt;state as guards - unless accessed under a
            mutex these would seem to at least need acquire/release
            semantics. <br>
            <br>
            Additionally the reads of refcount would also seem to need
            to some form of memory synchronization - though the Windows
            docs for the Interlocked* API does not show how to simply
            read such a variable! Though I note that the
            RtlFirstEntrySList method for the "Interlocked Singly Linked
            Lists" API does state "Access to the list is synchronized on
            a multiprocessor system." which suggests a read of such a
            variable does require some form of memory synchronization! <br>
          </blockquote>
          In the case of the stream struct, the state field is protected
          by the mutex field. It is set to STATE_CLOSED while holding
          the mutex, and threads that read it must acquire the mutex
          first through sysIPMutexEnter(). For the cases where
          sysIPMutexEnter() didn't acquire the mutex, we will return
          something different than SYS_OK and the call will exit
          anyways. All this behaves as before, I didn't change it. <br>
        </blockquote>
        <br>
        Thanks for clarifying. <br>
        <br>
        <blockquote type="cite">The refcount and state that I added to
          the SharedMemoryConnection struct work together. For a thread
          closing the connection, setting the connection state to
          STATE_CLOSED has to happen before reading the refcount (more
          on the atomicity of that read later). That's why I added the
          MemoryBarrier() call; which I see it's better if I just move
          it to after setting the connection state to closed. For the
          threads accessing the connection, incrementing the refcount
          has to happen before reading the connection state. That's
          already provided by the InterlockedIncrement() which uses a
          full memory barrier. In this way if the thread closing the
          connection reads a refcount of 0, then we know it's safe to
          release the resources, since other threads accessing the
          connection will see that the state is closed after
          incrementing the refcount. If the read of refcount is not 0,
          then it could be that a thread is accessing the connection or
          not (it could have read a state connection of STATE_CLOSED
          after incrementing the refcount), we don't know, so we can't
          release anything. Similarly if the thread accessing the
          connection reads that the state is not closed, then we know
          it's safe to access the stream since anybody closing the
          connection will still have to read refcount which will be at
          least 1. <br>
          As for the atomicity of the read of refcount, from
          <a class="moz-txt-link-freetext"
href="https://docs.microsoft.com/en-us/windows/win32/sync/interlocked-variable-access"
                
            moz-do-not-send="true">https://docs.microsoft.com/en-us/windows/win32/sync/interlocked-variable-access</a>,
  it states that "simple reads and writes to properly-aligned
          32-bit variables are atomic operations". Maybe I should
          declare refcount explicitly as DWORD32? <br>
        </blockquote>
        <br>
        It isn't the atomicity in question with the naked read but the
        visibility. Any latency in the visibility of the store done by
        the InterLocked*() function should be handled by the retry loop,
        but what is to stop the C++ compiler from hoisting the read of
        refcount out of the loop? It isn't even volatile (which has a
        stronger meaning in VS than regular C+++). <br>
      </blockquote>
      I see what you mean now, I was thinking on atomicity and order of
      operations but didn't consider the visibility of that read. Yes,
      if the compiler decides to be smart and hoist the read out of the
      loop we might never notice that it is safe to release those
      resources and we would leak them for no reason. I see from the
      windows docs(<a
        href="https://docs.microsoft.com/en-us/cpp/c-language/type-qualifiers"
        moz-do-not-send="true">https://docs.microsoft.com/en-us/cpp/c-language/type-qualifiers</a>)
  that declaring it volatile as you pointed out should be enough to
      prevent that.<br>
      <br>
      <blockquote type="cite"
        cite="mid:8928590e-6516-051e-aa84-098a6fdc9d45@oracle.com">
        <blockquote type="cite">Instead of having a refcount we could
          have done something similar to the stream struct and protect
          access to the connection through a mutex. To avoid serializing
          all threads we could have used SRW locks and only the one
          closing the connection would do AcquireSRWLockExclusive(). It
          would change the state of the connection to STATE_CLOSED,
          close all handles, and then release the mutex.
          ENTER_CONNECTION() and LEAVE_CONNECTION() would acquire and
          release the mutex in shared mode. But other that maybe be more
          easy to read I don't think the change will be smaller. <br>
          <br>
          <blockquote type="cite"> 413 while (attempts&gt;0) { <br>
            <br>
            spaces around &gt; <br>
          </blockquote>
          Done. <br>
          <br>
          <blockquote type="cite">If the loop at 413 never encounters a
            zero reference_count then it doesn't close the events or the
            mutex but still returns SYS_OK. That seems wrong but I'm not
            sure what the right behaviour is here. <br>
          </blockquote>
          I can change the return value to be SYS_ERR, but I don't think
          there is much we can do about it unless we want to wait
          forever until we can release those resources. <br>
        </blockquote>
        <br>
        SYS_ERR would look better, but I see now that the return value
        is completely ignored anyway. So we're just going to leak
        resources if the loop "times out". I guess this is the best we
        can do. <br>
      </blockquote>
      Here is v2 with the corrections:<br>
      <br>
      Full: <a
        href="http://cr.openjdk.java.net/~pchilanomate/8240902/v2/webrev/"
        moz-do-not-send="true">http://cr.openjdk.java.net/~pchilanomate/8240902/v2/webrev/</a><br>
  Inc: <a
        href="http://cr.openjdk.java.net/~pchilanomate/8240902/v2/inc/"
        moz-do-not-send="true">http://cr.openjdk.java.net/~pchilanomate/8240902/v2/inc/webrev/</a>
  (not sure why the indent fixes are not highlighted as changes
      but the Frames view does show they changed)<br>
      <br>
      I'll give it a run on mach5 adding tier5 as Serguei suggested.<br>
      <br>
      <br>
      Thanks,<br>
      Patricio<br>
      <blockquote type="cite"
        cite="mid:8928590e-6516-051e-aa84-098a6fdc9d45@oracle.com">Thanks,
        <br>
        David <br>
        <br>
        <blockquote type="cite"> <br>
          <blockquote type="cite">And please wait for serviceability
            folk to review this. <br>
          </blockquote>
          Sounds good. <br>
          <br>
          <br>
          Thanks for looking at this David! I will move the
          MemoryBarrier() and change the refcount to be DWORD32 if you
          are okay with that. <br>
          <br>
          <br>
          Thanks, <br>
          Patricio <br>
          <blockquote type="cite">Thanks, <br>
            David <br>
            ----- <br>
            <br>
            <blockquote type="cite">Tested in mach5 with the current
              baseline, tiers1-3 and several runs of
              open/test/langtools/:tier1 which includes the jshell tests
              where this connector is used. I also applied patch <a
                class="moz-txt-link-freetext"
href="http://cr.openjdk.java.net/~pchilanomate/8240902/triggerbug/webrev"
                moz-do-not-send="true">http://cr.openjdk.java.net/~pchilanomate/8240902/triggerbug/webrev</a>
  mentioned in the comments of the bug, on top of the
              baseline and run the langtool tests with and without this
              fix. Without the fix running around 30 repetitions already
              shows failures in tests
              jdk/jshell/FailOverExecutionControlTest.java and
              jdk/jshell/FailOverExecutionControlHangingLaunchTest.java.
              With the fix I run several hundred runs and saw no
              failures. Let me know if there is any additional testing I
              should do. <br>
              <br>
              As a side note, I see there are a couple of open issues
              related with jshell failures (8209848) which could be
              related to this bug and therefore might be fixed by this
              patch. <br>
              <br>
              Thanks, <br>
              Patricio <br>
              <br>
            </blockquote>
          </blockquote>
          <br>
        </blockquote>
      </blockquote>
      <br>
    </blockquote>
    <br>
  </body>
</html>


[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic