List:       llvm-bugs
Subject:    [llvm-bugs] [Bug 43508] New: Coroutine symmetric transfer tail call optimization not working on AArc
From:       via llvm-bugs <llvm-bugs () lists ! llvm ! org>
Date:       2019-09-30 16:17:22
Message-ID: bug-43508-206 () http ! bugs ! llvm ! org/

https://bugs.llvm.org/show_bug.cgi?id=43508

            Bug ID: 43508
           Summary: Coroutine symmetric transfer tail call optimization
                    not working on AArch64
           Product: clang
           Version: 9.0
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: C++2a
          Assignee: unassignedclangbugs@nondot.org
          Reporter: bartde@microsoft.com
                CC: blitzrakete@gmail.com, erik.pilkington@gmail.com,
                    llvm-bugs@lists.llvm.org, richard-llvm@metafoo.co.uk

The following code:

  task<void> sync_async() { co_return; }

  task<void> do_async()
  {
    for (int i = 0; i < 1024 * 1024; i++)
    {
      co_await sync_async();
    }
  }

causes a stack overflow, with the call stack alternating between the two coroutines:

  sync_async
  do_async
  sync_async
  do_async
  …

when built with clang 9 on AArch64 without optimization (-O0). The task<T> implementation
uses symmetric transfer for the final awaiter:

  template <typename Promise>
  coroutine_handle_t await_suspend(std::experimental::coroutine_handle<Promise> h) const noexcept
  {
    return h.promise().m_waiter;
  }

and for its operator co_await implementation:

  coroutine_handle_t await_suspend(coroutine_handle_t h) const
  {
    m_coro.promise().m_waiter = h;
    return m_coro;
  }

I think the task<T> provided by cppcoro should behave identically, so it
can be used for the repro.

The stack overflow doesn't repro on x86/x64, or at higher optimization levels
on AArch64. Comparing the -O0 builds for both targets, the x86 version emits a
tail call by means of a jmp:

   b68d2:   e8 49 bc f7 ff           callq  32520 <_ZNKSt12experimental16coroutine_handleIvE7addressEv>
   b68d7:   48 89 c1                 mov    %rax,%rcx
   b68da:   48 8b 00                 mov    (%rax),%rax
   b68dd:   48 89 cf                 mov    %rcx,%rdi
   b68e0:   48 81 c4 a0 00 00 00     add    $0xa0,%rsp
   b68e7:   5d                       pop    %rbp
   b68e8:   ff e0                    jmpq   *%rax

while the AArch64 version emits:

   a8be0:   97fe1fbf    bl    30adc <_ZNKSt12experimental16coroutine_handleIvE7addressEv>
   a8be4:   f9400008    ldr   x8, [x0]
   a8be8:   d63f0100    blr   x8
   a8bec:   a9497bfd    ldp   x29, x30, [sp, #144]
   a8bf0:   910283ff    add   sp, sp, #0xa0
   a8bf4:   d65f03c0    ret

which seems to perform a regular call using blr instead of a tail call using
br, so each symmetric transfer adds a stack frame.
