List:       busybox
Subject:    Re: wget code shrink (recent change)
From:       Xabier Oneca  --  xOneca <xoneca () gmail ! com>
Date:       2018-11-20 8:37:53
Message-ID: CACkgH72xG+tx_P76YO7q_FnqdQyL7NPGPMmDf11a3OigBAQioA () mail ! gmail ! com

Hi Raffaello & Denys,

First, thank you both very much for your quick response.

>> I can't see why that change would generate less asm. Out of curiosity,
>> anybody cares to explain?
>
> Not hard to guess… option_mask32 will already be in a register right after
> the if, while it might need to be reloaded after the bb_error_msg() call. Or if
> the architecture supports indirect operands (like x86-64), the compiler might
> still generate shorter opcodes by replacing two indirect instructions with a
> load, two register instructions, and a store.

That makes sense. I hadn't thought of that... :/

> > I can't see why that change would generate less asm. Out of curiosity,
> > anybody cares to explain?
>
> make networking/wget.s
>
>         movl    option_mask32, %eax     # option_mask32, option_mask32.23
>         testb   $32, %ah        #, option_mask32.23
>         jne     .L13    #,
>         orb     $32, %ah        #, option_mask32.23
>         movl    %eax, option_mask32     # option_mask32.23, option_mask32
>         pushl   $.LC3   #
>         call    bb_error_msg    #
>         popl    %esi    #
> .L13:

The new code seems longer to me (note: I don't usually work with asm).

--- networking/wget.s.old    2018-11-20 09:07:14.894126056 +0100
+++ networking/wget.s.new    2018-11-20 09:07:14.902126010 +0100
@@ -136,11 +136,14 @@
     movq    %fs:40, %rax
     movq    %rax, 8(%rsp)
     xorl    %eax, %eax
-    testb    $32, option_mask32+1(%rip)
+    movl    option_mask32(%rip), %eax
+    testb    $32, %ah
     jne    .L8
+    orb    $32, %ah
     movl    $.LC4, %edi
+    movl    %eax, option_mask32(%rip)
+    xorl    %eax, %eax
     call    bb_error_msg
-    orl    $8192, option_mask32(%rip)
 .L8:
     movq    %rbx, %rdi
     call    xstrdup
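
For reference, the two orderings in the diff above correspond roughly to the
C below. This is only a sketch: OPT_WARNED, report(), demo() and the two
warn_once_* functions are hypothetical stand-ins rather than the busybox
source, and the local option_mask32 merely mimics the real global. The old
asm tests the flag in memory, calls, then ORs the flag in memory; the new asm
loads the mask, tests it, sets the bit, stores it, and only then calls.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not the busybox definitions. */
static uint32_t option_mask32;
#define OPT_WARNED (1u << 13)  /* 8192, the bit tested in the asm */

static int messages;  /* counts calls to report() */

static void report(const char *msg)
{
    messages++;
    fprintf(stderr, "%s\n", msg);
}

/* Test, call, then set: the mask is written after the call returns. */
static void warn_once_set_after(void)
{
    if (!(option_mask32 & OPT_WARNED)) {
        report("warning (set after)");
        option_mask32 |= OPT_WARNED;
    }
}

/* Test, set, then call: the value loaded for the test can be updated
 * and stored before the call is made. */
static void warn_once_set_before(void)
{
    if (!(option_mask32 & OPT_WARNED)) {
        option_mask32 |= OPT_WARNED;
        report("warning (set before)");
    }
}

/* Reset state, run one variant twice, return how many messages it printed. */
static int demo(void (*warn_once)(void))
{
    option_mask32 = 0;
    messages = 0;
    warn_once();
    warn_once();
    assert(option_mask32 & OPT_WARNED);
    return messages;
}
```

Both variants print the message once and leave the bit set; they differ only
in what the compiler can keep in a register across the call, so the size
difference depends on the target and compiler.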

In fact, I re-ran the bloatcheck, and now it gives +3 bytes. (I don't
know what I did the last time I checked... :S )

function                                             old     new   delta
spawn_ssl_client                                     282     285      +3
------------------------------------------------------------------------------
(add/remove: 0/0 grow/shrink: 1/0 up/down: 3/0)                 Total: 3 bytes
   text    data     bss     dec     hex filename
  87172    1406     488   89066   15bea busybox_old
  87175    1406     488   89069   15bed busybox_unstripped

> Now, if there's a way to code bit-level test_and_set idiom in C?
> x86 has BTS insn which does it efficiently, but it's not used here
> by the compiler. It could be:
>
>         btsl    $13, option_mask32
>         jnc     .L13    #,
>         pushl   $.LC3   #
>         call    bb_error_msg    #
>         popl    %esi    #
> .L13:

Oh! Interesting instruction. That looks like an optimization the compiler
could be taught to do...
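
On the question of spelling a bit-level test-and-set in C, a plain helper
like the one below is one way to write the idiom. It is only a sketch:
test_and_set_bit() is a hypothetical name, not a busybox or libc function,
it is not atomic, and nothing here guarantees that a compiler will turn it
into a single btsl.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a busybox API): set bit number `bit` in *mask
 * and return whether it was already set (1) or not (0). Not atomic. */
static inline int test_and_set_bit(uint32_t *mask, unsigned bit)
{
    assert(bit < 32);                    /* shifting by >= 32 is undefined */
    uint32_t flag = (uint32_t)1 << bit;
    int was_set = (*mask & flag) != 0;   /* remember the old state */
    *mask |= flag;                       /* set the bit unconditionally */
    return was_set;
}
```

With it, the warn-once pattern would read
"if (!test_and_set_bit(&option_mask32, 13)) bb_error_msg(...);", where 13 is
the bit number from the btsl example quoted above.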

Thanks!

Xabier Oneca_,,_


_______________________________________________
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

