List: busybox
Subject: Re: wget code shrink (recent change)
From: Xabier Oneca -- xOneca <xoneca () gmail ! com>
Date: 2018-11-20 8:37:53
Message-ID: CACkgH72xG+tx_P76YO7q_FnqdQyL7NPGPMmDf11a3OigBAQioA () mail ! gmail ! com
Hi Raffaello & Denys,
First, thank you both very much for your quick response.
>> I can't see why that change would generate less asm. Out of curiosity,
>> anybody cares to explain?
>
> Not hard to guess… option_mask32 will already be in a register right after
> the if, while it might need to be reloaded after the bb_error_msg() call. Or if
> the architecture supports indirect operands (like x86-64), the compiler might
> still generate shorter opcodes by replacing two indirect instructions with a
> load, two register instructions, and a store.
That makes sense. I hadn't thought of that... :/
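A minimal C sketch of the two orderings being compared may make the register-reuse argument easier to follow. All names here (`opt_mask`, `OPT_BIT13`, `fake_error_msg`, `warn_once_old`, `warn_once_new`) and the message text are hypothetical stand-ins for `option_mask32`, the real wget option flag and `bb_error_msg()`. Only the bit value (bit 13, i.e. 8192) and the order of "call, then OR" versus "OR, then call" are taken from the generated assembly in this thread. Whether the reordering actually saves bytes depends on the compiler and the architecture.

```c
#include <stdint.h>

/* Hypothetical stand-ins: opt_mask plays the role of option_mask32 and
 * OPT_BIT13 the flag that the generated code tests with
 * "testb $32, option_mask32+1" and sets with "orl $8192, ...". */
static uint32_t opt_mask;
#define OPT_BIT13 ((uint32_t)1 << 13)	/* == 8192 */

static int msg_count;	/* how many times the warning was emitted */

/* Stub for bb_error_msg(). The real function lives in another translation
 * unit, so the compiler must assume it may modify option_mask32. */
static void fake_error_msg(const char *msg)
{
	(void)msg;
	msg_count++;
}

/* Old ordering: warn first, then set the flag. With an external callee,
 * the flag word has to be accessed in memory again after the call. */
static void warn_once_old(void)
{
	if (!(opt_mask & OPT_BIT13)) {
		fake_error_msg("hypothetical warning text");
		opt_mask |= OPT_BIT13;
	}
}

/* New ordering: set the flag while the loaded value is still in a
 * register, then warn. */
static void warn_once_new(void)
{
	if (!(opt_mask & OPT_BIT13)) {
		opt_mask |= OPT_BIT13;
		fake_error_msg("hypothetical warning text");
	}
}
```

Calling either function twice emits the message once and leaves bit 13 set; the two versions differ only in the code the compiler generates for them.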
> > I can't see why that change would generate less asm. Out of curiosity,
> > anybody cares to explain?
>
> make networking/wget.s
>
> movl option_mask32, %eax # option_mask32, option_mask32.23
> testb $32, %ah #, option_mask32.23
> jne .L13 #,
> orb $32, %ah #, option_mask32.23
> movl %eax, option_mask32 # option_mask32.23, option_mask32
> pushl $.LC3 #
> call bb_error_msg #
> popl %esi #
> .L13:
On x86-64, the new code actually seems longer to me (note: I don't usually work with asm).
--- networking/wget.s.old 2018-11-20 09:07:14.894126056 +0100
+++ networking/wget.s.new 2018-11-20 09:07:14.902126010 +0100
@@ -136,11 +136,14 @@
movq %fs:40, %rax
movq %rax, 8(%rsp)
xorl %eax, %eax
- testb $32, option_mask32+1(%rip)
+ movl option_mask32(%rip), %eax
+ testb $32, %ah
jne .L8
+ orb $32, %ah
movl $.LC4, %edi
+ movl %eax, option_mask32(%rip)
+ xorl %eax, %eax
call bb_error_msg
- orl $8192, option_mask32(%rip)
.L8:
movq %rbx, %rdi
call xstrdup
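Both sides of the diff operate on the same bit. On little-endian x86, `option_mask32+1` addresses the byte holding bits 8..15, so `testb $32` on it checks bit 8 + 5 = 13, the same bit that `orl $8192` sets (8192 = 1 << 13 = 0x2000). Likewise `%ah` holds bits 8..15 of `%eax`, so `testb $32, %ah` and `orb $32, %ah` act on that bit too. The sketch below checks this arithmetic in portable C using shifts instead of byte addressing, so it does not depend on endianness; the helper names are hypothetical.

```c
#include <stdint.h>

/* Models "testb $32, option_mask32+1" / "testb $32, %ah":
 * test bit 5 of the byte that holds bits 8..15. */
static int test_via_byte1(uint32_t mask)
{
	uint8_t byte1 = (uint8_t)(mask >> 8);
	return (byte1 & 32) != 0;
}

/* Models a full-width test of the same flag: mask & 8192. */
static int test_via_word(uint32_t mask)
{
	return (mask & 8192u) != 0;
}

/* Models "orb $32, %ah" applied to a copy of the word held in %eax. */
static uint32_t set_via_byte1(uint32_t mask)
{
	return mask | ((uint32_t)32 << 8);
}
```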
In fact, I re-checked the bloatcheck, and it now reports +3 bytes. (I don't
know what I did last time I checked... :S)
function                                             old     new   delta
spawn_ssl_client                                     282     285      +3
------------------------------------------------------------------------------
(add/remove: 0/0 grow/shrink: 1/0 up/down: 3/0)               Total: 3 bytes
   text    data     bss     dec     hex filename
  87172    1406     488   89066   15bea busybox_old
  87175    1406     488   89069   15bed busybox_unstripped
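As a sanity check on these numbers: in `size` output, `dec` is the sum of `text`, `data` and `bss`, and `hex` is the same total in hexadecimal. Since only `text` changed (87172 to 87175), the +3 bytes in `spawn_ssl_client` account for the whole difference between 89066 and 89069. The helper names in this small sketch are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Mirrors the "dec" column of size(1): text + data + bss. */
static unsigned long size_total(unsigned long text, unsigned long data,
				unsigned long bss)
{
	return text + data + bss;
}

/* Mirrors the "hex" column: the same total in lowercase hexadecimal. */
static void format_hex(unsigned long total, char *buf, size_t len)
{
	snprintf(buf, len, "%lx", total);
}
```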
> Now, if there's a way to code bit-level test_and_set idiom in C?
> x86 has BTS insn which does it efficiently, but it's not used here
> by the compiler. It could be:
>
> btsl $13, option_mask32
> jnc .L13 #,
> pushl $.LC3 #
> call bb_error_msg #
> popl %esi #
> .L13:
Oh! Interesting instruction. That could be an optimization opportunity for the
compiler...
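The test-and-set idiom Denys asks about can at least be written in C as a small helper. The sketch below is portable C with a hypothetical name (`test_and_set_bit` is not a busybox function). It returns the old value of the bit and then sets it, which is what `btsl` computes: the old bit goes into CF and the bit is set in memory. Whether a given compiler turns this into a single `bts` is not guaranteed. As the generated code in this thread shows, GCC may instead emit a load, a test, an OR and a store.

```c
#include <stdint.h>

/* Hypothetical helper, not part of busybox: returns the previous value of
 * bit 'bit' (0..31) in *word and sets it. Semantically this matches x86
 * "btsl $bit, word" (old bit -> CF, bit set in memory), but the compiler
 * may implement it as load/test/or/store. Not atomic. */
static inline int test_and_set_bit(uint32_t *word, unsigned bit)
{
	uint32_t m = (uint32_t)1 << bit;
	int was_set = (*word & m) != 0;

	*word |= m;
	return was_set;
}
```

With such a helper the warning could be written as `if (!test_and_set_bit(&option_mask32, 13)) bb_error_msg(...);`. One detail worth checking in the quoted sequence: BTS leaves the old bit in CF, and the message should be skipped when the bit was already set, so the skip branch would presumably be `jc` rather than `jnc`.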
Thanks!
Xabier Oneca_,,_
_______________________________________________
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox