
List:       freebsd-hackers
Subject:    Re: amd64 syscall ABI (vs. Darwin)
From:       Damian's Proton Mail <damian () dmcyk ! xyz>
Date:       2022-01-17 23:14:55
Message-ID: 4979A00A-9678-4BAC-881D-71F7533D93F9 () dmcyk ! xyz

> On 17 Jan 2022, at 23:51, Konstantin Belousov <kostikbel@gmail.com> wrote:
> 
> On Mon, Jan 17, 2022 at 10:31:09PM +0000, Damian's Proton Mail wrote:
> 
> > > On 17 Jan 2022, at 14:38, Konstantin Belousov <kostikbel@gmail.com> wrote:
> > 
> > > Look at the sys/amd64/amd64/exceptions.S. The fast_syscall entry point
> > > is where we receive control after the syscall instruction.
> > 
> > A lot of new things in there for me, but the flow is clear. I was able to find
> > the corresponding logic in XNU's sources too. Earlier I said:
> > > At first glance the Darwin approach seems more optimal
> > 
> > But it's actually the opposite, or no difference at all: Darwin explicitly
> > restores/sets all registers, including the callee-saved r12-r15.
> > Explicitly preserving registers would prevent kernel data leakage too. Doing so
> > in FreeBSD would also be an ABI-compatible change, I think, since users shouldn't
> > rely on the values in those registers. I'm curious whether you see any obvious
> > pros/cons with either approach, or is it more of an arbitrary implementation
> > choice?
> 
> We preserve everything on syscall entry; it is the SYSCALL instruction's
> behavior that makes it look somewhat convoluted. I suggest you read
> the SDM description of the SYSCALL instruction to understand the register
> manipulations on entry.
> 
> On the other hand, on the fast syscall return, we indeed do not restore
> everything. If you want to restore the full frame, use the PCB_FULL_IRET pcb
> flag to request the iretq return path.
> 
> > Not that I'd propose changing the ABI though, I also want my toy project to work
> > as a plug-in kernel module. I guess the only other option to emulate Darwin's
> > behaviour would be to intercept syscalls in userspace somehow first and manually
> > preserve the register values?
> 
> To emulate Darwin, you would need a specific ABI personality (sysent) in the
> kernel, which would also provide an sv_syscall_ret method. The method can
> do whatever is needed to the return frame, and set PCB_FULL_IRET to indicate
> that the kernel should load it into the CPU GPR file as is.
> 
> BTW, does Darwin use the SYSCALL instruction for syscall entry on amd64?

Yes, it also uses SYSCALL. It also uses rax/rdx for return values and the carry bit to
indicate errors. Even the syscall numbers are similar: they use different masks to
distinguish BSD/Mach syscalls, but the effective BSD syscall numbers seem to be the
same so far. So I already had sysent hooks, and PCB_FULL_IRET does indeed work, thanks!




