
List:       linux-api
Subject:    Re: [PATCH] x86: Implement arch_prctl(ARCH_VSYSCALL_LOCKOUT) to disable vsyscall
From:       "Andy Lutomirski" <luto () kernel ! org>
Date:       2021-11-28 4:45:23
Message-ID: 3b5fb404-7228-48d6-a290-9dd1d6095325 () www ! fastmail ! com

On Fri, Nov 26, 2021, at 3:18 PM, Florian Weimer wrote:
> * Andy Lutomirski:
> 
> > On Fri, Nov 26, 2021, at 12:24 PM, Florian Weimer wrote:
> > > * Andy Lutomirski:
> > > 
> > > > On Fri, Nov 26, 2021, at 5:47 AM, Florian Weimer wrote:
> > > > > Distributions struggle with changing the default for vsyscall
> > > > > emulation because it is a clear break of userspace ABI, something
> > > > > that should not happen.
> > > > > 
> > > > > The legacy vsyscall interface is supposed to be used by libcs only,
> > > > > not by applications.  This commit adds a new arch_prctl request,
> > > > > ARCH_VSYSCALL_LOCKOUT.  Newer libcs can adopt this request to signal
> > > > > to the kernel that the process does not need vsyscall emulation.
> > > > > The kernel can then disable it for the remaining lifetime of the
> > > > > process.  Legacy libcs do not perform this call, so vsyscall remains
> > > > > enabled for them.  This approach should achieve backwards
> > > > > compatibility (perfect compatibility if the assumption that only libcs
> > > > > use vsyscall is accurate), and it provides full hardening for new
> > > > > binaries.
> > > > 
> > > > Why is a lockout needed instead of just a toggle?  By the time an
> > > > attacker can issue prctls, an emulated vsyscall seems like a pretty
> > > > minor exploit technique.  And programs that load legacy modules or
> > > > instrument other programs might need to re-enable them.
> > > 
> > > For glibc, I plan to add an environment variable to disable the lockout.
> > > There's no ELF markup that would allow us to do this during dlopen.
> > > (And after this change, you can run an old distribution in a chroot
> > > for legacy software, something that the userspace ABI break prevents.)
> > > 
> > > If it can be disabled, people will definitely say, "we get more complete
> > > hardening if we break old userspace".  I want to avoid that.  (People
> > > will say that anyway because there's this fairly large window of libcs
> > > that don't use vsyscalls anymore, but have not been patched yet to do
> > > the lockout.)
> > 
> > I'm having trouble following the logic. What I mean is that I think it
> > should be possible to do the arch_prctl again to turn vsyscalls back
> > on.
> 
> The "By the time an attacker can issue prctls" argument does resonate
> with me, but I'm not the one who needs convincing.

Who else needs convincing?  It's your patch.

This could possibly be much more generic: have a mask of legacy features
to disable and a separate mask of lock bits.

> 
> I can turn this into a toggle, and we could probably default our builds
> to vsyscall=xonly.  Given the userspace ABI impact, we'd still have to
> upstream the toggle.  Do you see a chance of a patch along these lines
> going in at all, given that it's an incomplete solution for
> vsyscall=emulate?

There is basically no reason for anyone to use vsyscall=emulate any more.
I'm aware of exactly one use case, and it's quite bizarre and involves
instrumenting an outdated binary with an outdated instrumentation tool.
If either one is recent (last few years), vsyscall=xonly is fine.

> 
> > > Maybe the lockout also simplifies the implementation?
> > > 
> > > > Also, the interaction with emulate mode is somewhat complex. For now,
> > > > let's support this in xonly mode only. A complete implementation will
> > > > require nontrivial mm work.  I had that implemented pre-KPTI, but KPTI
> > > > made it more complicated.
> > > 
> > > I admit I only looked at the code in emulate_vsyscall.  It has code that
> > > seems to deal with faults not due to instruction fetch, and also checks
> > > for vsyscall=emulate mode.  But it seems that we don't get to this point
> > > for reads in vsyscall=emulate mode, presumably because the page is
> > > already mapped?
> > 
> > Yes, and, with KPTI off, it's nontrivial to unmap it. I have code for
> > this, but I'm not sure the complexity is worthwhile.
> 
> Huh.  KPTI is the new thing, right?  Does it make things harder or not?
> I'm confused.
> 
> If we knew at execve time that the new process image doesn't have
> vsyscall, would that be easier to set up?  vsyscall opt-out could be
> triggered by an ELF NOTE segment on the program interpreter (or main
> program if there isn't one).

Nah, it's a different issue.  The vsyscall mapping isn't a normal mapping
at all.  It's in the *kernel* address range, so it's not in the user
portion of the page tables.  This means that, per mm, there is only the
pgd entry that can be changed.  With kpti off, it can be fudged using the
U bit (hah!).  With kpti on, the same trick would work, but the whole
pagetable arrangement is different, and the patch would need updating.

The patch looks a bit like this:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/vdso_permm&id=18432aa9942e8c36c3ba008d2908c246127d135c


except I screwed up and there's a bunch of irrelevant stuff in there. But
the patch would need updating for new kernels.  In any event, none of
this is needed in xonly mode.

> 
> > > > Finally, /proc/self/maps should be wired up via the gate_area code.
> > > 
> > > Should the "[vsyscall]" string change to something else if execution is
> > > disabled?
> > 
> > I think the line should disappear entirely, just like booting with
> > vsyscall=none.
> 
> Hmm.  But only for vsyscall=xonly, right?  With vsyscall=emulate,
> reading at those addresses will still succeed.

IMO if vsyscall is disabled for a process, reads and executes should both
fail.  This is trivial in xonly mode.
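
For testing the /proc/self/maps side of this from userspace, something
like the following would do.  It is only a sketch: the helper name
maps_has_vsyscall is made up, and it assumes the line carries the
"[vsyscall]" tag discussed above whenever the mapping is exposed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if any line read from 'maps' contains the "[vsyscall]" tag,
 * 0 otherwise.  'maps' is expected to be an open stream in the
 * /proc/<pid>/maps format.  Lines longer than the buffer are read in
 * pieces, which is fine here because the vsyscall line itself is short.
 */
int maps_has_vsyscall(FILE *maps)
{
    char line[512];

    assert(maps != NULL);
    while (fgets(line, sizeof line, maps) != NULL) {
        if (strstr(line, "[vsyscall]") != NULL)
            return 1;
    }
    return 0;
}
```

A test would open /proc/self/maps with fopen(), call the helper, and
expect 0 once vsyscall has been disabled for the process, the same as
when booting with vsyscall=none.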

> 
> Thanks,
> Florian

