[prev in list] [next in list] [prev in thread] [next in thread]
List: linux-api
Subject: Re: [PATCH] x86: Implement arch_prctl(ARCH_VSYSCALL_LOCKOUT) to disable vsyscall
From: "Andy Lutomirski" <luto () kernel ! org>
Date: 2021-11-28 4:45:23
Message-ID: 3b5fb404-7228-48d6-a290-9dd1d6095325 () www ! fastmail ! com
[Download RAW message or body]
On Fri, Nov 26, 2021, at 3:18 PM, Florian Weimer wrote:
> * Andy Lutomirski:
>
> > On Fri, Nov 26, 2021, at 12:24 PM, Florian Weimer wrote:
> > > * Andy Lutomirski:
> > >
> > > > On Fri, Nov 26, 2021, at 5:47 AM, Florian Weimer wrote:
> > > > > Distributions struggle with changing the default for vsyscall
> > > > > emulation because it is a clear break of userspace ABI, something
> > > > > that should not happen.
> > > > >
> > > > > The legacy vsyscall interface is supposed to be used by libcs only,
> > > > > not by applications. This commit adds a new arch_prctl request,
> > > > > ARCH_VSYSCALL_LOCKOUT. Newer libcs can adopt this request to signal
> > > > > to the kernel that the process does not need vsyscall emulation.
> > > > > The kernel can then disable it for the remaining lifetime of the
> > > > > process. Legacy libcs do not perform this call, so vsyscall remains
> > > > > enabled for them. This approach should achieves backwards
> > > > > compatibility (perfect compatibility if the assumption that only libcs
> > > > > use vsyscall is accurate), and it provides full hardening for new
> > > > > binaries.
> > > >
> > > > Why is a lockout needed instead of just a toggle? By the time an
> > > > attacker can issue prctls, an emulated vsyscall seems like a pretty
> > > > minor exploit technique. And programs that load legacy modules or
> > > > instrument other programs might need to re-enable them.
> > >
> > > For glibc, I plan to add an environment variable to disable the lockout.
> > > There's no ELF markup that would allow us to do this during dlopen.
> > > (And after this change, you can run an old distribution in a chroot
> > > for legacy software, something that the userspace ABI break prevents.)
> > >
> > > If it can be disabled, people will definitely say, "we get more complete
> > > hardening if we break old userspace". I want to avoid that. (People
> > > will say that anyway because there's this fairly large window of libcs
> > > that don't use vsyscalls anymore, but have not been patched yet to do
> > > the lockout.)
> >
> > I'm having trouble following the logic. What I mean is that I think it
> > should be possible to do the arch_prctl again to turn vsyscalls back
> > on.
>
> The "By the time an attacker can issue prctls" argument does resonate
> with me, but I'm not the one who needs convincing.
Who else needs convincing? It's your patch.
This could possibly be much more generic: have a mask of legacy features to disable \
and a separate mask of lock bits.
>
> I can turn this into a toggle, and we could probably default our builds
> to vsyscalls=xonly. Given the userspace ABI impact, we'd still have to
> upstream the toggle. Do you see a chance of a patch a long these lines
> going in at all, given that it's an incomplete solution for
> vsyscall=emulate?
There is basically no reason for anyone to use vsyscall=emulate any more. I'm aware \
of exactly one use case, and it's quite bizarre and involves instrumenting an \
outdated binary with an outdated instrumentation tool. If either one is recent (last \
few years), vsyscall=xonly is fine.
>
> > > Maybe the lockout also simplifies the implementation?
> > >
> > > > Also, the interaction with emulate mode is somewhat complex. For now,
> > > > let's support this in xonly mode only. A complete implementation will
> > > > require nontrivial mm work. I had that implemented pre-KPTI, but KPTI
> > > > made it more complicated.
> > >
> > > I admit I only looked at the code in emulate_vsyscall. It has code that
> > > seems to deal with faults not due to instruction fetch, and also checks
> > > for vsyscall=emulate mode. But it seems that we don't get to this point
> > > for reads in vsyscall=emulate mode, presumably because the page is
> > > already mapped?
> >
> > Yes, and, with KPTI off, it's nontrivial to unmap it. I have code for
> > this, but I'm not sure the complexity is worthwhile.
>
> Huh. KPTI is the new thing, right? Does it make things harder or not?
> I'm confused.
>
> If we knew at execve time that the new process image doesn't have
> vsyscall, would that be easier to set up? vsyscall opt-out could be
> triggered by an ELF NOTE segment on the program interpreter (or main
> program if there isn't one).
Nah, it's a different issue. The vsyscall mapping isn't a normal mapping at all. \
It's in the *kernel* address range, so it's not in the user portion of the page \
tables. This means that, per mm, there is only the pgd entry that can be changed. \
With kpti off, it can be fudged using the U bit (hah!). With kpti on, the same trick \
would work, but the whole pagetable arrangement is different, and the patch would \
need updating.
The patch looks a bit like this:
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/vdso_permm&id=18432aa9942e8c36c3ba008d2908c246127d135c
except I screwed up and there's a bunch of irrelevant stuff in there. But the patch \
would need updating for new kernels. In any event, none of this is needed in xonly \
mode.
>
> > > > Finally, /proc/self/maps should be wired up via the gate_area code.
> > >
> > > Should the "[vsyscall]" string change to something else if execution is
> > > disabled?
> >
> > I think the line should disappear entirely, just like booting with
> > vsyscall=none.
>
> Hmm. But only for vsyscall=xonly, right? With vsyscall=emulate,
> reading at those addresses will still succeed.
IMO if vsyscall is disabled for a process, reads and executes should both fail. This \
is trivial in xonly mode.
>
> Thanks,
> Florian
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic