'Re: [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       linux-api
Subject:    Re: [RFC v1 00/17] seccomp-object: From attack surface reduction to sandboxing
From:       Mickaël_Salaün <mic () digikod ! net>
Date:       2016-04-28 23:45:25
Message-ID: 5722A095.1080809 () digikod ! net
[Download RAW message or body]

[Attachment #2 (multipart/mixed)]


Thanks for the comments. Here are mine:

On 28/04/2016 04:36, Kees Cook wrote:
> Okay, I've read through this whole series now (sorry for the huge
> delay). I think that it is overly complex for what it results in
> providing. Here are some background thoughts I had:

It may be a bit complex but my goal was to create a generic framework easily \
extensible in the future.

> 
> 1) People have asked for "dereferenced argument inspection" (I will
> call this DAI...), in that they would like to be able to process
> arguments like how BPF traditionally processes packets. This series
> doesn't provide that. Rather, it provides static checks against
> specific arguments types (currently just path checks).

The thing is, a network packet can be filtered based on some basic type checks (e.g. \
integer, bit fields, enum) but it seems really complex to be able to check for stuff \
like file path (without even thinking about a BPF regex engine :). However, the \
approach taken in this series is to allow complex checks based on a path *object* \
which could not be directly possible by inspecting the argument. Indeed, the kernel \
can expose more information than just a (user-controlled) string: parent directory, \
inode, device, mount point…

I think that the need is not to be able to (directly) dereference syscall arguments \
but to be able to evaluate arguments.

Moreover, exposing raw arguments in the seccomp_data struct can be tricky because of \
possible multiple pointer indirections.

Last but not least, giving the ability to a BPF to interpret syscall arguments can \
lead to inconsistent evaluation (incorrect mirroring of the OS code and state, cf. \
http://www.isoc.org/isoc/conferences/ndss/03/proceedings/papers/11.pdf).

However, another approach could be to expose a high-level (canonicalized path, inode \
values…) object in the seccomp_data struct but this seems more complex and less \
flexible. How to store path reference object? How to check path hierarchy?

> 
> 2) When I dig into the requirements people have around DAI, it's
> mostly about checking path names. There is some interest in some of
> the network structures, but mostly it's path names. This series
> certainly underscores this since your first example is path names. :)

Indeed. The same approach could be used to filter arguments as socket.

> 
> 3) Solving ToCToU should also solve performance problems. For example,
> this series, on a successful syscall, will look up a pathname twice
> (once in seccomp, then again in the syscall, and then compares the
> results in the LSM as a ToCToU back-stop). This seems like a waste of
> effort, since this reimplements the work the kernel is already doing
> to pass the resulting structure to the LSM hooks. As such, since this
> series is doing static checks and not allowing byte processing for
> DAI, I'm convinced that it should entirely happen in the LSM hooks.

This could be misleading. This series use the audit cache to not evaluate multiple \
path (and prevent ToCToU). I think there is then no more penalty than a syscall using \
multiple times the same file descriptor. If the checked file is not modified by \
another process, the file (path) cache check is only an integer/address comparison.

According to the current seccomp and syscall workflow in Linux, I don't see any other \
way to check an argument without modifying the current kernel behavior (e.g. ptrace \
hook) or locking resources (i.e. file).


> 
> 4) Performing the checks in the LSM hooks carries a risk of exposing
> the syscall's argument processing code to an attacker, but I think
> that is okay since very similar code would already need to be written
> to do the same thing before entering the syscall. The only way out of
> this, I think, would be to standardize syscall argument processing.

I created a basic, but generic and non-intrusive, syscall argument table to store \
useful types (e.g. int, char *) but standardizing syscall argument processing seems a \
big and long term task. However, using a cache similar to audit for some argument \
types seems more realistic ;)

> 
> 5) If we can standardize syscall argument processing, we could also
> change when it happens, and retain the results for the syscall,
> allowing for full byte processing style of DAI. e.g. copy userspace to
> kernel space, do BPF on the argument, if okay, pass the kernel copy to
> the syscall where it continues the processing. If the kernel copy
> wasn't already created by seccomp, the syscall would just make that
> copy itself, etc.

Cf. my first comment and the mirroring code problem (and complexity).

> 
> So, I see DAI as going one of two ways:
> 
> a) rewrite all syscall entry to use a common cacheable argument parser
> and offering true BPF processing of the argument bytes.
> 
> b) use the existing LSM hooks and define a policy language that can be
> loaded ahead of time.
> 
> Doing "a" has many problems, I think. Not the least of which is that I
> can't imagine a way for such an architectural change to not have
> negative performance impacts for the regular case.

Agree :)

> 
> Doing "b" means writing a policy engine. I would expect it to look a
> lot like either AppArmor or TOMOYO. TOMOYO has network structure
> processing, so probably it would look more like TOMOYO if you wanted
> more than just file paths. Maybe a seccomp LSM could share logic from
> one of the existing path-based LSMs.

An interesting thing about BPF is that it's already an engine and would be \
interesting to do more than denying accesses but, for example, faking syscall return \
values as it is already possible. Moreover, this keeps a small attack surface.

> 
> Another note I had for this series was that because the checker tries
> to keep a cached struct path, it allows unprivileged users to check
> for path names existing or not, regardless of the user's permissions.

The registration of a new checker (SECCOMP_ADD_CHECKER_GROUP) against a file path \
should only be allowed according to the current user permissions, and only a filter \
from the same thread can checks against this same file path. So I don't see how this \
series allows unprivileged users to check for path names regardless of their \
permissions.


> Instead, you have to check the path against the policy each time.

Well, each time doesn't means a path parsing but a cache comparison (like does the \
kernel anyway).

> AppArmor does this efficiently with a pre-built deterministic finite
> automatons (built from regular expressions), and TOMOYO just does
> string compares and limited glob parsing every time.
> 
> So, I can't take this as-is, but I'll take the one fix near the start.
> > ) I hope this isn't too discouraging, since I'd love to see this
> solved. Hopefully you can keep chipping away at it!

Of course this series is not ready as-is, but I'm convinced the main ideas could make \
happy sandbox developers!

Regards,
 Mickaël


["signature.asc" (application/pgp-signature)]
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic