List:       linux-api
Subject:    Re: [PATCH 0/5] Add pidfd support to the fanotify API
From:       Matthew Bobrowski <repnop@google.com>
Date:       2021-05-26 22:54:16
Message-ID: YK7RmNYzYDEvhSqb@google.com

On Wed, May 26, 2021 at 08:05:29PM +0200, Christian Brauner wrote:
> On Wed, May 26, 2021 at 09:20:55AM +1000, Matthew Bobrowski wrote:
> > On Tue, May 25, 2021 at 12:31:33PM +0200, Christian Brauner wrote:
> > > On Mon, May 24, 2021 at 10:47:46AM +0200, Jan Kara wrote:
> > > > On Sat 22-05-21 09:32:36, Matthew Bobrowski wrote:
> > > > > On Fri, May 21, 2021 at 12:40:56PM +0200, Jan Kara wrote:
> > > > > > On Fri 21-05-21 20:15:35, Matthew Bobrowski wrote:
> > > > > > > On Thu, May 20, 2021 at 03:55:27PM +0200, Jan Kara wrote:
> > > > > > > There's one thing that I'd like to mention, and it's something in
> > > > > > > regards to the overall approach we've taken that I'm not particularly
> > > > > > > happy about and I'd like to hear all your thoughts. Basically, with
> > > > > > > this approach the pidfd creation is done only once an event has been
> > > > > > > queued and the notification worker wakes up and picks up the event
> > > > > > > from the queue and processes it. There's a subtle latency introduced when
> > > > > > > taking such an approach which at times leads to pidfd creation
> > > > > > > failures. As in, by the time pidfd_create() is called the struct pid
> > > > > > > has already been reaped, which then results in FAN_NOPIDFD being
> > > > > > > returned in the pidfd info record.
> > > > > > > 
> > > > > > > Having said that, I'm wondering what the thoughts are on doing pidfd
> > > > > > > creation earlier on, i.e. in the event allocation stages? This way, the
> > > > > > > struct pid is pinned earlier on and rather than FAN_NOPIDFD being
> > > > > > > returned in the pidfd info record because the struct pid has been
> > > > > > > already reaped, the userspace application will at least receive a valid
> > > > > > > pidfd which can be used to check whether the process still exists or
> > > > > > > not. I think it'll just set the expectation better from an API
> > > > > > > perspective.
> > > > > > 
> > > > > > Yes, there's this race. OTOH if FAN_NOPIDFD is returned, the listener can
> > > > > > be sure the original process doesn't exist anymore. So is it useful to
> > > > > > still receive a pidfd for the dead process?
> > > > > 
> > > > > Well, you're absolutely right. However, FWIW I was approaching this
> > > > > from two different angles:
> > > > > 
> > > > > 1) I wanted to keep the pattern in which the listener checks for the
> > > > >    existence/recycling of the process consistent. As in, the listener
> > > > >    would receive the pidfd, then send the pidfd a signal via
> > > > >    pidfd_send_signal() and check for -ESRCH which clearly indicates
> > > > >    that the target process has terminated.
> > > > > 
> > > > > 2) I didn't want to mask failed pidfd creation because of early
> > > > >    process termination and other possible failures behind a single
> > > > >    FAN_NOPIDFD. IOW, if we take the -ESRCH approach above, the
> > > > >    listener can take clear corrective branches as to what's to be
> > > > >    done next if a race has been detected, whereas simply returning
> > > > >    FAN_NOPIDFD at this stage can mean multiple things.
> > > > > 
> > > > > Now that I've written the above and keeping in mind that we'd like to
> > > > > refrain from doing anything in the event allocation stages, perhaps we
> > > > > could introduce a different error code for detecting early process
> > > > > termination while attempting to construct the info record. WDYT?
> > > > 
> > > > Sure, I wouldn't like to overengineer it but having one special fd value for
> > > > "process doesn't exist anymore" and another for general "creating pidfd
> > > > failed" looks OK to me.
> > > 
> > > FAN_EPIDFD -> "creation failed"
> > > FAN_NOPIDFD -> "no such process"
> > 
> > Yes, I was thinking something along the lines of this...
> > 
> > With the approach I've proposed in this series, pidfd creation
> > fails in pidfd_create() at the following condition:
> > 
> > 	if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
> > 		return -EINVAL;
> > 
> > Specifically, the following check:
> > 	!pid_has_task(pid, PIDTYPE_TGID)
> > 
> > In order to properly report either FAN_NOPIDFD/FAN_EPIDFD to
> > userspace, AFAIK I'll have to do one of two things to better
> > distinguish why the pidfd creation failed:
> 
> Ok, I see. You already do have a reference to a struct pid and in that
> case we should just always return a pidfd to the caller. For
> pidfd_open(), for example, we only report an error when
> find_get_pid(<pidnr>) doesn't find a struct pid to refer to. But in your
> case here you already have a struct pid so I think we should just keep
> this simple and always return a pidfd to the caller and in fact do
> burden them with figuring out that the process is gone via
> pidfd_send_signal() instead of complicating our lives here.

Right, that makes sense to me. In that case let's keep it simple and
lift the pid_has_task() check from pidfd_create() and leave it at
that. As you mentioned, this way we'll still end up returning a pidfd
in the event, and FAN_NOPIDFD will then simply cover the remaining
pidfd creation errors.
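
For the record, the liveness probe I'd expect listeners to run on the
returned pidfd is roughly the following. This is a minimal, untested
sketch; there's no glibc wrapper for pidfd_send_signal() yet, hence
the raw syscall:

#include <errno.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Probe the pidfd's target with "signal" 0: nothing is actually
 * delivered, only the existence and permission checks run.
 */
static bool pidfd_target_alive(int pidfd)
{
	if (syscall(SYS_pidfd_send_signal, pidfd, 0, NULL, 0) == 0)
		return true;
	/*
	 * ESRCH means the target has already terminated. Any other
	 * error (e.g. EPERM) implies the process still exists.
	 */
	return errno != ESRCH;
}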

> (I think it would be interesting to see perf numbers and how high you
> need to bump the open fds sysctl limit to make this usable on
> systems with a lot of events. I'd be interested in that just in case you
> have something there.)

Not yet, but it's something I should be able to provide a little
later on. Jan, Amir, and I have recently been discussing how to go
about performance regression testing for fanotify, so perhaps while
I'm at it I can cover this case too. I aim to start on that once I've
wrapped up this series.
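
To make the intended usage concrete, below is a rough, untested
sketch of a listener draining events with the uAPI proposed in this
series; FAN_REPORT_PIDFD, FAN_EVENT_INFO_TYPE_PIDFD, struct
fanotify_event_info_pidfd and FAN_NOPIDFD are the names used by the
patches and may still change. fan_fd is assumed to come from
fanotify_init(FAN_CLASS_NOTIF | FAN_REPORT_PIDFD, O_RDONLY):

#include <stdio.h>
#include <unistd.h>
#include <sys/fanotify.h>

static void drain_events(int fan_fd)
{
	char buf[8192];
	ssize_t len = read(fan_fd, buf, sizeof(buf));
	struct fanotify_event_metadata *md = (void *)buf;

	for (; len > 0 && FAN_EVENT_OK(md, len);
	     md = FAN_EVENT_NEXT(md, len)) {
		if (md->event_len > md->metadata_len) {
			/* The first info record follows the fixed metadata. */
			struct fanotify_event_info_pidfd *info =
				(void *)((char *)md + md->metadata_len);

			if (info->hdr.info_type == FAN_EVENT_INFO_TYPE_PIDFD) {
				if (info->pidfd == FAN_NOPIDFD)
					printf("pid %d reaped before pidfd creation\n",
					       md->pid);
				else
					close(info->pidfd); /* or keep it around to probe */
			}
		}
		if (md->fd >= 0)
			close(md->fd);
	}
}

Note that each event then carries up to two fds (the event fd plus
the pidfd), so the open fd limits you mention above would need to be
sized accordingly; that's something the perf run should quantify.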

/M