'Re: RFD: Rework/extending functionality of mdev'

List:       busybox
Subject:    Re: RFD: Rework/extending functionality of mdev
From:       Didier Kryn <kryn () in2p3 ! fr>
Date:       2015-03-19 14:38:15
Message-ID: 550ADF57.8010603 () in2p3 ! fr

Le 18/03/2015 20:01, Harald Becker a écrit :
>
> Do you think it matters losing one more event?

     Here we are considering the case where fifosvd is killed (say, by an 
admin's error). I understand that lost events can be recovered. However, 
there is one distinct advantage in detecting the death of fifosvd 
immediately: nldev can die at once, causing a new working chain to be 
established *before* a possible burst of events. This avoids 
forking 3 daemons just when the next event happens.

     The necessary code in nldev consists only of calling sigaction() 
with a trivial handler function and testing a flag on return from 
any blocked state (poll/read/write). Note that you probably already want 
this sigaction() call and handler in order to catch SIGTERM.

>
>> This is fine as long as the netlink reader keeps control on its exit,
>> not if it's killed.
>
> And when netlink is killed, then it is the responsibility of the higher
> instance to bring required stuff up again.

     Sure, we agreed on that. But living orphans should not be left 
behind, and, in this respect, it is nldev which is in charge of fifosvd; 
the higher instance can't do it.

>
>> This netlink reader you describe is not the general tool we were
>> considering up to now, the simple data funnel.
>
> My pseudo code described the principal operation and data flow, not
> every gory detail of failure management. So the here described netlink
> is what I last called t(netlink the Unix way).
>
>> If the idea is to integrate such peculiarities as execing a script,
>> then it is not the general tool and why not integrate as well the
>> supervision of mdev-i instead of needing fifosvd. The reason for
>> fifosvd was AFAIU to associate general tools, nldev and mdev-i.
>
> ??? Don't know if I fully understand you here. And why shall exec a 
> failure script violate making netlink a general tool? consider:
>
> nldev -e /path/to/failure/script

     I must say two things:

     First, I hadn't correctly understood what you had written and hadn't 
appreciated the -e option.

     Second, I don't know what you include in the failure management, 
but I think part of it should be to get rid of the child.

     Doing it is going to be complicated in the script; at least you 
need to pass the pid because it is unknown to the shell.

     Instead, it is pretty simple in nldev: you just need to invoke 
wait() and syslog the exit status. The purpose of wait() isn't to check 
the pid of the process (we know who it is); it's to remove the zombie 
and get its exit status. This logic does no harm, whatever the way nldev 
is invoked. Even if it hasn't inherited a child, wait() returns 
immediately (it fails with ECHILD). I agree, though, that a comprehensive 
parsing of the status would take some lines of code.
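
     Roughly, and only as a sketch (the function names and the log 
wording are hypothetical, and the caller is assumed to have called 
openlog() or to accept the syslog defaults):

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <syslog.h>

/* Turn a wait() status into a short message written to buf.
 * Returns buf for convenience. */
static const char *describe_status(int status, char *buf, size_t len)
{
    if (WIFEXITED(status))
        snprintf(buf, len, "exited with code %d", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
    else
        snprintf(buf, len, "unrecognized raw status 0x%x",
                 (unsigned int)status);
    return buf;
}

/* Reap one child, if any, and log how it ended.
 * Returns the child's pid, or -1 (errno == ECHILD when there is
 * no child at all, in which case wait() does not block). */
static pid_t reap_child(void)
{
    int status;
    char msg[64];
    pid_t pid;

    /* Retry if a signal interrupts wait() itself. */
    do {
        pid = wait(&status);
    } while (pid < 0 && errno == EINTR);

    if (pid > 0)
        syslog(LOG_WARNING, "child %ld %s", (long)pid,
               describe_status(status, msg, sizeof msg));
    return pid;
}
```

     Note that wait() blocks while a child exists and is still running, 
so this is meant to be called once the death of the child has been 
detected; waitpid() with WNOHANG would be the non-blocking variant.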

>
> With maybe a default of /sbin/nldev-fail.

     Maybe with a default behaviour of not execing anything - this 
option must be provided in some way.

     I skip the rest of the discussion because I would repeat the same 
things :-) And we agree that fifosvd can know the pipe is broken from 
the return code of the handler, and it's enough to have one way to know it.

     Didier

_______________________________________________
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox