[prev in list] [next in list] [prev in thread] [next in thread] 

List:       busybox
Subject:    Re: [PATCH 1/1] xargs: support -P <N> (parallel execution)
From:       Denys Vlasenko <vda.linux () googlemail ! com>
Date:       2017-08-28 11:39:50
Message-ID: CAK1hOcND-4BtVmJb57jLvYKPpiWHLwsF27VGRBwJJ4RyMGfRUw () mail ! gmail ! com
[Download RAW message or body]

On Fri, Aug 25, 2017 at 1:56 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>>  {
>>      int status;
>>
>> +#if !ENABLE_FEATURE_XARGS_SUPPORT_PARALLEL
>>      status = spawn_and_wait(G.args);
>> +#else
>> +    if (G.max_procs == 1) {
>> +        status = spawn_and_wait(G.args);
>> +    } else {
>> +        pid_t pid;
>> +        int wstat;
>> + again:
>> +        if (G.running_procs >= G.max_procs)
>> +            pid = safe_waitpid(-1, &wstat, 0);
>> +        else
>> +            pid = wait_any_nohang(&wstat);
>
> What if running_procs == max_procs == 0? In that case, we should avoid
> calling either safe_waitpid() or wait_any_nohang(). Maybe instead
> something like
>
> +        if (G.running_procs < G.max_procs)
> +            pid = wait_any_nohang(&wstat);
> +        else if (G.running_procs > 0)
> +            pid = safe_waitpid(-1, &wstat, 0);
>
> But then, the wait_any_nohang() call is still unnecessary if running_procs
> == 0. But you want to save space, not necessarily make the thing more
> performant, so...

Exactly. If G.running_procs == 0, waitting can be skipped.
But it's harmless.

>> +        if (pid > 0) {
>> +            /* We may have children we don't know about:
>> +             * sh -c 'sleep 1 & exec xargs ...'
>> +             * Do not make G.running_procs go negative.
>> +             */
>
> This hints very eloquently at the problem I am concerned about. But I
> think it is worse than just running_procs going negative. You could easily
> mistake child processes for having exited when they have not, including
> possibly failing children (and hence the exit status can be all wrong).

Having "unexpected children" is not typical.
Therefore we don't need to play nice with them: we only need to
avoid doing something catastrophic (such as crashing or looping forever).

>> +            if (G.running_procs != 0)
>> +                G.running_procs--;
>> +            status = WIFSIGNALED(wstat)
>> +                ? 0x180 + WTERMSIG(wstat)
>> +                : WEXITSTATUS(wstat);
>> +            if (status > 0 && status < 255) {
>> +                /* See below why 123 does not abort */
>> +                G.xargs_exitcode = 123;
>> +                status = 0;
>> +            }
>> +            if (status == 0)
>> +                goto again; /* maybe we have more children? */
>
> I think this logic still needs some love:
>
> - if running_procs is 0, there is no need to go at it again

Well, we are here only if pid > 0. This means, there *was* a child.

> - if running_procs is greater than 0, we still have to wait for the
>   remaining children to exit, even if one child exited due to a signal.

Well, actually no, we don't have to wait for all children.
If one exited due to a signal, xargs can immediately exit with 125.
All remaining children will be reaped by init.

Anyway, code does wait for all children, at the end of main:

        G.max_procs = 0;
        xargs_exec(); /* final waitpid() loop */

>> +    /* Manpage:
>> +     * """xargs exits with the following status:
>> +     * 0 if it succeeds
>> +     * 123 if any invocation of the command exited with status 1-125
>> +     * 124 if the command exited with status 255
>> +     *     ("""If any invocation of the command exits with a status of 255,
>> +     *     xargs will stop immediately without reading any further input.
>> +     *     An error message is issued on stderr when this happens.""")
>> +     * 125 if the command is killed by a signal
>> +     * 126 if the command cannot be run
>> +     * 127 if the command is not found
>> +     * 1 if some other error occurred."""
>> +     */
>>      if (status < 0) {
>>          bb_simple_perror_msg(G.args[0]);
>> -        return errno == ENOENT ? 127 : 126;
>> -    }
>> -    if (status == 255) {
>> -        bb_error_msg("%s: exited with status 255; aborting", G.args[0]);
>> -        return 124;
>> +        status = (errno == ENOENT) ? 127 : 126;
>>      }
>> -    if (status >= 0x180) {
>> +    else if (status >= 0x180) {
>>          bb_error_msg("'%s' terminated by signal %d",
>>              G.args[0], status - 0x180);
>> -        return 125;
>> +        status = 125;
>>      }
>> -    if (status)
>> -        return 123;
>> -    return 0;
>> +    else if (status != 0) {
>> +        if (status == 255) {
>> +            bb_error_msg("%s: exited with status 255; aborting", G.args[0]);
>> +            return 124;
>> +        }
>> +        /* "123 if any invocation of the command exited with status 1-125"
>> +         * This implies that nonzero exit code is remembered,
>> +         * but does not cause xargs to stop: we return 0.
>> +         */
>> +        G.xargs_exitcode = 123;
>> +        status = 0;
>> +    }
>> +
>> +    if (status != 0)
>> +        G.xargs_exitcode = status;
>
> Is it always safe to override the exitcode, even if it has already been
> set to something non-zero?

It can only be 123 if we continue xargs'ing. All other exitcodes make xargs
stop immediately.

It's not defined what to do if more than one stopping condition was seen,
I think it's okay to just use the latest status.

>> +#if ENABLE_FEATURE_XARGS_SUPPORT_PARALLEL
>> +    if (G.max_procs <= 0) /* -P0 means "run lots of them" */
>> +        G.max_procs = 100; /* let's not go crazy high */
>
> That is quite a bit arbitrary. I understand that you want to abuse
> max_procs == 0 to imply the final wrap-up, but 100 is such a non-digital
> number. If you must impose an artificial limit, why not 64, or 256, or
> 1024?

Yes, it's arbitrary. 100 would not fork bomb even a smallish machine,
yet gives a large speed boost on large SMP. 1024 might fork bomb.

63 might be a tiny bit better choice. "Store constant" insn on fixed width
insn arches usually has a restriction how wide constant can be.
63 is a maximum fitting in 6 bits.

I think the best value is `nproc`. Unfortunately, it's a bit more code.
Whoever cares about that, should use -P`nproc` in the script, I guess.
_______________________________________________
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox
[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic