[prev in list] [next in list] [prev in thread] [next in thread] 

List:       linux-kernel
Subject:    Re: [PATCH v3] tty: tty_io: remove hung_up_tty_fops
From:       Linus Torvalds <torvalds () linux-foundation ! org>
Date:       2024-05-04 19:11:10
Message-ID: CAHk-=wi8mArAxxkO78CTSVRCyjim4hpGbzf2NFxNMAdXWR3oJA () mail ! gmail ! com
[Download RAW message or body]

On Sat, 4 May 2024 at 11:18, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> Here is my current thoughts for possible optimizations of non-volatile
> memory_order_relaxed atomics:
>
> o       Loads from the same variable that can legitimately be
>         reordered to be adjacent to one another can be fused
>         into a single load.

Let's call this "Rule 1"

I think you can extend this to also "can be forwarded from a previous store".

I also think you're too strict in saying "fused into a single load".
Let me show an example below.

> o       Stores to the same variable that can legitimately be
>         reordered to be adjacent to one another can be replaced
>         by the last store in the series.

Rule 2.

Ack, although again, I think you're being a bit too strict in your
language, and the rule can be relaxed.

> o       Loads and stores may not be invented.

Rule 3.

I think you can and should relax this. You can invent loads all you want.

> o       The only way that a computation based on the value from
>         a given load can instead use some other load is if the
>         two loads are fused into a single load.

Rule 4.

I'm not convinced that makes sense, and I don't think it's true as written.

I think I understand what you are trying to say, but I think you're
saying it in a way that only confuses a compiler person.

In particular, the case I do not think is true is very much the
"spill" case: if you have code like this:

    a = expression involving '__atomic_load_n(xyz, RELAXED)'

then it's perfectly fine to spill the result of that load and reload
the value. So the "computation based on the value" *is* actually based
on "some other load" (the reload).

I really *REALLY* think you need to explain the semantics in concrete
terms that a compiler writer would understand and agree with.

So to explain your rules to an actual compiler person (and relax the
semantics a bit) I would rewrite your rules as:

 Rule 1: a strictly dominating load can be replaced by the value of a
preceding load or store

 Ruie 2: a strictly dominating store can remove preceding stores

 Rule 3: stores cannot be done speculatively (put another way: a
subsequent dominating store can only *remove* a store entirely, it
can't turn the store into one with speculative data)

 Rule 4: loads cannot be rematerialized (ie a load can be *combined*
as per Rule 1, but a load cannot be *split* into two loads)

Anyway, let's get to the examples of *why* I think your language was
bad and your rules were too strict.

Let's start with your Rule 3, where you said:

 - Loads and stores may not be invented

and while I think this should be very very true for stores, I think
inventing loads is not only valid, but a good idea. Example:

    if (a)
        b = __atomic_load_n(ptr) + 1;

can perfectly validly just be written as

    tmp = __atomic_load_n(ptr);
    if (a)
        b = tmp+1;

which in turn may allow other optimizations (ie depending on how 'b'
is used, the conditional may go away entirely, and you just end up
with 'b = tmp+!!a').

There's nothing wrong with extra loads that aren't used.

And to make that more explicit, let's look at Rule 1:

Example of Rule 1 (together with the above case):

    if (a)
        b = __atomic_load_n(ptr) + 1;
    else
        b =  __atomic_load_n(ptr) + 2;
    c = __atomic_load_n(ptr) + 3;

then that can perfectly validly re-write this all as

    tmp = __atomic_load_n(ptr);
    b = a ? tmp+1 : tmp+2;
    c = tmp + 3;

because my version of Rule 1 allows the dominating load used for 'c'
to be replaced by the value of a preceding load that was used for 'a'
and 'b'.

And to give an example of Rule 2, where you said "reordered to be
adjacent", I'm saying that all that matters is being strictly
dominant, so

    if (a)
        __atomic_store_n(ptr,1);
    else
        __atomic_store_n(ptr,2);
    __atomic_store_n(ptr,3);

can be perfectly validly be combined into just

    __atomic_store_n(ptr,3);

because the third store completely dominates the two others, even if
in the intermediate form they are not necessarily ever "adjacent".

(Your "adjacency" model might still be valid in how you could turn
first of the first stores to be a fall-through, then remove it, and
then turn the other to be a fall-through and then remove it, so maybe
your language isn't _tecnically_ wrong, But I think the whole
"dominating store" is how a compiler writer would think about it).

Anyway, the above are all normal optimizations that compilers
*already* do, and the only real issue is that with memory ordering,
the "dominance" thing will need to take into account the ordering
requirements of other memory operations with stricter memory ordering
in between. So, for example, if you have

    __atomic_store_n(ptr,1, RELAXED);
    __atomic_load_n(otherptr,2, ACQUIRE);
    __atomic_store_n(ptr,2, RELAXED);

then the second store doesn't dominate the first store, because
there's a stricter memory ordering instruction in between.

But I think the *important* rule is that "Rule 4", which is what
actually gives us the true "atomicity" rule:

 Loads cannot be rematerialized

IOW, when we do a

    a = __atomic_load_n(ptr);

and use the value of 'a' at any point later, it can *never* have
turned into a re-load of that ptr value. Yes, the load might have been
combined with an earlier one, and the load may have been moved up
earlier (by the "you can invent loads and then  combine them with the
later real one" rule).

So even atomic loads can be combined, they can be moved to outside of
loops etc. BUT THEY CANNOT BE RE-DONE.

I think that's really the fundamnetally different rule, and the one
that makes us use 'volatile' in many cases.

So we have a "READ_ONCE()" macro, but in many cases it's really
"READ_AT_MOST_ONCE()". It's usually not that you can't optimize the
read away - you can - but you absolutely must not turn it into two
reads.

Turning a read into two reads is what makes those TOCTOU issues problematic.

Honestly, I'd love to just have that "Rule 4" be there for *all*
variable accesses. Not just __atomic ones. If we had a compiler flag
like "-fno-TUCTOU", I'd enable it.

                    Linus

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic