List:       linux-kernel
Subject:    Re: [Patch] shm bug introduced with pagecache in 2.3.11
From:       Linus Torvalds <torvalds@transmeta.com>
Date:       1999-11-21 9:46:58



On Sun, 21 Nov 1999, Manfred Spraul wrote:
> > 
> > - if a reader is waiting for a writer, then the reader will have
> > incremented the semaphore, and the writer will know to wake it up
> > because the semaphore value won't be zero after the "write_up()".
> 
> Only one thread must do that, otherwise you couldn't distinguish between
> "multiple writers are waiting, the lock is free" and "one writer is
> waiting, one reader owns the lock".

Sure you can - the "semaphore value" is only used for the fast-path to
handle the non-contention case.

In the contention case, you go into a spin-lock protected area that
maintains the complete count of readers and writers. So the real structure
looks something like this:

	{ value, sleeping_readers, sleeping_writers }

	{ 1, 0, 0 }		/* one reader, no contention */
	{ 0x80000001, 1, 0 }	/* writer active and owns lock, one reader waiting */
	{ 0x80000001, 0, 1 }	/* writer active and owns lock, one writer waiting */
	{ 0x80000001, 0, 1 }	/* reader active and owns lock, one writer waiting */
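
Roughly, in C that structure could look like this. This is only a
user-space sketch with C11 atomics and pthreads, not kernel code: the
pthread mutex stands in for the spinlock, the condition variable stands
in for the wait queue, and the extra "faked_readers" field is
bookkeeping added to make the slow-path sketches further down work:

	#include <stdatomic.h>
	#include <pthread.h>

	#define RW_WRITE_BIT 0x80000000u  /* the "sign bit": a writer owns or
					     has claimed the lock */

	struct rwsem {
		atomic_uint value;          /* the only thing the fast path
					       touches: each reader adds 1, a
					       writer sets RW_WRITE_BIT */
		pthread_mutex_t wait_lock;  /* the kernel would use a spinlock */
		unsigned sleeping_readers;
		unsigned sleeping_writers;
		unsigned faked_readers;     /* how many sleeping writers
					       currently "fake" a reader */
		pthread_cond_t wait;        /* stands in for the wait queue */
	};

	#define RWSEM_INIT \
		{ 0, PTHREAD_MUTEX_INITIALIZER, 0, 0, 0, PTHREAD_COND_INITIALIZER }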

Note how in the last two cases the structure _looks_ the same, but that
doesn't actually matter at all: the real state is encoded in the actual
threads that hold the lock, and in particular the ambiguity will be gone
by the time the lock owner releases its lock.

At that point you get (after having done the in-line atomic operation to
release the lock):

	{ 0, 0, 0 }		/* no-contention reader unlocked - nothing happens */
	{ 0x00000001, 1, 0 }	/* writer released write lock, !=0 means that
				   the reader will get woken up (and now it
				   will just continue happily) */
	{ 0x00000001, 0, 1 }	/* writer released the write lock, !=0 means
				   that the other writer will now be woken up */
	{ 0x80000000, 0, 1 }	/* reader released read lock, <0 means that the
				   writer will be woken up */
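
In the sketch above, the release paths are one atomic operation plus a
check, and the wake-up helper is deliberately sloppy in the safe
direction - it wakes everybody, and anybody who still cannot run just
goes back to sleep:

	/* slow-path helper (not a kernel function): wake all sleepers and
	   let them re-check their conditions under the lock */
	static void rwsem_wake(struct rwsem *sem)
	{
		pthread_mutex_lock(&sem->wait_lock);
		pthread_cond_broadcast(&sem->wait);
		pthread_mutex_unlock(&sem->wait_lock);
	}

	void read_up(struct rwsem *sem)
	{
		/* drop our reader count; release ordering publishes our
		   critical section to the next owner */
		unsigned v = atomic_fetch_sub_explicit(&sem->value, 1,
					memory_order_release) - 1;
		if (v & RW_WRITE_BIT)	/* "< 0": a writer is waiting */
			rwsem_wake(sem);
	}

	void write_up(struct rwsem *sem)
	{
		/* clear the writer bit */
		unsigned v = atomic_fetch_sub_explicit(&sem->value,
					RW_WRITE_BIT,
					memory_order_release) - RW_WRITE_BIT;
		if (v != 0)		/* leftover counts: somebody waits */
			rwsem_wake(sem);
	}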

So notice how at all times people know what to do: the writer that "faked"
a reader (in order to get the previous writer to wake it up) knows to
unfake the reader and re-try the write-lock that it failed to acquire the
first time around.

See?
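
The write-lock side of the sketch then looks something like this - the
fast path is a single atomic "set the sign bit", and all the faking and
unfaking happens with wait_lock held:

	void write_down(struct rwsem *sem)
	{
		/* fast path: claim the write bit */
		unsigned old = atomic_fetch_or_explicit(&sem->value,
					RW_WRITE_BIT, memory_order_acquire);
		if (old == 0)
			return;			/* free semaphore: we own it */

		pthread_mutex_lock(&sem->wait_lock);
		sem->sleeping_writers++;
		/* if the bit was already taken by another writer, "fake" a
		   reader so that its write_up() sees value != 0 and wakes us;
		   if it wasn't, we now hold the bit and only have to wait
		   for the active readers to drain away */
		int have_bit = !(old & RW_WRITE_BIT);
		if (!have_bit) {
			atomic_fetch_add_explicit(&sem->value, 1,
					memory_order_relaxed);
			sem->faked_readers++;
		}
		pthread_cond_broadcast(&sem->wait);	/* counts changed */

		for (;;) {
			if (!have_bit && !(atomic_load_explicit(&sem->value,
					memory_order_relaxed) & RW_WRITE_BIT)) {
				/* previous owner is gone: unfake our reader
				   and re-try the write lock we failed to get */
				atomic_fetch_sub_explicit(&sem->value, 1,
						memory_order_relaxed);
				sem->faked_readers--;
				old = atomic_fetch_or_explicit(&sem->value,
						RW_WRITE_BIT,
						memory_order_acquire);
				if (old & RW_WRITE_BIT) {
					/* lost the race to another writer:
					   fake again, then re-check */
					atomic_fetch_add_explicit(&sem->value,
							1, memory_order_relaxed);
					sem->faked_readers++;
					continue;
				}
				have_bit = 1;
			}
			if (have_bit) {
				/* we own the lock once everything left in
				   "value" besides our bit is bookkeeping:
				   fakes plus sleeping readers' counts */
				unsigned idle = sem->faked_readers
						+ sem->sleeping_readers;
				if (atomic_load_explicit(&sem->value,
						memory_order_acquire)
						== RW_WRITE_BIT + idle)
					break;
			}
			pthread_cond_wait(&sem->wait, &sem->wait_lock);
		}
		sem->sleeping_writers--;
		pthread_mutex_unlock(&sem->wait_lock);
	}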

The difficult part in these things is to make sure that we never miss a
wake-up, and that we never have a situation where the counts can get
messed up. The reader/writer counts are trivially protected by a spinlock,
as they are never even touched or looked at in the fast-path. The "real
value" is the one you have to be clever and careful about, and make sure
you have the right memory ordering requirements so that even when people
race on accessing it you are guaranteed to wake up too many sleepers
rather than too few.
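
The read-lock side of the sketch shows both points: the fast path never
touches the sleeper counts, and a sleeping reader's increment stays in
"value" the whole time it sleeps - which is exactly what makes the
writer's release see "!= 0" and wake it:

	void read_down(struct rwsem *sem)
	{
		/* fast path: register ourselves as a reader */
		unsigned v = atomic_fetch_add_explicit(&sem->value, 1,
					memory_order_acquire) + 1;
		if (!(v & RW_WRITE_BIT))
			return;		/* no writer anywhere: we own it */

		/* contention: account for ourselves under the lock and
		   sleep until no writer owns or has claimed the semaphore.
		   Note that our +1 stays in "value" while we sleep. */
		pthread_mutex_lock(&sem->wait_lock);
		sem->sleeping_readers++;
		pthread_cond_broadcast(&sem->wait);	/* counts changed */
		while (atomic_load_explicit(&sem->value,
				memory_order_acquire) & RW_WRITE_BIT)
			pthread_cond_wait(&sem->wait, &sem->wait_lock);
		sem->sleeping_readers--;
		pthread_mutex_unlock(&sem->wait_lock);
	}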

> I think there is a problem: neither "write_lock_trylock()" nor
> "read_lock_trylock()" [ie the inline part of your rw-semaphore
> operations] are atomic, both change a value, and if a certain flag is
> set, then they undo their change.

The rw-semaphore inline stuff should _never_ undo the change.

The out-of-line code knows which failure case it was, and after having
acquired the semaphore it can look at the reader/writer counts and decide
to undo some _other_ sleeper's change - in order to make progress itself
(and this is safe, because that other thread won't be racing any more: it
also got contention, and as such it also takes the spinlock before mucking
with any of the secondary counters, so we're synchronous with regard to
all other contention holders at this point).

But hey, I haven't actually done the full implementation. I'm pretty
confident it can be done, but...
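
The sketch above does hang together well enough to smoke-test in user
space, though - something like this, hammering the lock from a couple of
readers and writers and checking that the writers never stepped on each
other:

	#include <stdio.h>

	static struct rwsem sem = RWSEM_INIT;
	static int shared_counter;

	static void *writer(void *arg)
	{
		for (int i = 0; i < 10000; i++) {
			write_down(&sem);
			shared_counter++;	/* exclusive access */
			write_up(&sem);
		}
		return NULL;
	}

	static void *reader(void *arg)
	{
		for (int i = 0; i < 10000; i++) {
			read_down(&sem);
			(void)shared_counter;	/* shared access */
			read_up(&sem);
		}
		return NULL;
	}

	int main(void)
	{
		pthread_t t[4];
		pthread_create(&t[0], NULL, writer, NULL);
		pthread_create(&t[1], NULL, writer, NULL);
		pthread_create(&t[2], NULL, reader, NULL);
		pthread_create(&t[3], NULL, reader, NULL);
		for (int i = 0; i < 4; i++)
			pthread_join(t[i], NULL);
		printf("counter = %d (expect 20000)\n", shared_counter);
		return 0;
	}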

Anyway, if your point was that two "trylocks" can race so that neither
gets the lock, then yes, you're right. That's what "trylock" is all
about - it won't schedule, it will just fail. And yes, you could have pessimistic
failures. Unlikely, but possible. Not fatal, as the whole point of trylock
is that the caller can gracefully recover from a failure.
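
In the sketch, trylocks that never undo anything fall out naturally if
you build them on compare-and-swap instead of add-then-check: a failed
cmpxchg leaves "value" untouched, so there is nothing to undo, and the
read side fails pessimistically when it merely races with another
reader (the names here are made up for the sketch, not existing kernel
primitives):

	int write_trydown(struct rwsem *sem)
	{
		unsigned expected = 0;
		/* succeeds only on a completely free semaphore */
		return atomic_compare_exchange_strong_explicit(&sem->value,
				&expected, RW_WRITE_BIT,
				memory_order_acquire, memory_order_relaxed);
	}

	int read_trydown(struct rwsem *sem)
	{
		unsigned v = atomic_load_explicit(&sem->value,
					memory_order_relaxed);
		if (v & RW_WRITE_BIT)
			return 0;	/* a writer owns or has claimed it */
		/* this can fail pessimistically if another reader changes
		   "value" under us - unlikely, and the caller recovers */
		return atomic_compare_exchange_strong_explicit(&sem->value,
				&v, v + 1,
				memory_order_acquire, memory_order_relaxed);
	}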

		Linus



