List:       uclinux-dev
Subject:    [uClinux-dev] patch for semaphore of pthread lib
From:       "Falk Brettschneider" <falk.brettschneider () gmx ! de>
Date:       2006-10-25 20:50:37
Message-ID: 20061025205037.122850 () gmx ! net

Hi,

I've patched semaphore.c and spinlock.h of the pthread library. Please have a look
at the attached diff files. I'm using a fairly recent uclinux_dist, kernel 2.4.32, on a
Microblaze platform.

1) I replaced all uses of the internal helper mutex __sem_lock in semaphore.c with
   local_[dis|en]able_irq().
2) I avoid locking the other internal helper mutex (in function
   __pthread_set_own_extricate_if() in spinlock.h) for all threads that are
   configured as PTHREAD_CANCEL_DISABLE.
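
To illustrate the idea behind change 1), here is a minimal sketch of a counter update guarded by an interrupt mask instead of a helper mutex. It is not the attached patch: sketch_irq_disable()/sketch_irq_enable() are hypothetical stand-ins that only record state so the code runs anywhere, and struct sketch_sem is an invented type, not the LinuxThreads sem_t.

```c
#include <assert.h>

/* HYPOTHETICAL stand-ins for the IRQ disable/enable calls. On the target
 * they would mask interrupts; here they only record the state so that the
 * sketch can run on any host. */
static int irqs_masked;
static void sketch_irq_disable(void) { irqs_masked = 1; }
static void sketch_irq_enable(void)  { irqs_masked = 0; }

/* Invented type for illustration only. */
struct sketch_sem { int value; };

/* Try to take one token. Returns 1 if a token was taken, 0 if the caller
 * would have to suspend. The critical section never blocks on a lock. */
int sketch_sem_try_take(struct sketch_sem *s)
{
    int taken = 0;

    sketch_irq_disable();
    if (s->value > 0) {
        s->value--;
        taken = 1;
    }
    sketch_irq_enable();
    return taken;
}

/* Give one token back; again the guarded region is only a few instructions. */
void sketch_sem_give(struct sketch_sem *s)
{
    sketch_irq_disable();
    s->value++;
    sketch_irq_enable();
}
```

The guarded region contains no call that can suspend the caller, which is the property the patch is after. Real code that may be entered with interrupts already masked would presumably need a save/restore variant rather than a plain enable.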

Both changes save me from total hangs of my application, which were caused by thread
priority inversion across 3 threads (SCHED_RR, all with different priorities: "A" with high
priority, "B" with medium priority, "C" with low priority). "A" is an IRQ-handler thread. "B" and
"C" each wait on their own semaphore for a wakeup by "A", and then execute
certain tasks (with different priorities) in reaction to those IRQs.

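The thread layout described above can be sketched roughly as follows. This is an assumed reconstruction, not the real application: the function names, the priority values 30/20/10 and the fallback to default scheduling (when SCHED_RR is not permitted) are all made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>

static sem_t sem_b, sem_c;        /* one wake-up semaphore per worker */
static int handled_b, handled_c;  /* tasks completed by each worker */

static void *worker_b(void *arg) { (void)arg; sem_wait(&sem_b); handled_b++; return NULL; }
static void *worker_c(void *arg) { (void)arg; sem_wait(&sem_c); handled_c++; return NULL; }

/* "A": stands in for the IRQ-handler thread and wakes both workers once. */
static void *irq_thread_a(void *arg)
{
    (void)arg;
    sem_post(&sem_b);
    sem_post(&sem_c);
    return NULL;
}

/* Start a thread with SCHED_RR at the given priority. If that is refused
 * (typically for lack of realtime privileges), retry with default attributes. */
static int start_thread(pthread_t *t, void *(*fn)(void *), int prio)
{
    pthread_attr_t attr;
    struct sched_param sp = { .sched_priority = prio };
    int rc;

    pthread_attr_init(&attr);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_RR);
    pthread_attr_setschedparam(&attr, &sp);
    rc = pthread_create(t, &attr, fn, NULL);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        rc = pthread_create(t, NULL, fn, NULL);
    return rc;
}

/* Returns the number of handled tasks (2 on success), or -1 on error. */
int run_demo(void)
{
    pthread_t a, b, c;

    handled_b = handled_c = 0;
    if (sem_init(&sem_b, 0, 0) != 0 || sem_init(&sem_c, 0, 0) != 0)
        return -1;
    if (start_thread(&b, worker_b, 20) != 0) return -1;  /* medium priority */
    if (start_thread(&c, worker_c, 10) != 0) return -1;  /* low priority */
    if (start_thread(&a, irq_thread_a, 30) != 0) return -1;  /* high priority */
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_join(c, NULL);
    sem_destroy(&sem_b);
    sem_destroy(&sem_c);
    return handled_b + handled_c;
}
```

Calling run_demo() from main() exercises one wake-up of each worker. The sketch shows only the normal path, not the hang.
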
Situation of the hang:
- "C" is in the middle of entering sem_wait(), preparing to suspend there. It has locked
  the internal mutex __sem_lock.
- An IRQ immediately switches to the IRQ-handler thread "A".
- "A" wakes up "B" with sem_post(), because "B" is sleeping on its own semaphore.
- A second IRQ arrives, this time signalling a task for "C". So "A" calls
  sem_post() for "C" but blocks on the mutex __sem_lock. (The same thing happens with
  the other mutex, p_lock (in spinlock.h), in other situations.)
- Now "B" and "C" are both ready to run, but "B" has the higher priority and starts
  executing its algorithm code. Unfortunately, due to a logic bug in the algorithm code,
  "B" goes into an infinite loop somewhere.
- Now we have a total hang and the watchdog resets the hardware. The timeout IRQ
  can't be processed by "A" because "A" is hanging on the mutex.

Even if "B" didn't go into an infinite loop, the algorithm could simply take too long.
The program has no chance to cancel "B" on a timeout IRQ, because "A" is blocked
by "C", and "C" is blocked by "B". My app has to meet hard timing constraints like this.
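
The blocking chain above can be reproduced with a toy model of a fixed-priority scheduler. This is only a sketch under simplifying assumptions: the types and function names are invented, thread states are frozen at the moment of the hang, and the "inherit" flag models what priority inheritance would do if it were available.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_state { SKETCH_RUNNABLE, SKETCH_BLOCKED };

struct sketch_thread {
    const char *name;
    int prio;                 /* larger number = higher priority */
    enum sketch_state state;
};

/* Fixed-priority pick: the highest-priority runnable thread, or NULL. */
const struct sketch_thread *sketch_pick(const struct sketch_thread *t, size_t n)
{
    const struct sketch_thread *best = NULL;

    for (size_t i = 0; i < n; i++)
        if (t[i].state == SKETCH_RUNNABLE &&
            (best == NULL || t[i].prio > best->prio))
            best = &t[i];
    return best;
}

/* State at the hang: "A" is blocked on __sem_lock (held by "C"), "B" is
 * spinning in its loop, "C" is runnable but has the lowest priority.
 * Returns the first letter of the thread that gets the CPU. */
char sketch_hang_winner(int inherit)
{
    struct sketch_thread t[] = {
        { "A", 3, SKETCH_BLOCKED  },
        { "B", 2, SKETCH_RUNNABLE },
        { "C", 1, SKETCH_RUNNABLE },
    };

    if (inherit)
        t[2].prio = t[0].prio;  /* "C" borrows the priority of its waiter "A" */
    return sketch_pick(t, 3)->name[0];
}
```

Without inheritance the model always picks "B", so "C" never releases the lock and "A" never runs. With inheritance "C" is picked, can leave its critical section, and "A" becomes runnable again.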

This is much like the Mars Pathfinder problem of 1997 but, unfortunately, I can't
solve it in the same way because LinuxThreads doesn't provide mutex priority
inheritance. (http://research.microsoft.com/~mbj/Mars_Pathfinder/Mars_Pathfinder.html)
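
For comparison, this is roughly what requesting priority inheritance looks like on a pthread implementation that supports the POSIX PTHREAD_PRIO_INHERIT protocol. It is shown only to make clear what is missing here. The helper name init_pi_mutex() is made up, and the call may fail with ENOTSUP where the protocol is not implemented.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initialise *m as a priority-inheritance mutex.
 * Returns 0 on success or an errno-style code (for example ENOTSUP when
 * the implementation or kernel lacks support for the protocol). */
int init_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

With such a mutex, a low-priority holder is temporarily boosted to the priority of the highest-priority waiter. That would not help with the library's internal locks, though, since the application has no control over how they are created.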


You may say "fix your application architecture", but:
1) My architecture only works if sem_post() and sem_wait() are atomic operations
   that can't block internally.
2) Such hangs are very hard to understand when the system hangs inside internal
   functions of operating-system libraries for no apparent reason.
3) http://epoxy.mrs.umn.edu/doc/glibc-doc/html//libc_34.html#SEC677 describes the
   semaphore functions as atomic anyway.
4) Avoiding the mutex by temporarily disabling IRQs performs better and is more
   IRQ-safe. The internal mutex only guards variable accesses, so IRQs are switched
   off only for a very short time, which shouldn't disturb anything.

What do you think? Is the patch OK?

Cheers, F@lk


["semaphore.c-diff" (application/octet-stream)]
["spinlock.h-diff" (application/octet-stream)]

_______________________________________________
uClinux-dev mailing list
uClinux-dev@uclinux.org
http://mailman.uclinux.org/mailman/listinfo/uclinux-dev
This message was resent by uclinux-dev@uclinux.org
