On Wed, 16 Dec 2009 01:50:01 am David Nolden wrote:
> Am Dienstag 15 Dezember 2009 14:58:51 schrieb Hamish Rodda:
> > On Tue, 15 Dec 2009 08:47:44 pm David Nolden wrote:
> > > Am Dienstag 15 Dezember 2009 02:04:53 schrieb Hamish Rodda:
> > > > When I said it was slower, I meant it seemed like the background
> > > > parsing was slower, but I didn't measure it. Given you've found it's
> > > > faster, that's most likely the case. I didn't try to determine the
> > > > UI responsiveness. The lock still prefers waiting readers over
> > > > writers, so the UI should still be as fast (given the main thread
> > > > should only ever use readers).
> > > >
> > > > If the user time is increased, that just means we were better at
> > > > utilising the multiple CPUs, right? Ideally we want utilisation at
> > > > 100% x all CPUs, which should result in much better wall clock time
> > > > but higher user time.
> > >
> > > That time should count the 'overall' CPU usage, and if it's higher, it
> > > means that we've burnt more CPU cycles to get the same result.
> >
> > Well, having parsing finish earlier is a better result, isn't it? See
> > results below, anyway.
> >
> > > > > Due to the central nature of the duchain lock, I'm actually
> > > > > thinking of replacing all the mutexes in there with spin-locks,
> > > > > using QAtomicInt instead of all the mutexes and wait conditions,
> > > > > to make the whole thing more efficient.
> > > >
> > > > What are the performance differences with multiple threads in
> > > > release mode? I think that is what we should be targeting, as it is
> > > > our core audience (developers usually have decent machines).
> > >
> > > I've implemented my idea now, and it is much faster. Locking the
> > > duchain now approximately equals increasing one counter, and
> > > eventually waiting.
> >
> > Here are my test results:
> > Test: clean .kdevduchain, hot disk cache, 'time duchainify kdevplatform'
> > Tests run on a Core 2 Quad running at 3.57GHz, 4GB RAM
> > Non-pattern-conforming results were re-run multiple times to get the
> > best time
> >
> > Spinlock, debugfull build:
> > Thread count    Real time    User time
> > 1               41.14s       38.73s
> > 2               46.97s       48.13s
> > 4               45.54s       47.92s
> > 8               69.37s       70.64s
> >
> > Waitcondition, debugfull build:
> > Thread count    Real time    User time
> > 1               40.83s       37.92s
> > 2               45.75s       49.05s
> > 4               46.79s       55.55s
> > 8               47.28s       54.64s
> >
> > Spinlock, release build:
> > Thread count    Real time    User time
> > 1               21.35s       18.64s
> > 2               23.85s       22.48s
> > 4               31.63s       30.55s
> > 8               39.74s       37.58s
> >
> > Waitcondition, release build:
> > Thread count    Real time    User time
> > 1               22.81s       20.31s
> > 2               20.82s       21.39s
> > 4               20.73s       22.75s
> > 8               23.25s       25.87s
> >
> > In conclusion:
> > 1) Release builds are fast :) I might have to start using them...
> > 2) Spinlock does not scale to multiple threads, as I suspected, since
> > it can't efficiently handle situations of high lock contention.
> > 3) Waitcondition does scale up to number of threads == number of CPUs,
> > but does not yet offer a significant improvement with multithreading.
> > User time is only slightly worse with waitcondition.
> >
> > Last night as I was developing the patch I found a great improvement
> > with waitcondition, but that was when I had accidentally allowed write
> > locks to be acquired while read locks were already held. That's why the
> > patch didn't quite perform as well as I found last night (where
> > multithreaded parsing was ~30% faster in debug mode).
> >
> > Given I still think we can decrease the amount of time spent in write
> > locks (by rewriting code to do calculations under read locks, and then
> > take a write lock only if changes are required), I would think
> > continuing to work with the waitcondition lock would be better,
> > possibly with the spinlock being used when the background parser is
> > only using one thread.
>
> There are some strangenesses in the results, though. I've done some
> similar, less systematic tests where the spin-locks were always
> significantly faster with 1 or 2 threads than the wait-conditions. And
> from profiling I know that the whole old duchain-locking mechanism with
> mutexes, a reader-map, etc. was not efficient enough for the frequency at
> which it was called, while the spin-lock is inherently more efficient, at
> least for the 1-thread case.
>
> I think we can combine the advantages of both approaches, by adding
> wait-conditions to the spin-lock approach, and letting waiters wait on
> the wait-conditions instead of just sleeping.

Interesting idea... I'm taking a look at reducing the number of write locks
and the time spent within write locks now. Will try it and report back.
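To make that concrete, the kind of restructuring I mean is roughly the
pattern below. This is only an illustrative sketch: NewData, needsUpdate()
and applyUpdate() are made-up stand-ins for whatever comparison and mutation
a builder actually does, and the header paths are approximate; only the
DUChainReadLocker / DUChainWriteLocker usage reflects how the lockers are
normally used.

#include <language/duchain/duchain.h>       // header paths approximate
#include <language/duchain/duchainlock.h>
#include <language/duchain/declaration.h>

using namespace KDevelop;

// Hypothetical helpers, standing in for the real comparison/mutation work:
struct NewData;
bool needsUpdate(const Declaration* decl, const NewData& data);
void applyUpdate(Declaration* decl, const NewData& data);

void updateDeclaration(Declaration* decl, const NewData& data)
{
    {
        // Do all the (possibly expensive) comparisons while holding only a
        // read lock, so other readers are not blocked.
        DUChainReadLocker readLock(DUChain::lock());
        if (!needsUpdate(decl, data))
            return;          // nothing changed; the write lock is never taken
    }
    // Something actually needs to change, so take the write lock now, and
    // re-check, because another thread may have done the work in between.
    DUChainWriteLocker writeLock(DUChain::lock());
    if (needsUpdate(decl, data))
        applyUpdate(decl, data);
}

The point is that the common "nothing changed" case never touches the write
lock at all, and the re-check under the write lock guards against another
thread having updated the declaration between the two locks.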
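And just so we're talking about the same thing, here is my reading of the
spin-lock + wait-condition hybrid you describe, boiled down to a toy class.
It is nothing like the real DUChainLock (the name HybridRWLock is
hypothetical, there is no recursion and no fairness), and the timed wait is
only there to paper over missed wakeups in this simplified form:

#include <QAtomicInt>
#include <QMutex>
#include <QMutexLocker>
#include <QWaitCondition>

// Toy reader/writer lock: the fast path is a single atomic counter
// (0 = free, >0 = reader count, -1 = writer held), and contended waiters
// block on a wait condition instead of sleeping in a loop.
class HybridRWLock
{
public:
    void lockForRead()
    {
        forever {
            const int current = m_state.fetchAndAddOrdered(0); // read state
            if (current >= 0 && m_state.testAndSetOrdered(current, current + 1))
                return;                                 // fast path: one CAS
            // Slow path: a writer holds the lock (or the CAS raced); block
            // until a release, with a timeout to guard against missed wakeups.
            QMutexLocker locker(&m_waitMutex);
            m_released.wait(&m_waitMutex, 10 /*ms*/);
        }
    }

    void unlockRead()
    {
        m_state.fetchAndAddOrdered(-1);
        wakeWaiters();
    }

    void lockForWrite()
    {
        forever {
            if (m_state.testAndSetOrdered(0, -1))       // only when fully free
                return;
            QMutexLocker locker(&m_waitMutex);
            m_released.wait(&m_waitMutex, 10 /*ms*/);
        }
    }

    void unlockWrite()
    {
        m_state.fetchAndStoreOrdered(0);
        wakeWaiters();
    }

private:
    void wakeWaiters()
    {
        QMutexLocker locker(&m_waitMutex);
        m_released.wakeAll();
    }

    QAtomicInt m_state;
    QMutex m_waitMutex;
    QWaitCondition m_released;
};

Waking on every unlock is obviously wasteful; the real thing would presumably
only signal when someone is actually queued. But it shows where the wait
condition slots in instead of plain sleeping between retries.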
BTW I'm also seeing a lot of variability in the results of my testing; some
of the results above are really only best-case scenarios. There must be some
tricky timing phenomenon where sometimes the threads step on each other's
toes a lot, and other times they don't...

Cheers, Hamish.

-- 
KDevelop-devel mailing list
KDevelop-devel@kdevelop.org
https://barney.cs.uni-potsdam.de/mailman/listinfo/kdevelop-devel