List: kdevelop-devel
Subject: Re: Patch: wake up duchainlock writers
From: Hamish Rodda <rodda () kde ! org>
Date: 2009-12-15 13:58:51
Message-ID: 200912160058.51165.rodda () kde ! org
On Tue, 15 Dec 2009 08:47:44 pm David Nolden wrote:
> On Tuesday 15 December 2009 02:04:53, Hamish Rodda wrote:
> > When I said it was slower, I meant it seemed like the background parsing
> > was slower, but I didn't measure it. Given you've found it's faster,
> > that's most likely the case. I didn't try to determine the UI
> > responsiveness. The lock still prefers waiting readers over writers, so
> > the UI should still be as fast (given the main thread should only ever
> > use readers).
> >
> > If the user time is increased, that just means we were better at
> > utilising the multiple CPUs, right? Ideally we want utilisation at 100%
> > x all cpus, which should result in much better wall clock time but higher
> > user time.
>
> That time should count the 'overall' CPU usage, and if it's higher, it
> means that we've burnt more CPU cycles to get the same result.
Well, having parsing finish earlier is a better result, isn't it? See results
below, anyway.
> > > Due to the central nature of the duchain lock, I'm actually thinking of
> > > replacing all the mutexes in there with spin-locks, using QAtomicInt
> > > instead of all the mutexes and wait conditions, to make the whole
> > > thing more efficient.
> >
> > What are the performance differences with multiple threads in release
> > mode? I think that is what we should be targeting, as it is our core
> > audience (developers usually have decent machines).
>
> I've implemented my idea now, and it is much faster. Locking the duchain
> now approximately equals increasing one counter, and eventually waiting.
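For reference, a lock along those lines might look roughly like the following. This is
only a minimal sketch of a QAtomicInt-based read/write spin lock (assuming the Qt 4
QAtomicInt API of the time), not the code from your patch; the class and member names
are made up.

#include <QAtomicInt>
#include <QThread>

// state >  0 : number of active readers
// state == 0 : lock is free
// state == -1: a writer holds the lock
class SpinDUChainLock
{
public:
    SpinDUChainLock() : m_state(0) {}

    void lockForRead()
    {
        for (;;) {
            int current = m_state;                  // plain read of the counter
            if (current >= 0 && m_state.testAndSetOrdered(current, current + 1))
                return;                             // registered one more reader
            QThread::yieldCurrentThread();          // a writer is active; back off
        }
    }

    void unlockRead()
    {
        m_state.fetchAndAddOrdered(-1);             // drop our reader entry
    }

    void lockForWrite()
    {
        // Spin until there are neither readers nor another writer.
        while (!m_state.testAndSetOrdered(0, -1))
            QThread::yieldCurrentThread();
    }

    void unlockWrite()
    {
        m_state.testAndSetOrdered(-1, 0);           // release exclusive access
    }

private:
    QAtomicInt m_state;
};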
Here are my test results.
Test: clean .kdevduchain, hot disk cache, 'time duchainify kdevplatform'
Test machine: Core 2 Quad at 3.57GHz with 4GB RAM.
Results that did not conform to the overall pattern were re-run several times and the best time taken.
Spinlock, debugfull build:
  Thread count   Real time   User time
  1              41.14s      38.73s
  2              46.97s      48.13s
  4              45.54s      47.92s
  8              69.37s      70.64s

Waitcondition, debugfull build:
  Thread count   Real time   User time
  1              40.83s      37.92s
  2              45.75s      49.05s
  4              46.79s      55.55s
  8              47.28s      54.64s

Spinlock, release build:
  Thread count   Real time   User time
  1              21.35s      18.64s
  2              23.85s      22.48s
  4              31.63s      30.55s
  8              39.74s      37.58s

Waitcondition, release build:
  Thread count   Real time   User time
  1              22.81s      20.31s
  2              20.82s      21.39s
  4              20.73s      22.75s
  8              23.25s      25.87s
In conclusion:
1) Release builds are fast :) I might have to start using them...
2) The spinlock does not scale to multiple threads; as I suspected, it can't
efficiently handle high lock contention.
3) The waitcondition lock does scale up to the number of threads equalling the number
of CPUs, but does not yet offer a significant improvement with multithreading. User
time is only slightly worse with the waitcondition lock.
Last night, as I was developing the patch, I found a great improvement with the
waitcondition lock, but that was when I had accidentally allowed write locks to be
acquired while read locks were already held. That's why the patch didn't quite perform
as well as I found last night (when multithreaded parsing was ~30% faster in debug
mode).
Given that I still think we can decrease the amount of time spent in write locks (by
rewriting code to do its calculations under a read lock and only take a write lock if
changes turn out to be required, as sketched below), I think continuing to work with
the waitcondition lock would be better, possibly with the spinlock being used when the
background parser is only using one thread.
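To illustrate what I mean, here is a minimal sketch of that read-then-write pattern,
assuming the usual DUChainReadLocker / DUChainWriteLocker helpers; needsUpdate() and
applyUpdate() are hypothetical stand-ins for whatever comparison and mutation the real
code would perform.

#include <language/duchain/duchain.h>
#include <language/duchain/duchainlock.h>
#include <language/duchain/declaration.h>

using namespace KDevelop;

// Hypothetical stand-ins for the real comparison and mutation code.
bool needsUpdate(const Declaration* decl);
void applyUpdate(Declaration* decl);

void updateIfRequired(Declaration* decl)
{
    {
        // Do the expensive comparison work while holding only the read lock,
        // so other readers (including the UI thread) are not blocked.
        DUChainReadLocker readLock(DUChain::lock());
        if (!needsUpdate(decl))
            return;
    }

    // Only acquire the contended write lock when a change is actually needed,
    // and re-check first, since another thread may have modified the
    // declaration while no lock was held.
    DUChainWriteLocker writeLock(DUChain::lock());
    if (needsUpdate(decl))
        applyUpdate(decl);
}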
Cheers,
Hamish.
--
KDevelop-devel mailing list
KDevelop-devel@kdevelop.org
https://barney.cs.uni-potsdam.de/mailman/listinfo/kdevelop-devel