From webkit-dev  Mon Jul 06 01:04:35 2009
From: George Staikos <staikos () kde ! org>
Date: Mon, 06 Jul 2009 01:04:35 +0000
To: webkit-dev
Subject: Re: [webkit-dev] Iterating SunSpider
Message-Id: <A277587E-F09A-4672-A484-9AB8652E9FFC () kde ! org>
X-MARC-Message: https://marc.info/?l=webkit-dev&m=124684230215203


On 4-Jul-09, at 2:47 PM, Mike Belshe wrote:

> #2: Use of summing as a scoring mechanism is problematic
> Unfortunately, the sum-based scoring techniques do not withstand  
> the test of time as browsers improve.  When the benchmark was first  
> introduced, each test was equally weighted and reasonably large.   
> Over time, however, the test becomes dominated by the slowest tests  
> - basically the weighting of the individual tests is variable based  
> on the performance of the JS engine under test.  Today's engines  
> spend ~50% of their time on just string and date tests.  The other  
> tests are largely irrelevant at this point, and becoming less  
> relevant every day.  Eventually many of the tests will take near- 
> zero time, and the benchmark will have to be scrapped unless we  
> figure out a better way to score it.  Benchmarking research which  
> long pre-dates SunSpider confirms that geometric means provide a  
> better basis for comparison:
> http://portal.acm.org/citation.cfm?id=5673
> Can future versions of the SunSpider driver be made so that  
> they won't become irrelevant over time?

    Actually, this doesn't happen on all CPUs.  For example, CPUs
without an FPU produce very different results.  Memory performance is
also a big factor.
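
    To make the scoring concern in #2 concrete, here is a rough sketch in Python. The test names and timings are made up for illustration and are not SunSpider measurements, and the helper names are hypothetical. It only shows how a summed score and a geometric mean respond to the same 2x speedup in different tests.

```python
import math

def total_time(times_ms):
    """Sum-based score: total milliseconds across all tests."""
    return sum(times_ms.values())

def geometric_mean(times_ms):
    """Geometric mean of the per-test times, computed via logs."""
    values = list(times_ms.values())
    return math.exp(sum(math.log(v) for v in values) / len(values))

def share_of_total(times_ms, names):
    """Fraction of the summed score contributed by the named tests."""
    return sum(times_ms[n] for n in names) / total_time(times_ms)

# Hypothetical timings in ms (assumption, not measured data).
engine = {"string": 50.0, "date": 50.0, "math": 10.0, "bitops": 10.0}

# The same engine after a 2x speedup in one slow test or one fast test.
faster_string = dict(engine, string=25.0)
faster_math = dict(engine, math=5.0)

for label, times in [("string 2x", faster_string), ("math 2x", faster_math)]:
    sum_gain = total_time(engine) / total_time(times)
    geo_gain = geometric_mean(engine) / geometric_mean(times)
    print(f"{label}: sum improves {sum_gain:.3f}x, geomean improves {geo_gain:.3f}x")

print("string+date share of sum:", share_of_total(engine, ["string", "date"]))
```

    With the sum, a speedup in an already-fast test barely moves the score, while the geometric mean credits the same relative speedup equally wherever it occurs. As noted above, which tests dominate will differ by CPU, so the inputs here should not be read as typical.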

> #3: The SunSpider harness has a variance problem due to CPU power  
> savings modes.
> Because the test runs a tiny amount of Javascript (often under  
> 10ms) followed by a 500ms sleep, CPUs will go into power savings  
> modes between test runs.  This radically changes the performance  
> measurements and makes it so that comparison between two runs is  
> dependent on the user's power savings mode.  To demonstrate this,  
> run SunSpider on two machines- one with the Windows  
> "balanced" (default) setting for power, and then again with "high  
> performance".  It's easy to see skews of 30% between these two  
> modes.  I think we should change the test harness to avoid such  
> accidental effects.

    I've noticed this issue too.
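
    One way to check for the effect described in #3 on a given machine is sketched below in Python. This is not the SunSpider harness: the workload, the function names, and the idea of comparing a sleeping gap against a busy-wait gap are all assumptions made for illustration. Whether the two modes differ at all depends on the hardware and its power settings.

```python
import statistics
import time

def workload():
    """Hypothetical stand-in for one tiny benchmark test."""
    total = 0
    for i in range(20000):
        total += i * i
    return total

def idle_sleep(seconds):
    """Gap between runs where the CPU is free to drop into a low-power state."""
    time.sleep(seconds)

def idle_spin(seconds):
    """Gap between runs where the CPU is kept busy instead of sleeping."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass

def run_harness(fn, runs, gap_seconds, idle):
    """Time fn `runs` times, waiting gap_seconds before each run via `idle`."""
    samples = []
    for _ in range(runs):
        idle(gap_seconds)
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples

if __name__ == "__main__":
    # 0.5 s mirrors the 500ms sleep mentioned in the quoted message.
    slept = run_harness(workload, 10, 0.5, idle_sleep)
    spun = run_harness(workload, 10, 0.5, idle_spin)
    print("median after sleep:", statistics.median(slept))
    print("median after spin: ", statistics.median(spun))
```

    If the median after sleeping is noticeably higher than after spinning, the gap between runs is likely influencing the measurement on that machine.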

--
George Staikos
Torch Mobile Inc.
http://www.torchmobile.com/
