From webkit-dev Mon Jul 06 01:04:35 2009
From: George Staikos
Date: Mon, 06 Jul 2009 01:04:35 +0000
To: webkit-dev
Subject: Re: [webkit-dev] Iterating SunSpider
Message-Id:
X-MARC-Message: https://marc.info/?l=webkit-dev&m=124684230215203

On 4-Jul-09, at 2:47 PM, Mike Belshe wrote:

> #2: Use of summing as a scoring mechanism is problematic
> Unfortunately, the sum-based scoring techniques do not withstand
> the test of time as browsers improve. When the benchmark was first
> introduced, each test was equally weighted and reasonably large.
> Over time, however, the test becomes dominated by the slowest tests
> - basically the weighting of the individual tests is variable based
> on the performance of the JS engine under test. Today's engines
> spend ~50% of their time on just string and date tests. The other
> tests are largely irrelevant at this point, and becoming less
> relevant every day. Eventually many of the tests will take near-zero
> time, and the benchmark will have to be scrapped unless we
> figure out a better way to score it. Benchmarking research which
> long pre-dates SunSpider confirms that geometric means provide a
> better basis for comparison: http://portal.acm.org/citation.cfm?id=5673
> Can future versions of the SunSpider driver be made so that
> they won't become irrelevant over time?

Actually, this doesn't happen on all CPUs. For example, CPUs without an
FPU produce very different results. Memory performance is also a big
factor.

> #3: The SunSpider harness has a variance problem due to CPU power
> savings modes.
> Because the test runs a tiny amount of Javascript (often under
> 10ms) followed by a 500ms sleep, CPUs will go into power savings
> modes between test runs. This radically changes the performance
> measurements and makes it so that comparison between two runs is
> dependent on the user's power savings mode.
> To demonstrate this, run SunSpider on two machines- one with the Windows
> "balanced" (default) setting for power, and then again with "high
> performance". It's easy to see skews of 30% between these two
> modes. I think we should change the test harness to avoid such
> accidental effects.

I've noticed this issue too.

--
George Staikos
Torch Mobile Inc.
http://www.torchmobile.com/

_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
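Point #2 in Mike Belshe's quoted message argues that a summed score ends up weighted toward whichever tests are slowest, while a geometric mean does not. A minimal Python sketch of that arithmetic follows. The test names and millisecond timings are invented for illustration and are not SunSpider measurements, and `sum_score`, `geomean_score`, and `halve` are hypothetical helpers, not part of the SunSpider driver.

```python
import math

# Hypothetical per-test times in milliseconds, invented for illustration;
# these are NOT real SunSpider measurements.
BASELINE = {"string": 40.0, "date": 30.0, "math": 5.0, "bitops": 2.0, "crypto": 3.0}


def sum_score(times_ms):
    """Sum-based score: total time across all tests (lower is better)."""
    return sum(times_ms.values())


def geomean_score(times_ms):
    """Geometric mean of per-test times (lower is better), via the mean of logs."""
    logs = [math.log(t) for t in times_ms.values()]
    return math.exp(sum(logs) / len(logs))


def halve(times_ms, test):
    """Return a copy of times_ms with one test running twice as fast."""
    faster = dict(times_ms)
    faster[test] /= 2
    return faster


fast_string = halve(BASELINE, "string")  # speed up the slowest test
fast_bitops = halve(BASELINE, "bitops")  # speed up the fastest test

if __name__ == "__main__":
    for label, run in (("string 2x", fast_string), ("bitops 2x", fast_bitops)):
        sum_gain = sum_score(BASELINE) / sum_score(run)
        geo_gain = geomean_score(BASELINE) / geomean_score(run)
        print(f"{label}: sum {sum_gain:.3f}x, geomean {geo_gain:.3f}x")
    # string 2x: sum 1.333x, geomean 1.149x
    # bitops 2x: sum 1.013x, geomean 1.149x
```

With these made-up numbers, halving the slowest test cuts the summed time by 25% while halving the fastest test moves it by barely 1%. The geometric mean improves by the same factor, 2^(1/5) with five tests, in both cases, which is the property the quoted point relies on.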