
List:       linux-kernel
Subject:    Bogomips alignment tests
From:       colin () nyx ! net (Colin Plumb)
Date:       1997-08-30 15:02:55

To try to resolve this issue, here's a test program I whipped up
(ignore the warnings during assembly; they're harmless) that uses
the pentium time stamp counter to measure the loop precisely at
various alignments.

The results are printed in 8 columns.  The first 4 columns are results
without the nop, and the second 4 are results with it.  All are cycles
for a 1,000,000 iteration loop.  First, for a P5/133:

32+ 0:	5006835	5003518	5004758	5008011	2001472	2002232	2001111	2002580
32+ 1:	5056950	5004625	5004297	5004739	2002338	2001145	2002384	2001266
32+ 2:	5010301	5004586	5007611	5004624	2002522	2002520	2001383	2002161
32+ 3:	1242253	5008725	5004700	5018368	2002644	2001271	2002217	2004879
32+ 4:	2370546	5004579	5220033	5008899	2001214	2002424	2001110	2002479
32+ 5:	1534927	5003213	5004527	5007696	2002180	2001418	2002373	2001157
32+ 6:	5235480	5004696	5004588	5004958	2001210	2002293	2004854	2026252
32+ 7:	5008203	5004479	5008584	5004630	2002492	2002935	2002474	2001802
32+ 8:	5004625	5004696	5004565	5004908	2001110	2002656	2001495	2002227
32+ 9:	5003334	5008416	5004937	5003306	2002192	2001448	2002475	2001110
32+10:	5009182	5004998	5004745	5004765	2001327	2006211	2001177	2002367
32+11:	5004981	5038083	5004788	5008503	2002280	2001186	2002264	2001165
32+12:	5004673	5005372	5003731	5004527	2001240	2002580	2001401	2002420
32+13:	5019638	5004932	5008140	5003535	2006242	2001131	2002563	2001238
32+14:	5004678	5003373	5004839	5004527	2001191	2002175	2001700	2002470
32+15:	5013855	5008901	5004494	5004725	2002250	2001496	2002598	2001172
32+16:	5004765	5016242	5005603	5008658	2001168	2002439	2001321	2002192
32+17:	5004456	5005256	5004847	5003329	2002479	2001130	2002277	2005041
32+18:	5003802	5003422	5008650	5004974	26038487	2002371	2001178	2002540
32+19:	5004648	5004559	5005228	5004620	2001189	2001203	2002749	2001273
32+20:	5004705	5008652	5003655	5004949	2002186	2002202	2004996	2002531
32+21:	5008167	5004600	5004767	5004607	2001387	2001272	2002260	2001270
32+22:	5003480	5004805	5004839	5011781	2002237	2002638	2001332	2002267
32+23:	5004752	5003550	5018978	5005026	2001149	2001177	2002263	2001182
32+24:	5004464	5004774	5008700	5004896	2002584	2015146	2264543	2002977
32+25:	5004693	5004811	5003358	5005218	2001185	2005031	2001964	2001269
32+26:	5008271	5008133	5005002	5003456	2002182	2001147	2002339	2002530
32+27:	3022889	5003800	5004665	1053607	2001144	2002433	2001158	2001411
32+28:	5004813	5004520	5004872	5008683	2006183	2001423	2002387	2002271
32+29:	5003739	5004933	5003441	5004755	2001176	2002228	2001165	2001154
32+30:	5004914	5004432	5008268	5015594	2002445	2001143	2002322	2002715
32+31:	5004784	5007146	5004864	5003347	2001385	2002357	3002572	2004966
32+32:	5008380	5005020	5004529	5004605	2002309	2001227	2002248	2002262

Presumably the labels are straightforward.  The loops are aligned n bytes
past a 32-byte boundary.

There are some weird variations sometimes, which I can't explain, such
as the first column's 32+3, +4 and +5 figures.  And I don't know why that
32+18 number is so high, but it's always in the 32+18 row.  I assume it's
just time-slicing or some such.

On a Pentium, the timing seems pretty alignment-independent.  The first
loop (without the nop) takes 5 cycles per iteration, and the second
(with the nop) takes 2.
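
The per-iteration figures are just each table entry divided by the
iteration count.  A minimal sketch of that arithmetic, reusing the
test program's LOOPS value (the helper name cycles_per_iter is made up
for illustration):

```c
#include <assert.h>

#define LOOPS 1000000u	/* same iteration count as the test program */

/* Cycles per iteration, rounded to the nearest whole cycle. */
static unsigned cycles_per_iter(unsigned total_cycles)
{
	assert(LOOPS > 0);
	return (total_cycles + LOOPS / 2) / LOOPS;
}
```

For the 32+8 row above, cycles_per_iter(5004625) gives 5 and
cycles_per_iter(2001110) gives 2.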

Now, a PPro200:

32+ 0:	2005188	2001801	2001735	2001753	2004027	2002291	2001792	2041326
32+ 1:	2001902	2001752	2001735	2001751	2001809	2078461	2001771	2002251
32+ 2:	2001837	2001758	2001764	2001753	2001793	2001799	2276487	2001783
32+ 3:	2002028	2001932	2001739	2081276	2001777	2002779	2002549	2001756
32+ 4:	2065591	2001761	2042136	2041753	2001898	2001896	2001955	2001818
32+ 5:	2002528	2001940	2001789	2001859	2001835	2115990	2001800	2002861
32+ 6:	2001790	2001827	2001949	2001996	2001947	2002621	2002017	2001781
32+ 7:	2005695	2001755	2001777	2041702	2001784	2001821	2001757	2001946
32+ 8:	2042228	2081439	2042240	2001806	2003559	2001834	2001798	2001743
32+ 9:	2002277	2001814	2041373	2001770	2001864	2041166	2001795	2001759
32+10:	2001830	2001767	2002274	2009683	2042085	2002344	2001821	2001758
32+11:	2001880	2041015	2001924	2002186	2001866	2001829	2001820	2041186
32+12:	2042128	2002023	2041400	2001829	2042860	2002062	2001778	2002258
32+13:	2001845	2001866	2193753	2001905	3001809	3003651	3002864	3003614
32+14:	3003712	3002010	3002103	3004125	3003741	3041652	3003655	3044964
32+15:	3041293	3043358	3003529	3001780	3001765	3003570	3001804	3003698
32+16:	2002269	2002777	2001770	2001784	2001770	2010020	2001771	2001826
32+17:	2001804	2001862	2001795	2001769	2001768	2002172	2001934	2002525
32+18:	2119363	2001820	2001825	2002813	2001788	2001784	2041642	2001772
32+19:	2002339	2003592	2001779	2001845	2002176	2001771	2002283	2001750
32+20:	2168050	2001934	2042162	2001768	2001916	2001917	2001868	2001755
32+21:	2002800	2040787	2001897	2042646	2001780	2001805	2001938	2001923
32+22:	2001999	2002450	2002028	2002304	2002037	2044165	2041176	2001764
32+23:	2030200	2001830	2001798	2002022	2001774	2001790	2002353	2001942
32+24:	2063411	2002272	2041882	2001787	2001771	2001768	2001833	2002712
32+25:	2002232	2001868	2001832	2041291	2001767	2003319	2001782	2001800
32+26:	2083152	2001829	2001821	2002254	2001766	2001760	2041520	2042812
32+27:	2002415	2001771	2042428	2001860	2001768	2001758	2002219	2002258
32+28:	2001827	2001764	2001849	2001769	2001781	2001764	2001833	2001814
32+29:	2001930	2001904	2001868	2002003	3003677	3042313	3053742	3003626
32+30:	3003717	3001838	3001825	3043146	3053073	3003725	3004225	3001784
32+31:	3001857	3111311	3003724	3002310	3004053	3001802	3002842	3003667
32+32:	2050220	2041260	2001762	2001784	2001789	2001762	2001848	2001753

Quite consistent, except for the 16+14 and 16+15 cases (that is, rows
32+14, 32+15, 32+30 and 32+31), and 16+13 in the
case of the second loop, when it's three cycles per iteration instead
of 2.  ".balign 16,0x90,3" seems to be in order here.  The third
argument is the maximum number of bytes to skip, beyond which no skip
is done at all.  I.e. "if we're within 3 bytes of a 16-byte boundary,
insert padding, otherwise forget it."
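
A small sketch of that rule, for checking it against the table.  The
helper names are hypothetical, and the loop sizes are an assumption
not stated above: 3 bytes for the first loop (a 1-byte decl on a
32-bit register plus a 2-byte short jns) and 4 bytes with the nop.
Under that assumption the slow rows are the ones where the loop body
would straddle a 16-byte boundary, which is one possible reading of
the numbers, not a confirmed cause.

```c
#include <assert.h>

/* Padding bytes that ".balign align,fill,max" would insert at byte
 * offset off: enough to reach the next multiple of align, or none
 * at all if that would take more than max bytes. */
static unsigned balign_pad(unsigned off, unsigned align, unsigned max)
{
	unsigned pad;

	assert(align > 0);
	pad = (align - off % align) % align;
	return pad <= max ? pad : 0;
}

/* Nonzero if a loop body of len bytes starting at byte offset off
 * spans a 16-byte boundary. */
static int straddles16(unsigned off, unsigned len)
{
	assert(len > 0);
	return off / 16 != (off + len - 1) / 16;
}
```

With a maximum skip of 3, offsets 13, 14 and 15 past a 16-byte
boundary get padded up to the next boundary and everything else is
left alone, so neither a 3-byte nor a 4-byte loop ends up straddling
one.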

Maybe someone could hack this up for a '486 and try it there?  I don't
have convenient access to one.  The rdtsc() macro would need hacking,
of course.
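
One possible substitute, sketched under the assumption that
microsecond wall-clock timing from gettimeofday() is good enough on a
machine without a time stamp counter.  The helper names and the idea
of scaling by the nominal clock rate are illustrative only, not part
of the test program:

```c
#include <assert.h>
#include <sys/time.h>

/* Microseconds elapsed from *start to *end. */
static long elapsed_us(const struct timeval *start,
		       const struct timeval *end)
{
	assert(start && end);
	return (long)(end->tv_sec - start->tv_sec) * 1000000L
	     + (long)(end->tv_usec - start->tv_usec);
}

/* Approximate cycle count for an interval, given the nominal clock
 * rate in MHz (one MHz is one cycle per microsecond). */
static unsigned long us_to_cycles(long us, unsigned long mhz)
{
	assert(us >= 0);
	return (unsigned long)us * mhz;
}
```

The idea would be to call gettimeofday(&t1, NULL) and
gettimeofday(&t2, NULL) around the delay loop in place of the two
rdtsc() reads, then convert the difference.  The result is far
coarser than a cycle counter and includes any time spent in other
processes.
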
-- 
	-Colin

#include <stdio.h>

#define MAXALIGN 32

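/* Read the low 32 bits of the time stamp counter into t (%edx is clobbered) */
#define rdtsc(t) asm volatile ("rdtsc" : "=a" (t) :: "dx")

/* Align to a base-byte boundary, then skip "skip" more bytes, filling with nops (0x90) */
#define ALIGN(base,skip) ".balign "#base",0x90\n\t.skip "#skip",0x90\n\t"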

#define DELAY1(base,skip,n,dummy) \
	asm volatile(ALIGN(base,skip) "1:\tdecl %0\n\tjns 1b" \
		     : "=r" (dummy) : "0" (n))

#define DELAY2(base,skip,n,dummy) \
	asm volatile(ALIGN(base,skip) "1:\tdecl %0\n\tnop\n\tjns 1b" \
		     : "=r" (dummy) : "0" (n))

#define LOOPS 1000000

#define TEST1(delay,result,align,skip) \
	rdtsc(t1);		\
	delay(align,skip,LOOPS,dummy);	\
	rdtsc(t2);	\
	result[skip] = t2-t1;

#define TEST(delay,result) 		\
	TEST1(delay,result,MAXALIGN,0);	\
	TEST1(delay,result,MAXALIGN,1);	\
	TEST1(delay,result,MAXALIGN,2);	\
	TEST1(delay,result,MAXALIGN,3);	\
	TEST1(delay,result,MAXALIGN,4);	\
	TEST1(delay,result,MAXALIGN,5);	\
	TEST1(delay,result,MAXALIGN,6);	\
	TEST1(delay,result,MAXALIGN,7);	\
	TEST1(delay,result,MAXALIGN,8);	\
	TEST1(delay,result,MAXALIGN,9);	\
	TEST1(delay,result,MAXALIGN,10);	\
	TEST1(delay,result,MAXALIGN,11);	\
	TEST1(delay,result,MAXALIGN,12);	\
	TEST1(delay,result,MAXALIGN,13);	\
	TEST1(delay,result,MAXALIGN,14);	\
	TEST1(delay,result,MAXALIGN,15);	\
	TEST1(delay,result,MAXALIGN,16);	\
	TEST1(delay,result,MAXALIGN,17);	\
	TEST1(delay,result,MAXALIGN,18);	\
	TEST1(delay,result,MAXALIGN,19);	\
	TEST1(delay,result,MAXALIGN,20);	\
	TEST1(delay,result,MAXALIGN,21);	\
	TEST1(delay,result,MAXALIGN,22);	\
	TEST1(delay,result,MAXALIGN,23);	\
	TEST1(delay,result,MAXALIGN,24);	\
	TEST1(delay,result,MAXALIGN,25);	\
	TEST1(delay,result,MAXALIGN,26);	\
	TEST1(delay,result,MAXALIGN,27);	\
	TEST1(delay,result,MAXALIGN,28);	\
	TEST1(delay,result,MAXALIGN,29);	\
	TEST1(delay,result,MAXALIGN,30);	\
	TEST1(delay,result,MAXALIGN,31);	\
	TEST1(delay,result,MAXALIGN,32)

int main(void)
{
	unsigned t1, t2;
	unsigned dummy;
	unsigned results[8][MAXALIGN+1];

	TEST(DELAY1,results[0]);
	TEST(DELAY2,results[4]);
	TEST(DELAY1,results[1]);
	TEST(DELAY2,results[5]);

	TEST(DELAY1,results[2]);
	TEST(DELAY2,results[6]);
	TEST(DELAY1,results[3]);
	TEST(DELAY2,results[7]);

	/* Print results */
	for (t1 = 0; t1 <= MAXALIGN; t1++) {
		printf("%u+%2u:", MAXALIGN, t1);
		for (t2 = 0; t2 < 8; t2++)
			printf("\t%u", results[t2][t1]);
		putchar('\n');
	}

	return 0;
}
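
The rdtsc() macro above keeps only the low half of the counter, and
TEST1 stores t2-t1 in an unsigned.  A short sketch of why that
subtraction still works if the low half wraps between the two reads
(tsc_delta is a made-up name, and the interval is assumed to be
shorter than one full wrap, roughly 32 seconds at 133 MHz with a
32-bit unsigned):

```c
#include <assert.h>
#include <limits.h>

/* Elapsed count between two reads of a free-running counter.
 * Unsigned arithmetic is modulo UINT_MAX+1, so the difference is
 * still right when t2 has wrapped past zero. */
static unsigned tsc_delta(unsigned t1, unsigned t2)
{
	return t2 - t1;
}
```

For example, tsc_delta(UINT_MAX - 15, 16) is 32, the same as if the
counter had not wrapped.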
