List: linux-kernel
Subject: Bogomips alignment tests
From: colin () nyx ! net (Colin Plumb)
Date: 1997-08-30 15:02:55
To try to resolve this issue, here's a test program I whipped up
(ignore the warnings during assembly; they're harmless) that uses
the pentium time stamp counter to measure the loop precisely at
various alignments.
The results are printed in 8 columns. The first 4 columns are results
without the nop, and the second 4 are results with it. All are cycles
for a 1,000,000 iteration loop. First, for a P5/133:
32+ 0: 5006835 5003518 5004758 5008011 2001472 2002232 2001111 2002580
32+ 1: 5056950 5004625 5004297 5004739 2002338 2001145 2002384 2001266
32+ 2: 5010301 5004586 5007611 5004624 2002522 2002520 2001383 2002161
32+ 3: 1242253 5008725 5004700 5018368 2002644 2001271 2002217 2004879
32+ 4: 2370546 5004579 5220033 5008899 2001214 2002424 2001110 2002479
32+ 5: 1534927 5003213 5004527 5007696 2002180 2001418 2002373 2001157
32+ 6: 5235480 5004696 5004588 5004958 2001210 2002293 2004854 2026252
32+ 7: 5008203 5004479 5008584 5004630 2002492 2002935 2002474 2001802
32+ 8: 5004625 5004696 5004565 5004908 2001110 2002656 2001495 2002227
32+ 9: 5003334 5008416 5004937 5003306 2002192 2001448 2002475 2001110
32+10: 5009182 5004998 5004745 5004765 2001327 2006211 2001177 2002367
32+11: 5004981 5038083 5004788 5008503 2002280 2001186 2002264 2001165
32+12: 5004673 5005372 5003731 5004527 2001240 2002580 2001401 2002420
32+13: 5019638 5004932 5008140 5003535 2006242 2001131 2002563 2001238
32+14: 5004678 5003373 5004839 5004527 2001191 2002175 2001700 2002470
32+15: 5013855 5008901 5004494 5004725 2002250 2001496 2002598 2001172
32+16: 5004765 5016242 5005603 5008658 2001168 2002439 2001321 2002192
32+17: 5004456 5005256 5004847 5003329 2002479 2001130 2002277 2005041
32+18: 5003802 5003422 5008650 5004974 26038487 2002371 2001178 2002540
32+19: 5004648 5004559 5005228 5004620 2001189 2001203 2002749 2001273
32+20: 5004705 5008652 5003655 5004949 2002186 2002202 2004996 2002531
32+21: 5008167 5004600 5004767 5004607 2001387 2001272 2002260 2001270
32+22: 5003480 5004805 5004839 5011781 2002237 2002638 2001332 2002267
32+23: 5004752 5003550 5018978 5005026 2001149 2001177 2002263 2001182
32+24: 5004464 5004774 5008700 5004896 2002584 2015146 2264543 2002977
32+25: 5004693 5004811 5003358 5005218 2001185 2005031 2001964 2001269
32+26: 5008271 5008133 5005002 5003456 2002182 2001147 2002339 2002530
32+27: 3022889 5003800 5004665 1053607 2001144 2002433 2001158 2001411
32+28: 5004813 5004520 5004872 5008683 2006183 2001423 2002387 2002271
32+29: 5003739 5004933 5003441 5004755 2001176 2002228 2001165 2001154
32+30: 5004914 5004432 5008268 5015594 2002445 2001143 2002322 2002715
32+31: 5004784 5007146 5004864 5003347 2001385 2002357 3002572 2004966
32+32: 5008380 5005020 5004529 5004605 2002309 2001227 2002248 2002262
The labels should be straightforward: in row 32+n, the loop starts n bytes
past a 32-byte boundary.
There are some weird variations sometimes, which I can't explain, such
as the first column's 32+3, +4 and +5 figures. And I don't know why that
32+18 number is so high, but it's always in the 32+18 row. I assume it's
just time-slicing or some such.
On a pentium, it seems pretty alignment-independent. The first loop
takes 5 cycles per iteration, and the second takes 2.
Now, a PPro200:
32+ 0: 2005188 2001801 2001735 2001753 2004027 2002291 2001792 2041326
32+ 1: 2001902 2001752 2001735 2001751 2001809 2078461 2001771 2002251
32+ 2: 2001837 2001758 2001764 2001753 2001793 2001799 2276487 2001783
32+ 3: 2002028 2001932 2001739 2081276 2001777 2002779 2002549 2001756
32+ 4: 2065591 2001761 2042136 2041753 2001898 2001896 2001955 2001818
32+ 5: 2002528 2001940 2001789 2001859 2001835 2115990 2001800 2002861
32+ 6: 2001790 2001827 2001949 2001996 2001947 2002621 2002017 2001781
32+ 7: 2005695 2001755 2001777 2041702 2001784 2001821 2001757 2001946
32+ 8: 2042228 2081439 2042240 2001806 2003559 2001834 2001798 2001743
32+ 9: 2002277 2001814 2041373 2001770 2001864 2041166 2001795 2001759
32+10: 2001830 2001767 2002274 2009683 2042085 2002344 2001821 2001758
32+11: 2001880 2041015 2001924 2002186 2001866 2001829 2001820 2041186
32+12: 2042128 2002023 2041400 2001829 2042860 2002062 2001778 2002258
32+13: 2001845 2001866 2193753 2001905 3001809 3003651 3002864 3003614
32+14: 3003712 3002010 3002103 3004125 3003741 3041652 3003655 3044964
32+15: 3041293 3043358 3003529 3001780 3001765 3003570 3001804 3003698
32+16: 2002269 2002777 2001770 2001784 2001770 2010020 2001771 2001826
32+17: 2001804 2001862 2001795 2001769 2001768 2002172 2001934 2002525
32+18: 2119363 2001820 2001825 2002813 2001788 2001784 2041642 2001772
32+19: 2002339 2003592 2001779 2001845 2002176 2001771 2002283 2001750
32+20: 2168050 2001934 2042162 2001768 2001916 2001917 2001868 2001755
32+21: 2002800 2040787 2001897 2042646 2001780 2001805 2001938 2001923
32+22: 2001999 2002450 2002028 2002304 2002037 2044165 2041176 2001764
32+23: 2030200 2001830 2001798 2002022 2001774 2001790 2002353 2001942
32+24: 2063411 2002272 2041882 2001787 2001771 2001768 2001833 2002712
32+25: 2002232 2001868 2001832 2041291 2001767 2003319 2001782 2001800
32+26: 2083152 2001829 2001821 2002254 2001766 2001760 2041520 2042812
32+27: 2002415 2001771 2042428 2001860 2001768 2001758 2002219 2002258
32+28: 2001827 2001764 2001849 2001769 2001781 2001764 2001833 2001814
32+29: 2001930 2001904 2001868 2002003 3003677 3042313 3053742 3003626
32+30: 3003717 3001838 3001825 3043146 3053073 3003725 3004225 3001784
32+31: 3001857 3111311 3003724 3002310 3004053 3001802 3002842 3003667
32+32: 2050220 2041260 2001762 2001784 2001789 2001762 2001848 2001753
Quite consistent, except for the 16+14 and 16+15 cases (rows 32+14, 32+15,
32+30 and 32+31), and 16+13 (rows 32+13 and 32+29) in the case of the
second loop, where it's three cycles per iteration instead
of 2. ".balign 16,0x90,3" seems to be in order here. The third
argument is the maximum number of bytes to skip, beyond which no skip
is done at all. I.e. "if we're within 3 bytes of a 16-byte boundary,
insert padding, otherwise forget it."
Maybe someone could hack this up for a '486 and try it there? I don't
have convenient access to one. The rdtsc() macro would need hacking,
of course.
--
-Colin
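#include <stdio.h>

#define MAXALIGN 32

/* Read the low 32 bits of the time stamp counter into t (EDX is clobbered). */
#define rdtsc(t) asm volatile ("rdtsc" : "=a" (t) :: "dx")

/* Align to a base-byte boundary, then emit skip bytes of nop (0x90). */
#define ALIGN(base,skip) ".balign "#base",0x90\n\t.skip "#skip",0x90\n\t"

/* First loop: decl/jns, without the nop. */
#define DELAY1(base,skip,n,dummy) \
	asm volatile(ALIGN(base,skip) "1:\tdecl %0\n\tjns 1b" \
		: "=r" (dummy) : "0" (n))

/* Second loop: the same with a nop between decl and jns. */
#define DELAY2(base,skip,n,dummy) \
	asm volatile(ALIGN(base,skip) "1:\tdecl %0\n\tnop\n\tjns 1b" \
		: "=r" (dummy) : "0" (n))

#define LOOPS 1000000

/* Time one loop at the given offset and store the cycle count. */
#define TEST1(delay,result,align,skip) \
	rdtsc(t1); \
	delay(align,skip,LOOPS,dummy); \
	rdtsc(t2); \
	result[skip] = t2-t1;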
#define TEST(delay,result) \
TEST1(delay,result,MAXALIGN,0); \
TEST1(delay,result,MAXALIGN,1); \
TEST1(delay,result,MAXALIGN,2); \
TEST1(delay,result,MAXALIGN,3); \
TEST1(delay,result,MAXALIGN,4); \
TEST1(delay,result,MAXALIGN,5); \
TEST1(delay,result,MAXALIGN,6); \
TEST1(delay,result,MAXALIGN,7); \
TEST1(delay,result,MAXALIGN,8); \
TEST1(delay,result,MAXALIGN,9); \
TEST1(delay,result,MAXALIGN,10); \
TEST1(delay,result,MAXALIGN,11); \
TEST1(delay,result,MAXALIGN,12); \
TEST1(delay,result,MAXALIGN,13); \
TEST1(delay,result,MAXALIGN,14); \
TEST1(delay,result,MAXALIGN,15); \
TEST1(delay,result,MAXALIGN,16); \
TEST1(delay,result,MAXALIGN,17); \
TEST1(delay,result,MAXALIGN,18); \
TEST1(delay,result,MAXALIGN,19); \
TEST1(delay,result,MAXALIGN,20); \
TEST1(delay,result,MAXALIGN,21); \
TEST1(delay,result,MAXALIGN,22); \
TEST1(delay,result,MAXALIGN,23); \
TEST1(delay,result,MAXALIGN,24); \
TEST1(delay,result,MAXALIGN,25); \
TEST1(delay,result,MAXALIGN,26); \
TEST1(delay,result,MAXALIGN,27); \
TEST1(delay,result,MAXALIGN,28); \
TEST1(delay,result,MAXALIGN,29); \
TEST1(delay,result,MAXALIGN,30); \
TEST1(delay,result,MAXALIGN,31); \
TEST1(delay,result,MAXALIGN,32)
int main(void)
{
	unsigned t1, t2;
	unsigned dummy;
	unsigned results[8][MAXALIGN+1];

	TEST(DELAY1,results[0]);
	TEST(DELAY2,results[4]);
	TEST(DELAY1,results[1]);
	TEST(DELAY2,results[5]);
	TEST(DELAY1,results[2]);
	TEST(DELAY2,results[6]);
	TEST(DELAY1,results[3]);
	TEST(DELAY2,results[7]);

	/* Print results */
	for (t1 = 0; t1 <= MAXALIGN; t1++) {
		printf("%u+%2u:", MAXALIGN, t1);
		for (t2 = 0; t2 < 8; t2++)
			printf("\t%u", results[t2][t1]);
		putchar('\n');
	}
	return 0;
}