<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-GB">
	<id>https://wiki.kram.nz/index.php?action=history&amp;feed=atom&amp;title=SE250%3Alab-1%3Ajham005</id>
	<title>SE250:lab-1:jham005 - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.kram.nz/index.php?action=history&amp;feed=atom&amp;title=SE250%3Alab-1%3Ajham005"/>
	<link rel="alternate" type="text/html" href="https://wiki.kram.nz/index.php?title=SE250:lab-1:jham005&amp;action=history"/>
	<updated>2026-04-15T00:50:27Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.45.3</generator>
	<entry>
		<id>https://wiki.kram.nz/index.php?title=SE250:lab-1:jham005&amp;diff=4335&amp;oldid=prev</id>
		<title>Mark: 3 revision(s)</title>
		<link rel="alternate" type="text/html" href="https://wiki.kram.nz/index.php?title=SE250:lab-1:jham005&amp;diff=4335&amp;oldid=prev"/>
		<updated>2008-11-03T05:18:41Z</updated>

		<summary type="html">&lt;p&gt;3 revision(s)&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;=== John&amp;#039;s lab report ===&lt;br /&gt;
&lt;br /&gt;
I played around with the number of loop iterations until the program took about 1 second to run.&lt;br /&gt;
&lt;br /&gt;
  double f = 0;&lt;br /&gt;
  clock_t t0 = clock( );&lt;br /&gt;
 &lt;br /&gt;
  for( n = 0; n &amp;lt; 100000000; n++ )&lt;br /&gt;
    f = f + 1;&lt;br /&gt;
  printf( &amp;quot;%ld\n&amp;quot;, clock( ) - t0 );&lt;br /&gt;
&lt;br /&gt;
By changing the type of &amp;lt;tt&amp;gt;f&amp;lt;/tt&amp;gt;, I was able to gather data for the different C data types,&lt;br /&gt;
and by commenting out the line &amp;lt;tt&amp;gt;f = f + 1&amp;lt;/tt&amp;gt; I obtained a time for the empty loop.  To&lt;br /&gt;
check the variance in the times, I added another loop around the code to repeat 10 times.&lt;br /&gt;
&lt;br /&gt;
Here are the results from my laptop:&lt;br /&gt;
 double float long int short noop &lt;br /&gt;
 484    469   391  390 360   375    &lt;br /&gt;
 469    469   391  391 359   391    &lt;br /&gt;
 453    453   390  453 359   390    &lt;br /&gt;
 469    453   391  438 360   407    &lt;br /&gt;
 469    469   391  422 343   390    &lt;br /&gt;
 437    453   406  406 360   391    &lt;br /&gt;
 469    453   390  437 359   391    &lt;br /&gt;
 469    469   391  438 360   390    &lt;br /&gt;
 453    453   391  406 343   375    &lt;br /&gt;
 468    469   390  422 375   391    &lt;br /&gt;
 (CLOCKS_PER_SEC is 1,000)&lt;br /&gt;
&lt;br /&gt;
Subtracting the mean time for the noop loop gave the following mean times:&lt;br /&gt;
 double  float  long    int  short&lt;br /&gt;
   74.9   71.9    3.1   31.2  -31.3&lt;br /&gt;
&lt;br /&gt;
Long and int both have similar runtimes, as do double and float.&lt;br /&gt;
&lt;br /&gt;
The result for short is most curious: it appears that a loop in which you increment a short takes&lt;br /&gt;
less time than an empty loop.&lt;br /&gt;
&lt;br /&gt;
It is worth taking a closer look at the generated assembly code, just in case the C compiler is doing something odd.  I ran&lt;br /&gt;
&amp;lt;tt&amp;gt;gcc -S add-times.c&amp;lt;/tt&amp;gt; on the code for short, and again on the code with &amp;lt;tt&amp;gt;f = f + 1&amp;lt;/tt&amp;gt; commented out.  The block on&lt;br /&gt;
the left is the loop with the short increment, and on the right is the noop loop:&lt;br /&gt;
 L5:&lt;br /&gt;
  cmpl   $99999999, -16(%ebp)    cmpl    $99999999, -16(%ebp)&lt;br /&gt;
  jg     L6                      jg      L6&lt;br /&gt;
  movzwl -10(%ebp), %eax&lt;br /&gt;
  incl   %eax&lt;br /&gt;
  movw   %ax, -10(%ebp)&lt;br /&gt;
  leal   -16(%ebp), %eax         leal    -16(%ebp), %eax&lt;br /&gt;
  incl   (%eax)                  incl    (%eax)&lt;br /&gt;
  jmp    L5                      jmp     L5&lt;br /&gt;
 L6:&lt;br /&gt;
&lt;br /&gt;
Nothing strange here -- the instructions are identical apart from the three lines for the increment.  Those three additional&lt;br /&gt;
lines of assembler actually reduce the runtime of the loop.&lt;br /&gt;
&lt;br /&gt;
Repeating the experiment on the Linux server gives:&lt;br /&gt;
 short    int      long      float    double   noop   &lt;br /&gt;
 1100000  1620000  1620000   1170000  1120000  3000000&lt;br /&gt;
 1090000  1630000  1630000   1160000  1130000  3010000&lt;br /&gt;
 1070000  1630000  1630000   1130000  1140000  3010000&lt;br /&gt;
 1100000  1620000  1630000   1140000  1130000  3000000&lt;br /&gt;
 1090000  1640000  1620000   1180000  1120000  3010000&lt;br /&gt;
 1100000  1610000  1630000   1160000  1140000  3020000&lt;br /&gt;
 1100000  1630000  1630000   1200000  1130000  3000000&lt;br /&gt;
 1070000  1630000  1630000   1080000  1140000  3010000&lt;br /&gt;
 1100000  1630000  1630000   1170000  1130000  3010000&lt;br /&gt;
 1100000  1620000  1630000   1190000  1140000  3010000&lt;br /&gt;
 (CLOCKS_PER_SEC is 1,000,000)&lt;br /&gt;
&lt;br /&gt;
The results show similar relative times, but an even stranger time for the empty loop: doing nothing&lt;br /&gt;
takes nearly &amp;#039;&amp;#039;&amp;#039;three times as long&amp;#039;&amp;#039;&amp;#039; as a loop that performs a simple operation.&lt;br /&gt;
Who would have guessed?&lt;br /&gt;
&lt;br /&gt;
It&amp;#039;s also notable that the Linux server is considerably slower than my laptop.&lt;br /&gt;
&lt;br /&gt;
Why is the noop loop so slow?  Modern processors can execute several instructions in parallel, but&lt;br /&gt;
this gets tricky when there is a conditional branch.  Rather than stall until the branch condition has been&lt;br /&gt;
computed, the processor may guess the outcome and speculatively execute the code after the branch, &amp;quot;just in case&amp;quot;&lt;br /&gt;
the guess turns out to be right.  If the guess is wrong, the work must be undone.  In the code above, the&lt;br /&gt;
jg (jump if greater) instruction is taken only when the loop ends, and perhaps the processor always guesses that it is taken.&lt;br /&gt;
In the empty loop case, it would get further ahead and end up with more work to undo when it discovers its mistake.&lt;br /&gt;
&lt;br /&gt;
A completely different set of results can be obtained by taking the difference between a loop with two add operations&lt;br /&gt;
and a loop with just one.  i.e. replace the inner loop with this:&lt;br /&gt;
&lt;br /&gt;
    for( n = 0; n &amp;lt; 100000000; n++ ) {&lt;br /&gt;
      f = f + 1;&lt;br /&gt;
      g = g + 1;&lt;br /&gt;
    }&lt;br /&gt;
&lt;br /&gt;
Obtaining times with and without the second increment, &amp;lt;tt&amp;gt;g = g + 1&amp;lt;/tt&amp;gt;, gives the following:&lt;br /&gt;
 short2 short1 long2 long1 float2 float1 double2 double1&lt;br /&gt;
 860    359    438   406   515    469    454     469&lt;br /&gt;
 843    360    422   391   500    468    468     453&lt;br /&gt;
 844    359    421   375   500    453    469     468&lt;br /&gt;
 859    359    438   422   500    454    469     454&lt;br /&gt;
 844    344    422   391   500    468    469     468&lt;br /&gt;
 860    375    453   406   485    453    453     453&lt;br /&gt;
 843    359    422   390   484    469    453     454&lt;br /&gt;
 844    360    422   391   500    453    453     468&lt;br /&gt;
 844    359    437   406   500    453    484     453&lt;br /&gt;
 859    344    422   391   500    454    453     454&lt;br /&gt;
&lt;br /&gt;
Taking the mean differences gives very, very different results for short and double:&lt;br /&gt;
 mean( short2 - short1 )   = 492.2&lt;br /&gt;
 mean( long2 - long1 )     =  32.8&lt;br /&gt;
 mean( float2 - float1 )   =  39&lt;br /&gt;
 mean( double2 - double1 ) =   3.1&lt;br /&gt;
&lt;br /&gt;
It now appears that shorts cost the earth, and two double adds are as cheap as one.&lt;br /&gt;
&lt;br /&gt;
The times on the PowerPC64 Linux server are also damning for shorts and floats, but longs&lt;br /&gt;
prove to be much cheaper.&lt;br /&gt;
&lt;br /&gt;
 short2  short1  long2   long1   float2  float1  double2 double1&lt;br /&gt;
 1800000 1080000 1690000 1620000 1890000 1240000 1380000 1120000&lt;br /&gt;
 1810000 1100000 1700000 1630000 1890000 1230000 1390000 1140000&lt;br /&gt;
 1810000 1100000 1700000 1620000 1890000 1250000 1390000 1130000&lt;br /&gt;
 1810000 1100000 1700000 1630000 1890000 1240000 1380000 1130000&lt;br /&gt;
 1810000 1090000 1700000 1630000 1910000 1240000 1390000 1140000&lt;br /&gt;
 1800000 1100000 1710000 1630000 1890000 1250000 1390000 1140000&lt;br /&gt;
 1740000 1100000 1710000 1630000 1900000 1240000 1390000 1150000&lt;br /&gt;
 1730000 1100000 1700000 1630000 1900000 1240000 1390000 1130000&lt;br /&gt;
 1750000 1100000 1700000 1630000 1900000 1250000 1390000 1130000&lt;br /&gt;
 1760000 1100000 1700000 1640000 1900000 1250000 1390000 1120000&lt;br /&gt;
  &lt;br /&gt;
 mean( short2 - short1 )   = 685000&lt;br /&gt;
 mean( long2 - long1 )     =  72000&lt;br /&gt;
 mean( float2 - float1 )   = 653000&lt;br /&gt;
 mean( double2 - double1 ) = 255000&lt;br /&gt;
&lt;br /&gt;
A possible cause here is memory alignment: the PowerPC64 is optimised for fetching a 64-bit quantity&lt;br /&gt;
(the size of a long and a double), and performs poorly when writing a smaller quantity because it has&lt;br /&gt;
to first read a 64-bit block, update it, then write it back.  The 32-bit Intel processor on my Dell&lt;br /&gt;
appears to suffer the same problem when working with a 16-bit short.&lt;br /&gt;
&lt;br /&gt;
As for why a double is faster than a float: many FPUs operate internally at full (double or extended) precision.&lt;br /&gt;
Working on a 32-bit float then means converting it to and from the wider format.  I guess the Intel FPU can&lt;br /&gt;
execute several instructions in parallel, hence the &amp;quot;free&amp;quot; second increment.&lt;/div&gt;</summary>
		<author><name>Mark</name></author>
	</entry>
</feed>