SE250:lab-1:dols008

From Marks Wiki
Revision as of 05:18, 3 November 2008 by Mark (Sọ̀rọ̀ | contribs) (2 revision(s))
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

At first I just wrote a for loop which performed 2000000000 (two thousand million) additions. The first problem was to define what an addition is. I decided to measure the time taken by the formula a = b + c (more on this later). I realised, however, that the loop would take significantly longer than the addition itself. After all, the loop includes an addition of sorts, along with a comparison and branch. Because it's not practical to time the addition without the loop, I opted instead to time the loop without the addition. I then subtracted the loop time from the combinded time to give an approximate time for the additions themselves. Here are my results:

Addition time with loop: 5.71s
Loop time: 4.81s
Addition time: 0.9s

One problem with this approach is that due to the complexities of modern processors, the time for an addition in isolation will not be identical to the time for an addition inside my loop. This may not be a problem however, as in practical programs additions will not be performed in isolation anyway, they will be interspersed with many other operations.

I tried running my program in release mode, but the optimiser simply removed the entire loop as the result was obvious, and the program completed instantly. For my final timings I opted for release mode with optimisation disabled. This also prompted me to have a look at the assembly output, to see what I was really timing, and how much liberty the compiler had taken with my code. I realise most students have no assembly experience at this stage, but bear with me for a second. Here is the assembly listing for the loop with addition:

; 24   : 	for (i = 0; i < iterations; i++)

	mov	DWORD PTR _i$[ebp], 0
	jmp	SHORT $LN6@main
$LN5@main:
	mov	ecx, DWORD PTR _i$[ebp]
	add	ecx, 1
	mov	DWORD PTR _i$[ebp], ecx
$LN6@main:
	mov	edx, DWORD PTR _i$[ebp]
	cmp	edx, DWORD PTR _iterations$[ebp]
	jae	SHORT $LN4@main

; 25   : 	{
; 26   : 		c = a + b;

	mov	eax, DWORD PTR _a$[ebp]
	add	eax, DWORD PTR _b$[ebp]
	mov	DWORD PTR _c$[ebp], eax

; 27   : 	}

	jmp	SHORT $LN5@main

And here is the loop without addition:

; 34   : 	for (i = 0; i < iterations; i++)

	mov	DWORD PTR _i$[ebp], 0
	jmp	SHORT $LN3@main
$LN2@main:
	mov	eax, DWORD PTR _i$[ebp]
	add	eax, 1
	mov	DWORD PTR _i$[ebp], eax
$LN3@main:
	mov	ecx, DWORD PTR _i$[ebp]
	cmp	ecx, DWORD PTR _iterations$[ebp]
	jae	SHORT $LN1@main

; 35   : 	{
; 36   : 	}

	jmp	SHORT $LN2@main

What I want you to notice is that the two are practically identical, except for the addition code itself, which is:

; 26   : 		c = a + b;

	mov	eax, DWORD PTR _a$[ebp]
	add	eax, DWORD PTR _b$[ebp]
	mov	DWORD PTR _c$[ebp], eax

Those three instructions do the following:

  1. Move the contents of variable a into a register* called eax.
  2. Add the contents of eax and the variable b together and store the result in eax.
  3. Store the contents of eax in the variable c.

A register is a piece of storage on the processor itself for values that are being worked on. Values usually have to be copied from memory to a register before they can be worked on.

Arguably, I should be timing the add instruction alone. But as you can see here, the add instruction isn't the only thing needed to perform your average addition. I decided that my definition of addition was still acceptable. But experimentally, I decided to see what the assembly for a += b looked like. This was the result:

; 26   : 		a += b;

	mov	eax, DWORD PTR _a$[ebp]
	add	eax, DWORD PTR _b$[ebp]
	mov	DWORD PTR _a$[ebp], eax

Interestingly, it looks almost identical to the version above, but stores the result in a instead of c. This would probably not be the case if optimisation was enabled, but that would remove my entire loop. So I concluded that writing the addition differently would not make a significant difference to my results.

Dols008 12:00, 4 March 2008 (NZDT)