SE250:lab-3:npit006

From Marks Wiki
Revision as of 05:19, 3 November 2008 by Mark

SOFTENG 250 - Lab 3: Resizeable Array-Based Lists

Introduction

This lab involved taking the pre-written shell structure of an array-based list, implementing it in a C program, and finally measuring the time taken for push/put operations while varying factors such as the minimum allocation size, the growth factor of the array capacity, and the operations themselves. I completed this lab on campus using MSVC, but I was unhappy with my results, so I repeated the tests at home using Dev-C++, which gave more plausible results.

After familiarising myself with the pre-written code, I wrote my own 'main' C file which included the ArrayList functions and structure declared in arraylist.h. Most of my code is straightforward, but for reference, here is my code for measuring the time taken for the lab operations.

clock_t initialTime;
clock_t totalTime = 0;
int totalLooper, looper;

for (totalLooper = 0; totalLooper < 5; totalLooper++){
     initialTime = clock();

     for (looper = 0; looper < n; looper++){
          //Operation to be measured goes in here,
          //written out ten times in a row so that each pass of the
          //loop performs 10 operations and the loop overhead per
          //operation is diluted.
     }

     totalTime += (clock() - initialTime);
}

totalTime /= 5;  //average over the five timed runs


I set n to 1,000,000 for Task 1. I used the clock() function from Lab 1 to time n iterations of 10 operations, repeated the whole measurement 5 times over, and divided totalTime by 5 to obtain an average time. I also executed my program 5-10 separate times for each test to further minimise random error. To minimise the effect of loop overhead, I repeated the operation inside the loop so that I had 1 million loops of 10 operations rather than 10 million loops of 1 operation; the loop overhead per operation is therefore ten times lower.


Task 1 - Default Settings

As already mentioned, I set n to 1,000,000 repetitions of 10 pushes for Task 1. With the default settings of ARRAYLIST_MIN_ALLOC = 16 and a growth factor of 2, I obtained the results below. They are fairly reasonable and consistent.


ARRAYLIST_MIN_ALLOC = 16
GROWTH FACTOR = *2
N = 1,000,000


Times (s) = 0.038, 0.039, 0.038, 0.038, 0.038
Average Time (s) = 0.038 s

Task 2 - Varying Growth Factors

This task involved using the default settings of Task 1 but varying the growth factor. As predicted, very large growth factors cause the program to crash as C runs out of memory to allocate: the capacity grows as 16 * 75^x, where x is the number of array resizings, which becomes enormous very quickly.

At the other end, a factor of 1.07 allows the program to run properly, albeit slowly. However, moving from 1.07 to 1.06 determines whether the program can run at all, on both systems I tried. Before the growth factor is applied for the first time, the array is 16 elements long (the minimum allocation). 16 * 1.06 = 16.96, which is truncated to 16 when cast to an integer, while 16 * 1.07 = 17.12, which is truncated to 17. Clearly, applying a factor of 1.06 never actually increases the array's capacity.

As expected, a larger factor results in a quicker runtime, because less time is wasted repeatedly increasing the array capacity, which is quite time-consuming since realloc() usually has to move all the existing array data. The advantage of a larger factor levels off once the new capacity exceeds the capacity we actually require.

Multiplicand <= 1.06
Program crashes and closes immediately

Multiplicand = 1.07
Times (s) = 0.095, 0.096, 0.095, 0.095, 0.095
Average Time (s) = 0.095 s

Multiplicand = 1.1
Times (s) = 0.077, 0.076, 0.077, 0.076, 0.076
Average Time (s) = 0.076 s

Multiplicand = 1.5
Times (s) = 0.041, 0.041, 0.040, 0.041, 0.041
Average Time (s) = 0.041 s

Multiplicand = 2 (FROM TASK 1)
Times (s) = 0.038, 0.039, 0.038, 0.038, 0.038
Average Time (s) = 0.038 s

Multiplicand = 3
Times (s) = 0.035, 0.035, 0.035, 0.035, 0.035
Average Time (s) = 0.035 s

Multiplicand = 5
Times (s) = 0.038, 0.037, 0.037, 0.038, 0.038
Average Time (s) = 0.038 s

Multiplicand = 10
Times (s) = 0.033, 0.034, 0.034, 0.034, 0.034
Average Time (s) = 0.034 s

Multiplicand = 20
Times (s) = 0.032, 0.032, 0.032, 0.032, 0.032
Average Time (s) = 0.032 s

Multiplicand = 50
Times (s) = 0.032, 0.032, 0.032, 0.032, 0.032
Average Time (s) = 0.032 s

Multiplicand >= 75
Program crashes and closes immediately

Task 3 - Varying ARRAYLIST_MIN_ALLOC

This task involved using the default settings of Task 1 but varying ARRAYLIST_MIN_ALLOC and recording the results. For the most part there is no discernible trend until the minimum allocation is set very high, at which point the program runs quickly. This is expected: as in the previous task, very few resize operations are needed to reach the required capacity, because each allocation is so large. The benefit levels off once we allocate more capacity than we require. And if the allocation size is too large, the computer runs out of memory and the program crashes; my system has 2 GB of RAM and so cannot handle an allocation of 600 million four-byte elements, which requires 2.4 billion bytes of memory.


ARRAYLIST_MIN_ALLOC = 1
Times (s) = 0.038, 0.038, 0.038, 0.038, 0.038
Average Time (s) = 0.038 s

ARRAYLIST_MIN_ALLOC = 4
Times (s) = 0.038, 0.038, 0.039, 0.039, 0.038
Average Time (s) = 0.038 s

ARRAYLIST_MIN_ALLOC = 16 (FROM TASK 1)
Times (s) = 0.038, 0.039, 0.038, 0.038, 0.038
Average Time (s) = 0.038 s

ARRAYLIST_MIN_ALLOC = 50
Times (s) = 0.037, 0.037, 0.037, 0.037, 0.037
Average Time (s) = 0.037 s

ARRAYLIST_MIN_ALLOC = 500
Times (s) = 0.037, 0.038, 0.037, 0.038, 0.038
Average Time (s) = 0.038 s

ARRAYLIST_MIN_ALLOC = 20000
Times (s) = 0.039, 0.039, 0.039, 0.039, 0.039
Average Time (s) = 0.039 s

ARRAYLIST_MIN_ALLOC = 200000
Times (s) = 0.036, 0.036, 0.036, 0.036, 0.036
Average Time (s) = 0.036 s

ARRAYLIST_MIN_ALLOC = 2000000
Times (s) = 0.037, 0.037, 0.037, 0.037, 0.037
Average Time (s) = 0.037 s

ARRAYLIST_MIN_ALLOC = 20000000
Times (s) = 0.037, 0.037, 0.037, 0.037, 0.037
Average Time (s) = 0.037 s

ARRAYLIST_MIN_ALLOC = 200000000
Times (s) = 0.032, 0.031, 0.032, 0.031, 0.032
Average Time (s) = 0.032 s

ARRAYLIST_MIN_ALLOC = 500000000
Times (s) = 0.032, 0.032, 0.031, 0.032, 0.032
Average Time (s) = 0.032 s

ARRAYLIST_MIN_ALLOC = 600000000
Program crashes and closes immediately

Task 4 - Calling ensure_capacity at start

This task involved using the default settings but calling ensure_capacity at the start of the program. I called 'ensure_capacity(&workingArray,50000000);' to set aside space for 50 million elements at the start of the program. As the results show, the average time is much the same as the quickest times in the previous two tasks. This is expected, as it operates very similarly to the code in Task 3: that version, when set to a high minimum allocation, sets aside one large chunk of memory as soon as the loop first tries to push a value onto an array with 0 capacity, whereas Task 4 sets aside the same chunk before the loop runs at all.


Times (s) = 0.033, 0.032, 0.032, 0.032, 0.032
Average Time (s) = 0.032 s

Task 5 - Using an arithmetic growth strategy instead of a geometric one

This task involved experimenting with an arithmetic (incremental) growth strategy instead of the geometric one from previous experiments. In general, an arithmetic growth strategy seems to be slower than a geometric one. Of course, if the arithmetic increment is made large enough, we reach the optimal time of 0.032 s; that is effectively identical to raising the minimum allocation size to something enormous or calling ensure_capacity with a large number at the start.


Capacity+1000 and n=10,000 lots of 10 pushes

Times (s) = 0.011, 0.012, 0.011, 0.012, 0.011
Average Time (s) = 0.011 s


Capacity+1,000,000 and n=1,000,000 lots of 10 pushes

Times (s) = 0.142, 0.141, 0.142, 0.141, 0.140
Average Time (s) = 0.141 s


Capacity+5,000,000 and n=1,000,000 lots of 10 pushes

Times (s) = 0.051, 0.051, 0.051, 0.051, 0.051
Average Time (s) = 0.051 s


Capacity+10,000,000 and n=1,000,000 lots of 10 pushes

Times (s) = 0.040, 0.040, 0.040, 0.040, 0.040
Average Time (s) = 0.040 s


Capacity+50,000,000 and n=1,000,000 lots of 10 pushes

Times (s) = 0.031, 0.032, 0.032, 0.032, 0.032
Average Time (s) = 0.032 s

Task 6 - Using arraylist_put instead of arraylist_push

This task involved repeating Task 1 with default settings, but putting elements at the front rather than pushing them onto the end each time. As the results show, this is a VERY slow process, and I had to lower n in order to obtain reasonable times. Each doubling of n more than quadrupled the time: the cost grows quadratically, not linearly. From code inspection, the reason should be fairly obvious: putting an element at the start of the array means the whole existing array has to be shifted along one space by memmove(), which is extra work on top of the rudimentary parts of the function. Mathematically: time to shift data ∝ array size ∝ n, so the total time over n puts is ∝ n².


n = 500 lots of 10

Times (s) = 0.005, 0.005, 0.005, 0.005, 0.005
Average Time (s) = 0.005 s


n = 1000 lots of 10

Times (s) = 0.031, 0.031, 0.031, 0.030, 0.030
Average Time (s) = 0.031 s


n = 2000 lots of 10

Times (s) = 0.139, 0.140, 0.142, 0.141, 0.140
Average Time (s) = 0.140 s

Conclusion

There are risks and trade-offs among arithmetic growth, geometric growth, and setting aside memory at the start of the program. Geometric growth factors seem the most consistent, but there is always a danger of exhausting the computer's memory, as the array capacity increases very quickly. Setting aside memory at the start can be quite useful if we can anticipate what is required before the program runs. The problem with arithmetic growth is that it does not scale as well as geometric growth, though that can be remedied with a bit of tinkering. Lastly, on all systems, a larger one-shot investment of memory tends to result in a faster runtime, because the program is not forced to continually resize the array and move large amounts of data around with realloc().

I had next to no problems in this lab; in fact, the only issues were the discrepancies between the MSVC results on campus and the Dev-C++ results at home.