SOFTENG 250 - Lab 3: Resizeable Array-Based Lists - Lab Maintenance Report
Introduction
Lab 3 involved taking the pre-written shell structure of an array-based list, implementing it in a C program, and measuring the time taken for push/put operations while varying factors such as the minimum allocation size, the growth factor of the array capacity, and pre-allocation.
Most people received results consistent with what was intuitively expected, but in a few cases the results simply didn't indicate any trend after varying certain factors - we suspect this was often down to differences in compilers, platforms and workload from arbitrary background processes. Running multiple trials and averaging would minimise workload effects, but few people mentioned doing so - it is possible some did average their results without saying so, but it isn't obvious from the reports.
Task 1 - Default Settings
As the clock() function and use of a loop to measure execution time were both familiar from Lab 1, most people had few problems with the coding side of this.
Most people recorded their Task 1 results for a range of values of n, and most results showed that the time taken increased linearly with n (as expected). Loop overhead might offset this slightly, but otherwise a linear relationship is what most people expected. Very few explained the reasoning behind their chosen range of n - perhaps they felt it wasn't needed, as the nature of n was hinted at in the lab briefing. A large value of n minimises the effect of the loop overhead we encountered in Lab 1, and most people settled for a range of 1-10 million.
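As a reference point, the timing code most people ended up with looks roughly like the sketch below. The header name, ArrayList type and arraylist_init call are assumptions about the provided shell (the exact names in the handout may differ); the essential parts are the clock() calls around the push loop.

    #include <stdio.h>
    #include <time.h>
    #include "arraylist.h"          /* the provided shell (name assumed) */

    int main(void) {
        int n = 10000000;           /* 10 million pushes */
        ArrayList list;
        arraylist_init(&list);      /* initialisation call assumed from the shell */

        clock_t start = clock();
        for (int i = 0; i < n; i++)
            arraylist_push(&list, i);
        clock_t end = clock();

        printf("Time taken using arraylist_push %f seconds\n",
               (double)(end - start) / CLOCKS_PER_SEC);
        return 0;
    }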
Time taken using arraylist_push 0.394000 seconds (value 5)
Time taken using arraylist_push 0.429000 seconds (value 5000)
Time taken using arraylist_push 0.393000 seconds (value 5000000)
Time taken using arraylist_push 0.481000 seconds (value 5000000000)
Task 2 - Varying Growth Factors
This task involved using the default settings of Experiment 1, but varying the growth factor. The majority of people correctly predicted that increasing the growth factor would speed up the program execution but squander memory and eventually cause the program to crash.
Unfortunately some people did not experiment with a very wide range of growth factors, and so perhaps couldn't discern a trend from their results. Most people also didn't seem to understand why factors at or only slightly above 1 caused the program to crash.
Growth factor 3       - The for loop of 10 million loops took 594 ms.
Growth factor 9^99999 - The for loop of 10 million loops took 531 ms.
Growth factor 1       - Error - /usr/bin/bash: line 1: 5576 Segmentation fault (core dumped) ./arraylist2.exe
Growth factor 1.1     - The for loop of 10 million loops took 1204 ms.
Growth factor 1.2     - The for loop of 10 million loops took 890 ms.
Growth factor 1.3     - The for loop of 10 million loops took 750 ms.
Growth factor 1.4     - The for loop of 10 million loops took 672 ms.
Explanations:
A factor of 1.07 allows the program to run properly, albeit slowly, but going from 1.07 down to 1.06 is enough to stop the program running at all, on both systems I tried. Before the growth factor is applied for the first time, the array is 16 elements long (the minimum allocation). 16 * 1.06 = 16.96, which is truncated to the integer 16, whereas 16 * 1.07 = 17.12, which truncates to 17 - so applying a factor of 1.06 never actually increases the array's capacity.
As expected, a larger factor results in a quicker runtime, because less time is wasted repeatedly increasing the array capacity - a time-consuming operation, since realloc() usually has to move all the existing array data. The advantage of a larger factor levels off once the new capacity already exceeds the capacity we actually require.
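A small standalone sketch makes the truncation point concrete. The constants mirror the defaults mentioned above; the shell applies the same idea inside its capacity-growth logic, so this is only an illustration of the arithmetic, not the shell's code.

    #include <stdio.h>

    #define ARRAYLIST_MIN_ALLOC 16
    #define GROWTH_FACTOR 1.06      /* change to 1.07 and growth resumes */

    int main(void) {
        size_t capacity = ARRAYLIST_MIN_ALLOC;
        for (int i = 0; i < 5; i++) {
            size_t next = (size_t)(capacity * GROWTH_FACTOR);  /* truncated */
            printf("capacity %zu -> %zu\n", capacity, next);
            capacity = next;
        }
        return 0;
    }

With 1.06 every line is "16 -> 16": the capacity never increases, so the list eventually writes past the end of its buffer and crashes. With 1.07 the sequence is 16, 17, 18, 19, ... and everything works.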
Task 3 - Varying ARRAYLIST_MIN_ALLOC
This task involved using the default settings of Experiment 1, but varying ARRAYLIST_MIN_ALLOC and recording the results. For the most part, people understood why setting ARRAYLIST_MIN_ALLOC to small values didn't really affect times much. As explicitly requested in the lab briefing, people experimented with large values of ARRAYLIST_MIN_ALLOC, which led to faster runtimes because the array had to be reallocated far less often. Setting this value to an extreme number causes the program to crash, presumably because the initial allocation is too large to succeed.
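One way to see why small values barely matter is to count allocations rather than time them. The sketch below assumes a doubling growth strategy purely for illustration (the shell's actual growth factor may differ), but the shape of the result is the same: a small minimum allocation only adds a handful of cheap early reallocations, while a huge one removes reallocation entirely.

    #include <stdio.h>

    /* Count allocations needed to hold n elements, starting at min_alloc and
       doubling (growth factor of 2 assumed purely for illustration). */
    static int alloc_count(size_t min_alloc, size_t n) {
        size_t capacity = min_alloc;
        int count = 1;                    /* the initial allocation */
        while (capacity < n) {
            capacity *= 2;
            count++;
        }
        return count;
    }

    int main(void) {
        size_t n = 10000000;              /* 10 million pushes */
        size_t sizes[] = {2, 16, 100, 10000, 100000000};
        for (int i = 0; i < 5; i++)
            printf("ARRAYLIST_MIN_ALLOC %-9zu -> %d allocations\n",
                   sizes[i], alloc_count(sizes[i], n));
        return 0;
    }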
ARRAYLIST_MIN_ALLOC   time (ms)
16                    405   (from the previous example)
2                     452
8                     421
32                    452
100                   390
10000                 373
100000000             327
Using a different system gives very different results; below are the results from an iMac:
ARRAYLIST_MIN_ALLOC   time (ms)
1                     415
3                     373
6                     355
16                    348
30                    387
100                   353
1000                  374
10000                 335
100000000             277
Even for small values, the times still vary somewhat.
Task 4 - Calling ensure_capacity at start
This task involved using the default settings but calling ensure_capacity at the start of the program. I'd say most people understood that Task 4 with a high capacity set aside is more or less the same as Task 3 with ARRAYLIST_MIN_ALLOC set to quite a high value: Task 4 sets aside a huge amount of memory at the very start of the program, whereas Task 3 does this during the first iteration of the loop (and whenever required thereafter), so they're very similar in operation. It is worth noting that people received a range of results depending on the value they passed to ensure_capacity(), which wasn't explicitly defined in the briefing. The idea of that question was that we should examine our value of n, consider the number of operations/trials we were doing, and estimate a good value to pass to ensure_capacity().
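In code terms the change is a single call before the timed loop. The sketch below reuses the Task 1 timing skeleton (again with assumed shell names for the header, type and initialisation call) and the value 50,000,000 from the sample results that follow.

    #include <stdio.h>
    #include <time.h>
    #include "arraylist.h"              /* the provided shell (name assumed) */

    int main(void) {
        int n = 10000000;
        ArrayList workingArray;
        arraylist_init(&workingArray);  /* initialisation call assumed */

        /* Task 4: reserve all the space up front, so the timed loop below
           never has to call realloc(). */
        ensure_capacity(&workingArray, 50000000);

        clock_t start = clock();
        for (int i = 0; i < n; i++)
            arraylist_push(&workingArray, i);
        clock_t end = clock();

        printf("Time taken: %f seconds\n",
               (double)(end - start) / CLOCKS_PER_SEC);
        return 0;
    }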
ensure_capacity(&workingArray, 50000000)
Times (s) = 0.033, 0.032, 0.032, 0.032, 0.032
Average Time (s) = 0.032
Task 5 - Using an arithmetic growth strategy instead of a geometric one
This task involved experimenting with an arithmetic (incremental) growth strategy instead of the geometric one from the previous experiments. Most people found that an arithmetic growth strategy slowed the program execution by quite a bit. Again there was quite a bit of room for different settings here - different values of n and different increments. If n/increment is large, realloc() has to be called many times, and this results in a time penalty. Naturally, setting the increment to an extremely high value ensures that realloc() only has to be called once or twice, so fast times can be achieved (as in Tasks 3 and 4). Of course, that approach is more appropriate for arrays whose final size is known in advance.
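The time penalty is easy to quantify by counting realloc() calls rather than timing them. The sketch below is illustrative only - it models an arithmetic strategy where the capacity grows by a fixed increment each time, rather than being the shell's actual code.

    #include <stdio.h>

    /* Number of allocations an arithmetic growth strategy needs to reach n
       elements when the capacity grows by a fixed increment each time. */
    static int alloc_count_arithmetic(size_t increment, size_t n) {
        size_t capacity = increment;
        int count = 1;
        while (capacity < n) {
            capacity += increment;        /* arithmetic, not geometric */
            count++;
        }
        return count;
    }

    int main(void) {
        size_t n = 10000000;              /* 10 million pushes */
        size_t increments[] = {1000, 100000, 5000000, 50000000};
        for (int i = 0; i < 4; i++)
            printf("increment %-9zu -> %d calls to realloc()\n",
                   increments[i], alloc_count_arithmetic(increments[i], n));
        return 0;
    }

With a small increment the count is roughly n/increment, and each of those realloc() calls may copy the whole array, so the total work grows roughly quadratically; with a huge increment only one or two allocations happen, which is effectively Task 4 again.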
Capacity + 5,000,000 and n = 1,000,000 lots of 10 pushes
Times (s) = 0.051, 0.051, 0.051, 0.051, 0.051    Average Time (s) = 0.051

Capacity + 10,000,000 and n = 1,000,000 lots of 10 pushes
Times (s) = 0.040, 0.040, 0.040, 0.040, 0.040    Average Time (s) = 0.040

Capacity + 50,000,000 and n = 1,000,000 lots of 10 pushes
Times (s) = 0.031, 0.032, 0.032, 0.032, 0.032    Average Time (s) = 0.032
When the increment is made larger, the results become more erratic - sometimes a run takes longer, sometimes much less.
Task 6 - Using arraylist_put instead of arraylist_push
This task involved repeating experiment 1 with default settings, but putting elements at the front rather than pushing them on to the end each time. Using arraylist_put instead of arraylist_push resulted in much slower times for practically everybody, and the vast majority of people were able to provide an explanation. Most found that the total time grew much faster than linearly - roughly quadratically - as n was increased.
From code inspection, it should be fairly obvious that putting an element at the start of the array means the whole array has to be shifted along one space by memmove() - extra work on top of the basic parts of the function (realloc() etc.). As the array grows, it takes ever longer to shift the whole thing right by one position. This is a case where it might be more prudent to use another data structure.
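A stripped-down sketch of a put at index 0 shows where the time goes. The IntList type and put_front function here are illustrative, not the shell's actual arraylist_put, which also grows the array and checks the index.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>                   /* memmove */

    typedef struct { int *data; size_t size; size_t capacity; } IntList;

    static void put_front(IntList *list, int value) {
        /* Shift every existing element one slot to the right: O(size) work on
           every single put, so n puts at the front cost O(n^2) in total. */
        memmove(list->data + 1, list->data, list->size * sizeof *list->data);
        list->data[0] = value;
        list->size++;
    }

    int main(void) {
        IntList list = { malloc(8 * sizeof(int)), 0, 8 };
        for (int i = 0; i < 5; i++)
            put_front(&list, i);
        for (size_t j = 0; j < list.size; j++)
            printf("%d ", list.data[j]);  /* prints: 4 3 2 1 0 */
        printf("\n");
        free(list.data);
        return 0;
    }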
Conclusion
This lab received a better student response than the last one, and as such, we think it is a good introduction to the mechanics of array-based lists. Most of the problems were based around inconclusive results, as explained above; some people were also short on time and thus didn't finish their lab reports. Since most of the code was provided, this was a relatively painless lab - and very educational.
Perhaps in the future, people could include compiler/platform information with their lab reports, especially with labs involving measurement.