SE250:lab-3:hpan027
Not a very exciting lab as most of the time is spent waiting for the programme to run.
Task 1
The following code was added to the arraylist.c file
int main () {
    ArrayList myArray;
    int i;
    int n = 1000000;
    element_t value = 3;
    clock_t clockStart;
    clock_t clockFinish;
    clock_t timeTaken;
    arraylist_init(&myArray);
    clockStart = clock();
    for (i = 0; i < n; i ++ ) {
	arraylist_push(&myArray, value);
    }
    clockFinish = clock();
    timeTaken = ( clockFinish - clockStart );
    printf("%ld", timeTaken);
    
    return 0; 
}
The the time taken was recorded for different values of n between 1,000,000 to 1,000,000,000
n = 1,000,000 clock = 46
n = 10,000,000 clock = 672
n = 100,000,000 clock = 6640
n = 1,000,000,000 Programme runs out of memory
- Time taken scales in a somewhat linear fashion. However, sometime before n reaches the value of 1,000,000,000, the programme is stopped with the error message "assertion "alist->arr != (element_t*)0"". i.e. the programme has run out of memory and hence did not allocate any more to arr.
Task 2
Growth factor: 1.5
n = 1,000,000 clock = 94
n = 10,000,000 clock = 671
n = 100,000,000 clock = 7297
n = 1,000,000,000 Programme runs out of memory
- No drastic change here. Takes a little bit longer, and runs out of memory at the same place
Growth factor: 3.0
n = 1,000,000 clock = 46
n = 10,000,000 clock = 577
n = 100,000,000 Programme runs out of memory
n = 1,000,000,000 Programme runs out of memory
- Slight improvement to speed here, but the program runs out of memory earlier
Growth factor: 4.0
n = 1,000,000 clock = 77
n = 10,000,000 clock = 641
n = 100,000,000 Programme runs out of memory
n = 1,000,000,000 Programme runs out of memory
- At this point it's pretty clear that there aren't enough data to draw any conclusions, so more data need to be collected.
- The following data was collected using a loop with n as the counter variable.
(n = 1000000; n < 100000000; n += 5000000)
- Time in clock ticks along y axis, value of n along x. Each series represent a different growth factor
- We can see there is some relationship between time taken and growth factor - increasing growth factor tends to decrease time taken
- Higher growth factors run out of memory sooner
- The above relationship can be quite easily explained by how many times the array list has to be reallocated
1.5x 2x 3x 4x 5x A 27 15 10 7 6 B 11 6 4 3 2
- Row A is the number of reallocations required for a given factor to reach n = 1,000,000
- Row B is the number of additional reallocations required for a given factor to reach n = 96,000,000
- For larger growth factors, less reallocations are required - especially ones dealing with large chunks of memory - hence less time is taken
- The programme clearly runs out of memory somewhere between n = 10,000,000 and n = 100,000,000
- Closer examination of of values of n for different growth factors show that:
- Making the jump from 76,527,504 to 229,582,512 and 67,108,864 to 268,435,456 for factors 3x and 4x respectively results in the programme running out of memory
- Jumping from 31,250,000 to 156,250,000 for factor 5x did not cause any problems
- Hence the programme runs out of memory somewhere between 600mb to 875mb of memory used
 
- That was some pretty meaningless data
Task 3
Looking at the value of ARRAYLIST_MIN_ALLOC vs the number of reallocations required to reach n = 1,000,000 showed that there shouldn't really be much difference for differences in MIN_ALLOC, whereas increasing it by larger factors should have greater effects.
MIN_ALLOC Number of realloc to reach n = 1,000,000 8 17 16 16 32 15 10000 7 100000 3
Results for n = 1,000,000 to n = 10,000,000 step 1,000,000
8 16 32 10000 100000 1000000 62 46 62 62 61 2000000 109 94 110 110 110 3000000 157 157 156 187 156 4000000 264 171 235 219 235 5000000 329 250 297 250 282 6000000 329 328 359 376 359 7000000 406 438 439 437 453 8000000 484 453 468 453 516 9000000 579 562 593 515 546
- For smaller steps of n there is no difference
Hopefully there is some change for larger steps of n - as this is taking 1 minute each time it is run
Results for n = 1,000,000 to n = 100,000,000 step 5,000,000
8 16 32 10000 100000 1000000 62 62 62 78 62 6000000 344 360 376 344 328 11000000 688 641 671 734 609 16000000 1015 1032 938 969 1016 21000000 1360 1296 1328 1313 1328 26000000 1422 1719 1563 1703 1703 31000000 1984 1703 1781 1626 2016 36000000 2312 2172 2406 2124 2062 41000000 2657 2453 2468 2578 2297 46000000 2906 2749 3047 3109 2515 51000000 3000 3063 3031 3297 2734 56000000 3468 3280 3203 3437 3360 61000000 3548 3422 3609 3562 3969 66000000 3969 3797 3969 4141 4266 71000000 5016 4469 4625 4220 4703 76000000 5375 5047 5891 4343 4797 81000000 5422 5188 5563 4985 4937 86000000 6047 5562 5687 5500 5469 91000000 5640 5750 6046 5936 5343 96000000 6079 6172 5813 5720 5703
- Again no real difference
The time taken was expected to drop but this didn't really happen. This could be due to ratio of memory reallocations to the number of times that a variable is pushed
i.e Memory is reallocated about 30 times, while there are about 970,000,000 calls to the arraylist_push function.
However that doesn't explain why changing the growth factor makes a difference while the the MIN_ALLOC value does not.
It is therefore more likely that reallocating small chunks of memory does not take much time at all - which is really what changing the MIN_ALLOC variable reduces. Going from n = 1,000,000 to 100,000,000 still requires pretty much the same number of reallocs regardless of MIN_ALLOC.
Task 4
Data with and without calls to the ensure_capacity function
Without With 47 46 360 265 656 516 953 844 1281 1062 1468 1250 1922 1421 2282 1672 2656 2063 2844 2421 2985 2750 3250 2906 3563 3047 3828 3311 4391 3516 5250 3749 5250 4094 5484 4266 5531 4532 6547 4797
- There is a clear difference here. Notice that for smaller values of n, the difference is not as great. This reinforces the idea that larger chunks of memory takes more time to move - which is kind of intuitive in the first place.
Task 5
+1e6 +1e7 +1e8 1000000 31 47 47 6000000 468 297 281 11000000 1001 626 546 16000000 1656 937 843 21000000 2657 1265 1048 26000000 3843 1516 1265 31000000 5266 2094 1422 36000000 6781 2125 1687 41000000 8452 2859 1953 46000000 10750 3359 2141 51000000 12937 3718 2391 56000000 15173 4141 2891 61000000 17954 4719 3063 66000000 20469 5250 3328 71000000 23188 5875 3172 76000000 26938 6344 3813 81000000 30343 7047 4047 86000000 33548 7390 4250 91000000 38188 8312 4578 96000000 41781 8890 4767 Calls to realloc 970 97 9.7
- The realloc calls make a more noticeable difference in larger numbers
Task 6
arraylist_put(&myArray, value,0);
- Needless to say it takes an unreasonable amount of time to have to shift the whole array every single time especially when dealing with such large values of n. I gave up waiting after ten minutes as I've finished typing up the rest of my report.
- For smaller values of n it is most likely putting a value at the front of the array will take much longer