SE250:lab-6:Maintainers report
Task One
Student Approaches
The first task was completed by the majority of students. Of the students that displayed their code as a part of the lab report, there seemed to be two main approaches to this problem:
The most common approach was to create a tree using makeRandomTree(size) and calculate the height using the function height(Node*). A simple for loop was used to increase the height of the tree and print the height to the screen each time the loop was run.
There was a wide variety of values used for the tree sizes. Some people used from 1 to 10, while others went all the way to 1000, with the majority of students using up to a couple hundred.
The code looked something like:
int size, heightOfTree; Node* randomTree = 0; for (size = 1; size < (a number); size+5) { //There were some variations on the size+5 - some used size++, size+10 etc. randomTree = makeRandomTree(size); heightOfTree = height(randomTree); printf("Size = %d Height = %d", size, heightOfTree); }
The problem with this code, as expressed by Mark Wilson during the post-lab lecture, is each tree of x size is only generated once, i.e. each tree height is only sampled once, which is not statistically sound.
The other approach took this into account by using a for loop to create multiple trees of size x. This was the less common of the two main approaches.
int size, i; double averageHeight; Node* randomTree = 0; for (size = 1; size < (a number); size+5) { //There were some variations on the size+5 - some used size++, size+10 etc. averageHeight = 0; for(i = 0; i < 100; i++) { //Loop to generate tree height multiple times randomTree = makeRandomTree(size); averageHeight += height(randomTree); } printf("Size = %d Height = %d", size, averageHeight/100); }
However, during to the post-lab discussion, it was brought up that because rand() was not seeded, it would produce the same results each time, therefore the extra loop did not make a difference. However, despite some students recognising this, no one actually seeded rand() during the lab.
Results
While I would have liked to graph people's results, this was not feasible, owing to the fact that there was not enough data. Instead I have placed here a graph of the results, taken from jhor053's lab report (with permission). Looking at the graphs of people that did the graphs and were able to load their graphs onto the wiki, this is a typical graph.
The graph produced is set when the tree height is incremented by 1. As can be seen, the graph fluctuates a lot but follows a general trend.
<html><img src="http://www.gotmilk.co.nz/~gm_modnar/250labs/lab6/task1-01.jpg" border="0" /></html>
We know from the lecture on Friday, that the height = log(base2)(n).
For comparison, here is a graph taken from rwan064's lab report of log(x) vs. x:
<html> <image src="http://studwww.cs.auckland.ac.nz/~rwan064/lab6/xlogx.png" width="552" height="340" alt="task1" /> </html>
There were some speculation in the lab reports as to the nature of the relationship. Not all people correctly determined that it is a logarithmic relationship. Some people believed that it was a square root relationship, others thought it looked linear. While the general shape of the curve is similar to that of a square root graph, that relationship is easily discounted when the actual values are examined. As for the relationship being linear, this is clearly not the case when enough values are looked at.
Four people used excel to find a line of 'best fit'.
The results are:
height = -0.0002*size^2 + 0.1029*size + 5.3138 height = 1.8378*size^0.4463 height = 1.8364*size^0.4466 height = 3.4896ln(size) - 3.0508
What is interesting is that only one of these relationships is logarithmic - all the rest are polynomial. This could be due to a number of factors: they did not chose the 'best' line of best fit, or that there results were affected by problems mentioned earlier in the report with people not seeding rand() and/or not taking multiple samples of each height.
Problems Encountered by Students
- One of the most common problems was loading the graph onto the wiki as students are not allowed to directly upload images. It was easy for those who had already set up their intranet or had access to an image-hosting site, but most of the others did not put their graph onto the wiki.
- While task one was relatively straightforward it did involve writing your own experiment and some students have mentioned that this task seemed to take a long time. We had to figure out the implementation of the code and then to write it (without any bugs) and also to then produce the graph. Possibly as a result, most students were only able to complete up to Task Three.
- A mistake that arose among people who used two for loops was forgetting to reset the average to 0. However as this problem only occured in a few lab reports I am not sure how widespread this was.
- Some students mentioned during the post-lab lecture that they were unable to get the code to run at all, however luckily this was not the case with most students and seemed limited to those using Visual Studio.
Task 2
Overview
For this task we had to implement functions to return the node with the smallest, and largest, keys in a tree. The majority of students where able to do this task, with minimal problems.
Common implementations
All the students that attempted to task realized that you simply had to move down the left or right side of the tree. There are not many ways you can do this, but a few different structural setups where used.
The most common structure was using a while loop:
Node* minimum( Node* node ) { while (node->left != empty) node = node->left; return node; } Node* maximum( Node* node ) { while (node->right != empty) node = node->right; return node; }
However, this common code actually results in a segmentation fault if called with an empty tree. Some students realized this(but not a lot of them), and added a if statement check before the while loop to work around this problem.
A couple of students used a for loop to do the incrementation internally, and there was one recursive implementation: (the need for the return statement before the recursive call is unclear)
Node* minimum( Node* node ) { if( node->left != empty) return minimum(node->left); return node; } Node* maximum( Node* node ) { if( node->right != empty) return maximum(node->right); return node; }
There where also a few students who had incorrect code(more than just the empty tree issue), but due to a lack of testing provided, apparently had not realized this. For example, this code has the return statement inside the while loop, rather than after it, resulting in the functions always returning the first left/right node, if it exists:
Node* minimum( Node* node ) { while (node->left != empty) { node = node->left; return node; } } Node* maximum( Node* node ) { while (node->right != empty) { node = node->right; return node; } }
Conclusion
This task was found to be quite easy by most, but due to a lack of easy way test it, some incorrect implementations slipped through. Next time it might be good to include some test code, to ensure that what is asked for is actually delivered.
Task 3
Overview
A quick brief is to write the function lookup taking a int called key and the pointer to a Node called node and to find the location of the key and return the pointer to the location of the key.
Main solutions done by the class
Recursion
Taken from dols008's work which was a popular solution but not as popular as use of the while loop.
Node* lookup( int key, Node* node ) { if (node == empty || key == node->key) return node; if (key > node->key) return lookup(key, node->right); else return lookup(key, node->left); }
The key here was to check first if the passed node was either empty or contained the key so as to not lead on a wild goose chase.
While Loop
Take from stsa027's work which was the more favoured way of doing it.
Node* lookup( int key, Node* node ) { while (key!=node->key & node!=empty){ if (key >= node->key) node = node->right; if (key < node->key) node = node->left; } return node; }
A class member (bvic005) even reasoned why as to to use a while loop "I suppose i could have used recursion for this, but decided against it as there is almost no difference in code size, recursion is harder to write and for others to understand quickly, and it is often less efficient than loops at runtime anyway."
Another key here was to make sure the node being circulated in the loop didn't contain the key and the node was not empty.
Traps for Task 3
One general traps the class fell into was that of when using the defining properties of a Binary search tree they forgot to take into account if the key was only equal or if they did they included equal and greater then checked after if it was equal but due to the nature of it being recursive it came up with the first if statement.
if ( node->key <= key ) { //wiki edit: possible solution is to remove the = from the line lookup(key, node->right); } else { lookup(key, node->left); } if ( node->key == key ) {
Another was that some felt it hard to test to see how their code was in action and had to rely on the known values of the not so random rand() in the make random tree function. One solution to this was to use the previously done maximum function to get a known key. Another was to print the tree out first then get the users input via scanf then find the location of the key.
Some felt that the lookup function was at this point was useless but a possible way would be to see if the said value actually exists (so could use it as a rudimentary search function).
Task 4
There seemed to be a lack of time for task 4 after students attempted the previous questions. Other reasons reasons of not completing task 4 include not being able to understand what the task demanded, or what 'pre-order' actually means in terms of iterating through a tree.
Among the class, there were about a dozen proposed solutions, fewer complete attempts, and still fewer that seemed like it's close to working.
Most of the proposed solutions look similar, presumably because they have a similar rationale behind them. The most common approach was writing code that dealt with all significantly different situations of moving to the next or previous node, for example, from a right/left leaf, or from a right or left child.
The main problem, in terms code writing, seems to be not accounting for all possible cases (of which there are many). For example the following code will not give the right 'next node' if the current node is a leaf of a parent with no right child. However, it does seem to give the correct result for all other cases.
Node* next_pre( Node* node ) { if( node == empty ) //empty tree return empty; if( node->left == empty && node->right == empty ) { //case for leaf while( node->parent != empty) { //backtrack if( node == node->parent->left && node->parent->right != empty) return node->parent->right; node = node->parent; } return empty; } else { //case for parent/root if( node->left != empty) return node->left; return node->right; //else }
Also, it was noted that the code "node == node->parent->left" or "node == node->parent->right" was commonly used to circumvent the 'rule' of not using the key's value to determine whether the node is a left or right child, as seen in the last code example.
The following code is a correct version of the next_pre function (in my opinion), written by bvic005:
Node* next_pre( Node* node ) { if(node != empty) { if(node->left != empty) return node->left; if(node->right != empty) return node->right; while(node->parent != empty) { if((node->parent->right != empty) && (node->parent->right != node)) return node->parent->right; node = node->parent; } } return empty; }
The following is also correct, and uses a similar approach to complete the function. However, instead of using "node->parent->right" to detect when the node is returning to the original right node, dcho040 uses the local variable "before" to store information about the right node that we do not want to return to.
Node* next_pre( Node* node ) { if (node == empty) return empty; if (node->left != empty) return node->left; before = empty; while (node != empty) { if ((node->right != empty) & (node->right != before)) return node->right; before = node; node = node->parent; } return node; }
The problems encountered in writing the second function for task 4 were similar - the main problem is not foreseeing all possible scenarios when moving to the previous node, especially when we have to move backwards from a right child back to a left subtree with children, or that fact that that left subtree's nodes may have ONLY a left child, in which case it would be wrong to just use the maximum function to just progress down the right path, as was done by stsa027 and bvic005.
Copy of prev_pre by dcho040 that avoids the error just mentioned:
Node* prev_pre( Node* node ) { if ((node == empty) || (node->parent == empty)) return empty; if (((node->parent)->left == node) | ((node->parent)->left == empty)) return node->parent; node = (node->parent)->left; search = 1; while (search){ if (node->right != empty) node = node->right; else if (node->left != empty) node = node->left; else search = 0; } return node; }
sdal039 gave code which iterated through all elements in the tree in pre-order. Not what was required, but still interesting. One thing that struck me was how short the function was, compared to just going to the next node in pre-order.
Node* next_pre( Node* node ) { if ( node != empty) { printf("%d, ", node->key); next_pre(node->left); next_pre(node->right); } return node; }
Task5
Task6
Overview
Very few managed to get it done at all with only a few managing to get it done and this was post lab time according to the history page
Main example
based off hpan027
#include <math.h> int height = ceil( log( n + 1 ) / log( 2 ) - 1 ); Node* newTree = mkNode( pow(2, height), empty ); int i,j; i = height - 1; while( i >= 0 ) { j = 0; while( pow(2, i) + j * pow(2, (i+1)) < n ) { insert( pow(2, i) + j * pow(2, (i+1)), &newTree ); j++; } i--; } return newTree;
This relies off the formula as seens in class it seems (unless taken from elsewhere)
Overall
The Vast majority of the class did not get over Task 3 during the lab time and post that a few more went onto task 4. Generally a good effort for what was done and well written code.
For next year I don't think the lab should be shortened too much as there is quite a lot of content that could be done by more enthusiastic or faster people outside of the lab too.