SE250:HTTB:WIP:Trade-offs
TODO: Image for Trade-Offs
Notes to anyone doing this page for the HTTB
CURRENT STATUS: NEAR ENDING
I have partially moved this page to the final site SE250:HTTB:Trade-offs so as to get some marks in case if it gets marked before this is fully completed.
NOTE: EDIT HERE AND CARRY ACROSS CHANGES
Need some more diagrams/animations please
Also see the discussion page and post your thoughts and ideas on it Talk:SE250:HTTB:WIP:Trade-offs
Need to Cover
Feel free to add here or discuss
- General memory vs CPU (and techniques to maximise each)
- ArrayLists vs LinkedLists
- For short and Long lists
- Comparison of Data Structures (including comparing Big O ) notation
- All data structures we have used(AL, LL, HashMaps, BST's)
- For Methods that can be sorted (or height be reduced) is the overhead worth it)
- Searching
- Naïve methods (Un-informed)
- Breadth-first, depth-first
- Smarter methods
- Informed methods...
- Naïve methods (Un-informed)
Any others?
Any point of having parsing here?
Introduction
(possibly make General Trade-offs specific to 250 in its own section?) When it comes to engineering software we may particular constraints that we have to fulfill or that we wish to desire to have, these can be put down to the basic three: Time, Resources and Use. In our endeavors we try to maximize all three but often we have to make Trade-offs. Here we will cover the Trade-offs and point out what would be better for what.
We can further specify our basic three to elaborate it for Software Engineering:
- Time
- We may have a set amount time, as little as possible or none at all
- Resources
- A small or large amount of memory available.
- A slow or fast CPU
- A low or high amount of Hard Drive space (not covered here though).
- Use
- May require to get your information back in a particular order
- May need to insert in either: In a particular order or randomly
Comparison of Data Structures
Data structures form the back bone of any program that needs to store information. They also decide how the information is going to be accessed as well as time taken to access and store it.
Array-based Lists
Main article: SE250:HTTB:WIP:Array-based_Lists
Characterization of Array-based List
1.Size must be predetermined before the Array can be allocated.
2.Cannot grow beyond their predetermined size.
3.Whenever the list contains only a few elements,a substantial an\mount of space might be tied up in a largely empty array
4.There is no wasted space for an individual element.It is completed filled,there is no storage overhead.
Linked Lists
Main article: SE250:HTTB:WIP:Linked_Lists
Characterization of Linked List
1.It have the advantage that they only need space for the object actually on the list.
2.NO limit to the number of element on a linked list,as long as there is free-store memory available
3.It require a pointer be added to every list node
Array-based List and Linked-list
Efficiency of memory space
Linked lists required a pointer be added to every list node.If the element size is small,then the overhead for links can be a significant fraction of the total storage.If a array-based list is completed filled,there is no storage overhead.In this situation,the array-based list is more space efficient by a constant factor.
A simple formula can be used to determine whether the array-based list or linked list will be more efficient in a particular situation.
n = number of element
P = size of a pointer in storage unit
E = size of a data element in storage unit
D = maximum number of list elements
The amount of space required for the Array-based list:
D‧E
The amount of space required for the linked List:
n‧(P+E)
In general,the linked list requires less space than array-based list when relatively few elements are in the list.
Conversely,the array-based list become more space efficient when it is close to full.
Using the equation,we can determine the break-even point beyond which the array-based list is more space efficient in any particular situation
n〉D‧E/(P+E)
If P=E(4-byte long element and 4-byte pointer)
The break-even point is at D/2.
That means a Array-based list would be more efficient whenever the array is more than half full.
Linked list is better when number of element varies widely or is known.
Array-based list are generally more space efficient when the user knows approximately how large the list will become.
Access Time
Array-based List are faster for random access by position,and the operations always take O(1) time.Linked list has no explicit access to the previous element,
and access by position requires marching down the list form the front to the specified position. These operation s require O(n) time in the average and worst cases.
To visually sum this up see hals016's Screen-Cast on comparing Array-Based List's and Linked-Lists
Hash Tables
Main article: SE250:HTTB:WIP:Hash_Tables
Hash Tables are quite possibly the best data structure generally one can use if they don't want the data automatically sorted (E.g. Names sorted alphabetically), as they strike a balance between Memory use and CPU time.
- Advantages
- Can be constant access and insert Time ( O(1) (...Link to Big O table)
- Disadvantages
- Can have same memory overhead as a Linked-List
- Not user sorted
- Only as good as the Hashing function (Could index to particular rows too many times)
Binary Search Trees
Main article: SE250:HTTB:WIP:Binary_Search_Trees
Binary Search Trees have a desired property of being easily able to get sorted information as one of the quickest of our main data structures.
They can also have some calculation overhead if it is a self balancing tree as every time a node is inserted it will have to make the call of balancing the tree so as to have the minimal height. But this overhead can reduce its overhead (if it can effectively sort out the tree) as when it comes to accessing elements it will reduce the time taken (O(log n)).
Efficiency of memory space
The overhand in BST depends on several factors including which nodes store data values,whether there is a parent pointer,whether the leaves stores child pointers,and whether the tree is a full binary tree.
In a simple pointer-based binary tree,every node has two pointers to its children.
n = numbers of node
p = the amount of space required by a pointer in storage unit
d = the amount of space required by a data value in storage unit
The total space amount for a tree of n nodes:
n*(2p+d)
The total overhead space will be
2pn
for the tree.So the overhead fraction will be
2p/(2p+d)
If we assume that p=d,then a full tree has about two thirds of its total space taken up in overhead.Also,about half of the pointers are "wasted" NULL values that serve only to indicate tree structure,but which do not provide access to new data.
Algorithms
Algorithms decide how we interact with our data and as such the same Trade-offs apply. Here we will discuss how each apply what you may want to use in practice.
State Based Search
Main article: SE250:HTTB:WIP:State-space_search
This is based on how much of the current state you know and use in deciding your searching algorithm and whether you can implement your knowledge or not.
Uninformed Search
Uninformed searches can be thought of as naive and a means of "brute-force". In other words they are like blind searches, they search a tree without any information that helps speed up the time to reach the goal. The two uninformed searches we are learning are the Breadth first search and the Depth first search. In these types of searches there are four main categories to determine the trade-offs:
- Time complexity – How long, in the worst case, the algorithm takes to find the goal?
- Space complexity – How much space is needed, in the worst case scenario, for storage by the algorithm?
- Completeness – Is the algorithm guaranteed to find a solution?
- Optimal – Does the algorithm find the shortest path to the goal (if there is more than one goal)?
To get an idea of how these algorithms work here are the links to the screencasts:
BFS and DPS
Trade-offs: The following are used inside the equations: b: maximum branching factor of the search tree (largest number of children branching off one node) d: depth of the least-cost solution m: maximum depth of the state space (the maximum height when all the possibilities are shown) Breadth first search
- Time complexity - Since in the worst case, breadth-first search has to consider all paths to all possible nodes. The time complexity is therefore "b + b2 + b3 + ... + bd"(every branch of every node) which asymptotically approaches O(b^(d+1)).
- Space Complexity - Since all of the nodes of a level must be stored in memory until their children in the next level have been generated, the space complexity is proportional to the number of nodes at the deepest level. The asymptotic space complexity is the number of nodes at the deepest level, O(b^(d+1)).
- Completeness - Yes since it visits every node.
- Optimal - Yes if cost per path is 1.
Depth first search
- Time complexity - Assuming worst case, there is 1 goal leaf all the way to the RHS at depth d. DFS will be exponential O(b^m).
- Space Complexity - Space complexity of DFS is much lower than BFS (breadth-first search). It also tends to behave much better with heuristic methods of choosing a likely-looking branch(leads closer to the goal). At depth m we have b nodes and b-1 nodes at earlier depths "total = b + (m-1)*(b-1)", therefore the space complexity is linear, O(bm). This is terrible if m is much larger than d however, if solutions are dense it may be much faster than BFS.
- Completeness - No, fails in spaces with loops and large depth trees.
- Optimal - No
Conclusion
Time complexity is the same but in the worst-case BFS is always better than DFS. Sometimes DFS is better if there are many goals, no loops in the state spaces and no large depth paths. BFS is much worse memory-wise and may store the whole search space while the space of DPS is modeled linearly.
In conclusion, BFS is better if the goal is not deep, there are infinite paths, there are many loops and a small search space. While on the other hand, DFS is better if there are many goals and not many loops. Regardless of conditions, DFS is always better in terms of memory.
Informed Search
Informed searches are done by providing heuristic information with the problem that might help reduce the amount of traversal required to reach the goal by the search algorithm. The informed search we learn is Best First Search. In a informed search, the trade-off can be between these:
- Time complexity - Whether or not the program is indeed faster.
- Programming complexity - Whether the priority function is worth the time coding.
- Completeness - Whether the heuristic information given is correct.
- Optimal solution - Whether the priority function does find the shortest path to the goal.
ScreenCasts
User | Description |
---|---|
hlin079 | The differences between hash tables and binary search trees |
dcho040 |
|
hals016 | The trade offs of a Linked List and an Array List (possible exam question) |
hals016 | The trade offs of Breadth-first search and Depth-first search algorithms |
Contributors
jhor053 (Editor, so blame me if anything goes wrong )
dcho040 (screen cast)
References
John Hamer's Lectures
Mark Wilson's Lectures and Notes (via [1])
Recordings of Minutes by Fellow Students
SOFTENG250 2008 Lab reports
<references/>