Collection Classes

Collections of data can be stored in the computer in various ways. Each technique has properties and advantages that can be exploited in particular situations.

If the data items are of different types, they can be grouped in a record, but the number of data items must be declared in advance. If multiple data items are of the same type, they can be stored in an array.

In the simplest case, the items can be stored in the array and accessed when required. However, arrays are not very advanced data structures, so every program must contain functions to access and manipulate the contents of the array. In such instances it is feasible to generalise the access methods so that they can be applied to all similar problems. The operation of arrays can be extended by including useful functions (lists) and/or the arrays can be restricted to subsets of their functions, thereby forcing desirable properties on the data structures (stacks, queues). In all cases, the functions applied to the array need to be as efficient as possible to cater for arbitrarily large data structures.

Complexity

To accomplish the greatest possible efficiency, the programmer has to select algorithms that have the least complexity. There are various types of efficiency. Space efficiency means that the programs must take as little memory/disk space as possible. Time efficiency means that the program must execute as fast as possible. Space efficiency is accomplished quite often by using generalised dynamic data structures as studied later in this course. Time efficiency can only be realised by selecting the fastest algorithm in the particular context of the program. This implies that it must be possible to measure the speed of an algorithm. Since computers themselves have varying speeds, it is better to measure the time efficiency of an algorithm relative to other algorithms rather than as an absolute figure.

Instead of running a program and measuring its performance, it is more accurate to analyse the code to determine its efficiency. To do this, the programmer must know how fast each part of the code executes. It generally turns out that the primary function of an algorithm contributes the most to the time taken to execute. For example, in a mathematical program the mathematical operations take the most time; in a sorting algorithm comparisons of data take the most time; in a searching algorithm accessing of data takes the most time. Thus, in order to estimate the time taken by an algorithm, the number of basic operations must be counted.

If the number of basic operations is constant, irrespective of the number of data items in the collection (n), then it is said that the algorithm is of order 1, normally written as O(1).

If the number of basic operations is proportional to n then it is said that the algorithm is of order n, O(n). Similarly, if the number of basic operations is proportional to any formula involving n, then the algorithm is of that order.

For example, assignment of a constant value to a variable takes a constant amount of time, and is therefore O(1). Searching for a value in an unsorted array may necessitate going through the entire array, therefore it is O(n). A basic bubble-sort algorithm that makes n-1 passes over the array, with n-1 comparisons in each pass, uses (n-1)*(n-1) = n² - 2n + 1 comparisons. In this case, the value of n² grows much faster than the other terms as n increases - n² is the dominant term - hence it is said that the algorithm is O(n²).
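
The comparison count of a sort can be checked by instrumenting the code. The following is a minimal sketch of one basic bubble-sort variant (n-1 passes, each comparing every adjacent pair), not necessarily the variant these notes have in mind. The function name BubbleSortCountingComparisons and the counter are assumptions made for illustration. The exact count differs between variants, but the n² term dominates in each.

```cpp
#include <cassert>
#include <utility>

// Sorts Data[0..n-1] into ascending order and returns the number of
// comparisons performed. Basic variant: n-1 passes over the array,
// each pass comparing all n-1 adjacent pairs.
long BubbleSortCountingComparisons ( int Data[], int n )
{
   assert ( n >= 0 );
   long Comparisons = 0;
   for ( int Pass=0; Pass<n-1; Pass++ )
      for ( int i=0; i<n-1; i++ )
      {
         Comparisons++;                       // the basic operation being counted
         if ( Data[i] > Data[i+1] )
            std::swap ( Data[i], Data[i+1] );
      }
   return Comparisons;
}
```

For an array of 5 elements this variant makes 4*4 = 16 comparisons; for 10 elements it makes 9*9 = 81.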

For many algorithms it is also important to consider the various cases of execution: best, worst and average. The best case is when the algorithm terminates as quickly as possible - e.g. when searching an array, the first element is the one sought. The worst case is when the algorithm takes the maximum possible execution time - e.g. when searching an array, the last element is the one sought or the value is not present at all. The average case is an approximation of the average time taken for a run of the algorithm. Best cases do not occur frequently so they are not as useful as worst cases and average cases.
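
The best and worst cases of the array search can be made concrete by counting how many elements are examined. This is a hedged sketch only: the function name LinearSearch and the Comparisons output parameter are assumptions introduced for illustration.

```cpp
#include <cassert>

// Returns the index of Target in Data[0..n-1], or -1 if it is absent.
// Comparisons receives the number of elements examined.
int LinearSearch ( const int Data[], int n, int Target, int &Comparisons )
{
   assert ( n >= 0 );
   Comparisons = 0;
   for ( int i=0; i<n; i++ )
   {
      Comparisons++;
      if ( Data[i] == Target )
         return i;      // best case: found at i=0 after 1 comparison
   }
   return -1;           // worst case: all n elements were examined
}
```

Searching the array {7, 3, 9, 1, 4} for 7 takes 1 comparison (best case), while searching for 4, or for a value that is not present, takes 5 (worst case).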

Lists

A list of elements of the same data type can be stored in an array.

To put an element into a particular position requires only a simple assignment, with O(1).

To retrieve an element from a particular position requires a data access, also with O(1).

To insert an element and push all subsequent elements to the right incurs between O(1) (best, when inserting at the end of the list) and O(n) (worst, when inserting at the start).

To delete an element and move all subsequent elements to the left incurs between O(1) (best) and O(n) (worst).

Sample Code:
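// EndOfList is taken to be the index of the last element in Storage.
void List::Insert ( int Position, int Data )
{
   // shift elements from the end down to Position one place to the right
   for ( int i=EndOfList; i>=Position; i-- )
      Storage[i+1] = Storage[i];
   Storage[Position] = Data;
   EndOfList++;
}
void List::Delete ( int Position )
{
   // shift elements after Position one place to the left
   for ( int i=Position+1; i<=EndOfList; i++ )
      Storage[i-1] = Storage[i];
   EndOfList--;
}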
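
The sample code omits the class declaration and any bounds checking. The following is one possible self-contained sketch, assuming (as the loops above suggest) that EndOfList holds the index of the last element and is -1 for an empty list. MaxSize, Get, Length and the bool return values are assumptions added for illustration and are not part of the original code.

```cpp
#include <cassert>

const int MaxSize = 100;   // assumed fixed capacity of the array

class List
{
public:
   List () : EndOfList(-1) {}
   bool Insert ( int Position, int Data );
   bool Delete ( int Position );
   int  Length () const { return EndOfList+1; }
   int  Get ( int Position ) const
   {
      assert ( Position >= 0 && Position <= EndOfList );
      return Storage[Position];
   }
private:
   int Storage[MaxSize];
   int EndOfList;          // index of the last element; -1 when empty
};

// Returns false if the list is full or Position is out of range.
bool List::Insert ( int Position, int Data )
{
   if ( EndOfList+1 >= MaxSize || Position < 0 || Position > EndOfList+1 )
      return false;
   for ( int i=EndOfList; i>=Position; i-- )
      Storage[i+1] = Storage[i];
   Storage[Position] = Data;
   EndOfList++;
   return true;
}

// Returns false if Position does not refer to an existing element.
bool List::Delete ( int Position )
{
   if ( Position < 0 || Position > EndOfList )
      return false;
   for ( int i=Position+1; i<=EndOfList; i++ )
      Storage[i-1] = Storage[i];
   EndOfList--;
   return true;
}
```

Inserting at position Length() appends without moving any elements (the O(1) best case), while inserting at position 0 moves every element (the O(n) worst case).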

Stacks

A stack is a linear list of elements such that elements may only be inserted at and removed from one end. This enforces the property that the first element added onto the stack is the last one that may be removed (FILO). In terms of a stack, PUSH refers to the addition of an element onto the end of the stack and POP refers to the removal of the element at the end of the stack.

Stacks are thus named because they are analogous to a vertical stack of physical objects.

Sample Code:
void Stack::Push ( int Data )
{
   Storage[EndOfStack++] = Data;
}
int Stack::Pop ()
{
   return Storage[--EndOfStack];
}

In the above code, EndOfStack points to the position where the next element will be stored. If EndOfStack = 0, then the stack is empty. It is advisable to check that the stack isn't empty before POPping an element. Also, the value of EndOfStack can be compared to the size of the array to check that the stack isn't full before PUSHing an element.

EndOfStack takes on an initial value of 0 when a stack is created/initialised.
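
The empty and full checks described above could be packaged with the stack as follows. This is a sketch under stated assumptions: MaxSize, IsEmpty and IsFull are names introduced here for illustration, and using assert to guard the preconditions is just one possible design choice.

```cpp
#include <cassert>

const int MaxSize = 100;   // assumed size of the storage array

class Stack
{
public:
   Stack () : EndOfStack(0) {}                      // a new stack is empty
   bool IsEmpty () const { return EndOfStack == 0; }
   bool IsFull () const { return EndOfStack == MaxSize; }
   void Push ( int Data );
   int  Pop ();
private:
   int Storage[MaxSize];
   int EndOfStack;         // position where the next element will be stored
};

void Stack::Push ( int Data )
{
   assert ( !IsFull() );   // caller should check IsFull() first
   Storage[EndOfStack++] = Data;
}

int Stack::Pop ()
{
   assert ( !IsEmpty() );  // caller should check IsEmpty() first
   return Storage[--EndOfStack];
}
```

Pushing 1, 2 and 3 and then popping three times yields 3, 2 and 1, after which IsEmpty() is true again.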

Queues

A queue is a linear list of elements such that elements may only be added at one end and removed from the other end. Queues are analogous to people standing in a queue at, for example, a bank - the person at the head of the queue is assisted next while the next person arriving joins the queue at the tail. Unlike stacks, with queues the first element to enter the queue is the first element to be removed (FIFO).

Sample Code:
void Queue::Add ( int Data )
{
   Storage[Tail++] = Data;
}
int Queue::Remove ()
{
   return Storage[Head++];
}

Head points to the position where the next element will be removed from. Tail indicates where the next element will be added. Once again, empty and full queues must be checked for to avoid data corruption.

Since a queue moves continuously in one direction, a number of Add and Remove operations may eventually cause the queue to reach the limit of its storage space, even though the queue may not be full. To prevent this the array can be considered to be circular. After the last position is used up, the pointers for Tail/Head can wrap around to the beginning of the array (provided that that position is not currently occupied). A gap has to be introduced into the circular queue to distinguish between a full queue and an empty one. Thus, if Head = Tail, the queue is empty, but if (Tail+1) = Head then the queue is full - the (Tail+1) calculation must be done modulo the length of the array to allow for wrapping around of indices.
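
The circular scheme described above might be sketched as follows. MaxSize, IsEmpty and IsFull are assumed names introduced for illustration, and the array length of 8 is arbitrary. Because of the gap, an array of MaxSize positions holds at most MaxSize-1 elements.

```cpp
#include <cassert>

const int MaxSize = 8;     // assumed array length; capacity is MaxSize-1

class Queue
{
public:
   Queue () : Head(0), Tail(0) {}
   bool IsEmpty () const { return Head == Tail; }
   bool IsFull () const { return (Tail+1) % MaxSize == Head; }
   void Add ( int Data );
   int  Remove ();
private:
   int Storage[MaxSize];
   int Head;               // position of the next element to be removed
   int Tail;               // position where the next element will be added
};

void Queue::Add ( int Data )
{
   assert ( !IsFull() );
   Storage[Tail] = Data;
   Tail = (Tail+1) % MaxSize;   // wrap around at the end of the array
}

int Queue::Remove ()
{
   assert ( !IsEmpty() );
   int Data = Storage[Head];
   Head = (Head+1) % MaxSize;   // wrap around at the end of the array
   return Data;
}
```

After seven Add operations the queue is full. Removing three elements and adding three more makes Tail wrap around to the start of the array, and the elements still come out in the order in which they were added.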