Algorithms and Data Structures 2020

Week 1. Foundations of Algorithms

Key Concepts

2.1 Key Concepts

2.1.1 Description of Algorithms
2.1.2 Analysis

2.1.1 Description of Algorithms

Consider the sorting of a bridge hand. Thirteen cards to be sorted in increasing order.

We sort first by suit, so that

♠ > ♡ > ♢ > ♣

and then within each suit so that

A > K > Q > K n > 10 > 9 > 8 > 7 > 6 > 5 > 4 > 3 > 2 .

Step 1. Specification of the Concrete Problem

Input. A hand of 13 cards.

Output. A hand of 13 cards sorted in increasing order.

Step 2. Specification of the Abstract Problem

Input. An array of $n$ objects, $A_{1}, A_{2}, \dots, A_{n}$ .

Output. An array of $n$ objects sorted in increasing order.

The objects can be of any type (class), as long as we have a binary relation $\leq$ , so that for any two objects $x, y$ we can determine whether $x \leq y$ is true or false.

In algorithm theory, we tend to think of the objects as numbers, so that we have an intuitive grasp on what $\leq$ means, but there is no loss of generality. In Object Oriented Programming, the notion of less than or equal is encapsulated in the class, and we may have to rewrite it as $x . isSmallerThanOrEqual (y)$ instead of $x \leq Y$ . In Functional Programming, the function implementing $\leq$ may be passed as an argument alongside the array $A$ using a lambda expression which also the later versions of Java supports, and in this case it may be rewritten as $isSmallerThanOrEqual (x, y)$ instead of

Step 3. Pseudo-Code One way to solve the sorting problem is the following.

Input: Array $A_{i}$ , $i = 1, \dots, n$
Output: The same array $A_{i}$ , $i = 1, \dots, n$ sorted in place so that $A_{1} \leq A_{2} \leq \dots \leq A_{n}$

1 for

i : = 2, 3, \dots, n

j : = i

3 while

(j \geq 2)

and

(A_{j} < A_{j - 1})

4 swap

A_{j}

and

A_{j - 1}

j : = j - 1

Observe how we combine well-known programming constructs (for, while), mathematical notation ( $\geq, A_{i}, A_{j}$ ), and natural language (swap). This hybrid language is called pseudo-code. There is no standard for how to write it. As long as it is legible and unambiguous to human readers, it is ok.

I have in this case used $: =$ as the assignment operator, just to illustrate the variation you will encounter. Java and C uses the equality sign = for assignment, and for equality they use a double equality sign ==.

Some authors would prefer pseudo-code closer to their favourite programming language, for instance:

1         for i = 2 to n
2            j = i
3            while (j >= 2) and (A[j] < A[j-1]
4               swap A[j] and A[j-1]
5               j = j-1

This is a matter of taste, more community taste than personal taste though, but you may have to deal with more than one community..

The ‘swap’ line deserves some comment, as we use this in both the example styles, in spite of its being far from known programming languages, where it would have to be rewritten as

1 t =

A_{j}

A_{j}

A_{i}

A_{i}

= t

The swap statement is a simple and easily understandable instruction in natural language, and most human reader would find the programming construct much harder to read. This is why we resort to natural language here. In the choice of style, you should always strive to maximise the reader’s comprehension.

2.1.2 Analysis

Empirical versus Theoretical Analysis In the first year, you have had to test your programs, to see if they work as intended. Testing is, of course, important both in its informal forms and in more structured and analytical forms, where we can talk about empirical analysis.

Empirical analysis is limited, however, by the number of examples you have time to test. Each test you make will only validate one single case, and there may be an infinite number of cases with different properties.

Algorithm theory, and therefore this module, focuses on theoretical analysis, searching for proofs that are valid for any data set. Note that theoretical analysis will also help in the design of good and complete test sets for empirical analysis.

Correctness of Insertion Sort Let’s see what it does.

The index $i$ is the card we are looking at. Cards to the left have already been looked at, and cards to the right not yet. In the first iteration, $i = 2$ , so there is only one card to the left. Obviously an array of a single card is always sorted. The inner loop (while) tries to insert the $i$ th card in the correct position among the $i - 1$ previous cards.

Some programming languages allow assertions, establishing claims relevant to the analysis. Let’s write them into the pseudo-code as follows.

1 for

i : = 2, 3, \dots, n

2 assert

A_{1} \leq A_{2} \leq \dots \leq A_{i - 1}

j : = i

4 while

(j \geq 2)

and

(A_{j} < A_{j - 1})

5 swap

A_{j}

and

A_{j - 1}

j : = j - 1

7 assert

A_{1} \leq A_{2} \leq \dots \leq A_{i}

8 assert

A_{1} \leq A_{2} \leq \dots \leq A_{n}

When the programming languages support assertions, they are tested in debug mode, to identify places where critical assumptions are broken. In theoretical analysis, they are claims used to structure the proof. The two first assertions are examples of loop invariants, i.e. properties which invariable holds in every iteration of the loop.

The purpose of the while loop, is to reestablish the loop invariant for when the index $i$ increases.

The first iteration of while compares $A_{i}$ to $A_{i - 1}$ (because $j = i$ ). If $A_{i}$ is larger, it belongs where it is and the while loopis not entered. If it is smaller, the two cards are swapped, and the loop continues to compare it with the next card.

When the while loop terminates, the $i$ first cards are in sorted order, and in the next iteration of the for loop, we can again say that the cards to the left of $i$ are sorted.

Complexity of Insertion Sort

Number of instructions. What is an instruction?
Formal computing models
Worst case, best case, and average case
Complexity