Algorithms and Data Structures 2020

Week 1. Foundations of Algorithms

Algorithms and Data Structures

2 Week 1. Foundations of Algorithms

Reading 1 Goodrich, Tamassia, & Goldwasser: Chapter (1), 2, 4.1, and 4.4.

Chapter 1 and most of Chapter 2 is cursory background material, but it is useful reading because it will help you connect the theoretical and high level contents of the module to your concrete experience with Java programming.

Algorithm Theory works with Language-Independent Models. We do not want to be tied up in the details of particular programming languages. This is important. We choose a simpler language to be understandable by a wider audience, across different language traditions (C, Java, Ada, Python).

The first thing to do in this module is to learn the basics of this language. We shall try to map the concepts to Java jargon as we go along.

2.1 Key Concepts

2.1.1 Description of Algorithms
2.1.2 Analysis

2.1.1 Description of Algorithms

Consider the sorting of a bridge hand. Thirteen cards to be sorted in increasing order.

We sort first by suit, so that

♠ > ♡ > ♢ > ♣

and then within each suit so that

A > K > Q > K n > 10 > 9 > 8 > 7 > 6 > 5 > 4 > 3 > 2 .

Step 1. Specification of the Concrete Problem

Input. A hand of 13 cards.

Output. A hand of 13 cards sorted in increasing order.

Step 2. Specification of the Abstract Problem

Input. An array of $n$ objects, $A_{1}, A_{2}, \dots, A_{n}$ .

Output. An array of $n$ objects sorted in increasing order.

The objects can be of any type (class), as long as we have a binary relation $\leq$ , so that for any two objects $x, y$ we can determine whether $x \leq y$ is true or false.

In algorithm theory, we tend to think of the objects as numbers, so that we have an intuitive grasp on what $\leq$ means, but there is no loss of generality. In Object Oriented Programming, the notion of less than or equal is encapsulated in the class, and we may have to rewrite it as $x . i s S m a l l e r T h a n O r E q u a l (y)$ instead of $x \leq Y$ . In Functional Programming, the function implementing $\leq$ may be passed as an argument alongside the array $A$ using a lambda expression which also the later versions of Java supports, and in this case it may be rewritten as $i s S m a l l e r T h a n O r E q u a l (x, y)$ instead of

Step 3. Pseudo-Code One way to solve the sorting problem is the following.

Input: Array $A_{i}$ , $i = 1, \dots, n$
Output: The same array $A_{i}$ , $i = 1, \dots, n$ sorted in place so that $A_{1} \leq A_{2} \leq \dots \leq A_{n}$

1 for

i : = 2, 3, \dots, n

j : = i

3 while

(j \geq 2)

and

(A_{j} < A_{j - 1})

4 swap

A_{j}

and

A_{j - 1}

j : = j - 1

Observe how we combine well-known programming constructs (for, while), mathematical notation ( $\geq, A_{i}, A_{j}$ ), and natural language (swap). This hybrid language is called pseudo-code. There is no standard for how to write it. As long as it is legible and unambiguous to human readers, it is ok.

I have in this case used $: =$ as the assignment operator, just to illustrate the variation you will encounter. Java and C uses the equality sign = for assignment, and for equality they use a double equality sign ==.

Some authors would prefer pseudo-code closer to their favourite programming language, for instance:

1         for i = 2 to n
2            j = i
3            while (j >= 2) and (A[j] < A[j-1]
4               swap A[j] and A[j-1]
5               j = j-1

This is a matter of taste, more community taste than personal taste though, but you may have to deal with more than one community..

The ‘swap’ line deserves some comment, as we use this in both the example styles, in spite of its being far from known programming languages, where it would have to be rewritten as

1 t =

A_{j}

A_{j}

A_{i}

A_{i}

= t

The swap statement is a simple and easily understandable instruction in natural language, and most human reader would find the programming construct much harder to read. This is why we resort to natural language here. In the choice of style, you should always strive to maximise the reader’s comprehension.

2.1.2 Analysis

Empirical versus Theoretical Analysis In the first year, you have had to test your programs, to see if they work as intended. Testing is, of course, important both in its informal forms and in more structured and analytical forms, where we can talk about empirical analysis.

Empirical analysis is limited, however, by the number of examples you have time to test. Each test you make will only validate one single case, and there may be an infinite number of cases with different properties.

Algorithm theory, and therefore this module, focuses on theoretical analysis, searching for proofs that are valid for any data set. Note that theoretical analysis will also help in the design of good and complete test sets for empirical analysis.

Correctness of Insertion Sort Let’s see what it does.

The index $i$ is the card we are looking at. Cards to the left have already been looked at, and cards to the right not yet. In the first iteration, $i = 2$ , so there is only one card to the left. Obviously an array of a single card is always sorted. The inner loop (while) tries to insert the $i$ th card in the correct position among the $i - 1$ previous cards.

Some programming languages allow assertions, establishing claims relevant to the analysis. Let’s write them into the pseudo-code as follows.

1 for

i : = 2, 3, \dots, n

2 assert

A_{1} \leq A_{2} \leq \dots \leq A_{i - 1}

j : = i

4 while

(j \geq 2)

and

(A_{j} < A_{j - 1})

5 swap

A_{j}

and

A_{j - 1}

j : = j - 1

7 assert

A_{1} \leq A_{2} \leq \dots \leq A_{i}

8 assert

A_{1} \leq A_{2} \leq \dots \leq A_{n}

When the programming languages support assertions, they are tested in debug mode, to identify places where critical assumptions are broken. In theoretical analysis, they are claims used to structure the proof. The two first assertions are examples of loop invariants, i.e. properties which invariable holds in every iteration of the loop.

The purpose of the while loop, is to reestablish the loop invariant for when the index $i$ increases.

The first iteration of while compares $A_{i}$ to $A_{i - 1}$ (because $j = i$ ). If $A_{i}$ is larger, it belongs where it is and the while loopis not entered. If it is smaller, the two cards are swapped, and the loop continues to compare it with the next card.

When the while loop terminates, the $i$ first cards are in sorted order, and in the next iteration of the for loop, we can again say that the cards to the left of $i$ are sorted.

Complexity of Insertion Sort

Number of instructions. What is an instruction?
Formal computing models
Worst case, best case, and average case
Complexity

2.2 Projects and Exercises

2.2.1 Compulsory Projects
2.2.2 Exercises

2.2.1 Compulsory Projects

These projects should be solved as part of the weekly assignment. When you phrase your answers, you should always write with fellow students in mind. Write so that they would understand, and be convinced by your argument. If you are unsure what is a comprehensible argument, discuss it with fellow students. Always use drawings and sketches when they are more comprehensible than prose.

Problem 2.1 (Sorting Cards)

Consider the problem of sorting a hand of cards. How would you do it naturally?

If you have a deck of cards, shuffle and draw at least ten cards at random. If you do not, you can write down ten (or more) numbers at random.

Look at the cards/numbers. How would you intuitively start to sort them?

Answer the following,

1.: Is your approach systematic or not?
2.: Can you write down pseudo-code describing how you do it?
3.: If you cannot, why is that? Can you make changes so that it is possible?
4.: Is your approach similar to insertion sort? What are the main differences?

(You should do this exercise together with other students, and discuss and compare approaches. Do not spend more than about 30 minutes on the problem. There is no ideal answer.)

Problem 2.2 (Letter Frequencies) Simple substitution ciphers work by replacing each letter in the alphabet with another. To encrypt a text, the same substitution is applied throughout the text. Such ciphers are easily broken by using frequency analysis.

Consider a program which takes a text as input and outputs frequency tables for each letter in the text. I.e. for each letter in the alphabet, output the number of occurrences of this letter in the text. (See also P-2.22 in the textbook.)

You are going to describe (not implement) such a program. Think throught the following questions first:

How do you model the text (input)?
How do you model the frequency tables (output)?
How do you parse the text?

Describe your program in the form of an algorithm, with pseudo-code and precise definitions of input and output.

How do you know that the algorithm produces the correct answer?
How many operations does the algorithm require when the text is $n$ characters long and there are $k$ letters in the alphabet?

2.2.2 Exercises

Problem 2.3 (Change Making) Design and describe a program (algorithm) which takes two amounts (numbers) as input. One is the amount charged and the other the amount given. The program should return the number of each kind of bill and coin to give back as change for the difference between the amount given and the amount charged. The values assigned to the bills and coins avalaible can be based on the monetary system of any current or former government. Try to design your program so that it returns the fewest number of bills and coins as possible.

You can for instance use the denominations available for Norwegian kroner:

1, 5, 10, 20, 50, 100, 200, 500, 1000

Does your algorithm always produce the fewest number of bills and coints? Would it always give the fewest coins and bills if you change the denominations?

Consider the similar problem of selecting stamps to make up a given amount of postage. Suppose the stamps have values 1, 24, 33, and 44 pence. Test the algorithm to find the required stamps for 67p postage.

(Cf. Goodrich, Tamassia, & Goldwasser P-2.26)

Problem 2.4 (Selection Sort) Selection Sort is a sorting algorithm, sinilar to insertion sort. It can be described as follows:

Input: Array A[] of size n.
Output: The same array A[] in sorted order.

1           for i := 1 to n-1
2              for j := i+1 to n
3                 if A[i] > A[j]
4                    swap A[i] with A[j]

1.: Rewrite the pseudo-code in Java. Note in the indices in particular. Is the first element A[0] or A[1].
2.: How many swap operations must be made in the worst case? What about the best case?
3.: Can you prove that the algorithm is correct, i.e. produces a sorted array? Compare it to the proof for insertion sort.

2.3 Home study

2.3.1 Proof Techniques
2.3.2 Review Material

2.3.1 Proof Techniques

TODO video on Induction and loop invariants

2.3.2 Review Material

These videos where made for the old module in Discrete Mathematics 2013-2015, but they are also relevant here.

Sorting Algorithms

Proofs of Correctness