# Week 9. Text Processing

## Algorithms and Data Structures

### 10 Week 9. Text Processing

Reading 9 Goodrich & Tamassia: Chapter 12 (except 12.5 which we take next week).

##### 10.0.1 Compulsory Assignment

Problem 10.1 (based on textbook problem P-12.47) Design a search engine with page ranking. You can index the search terms using tries, storing a list of relevant pages for each search term, and a page rank for each page to indicate its relevance. The search result should list the most relevant pages first.

Your design must include

- 1.
- A discussion of possible rank heuristics. How do you measure the relevance of a page in response to a search?
- 2.
- A description of the data structure, using class and object diagrams or otherwise.
- 3.
- The algorithm to build the index.
- 4.
- The algorithm to search for a given word.

Problem 10.2 The basic definition of tries assumes that they are prefix free, i.e. no word may be the prefix of another. Explain how to modify the data structure to waive this restriction. Give the pseudo-code for a revised lookup (search) algorithm, and demonstrate that it will work even if a word may be a prefix of another.

*Hint.* It is mentioned in the textbook that this can be achieved with a special character to mark
the end of a word.

##### 10.0.2 Other Exercises

Problem 10.3 In the textbook R-12.1, R-12.2, R-12.4.

Problem 10.4 In the textbook R-12.7.

Problem 10.5 In the textbook R-12.12.

Problem 10.6 In the textbook C-12.24.