Written by Daniel Graf. Translated by Daniel Graf.
- Before we start
- What is an algorithm?
- Algorithm Design
- Cost model
- Where are we now?
- Example Maximum Subarray
In this introductory chapter, you will learn what we at SOI mean when we say algorithm and how we can determine if one algorithm is faster than another one.
This topic is always my favorite at the SOI workshops, so it is my pleasure to also present this as a written chapter here. I try to assume no prior knowledge at all and to keep it as easy to follow as I can. If I fail and you get lost along the way or there are just some questions popping up in your head, please let me know at firstname.lastname@example.org. These notes are still fairly new, so we appreciate your help to keep improving them.
This chapter is based on the content of the SOI workshops and of the first two lectures on “Datenstrukturen & Algorithmen” in spring 2016 given by Prof. Peter Widmayer at ETH Zürich. If you want to see more of them (in German), you can find my other notes from that course here. But now, let’s get started!
In the newspapers, algorithms usually are the evil machines that recommend us red pencils because we recently bought blue ones. But as computer scientists, we mean something else: systematic instructions to solve a problem.
An example that you all know: let us multiply two numbers like we learned it in primary school, digit by digit. If we want to compute $62$ times $37$, we start with $2 \cdot 7 = 14$. We write that right-aligned on a sheet of paper. We continue with $6 \cdot 7 = 42$ and write that, shifted one digit to the left, one row below. If we do this also for $2 \cdot 3 = 6$ (shifted by one digit) and $6 \cdot 3 = 18$ (shifted by two digits), we get a list of four numbers which we can add up to get the final result:
      62*37
    --------
          14
         42
          6
        18
    --------
        2294
The general approach of how we can multiply two numbers like this is an algorithm, so an instruction manual for how to get from the input (two numbers) to the desired output (their product) and what exactly we have to do step by step. It is not enough to just go through an example, as we want a general method that works for multiplying any two numbers.
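The primary school method can be sketched in a few lines of Python. This is only an illustration under simple assumptions (non-negative integers, digits taken from the decimal representation); the function name `schoolbook_multiply` is made up for this example.

```python
def schoolbook_multiply(x: int, y: int) -> int:
    """Multiply two non-negative integers digit by digit, as on paper."""
    # Digits from least significant to most significant.
    xs = [int(c) for c in reversed(str(x))]
    ys = [int(c) for c in reversed(str(y))]

    total = 0
    for j, dy in enumerate(ys):          # one digit of the second number
        for i, dx in enumerate(xs):      # one digit of the first number
            partial = dx * dy            # single-digit product, e.g. 2 * 7 = 14
            total += partial * 10 ** (i + j)  # shift left by i + j digits
    return total


print(schoolbook_multiply(62, 37))  # 2294
```

For two numbers with $k$ digits each, the two nested loops compute $k \cdot k$ single-digit products, which is why multiplying very long numbers is not a single step.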
At the olympiad in informatics, we often study the properties of such algorithms. Is this multiplication method really correct? Is it clever? Fast? Memory efficient? These questions are at the heart of computer science. The Association for Computing Machinery, ACM for short (the world’s biggest association of computer scientists), gives this definition:
- What is computer science?
“Computer science is the systematic study of algorithms and data structures, more precisely:
- their formal properties
- their linguistic and mechanical realisation
- their applications”
The formal properties include correctness and efficiency of an algorithm, which we start discussing here. The realisation is for us the programming part, which we cover in several other introductory chapters. The applications are finally what our tasks are all about and what we will cover in the more advanced topics.
As a first property of algorithms, we often just want to distinguish between solvable and unsolvable. An example:
We know the outline of our living room and we want to place power plugs in our room such that we can clean all of it with a vacuum cleaner that has a power cord of a given length. How many power plugs do we need for this? How do we place them optimally? These are typical questions that SOI tasks can cover.
Today there are also vacuum cleaners that do not need a power cord and drive through your flat like robots. Instead of power plugs these robots need charging stations that they have to reach before their battery is empty. Also this leads to interesting questions: How do we spread out these charging stations optimally? What is a good cleaning route for a robot? The search for an optimal cleaning tour is similar to the following well known problem:
A salesperson wants to travel through all the cities in a country as quickly as possible. So we search for the shortest tour that visits all cities. This is one of the most famous problems in theoretical computer science and is called the Traveling Salesman Problem, or TSP for short.
One way of solving this problem is just to try out all possible orders of cities. For each permutation of the cities (this is what such an order is called in mathematics), we compute the cost and remember the minimum. So we already found a first algorithm for TSP. But is it a good or a bad one?
For $n$ cities that are all connected with each other, we check $n \cdot (n-1) \cdot \ldots \cdot 2 \cdot 1$ orders this way. This number is called $n$-factorial and is written as “$n!$”.
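The "try all orders" idea can be sketched as follows. The sketch makes assumptions that the text leaves open: the tour is a round trip that returns to its start, and the costs are given as a matrix `dist` (a hypothetical name). Because a round trip can start anywhere, the code fixes city 0 as the start and only tries the $(n-1)!$ orders of the remaining cities.

```python
from itertools import permutations


def shortest_tour(dist):
    """Try every order of the cities; return (length, order) of the best round trip.

    dist[a][b] is the travel cost from city a to city b.
    """
    n = len(dist)
    best_length, best_order = float("inf"), None
    # Fixing city 0 as the start avoids checking rotations of the same tour.
    for rest in permutations(range(1, n)):
        order = (0,) + rest
        # Sum the cost of each leg, including the leg back to the start.
        length = sum(dist[order[k]][order[(k + 1) % n]] for k in range(n))
        if length < best_length:
            best_length, best_order = length, order
    return best_length, best_order


dist = [
    [0, 1, 4, 3],
    [1, 0, 2, 5],
    [4, 2, 0, 1],
    [3, 5, 1, 0],
]
print(shortest_tour(dist))  # (7, (0, 1, 2, 3))
```

Even with this small saving, the number of orders grows so quickly that the approach only works for a handful of cities.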
How large can our problem be so that this algorithm can still solve it? cities? For this we would need to try permutations. But that are more than many. Is this a lot? Imagine a piece of paper. Fold it times, then the resulting pile is as thick as layers of paper. This is just about as much as the size of the known universe! 
Interestingly no significantly better algorithms for TSP are known today. In 1970, one could solve the problem for 120 cities with a slightly better algorithm. With the same algorithm one can solve 140 cities today, just by using faster computers. So only waiting until the computers get faster will not bring us far.
Meanwhile, there are also better algorithms for certain types of graphs, so that we can also find tours for hundreds of thousands of cities. So an algorithmic breakthrough can really lead to an improved efficiency. How does one invent a faster algorithm?
The design and analysis of an algorithm are often intertwined. Sometimes one needs a clever insight, more often the design of algorithms is a very systematic, comprehensible thing. An example for this:
Imagine you are sitting in an SOI workshop and are waiting until Niklaus Wirth enters the room. Who is Niklaus Wirth? Niklaus Wirth is the only Swiss winner of the Turing Award (the Nobel Prize of computer science) and the inventor of the programming language Pascal.
Is Niklaus Wirth already in the room? Who even recognises him? You should all know him. So we want to find Niklaus Wirth or another star among us. What is a star? How do we find one?
- Definition: Star
- everyone knows him/her
- he/she knows nobody
As the basic operation, we consider a question to person A of the form “Do you know person B?”. The answer is either Yes or No. Other questions are not allowed. How do we ask such questions in a room of $n$ persons most effectively?
- Can it be that there is no star? Yes, for instance if everyone knows everyone.
- Can it be that there is exactly one star? Yes, for instance if Niklaus Wirth is here, because he probably does not know us.
- Can it be that there are several stars? No! See footnote [^ZweiStars] to see why not. 
We want to minimize the number of questions.
A naive approach: Asking everyone about everybody else takes $n \cdot (n-1)$ questions. We fill the complete table with the exception of the diagonal (everyone knows themselves):
How do we now find the star in this table? We are looking for a person whose column only contains Yes (everyone knows her/him) and whose row only contains No (she/he does not know anyone). So in the example above, Bettina is the star.
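Reading the star off a completely filled table can be sketched like this. The table below is a made-up example: only the name Bettina comes from the text, the other names and all entries are assumptions for illustration.

```python
def find_star_in_table(knows):
    """knows[a][b] is True if person a knows person b (the diagonal is ignored).

    Returns the index of the star, or None if there is no star.
    """
    n = len(knows)
    for s in range(n):
        others = [p for p in range(n) if p != s]
        everyone_knows_s = all(knows[p][s] for p in others)    # column of s: only Yes
        s_knows_nobody = not any(knows[s][p] for p in others)  # row of s: only No
        if everyone_knows_s and s_knows_nobody:
            return s
    return None


names = ["Anna", "Bettina", "Carlo"]  # hypothetical guests
knows = [
    [False, True, False],   # Anna knows Bettina
    [False, False, False],  # Bettina knows nobody
    [True, True, False],    # Carlo knows Anna and Bettina
]
star = find_star_in_table(knows)
print(names[star] if star is not None else "no star")  # Bettina
```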
So if we ask all possible questions, then we can decide if there is a star and, if so, find her/him. When designing algorithms, we always want to be as efficient as possible, so we want to minimize the number of questions that we ask. Is it possible to find a star, or say with certainty that there is none, without asking all possible questions? How many questions do we need at least? Let $F(n)$ be the number of questions that we ask for $n$ persons. So far, we have $F(n) = n \cdot (n-1)$, which grows quadratically with the number of people. We now show how we can reduce this to fewer than $3n$ questions.
We use induction for this. Induction means nothing else than starting with few people at first and then considering the others one by one. The easiest cases are $n = 1$ (a single person is always a star) and $n = 2$. With only two people in the room, we can always solve the task with two questions. We write $F(2) = 2$. For more than two people: we send someone out of the room to wait in the hallway, determine the potential star among the remaining $n-1$ people and then fetch back the person in the hallway. For this extra person, we have to check whether she/he is a star or confirm that the star from before is still a star. This takes up to $2 \cdot (n-1)$ questions. We can write this as:
$$F(n) = F(n-1) + 2 \cdot (n-1)$$
Unfortunately, we did not save anything and still end up asking almost $n^2$ questions. But we have applied a new concept: we use the same method on a subset of people over and over again. This is called recursion (from the Latin for “going back”).
Why do we not save any questions? As it could be that the person that we send to the hallway is the star, we need many questions when that person reenters. Can we guarantee that we do not send out the star? Here comes the trick: We ask A about B. If A knows B, then A is not a star. If A does not know B, then B is not a star. In both cases, we find one person that is not a star and can send her/him to the hallway. (Whether or not the remaining person is a star is not clear yet.) So after just one question, we can always send a non-star to the hallway. We can repeat this “send out the non-star” procedure until only one person remains. This last person we have to question more thoroughly, as she/he could be a star but does not have to be one. To this end, we fetch one person after the other from the hallway and ask her/him if she/he knows the star candidate and vice versa. For each person reentering, we can verify the potential star with just two questions. That’s cool, right?
To compute the precise number of questions for $n$ persons, we need a bit of calculation. If you have not seen this in your math class yet, no problem. You can just remember that $3n$ questions are enough: for each person other than the star candidate, we spend three questions, one when sending her/him out and two when bringing her/him back in. Thus for $n$ people, fewer than $3n$ questions suffice. If you are super interested, you can find the mathematical proof in the appendix.
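The "send out the non-star" procedure can be sketched as follows. The interface is an assumption made for this example: people are numbered $0$ to $n-1$ and `knows(a, b)` answers the question "Does person a know person b?". The sketch also counts the questions, so the bound of three questions per person can be checked.

```python
def find_star(n, knows):
    """Return (star or None, number of questions asked).

    knows(a, b) answers the question "Does person a know person b?".
    """
    questions = 0

    def ask(a, b):
        nonlocal questions
        questions += 1
        return knows(a, b)

    # Phase 1: every question sends one non-star to the hallway.
    candidate = 0
    for person in range(1, n):
        if ask(candidate, person):
            candidate = person  # the old candidate knows someone: not a star
        # otherwise the candidate does not know `person`: `person` is not a star

    # Phase 2: fetch everyone back and verify the candidate with two questions each.
    for person in range(n):
        if person == candidate:
            continue
        if ask(candidate, person) or not ask(person, candidate):
            return None, questions
    return candidate, questions


# Everyone knows person 2, and person 2 knows nobody.
print(find_star(4, lambda a, b: b == 2))  # (2, 9)
```

With $n = 4$ people, the sketch asks $3$ questions in the first phase and $6$ in the second, so $9 = 3 \cdot (n-1)$ in total. It does not reuse answers from the first phase, which is one place where a few more questions could be saved.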
We summarize: At first we needed $n \cdot (n-1)$ questions. But after we looked at it closely, we figured out a much faster method. This new method is so fast that it only asks a small fraction of all possible questions.
Challenge task: Can you do even better?
For problems that we want to solve with a computer, the elementary operation is often not “asking a question” but rather things like reading a number, comparing two numbers, assigning values etc. These are all simple operations. At our olympiad, we often do not want to go further into the details, down to the level of bits and bytes. How long an addition of two (small) numbers really takes depends on many things: the clock rate of your CPU, the usage of the cache, the choice of programming language etc. For simplicity, we assume that each operation takes a single identical step and we call this the unit cost model. So as a first idea, we want to ignore constant factors between the operations.
This is in contrast to our example at the beginning, where the multiplication of two numbers resulted in several operations. For numbers of arbitrary size (often called BigInts) it is no longer realistic to assume that they can be multiplied in a single step. There, it is reasonable to look at the individual digits and the operations with them. “Usually”, the numbers that we work with are of “reasonable” size, e.g. at most a few billion. On modern CPUs, such numbers really can be multiplied in a single step, so the unit cost assumption is not too far-fetched. If we don’t say anything else, we always use the unit cost model at SOI. So we assume that the effort required to perform arithmetical operations is independent of the size of the operands.
The second idea is that we are only interested in large inputs, where the algorithm runs for a long time. Because for small inputs, also a naive algorithm is sufficiently fast.
Whenever we compare two algorithms, we consider the one that runs faster on large inputs to be the better one. To make this mathematically precise, we make some simplifications: we want to ignore constant factors and offsets in the cost function. Two linear growth curves with different slopes we deem equally good. With that, we compensate for the constant factors that we already chose to hide with the unit cost model.
What does this mean concretely? For a program that performs, say, $3n^2 + 5n + 7$ steps for an input of size $n$, we look at this polynomial but only care about the summand with the highest power, so the $3n^2$, as everything else is really small in comparison for large $n$. In addition, we ignore constant factors, so we simplify to $n^2$. We can write this in shorthand using the $\mathcal{O}$-notation:
$$3n^2 + 5n + 7 \in \mathcal{O}(n^2)$$
and read this as
The program with $3n^2 + 5n + 7$ steps runs asymptotically at most as slow as a program with $n^2$ steps.
For polynomials, we only care about the highest power. Other frequent functions that you will encounter: Logarithms are faster (= result in faster programs = grow slower) than square roots, square roots are faster than polynomials and polynomials are faster than exponential functions.
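To get a feeling for these growth rates, one can simply print a few values. The following sketch is only an illustration; the chosen functions and input sizes are assumptions, not part of the original notes.

```python
import math


def growth_row(n):
    """Values of some common growth functions for input size n."""
    return {
        "log n": math.log2(n),
        "sqrt n": math.sqrt(n),
        "n": n,
        "n log n": n * math.log2(n),
        "n^2": n ** 2,
        "2^n": 2 ** n,
    }


for n in (16, 64, 256):
    row = growth_row(n)
    print(n, {name: f"{value:.3g}" for name, value in row.items()})
```

For small inputs the order can still differ (for $n = 8$, the logarithm $\log_2 8 = 3$ is larger than $\sqrt{8} \approx 2.83$), which is why we only compare the functions for large $n$.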
This “Oh-Notation” can also be made mathematically formal. If you don’t care about this, just jump over this section.
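For this we define the set of functions that do not grow faster than a growth function $g$. We write this with a big, calligraphic ‘O’, as
$$\mathcal{O}(g) = \{\, f : \mathbb{N} \to \mathbb{R}^+ \mid \exists\, c_1 > 0,\ c_2 > 0 \text{ such that } f(n) \le c_1 \cdot g(n) + c_2 \text{ for all } n \in \mathbb{N} \,\}$$
All functions in $\mathcal{O}(g)$ do not grow asymptotically faster than $g$.
We can use this to simplify terms like $3n^2 + 5n + 7 \in \mathcal{O}(n^2)$ by setting e.g. $c_1 = 8$ and $c_2 = 7$. Usually, we are happy to just say that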
The search for a star takes $\mathcal{O}(n)$ questions.
We find the star in linear time.
The precise number of questions (slightly fewer than $3n$, as we showed above) often does not even interest us.
The two constants in the definition allow us to scale and shift one function against the other one.
As one could not easily typeset the symbol $\in$ in the beginning, one often wrote $f = \mathcal{O}(g)$, but the correct notation is the one with $\in$, as $f$ is one of many functions that are contained in the set $\mathcal{O}(g)$.
Now we can do the same for “growing at least as fast” with the class $\Omega(g)$. Or for “strictly less fast” with small-o, so $o(g)$.
If the function $f$ is both in $\mathcal{O}(g)$ and $\Omega(g)$, we say that $f$ and $g$ grow asymptotically equally fast and write $f \in \Theta(g)$. For example: $3n^2 + 5n + 7 \in \Theta(n^2)$.
One has to get used to this asymptotic notation for a bit. In algorithm design it is everywhere, so we will see it in every SOI topic again. But don’t get frightened by it. Most things that you will see, are those:
- $\mathcal{O}(n)$: linear running time (e.g. for the fast star finding algorithm)
- $\mathcal{O}(n^2)$: quadratic running time (e.g. for the star finding algorithm that asks all possible questions)
- $\mathcal{O}(\log n)$: logarithmic running time (e.g. for binary search)
- $\mathcal{O}(n \log n)$: slightly super-linear running time (e.g. for sorting algorithms)
We have seen how we want to design and analyse algorithms. We have a model of analysis that is quite rough: we consider an algorithm that runs in $n$ steps as fast as one that takes any constant multiple of $n$ steps. This is very imprecise, but the general success of algorithm design relies on the fact that this analysis usually works very well. And if you choose the input size large enough (and today’s inputs are often huge), then a linear-time algorithm is always faster than a quadratic one, no matter how large the hidden constant is.
Next, we want to again design some “good” algorithms for yet another problem. How do we measure the quality of algorithms?
- Correctness: Does the program do what it should? We often argue this rather informally.
- Efficiency: time and space: How long does the algorithm run for and how much memory does it consume?
- Quality of the solution: how close do we get to the optimum? This is something that we did not consider so far. So here is another problem:
Let us consider these numbers
5 7 12 -48 9 36 -17 22 11 -49 49 -49 111 -117
From these numbers, we want to extract a segment, an interval, such that the sum of the numbers therein is maximized. For small inputs such as the one above, we can easily try out all possible segments, but for large inputs this can take a long time.
We can see this as stock prices. The numbers are the daily changes and we ask in hindsight: What would have been the best day to buy and later sell a share so that we can maximize our profit?
Here is a pseudocode implementation of an algorithm that simply tries out everything. Let d[k] denote the $k$-th number in the input, so the stock price change on day $k$.
    for i = 1 to n do
        for j = i to n do
            S := 0
            for k = i to j do
                S := S + d[k]
            store the maximum S seen so far
We look at every possible interval $[i, j]$ and compute the corresponding sum $S$. How long does this algorithm run?
Upper bound on the running time: Let us count the number of additions generously: each loop is executed at most $n$ times. As the loops are nested, we take the product, so we perform at most $n^3$ additions, which is in $\mathcal{O}(n^3)$.
But is this algorithm really that slow or did we just bound way too roughly? Let us also bound the running time conservatively, so counting rather too few additions. For this, let us assume that the start $i$ of the interval is only in the first third of the array and that the end $j$ of the interval is only in the last third of the array. For each such pair $(i, j)$, it holds that $k$ has to go over at least $n/3$ values. In total, even under these assumptions, we still perform at least $(n/3)^3 = n^3/27$ steps, which is in $\Omega(n^3)$.
As we got the same cubic upper and lower bound, we can thus say that the running time of our algorithm is in $\Theta(n^3)$. This is not especially quick and only suffices to work through a couple hundred days of the stock exchange.
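A direct Python version of this "try out everything" algorithm might look as follows. It is a sketch with 0-based indices, and it assumes (as the later algorithms in this chapter do) that the empty interval with sum 0 is allowed.

```python
def max_subarray_cubic(d):
    """Try every interval [i, j] and sum it up from scratch: cubic running time."""
    n = len(d)
    best = 0  # the empty interval has sum 0
    for i in range(n):
        for j in range(i, n):
            s = 0
            for k in range(i, j + 1):
                s += d[k]
            best = max(best, s)
    return best


d = [5, 7, 12, -48, 9, 36, -17, 22, 11, -49, 49, -49, 111, -117]
print(max_subarray_cubic(d))  # 123
```

For the example numbers, the best interval runs from the 9 to the 111 and has sum 123.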
So we got a program that is correct (in the sense that it computes the optimal solution), but we would like to get a faster one. We have to ask ourselves the question: What do we do that is too much? Do we compute something that we would not need to or do we compute the same thing over and over again? If we watch our algorithm “at work” we can see that it repeats itself. We sum up the numbers from $i$ to $j$ and in the next round from $i$ to $j+1$. So in doing that we sum up almost the same set of numbers again, just with one more number at the end. Couldn’t we compute the sum of the numbers from $i$ to $j$ faster somehow?
It suffices to compute all the prefix sums. A prefix is an initial part of a sequence. The prefix of length $i$ of our input is simply the first $i$ numbers.
For each position $i$, we compute the sum $S_i$ as the sum of the numbers up to position $i$. All these values $S_i$ we can compute in linear time, so in $\mathcal{O}(n)$, with increasing $i$: To compute $S_i$, we just have to add $d_i$ to $S_{i-1}$ (where $S_0 = 0$). After that, the sum of the numbers from $i$ to $j$ is simply $S_j - S_{i-1}$.
We can now get rid of the innermost loop:
    // Compute the prefix sums in O(n)
    S[0] := 0
    for i := 1 to n do
        S[i] := S[i-1] + d[i]
    // Take the sum of each interval in O(n^2)
    for i := 1 to n do
        for j := i to n do
            sum := S[j] - S[i-1]
            store the maximum sum seen so far
How much faster is this? The precomputation of the prefix sums takes $\mathcal{O}(n)$ steps. After that, we can compute the sum of each of the $\mathcal{O}(n^2)$ intervals in constant time. Hence we get $\mathcal{O}(n^2)$ steps overall.
Are we happy now? $\mathcal{O}(n^2)$ can still be too slow for many real inputs. But contrary to before, this solution does not have an obvious speed bottleneck that we could attack.
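The prefix sum idea translates to Python as in the following sketch. The names are chosen for this example; `prefix[i]` holds the sum of the first `i` numbers, so `prefix[0]` is 0 and the list `d` itself stays 0-based.

```python
def max_subarray_quadratic(d):
    """Use prefix sums so that every interval sum costs only one subtraction."""
    n = len(d)
    # prefix[i] = d[0] + ... + d[i-1], computed in linear time.
    prefix = [0] * (n + 1)
    for i in range(1, n + 1):
        prefix[i] = prefix[i - 1] + d[i - 1]

    best = 0  # the empty interval has sum 0
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            # Sum of the i-th to the j-th number (counting from 1).
            best = max(best, prefix[j] - prefix[i - 1])
    return best


d = [5, 7, 12, -48, 9, 36, -17, 22, 11, -49, 49, -49, 111, -117]
print(max_subarray_quadratic(d))  # 123
```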
To get faster, we want to look at another trick of algorithm design: divide and conquer. It is a powerful technique that the ancient Romans already knew (Latin: divide et impera), although not in an algorithmic context but rather related to strong foreign policy.
As with the star, we try an inductive approach. But here we will not just take out one element (like with the star). Instead, we will split in the middle:
We split our sequence into two parts. Where can the solution, i.e. the interval with the biggest sum, be? We distinguish three cases: Either completely in the left half, completely in the right half or it contains part of the left and the right half. Exactly these three cases we want to consider and to compute the maximum interval for each case. Then we just have to pick the best of those three intervals.
Such a strategy, that splits the problem into smaller portions, solves each of them separately and then recombines them, is called “divide and conquer”. Did this split help us?
The first two cases are easy to cover: We compute the solution recursively (with the same approach) on both halves separately. But how can we compute the solution for the third case?
We search in the left half for the best piece that ends at its right border, and we combine it with the best piece from the right half that starts at its left border. Hence we have to consider the prefix sums of the right half and the suffix sums of the left half. (A suffix is just a piece from the tail, so a prefix “from the other side”.) We already saw that we can compute them in linear time. We add the best prefix from the right to the best suffix from the left. So we can cover the third case in linear time.
- Divide-and-Conquer algorithm:
  - If there is at most one number:
    - If there is exactly one positive number: take this number
    - otherwise: 0
  - If there are several numbers:
    - divide in the middle
    - determine the optimum solution separately in both parts
    - determine the best border pieces on the left and right
    - take the best of the three potential solutions
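The steps of the divide-and-conquer algorithm can be sketched in Python like this. The half-open index range `d[lo:hi]` and the parameter names are choices made for this example.

```python
def max_subarray_dc(d, lo=0, hi=None):
    """Maximum interval sum within d[lo:hi]; the empty interval counts as 0."""
    if hi is None:
        hi = len(d)
    # At most one number: take it if it is positive, otherwise 0.
    if hi - lo == 0:
        return 0
    if hi - lo == 1:
        return max(d[lo], 0)

    # Several numbers: divide in the middle and solve both parts recursively.
    mid = (lo + hi) // 2
    best_left = max_subarray_dc(d, lo, mid)
    best_right = max_subarray_dc(d, mid, hi)

    # Best piece of the left half that ends at the middle (a suffix).
    best_suffix = s = 0
    for k in range(mid - 1, lo - 1, -1):
        s += d[k]
        best_suffix = max(best_suffix, s)

    # Best piece of the right half that starts at the middle (a prefix).
    best_prefix = s = 0
    for k in range(mid, hi):
        s += d[k]
        best_prefix = max(best_prefix, s)

    # Take the best of the three potential solutions.
    return max(best_left, best_right, best_suffix + best_prefix)


d = [5, 7, 12, -48, 9, 36, -17, 22, 11, -49, 49, -49, 111, -117]
print(max_subarray_dc(d))  # 123
```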
Analysis: Now we want to analyze the running time of our algorithm to see if we improved anything at all. This is a bit of a lengthy calculation; if you want, feel free to skip it and jump directly to the linear solution.
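We write $T(n)$ for the time with $n$ elements. For the case where we compute (potentially multiple) partial solutions, we can write the running time as
$$T(n) = 2 \cdot T(n/2) + a \cdot n$$
Let $a$ be a constant that bounds the effort per element to determine the solution across the middle, and for the base case with $n = 1$ let $T(1) = b$ for some constant $b$. To keep the calculation simple, we assume that $n$ is a power of two.
How does one solve such a recursion? We do what we did before. We insert a couple of times and see if we get a feeling for the solution.
$$T(n) = 2 \cdot T(n/2) + a n = 4 \cdot T(n/4) + 2 a n = 8 \cdot T(n/8) + 3 a n = \dots = 2^i \cdot T(n/2^i) + i \cdot a n$$
This stops if $n/2^i = 1$ or equivalently if $i = \log_2 n$, because then $T(n/2^i) = T(1) = b$ and the recursion stops. So we suspect something like $T(n) = b \cdot n + a \cdot n \log_2 n$, which would mean $T(n) \in \mathcal{O}(n \log n)$.
But is this really true? To formally prove it, we do not want to use $\mathcal{O}$-notation because hiding constants in a recurrence can be dangerous.
Proof by induction: To prove: There are constants $c$ and $d$ such that
$$T(n) \le c \cdot n \log_2 n + d \cdot n$$
Induction base: To get $T(1) = b \le c \cdot 1 \cdot \log_2 1 + d \cdot 1 = d$, pick $d = b$.
Induction hypothesis (IH): For $n/2$ we already know that $T(n/2) \le c \cdot \frac{n}{2} \log_2 \frac{n}{2} + d \cdot \frac{n}{2}$.
Induction step:
$$T(n) = 2 \cdot T(n/2) + a n \le c \cdot n (\log_2 n - 1) + d n + a n = c \cdot n \log_2 n + d n + (a - c) \cdot n \le c \cdot n \log_2 n + d \cdot n$$
Can we make the last inequality hold? We have some room left to play, as we did not fix $c$ yet. Which condition does $c$ need to fulfill? We need $(a - c) \cdot n \le 0$, so $c \ge a$.
Pick $c = a$ for example. What this shows is that our Divide-and-Conquer algorithm runs in time $\mathcal{O}(n \log n)$.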
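If you prefer to check such a calculation numerically, here is a small sketch. It assumes the typical recurrence for this kind of algorithm, two subproblems of half the size plus linear work for combining, and it sets both constants to 1 for simplicity: $T(1) = 1$ and $T(n) = 2 \cdot T(n/2) + n$. For powers of two, this gives exactly $n \log_2 n + n$.

```python
def T(n):
    """T(1) = 1 and T(n) = 2 * T(n / 2) + n, for n a power of two."""
    if n == 1:
        return 1
    return 2 * T(n // 2) + n


for exponent in range(11):
    n = 2 ** exponent
    closed_form = n * exponent + n  # n * log2(n) + n
    print(n, T(n), closed_form)
```

The two printed columns agree for every $n$, which matches the guess obtained by inserting the recurrence into itself.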
Can we do even better? Let us try from the beginning and try again with an induction from left to right.
As a condition that should always stay true, we want to assume that after step $i$, we know the optimum solution for the first $i$ elements. If we now add element $i+1$, we want to quickly figure out if this new element belongs to the best interval or not. For this, we do not only compute the maximum up to $i$ but also how much we can get if we end at the right border of the current prefix, so at element $i$. Using this “border maximum”, we can now quickly compare whether we can get a new maximum with the new element or not.
For this, once we have reached position $i$, we need to have computed these two things:
- the maximum in the range $[1, i]$,
- the maximum that ends at position $i$.
    bordermax := 0
    max := 0
    for i = 1 to n do
        bordermax := bordermax + d[i]
        if bordermax < 0 then bordermax := 0
        if max < bordermax then max := bordermax
This approach is super simple to implement and even has just linear running time! So $\mathcal{O}(n)$, perfect!
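In Python, the left-to-right scan is only a few lines. The sketch follows the pseudocode closely, except that the variable `max` is called `best` here to avoid shadowing Python's built-in `max` function.

```python
def max_subarray_linear(d):
    """Scan from left to right, keeping the best sum that ends at the current element."""
    bordermax = 0  # best sum of an interval ending at the current position (or empty)
    best = 0       # best sum of any interval seen so far
    for x in d:
        bordermax = max(bordermax + x, 0)  # a negative border sum is dropped
        best = max(best, bordermax)
    return best


d = [5, 7, 12, -48, 9, 36, -17, 22, 11, -49, 49, -49, 111, -117]
print(max_subarray_linear(d))  # 123
```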
Now we get ambitious and want to do even better. Can we get below linear time, say $\mathcal{O}(\sqrt{n})$ or even $\mathcal{O}(\log n)$? Or is no further improvement possible?
The following consideration shows that we can not be better than linear: Assume we have an algorithm that does not look at all the elements. Then this algorithm can not be correct:
- If the algorithm computes a range that contains the not-looked-at element, then let that element be a hugely negative number. Then we would be better off not taking anything.
- If the solution does not contain the not-looked-at element, then let that element be a hugely positive number, so that it really should be in the solution.
Hence every correct algorithm has to look at every element at least once. The running time is thus in $\Omega(n)$. This is not a property of a specific algorithm, but of all algorithms that solve the maximum subarray problem. We call this the runtime complexity of the problem.
However, the size of the input is not always a lower bound for the runtime. Remember the star problem. The input was quadratic (all pairs), but we only had to ask linearly many questions.
Here are some extra questions (with their answers):
- How do we find the interval (and not just its sum)? Answer: During the algorithm, we keep track of the bounds for max and bordermax and update them upon each change.
- What is the memory usage? Answer: We distinguish two types of memory: the memory used by the program overall (including the input) and the memory we use additionally during the computation. The first and the last algorithm only require constant additional memory for the values S, bordermax and max. But as soon as we compute all the prefix sums, as in the second algorithm, we need linear, so $\mathcal{O}(n)$, additional memory. Bonus question: How much additional memory does the divide-and-conquer algorithm need?
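Keeping track of the interval itself, as described in the first answer, could be sketched like this. The return format (a half-open range `d[start:end]`) and all variable names are choices made for this example.

```python
def max_subarray_with_bounds(d):
    """Return (best_sum, start, end) such that d[start:end] is a best interval."""
    bordermax, border_start = 0, 0          # best sum ending here, and where it starts
    best, best_start, best_end = 0, 0, 0    # best interval so far (initially empty)
    for i, x in enumerate(d):
        bordermax += x
        if bordermax < 0:
            # Dropping everything so far: the next border interval starts after i.
            bordermax, border_start = 0, i + 1
        if bordermax > best:
            best, best_start, best_end = bordermax, border_start, i + 1
    return best, best_start, best_end


d = [5, 7, 12, -48, 9, 36, -17, 22, 11, -49, 49, -49, 111, -117]
best, start, end = max_subarray_with_bounds(d)
print(best, d[start:end])
```

Besides the input, the sketch only stores a handful of numbers, so the additional memory stays constant.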
So this is it. You reached the end of our crash course in algorithm design and analysis. If you have any questions, please let me know (email@example.com). Otherwise have fun with the other sections of our tutorials.
Here we give a few proofs that are not really important for understanding the rest of this chapter.
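We can write the function $F(n)$ as follows:
$$F(n) = \begin{cases} 2 & \text{if } n = 2 \\ 1 + F(n-1) + 2 & \text{if } n > 2 \end{cases}$$
The two cases distinguish whether we still send someone out or not. Now we telescope, meaning that we plug in the formula a couple of times and see what we get:
$$F(n) = 3 + F(n-1) = 3 + 3 + F(n-2) = \dots = 3 \cdot (n-2) + F(2) = 3n - 4$$
To formally prove that $F(n) = 3n - 4$, we prove it by induction:
- Hypothesis: $F(n) = 3n - 4$
- Base of the induction: $F(2) = 2 = 3 \cdot 2 - 4$ ✓
- Step of the induction: $F(n) = 3 + F(n-1) = 3 + 3 \cdot (n-1) - 4 = 3n - 4$ ✓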
|||Or politically correct: Traveling Salesperson Problem|
|||The size of the known universe is about kilometers (Source: Wolfram Alpha). The world record for folding paper is at times. In 2002, the high school student Britney Gallivan has folded a piece of paper twelve times. Even for this she needed a roll of toilet paper of length more than one kilometer [Details]. Since then several attempts were made to break this record. 2012 a group of students at MIT managed to do folds [Artikel] and also a team of BBC has tried [Video].|
|||More about him on Wikipedia.|
|||If there were two stars, one would need to know the other, so that the other is known by everyone and can be a star. But then the first star knows somebody and hence, by definition, is not a star. Confused?|
|||Spoiler: We will get $\mathcal{O}(n \log n)$.|
|||Implemented cleverly, each recursive call can be done with constant ($\mathcal{O}(1)$) additional memory and the depth of the recursion is limited to $\mathcal{O}(\log n)$, hence the additional memory needed is $\mathcal{O}(\log n)$.|