Alien Optimization

Written by Daniel Rutschmann. Translated by Daniel Rutschmann.

\newcommand{\kopt}{k_{\mathrm{opt}}} \newcommand{\ktask}{k_{\mathrm{task}}}

This article covers an advanced technique known as the “Alien trick”. The name originates from the task “Aliens” from IOI 2016, the task that made this technique popular. Other less popular names for this technique are “WQS/WPS binary search” or “Lagrange optimization”.

This article assumes that the reader is familiar with dynamic programming, greedy and scanline ideas. After all, you first need to come up with a correct solution before you can optimize it.

Suppose we are solving some minimization problem where we have some parameter $k$ that we need to keep in our state. Common examples of this are:

  • Splitting an array into exactly $k$ parts.
  • Picking exactly $k$ elements / edges / ranges.
  • Picking at most $k$ elements, but picking more elements is never worse.

In many cases, this parameter $k$ has to be kept as an additional dimension in our dp state, making our solution slower by a factor of $k$. For instance, our solution might run in $O(nk)$, but if we didn’t care about $k$, we could get a solution with a runtime of $O(n)$ instead. The Alien trick allows you to get rid of the parameter $k$ under some convexity assumptions.
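To make the pattern concrete, here is a sketch of such a baseline for the first example, with the made-up cost “square of the sum of a part” (illustrative code, not from the original article). The number of parts is an extra dp dimension, and with naive transitions this runs in $O(n^2 k)$; the point is the extra factor $k$ that the Alien trick will remove.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Illustrative baseline: split a[0..n-1] into exactly k contiguous parts,
// where a part costs the square of its sum; minimize the total cost.
// dp[j][i] = minimal cost of splitting the prefix of length i into j parts.
long long splitWithK(const vector<long long>& a, int k) {
    int n = a.size();
    vector<long long> pre(n + 1, 0);
    for (int i = 0; i < n; i++) pre[i + 1] = pre[i] + a[i];
    const long long INF = LLONG_MAX / 4;
    vector<vector<long long>> dp(k + 1, vector<long long>(n + 1, INF));
    dp[0][0] = 0;
    for (int j = 1; j <= k; j++)
        for (int i = j; i <= n; i++)
            for (int p = j - 1; p < i; p++) {
                if (dp[j - 1][p] >= INF) continue; // prefix p not splittable into j-1 parts
                long long s = pre[i] - pre[p];     // sum of the part a[p..i-1]
                dp[j][i] = min(dp[j][i], dp[j - 1][p] + s * s);
            }
    return dp[k][n];  // f(k); assumes 1 <= k <= n and no 64-bit overflow
}
```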

The Technique

For some minimization problem with some parameter $k \in \mathbb{Z}$, let $f(k)$ denote the answer. A good example to think of is $f(k) =$ “minimal cost of splitting an array into $k$ parts”. To solve the task, we want to compute the answer for some fixed $k = \ktask$, i.e. we want to compute $f(\ktask)$. Often, this is much more difficult than computing $\min_k f(k)$ (i.e. solving the problem without caring about the parameter $k$).
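For the splitting example this can be written out explicitly. With $c(l, r)$ denoting the cost of a part covering the elements at indices $l, \dots, r-1$ (notation introduced here for illustration), we have

$$f(k) = \min_{0 = i_0 < i_1 < \dots < i_k = n} \sum_{j=1}^{k} c(i_{j-1}, i_j)$$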

The alien trick applies to the case where $f$ is convex.

A function $f : \mathbb{Z} \to \mathbb{R}$ is convex if

$$f(k+1) - f(k) \geq f(k) - f(k-1) \quad \forall k \in \mathbb{Z}$$

Some examples of convex functions are $f(k) = k^{2}$ or $f(k) = 2^k$.

Instead of computing $f(\ktask)$, the alien trick fixes some $\lambda \in \mathbb{R}$ and computes

$$\min_{k} (f(k) - \lambda k)$$

i.e. it doesn’t require the solution to have the correct value of $k$, but it helps (if $\lambda > 0$) or penalizes (if $\lambda < 0$) solutions that have a large value of $k$. Let $\kopt(\lambda)$ denote the optimal value of $k$ for a certain $\lambda$, i.e.

$$\kopt(\lambda) = \arg\min_k (f(k) - \lambda k)$$

In our example, this would correspond to the number of blocks we would use if every block has a cost of $-\lambda$. The key observation is the following:

If $f$ is convex, then $\kopt$ is non-decreasing. In other words, higher values of $\lambda$ will lead to higher values of $\kopt$.

This property allows us to binary search for $\lambda$ to find a solution for which $\kopt = \ktask$, replacing the factor $k$ in the runtime with a log-factor from the binary search.
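For the splitting example, the relaxed problem can be solved without keeping $k$ in the state: every part simply incurs an extra cost of $-\lambda$. Below is an illustrative sketch (not from the original article, with naive $O(n^2)$ transitions) that also tracks how many parts the optimum uses, breaking ties towards more parts so that it reports the largest optimal $k$; this tie-breaking becomes important in the Degeneracy section below.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Illustrative sketch: minimize (total cost) - lambda * (number of parts)
// for the splitting example (a part costs the square of its sum).
// Returns {min_k (f(k) - lambda * k), largest k attaining the minimum}.
pair<long long, int> solveRelaxed(const vector<long long>& a, long long lambda) {
    int n = a.size();
    vector<long long> pre(n + 1, 0);
    for (int i = 0; i < n; i++) pre[i + 1] = pre[i] + a[i];
    vector<long long> dp(n + 1);
    vector<int> cnt(n + 1);  // number of parts used by the optimum for dp[i]
    dp[0] = 0, cnt[0] = 0;
    for (int i = 1; i <= n; i++) {
        dp[i] = LLONG_MAX / 4, cnt[i] = 0;
        for (int p = 0; p < i; p++) {
            long long s = pre[i] - pre[p];            // sum of the part a[p..i-1]
            long long cand = dp[p] + s * s - lambda;  // part cost, minus lambda
            // tie-break towards more parts, so cnt[n] is the largest optimal k
            if (cand < dp[i] || (cand == dp[i] && cnt[p] + 1 > cnt[i])) {
                dp[i] = cand;
                cnt[i] = cnt[p] + 1;
            }
        }
    }
    return {dp[n], cnt[n]};
}
```

The binary search over $\lambda$ itself is sketched in the Implementation section.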

Intuition behind the Technique

Let $f : \mathbb{Z} \to \mathbb{R}$ be a convex function. If we plot a convex function, the graph looks like the bottom half of a convex polygon:

[figure: alienopt2]

Computing $\min_k (f(k) - \lambda k)$ corresponds to finding a lower tangent with slope $\lambda$ to this graph / polygon.

[figure: alienopt2]

We can also see this mathematically. As $\kopt$ is the optimal $k$, we have

$$f(\kopt) - \lambda \kopt \leq f(\kopt+1) - \lambda (\kopt+1)$$

i.e.

$$\lambda \leq f(\kopt+1) - f(\kopt)$$

Similarly, we have

$$f(\kopt) - \lambda \kopt \leq f(\kopt-1) - \lambda (\kopt-1)$$

i.e.

$$f(\kopt) - f(\kopt-1) \leq \lambda$$

The two inequalities

$$\begin{array}{rl} f(\kopt) - f(\kopt-1) &\leq \lambda\\ f(\kopt+1) - f(\kopt) &\geq \lambda \end{array}$$

tell us that the “slope” to the left of $\kopt$ is at most $\lambda$ and the “slope” to the right of $\kopt$ is at least $\lambda$. Hence the line with slope $\lambda$ through $(\kopt, f(\kopt))$ is a tangent.

As $f$ is convex, the “slope” $f(k+1) - f(k)$ is non-decreasing in $k$, hence higher values of $\lambda$ will lead to higher values of $\kopt$.
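As a quick sanity check (an example added here for illustration): for $f(k) = k^2$, the slopes are $f(k) - f(k-1) = 2k - 1$, so $\min_k (k^2 - \lambda k)$ is attained at $\kopt(4) = 2$ for $\lambda = 4$, but at $\kopt(6) = 3$ for $\lambda = 6$: a higher $\lambda$ indeed gives a higher $\kopt$.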

Degeneracy

TODO

For (2), just interpolate linearly between the integer values of $f$. This will look something like this:

[figure: alienopt2]

Our interpolated function has corners, but we can still find the point (or segment in the case of the green line) where a line $g$ with slope $\lambda$ touches $f$. Once again, increasing the slope of the line will make us find points further to the right, so we can binary search for $\lambda$. The green line covers the one corner case we have to handle: If we’re looking for point $C$ or $D$ (i.e. $k = 3$ or $k = 4$), then we might only find $B$ and $E$ (i.e. we skip from $k = 2$ to $k = 5$). In the picture, we see that in this case we just linearly interpolate between $B$ and $E$.
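Concretely (spelling out the interpolation with a formula): if the tangent line $g$ with slope $\lambda$ touches the graph at $B = (k_1, f(k_1))$ and $E = (k_2, f(k_2))$ with $k_1 < \ktask < k_2$, then the whole segment between $B$ and $E$ lies on this line, so the interpolated answer is

$$f(\ktask) = \min_k \big( f(k) - \lambda k \big) + \lambda \, \ktask$$

since $\min_k (f(k) - \lambda k)$ is exactly the $y$-intercept of $g$.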

Implementation

To efficiently implement the Alien trick, we want to do everything with integers. Hence we will only use $\lambda \in \mathbb{Z}$. This is enough: if all costs are integers, then $f$ only takes integer values, so all slopes $f(k+1) - f(k)$ are integers and the binary search can stick to integer $\lambda$.
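Putting the pieces together for the splitting example (again an illustrative sketch, reusing the solveRelaxed sketch from above): we binary search for the smallest integer $\lambda$ whose largest optimal $k$ is at least $\ktask$; by convexity, $\ktask$ is then itself optimal for this $\lambda$, and the interpolation formula from the previous section recovers $f(\ktask)$ even in the degenerate case where the binary search never sees $\kopt = \ktask$ exactly.

```cpp
#include <bits/stdc++.h>
using namespace std;

// solveRelaxed is the sketch from "The Technique" section above.
pair<long long, int> solveRelaxed(const vector<long long>& a, long long lambda);

// Illustrative sketch: compute f(kTask) for the splitting example.
// Assumes 1 <= kTask <= n and nonnegative entries, so that every slope
// f(k+1) - f(k) lies in [-(sum a_i)^2, 0] and fits the search range.
long long alienTrick(const vector<long long>& a, int kTask) {
    long long total = accumulate(a.begin(), a.end(), 0LL);
    long long lo = -total * total, hi = 0;
    // find the smallest integer lambda whose largest optimal k is >= kTask
    while (lo < hi) {
        long long mid = lo + (hi - lo) / 2;
        if (solveRelaxed(a, mid).second >= kTask) hi = mid;
        else lo = mid + 1;
    }
    // for this lambda, kTask itself is an optimal k, so
    // f(kTask) = min_k (f(k) - lambda * k) + lambda * kTask
    return solveRelaxed(a, lo).first + lo * (long long)kTask;
}
```

With the naive $O(n^2)$ relaxed solver this runs in $O(n^2 \log \sum a_i)$; the factor $k$ from the baseline dp is gone, as promised.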