DP optimization

Written by Daniel Rutschmann. Translated by Johannes Kapfhammer.

Convex Hull Trick

The convex hull trick applies to any DP that looks like this:

DP[0]=0DP[i]=a[i]+max0k<i{m[k]x[i]+q[k]}for 1in\begin{aligned} \mathtt{DP}[0]&=0\\ \mathtt{DP}[i]&=a[i] + \max_{0 \le k < i}\{m[k]\cdot x[i]+q[k]\} \quad \text{for } 1 \leq i \leq n \end{aligned}
m[k]m[k] and q[k]q[k] are constants only depending on kk (not ii) and have to be known as soon as i=ki=k.
a[i]a[i] and x[i]x[i] are constants only depending on ii (not kk) and have to be known right before DP[i]\mathtt{DP}[i] is computed.

Example: Commando

Before we show how the trick works, we solve the task Commando.

The DP looks like this:

DP[i]=max0k<i{a(j=k+1ix[j])2+b(j=k+1ix[j])+c+DP[k]}\mathtt{DP}[i] = \max_{0 \le k < i} \left\{a\cdot\Big(\sum_{j=k+1}^i x[j]\Big)^2 + b\cdot\Big(\sum_{j=k+1}^i x[j]\Big) + c + \mathtt{DP}[k]\right\}

This looks quadratic which and not at all in the form above, but we can simplify it by computing the prefix sums s[i]=x[i]+s[i1]s[i]=x[i]+s[i-1] (with s[0]=0s[0]=0), multiplying out and collecting the terms:

DP[i]=max0k<i{a(s[i]s[k])2+b(s[i]s[k])+c+DP[k]}=max0k<i{as[i]2+as[k]22as[i]s[k]+bs[i]bs[k]+c+DP[k]}=as[i]2+bs[i]+ca[i]+max0k<i{2as[k]m[k]s[i]x[i]+as[k]2bs[k]+DP[k]q[k]}\begin{align*} \mathtt{DP}[i] &= \max_{0 \le k < i} \{a\cdot (s[i] - s[k])^2 + b\cdot(s[i] - s[k]) + c + \mathtt{DP}[k]\}\\ &= \max_{0 \leq k < i} \{a\cdot s[i]^2 + a\cdot s[k]^2 - 2\cdot a\cdot s[i]\cdot s[k] + b\cdot s[i] - b\cdot s[k] + c + \mathtt{DP}[k]\}\\ &= \underbrace{a\cdot s[i]^2 + b\cdot s[i] + c}_{a[i]} + \max_{0 \le k < i} \{\underbrace{-2\cdot a\cdot s[k]}_{m[k]}\cdot \underbrace{s[i]}_{x[i]} + \underbrace{a\cdot s[k]^2 - b\cdot s[k] + \mathtt{DP}[k]}_{q[k]}\}\\ \end{align*}

Convex Hull Data Structure

To speed up the DP equation

DP[i]=a[i]+max0k<i{m[k]x[i]+q[k]}\mathtt{DP}[i]=a[i] + \max_{0 \le k< i}\{m[k]\cdot x[i]+q[k]\}

we implement a hull data structure that supports the following queries:

  • insert(m,q)\mathtt{insert}(m, q): inserting a new line y=mx+qy=m\cdot x+q in O(logn)\mathcal O(\log n)
  • query(x)\mathtt{query}(x): computing max{mx+q}\max_\ell\{m_\ell\cdot x+q_\ell\} in O(logn)\mathcal O(\log n)

This allows us to solve the recurrence by repeatedly adding new lines and querying for the maximum:

hull h;
vector<long long> dp(n);
for (int i = 1; i <= n; ++i) {
  h.insert({m[i-1], q[i-1]});
  dp[i] = a[i] + h.query(x[i]);
}

This data structure works by dynamically storing the convex hull of linear functions and finding the value at a given xx by binary searching to find the line that is currently dominating.

Deque Implementation

Quite often, the DP equation of the convex hull trick

DP[i]=a[i]+max0k<i{m[k]x[i]+q[k]}\mathtt{DP}[i]=a[i] + \max_{0 \le k < i}\{m[k]\cdot x[i]+q[k]\}

additionally satisfies

  • m[j+1]>m[j]m[j+1]>m[j] (slopes are increasing) and
  • x[j+1]x[j]x[j+1]\ge x[j] (queries are increasing).

This is the case in our example problem above: The queries at x[i]=s[i]x[i]=s[i] are increasing because ss is the prefix sum of positive values. Because a<0a<0, the slope m[i]=2as[i]m[i]=-2\cdot a\cdot s[i] is also increasing.

The increasing slopes simplify the insertion: We know that the line must be added to the right side. So we can remove lines as long as they are hidden (similar to the convex hull idea), and then add the new line. An update is amortized O(1)\mathcal O(1).

The same trick can be applied for the query: We know that whereever the current maximal line is, we can never “go back” afterwards, so we can just remove the lines at the beginning. Such a query is also amortized O(1)\mathcal O(1).

Below is code that case using a std::queue.

struct line {
  long long m, q;
  long long eval(long long x) const { return m*x + q; }
};

// check if l2 is completely below max(l1, l3)
// requires that l1.m < l2.m < l2.m
bool bad(line const &l1, line const &l2, line const &l3) {
  // or long double if __int128 is not available
  return ((__int128)l2.q - l3.q) * (l2.m - l1.m) <=
         ((__int128)l1.q - l2.q) * (l3.m - l2.m);
}

struct hull {
  deque<line> slopes;

  // insert line to hull
  void insert(line l) {
    assert(slopes.empty() || l.m > slopes.back().m);
    while (slopes.size() > 1 &&
           bad(slopes.rbegin()[1], slopes.rbegin()[0], l))
      slopes.pop_back();
    slopes.push_back(l);
  }

  // maximum at x:  max_l l.m*x + l.q
  long long query(long long x) {
    assert(!slopes.empty());
    while (slopes.size() > 1 && slopes[0].eval(x) < slopes[1].eval(x))
      slopes.pop_front();
    return slopes.front().eval(x);
  }
};

To support arbitrary queries, the query function can perform a binary search with upper_bound, similar to the std::multiset implementation.

In case m[j+1]=m[j]m[j+1]=m[j] can happen, there needs to be a collinearity check at the beginning of insert.

Tasks:

Arbitrary Queries and Inserts

To support inserts in the middle and arbitrary queries, we need something more dynamic. We can avoid implementing our own balanced binary search tree by exploiting that multiset supports different comparison operators for insert and calls to upper_bound.

As we have to binary search over the lines, we additionally need to remember xleft, the left endpoint of each segment in the hull. As we need to change that during an insert, we make it mutable.

struct line {
  long long m, q; // y = m*x + q
  // x from which this line becomes relevant in the hull
  // mutable means we can change xleft on a const-reference to line
  mutable long double xleft = numeric_limits<long double>::quiet_NaN();
};
bool operator<(line const& a, line const& b) { // sort lines after m
  return make_pair(a.m, a.q) < make_pair(b.m, b.q);
}
bool operator<(long long x, line const& l) { // binary search for x
  return x < l.xleft;
}

// x coordinate of the intersection between l1 and l2
long double intersect(line const &l1, line const &l2) {
  return (l2.q - l1.q) / (long double)(l1.m - l2.m);
}

// check if l2 is completely below max(l1, l3)
// requires that l1.m < l2.m < l2.m
bool bad(line const &l1, line const &l2, line const &l3) {
  // or long double if __int128 is not available
  return (__int128)(l2.q - l3.q) * (l2.m - l1.m) <=
         (__int128)(l1.q - l2.q) * (l3.m - l2.m);
}

struct hull {
  multiset<line, less<>> slopes; // less<> to support upper_bound on long long's

  // insert line to hull
  void insert(line const &l) {
    // insert l and then fix the hull until it is convex again
    auto e = slopes.insert(l);

    // delete collinear lines
    if (e != slopes.begin() && prev(e)->m == e->m) {
      slopes.erase(prev(e));
    } else if (next(e) != slopes.end() && e->m == next(e)->m) {
      slopes.erase(e);
      return;
    }

    // delete l again if it is hidden by the lines to the left and the right
    if (e != slopes.begin() && next(e) != slopes.end() &&
        bad(*prev(e), *e, *next(e))) {
      slopes.erase(e);
      return;
    }

    // delete lines to the right of l and adjust their xleft
    if (next(e) != slopes.end()) {
      while (next(e, 2) != slopes.end() && bad(*e, *next(e), *next(e, 2)))
        slopes.erase(next(e));
      next(e)->xleft = intersect(*e, *next(e));
    }

    // delete lines to the left of l and adjust xleft of l
    if (e != slopes.begin()) {
      while (prev(e) != slopes.begin() && bad(*prev(e, 2), *prev(e), *e))
        slopes.erase(prev(e));
      e->xleft = intersect(*e, *prev(e));
    } else {
      e->xleft = -numeric_limits<long double>::infinity();
    }
  }

  // maximum at x:  max_l l.m*x + l.q
  long long query(long long x) {
    assert(!slopes.empty());
    line const& l = *prev(slopes.upper_bound(x)); // upper_bound can never return begin() because it is -inf
    return l.m * x + l.q;
  }
};

Tasks:

Generalized Deque Trick

The idea of storing lines in a deque can be generalized to any DP formula of the form

DP[0]=0DP[i]=a[i]+max0k<if[k][i]for 1in\begin{aligned} \mathtt{DP}[0] &= 0\\ \mathtt{DP}[i] &= a[i] + \max_{0 \leq k < i} f[k][i] \qquad \text{for } 1 \leq i \leq n \end{aligned}

where the cost function f[k][i]f[k][i] defined on 0k<in0 \leq k < i \leq n satisfies the convex quadrangle inequality

f[a][d]+f[b][c]f[a][c]+f[b][d]abcdf[a][d] + f[b][c] \leq f[a][c] + f[b][d] \qquad \forall a\leq b\leq c\leq d

The intuition behind this inequality is the following: If for the current i1i_{1} a larger value k2k_{2} is better than a smaller value k1k_{1}, then k2k_{2} will be a better option than k1k_{1} for all i2i1i_{2}\geq i_{1}, i.e. for all future values of ii. This is some sense generalizes the fact that two distinct lines intersect at at most one point, which was crucial for the convex hull trick to work.

In general, checking this inequality can be quite annoying. Fortunately, it often is intuitively clear that if a newer option (larger kk) is better at some ii, then it will also be better for larger ii. As a last resort option, one could always write a brute-force solution and check the inequality on some examples.

We are interested in the point where k2k_{2} becomes a better option than k1k_{1}, so let D(k1,k2)D(k_{1}, k_{2}) be the smallest ii for which k2k_{2} is better than k1k_{1}, i.e.

D(k1,k2)=min{k2<inf[k1][i]f[k2][i]}D(k_1, k_2) = \min \left\{k_2 < i \leq n \Big| f[k_1][i]\leq f[k_2][i]\right\}

DD can be computed in O(logn)\mathcal{O}(\log n) time with binary search. Sometimes, special properties of ff allow you to compute it in O(1)\mathcal{O}(1).

If there are three options k1<k2<k3k_{1}< k_{2}< k_{3} with D(k1,k2)D(k2,k3)D(k_{1}, k_{2}) \geq D(k_{2}, k_{3}), then by the time k2k_{2} is better than k1k_{1}, k3k_{3} will already be better than k2k_{2}. In this case, k2k_{2} is never optimal, so we way ignore it. (Similar to lines which are hidden below two other lines in the convex hull case.)

We maintain our candidates k1<k2<<kmk_{1}< k_{2}< \dots < k_m in a deque where we have D(k1,k2)<D(k2,k3)<<D(km1km)D(k_{1}, k_{2}) < D(k_{2}, k_{3}) < \dots < D(k_{m-1} k_m). k1k_{1} is the value of kk this is currently optimal and k2,,kmk_{2}, \dots, k_m are values that will be optimal for some larger ii.

  • When inserting ii as a new candidate for kk, if D(km1,km)D(km,i)D(k_{m-1}, k_m) \geq D(k_m, i), we pop kmk_m from the deque. Repeat this until D(km1,km)<D(km,i)D(k_{m-1}, k_m) < D(k_m, i) and then push ii to the back of the deque.
  • When computing DP[i]\mathtt{DP}[i], we pop k1k_{1} from the deque until k1k_{1} is better than k2k_{2} (in other words, until D(k1,k2)>iD(k_{1}, k_{2}) > i). We then know that k1k_{1} is the optimal value of kk.

This ensure that after any operation, we still have k1<k2<<kmk_{1}< k_{2}< \dots < k_m and D(k1,k2)<D(k2,k3)<<D(km1,km)D(k_{1}, k_{2}) < D(k_{2}, k_{3}) < \dots < D(k_{m-1}, k_m).

The total runtime is Θ(nlogn)\Theta(n \log n), as we need to evaluate DD Θ(n)\Theta(n) times.

Implementation

struct Deque_Optimization {
    const int n;
    deque<int> cands;
    function<int64_t(int, int)> > f;
    Deque_Optimization (int n_, function<int64_t(int, int)> f_) : n(n_), f(f_) {}

    // minimum i s.t. f(k_1, i) <= f(k_2, i)
    // returns n+1 if D(k_1, k_2) would be infinite.
    int D(int k_1, int k_2){
        // k_1 is better at l, k_2 is better at r
        int l = k_2, r = n+1;
        while(l+1 < r){
            const int m = l + (r-l)/2;
            if(f(k_1, i) <= f(k_2, i)){
                r = m;
            } else {
                l = m;
            }
        }
        return r;
    }
    void insert(int i){
        assert(cands.empty() || i > cands.back()); // candidates are added in increasing order.
        while(cands.size() > 1 && D(cands.rbegin()[1], cands.rbegin()[0]) >= D(cands.rbegin()[0], i)){
            cands.pop_back();
        }
        cands.push_back(i);
    }
    int64_t query(int i){
        assert(!cands.empty());
        while(cands.size() > 1 && f(cands[0], i) <= f(cands[1], i)){
            cands.pop_front();
        }
        return f(cands[0], i);
    }
};

Some calculations with the quadrangle inequality

In this part, we show how the quadrangle inequality implies the intuition that if a larger value of kk is better now, it will also be better in the future.

Recall that our DP has the form

DP[i]=a[i]+max0k<i(f[k][i]+q[k])\mathtt{DP}[i] = a[i] + \max_{0 \leq k < i} (f[k][i] + q[k])

where ff satisfies the quadrangle inequality

f[a][d]+f[b][c]f[a][c]+f[b][d]abcdf[a][d] + f[b][c] \leq f[a][c] + f[b][d] \qquad \forall a\leq b\leq c\leq d

To see this, rewrite the inequality as

f[b][c]f[a][c]f[b][d]f[a][d]f[b][c] - f[a][c] \leq f[b][d] - f[a][d]

and set a=k1a = k_{1}, b=k2b = k_{2}, c=i1c = i_{1}, d=i2d=i_{2} for some k1k2k_{1}\leq k_{2} and i1i2i_{1}\leq i_{2}. Then we get

f[k2][i1]f[k1][i1]f[k2][i2]f[k1][i2]f[k_2][i_1] - f[k_1][i_1] \leq f[k_2][i_2] - f[k_1][i_2]

If at i=i1i = i_{1}, k2k_{2} is a better option than k1k_{1}, then we have

f[k1][i1]f[k2][i1]f[k_1][i_1] \leq f[k_2][i_1]

i.e.

0f[k2][i1]f[k1][i1]0 \leq f[k_2][i_1] - f[k_1][i_1]

so the quadrangle inequality tells us that

0f[k2][i1]f[k1][i1]f[k2][i2]f[k1][i2]0 \leq f[k_2][i_1] - f[k_1][i_1] \leq f[k_2][i_2] - f[k_1][i_2]

hence

f[k1][i2]f[k2][i2]f[k_1][i_2] \leq f[k_2][i_2]

so k2k_{2} is a better option than k1k_{1} at i=i2i = i_{2}.