# DP optimization

Written by Daniel Rutschmann. Translated by Johannes Kapfhammer.

## Convex Hull Trick

The convex hull trick applies to any DP that looks like this:

\begin{aligned} \mathtt{DP}[0]&=0\\ \mathtt{DP}[i]&=a[i] + \max_{0 \le k < i}\{m[k]\cdot x[i]+q[k]\} \quad \text{for } 1 \leq i \leq n \end{aligned}
$m[k]$ and $q[k]$ are constants only depending on $k$ (not $i$) and have to be known as soon as $i=k$.
$a[i]$ and $x[i]$ are constants only depending on $i$ (not $k$) and have to be known right before $\mathtt{DP}[i]$ is computed.

### Example: Commando

Before we show how the trick works, we solve the task Commando.

The DP looks like this:

$\mathtt{DP}[i] = \max_{0 \le k < i} \left\{a\cdot\Big(\sum_{j=k+1}^i x[j]\Big)^2 + b\cdot\Big(\sum_{j=k+1}^i x[j]\Big) + c + \mathtt{DP}[k]\right\}$

This looks quadratic and not at all like the form above, but we can simplify it by computing the prefix sums $s[i]=x[i]+s[i-1]$ (with $s[0]=0$), multiplying out and collecting the terms:

\begin{align*} \mathtt{DP}[i] &= \max_{0 \le k < i} \{a\cdot (s[i] - s[k])^2 + b\cdot(s[i] - s[k]) + c + \mathtt{DP}[k]\}\\ &= \max_{0 \leq k < i} \{a\cdot s[i]^2 + a\cdot s[k]^2 - 2\cdot a\cdot s[i]\cdot s[k] + b\cdot s[i] - b\cdot s[k] + c + \mathtt{DP}[k]\}\\ &= \underbrace{a\cdot s[i]^2 + b\cdot s[i] + c}_{a[i]} + \max_{0 \le k < i} \{\underbrace{-2\cdot a\cdot s[k]}_{m[k]}\cdot \underbrace{s[i]}_{x[i]} + \underbrace{a\cdot s[k]^2 - b\cdot s[k] + \mathtt{DP}[k]}_{q[k]}\}\\ \end{align*}

### Convex Hull Data Structure

To speed up the DP equation

$\mathtt{DP}[i]=a[i] + \max_{0 \le k< i}\{m[k]\cdot x[i]+q[k]\}$

we implement a hull data structure that supports the following queries:

• $\mathtt{insert}(m, q)$: inserting a new line $y=m\cdot x+q$ in $\mathcal O(\log n)$
• $\mathtt{query}(x)$: computing $\max_\ell\{m_\ell\cdot x+q_\ell\}$ in $\mathcal O(\log n)$

This allows us to solve the recurrence by repeatedly adding new lines and querying for the maximum:

```cpp
hull h;
vector<long long> dp(n + 1);
for (int i = 1; i <= n; ++i) {
  h.insert({m[i-1], q[i-1]});
  dp[i] = a[i] + h.query(x[i]);
}
```


This data structure works by dynamically storing the convex hull of the linear functions and answering a query at a given $x$ by binary searching for the line that dominates at that $x$.

### Deque Implementation

Quite often, the DP equation of the convex hull trick

$\mathtt{DP}[i]=a[i] + \max_{0 \le k < i}\{m[k]\cdot x[i]+q[k]\}$

additionally satisfies two monotonicity properties:

• $m[j+1]>m[j]$ (slopes are increasing) and
• $x[j+1]\ge x[j]$ (queries are increasing).

This is the case in our example problem above: The queries at $x[i]=s[i]$ are increasing because $s$ is the prefix sum of positive values. Because $a<0$, the slope $m[i]=-2\cdot a\cdot s[i]$ is also increasing.

The increasing slopes simplify the insertion: We know that the line must be added to the right side. So we can remove lines as long as they are hidden (similar to the convex hull idea), and then add the new line. An update is amortized $\mathcal O(1)$.

The same trick can be applied for the query: We know that wherever the current maximal line is, we can never “go back” afterwards, so we can just remove the lines at the beginning. Such a query is also amortized $\mathcal O(1)$.

Below is code for that case using a std::deque.

```cpp
struct line {
  long long m, q;
  long long eval(long long x) const { return m*x + q; }
};

// check if l2 is completely below max(l1, l3)
// requires that l1.m < l2.m < l3.m
bool bad(line const &l1, line const &l2, line const &l3) {
  // or long double if __int128 is not available
  return ((__int128)l2.q - l3.q) * (l2.m - l1.m) <=
         ((__int128)l1.q - l2.q) * (l3.m - l2.m);
}

struct hull {
  deque<line> slopes;

  // insert line to hull
  void insert(line l) {
    assert(slopes.empty() || l.m > slopes.back().m);
    while (slopes.size() > 1 && bad(slopes[slopes.size()-2], slopes.back(), l))
      slopes.pop_back();
    slopes.push_back(l);
  }

  // maximum at x:  max_l l.m*x + l.q
  long long query(long long x) {
    assert(!slopes.empty());
    while (slopes.size() > 1 && slopes[0].eval(x) < slopes[1].eval(x))
      slopes.pop_front();
    return slopes.front().eval(x);
  }
};
```


To support arbitrary queries, the query function can perform a binary search with upper_bound, similar to the std::multiset implementation.

In case $m[j+1]=m[j]$ can happen, there needs to be a collinearity check at the beginning of insert.

### Arbitrary Queries and Inserts

To support inserts in the middle and arbitrary queries, we need something more dynamic. We can avoid implementing our own balanced binary search tree by exploiting that multiset supports different comparison operators for insert and calls to upper_bound.

As we have to binary search over the lines, we additionally need to remember `xleft`, the left endpoint of each segment in the hull. As we need to change it during an insert, we make it `mutable`.

```cpp
struct line {
  long long m, q; // y = m*x + q
  // x from which this line becomes relevant in the hull
  // mutable means we can change xleft on a const-reference to line
  mutable long double xleft = numeric_limits<long double>::quiet_NaN();
};
bool operator<(line const& a, line const& b) { // sort lines by m
  return make_pair(a.m, a.q) < make_pair(b.m, b.q);
}
bool operator<(long long x, line const& l) { // binary search for x
  return x < l.xleft;
}

// x coordinate of the intersection between l1 and l2
long double intersect(line const &l1, line const &l2) {
  return (l2.q - l1.q) / (long double)(l1.m - l2.m);
}

// check if l2 is completely below max(l1, l3)
// requires that l1.m < l2.m < l3.m
bool bad(line const &l1, line const &l2, line const &l3) {
  // or long double if __int128 is not available
  return (__int128)(l2.q - l3.q) * (l2.m - l1.m) <=
         (__int128)(l1.q - l2.q) * (l3.m - l2.m);
}

struct hull {
  multiset<line, less<>> slopes; // less<> to support upper_bound on long long's

  // insert line to hull
  void insert(line const &l) {
    // insert l and then fix the hull until it is convex again
    auto e = slopes.insert(l);

    // delete collinear lines
    if (e != slopes.begin() && prev(e)->m == e->m) {
      slopes.erase(prev(e));
    } else if (next(e) != slopes.end() && e->m == next(e)->m) {
      slopes.erase(e);
      return;
    }

    // delete l again if it is hidden by the lines to the left and the right
    if (e != slopes.begin() && next(e) != slopes.end() &&
        bad(*prev(e), *e, *next(e))) {
      slopes.erase(e);
      return;
    }

    // delete lines to the right of l and adjust their xleft
    if (next(e) != slopes.end()) {
      while (next(e, 2) != slopes.end() && bad(*e, *next(e), *next(e, 2)))
        slopes.erase(next(e));
      next(e)->xleft = intersect(*e, *next(e));
    }

    // delete lines to the left of l and adjust xleft of l
    if (e != slopes.begin()) {
      while (prev(e) != slopes.begin() && bad(*prev(e, 2), *prev(e), *e))
        slopes.erase(prev(e));
      e->xleft = intersect(*e, *prev(e));
    } else {
      e->xleft = -numeric_limits<long double>::infinity();
    }
  }

  // maximum at x:  max_l l.m*x + l.q
  long long query(long long x) {
    assert(!slopes.empty());
    // upper_bound can never return begin() because begin()'s xleft is -inf
    line const& l = *prev(slopes.upper_bound(x));
    return l.m * x + l.q;
  }
};
```


## Generalized Deque Trick

The idea of storing lines in a deque can be generalized to any DP formula of the form

\begin{aligned} \mathtt{DP}[0] &= 0\\ \mathtt{DP}[i] &= a[i] + \max_{0 \leq k < i} f[k][i] \qquad \text{for } 1 \leq i \leq n \end{aligned}

where the cost function $f[k][i]$ defined on $0 \leq k < i \leq n$ satisfies the convex quadrangle inequality

$f[a][d] + f[b][c] \leq f[a][c] + f[b][d] \qquad \forall a\leq b\leq c\leq d$

The intuition behind this inequality is the following: If for the current $i_{1}$ a larger value $k_{2}$ is better than a smaller value $k_{1}$, then $k_{2}$ will be a better option than $k_{1}$ for all $i_{2}\geq i_{1}$, i.e. for all future values of $i$. This in some sense generalizes the fact that two distinct lines intersect in at most one point, which was crucial for the convex hull trick to work.

In general, checking this inequality can be quite annoying. Fortunately, it often is intuitively clear that if a newer option (larger $k$) is better at some $i$, then it will also be better for larger $i$. As a last resort option, one could always write a brute-force solution and check the inequality on some examples.

We are interested in the point where $k_{2}$ becomes a better option than $k_{1}$, so let $D(k_{1}, k_{2})$ be the smallest $i$ for which $k_{2}$ is better than $k_{1}$, i.e.

$D(k_1, k_2) = \min \left\{k_2 < i \leq n \Big| f[k_1][i]\leq f[k_2][i]\right\}$

$D$ can be computed in $\mathcal{O}(\log n)$ time with binary search. Sometimes, special properties of $f$ allow you to compute it in $\mathcal{O}(1)$.

If there are three options $k_{1}< k_{2}< k_{3}$ with $D(k_{1}, k_{2}) \geq D(k_{2}, k_{3})$, then by the time $k_{2}$ is better than $k_{1}$, $k_{3}$ will already be better than $k_{2}$. In this case, $k_{2}$ is never optimal, so we may ignore it. (Similar to lines which are hidden below two other lines in the convex hull case.)

We maintain our candidates $k_{1}< k_{2}< \dots < k_m$ in a deque where we have $D(k_{1}, k_{2}) < D(k_{2}, k_{3}) < \dots < D(k_{m-1}, k_m)$. $k_{1}$ is the value of $k$ that is currently optimal and $k_{2}, \dots, k_m$ are values that will be optimal for some larger $i$.

• When inserting $i$ as a new candidate for $k$, if $D(k_{m-1}, k_m) \geq D(k_m, i)$, we pop $k_m$ from the deque. Repeat this until $D(k_{m-1}, k_m) < D(k_m, i)$ and then push $i$ to the back of the deque.
• When computing $\mathtt{DP}[i]$, we pop $k_{1}$ from the deque until $k_{1}$ is better than $k_{2}$ (in other words, until $D(k_{1}, k_{2}) > i$). We then know that $k_{1}$ is the optimal value of $k$.

This ensures that after any operation, we still have $k_{1}< k_{2}< \dots < k_m$ and $D(k_{1}, k_{2}) < D(k_{2}, k_{3}) < \dots < D(k_{m-1}, k_m)$.

The total runtime is $\Theta(n \log n)$, as we need to evaluate $D$ $\Theta(n)$ times.

### Implementation

```cpp
struct Deque_Optimization {
  const int n;
  deque<int> cands;
  function<int64_t(int, int)> f;
  Deque_Optimization (int n_, function<int64_t(int, int)> f_) : n(n_), f(f_) {}

  // minimum i s.t. f(k_1, i) <= f(k_2, i)
  // returns n+1 if D(k_1, k_2) would be infinite.
  int D(int k_1, int k_2){
    // k_1 is better at l, k_2 is better at r
    int l = k_2, r = n+1;
    while(l+1 < r){
      const int m = l + (r-l)/2;
      if(f(k_1, m) <= f(k_2, m)){
        r = m;
      } else {
        l = m;
      }
    }
    return r;
  }
  void insert(int i){
    assert(cands.empty() || i > cands.back()); // candidates are added in increasing order.
    while(cands.size() > 1 && D(cands[cands.size()-2], cands.back()) >= D(cands.back(), i)){
      cands.pop_back();
    }
    cands.push_back(i);
  }
  int64_t query(int i){
    assert(!cands.empty());
    while(cands.size() > 1 && f(cands[0], i) <= f(cands[1], i)){
      cands.pop_front();
    }
    return f(cands[0], i);
  }
};
```


### Some calculations with the quadrangle inequality

In this part, we show how the quadrangle inequality implies the intuition that if a larger value of $k$ is better now, it will also be better in the future.

Recall that our DP has the form

$\mathtt{DP}[i] = a[i] + \max_{0 \leq k < i} f[k][i]$

where $f$ satisfies the quadrangle inequality

$f[a][d] + f[b][c] \leq f[a][c] + f[b][d] \qquad \forall a\leq b\leq c\leq d$

To see this, rewrite the inequality as

$f[b][c] - f[a][c] \leq f[b][d] - f[a][d]$

and set $a = k_{1}$, $b = k_{2}$, $c = i_{1}$, $d=i_{2}$ for some $k_{1}\leq k_{2}$ and $i_{1}\leq i_{2}$. Then we get

$f[k_2][i_1] - f[k_1][i_1] \leq f[k_2][i_2] - f[k_1][i_2]$

If at $i = i_{1}$, $k_{2}$ is a better option than $k_{1}$, then we have

$f[k_1][i_1] \leq f[k_2][i_1]$

i.e.

$0 \leq f[k_2][i_1] - f[k_1][i_1]$

so the quadrangle inequality tells us that

$0 \leq f[k_2][i_1] - f[k_1][i_1] \leq f[k_2][i_2] - f[k_1][i_2]$

hence

$f[k_1][i_2] \leq f[k_2][i_2]$

so $k_{2}$ is a better option than $k_{1}$ at $i = i_{2}$.