*Written by Timon Gehr.*


## Recap: Prefix Sums

Given numbers \(a_{0},\ldots,a_{n-1}\), we can compute a list \(b\) of sums of prefixes of \(a\), one for each length, in time \(O(n)\):

```cpp
vector<int> a(n); // ...
vector<int> b(n + 1, 0);
for (int i = 0; i < n; i++) {
    b[i + 1] = b[i] + a[i];
}
```

(We can use associative binary operations other than addition.)

One application for this is to support fast range queries to compute the sum of contiguous elements of \(a\): Instead of

```cpp
int s = 0;
for (int i = l; i < r; i++) {
    s += a[i];
}
```

we write

```cpp
int s = b[r] - b[l];
```

(This works for every binary operation that has an inverse; for other operations, we can in general only query prefixes.)

## Dynamic Prefix Sums

However, if we then update one of the elements of \(a\), we also have to update \(b\) so it remains consistent:

```cpp
a[k] += x; // update underlying array
for (int i = k + 1; i <= n; i++) {
    b[i] += x; // make prefix sums consistent
}
```

Therefore, prefix sums allow element updates in time \(O(n)\) while range queries take time \(O(1)\).

It is also easy to have element updates in time \(O(1)\) and range queries in time \(O(n)\). (We just update the array \(a\) and compute the sums explicitly every time.)

Those two algorithmic strategies already cover use cases where we have vastly more queries than updates or vastly more updates than queries.

## Fenwick Trees

If we have a similar amount of updates and queries, we need to balance their cost. Fenwick trees allow element updates and prefix sum queries both in time \(O(\log n)\). Note that whenever we can solve a task using a Fenwick tree, we could instead have used a segment tree, while the opposite is not always true. The benefit of Fenwick trees is that while segment trees are a step up in implementation complexity, the implementation of a Fenwick tree is almost as simple as the code snippets we have seen so far. Therefore, if the same task can be solved with either a Fenwick tree or a segment tree, we probably want to use a Fenwick tree.

Consider a single array \(b\) with \(n+1\) elements. In \(b_i\), we store a sum of consecutive entries \(a_{\text{left}(i)}+a_{\text{left}(i)+1}+\cdots+a_{i-1}\). Here, \(\text{left}(i)\) describes the left end of the range whose sum is stored in \(b_i\).

The following general strategy for queries and updates generalizes the two approaches to dynamic prefix sums that we have seen so far:

```cpp
// compute a[0] + ... + a[k - 1]
int get_prefix_sum(int k) {
    auto r = b[0];
    for (int i = k; i > 0; i = left(i)) {
        r = b[i] + r;
    }
    return r;
}
// increment a[k] by x
void increment_at(int k, int x) {
    a[k] += x; // update underlying array
    for ( ... /* each index i where left(i) <= k < i */ ) {
        b[i] += x; // make stored range sums consistent
    }
}
```

To compute a prefix sum, we add the sum of the segment stored in \(b_i\) to the running total and skip to the left end of that segment. Once we reach \(0\), we have computed the entire sum. When we change \(a_k\), we have to update all segment sums that include \(a_k\).

Choosing \(\operatorname{left}(i) = 0\) corresponds to storing prefix sums in the array \(b\) and recomputing them after each update, while choosing \(\operatorname{left}(i) = i-1\) corresponds to storing the elements of \(a\) in \(b\) and computing the prefix sum manually for each query.

It is instructive to choose \(\operatorname{left}(i)\) such that the segments whose sum we store have length at most \(l\): \(\operatorname{left}(i)=\max(0, i-l)\). If we concretize the general strategy for this function \(\operatorname{left}\), we obtain:

```cpp
// compute a[0] + ... + a[k - 1]
int get_prefix_sum(int k) {
    int r = b[0];
    for (int i = k; i > 0; i = max(i - l, 0)) {
        r = b[i] + r;
    }
    return r;
}
// increment a[k] by x
void increment_at(int k, int x) {
    a[k] += x; // (update underlying array)
    for (int i = k + 1; i <= min(k + l, n); i++) {
        b[i] += x;
    }
}
```

Queries run in time \(\mathcal O(n/l)\) while updates run in time \(\mathcal O(l)\). If we choose \(l\) around \(\sqrt{n}\), we can get queries and updates both in \(O(\sqrt{n})\).

However, our goal is \(O(\log n)\) updates and queries. Fenwick trees achieve this by using a function \(\operatorname{left}(i)\) that leads to segments of different lengths for different \(i\). It is based on the binary representation of \(i\): \(\text{left}(i)\) clears the least significant (rightmost) bit that is set in \(i\).

For example, we might call `get_prefix_sum` with \(k=42\), whose binary representation is \(101010\).
The variable \(i\) will take on the values \(42\) (\(101010\)), \(40\) (\(101000\)), \(32\) (\(100000\)) and \(0\), in this order.
In general, \(i\) will take on at most \(\mathcal O(\log n)\) different values, because in each step we remove a \(1\) from its binary representation and we stop as soon as none are left.

Consider the following illustration that shows some indices \(i\) together with ranges whose sums we store in \(b\):

```
i:     1 2 3 4 5 6 7 8 9
      ─   ─   ─   ─*  ─
      ───     ───*
      ───────*
      ───────────────
k:    0 1 2 3 4 5 6 7 8
```

The three ranges that are marked with an asterisk (*) are those we would add up in order to query the sum of the prefix of length \(7\): \(b_{4}+b_{6}+b_{7}= (a_{0}+a_{1}+a_{2}+a_{3})+(a_{4}+a_{5})+a_{6}\).

On contemporary computers, numbers are stored in two’s complement representation. This means that for an integer \(i\), we have \(-i = {\sim} i+1\), where \({\sim}i\) flips all bits in the binary representation of \(i\).

For example, with \(4\)-bit numbers in binary representation:

```
-1: -0001 = 1110+1 = 1111
-2: -0010 = 1101+1 = 1110
-3: -0011 = 1100+1 = 1101
-4: -0100 = 1011+1 = 1100
-5: -0101 = 1010+1 = 1011
-6: -0110 = 1001+1 = 1010
-7: -0111 = 1000+1 = 1001
```

Due to carry, the binary representation of \(-i\) has opposite bits to the one of \(i\) for each bit that is left of the least significant \(1\)-bit of \(i\).

For example:

```
1: 0001    -1: 1111
2: 0010    -2: 1110
4: 0100    -4: 1100
```

The least significant set bit of \(i\) is also set in \(-i\), and all lower bits are zero in both \(i\) and \(-i\). Therefore, the only bit that is set in both \(i\) and \(-i\) is precisely the least significant bit that is set in \(i\). We can therefore extract it using bitwise AND: \(i\,\&\,{{-}i}\).

This already allows us to implement Fenwick tree prefix sum queries:

```cpp
// compute a[0] + ... + a[k - 1]
int get_prefix_sum(int k) {
    int r = b[0];
    for (int i = k; i > 0; i -= i&-i) {
        r = b[i] + r;
    }
    return r;
}
```

A priori, Fenwick tree updates appear to be a bit more tricky. Recall that we have to find a way to enumerate all indices \(i\) that satisfy \(\operatorname{left}(i)\le k<i\).

For example, let \(k=2\). The segments marked with asterisks (*) are the ones that contain \(a_{2}\). If we update \(a_{2}\), we would also have to update the following entries: \(b_{3}=a_{2}\), \(b_{4}=a_{0}+a_{1}+a_{2}+a_{3}\) and \(b_{8}=a_{0}+a_{1}+a_{2}+a_{3}+a_{4}+a_{5}+a_{6}+a_{7}\).

```
i:     1 2 3 4 5 6 7 8 9
l=1   ─   ─*  ─   ─   ─
l=2   ───     ───     ───
l=4   ───────*        ───────
l=8   ───────────────*
k:    0 1 2 3 4 5 6 7 8
```

Clearly, \(b_{k+1}\) is always one of the entries we need to update, because its range ends with \(a_k\). Also note that every other segment that contains \(a_k\) has to be longer than this initial segment. This is because the segments form a *laminar* set, i.e., any two segments are either disjoint or one of them contains the other. This is also why the data structure is called a Fenwick *tree*: the parent of a segment is the smallest segment that properly contains it. (Technically this is a slight misnomer, as there can be multiple roots if \(n\) is not a power of two.)

For a length \(l\) where \(l\) is a power of two, we have \(l\) elements of \(a\) that are part of a segment of length \(l\) alternating with \(l\) elements of \(a\) that are not part of a segment of length \(l\). Therefore, we can skip ahead as many entries of \(b\) as the current segment’s length \(l\). The resulting position describes a segment whose length is at least twice \(l\), therefore it also contains \(a_k\) and we have found the parent of the current segment. We can repeat this until we reach a root of the tree. This happens within \(\mathcal O(\log n)\) steps because the size of the considered segment will at least double in each step.

```cpp
// increment a[k] by x
void increment_at(int k, int x) {
    a[k] += x; // (update underlying array)
    for (int i = k + 1; i <= n; i += i&-i) {
        b[i] += x;
    }
}
```

## Implementation

In practice, we usually do not explicitly store the underlying array \(a\). The full implementation of a Fenwick tree is then given by:

```cpp
int n;
vector<int> b; // size n+1

// compute a[0] + ... + a[k - 1]
int get_prefix_sum(int k) {
    int r = b[0];
    for (int i = k; i > 0; i -= i&-i) r = b[i] + r;
    return r;
}
// increment a[k] by x
void increment_at(int k, int x) {
    for (int i = k + 1; i <= n; i += i&-i) b[i] += x;
}
```

## Application: Longest increasing subsequence

(Disclaimer: For this specific problem, the standard solution using binary search is usually preferable as it is significantly shorter.)

The length \(l_i\) of the longest increasing subsequence of an array \(a\) that ends at index \(i\) satisfies \(l_i = 1 + \max(\{0\}\cup\{l_j\mid 0\le j<i, a[j]<a[i]\})\). I.e., it is either \(1\) (just \(a_i\)), or it extends a longest increasing subsequence of \(a\) that ends at an earlier index \(j\) with a value \(a[j]\) that is lower than \(a[i]\). With a Fenwick tree (using \(\max\) instead of addition), we can compute each entry in logarithmic time to obtain the length of the longest increasing subsequence of \(a\) in time \(\mathcal O(n\log n)\) (wlog all elements of \(a\) are non-negative and less than \(n\), otherwise use coordinate compression):

```cpp
int n;
vector<int> b; // size n+1

// compute the maximum of the values stored at positions 0, ..., k - 1
int get_prefix_max(int k) {
    int r = b[0];
    for (int i = k; i > 0; i -= i&-i) r = max(b[i], r);
    return r;
}
// set a[k] to max of a[k] and x
void update_at(int k, int x) {
    for (int i = k + 1; i <= n; i += i&-i) b[i] = max(b[i], x);
}
int lis(const vector<int> &a) {
    n = a.size();
    b.assign(n + 1, 0);
    int best = 0;
    for (int i = 0; i < n; i++) {
        int current = 1 + get_prefix_max(a[i]);
        update_at(a[i], current);
        best = max(best, current);
    }
    return best;
}
```

Note that we can only use \(\max\) here because the entries of the Fenwick tree will only increase. This is a limitation of Fenwick trees that is not shared by segment trees.

## Binary search over Fenwick trees

Consider the following (non-standard) implementation of binary search that computes the largest number for which a given monotone function \(f\,\colon\{0,\ldots,2^{t+1}-1\}\to\{0,1\}\) is \(0\). (Assuming there is at least one value \(i\) for which \(f(i)\) is \(0\).)

```cpp
int r = 0;
for (int i = 1 << t; i != 0; i >>= 1)
    if (f(r | i) == 0) r |= i;
```

This code figures out each bit of the result, starting from the most significant one. The structure of this code nicely matches the way values are stored in Fenwick trees: we want to determine how much the sum would increase if we set another bit of the result, and there is an entry in the Fenwick tree that stores exactly this information.

Therefore we can use this kind of binary search on the prefix sums represented by a Fenwick tree in time \(\mathcal O(t)=\mathcal O(\log n)\). (The following code hardcodes \(t=30\) for the full range of `int`.)

```cpp
// assumption: all entries of `a` are non-negative
int n;
vector<int> b; // size n+1

// compute a[0] + ... + a[k - 1]
int get_prefix_sum(int k) {
    int r = b[0];
    for (int i = k; i > 0; i -= i&-i) r = b[i] + r;
    return r;
}
// increment a[k] by x
void increment_at(int k, int x) {
    for (int i = k + 1; i <= n; i += i&-i) b[i] += x;
}
// compute largest k such that get_prefix_sum(k) < x
int last_below(int x) { // requires x > 0
    int k = 0, sum = 0;
    for (int i = 1 << 30; i != 0; i >>= 1)
        if ((k|i) <= n && sum + b[k|i] < x) {
            k |= i;
            sum += b[k];
        }
    return k;
}
```

## Suffix Updates, Point Queries

To increase all values in a suffix and query single entries, we can use the same code, but interpret the operations dually:

```cpp
vector<int> b; // size n+1

// get a[k] for 1 <= k <= n
int get(int k) {
    int r = b[0];
    for (int i = k; i > 0; i -= i&-i) r = b[i] + r;
    return r;
}
// increment a[k+1], a[k+2], …, a[n] by x (0 <= k <= n)
void increment_suffix(int k, int x) {
    for (int i = k + 1; i <= n; i += i&-i) b[i] += x;
}
```

The only problem is that the interface is a bit unintuitive, which we can fix by adapting parameters:

```cpp
vector<int> b; // size n+1

// get a[k] for 0 <= k < n
int get(int k) {
    int r = b[0];
    for (int i = k + 1; i > 0; i -= i&-i) r = b[i] + r;
    return r;
}
// increment a[k], a[k+1], …, a[n-1] by x (0 <= k <= n)
void increment_suffix(int k, int x) {
    for (int i = k + 1; i <= n; i += i&-i) b[i] += x;
}
```

We can use `increment_suffix` twice to increment all values \(a_l,a_{l+1},\ldots,a_{r-1}\) by a specified amount \(x\):

```cpp
// increment a[l], a[l+1], …, a[r-1] by x (0 <= l <= r <= n)
void increment_range(int l, int r, int x) {
    increment_suffix(l, x);
    increment_suffix(r, -x);
}
```

## Range Updates and Range Queries with Linear Function Trick

If we store linear functions \(i\mapsto a\cdot i+b\) instead of integers, we can support operations that increment \(a_l,a_{l+1},\ldots,a_{r-1}\) by a given value and query sums \(a_l+a_{l+1}+\cdots+a_{r-1}\) for arbitrary ranges \([l,r)\), both in time \(O(\log n)\).

```cpp
struct LF {
    int a, b;
    LF operator+(LF g) { return LF{a + g.a, b + g.b}; }
    int operator()(int i) { return a*i + b; }
};
vector<LF> b; // size n+2

// get f[k] where f[k](k) = ∑ᵢ[0≤i<k]a[i] for 0 <= k <= n
LF get_f(int k) {
    LF r = b[0];
    for (int i = k + 1; i > 0; i -= i&-i) r = b[i] + r;
    return r;
}
// increment f[k], f[k+1], …, f[n] by g (0 <= k <= n+1)
void increment_suffix_f(int k, LF g) {
    for (int i = k + 1; i <= n+1; i += i&-i) b[i] = b[i] + g;
}
// compute a[l] + a[l+1] + ... + a[r-1]
int query_range(int l, int r) {
    return get_f(r)(r) - get_f(l)(l);
}
// increment a[l], a[l+1], …, a[r-1] by x (0 <= l <= r <= n)
void increment_range(int l, int r, int x) {
    increment_suffix_f(l, LF{x, -x*l});
    increment_suffix_f(r, LF{-x, x*r});
}
```

## Multidimensional Fenwick Trees

We can support dynamic 2D prefix sums by creating a Fenwick tree whose entries are themselves Fenwick trees. The implementation is remarkably straightforward:

```cpp
vector<vector<int>> b; // size (n+1)×(m+1)

// compute ∑ᵢ∑ⱼ[0≤i<k][0≤j<l] a[i][j]
int get_prefix_sum(int k, int l) {
    int r = 0;
    for (int i = k; i > 0; i -= i&-i)
        for (int j = l; j > 0; j -= j&-j)
            r += b[i][j];
    return r;
}
// increment a[k][l] by x
void increment_at(int k, int l, int x) {
    for (int i = k + 1; i <= n; i += i&-i)
        for (int j = l + 1; j <= m; j += j&-j)
            b[i][j] += x;
}
```

Queries and updates both take time \(O((\log n)^{2})\).

## Coordinate Compression

A Fenwick tree maintains a dynamic prefix sum for dense coordinates. I.e., all coordinates \(k\) used for queries and updates have to be small. However, if we know the set of coordinates that will be used for updates in advance, we can apply coordinate compression:

```cpp
vector<int> b;
vector<int> coords;

// add a new coordinate x
void add_coord(int x) { coords.push_back(x); }
// build a Fenwick tree specialized to the given set of coordinates
void build() {
    sort(coords.begin(), coords.end());
    coords.erase(unique(coords.begin(), coords.end()), coords.end()); // remove duplicates
    b.resize(coords.size() + 1);
}
// compute a[0] + ... + a[x - 1]
int get_prefix_sum(int x) {
    int k = lower_bound(coords.begin(), coords.end(), x) - coords.begin(); // min k s.t. coords[k] >= x
    int r = b[0];
    for (int i = k; i > 0; i -= i&-i) r = b[i] + r;
    return r;
}
// increment a[x] by v
void increment_at(int x, int v) {
    int k = lower_bound(coords.begin(), coords.end(), x) - coords.begin();
    assert(k < (int)coords.size() && coords[k] == x); // x must be one of the given coordinates
    for (int i = k + 1; i <= (int)coords.size(); i += i&-i) b[i] += v;
}
```

We first call `add_coord` with all coordinates \(x\) that we will later use in `increment_at` calls. Then we call `build` and after that we can use the Fenwick tree with arbitrarily large coordinates \(x\).

Similarly the 2D Fenwick tree given above only works in settings where \(k\) and \(l\) for updates and queries are small. However, if we know the parameters for `increment_at` in advance, we can associate a sorted list of coordinates to the outer Fenwick tree as well as to each inner Fenwick tree. In this way, we can support 2D queries and updates with very large coordinates.

```cpp
vector<vector<int>> b;
vector<int> x_coords;
vector<vector<int>> y_coords;

// add a new x coordinate
void add_x_coord(int x) { x_coords.push_back(x); }
// build the outer Fenwick tree
void build_x() {
    sort(x_coords.begin(), x_coords.end());
    x_coords.erase(unique(x_coords.begin(), x_coords.end()), x_coords.end());
    b.resize(x_coords.size() + 1);
    y_coords.resize(x_coords.size() + 1);
}
// add a new y coordinate to the inner Fenwick trees corresponding to the given x coordinate
void add_y_coord(int x, int y) {
    int k = lower_bound(x_coords.begin(), x_coords.end(), x) - x_coords.begin();
    assert(k < (int)x_coords.size() && x_coords[k] == x);
    for (int i = k + 1; i < (int)y_coords.size(); i += i&-i) y_coords[i].push_back(y);
}
// build the inner Fenwick trees
void build_y() {
    for (int i = 1; i < (int)y_coords.size(); i++) {
        sort(y_coords[i].begin(), y_coords[i].end());
        y_coords[i].erase(unique(y_coords[i].begin(), y_coords[i].end()), y_coords[i].end());
        b[i].resize(y_coords[i].size() + 1);
    }
}
// compute ∑ᵢ∑ⱼ[0≤i<x][0≤j<y] a[i][j]
int get_prefix_sum(int x, int y) {
    int k = lower_bound(x_coords.begin(), x_coords.end(), x) - x_coords.begin();
    int r = 0;
    for (int i = k; i > 0; i -= i&-i) {
        int l = lower_bound(y_coords[i].begin(), y_coords[i].end(), y) - y_coords[i].begin();
        for (int j = l; j > 0; j -= j&-j) r += b[i][j];
    }
    return r;
}
// increment a[x][y] by v
void increment_at(int x, int y, int v) {
    int k = lower_bound(x_coords.begin(), x_coords.end(), x) - x_coords.begin();
    assert(k < (int)x_coords.size() && x_coords[k] == x);
    for (int i = k + 1; i <= (int)x_coords.size(); i += i&-i) {
        int l = lower_bound(y_coords[i].begin(), y_coords[i].end(), y) - y_coords[i].begin();
        assert(l < (int)y_coords[i].size() && y_coords[i][l] == y);
        for (int j = l + 1; j <= (int)y_coords[i].size(); j += j&-j) b[i][j] += v;
    }
}
```

To initialize, first call `add_x_coord` with all \(x\) coordinates that will occur in `increment_at` calls. Call `build_x`, then call `add_y_coord` with all \((x,y)\) coordinate pairs that will occur in `increment_at` calls. Call `build_y`; after that, the data structure is ready for calls to `get_prefix_sum` and `increment_at`.

The total running time will be \(\mathcal O(q\cdot(\log q)^{2})\), where \(q\) is the number of queries and updates.