Fenwick Trees (Binary Indexed Trees)

Written by Timon Gehr.

Recap: Prefix Sums

Given numbers $a_0,\ldots,a_{n-1}$, we can compute a list $b$ of sums of prefixes of $a$, one for each length, in time $O(n)$:

vector<int> a(n);
// ...
vector<int> b(n+1, 0);
for (int i = 0; i < n; i++) {
    b[i + 1] = b[i] + a[i];
}

(We can use associative binary operations other than addition.)
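
For instance, a minimal sketch of the same pattern using $\max$ instead of addition (assuming the entries are non-negative, so that $0$ can serve as the neutral element):

vector<int> a(n);
// ...
vector<int> c(n + 1, 0);        // c[i] = max of a[0..i-1], 0 for the empty prefix
for (int i = 0; i < n; i++) {
    c[i + 1] = max(c[i], a[i]);
}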

One application for this is to support fast range queries to compute the sum of contiguous elements of $a$: Instead of

int s = 0;
for (int i = l; i < r; i++) {
    s += a[i];
}

we write

int s = b[r] - b[l];

(This works for every binary operation that has an inverse; for other operations we can in general only query prefixes.)
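
For example, with hypothetical values (a minimal sketch):

vector<int> a = {3, 1, 4, 1, 5};
vector<int> b(a.size() + 1, 0);
for (int i = 0; i < (int)a.size(); i++) {
    b[i + 1] = b[i] + a[i];
}
// b = {0, 3, 4, 8, 9, 14}
// sum of a[1] + a[2] + a[3] (l = 1, r = 4): b[4] - b[1] = 9 - 3 = 6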

Dynamic Prefix Sums

However, if we then update one of the elements of $a$, we also have to update $b$ so it remains consistent:

a[k] += x; // update underlying array
for (int i = k+1; i <= n; i++) {
    b[i] += x; // make prefix sums consistent
}

Therefore, prefix sums allow element updates in time $O(n)$ while range queries take time $O(1)$.

It is also easy to have element updates in time $O(1)$ and range queries in time $O(n)$. (We just update the array $a$ and compute the sums explicitly every time.)

Those two algorithmic strategies already cover use cases where we have vastly more queries than updates or vastly more updates than queries.

Fenwick Trees

If we have a similar amount of updates and queries, we need to balance their cost. Fenwick trees allow element updates and prefix sum queries both in time $O(\log n)$. Note that whenever we can solve a task using a Fenwick tree, we could instead have used a segment tree, while the opposite is not always true. The benefit of Fenwick trees is that while segment trees are a step up in implementation complexity, the implementation of a Fenwick tree is almost as simple as the code snippets we have seen so far. Therefore, if the same task can be solved with either a Fenwick tree or a segment tree, we probably want to use a Fenwick tree.

Consider a single array $b$ with $n+1$ elements. In $b_i$, we store a sum of consecutive entries $a_{\text{left}(i)}+a_{\text{left}(i)+1}+\cdots+a_{i-1}$. Here, $\text{left}(i)$ describes the left end of the range whose sum is stored in $b_i$.

The following general strategy for queries and updates generalizes the two approaches to dynamic prefix sums that we have seen so far:

// compute a[0] + ... + a[k - 1].
int get_prefix_sum(int k) {
    auto r = b[0];
    for (int i = k; i > 0; i = left(i)) {
        r = b[i] + r;
    }
    return r;
}

// increment a[k] by x
void increment_at(int k, int x) {
    a[k] += x; // update underlying array
    for ( ... /* each index i where left(i) <= k < i */ ) {
        b[i] += x; // make stored range sums consistent
    }
}

To compute a prefix sum, we add the sum of the segment stored in $b_i$ to the running total and skip to the left end of that segment. Once we reach $0$, we have computed the entire sum. When we change $a_k$, we have to update all segment sums that include $a_k$.

Choosing $\operatorname{left}(i) = 0$ corresponds to storing prefix sums in the array $b$ and recomputing them after each update, while choosing $\operatorname{left}(i) = i-1$ corresponds to storing the elements of $a$ in $b$ and computing the prefix sum manually for each query.

It is instructive to choose $\operatorname{left}(i)$ such that the segments whose sum we store have length $l$: $\operatorname{left}(i)=\max(0, i-l)$. If we concretize the general strategy for this function $\operatorname{left}$, we obtain:

// compute a[0] + ... + a[k - 1].
int get_prefix_sum(int k) {
    int r = b[0];
    for (int i = k; i > 0; i = max(i - l, 0)) {
        r = b[i] + r;
    }
    return r;
}

// increment a[k] by x
void increment_at(int k, int x) {
    a[k] += x; // (update underlying array)
    for (int i = k + 1; i <= min(k + l, n); i++) {
        b[i] += x;
    }
}

Queries run in time $\mathcal O(n/l)$ while updates run in time $\mathcal O(l)$. If we choose $l$ around $\sqrt{n}$, we can get queries and updates both in $O(\sqrt{n})$.

However, our goal is $O(\log n)$ updates and queries. Fenwick trees achieve this by using a function $\operatorname{left}(i)$ that leads to segments of different lengths for different $i$. It is based on the binary representation of $i$: $\operatorname{left}(i)$ clears the least significant (rightmost) bit that is set in $i$.

For example, we might call get_prefix_sum with $k=42$, whose binary representation is $101010$. The variable $i$ will take on the values $42$ ($101010$), $40$ ($101000$), $32$ ($100000$) and $0$, in this order. In general, $i$ will take on at most $\mathcal O(\log n)$ different values, because in each step we remove a $1$ from its binary representation and we stop as soon as none are left.
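
To see this chain concretely, here is a small standalone sketch (not part of the data structure itself) that clears the least significant set bit using the standard identity i & (i - 1); the equivalent formulation via i & -i is derived below:

#include <cstdio>

int main() {
    int i = 42;
    while (i > 0) {
        printf("%d ", i);
        i &= i - 1; // clear the least significant set bit
    }
    printf("0\n"); // output: 42 40 32 0
}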

Consider the following illustration that shows some indices $i$ together with ranges whose sums we store in $b$:

i: 1 2 3 4 5 6 7 8 9
   ─   ─   ─   ─*  ─
   ───     ───*
   ───────*
   ───────────────
k: 0 1 2 3 4 5 6 7 8

The three ranges that are marked with an asterisk (*) are those we would add up in order to query the sum of the prefix of length $7$: $b_4+b_6+b_7 = (a_0+a_1+a_2+a_3)+(a_4+a_5)+a_6$.

On contemporary computers, numbers are stored in two’s complement representation. This means that for an integer $i$, we have $-i = {\sim}i + 1$, where ${\sim}i$ flips all bits in the binary representation of $i$.

For example, with $4$-bit numbers in binary representation:

-1: -0001 = 1110+1 = 1111
-2: -0010 = 1101+1 = 1110
-3: -0011 = 1100+1 = 1101
-4: -0100 = 1011+1 = 1100
-5: -0101 = 1010+1 = 1011
-6: -0110 = 1001+1 = 1010
-7: -0111 = 1000+1 = 1001

Due to carry, the binary representation of $-i$ has opposite bits to the one of $i$ for each bit that is left of the least significant $1$-bit of $i$.

For example:

 1: 0001
-1: 1111

 2: 0010
-2: 1110

 4: 0100
-4: 1100

The least significant $1$-bit of $i$ itself is also set in $-i$, and all lower bits are zero in both $i$ and $-i$. Therefore, the only bit that is set in both $i$ and $-i$ is precisely the least significant bit that is set in $i$. We can therefore extract it using bitwise AND: $i\,\&\,{-i}$.
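
As a quick sanity check, a standalone sketch that prints the lowest set bit of the first few positive integers:

#include <cstdio>

int main() {
    for (int i = 1; i <= 12; i++) {
        printf("%d & -%d = %d\n", i, i, i & -i); // 1, 2, 1, 4, 1, 2, 1, 8, 1, 2, 1, 4
    }
}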

This already allows us to implement Fenwick tree prefix sum queries:

// compute a[0] + ... + a[k - 1].
int get_prefix_sum(int k) {
    int r = b[0];
    for (int i = k; i > 0; i -= i&-i) {
        r = b[i] + r;
    }
    return r;
}

A priori, Fenwick tree updates appear to be a bit more tricky. Recall that we have to find a way to enumerate all indices $i$ that satisfy $\operatorname{left}(i)\le k<i$.

For example, let $k=2$. The segments marked with asterisks (*) in the illustration below are the ones that contain $a_2$. If we update $a_2$, we have to update the following entries: $b_3=a_2$, $b_4=a_0+a_1+a_2+a_3$ and $b_8=a_0+a_1+a_2+a_3+a_4+a_5+a_6+a_7$.

   i: 1 2 3 4 5 6 7 8 9
l=1   ─   ─*  ─   ─   ─
l=2   ───     ───     ───
l=4   ───────*        ───────
l=8   ───────────────*
   k: 0 1 2 3 4 5 6 7 8

Clearly, $b_{k+1}$ is always one of the entries we need to update, because its range ends with $a_k$. Also note that every other segment that contains $a_k$ has to be longer than this initial segment: the segments form a laminar set, i.e., any two segments are either disjoint or one of them contains the other, and since the only segment ending at $a_k$ is the one stored in $b_{k+1}$, any other segment containing $a_k$ must strictly contain it. This is also why the data structure is called a Fenwick tree: the parent of a segment is the smallest segment that contains it. (Technically this is a slight misnomer as there can be multiple roots if $n$ is not a power of two.)

For a length $l$ where $l$ is a power of two, we have $l$ elements of $a$ that are part of a segment of length $l$ alternating with $l$ elements of $a$ that are not part of a segment of length $l$. Therefore, we can skip ahead as many entries of $b$ as the current segment's length $l$. The resulting position describes a segment whose length is at least twice $l$; it therefore contains the current segment, in particular $a_k$, and we have found the parent of the current segment. We can repeat this until we reach a root of the tree. This happens within $\mathcal O(\log n)$ steps because the size of the considered segment will at least double in each step.

// increment a[k] by x
void increment_at(int k, int x) {
    a[k] += x; // (update underlying array)
    for (int i = k + 1; i <= n; i += i&-i) {
        b[i] += x;
    }
}

Implementation

In practice, we usually do not explicitly store the underlying array $a$. The full implementation of a Fenwick tree is then given by:

int n;
vector<int> b; // size n+1

// compute a[0] + ... + a[k - 1].
int get_prefix_sum(int k) {
    int r = b[0];
    for (int i = k; i > 0; i -= i&-i)
        r = b[i] + r;
    return r;
}
// increment a[k] by x
void increment_at(int k, int x) {
    for (int i = k + 1; i <= n; i += i&-i)
        b[i] += x;
}
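
As a usage sketch (with hypothetical values; building the tree simply performs $n$ point updates):

n = 8;
b.assign(n + 1, 0);
int init[] = {5, 1, 2, 4, 3, 6, 0, 7};
for (int k = 0; k < n; k++)
    increment_at(k, init[k]);                   // O(n log n) build
int s1 = get_prefix_sum(5);                     // 5 + 1 + 2 + 4 + 3 = 15
increment_at(2, 10);                            // a[2] becomes 12
int s2 = get_prefix_sum(5);                     // 25
int s3 = get_prefix_sum(6) - get_prefix_sum(2); // a[2] + ... + a[5] = 12 + 4 + 3 + 6 = 25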

Application: Longest increasing subsequence

(Disclaimer: For this specific problem, the standard solution using binary search is usually preferable as it is significantly shorter.)

The length $l_i$ of the longest increasing subsequence of an array $a$ that ends at index $i$ satisfies $l_i = 1 + \max(\{0\}\cup\{l_j\mid 0\le j<i,\ a[j]<a[i]\})$. I.e., it is either $1$ (just $a_i$), or it extends a longest increasing subsequence of $a$ that ends at an earlier index $j$ with a value $a[j]$ that is lower than $a[i]$. With a Fenwick tree, we can compute each entry in logarithmic time to obtain the length of the longest increasing subsequence of $a$ in time $\mathcal O(n\log n)$ (wlog all elements of $a$ are in $\{0,\ldots,n-1\}$, otherwise use coordinate compression):

vector<int> b;
int n;
// compute max(0, a[0], ..., a[k - 1])
int get_prefix_max(int k) {
    int r = b[0];
    for (int i = k; i > 0; i -= i&-i)
        r = max(b[i], r);
    return r;
}
// set a[k] to max of a[k] and x
void update_at(int k, int x) {
    for (int i = k + 1; i <= n; i += i&-i)
        b[i] = max(b[i], x);
}

int lis(const vector<int> &a) {
    n = a.size();
    b.assign(n + 1, 0);
    int best = 0;
    for (int i = 0; i < n; i++) {
        int current = 1 + get_prefix_max(a[i]);
        update_at(a[i], current);
        best = max(best, current);
    }
    return best;
}

Note that we can only use $\max$ here because the entries of the Fenwick tree will only increase. This is a limitation of Fenwick trees that is not shared by segment trees.
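
For instance, with a hypothetical input whose values already lie in $\{0,\ldots,n-1\}$:

vector<int> v = {3, 1, 4, 1, 5, 2, 6, 0};
int len = lis(v); // 4, e.g. the subsequence 1, 4, 5, 6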

Binary search over Fenwick trees

Consider the following (non-standard) implementation of binary search that computes the largest number for which a given monotone function $f\colon\{0,\ldots,2^{t+1}-1\}\to\{0,1\}$ is $0$. (Assuming there is at least one value $i$ for which $f(i)$ is $0$.)

int r = 0;
for (int i = 1 << t; i != 0; i >>= 1)
    if (f(r | i) == 0) r |= i;

This code figures out each bit of the result, starting from the most significant one. The structure of this code nicely matches the way values are stored in Fenwick trees: we want to determine how much the sum would increase if we added a new bit to the index, and there is an entry in the Fenwick tree that stores exactly this information.

Therefore we can use this kind of binary search on the prefix sums represented by a Fenwick tree in time $\mathcal O(t)=\mathcal O(\log n)$. (The following code hardcodes $t=30$ for the full range of int.)

// assumption: all entries of `a` are non-negative
vector<int> b; // size n+1

// compute a[0] + ... + a[k - 1].
int get_prefix_sum(int k) {
    int r = b[0];
    for (int i = k; i > 0; i -= i&-i)
        r = b[i] + r;
    return r;
}
// increment a[k] by x
void increment_at(int k, int x) {
    for (int i = k + 1; i <= n; i += i&-i)
        b[i] += x;
}
// compute largest k such that get_prefix_sum(k) < x
int last_below(int x) { // requires x > 0
    int k = 0, sum = 0;
    for (int i = 1 << 30; i != 0; i >>= 1) {
        if ((k|i) <= n && sum + b[k|i] < x) {
            k |= i;
            sum += b[k];
        }
    }
    return k;
}
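
For example (a usage sketch with hypothetical values, assuming the definitions above together with a global int n):

n = 4;
b.assign(n + 1, 0);
increment_at(0, 1); increment_at(2, 2); increment_at(3, 1); // a = {1, 0, 2, 1}
// prefix sums for k = 0, ..., 4: 0, 1, 1, 3, 4
int k1 = last_below(1); // 0: only the empty prefix sums to less than 1
int k2 = last_below(2); // 2: the prefixes of lengths 1 and 2 both sum to 1
int k3 = last_below(4); // 3: the prefix of length 3 sums to 3 < 4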

Suffix Updates, Point Queries

To increase all values in a suffix and query single entries, we can use the same code, but interpret the operations dually:

vector<int> b; // size n+1

// get a[k] for 1 <= k <= n
int get(int k) {
    int r = b[0];
    for (int i = k; i > 0; i -= i&-i)
        r = b[i] + r;
    return r;
}
// increment a[k+1], a[k+2], …, a[n] by x (0 <= k <= n)
void increment_suffix(int k, int x) {
    for (int i = k + 1; i <= n; i += i&-i)
        b[i] += x;
}

The only problem is that the interface is a bit unintuitive, which we can fix by adapting parameters:

vector<int> b; // size n+1

// get a[k] for 0 <= k < n
int get(int k) {
    int r = b[0];
    for (int i = k + 1; i > 0; i -= i&-i)
        r = b[i] + r;
    return r;
}
// increment a[k], a[k+1], …, a[n-1] by x (0 <= k <= n)
void increment_suffix(int k, int x) {
    for (int i = k + 1; i <= n; i += i&-i)
        b[i] += x;
}

We can use increment_suffix twice to increment all values $a_l,a_{l+1},\ldots,a_{r-1}$ by a specified amount $x$:

// increment a[l], a[l+1], …, a[r-1] by x (0 <= l <= r <= n)
void increment_range(int l, int r, int x) {
    increment_suffix(l, x);
    increment_suffix(r, -x);
}
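
For example (a usage sketch, again assuming a global int n):

n = 8;
b.assign(n + 1, 0);
increment_range(1, 4, 5); // a[1], a[2], a[3] += 5
int v0 = get(0); // 0
int v2 = get(2); // 5
int v4 = get(4); // 0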

Range Updates and Range Queries with Linear Function Trick

If we store linear functions $i\mapsto a\cdot i+b$ instead of integers, we can support operations that increment $a_l,a_{l+1},\ldots,a_{r-1}$ by a given value and query sums $a_l+a_{l+1}+\cdots+a_{r-1}$ for arbitrary ranges $[l,r)$, both in time $O(\log n)$.

struct LF {
    int a, b;
    LF operator+(LF g) {
        return LF{a + g.a, b + g.b};
    }
    int operator()(int i) {
        return a*i + b;
    }
};

vector<LF> b; // size n+2

// get f[k] where f[k](k) = ∑ᵢ[0≤i<k]a[i] for 0 <= k <= n
LF get_f(int k) {
    LF r = b[0];
    for (int i = k + 1; i > 0; i -= i&-i)
        r = b[i] + r;
    return r;
}
// increment f[k], f[k+1], …, f[n] by g (0 <= k <= n+1)
void increment_suffix_f(int k, LF g) {
    for (int i = k + 1; i <= n+1; i += i&-i)
        b[i] = b[i] + g;
}

// compute a[l] + a[l+1] + ... + a[r-1]
int query_range(int l, int r) {
    return get_f(r)(r) - get_f(l)(l);
}
// increment a[l], a[l+1], …, a[r-1] by x (0 <= l <= r <= n)
void increment_range(int l, int r, int x) {
    increment_suffix_f(l, LF{x, -x*l});
    increment_suffix_f(r, LF{-x, x*r});
}
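
For example (a usage sketch, assuming a global int n as before):

n = 8;
b.assign(n + 2, LF{0, 0});
increment_range(2, 6, 3);   // a[2], ..., a[5] += 3
int s1 = query_range(0, 8); // 12 (four elements were incremented by 3)
int s2 = query_range(3, 5); // 6
int s3 = query_range(0, 2); // 0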

Multidimensional Fenwick Trees

We can support dynamic 2D prefix sums by creating a Fenwick tree whose entries are themselves Fenwick trees. The implementation is remarkably straightforward:

vector<vector<int>> b; // size (n+1)×(m+1)

// compute ∑ᵢ∑ⱼ[0≤i<k][0≤j<l] a[i][j]
int get_prefix_sum(int k, int l) {
    int r = 0;
    for (int i = k; i > 0; i -= i&-i)
        for(int j = l; j > 0; j-= j&-j)
            r += b[i][j];
    return r;
}
// increment a[k][l] by x
void increment_at(int k, int l, int x) {
    for (int i = k + 1; i <= n; i += i&-i)
        for (int j = l + 1; j <= m; j += j&-j)
            b[i][j] += x;
}

Queries and updates both take time $O(\log n\cdot\log m)$.
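
For example (a usage sketch, assuming global sizes n and m as in the code above):

n = 4; m = 4;
b.assign(n + 1, vector<int>(m + 1, 0));
increment_at(1, 2, 5);
int s1 = get_prefix_sum(2, 3); // 5: the cell (1, 2) lies inside the query rectangle
int s2 = get_prefix_sum(2, 2); // 0: it does not, since 2 is not less than 2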

Coordinate Compression

A Fenwick tree maintains a dynamic prefix sum for dense coordinates. I.e., all coordinates $k$ used for queries and updates have to be small. However, if we know the set of coordinates that will be used for updates in advance, we can apply coordinate compression:

vector<int> b;
vector<int> coords;
// add a new coordinate x
void add_coord(int x) {
    coords.push_back(x);
}
// build a Fenwick tree specialized to the given set of coordinates
void build() {
    sort(coords.begin(), coords.end());
    coords.erase(unique(coords.begin(), coords.end()), coords.end()); // remove duplicates
    b.resize(coords.size() + 1);
}
// compute a[0] + ... + a[x - 1].
int get_prefix_sum(int x) {
    int k = lower_bound(coords.begin(), coords.end(), x) - coords.begin(); // min k s.t. coords[k]≥x
    int r = b[0];
    for (int i = k; i > 0; i -= i&-i)
        r = b[i] + r;
    return r;
}
// increment a[x] by v
void increment_at(int x, int v) {
    int k = lower_bound(coords.begin(), coords.end(), x) - coords.begin();
    assert(k < (int)coords.size() && coords[k] == x); // x must be one of the given coordinates
    for (int i = k + 1; i <= (int)coords.size(); i += i&-i)
        b[i] += v;
}

We first call add_coord with all coordinates $x$ that we will later use in increment_at calls. Then we call build and after that we can use the Fenwick tree with arbitrarily large coordinates $x$.
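
For example (a usage sketch with hypothetical coordinates):

add_coord(5); add_coord(42); add_coord(1000000000);
build();
increment_at(5, 3);
increment_at(42, 7);
int s1 = get_prefix_sum(42);         // 3: only the coordinate 5 is smaller than 42
int s2 = get_prefix_sum(43);         // 10
int s3 = get_prefix_sum(2000000000); // 10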

Similarly, the 2D Fenwick tree given above only works in settings where $k$ and $l$ for updates and queries are small. However, if we know the parameters for increment_at in advance, we can associate a sorted list of coordinates to the outer Fenwick tree as well as to each inner Fenwick tree. In this way, we can support 2D queries and updates with very large coordinates.

vector<vector<int>> b;
vector<int> x_coords;
vector<vector<int>> y_coords;
// add a new x coordinate
void add_x_coord(int x) {
    x_coords.push_back(x);
}
// build the outer Fenwick tree
void build_x() {
    sort(x_coords.begin(), x_coords.end());
    x_coords.erase(unique(x_coords.begin(), x_coords.end()), x_coords.end());
    b.resize(x_coords.size() + 1);
    y_coords.resize(x_coords.size() + 1);
}
// add a new y coordinate to inner Fenwick trees corresponding to given x coordinate
void add_y_coord(int x, int y) {
    int k = lower_bound(x_coords.begin(), x_coords.end(), x) - x_coords.begin();
    assert(k < (int)x_coords.size() && x_coords[k] == x);
    for (int i = k + 1; i < (int)y_coords.size(); i += i&-i)
        y_coords[i].push_back(y);
}
// build the inner Fenwick trees
void build_y() {
    for (int i = 1; i < (int)y_coords.size(); i++) {
        sort(y_coords[i].begin(), y_coords[i].end());
        y_coords[i].erase(unique(y_coords[i].begin(), y_coords[i].end()), y_coords[i].end());
        b[i].resize(y_coords[i].size() + 1);
    }
}
// compute ∑ᵢ∑ⱼ[0≤i<x][0≤j<y] a[i][j]
int get_prefix_sum(int x, int y) {
    int k = lower_bound(x_coords.begin(), x_coords.end(), x) - x_coords.begin();
    int r = 0;
    for (int i = k; i > 0; i -= i&-i) {
        int l = lower_bound(y_coords[i].begin(), y_coords[i].end(), y) - y_coords[i].begin();
        for(int j = l; j > 0; j-= j&-j)
            r += b[i][j];
    }
    return r;
}
// increment a[x][y] by v
void increment_at(int x, int y, int v) {
    int k = lower_bound(x_coords.begin(), x_coords.end(), x) - x_coords.begin();
    assert(k < (int)x_coords.size() && x_coords[k] == x);
    for (int i = k + 1; i <= (int)x_coords.size(); i += i&-i) {
        int l = lower_bound(y_coords[i].begin(), y_coords[i].end(), y) - y_coords[i].begin();
        assert(l < (int)y_coords[i].size() && y_coords[i][l] == y);
        for (int j = l + 1; j <= (int)y_coords[i].size(); j += j&-j)
            b[i][j] += v;
    }
}

To initialize, first call add_x_coord with all $x$ coordinates that will occur in increment_at calls. Call build_x, then call add_y_coord with all $(x,y)$ coordinate pairs that will occur in increment_at calls. Call build_y, then the data structure will be ready for calls to get_prefix_sum and increment_at.
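
For example (a usage sketch with two hypothetical points):

add_x_coord(5); add_x_coord(1000000000);
build_x();
add_y_coord(5, 500); add_y_coord(1000000000, 3);
build_y();
increment_at(5, 500, 2);
increment_at(1000000000, 3, 1);
int s1 = get_prefix_sum(6, 1000);       // 2: only the point (5, 500) lies in the query range
int s2 = get_prefix_sum(2000000000, 4); // 1: only the point (1000000000, 3) does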

The total running time will be $\mathcal O(q\cdot(\log q)^{2})$, where $q$ is the number of queries and updates.