Topological Sorting

Written by Luc Haller and Raphael Fischer. Translated by Luc Haller.

This tutorial assumes that you already know what a graph is and how depth first search (DFS) works. You should therefore read those tutorials first.

The problem

The goal of toposort is to find a certain order of the vertices of a directed graph. More concretely we are given a directed graph and want to order its vertices such that all edges go from the left to the right. For example we might have a list of dependencies of different tasks, where some tasks can only be started after others are finished. The goal is to order the tasks in such a way that all of them can be performed. Thus, all dependencies of a task have to be finished when a task is started. Toposort finds such an ordering.

The problem is modeled as a graph where the vertices represent the tasks and the edges represent the dependencies. An edge from vertex A to vertex B means that A needs to be done after B.

Example

Let’s look at a concrete example: There are five tasks: A, B, C, D, E. We need to execute B after A, E after D, and E after A.

The corresponding graph looks like this:

%3 B B A A B->A E E E->A D D E->D C C

There are multiple ways to satisfy the constraints, for example the ordering (A, B, D, E, C).

Solution idea

The first observation is that the dependency graph cannot contain any cycles, otherwise a topological ordering cannot exist (if A has to be executed after B, but also B after A, it is impossible to order them). We will see later that this is also a sufficient condition, that is, for every acyclic graph there is a topological ordering.

The main idea of the algorithm is that a task can be executed as soon as all of its dependencies have been finished. In the graph representation this means that we delete vertices which are done, and that we can put a vertex in the ordering / execute it as soon as it has no outgoing edges anymore.

To implement this idea efficiently we use a modified DFS. We start DFS at an arbitrary vertex, i.e. we follow edges to unvisited vertices as long as possible and remember the path that we took. If we encounter an edge to a vertex which we visited earlier in the path, the graph has a cycle, and therefore there is no solution. Otherwise we get to a vertex without outgoing edges in the end, that is, a vertex that we can put in the ordering now. From here, the DFS returns to the previous vertex on the path etc. In doing so we always put vertices in the ordering when they are removed from the current DFS-path (that is, when we are leaving them on the way back). If not all vertices have been visited by the end of DFS, we repeat everything from an unvisited vertex, ignoring all edges to vertices that we already saw in an earlier run.

From this explanation it follows that we only put vertices in the ordering when all their out-neighbours have been finished, therefore the output is a correct topological order. We can also see that the algorithm always finds a topological ordering if there is no cycle. Therefore one exists for every directed acyclic graph.

The running time is that of DFS plus what we need to put vertices in the topological ordering, that is \(O(n+m) + O(n) = O(n+m)\), where \(n\) is the number of vertices and \(m\) the number of edges. The memory usage is also that of DFS plus \(O(n)\), therefore still \(O(n)\).

Applying the algorithm to the example

We can apply the algorithm by hand to the example from above. One possible execution of toposort is as follows: - We start at A. A has no outgoing edges, so we output A. - We start again at B. We ignore the edge to A because we have already finished it. B has no other outgoing edges, so we output B. - We start again at E. We again ignore the edge to A because we have already finished it. We remember E as our current path and go to D. - D has no outgoing edges, so we output D and go back to E. - E has no outgoing edges to new vertices anymore, so we output E. - We start again at C. C has no outgoing edges, so we output C.

The output is therefore A B D E C.

Implementation

#include <iostream>
#include <vector>
using namespace std;

// Do a depth first search from vertex v, and put the vertices in toposort-order into sorted.
// Return true if a cycle was found, otherwise return false.
bool dfs(const vector<vector<int>>& g, vector<bool>& visited, vector<bool>& currentsearch, vector<int>& sorted, int v) {
    visited[v] = true; // we have handled the vertex
    currentsearch[v] = true; // the vertex is now in the current search path
    for(int next: g[v]) {
        if(currentsearch[next]) // found a cycle
            return true;
        if(visited[next]) // ignore finished vertices
            continue;
        if(dfs(g, visited, currentsearch, sorted, next)) // do DFS recursively from next
            return true;
    }
    sorted.push_back(v);
    currentsearch[v] = false; // the vertex isn't in the current search path anymore
    return false;
}

// Return a topological ordering of the vertices of graph g or an empty vector if none exists.
vector<int> toposort(const vector<vector<int>>& g) {
    int n = g.size();
    vector<bool> visited(n, false);
    vector<int> sorted;
    vector<bool> currentsearch(n, false); // this tells us if a vertex is in the current search path
    for(int i=0; i<n; ++i) {
        if(visited[i])
            continue;
        if(dfs(g, visited, currentsearch, sorted, i))
            return vector<int>(); // DFS found a cycle in the graph, therefore no topological ordering exists.
    }
    return sorted;
}

int main() {
    // We read the graph in the usual format and store it as an adjacency list.
    int n, m;
    cin >> n >> m;
    vector<vector<int>> g(n);
    for(int i=0; i<m; ++i) {
        int v, w;
        cin >> v >> w;
        g[v].push_back(w);
    }
    // We can now call our toposort function.
    vector<int> ts = toposort(g);
    // We output the result.
    if(n != 0 && ts.size() == 0)
        cout << "There is no topological order.\n";
    else {
        cout << "One possible topological order of the vertices: ";
        for(int v: ts)
            cout << v << " ";
        cout << "\n";
    }
}

Maybe you are wondering why we pass the graph as const vector<vector<int>>& g to the function toposort, instead of simply as vector<vector<int>> g. The & means that the graph doesn’t get copied for the function call, instead the original object is used. The const means that we don’t change it in the function. So the difference is just that we save running time and memory because no unnecessary copies are made.

Checking for cycles

As mentioned above the algorithm also detects if a topological ordering exists at all. It does this by checking if the graph contains a cycle. In contrast to undirected graphs it isn’t sufficient to check if we only encounter previously unvisited vertices. Here we need to check if an encountered vertex is on the path (that DFS took) from the starting vertex to the current vertex. This is done using the vector currentsearch: each entry currentsearch[v] is true if the vertex v is on this path.

Example: The DFS was started at a and has already taken all colored edges. The blue vertices (i.e. d) are only marked in visited, the red vertices (a, b, c, e) also in currentsearch. Note that at d there is no cycle, even though we found a second edge to this already visited vertex in some (blue) step. At the orange edge a cycle is being detected, since b is in the current path:

%3 a a b b a->b e e a->e c c b->c d d b->d c->d c->e e->b