Matchings

Written by Timon Gehr.

Given a graph \(G=(V,E)\) with a given set of vertices \(V\) and a set of edges \(E\), a matching is a subset \(M\subseteq E\) such that the edges in \(M\) are pairwise disjoint (i.e., they do not have a common incident vertex). A matching \(M\) covers a vertex \(v\) if there is some other vertex \(u\) such that the edge \(\{u,v\}\) is in \(M\). Each vertex is covered by zero or one edges in \(M\). A matching \(M\) intersects an edge \(e\) if it covers one of the two incident vertices of \(e\). If a matching \(M\) covers each vertex, then we call \(M\) a perfect matching. Not every graph has a perfect matching, but we can try to find a maximal matching.

A matching is called maximal with respect to inclusion if there is no matching \(M'\neq M\) with \(M\subset M'\subseteq E\). It is easy to find a matching that is maximal in this sense: We iterate over the edges once in order, and add to our matching all edges that are not already incident to a covered vertex. Matchings \(M\) that are maximal in this sense already enjoy a few useful properties. For example, the matching must intersect all edges. This means that each vertex \(v\) is either covered by the matching \(M\), or all its incident vertices are covered by the matching \(M\).

A matching is called maximum (with respect to cardinality) if there is no matching \(M'\subseteq E\) with \(\lvert M\rvert < \lvert M'\rvert\).

Such a matching is a bit more tricky to find, but already now it is easy to give lower and upper bounds on its size which differ by at most a factor of two.

On one hand, a maximum matching \(M\) is at least as large as any inclusion-maximal matching \(M'\). On the other hand, \(M\) cannot contain more than \(2\cdot\lvert M'\rvert\) edges. This is because \(M'\) intersects all edges of \(M\) and each edge \(e'\) in \(M'\) intersects at most two edges of \(M\). (More formally, we have \(M=\bigcup_{e'\in M'}\{e \in M\mid e\cap e'\neq \emptyset\}\), therefore \(\lvert M\rvert\le\sum_{e'\in M'}\lvert\{e\in M\mid e\cap e'\neq \emptyset\}\rvert\le\sum_{e'\in M'}2=2\lvert M'\rvert\).)

But how do we actually find a maximum matching? Our idea will be to start with some matching \(M\) (it could even be empty), and to progressively improve it until it has maximal cardinality. Note that while following this process, we may need to remove some edges from our matching that we already added to it in a previous step, because otherwise we can only hope to find an inclusion-maximal matching. We ensure progress by designing a single iteration of our algorithm such that it improves the cardinality of the matching by at least one.

Let’s now assume that \(M\) is some matching which is not maximum. Consider some maximum matching \(M'\). (We do not know what \(M'\) is and will therefore not be able to use it in our algorithm, but we know that it exists.) Let’s color the edges from \(M\) blue, those from \(M'\) red and consider the symmetric difference \(M\triangle M'\) of \(M\) and \(M'\). (The symmetric difference contains all edges that belong to either \(M\) or \(M'\), but not to both.) Note that this symmetric difference contains more red edges than blue edges, because \(M\)’ is larger than \(M\). Now consider the graph \(G'=(V,M\triangle M')\). Clearly, each vertex in \(G'\) has degree at most \(2\), because it can be incident to at most one blue and one red edge. Each connected component of a graph with maximum degree \(2\) is either a path or a cycle. In our case even more is true: paths and cycles need to alternate between red and blue edges and therefore each cycle in \(G'\) must have even length. It follows that each cycle in \(G'\) contains equally many red and blue edges. Therefore, there must exist a path in \(G'\) which contains more red edges than blue edges, because overall, \(G'\) contains more red edges than blue edges. Such a path must start with a red edge, then alternate between blue and red edges, until it stops with a red edge. Now note that we can improve the size of our matching \(M\) by removing all blue edges from our path from it and adding all red edges from our path to it.

Inspired by those observations, we say that for a matching \(M\), a path in \(G\) is \(M\)-augmenting if it starts and ends with vertices that are not covered by \(M\) and alternates between edges of \(M\) and edges that do not belong to \(M\). If there exists an \(M\)-augmenting path, we can improve our matching by switching all edges on the path into or out of the matching \(M\). On the other hand, we have just shown that whenever \(M\) is not maximum, there is an \(M\)-augmenting path.

Therefore, our algorithm will proceed as follows: While there exists an \(M\)-augmenting path, use it to improve \(M\). This algorithm terminates because \(M\) is improved in each iteration and the maximal cardinality of a matching in \(G\) is bounded. When the algorithm terminates, this means that there does not exist an \(M\)-augmenting path in \(G\), which means that the returned matching \(M\) is maximum.

Computing Maximum Matchings in Bipartite Graphs

It remains to find an algorithm that finds an \(M\)-augmenting path or decides that it does not exist. We will only consider the case where our graph is bipartite. (In a general graph, we can still compute maximal matchings by repeatedly finding augmenting paths, but it is a bit more complicated and outside the scope of IOI.)

In this case, finding an augmenting path is in fact easy: we can use DFS or BFS, slightly adapted such that they alternate between covered and uncovered vertices. We start the graph search from an uncovered vertex and try to find another uncovered vertex reachable on an alternating path. A path found this way is an augmenting path, and if we can’t find any such path, there is in fact no augmenting path.

Because the resulting matching has at most \(\lvert V\rvert/2\) edges, we need to find at most \(O(\lvert V\rvert)\) augmenting paths. The total running time of the algorithm is therefore \(O(\lvert V\rvert\cdot(\lvert V\rvert+\lvert E\rvert))\).

Implementation

We provide an implementation using DFS which combines finding augmenting paths and improving the matching in the same recursion. The first partition contains \(n\) vertices and the second partition contains \(m\) vertices. The graph is stored as an adjacency list, where the nodes in the first partition are numbered from \(0\) to \(n-1\) and the nodes in the second partition are numbered from \(n\) to \(n+m-1\). The matching is stored as a vector of size n. The \(i\)-th entry of the vector contains the number of the partner of node \(i\) in the matching, or \(-1\) if the matching does not cover the vertex.

The DFS is modified as follows: In the first partition, we may only use edges that are in the matching, and in the second partition, we may only use edges that are not in the matching. This can be interpreted as implicitly directing the graph, such that edges in the matching go from the first partition to the second partition, and edges outside the matching go from the second partition to the first partition. In this graph, any path from the second partition to the first partition between two non-covered vertices alternates between non-matching edges and matching edges and therefore is an augmenting path. This is also an easy way to see why DFS can indeed be used to decide whether there is an augmenting path.

vector<vector<int>> g(n + m); // bipartite graph on n+m nodes as adjacency list.
vector<int> matching(n, -1); // the current matching.

vector<bool> visited(n + m);
bool improveMatching(int i) {
    if (visited[i]) return false;
    visited[i] = true;
    if (i < n) { // in first partition, must use matching edge
        if (matching[i] == -1) return true; // found augmenting path
        if (improveMatching(matching[i]))
            return true;
    } else { // in second partition, may only use edges not in the matching
        for (int j : g[i]){
            if (matching[j] == i) continue; // matching edge
            if (improveMatching(j)){
                matching[j] = i; // update matching
                return true; // matching improved
            }
        }
    }
    return false;
}
vector<bool> covered(m);
bool improveMatching() {
    covered.assign(m, false);
    for (int i = 0; i < n; i++)
        if (matching[i] != -1)
            covered[matching[i] - n] = true;
    visited.assign(n + m, false);
    for (int i = n; i < n + m; i++)
        if (!covered[i - n] && improveMatching(i))
            return true;
    return false; // cannot improve matching, it is maximum
}
int maximumMatching() { // returns size of maximum matching
    while (improveMatching()) {}
    return n - (int)count(matching.begin(), matching.end(), -1);
}

(Note that this algorithm would still compute the correct result even if, for the second partition, we didn’t track visited flags nor checked for matching edges.)

The Minimum Vertex Cover Problem

A vertex cover of a graph \(G=(V,E)\) is a subset \(C\subseteq V\) of the vertices such that each edge in \(E\) is incident to at least one vertex in \(C\). A minimum vertex cover is a vertex cover of minimal cardinality. Because each edge in a matching needs to contain at least one vertex of a cover \(C\), each vertex cover is at least as large as each matching. In particular, the minimum vertex cover is at least as large as the size of a maximum matching. Interestingly, even the opposite inequality holds: the size of a minimum vertex cover is exactly the size of a maximum matching. For bipartite graphs, this result is known as Kőnig’s theorem. We can prove it by giving an algorithm which computes a minimum vertex cover from a maximum bipartite matching.

Let \(G=(A\cup B,E)\) be a bipartite graph with first partition \(A\) and second partition \(B\). Let \(M\) be a maximum matching in \(G\). Denote by \(B'\subseteq B\) the set of vertices in \(B\) that are not covered by \(M\). Let \(R\) be the set of vertices that can be reached from the vertices in \(B'\) using a path that alternates between edges not in the matching and edges in the matching. (Note that this is exactly the set of vertices that are visited by the last iteration of the improveMatching algorithm.)

Let \(C=(A\cap R)\cup(B\setminus R)\). I.e., a vertex in \(A\) belongs to \(C\) iff it is reachable by an alternating path, and a vertex in \(B\) belongs to \(C\) iff it is not reachable.

The set \(C\) is in fact a vertex cover. To see this, we show that for each edge \(e\), either the incident vertex in \(A\) is reachable using an alternating path from \(B\) or the incident vertex in \(B\) is not reachable by such an alternating path. We argue for edges inside and outside \(M\) separately.

If \(e\) is in \(M\) and its incident vertex in \(A\) is not reachable from \(B'\) by an alternating path, then neither is the incident vertex in \(B\), because such an alternating path would have to use \(e\). Note that the other direction is also true: if \(A\) is reachable, we can reach \(B\) by using the edge \(e\) in the direction from \(A\) to \(B\). Therefore, \(C\) contains exactly one incident vertex of each edge in \(M\).

If \(e\) is not in \(M\) and its incident vertex in \(B\) is reachable from \(B'\) by an alternating path, then so is the incident vertex in \(A\) (because we can just extend the path by using the edge \(e\) in the direction from \(B\) to \(A\)).

This shows that \(C\) is a vertex cover. Note that every vertex of \(C\) is covered by \(M\): By construction \(C\cap B=B\setminus R\subseteq B\setminus B'\) does not contain any vertices that are not covered, because those vertices by definition belong to \(B'\). On the other hand, if there was some vertex in \(C\cap A=A\cap R\) that was not covered by \(M\), then this vertex would be the end of an augmenting path, which is impossible because \(M\) is maximum.

Therefore, each vertex in \(C\) is incident to some edge in \(M\). On the other hand, we know that \(C\) contains exactly one incident vertex of each edge in \(M\), which implies that \(\lvert C\rvert=\lvert M\rvert\). Therefore, \(C\) is a minimum vertex cover.

Assuming the above implementation of maximumMatching, we can compute a minimum vertex cover as follows:

vector<int> cover;
int minimumVertexCover() {
    int size = maximumMatching();
    for (int i = 0; i < n; i++)
        if (visited[i]) cover.push_back(i);
    for (int i = n; i < n + m; i++)
        if (!visited[i]) cover.push_back(i);
    assert(cover.size() == size);
    return size;
}