
The Cantor-Bernstein-Schröder Theorem

Source: https://rjlipton.wpcomstaging.com/2014/07/31/the-cantor-bernstein-schroder-theorem/

July 31, 2014

And whose theorem is it anyway?

Georg Cantor, Felix Bernstein, and Ernst Schröder are each famous for many things. But together they are famous for stating, trying to prove, and proving, a basic theorem about the cardinality of sets. Actually the first person to prove it was none of them. Cantor had stated it in 1887 in a sentence beginning, “One has the theorem that…,” without fanfare or proof. Richard Dedekind later that year wrote a proof—importantly one avoiding appeal to the axiom of choice (AC)—but neither published it nor told Cantor, and it wasn’t revealed until 1908. Then in 1895 Cantor deduced it from a statement he couldn’t prove that turned out to be equivalent to AC. The next year Schröder wrote a proof but it was wrong. Schröder found a correct proof in 1897, but Bernstein, then a 19-year-old student in Cantor’s seminar, independently found a proof, and perhaps most important, Cantor himself communicated it to the International Congress of Mathematicians that year.

Today I want to go over proofs of this theorem that were written in the 1990s, not the 1890s.

Often Cantor or Schröder gets left out when naming it, but never Bernstein, and Dedekind never gets included. Steve Homer and Alan Selman say “Cantor-Bernstein Theorem” in their textbook, so let’s abbreviate it CBT. The theorem states:

Theorem: If there is an injective map from a set {A} to {B} and also an injective map from {B} to {A}, then the sets {A} and {B} have the same cardinality.

Recall that a map {h: A \longrightarrow B} is injective provided {h(x)=h(y)} implies that {x=y}, surjective if its range is all of {B}, and bijective if it is both injective and surjective. We can thank the great nonexistent French mathematician Nicolas Bourbaki for standardizing these terms, noting that sur is French for “on.” The definition of {A} and {B} having the same cardinality is that there exists a map {h: A \longrightarrow B} that is bijective. All of this applies for sets of any cardinality, finite or infinite.
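
For a quick illustration of the terminology on {\mathbb{N}}: the doubling map is injective but not surjective, while integer halving is surjective but not injective,

\displaystyle  n \mapsto 2n \qquad \text{versus} \qquad n \mapsto \lfloor n/2 \rfloor .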

CBT and Graph Theory

The key insight for me is to think about CBT as a theorem about directed bipartite graphs. This insight is due to Gyula König. He was a set theorist, but his son became a famous graph theorist. So perhaps this explains the insight. The ideas which follow are related to proofs in a 1994 paper by John Conway and Peter Doyle, which is also summarized here and relates to this 2000 note by Berthold Schweizer. We mentioned Doyle recently here.

A directed bipartite graph has two disjoint sets of vertices, the left side and the right side. All edges go only from a left vertex to a right vertex or from a right vertex to a left vertex.

In a directed graph every vertex has an out-degree and an in-degree: the former is the number of edges leaving the vertex and the latter is the number of edges that enter the vertex.

In order to study CBT via graph theory we need to restrict our attention to a special type of directed bipartite graph. Say {G} is an injective bipartite graph provided it is a directed bipartite graph with the following restrictions:

  1. The out-degree of all vertices is one;
  2. The in-degree of all vertices is at most one.

The claim is:

Theorem 2 In any injective directed bipartite graph, the left vertices and the right vertices have the same cardinality.

This theorem proves CBT. Let {f,g} be the injective maps from {A} and {B} as above, where we assume that {A} and {B} are disjoint. Then define {G} as a directed bipartite graph with {A} as the left vertices and {B} as the right. There is an edge from {x \in A} to {f(x) \in B} and an edge from {y \in B} to {g(y) \in A}. The graph is injective: property (1) follows since both {f} and {g} are functions, and property (2) follows since both {f} and {g} are injective.
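
To make the reduction concrete, here is a small Python sketch of the construction. The tiny sets, the particular maps, and the dictionary representation are my own illustrative choices (in the theorem {A} and {B} may of course be infinite); the sketch builds the edges of {G} and checks properties (1) and (2).

```python
# Toy illustration (not from the post): build G from injections
# f: A -> B and g: B -> A, then check the two degree conditions.
A = {"a0", "a1", "a2"}
B = {"b0", "b1", "b2"}
f = {"a0": "b1", "a1": "b2", "a2": "b0"}   # an injection from A to B
g = {"b0": "a0", "b1": "a2", "b2": "a1"}   # an injection from B to A

# Edges of G: x -> f(x) for x in A, and y -> g(y) for y in B.
edges = [(x, f[x]) for x in A] + [(y, g[y]) for y in B]

out_deg = {v: 0 for v in A | B}
in_deg = {v: 0 for v in A | B}
for u, v in edges:
    out_deg[u] += 1
    in_deg[v] += 1

# Property (1): out-degree one everywhere, because f and g are functions.
assert all(d == 1 for d in out_deg.values())
# Property (2): in-degree at most one, because f and g are injective.
assert all(d <= 1 for d in in_deg.values())
```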

Basic Properties

From now on, let {G} be an injective directed bipartite graph with left vertices {A} and right vertices {B}. Note that every path in {G} must alternate left and right vertices, because {G} is bipartite.

The key concept is that of a maximal path. A maximal path in {G} is a path that cannot be extended on either end. One simple example of a maximal path is a cycle:

\displaystyle  x_{1} \rightarrow \cdots \rightarrow x_{m} \rightarrow x_{1}.

However, in an infinite graph there can be other maximal paths. One is a two-way infinite path

\displaystyle  \cdots \rightarrow x_{-m} \rightarrow \cdots \rightarrow x_{-1} \rightarrow x_{0} \rightarrow x_{1} \rightarrow \cdots

And the other is a one-way infinite path

\displaystyle  x_{0} \rightarrow x_{1} \rightarrow \cdots

Here there is no edge into {x_{0}}. A simple but important observation is that two distinct maximal paths have no vertices in common.

A final basic observation is the following: Suppose that {A} is partitioned into sets

\displaystyle  A_{1} \cup A_{2} \cup \dots

and {B} is also partitioned into

\displaystyle  B_{1} \cup B_{2} \cup \dots

If for each index {i}, there is a bijection from {A_{i}} to {B_{i}}, then {A} and {B} have the same cardinality. Let’s call this the partition trick.
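
Spelled out, the partition trick is just gluing: if {h_{i}: A_{i} \rightarrow B_{i}} is a bijection for each {i}, define

\displaystyle  h(x) = h_{i}(x) \quad \text{for the unique } i \text{ with } x \in A_{i} .

This is well defined because the {A_{i}} are disjoint and cover {A}, injective because the {h_{i}} are injective and the {B_{i}} are disjoint, and surjective because the {B_{i}} cover {B}.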

I am trying to include even the simplest idea of the proof. Is this helpful, or am I being too detailed? You can skip the easy parts, but my experience is that people sometimes get hung up on the most basic steps of a proof. This is why I am including all the details.

Finite Graphs

Let’s prove the theorem for the case when the graph is finite.

Theorem 3 An injective directed finite bipartite graph has the same number of left and right vertices.

We had better be able to do this. We claim the following decomposition fact: The graph {G} is the union of disjoint cycles. This follows since the only maximal paths in {G} are cycles. Here is where finiteness enters: following out-edges from any vertex must eventually repeat a vertex, and because in-degrees are at most one, the first repeated vertex is the starting one, closing a cycle.

This proves the theorem, since each cycle has the same number of left vertices as right, and therefore each cycle has a bijection from left to right. By the partition trick the theorem is proved.
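
Here is a small Python sketch of how the cycle decomposition yields the bijection; it is my own illustration, reusing the toy dictionaries from the earlier sketch. Each cycle is walked once, and every left vertex is paired with its successor on its cycle.

```python
# Illustration: in a finite injective bipartite graph every maximal path is
# a cycle, so pairing each left vertex with its successor on its cycle
# gives the bijection from left vertices to right vertices.
def bijection_from_cycles(A, f, g):
    succ = {**f, **g}                 # the unique out-edge of every vertex
    h, visited = {}, set()
    for start in A:
        v = start
        while v not in visited:       # walk the cycle containing `start`
            visited.add(v)
            if v in A:                # pair each left vertex with its successor
                h[v] = succ[v]
            v = succ[v]
    return h

# With the toy A, f, g above, bijection_from_cycles(A, f, g) returns a
# bijection from A onto B (here it is just f itself, since on a cycle the
# f-edges already match the left vertices to the right vertices).
```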

So far, pretty easy.

Infinite Graphs

We now will prove the theorem in the case that {G} is infinite. By the partition trick we need only show that there is a bijection between the left and right vertices of each maximal path. This is clear for cycles, and the same argument works for a two-way infinite path: since the path alternates left and right vertices and every right vertex on it has a predecessor, we can take {h} to be {f} restricted to the left vertices of the path. The case of a one-way infinite path is just barely harder. Let

\displaystyle  x_{0} \rightarrow x_{1} \rightarrow \cdots

be such a path. If {x_{0}} is a left node, then the pairs straddle the {f}-edges: we define {h(x_{0}) = f(x_{0}) = x_{1}}, so the bijection goes back and forth over the first edge in the path (the inverse sends {x_{1}} to {f^{-1}(x_{1}) = x_{0}}); then {h(x_{2}) = f(x_{2}) = x_{3}} over the third edge, and so on. If {x_{0}} is a right vertex, the pairs instead straddle the {g}-edges: {h(x_{j}) = g^{-1}(x_{j}) = x_{j-1}} for odd {j}, and the inverse sends each even-indexed {x_{j}} to {g(x_{j}) = x_{j+1}}. Pretty easy too.
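
The same case analysis can be phrased as a little algorithm: to compute {h(x)} for a left vertex {x}, walk backward along in-edges and see how the maximal path through {x} begins. Here is a Python sketch; it is my own rendering, the helper names are mine, and the partial inverses are assumed to return None where undefined. On a finite toy only the cycle branch can fire, so the example uses infinite sets of numbers where the stopper branches actually occur; a genuinely two-way infinite path would make the backward walk run forever, so this is a sketch of the idea rather than a general procedure.

```python
# Illustration: classify the maximal path through x and pick f or g^{-1}.
# f is the injection A -> B; f_inv and g_inv are the partial inverses of f
# and g, returning None where undefined; in_A tests membership in A.
def cbt_h(x, f, f_inv, g_inv, in_A):
    v, seen = x, set()
    while v not in seen:                        # walk backward along in-edges
        seen.add(v)
        prev = g_inv(v) if in_A(v) else f_inv(v)
        if prev is None:                        # the maximal path starts at v
            return f(x) if in_A(v) else g_inv(x)
        v = prev
    return f(x)                                 # a cycle: f pairs its left vertices

# Toy instance: A = even naturals, B = odd naturals,
# f(a) = a + 3 and g(b) = b + 1, both injective.
in_A  = lambda n: n % 2 == 0
f     = lambda a: a + 3
g_inv = lambda a: a - 1 if a >= 2 else None     # partial inverse of g
f_inv = lambda b: b - 3 if b >= 3 else None     # partial inverse of f
# cbt_h(0, f, f_inv, g_inv, in_A) == 3   (path starts in A, so use f)
# cbt_h(2, f, f_inv, g_inv, in_A) == 1   (path starts in B, so use g^{-1})
```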

Some Puzzles About The Proof

Even though the theorem is about sets of any cardinality—as large as you like—the proof employs only paths that are finite or countable in length. This seems a bit strange—no? I have wondered whether this ability to work only with such paths can be exploited in some manner. I do not see how to use this fact. Oh well.

The countable case, where {A,B \subset \mathbb{N}} and {f,g} are computable injections with {x \in A \iff f(x) \in B} and {y \in B \iff g(y) \in A}, matters most in computability and complexity theory. John Myhill proved that in this case there is also a computable bijection {h} of {\mathbb{N}} with {x \in A \iff h(x) \in B}. This at first seems strange—how can you recognize you are in an infinite cycle?—but it works this way in finite stages {1,2,3,\dots}: At any odd stage, let {x} be the least number for which {h(x)} has not been defined. Let {y_1 = f(x)}. If {y_1} has not been listed as a value of {h} at a previous stage, put {h(x) = y_1}. Else, we have already handled some {x_1} such that {h(x_1) = y_1}. Then {x_1 \neq x} since we hadn’t handled {x}, and {f(x_1) \neq f(x)} since {f} is injective. So put {y_2 = f(x_1)}. If {y_2} has not already been listed as a value of {h}, then put {h(x) = y_2}. Else, we have already handled {x_2} such that {h(x_2) = y_2}, and we repeat with {y_3 = f(x_2)}.

Eventually we must exhaust the finitely many numbers previously placed into the range of {h}, and the first new number {y_i} becomes {h(x)}. Since {y_i} is a value of {f}, inductively we are also preserving the property {x \in A \iff h(x) \in B}. Next, however, we need to do an even stage. Then we call {z} the least number not already placed into the range of {h}, and consider {x_1 = g(z)}. Then {x_1 \in A \iff z \in B}. If {x_1} is not already in the domain of {h}, we define {h(x_1) = z}. Else, we have already handled some {z_1} such that {h(x_1) = z_1}, and inductively {z_1 \in B \iff x_1 \in A \iff z \in B}, so we repeat with {x_2 = g(z_1)}. Eventually we obtain {x_i} not already in the domain of {h}, and define {h(x_i) = z}. This entire process computably defines both {h} and {h^{-1}} in “zipper” fashion for any {x} or {z}, yielding a computable bijection of {\mathbb{N}} that induces an isomorphism between {A} and {B}.
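
Here is a compact Python sketch of this back-and-forth bookkeeping. It is my own rendering of the idea, not Myhill's notation: the stage budget and the example injections are illustrative assumptions, and the sketch only builds a finite piece of {h}.

```python
import itertools

# Illustration of the "zipper": odd stages define h at the least number not
# yet in the domain, even stages put the least number not yet in the range
# into the range, each time chasing previously used values as described above.
def myhill_zipper(f, g, stages):
    h, h_inv = {}, {}
    for stage in range(1, stages + 1):
        if stage % 2 == 1:
            # Odd stage: x is the least number with h(x) undefined.
            x = next(n for n in itertools.count() if n not in h)
            y = f(x)
            while y in h_inv:          # y was already used as a value of h:
                y = f(h_inv[y])        # chase the x_i with h(x_i) = y and retry
            h[x], h_inv[y] = y, x
        else:
            # Even stage: z is the least number not yet in the range of h.
            z = next(n for n in itertools.count() if n not in h_inv)
            x = g(z)
            while x in h:              # x already in the domain of h:
                x = g(h[x])            # chase z_1 = h(x) and retry with g(z_1)
            h[x], h_inv[z] = z, x
    return h

# e.g. myhill_zipper(lambda n: 2 * n, lambda n: 2 * n + 1, stages=8)
```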

In complexity theory, we want {h} to be polynomial-time computable if {f} and {g} are. This is true provided {f} and {g} are length-increasing as well as polynomial-time invertible, but it is not proved by the same algorithm: given an {x} we can’t go back and redo all the previous stages in polynomial time. Instead, the algorithm found by Leonard Berman and Juris Hartmanis alternates trying

\displaystyle  g^{-1}(x),f^{-1}(g^{-1}(x)),g^{-1}(f^{-1}(g^{-1}(x))),\dots

as far as possible, which must stop within length-of-{x} steps because the length is always decreasing. If {g^{-1}} fails first then {h(x) = f(x)}, else {h(x) = g^{-1}(x)}.
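
A Python sketch of this alternation (my own code; the names f_inv and g_inv and the convention that a partial inverse returns None where it is undefined are assumptions of the sketch, and the toy maps simply prepend a bit):

```python
# Illustration: alternately apply g_inv, f_inv, g_inv, ... starting from x.
# Assuming f and g are length-increasing, each successful inverse shortens
# the string, so the chain dies within |x| steps.  Whichever inverse fails
# first decides the value of h(x).
def berman_hartmanis_h(x, f, f_inv, g_inv):
    v, tried_g_inv = x, True
    while True:
        prev = g_inv(v) if tried_g_inv else f_inv(v)
        if prev is None:
            # g_inv failed first -> h(x) = f(x);  f_inv failed first -> h(x) = g_inv(x).
            return f(x) if tried_g_inv else g_inv(x)
        v, tried_g_inv = prev, not tried_g_inv

# Toy maps: f prepends "0", g prepends "1"; their inverses strip that bit.
f     = lambda s: "0" + s
f_inv = lambda s: s[1:] if s.startswith("0") else None
g_inv = lambda s: s[1:] if s.startswith("1") else None
# berman_hartmanis_h("0110", f, f_inv, g_inv) == "00110"   (g_inv fails at once)
# berman_hartmanis_h("10",   f, f_inv, g_inv) == "010"     (g_inv fails on the third step)
```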

This is the second puzzle: why do the ideas have to be so different? Is there a common formulation that might be used with other levels and kinds of complexity? The Berman-Hartmanis proof does resemble the one-way-infinite path case more than it does Myhill’s proof.

Open Problems

Does this help in understanding the proof? There are many proofs of CBT on the web; perhaps this is a better version. Take a look.

[fixed “domain”->”range” in one place in Myhill proof; worked around WordPress bug with length-of-x for |x|.]
