Dynamical systems and Ramsey theory – Ramsey DocCourse Prague 2016 – Mike Pawliuk

The following notes are from the Ramsey DocCourse in Prague 2016. The notes are taken by me and I have edited them. In the process I may have introduced some errors; email me or comment below and I will happily fix them.

Title: Ergodic theory and amenable groups

Lecturer: Benjamin Weiss

Date: October 31, 2016.

Main Topics: Three equivalent notions of amenability, Basic concepts in ergodic actions, Furstenberg’s Ergodic proof of Szemerédi’s theorem

Definitions: Ergodic action, weak mixing, mixing, Banach limit, amenable group, left invariant mean, paradoxical decomposition, Følner sequence

Introduction

Our presentation will go in reverse order that Benji Weiss presented. This is mostly to highlight the presentation of amenable groups, which is independent from ergodic actions. The punchline is that the tools of ergodic theory apply in amenable groups.

In the first half we present three equivalent definitions of amenable group. In the second half we look at basic concepts, theorem and equivalences in ergodic theory.

Amenable groups

We will present three equivalent definitions of an amenable group and then show their equivalence.

Almost invariance, Følner’s condition

We start with a practical definition that captures the asymptotic invariance of a group’s action.

Definition 1 (Følner 1955). Let $G$ be a discrete countable group.

For $A,B \subseteq G$ define the pointwise product $AB = \{ab : a \in A, b \in B\}$ . Recall $[X]^{<c} := \{Y \subseteq X : \vert Y \vert < \vert c \vert\}$ , and we use $\omega = \vert \mathbb{N} \vert$ . The symmetric difference is denoted $A \Delta B$ .

We say $G$ is amenable if there is a sequence $(F_n)_{n \in \mathbb{N}}$ of finite subsets of $G$ such that for all $K \in [G]^{<\omega}$ we have

$\displaystyle \lim_{n \rightarrow \infty} \frac{\vert KF_n \Delta F_n \vert}{\vert F_n \vert} = 0.$

Such a sequence $\langle F_n \rangle$ is called a Følner sequence.

Exercise. Note that we may assume that $K$ contains the identity of $G$ and is symmetric, i.e. $K = K^{-1} := \{k^{-1} : k \in K\}$ . Such an assumption only makes it harder for a group to be amenable. Show that in this case in the definition of amenability we can replace the quantity $\vert KF_n \Delta F_n \vert$ by $\vert KF_n \setminus F_n \vert$ .

Later on we will need a finitary version of this definition. We add it here for contrast, but I realize it might be too abstract to grok at this moment. Feel free to come back to it later after you’ve seen some examples of amenable groups.

Definition 1 (finite version). Let $G$ be a discrete countable group.

We say $G$ is amenable if for all $K \in [G]^{0$ , there is a $F \in [G]^{<\omega }$ such that

$\displaystyle \frac{\vert KF \setminus F \vert}{\vert F \vert} = < \epsilon.$

Exercise. Here are three abstract exercises to encourage you to play with the definitions. Define $E(K,F) = \frac{\vert KF \setminus F \vert}{\vert F \vert}$ .

Show if a group is amenable is the sense of definition 1, then it is amenable in the finite version of definition 1. (Easy.)
Suppose that a group $G$ is not amenable. Write down what the finite version of definition 1 tells you. (Easy.) Interpret this result as a statement about Cheeger constants.
Show that if $K_1 \subset K_2$ , then $E(K_1, F) \leq E(K_2, F)$ for all $F$ . (Easy.)
Show that $\vert K(F_1 \cup F_2) \setminus (F_1 \cup F_2) \vert \leq \vert KF_1 \setminus F_1 \vert + \vert KF_2 \setminus F_2 \vert$ . (Do it in two steps.) (Easy-ish.)
Show if a group is amenable is the sense of definition 1, then it is amenable in the finite version of definition 1. Use exercise (4) above. (Medium.)

Examples

Example. Consider $\mathbb{Z}$ with the discrete topology. Let $F_n = [-n, n]$ . For any finite set $K \subset \mathbb{Z}$ , let $N = \max\{\vert k \vert : k \in K\}$ . Note that $\vert K[-N^2, N^2] \Delta [-N^2, N^2] \vert \approx 2N << N^2$ . (Exercise. Check this last line. Draw a picture.)

We look at two other quick examples of amenable classes of groups. We will not need these, but they illustrate that amenable groups form a broad class.

Fact. Abelian groups are amenable.

The “Meatloaf” lemma.. Let $H$ be a normal subgroup of $G$ . If any two of the following are amenable

$G$ ,
$H$ ,
the quotient $G$ by $H$ ,

then the third is amenable.

From this we can conclude that solvable groups are amenable.

Exercise. The proof is left as an exercise. You may wish to watch this video to explain the name.

Means and ultrafilter limits

The Følner condition definition was not the original definition for amenable groups. We present the original definition (due to von Neumann) here.

Definition 2 (von Neumann 30s?). Let $G$ be a discrete countable group. Let $L^\infty(G)$ be the collection of bounded functions from $G$ to $\mathbb{C}$ . Let $\chi_X$ be the indicator function of $X \subseteq G$ .

A (multiplicative left $G$ -invariant) mean is a linear functional $\Lambda: L^\infty (G) \rightarrow \mathbb{C}$ such that

$\Lambda$ is positive ( $f \geq 0$ implies $\Lambda(f) \geq 0$ ), of bounded norm 1 and $\Lambda(\chi_G) = 1$ (mean),
$\Lambda (g \cdot f) = \Lambda (f)$ for all $g \in G, f \in L^\infty (G)$ where $(g \cdot f)(x) = f(g^{-1}x)$ (left $G$ -invariant),
$\Lambda(fh) = \Lambda(f)\Lambda(h)$ for all $f,h \in L^\infty (G)$ (multiplicative).

A group $G$ is amenable if there is a (multiplicative left $G$ -invariant) mean on $L^{\infty}(G)$ .

Example. In particular, a mean $\Lambda$ on $L^{\infty}(G)$ defines a left $G$ -invariant probability measure $\mu$ on $G$ by defining $\mu(X) = \Lambda(\chi_X)$ . (Exercise. Check this.)

The example shows that a mean contains at least enough information to define a measure on $G$ . However, a mean also assigns values to bounded linear functions that are not characteristic functions. In practice, this means that constructing a left $G$ -invariant mean can be complicated, even in the case where $G=\mathbb{Z}$ . In most (all?) interesting cases, the construction of a mean will involve a use of the axiom of choice. Sometimes this will be Zorn’s Lemma and other times it will be an ultrafilter limit (or Banach limit).

Exercise. The following series of exercises will walk you through creating a left $\mathbb{Z}$ -invariant mean on $L^{\infty} (\mathbb{Z})$ . Look at Chapter 28, Problem 17 of the wonderful book Problems and Theorems in Classical Set Theory by Péter Komjáth. Note the important use of the axiom of choice.

Before we show that definition 1 and 2 are equivalent, we remind you about the notion of an ultrafilter limit. It replaces one notion of largeness (“the tail of a sequence”) with an ultrafilter notion of largeness (“in a specific ultrafilter”).

Definition. Let $\mathcal{U}$ be an ultrafilter on $\mathbb{N}$ . Let $(x_n)_{n \in \mathbb{N}}$ be a sequence in $\mathbb{R}$ . For $L \in \mathbb{R}$ , define the $\mathcal{U}$ -limit of $(x_n)$ by

$\displaystyle \lim_\mathcal{U} x_n = L \Leftrightarrow (\forall \epsilon > 0)(\exists A \in \mathcal{U})(\forall n \in A)[\vert x_n - L \vert < \epsilon].$
Such a limit is also sometimes called a Banach limit.

The remarkable fact is that ultrafilter limits always converge! This is useful when defining measures and means.

Fact. Let $\mathbb{U}$ be an ultrafilter on $\mathbb{N}$ . For every sequence $(x_n)_{n \in \mathbb{N}}$ of real numbers, there is a (unique) $L \in \mathbb{R}$ such that $\lim_\mathcal{U} x_n = L$ .

In particular, it says that an ultrafilter can decide whether the sequence $(-1)^n$ converges to $1$ or $-1$ (depending on whether the evens are an ultrafilter set or whether the odds are an ultrafilter set.)

Exercise. If you’re interested in ultrafilter limits, here are some exercises for you.

Prove the above fact (including uniqueness).
Prove that different ultrafilter limits may assign different values to a sequence $(x_n)$ .
Suppose that $\lim_{n \rightarrow \infty} x_n = L$ (in the first year calculus sense). Prove that $\lim_\mathcal{U} x_n = L$ , for any (nonprinciple) ultrafilter $\mathcal{U}$ .

You can find more exercises in Chapter 17, problems 19 and 20 of Komjáth’s “Problems and Theorems in Classical Set Theory“.

Now we can prove the following result.

Proposition (Følner 1955). If a group is amenable in the sense of Følner, definition 1, then it is amenable in the sense of Von Neumann, definition 2.

Proof. Suppose that $G$ satisfies the Følner condition, so has a Følner sequence $(F_n)_{n \in \mathbb{N}}$ . Let $\mathcal{U}$ be an ultrafilter on $\mathbb{N}$ . We define a linear functional $\Lambda: L^\infty (G) \rightarrow \mathbb{C}$ by

$\displaystyle \Lambda(\phi) = \lim_{\mathcal{U}} \frac{1}{\vert F_n \vert} \sum_{g \in F_n} \phi(g).$
It is an exercise to check $G$ -invariance of $\Lambda$ . See the exercises after the definition of mean for intuition on why this is the correct way to define a mean.

Paradoxical decompositions

So far we have not spoken about non-amenable groups. This is partly because the conditions we have are better suited for showing amenability. We fix that by introducing another equivalent definition for amenability.

This definition is motivated by von Neumann’s study of the Banach-Tarski paradox.

Definition 3. We say that a group $G$ has a paradoxical decomposition if there is a partition $G$ into disjoint sets $A_1, \ldots, A_k, B_1, \ldots, B_l$ and elements $\{g_i : 1 \leq i \leq n_A\}$ and $\{h_i : 1 \leq i \leq n_B\}$ such that

$\displaystyle \bigcup_{i=1}^{n_A} g_i A_i = G = \bigcup_{i=1}^{n_B} h_i B_i.$

Exercise. Show that it is equivalent to ask that the sets $A_1, \ldots, A_k, B_1, \ldots, B_l$ are merely disjoint; their union need not be all of $G$ .

For other exercises, consult Chapter 19 of Komjáth’s “Problems and Theorems in Classical Set Theory“.

Example. Let $F_2$ be the free group on generators $a,b$ , with empty word $e$ . Let $W(a)$ be the set of words in $F_2$ that start with an $a$ . Similarly we define $W(b), W(a^{-1})$ and $W(b^{-1})$ . Note that

$\displaystyle F_2 = W(a) \cup W(b) \cup W(a^{-1}) \cup W(b^{-1}) \cup \{e\}.$
Also we see that

$\displaystyle aW(a^{-1}) \cup a^{-1}W(a) \cup e\{e\} = F_2 = bW(b^{-1}) \cup b^{-1}W(b).$
(Exercise. Ignoring $\{e\}$ , find a paradoxical decomposition of $F_2$ where the $g_i A_i$ are pairwise disjoint, and the $h_i B_i$ are pairwise disjoint.)

The forward direction of the following is an easy exercise.

Proposition (Tarksi). If group has a paradoxical decomposition, then it does not have a left invariant measure (and hence it is not amenable in the sense of von Neumann).

Conversely, if a group is amenable in the sense of von Neumann, then it does not have a paradoxical decomposition.

Corollary. The free group $F_2$ is not amenable.

Tarski’s original proof is combinatorial. We will give a (shocking) direct proof of the following fact, using graph theory.

Theorem (Deuber-Simonovits-Sos 1995). If a countable discrete group $G$ is amenable in the sense of Følner (definition 1), then it has a paradoxical decomposition.

In order to prove this, we will require the following infinite version of Hall’s marriage theorem.

Definition. Let $G^\prime = (V^\prime, E^\prime)$ be a graph, and $S \in [V^\prime]^{<\omega}$ .

Define $N(S) = \{v \in V^\prime : \exists w \in S, \{v,w\} \in E^\prime\}$ .

Now assume $V^\prime = V_1 \cup V_2$ is a bipartition. For $k,l \in \mathbb{N}$ a $(k,l)$ matching is a subcollection $E \subseteq E^\prime$ such that every vertex in $V_1$ has degree $k$ (in $E$ ) and every vertex in $V_2$ has degree $l$ (in $E$ ).

Theorem (Generalized Hall-Rado). Let $G^\prime = (V_1 \sqcup V_2, E^\prime)$ be a countable, locally finite (i.e every vertex has finite degree) bipartite graph. Then there is a $(k,l)$ -matching iff

$\forall S \in [V_1]^{<\omega}$ we have $\vert N(S) \vert \geq k \vert S \vert$ , and
$\forall S \in [V_2]^{<\omega}$ we have $\vert N(S) \vert \geq l \vert S \vert$ .

Proof of Deuber-Simonovits. We prove the contrapositive. Suppose that $G$ fails the finite Følner condition. We will find a condition that can be turned into an estimate for the Hall-Rado theorem. (It will become clear later what graph we will be using. You can skip to the box below if you only care about what that estimate is.)

So there is a symmetric $K_0 \in [G]^{ 0$ such that for all $F \in [G]^{<\omega}$ we have $\frac{\vert K_0 F \setminus F \vert}{\vert F \vert} \geq 0$ .

Note that this can be interpreted as an "expansion property". It can be interpreted to say that the Cheeger constant of a Cayley graph is positive. (That remark was just for intuition, we will not use it.)

This gives the estimate $(1 + \epsilon_0) \vert F \vert \leq \vert K_0 F \vert$ .

Applying it again gives $(1 + \epsilon_0)^2 \vert F \vert \leq (1 + \epsilon_0)\vert K_0 F \vert \leq \vert K_0^2 F \vert$ .

In general after $d$ applications we get $(1 + \epsilon_0)^d \vert F \vert \leq \vert K_0^d F \vert$ .

Choose a $d_0$ large enough so that $2 \leq (1 + \epsilon_0)^d$ . Let $K_1 := K_0^{d_0}$ . Note that it is symmetric.

We summarize this information:

Expansion condition There is a symmetric $K_1 \in [G]^{<\omega}$ such that for all $F \in [G]^{<\omega}$ we have

$\displaystyle 2 \vert F \vert \leq \vert K_1 F \vert.$

We set up an application of the generalized Hall-Rado theorem to a graph $G^\prime$ . Let $G^\prime$ be a bipartite graph where $V_1$ and $V_2$ are each copies of $G$ . Put an edge $\{g_1, g_2\} \in E$ if $g_1,g_2$ are in different copies of $V_i$ and there is a $k \in K_1$ such that $g_2 = kg_1$ . (Since $K_1$ is symmetric this is a graph and not a directed graph.)

Note that the expansion condition, applied to $F = \{v\}$ , says that every vertex in $G$ must have at least $2$ neighbours. In fact, the expansion condition gives us the required condition for the Hall-Rado theorem for $(k,l) = (2,1)$ .

It produces two functions $\phi: A \rightarrow G, \psi: B \rightarrow G$ with disjoint domains ( $A,B \subset G$ ) such that $\{a, \phi(a)\}$ and $\{b, \psi(b)\}$ are always edges and $\text{range}(\phi) = G = \text{range}(\phi)$ .

This means that $K_1 A = G$ and $K_1 B = G$ .

Exercise. Convince yourself (1) that the previous sentence is true, (2) that this is actually a paradoxical decomposition and (3) the Hall-Rado theorem actually gives us two such functions.

Summary

Here is a summary of the previous sections.

Theorem. Let $G$ be a discrete countable group. TFAE:

$G$ has a Følner sequence,
$L^{\infty}(G)$ admits a multiplicative left $G$ -invariant mean,
$G$ has no paradoxical decompositions.

If any of these conditions happen, then we say that $G$ is amenable.

Proof strategies.

[ $(1) \Leftrightarrow (1^\prime)$ .] We left this as an exercise using a diagonalization argument.

[ $(1) \Rightarrow (2)$ .] This used the concept of a $\mathcal{U}$ -limit.

[ $(2) \Rightarrow (3)$ .] The fact that $\neg (3) \Rightarrow \neg (2)$ is more or less obvious and is left as an exercise.

[ $(3) \Rightarrow (1^\prime)$ .] We presented the Hall’s Marriage argument that showed $\neg (1^\prime) \Rightarrow \neg (3)$ .

Historically, we also have the following

[ $(2) \Rightarrow (1)$ .] Due to Følner 1955.

[ $(3) \Rightarrow (2)$ .] Due to Tarski.

Ergodic theory

Now we switch gears to talk about the ergodic theory. We introduce the basic notions with the ultimate goal of wanting to show how the tools of ergodic theory have been used to show Ramsey results.

Definitions and examples

Definition. An measure preserving action on a measure space is a quadruple $\mathcal{X} = (X, \mathcal{B}, \mu, T)$ where $(X,\mathcal{B}, \mu)$ is a probability measure space and $T:X \longrightarrow X$ has the properties:

$T[\mathcal{B}] = \mathcal{B}$ , and
$\mu(TA) = \mu(A)$ for all $A \in \mathcal{B}$ . (measure preserving)

The action is ergodic if in addition:

If $\mu(TA \Delta A) = 0$ then $\mu(A) \in \{0,1\}$ .

We now present two classic examples.

Example 1. Irrational rotation. Fix an irrational number $r$ . Let $\mathbb{T}^1$ be $\mathbb{R}$ mod $\mathbb{Z}$ , which is the circle. Let $T(x) = x+r$ . Exercise: Show that this is an ergodic action.

Example 2. Coin flipping, the 2-shift. Let $X = \{0,1\}^\mathbb{Z}$ , with the product measure where each point in $\{0,1\}$ gets measure $\frac{1}{2}$ . This is a compact space. The action $T(\ldots, x_{-2}, x_{-1}, x_0, x_1, x_2, \ldots) = (\ldots, x_{-3}, x_{-2}, x_{-1}, x_0, x_1, \ldots)$ .

The ergodic theorem

The ergodic theorem, first proved by Birkhoff in 1931.

Theorem (Birkhoff 1931). [THE ERGODIC THEOREM]

Products, mixing and weak-mixing

We now get to two fundamental properties that an ergodic action may have. First we introduce the product action.

Definition. Let $\mathcal{X} = (X, \mathcal{B}, \mu, T)$ be an action on a measure space. The product action on the product space is $\mathcal{X}^2 = (X\times X, \mathcal{B} \times \mathcal{B}, \mu \star \mu, T\oplus T)$ , where this is the product measure and the componentwise action.

A natural question is “When is the product of an ergodic action still ergodic?”. Let’s see what the answer is for our earlier examples.

Example 1. For an irrational rotation of $\mathbb{T}$ , the product is not ergodic. Consider the subspace where $x-y$ is a constant, then $(x+\alpha)-(y+\alpha) = x-y$ , so this will be an invariant subspace.

Example 2. The product of the coin flipping shift action is just $\{00,01,10,11\}^\mathbb{Z}$ where each point has measure $\frac{1}{4}$ . This is ergodic.

This gives us the following definition.

Definition. Let $\mathcal{X} = (X, \mathcal{B}, \mu, T)$ be an ergodic action on a measure space. It is weakly mixing if the product is ergodic.

The name comes from the following property which we will not really study here.

Definition. Let $\mathcal{X} = (X, \mathcal{B}, \mu, T)$ be an action on a measure space. It is mixing if for all $A,B \in \mathbb{B}$ we have
$\displaystyle \lim_{n \rightarrow \infty} \mu(T^{-n}A \cap B) = \mu(A)\mu(B).$

I don’t know why I wrote this. This is verbatim..
In $L^2$ ,
$\displaystyle \lim_{n \rightarrow \infty} \frac{1}{n} \sum_{i=0}^{n-1} \mu(T^{-i}A \cap B) = \mu(A)\mu(B).$

The following explains the name “weak-mixing”.

Fact. If an action is weak-mixing, then there is a set $D \subset \mathbb{N}$ of density $1$ such that
$\displaystyle \lim_{D} \mu(T^{-n}A \cap B) = \mu(A)\mu(B).$

Definition. Let $A \subset \mathbb{N}$ . The (upper) density is defined as
$\displaystyle D(A) = \limsup_{n \rightarrow \infty} \frac{\vert A \cap [0, 1 \ldots, n-1] \vert }{n}.$

Exercise. Density is just a measure of largeness, but here are some exercises about it if you want to understand it better.

$D(\mathbb{N}) = 1$
The set of all even numbers has density $\frac{1}{2}$ .
The set of square numbers has density $0$ .
Every finite set has measure $0$ .
If $D(A)$ is defined, then $D(\mathbb{N}\setminus A) = 1 - D(A)$ .
There are sets where the density is not defined.

There was a question from the audience about mixing vs weak-mixing. The following example distinguishes them.

Example. Let $MPT$ be the group of all measure-preserving transformations of $[0,1]$ with the Lebesgue measure.

Topologize it to make it a Polish group so that $T_n \rightarrow T$ if for all $f \in L^2$ we have $\|T_n f - T f\|_{L^2} = 0$ .

Generically, $T$ is weak-mixing but not mixing.

This is actually two results, one due to Halmos (“In general a measure preserving transformation is mixing” 1944) and one due to Rokhlin (“A ‘general’ measure-preserving transformation is not mixing.” 1948). The titles are hilariously contradictory, but in reality one is referring to weak-mixing and the other is refering to mixing.

The correspondence principle

Ultimately, we will want to talk about Furstenberg’s 1977 proof of Szemerédi’s theorem. To do this we introduce the correspondence principle, which connects measure and density.

The Correspondence Pinciple. Let $\hat{A} \subset \mathbb{N}$ have positive density. There is an ergodic system $(X, \mathbb{B}, \mu, T)$ and a set $A \in \mathbb{B}$ such that $\forall t_1, \ldots, t_l \in \mathbb{N}$ TFAE:

$\mu(T^{-{t_1}}A \cap \ldots \cap T^{-{t_l}}A) > 0$ ,
$(\hat{A} - t_1)\cap \ldots \cap (\hat{A} - t_l)$ has positive density.

Whatever happens here is relfected in the set of positive density. Once you have multiple recurrence in the ergodic setting, then you get it in the set of positive density.

Furstenberg’s proof of Szemerédi’s theorem uses the following ergodic result.

Theorem (Furstenberg 1977). If $\mathcal{X}$ is weak-mixing, $A \in \mathbb{B}$ , and $k$ is fixed, then

$\displaystyle \lim_{n \rightarrow \infty}\frac{1}{n} \sum_{i=0}^{n-1} \mu(A \cap T^n A \cap \ldots \cap T^{(k-1)n} A) = \mu(A)^k.$

Sketch of proof of the correspondence principle. Fix $\hat{A} \subset \mathbb{N}$ of positive density. Let $X = \{0,1\}^\mathbb{Z}$ .

We need to define a measure on each set $X[g]$ , where $g: F \rightarrow \{0,1\}$ and $F \subset \mathbb{Z}$ is finite. Here $X[g] : \{f \in X : f(x)=g(x), \forall x \in F\}$ is the basic open set given by the finite function $g$ . It will be enough to define the measure on this set because of the Caratheodory extension theorem.

Fix an ultrafilter $\mathcal{U}$ . For a finite $g:F \rightarrow \{0,1\}$ , define

$\displaystyle \mu(X[g]) = \lim_{\mathcal{U}} \frac{\rho(g,F,n)}{n},$
where $\rho(g,F,n)$ counts the number of times that the “pattern of $g$ occurs in $\hat{A}$ “, to be precise

$\displaystyle \sum_{i=0}^{n-1} \text{BOOLE}(g(x+i) = \chi_\hat{A} (x+i), \forall x \in F)$
where $\text{BOOLE}$ is a function that evalutes to $1$ if the statement is true and $0$ if it is false.

Note that this isn’t an ergodic measure, but we can fix that.

Multiple recurrence for an irrational rotation. Let $T$ be an irrational rotation. There is a sequence $n_i \rightarrow \infty$ such that for a fixed $k$ ,

$\displaystyle \lim_{i \rightarrow \infty} \mu(A \cap T^{-n_i} A \cap T^{-2n_i} A \cap \ldots \cap T^{-(k-1)n_i} A) = \mu(A)^k.$

Proof. Suppose that $T$ is given by the rotation by $\alpha$ . Take a suquence $n_i$ so that $\lim_{i \rightarrow \infty} n_i \alpha = 0$ .

In the next lecture we will see how this applies to Furstenberg’s proof.

Amenable groups and ergodic theory

We now make the connection between amenable groups and ergodic theory.

Definition. We need a structural theorem for $L_2$ that can be proved using a Reiss-representation style argument. Suppose $(X, \mathcal{B}, \mu, T_g)$ is a measure-preserving action of $X$ for each $g \in G$ .

Define $H_0 \leq L^2$ to be the linear span of the functions of the form $\Psi(T_a x) - \Psi(x)$ , where $\Psi \in L^\infty$ and $a \in G$ .

Define $\text{Inv} \leq L^2$ to be the $G$ -invariant onto functions.

We can write $L^2 = \text{Inv} \oplus H_0$ , and it comes with two projection functions that project functions down to the corresponding subspace.

The $L^2$ ergodic theorem for amenable groups. Suppose $G$ is amenable, $\langle F_n \rangle$ is a Følner sequence, and $(X, \mathcal{B}, \mu, T_g)$ is a measure-preserving action of $X$ for each $g \in G$ . Then for all $f \in L^2(X, \mathcal{B}, \mu)$ we have

$\displaystyle \lim_{n \rightarrow \infty} \frac{1}{\vert F_n \vert} \sum_{g \in F_n} f(T_g x) = \text{Proj}_\text{Inv}(f).$

Note that

$\displaystyle \lim_{n \rightarrow \infty} \frac{1}{\vert F_n \vert} \sum_{g \in F_n} \left(\Psi(T_g T_a x) - \Psi(T_g x) \right) = 0.$

For example, we can look at the 2-shift of an amenable group $G$ as it acts on $\{0,1\}^G$ , with the product measure where the points in $\{0,1\}$ each get measure $\frac{1}{2}$ .

References

Birkhoff (“Proof of the ergodic theorem“, 1931)
Deuber-Simonovits-Sós (“A note on paradoxical metric spaces.”, 1995) – Graph theory proof.
Halmos (“In general a measure preserving transformation is mixing“, 1944) – Weak mixing
Rokhlin (“A ‘general’ measure-preserving transformation is not mixing”, 1948) – not strong mixing
Furstenberg (“Ergodic behavior of diagonal measures and a theorem of Szemerédi on arithmetic progressions“, 1977)
Szemerédi (“On sets of integers containing no $k$ elements in arithmetic progression“, 1975) – Proof of Szemerédi’s theorem
Følner (“On groups with full Banach mean value“, 1955) – equivalence of amenable

Expository material and exercises

Tao (“Some notes on amenability“, 2009) – Notes about amenability
Tao (“254A, Lecture 10: The Furstenberg correspondence principle“, 2008) –
Zhao (“Szemerédi’s Theorem via Ergodic Theory“, 2011) – Exhaustive primer about the topic
Komjath (“Problems and Theorems in Classical Set Theory“, 2006) – Exercises about amenability, Ultrafilter limits and paradoxical decompostitions.