<<

Lecture Notes

Lecture by Maria Axenovich and Torsten Ueckerdt (KIT) Problem Classes by Jonathan Rollin (KIT)

Lecture Notes by Stefan Walzer (TU Ilmenau) Last updated: May 30, 2017

1 Contents

0 What is Combinatorics?4

1 and Combinations9 1.1 Basic Principles...... 9 1.1.1 Addition Principle...... 10 1.1.2 Multiplication Principle...... 10 1.1.3 Subtraction Principle...... 11 1.1.4 Principle...... 11 1.1.5 ...... 11 1.1.6 Double counting...... 11 1.2 Ordered Arrangements – Strings, Maps and Products...... 12 1.2.1 Permutations...... 13 1.3 Unordered Arrangements – , Subsets and ...... 15 1.4 Multinomial Coefficients...... 16 1.5 The – Balls in Boxes...... 22 1.5.1 U L: n Unlabeled Balls in k Labeled Boxes...... 22 1.5.2 L → U: n Labeled Balls in k Unlabeled Boxes...... 25 1.5.3 L → L: n Labeled Balls in k Labeled Boxes...... 27 1.5.4 U → U: n Unlabeled Balls in k Unlabeled Boxes.... 28 1.5.5 Summary:→ The Twelvefold Way...... 30 1.6 Binomial Coefficients – Examples and Identities...... 30 1.7 Permutations of Sets...... 34 1.7.1 Generating Permutations...... 36 1.7.2 Cycle Decompositions...... 37 1.7.3 Transpositions...... 39 1.7.4 Derangements...... 42

2 Inclusion-Exclusion-Principle and M¨obiusInversion 45 2.1 The Inclusion-Exclusion Principle...... 45 2.1.1 Applications...... 47 2.1.2 Stronger Version of PIE...... 52 2.2 M¨obiusInversion Formula...... 53

3 Generating Functions 58 3.1 Newton’s Binomial ...... 62 3.2 Exponential Generating Functions...... 64 3.3 Recurrence Relations...... 67 n 3.3.1 Special Solution an = x and the Characteristic Polynomial 70 3.3.2 Advancement Operator...... 76 3.3.3 Non-homogeneous Recurrences...... 80 3.3.4 Solving Recurrences using Generating Functions..... 82

2 4 Partitions 85 4.1 Partitioning [n] – the on n elements...... 85 4.1.1 Non-Crossing Partitions...... 86 4.2 Partitioning n – the ...... 87 4.3 ...... 93 4.3.1 Counting Tableaux...... 99 4.3.2 Counting Tableaux of the Same Shape...... 100

5 Partially Ordered Sets 107 5.1 Subposets, Extensions and ...... 111 5.2 Capturing Posets between two Lines...... 117 5.3 Sets of Sets and Multisets – Lattices...... 123 5.3.1 Symmetric Chain Partition...... 126 5.4 General Lattices...... 130

6 Designs 132 6.1 (Non-)Existence of Designs...... 133 6.2 Construction of Designs...... 135 6.3 Projective Planes...... 137 6.4 Steiner Triple Systems...... 138 6.5 Resolvable Designs...... 139 6.6 Latin Squares...... 140

3 What is Combinatorics?

Combinatorics is a young field of , starting to be an independent branch only in the 20th century. However, combinatorial methods and problems have been around ever since. Many combinatorial problems look entertaining or aesthetically pleasing and indeed one can say that roots of combinatorics lie in mathematical recreations and games. Nonetheless, this field has grown to be of great importance in today’s world, not only because of its use for other fields like physical , social sciences, biological sciences, and computer .

Combinatorics is concerned with: Arrangements of elements in a set into patterns satisfying specific rules, • generally referred to as discrete structures. Here “discrete” (as opposed to continuous) typically also means finite, although we will consider some infinite structures as well.

The existence, enumeration, and optimization of discrete struc- • tures. Interconnections, generalizations- and specialization-relations between sev- • eral discrete structures.

Existence: We want to arrange elements in a set into patterns satisfying certain rules. Is this possible? Under which conditions is it possible? What are necessary, what sufficient conditions? How do we find such an arrangement?

Enumeration: Assume certain arrangements are possible. How many such arrangements exist? Can we say “there are at least this many”, “at most this many” or “exactly this many”? How do we generate all arrangements efficiently?

Classification: Assume there are many arrangements. Do some of these arrangements differ from others in a particular way? Is there a natural partition of all arrangements into specific classes? Meta-Structure: Do the arrangements even carry a natural underlying structure, e.g., some ordering? When are two arrangements closer to each other or more similar than some other pair of arrangements? Are different classes of arrangements in a particular ? Optimization: Assume some arrangements differ from others according to some measurement. Can we find or characterize the arrangements with maximum or minimum measure, i.e. the “best” or “worst” ar- rangements?

4 Interconnections: Assume a discrete structure has some properties (num- ber of arrangements, . . . ) that match with another discrete structure. Can we specify a concrete connection between these structures? If this other structure is well-known, can we draw conclusions about our structure at hand?

We will give some life to this abstract list of tasks in the context of the following example. Example (Dimer Problem). Consider a generalized chessboard of size m n (m rows and n columns). We want to cover it perfectly with dominoes of size× 2 1 or with generalized dominoes – called polyominoes – of size k 1. That means× we want to put dominoes (or polyominoes) horizontally or vertically× onto the board such that every of the board is covered and no two dominoes (or polyominoes) overlap. A perfect covering is also called tiling. Consider Figure 1 for an example.

!

Figure 1: The 6 8 board can be tiled with 24 dominoes. The 5 5 board cannot be tiled with× dominoes. ×

Existence If you look at Figure1, you may notice that whenever m and n are both odd (in the Figure they were both 5), then the board has an odd number of squares and a tiling with dominoes is not possible. If, on the other hand, m is even or n is even, a tiling can easily be found. We will generalize this observation for polyominoes: Claim. An m n board can be tiled with polyominoes of size 1 k if and only if k divides m or× n. × Proof. “ ” If k divides m, it is easy to construct a tiling: Just cover every column⇐ with m/k vertical polyominoes. Similarly, if k divides n, cover every row using n/k horizontal polyominoes. “ ” Assume k divides neither m nor n (but note that k could still divide ⇒ the product m n). We need to show that no tiling is possible. We · write m = s1k + r1, n = s2k + r2 for appropriate s1, s2, r1, r2 N and 0 < r , r < k. Without loss of generality, assume r r (the argument∈ 1 2 1 ≤ 2 is similar if r2 < r1). Consider the colouring of the m n board with k colours as shown in Figure2. ×

5 1 2 3 4 5 6 7 8 9 1 2 3 4 k 5 k 6 7 8

Figure 2: Our polyominoes have size k 1. We use k colours (1 = white, k = black) to colour the m n board (here:× k = 6, m = 8, n = 9). Cutting the board at coordinates that× are multiples of k divides the board into several chunks. All chunks have the same number of squares of each color, except for the bottom right chunk where there are more squares of color 1 (here: white) than of color 2 (here: light gray).

Formally, the colour of the square (i, j) is defined to be ((i j) mod k)+1. Any of size k 1 that is placed on the board will− cover exactly one square of each colour.× However, there are more squares of colour 1 than of colour 2, which shows that no tiling with k 1 dominoes is possible. × Indeed, for the number of squares coloured with 1 and 2 we have:

# squares coloured with 1 = ks1s2 + s1r2 + s2r1 + r2 # squares coloured with 2 = ks s + s r + s r + r 1 1 2 1 2 2 1 2 − Now that the existence of tilings is answered for rectangular boards, we may be inclined to consider other types of boards as well: Claim (Mutilated Chessboard). The n n board with bottom-left and top-right square removed (see Figure3) cannot be× tiled with (regular) dominoes.

Figure 3: A “mutilated” 6 6 board. The missing corners have the same colour. ×

Proof. If n is odd, then the total number of squares is odd and clearly no tiling can exist. If n is even, consider the usual chessboard-colouring: In it, the missing squares are of the same colour, say black. Since there was an equal number of black and white squares in the non-mutilated board, there are now two more white squares than black squares. Since dominoes always cover exactly one black and one white square, no tiling can exist.

6 Other ways of pruning the board have been studied, but we will not consider them here.

Enumeration A general formula to determine the number of ways an m n board can be tiled with dominoes is known. For an 2m 2n board the following× formula is due to Temperly and Fisher [TF61] and independently× Kasteleyn [Kas61]

m n mn 2 iπ 2 jπ 4 cos 2m+1 + cos 2n+1 . i=1 j=1 Y Y   The special case of an 8 8 board is already non-trivial: × Theorem (Fischer 1961). There are 24 172 532 = 12, 988, 816 ways to tile the 8 8 board with dominoes. · · × Classification Consider tilings of the 4 4 board with dominoes. For some of these tilings there is a vertical line through× the board that does not cut through any domino. Call such a line a vertical cut. In the same way we define horizontal cuts. As it turns out, for every tiling of the 4 4 board at least one cut exists, possibly several (try this for yourself!). × Hence the set of all tilings can be partitioned into T = T T allows a horizontal cut but no vertical cut , T1 { | } = T T allows a vertical cut but no horizontal cut , T2 { | } = T T allows both a horizontal and a vertical cut . T3 { | } Figure4 shows one tiling for each of these three classes.

Figure 4: Some tilings have horizontal cuts, some have vertical cuts and some have both.

As another example consider the board B consisting of two 4 4 boards joint by two extra squares as shown in Figure5. × In a tiling of B it is impossible that one extra square is covered by a domino containing a square from the left side, while the other extra square is covered by a domino containing a square from the right side: If we place the two dominoes as shown on the right in Figure5, we are left with two disconnected parts of size 15. And since 15 is odd, this cannot work out. Hence the set of all tilings T

7 Figure 5: The board B consisting of two 4 4 boards and two extra squares connecting them as shown. The partial covering× on the right cannot be extended to a tiling. . of B can be partitioned into

= T T matches both extra squares to the left T1 { | } = T T matches both extra squares to the right . T2 { | } Meta Structure We say two tilings are adjacent, if one can be transformed into the other by taking two dominoes lying like this and turning them by 90 degrees so they are lying like this (or vice versa). Call this a turn operation. If we draw all tilings of the 4 4 board and then draw a line between tilings that are adjacent, we get the picture× on the left of Figure6. With this in mind, we can speak about the distance of two tilings, the number of turn operations required to transform one into the other. We could also look for a pair of tilings with maximum distance (this would be an optimization problem). We can even discover a deeper structure, but for this we need to identify different types of turn operations. We can turn into , call this an up-turn, or into , call this a side-turn. The turn can happen on the background or the background (white squares form a falling or rising diagonal). We call an operation a flip if it is an up-turn on or a side-turn on . Call it a flop otherwise. Quite surprisingly, walking upward in the graph in Figure6 always corre- sponds to flips and walking downwards corresponds to flops. This means that there is a natural partial ordering on the tilings: We can say a tiling B is greater than a tiling A if we can get from A to B by a of flips. As it turns out, the order this gives has special properties. It is a so-called distributive , in particular, there is a greatest tiling.

Optimization Different tilings have a different set of decreasing free paths. Such a path (see Figure7) proceeds monotonously from the top left corner of the board along the borders of squares to the bottom right, and does not “cut through” any domino. Some tilings admit more such paths then others. It is conjectured that in an m n board the maximum number is always attained by one of the two tilings where× all dominoes are oriented the same way (see right of Figure7). But no proof has been found yet.

8 Figure 6: On the left: The set of all tilings of the 4 4 board. Two tilings are connected by an edge if they can be transformed× into one another by a single flip. On the right: The same picture, but with the chessboard still drawn beneath the tilings, so you can check that upward edges correspond to flips (as defined in the text).

1 Permutations and Combinations

Some people mockingly say that combinatorics is merely about counting things. In the section at hand we will do little to dispel this prejudice. We assume the reader is familiar with basic and notions such as unions, intersections, Cartesian products and differences of two finite sets.

1.1 Basic Counting Principles We list a few very simple and intuitive observations. Many of them where already used in the introduction. You may think that they are so easy that they do not even deserve a name. Still: If we take counting seriously, we should be aware of simple facts we implicitly use all the time.

9 Figure 7: The green path does not cut through dominoes. We believe that the boring tiling on the right admits the maximum number of such paths, but this has not been proved for large boards.

1.1.1 Addition Principle

We say a finite set S is partitioned into parts S1,...,Sk if the parts are disjoint and their union is S. In other words Si Sj = for i = j and S1 S2 Sk = S. The addition principle claims that in∩ this case∅ 6 ∪ ∪· · ·∪ S = S + S + + S . | | | 1| | 2| ··· | k| Example. Let S be the set of students attending the combinatorics lecture. It can be partitioned into parts S1 and S2 where

S1 = set of students that like easy examples.

S2 = set of students that don’t like easy examples. If S = 22 and S = 8 then we can conclude S = 30. | 1| | 2| | | 1.1.2 Multiplication Principle

If S is a finite set S that is the product of S1,...,Sk, i.e. S = S1 S2 ... Sk, then the multiplication principle claims that × × S = S S S . | | | 1| × | 2| × · · · × | m| Example. Consider the license plates in Karlsruhe. They have the form

KA - l1l2 n1n2n3 where l , l are letters from A, . . . , Z and n , n , n are digits from 0,..., 9 . 1 2 { } 1 2 3 { } However, l2 and n3 may be omitted. We count the number of possibilities for each position:

KA- l1 l2 n1 n2 n3

1↑ 1↑ 1↑ 26 ↑ 27 ↑ 10 ↑ 10 ↑ 11 ↑ Here we modeled the case that l is omitted by allowing an additional value 2 ⊥ for l2 that stands for “omitted”. The same holds for n3. Formally we could say that the set of all valid license plates is given as the product: K A - A, . . . , Z A, . . . , Z, 0,..., 9 0,..., 9 0,..., 9, . { }×{ }×{ }×{ }×{ ⊥}×{ }×{ }×{ ⊥} By the Multiplication Principle the size of this set is given as 1 1 1 26 27 10 10 11 = 772200. · · · · · · ·

10 1.1.3 Subtraction Principle Let S be a subset of a finite set T . We define S := T S, the of S in T . Then the substraction principle claims that \

S = T S . | | | | − | | Example. If T is the set of students studying at KIT and S the set of students studying neither math nor . If we know T = 23905 and S = 20178, then we can compute the number S of students| | studying either math| | or computer science: | |

S = T S = 23905 20178 = 3727. | | | | − | | − 1.1.4 Bijection Principle If S and T are finite sets, then the bijection principle claims that

S = T there exists a bijection between S and T. | | | | ⇔ Example. Let S be the set of students attending the lecture and T the set of homework submissions for the first problem sheet. If the number of students and the number of submissions coincide, then there is a a bijection between students and submissions1 and vice versa. We now consider two other principles that are similarly intuitive and natural but will frequently and explicitly occur as patterns in proofs.

1.1.5 Pigeonhole Principle

Let S1,...,Sm be finite sets that are pairwise disjoint and S1 + S2 + ... + S = n. Then the pigeonhole principle claims that | | | | | m| n n i 1, . . . , m : S and j 1, . . . , m : S . ∃ ∈ { } | i| ≥ m ∃ ∈ { } | j| ≤ m l m j k Example. Assume there are 5 holes in the wall where pigeons nest. Say there is a set Si of pigeons nesting in hole i. Assume there are n = 17 pigeons in total. Then we know: There is some hole with at least 17/5 = 4 pigeons. • d e There is some hole with at most 17/5 = 3 pigeons. • b c 1.1.6 Double counting If we count the same quantity in two different ways, then this gives us a (perhaps non-trivial) identity. This method often helps get control on overcounting. Let A and B be finite sets and consider a set S A B. Then the method of double counting claims that ⊆ ×

b B (a, b) S = S = a A (a, b) S . |{ ∈ | ∈ }| | | |{ ∈ | ∈ }| a A b B X∈ X∈ 1Whether a “natural” correspondence can easily be determined is an entirely different matter: That would require that you cleanly write your name onto your submission.

11 Example (Handshaking Lemma). Assume there are n people at a party and everybody will shake hands with everybody else. What is the total number N of handshakes that occur? If we just sum up for each guest each of its handshakes, then we overcount each handshake. In order to control this overcount we shall instead count the number M of pairs (P,S) where P is a guest and S is a handshake involving P . We count this number in two ways: First way: There are n guests and everybody shakes n 1 hands. So − M = (P,S) S handshake with P = (n 1) = n (n 1). |{ | }| − · − P Xguest P Xguest Second way: Every handshake involves two guests. So

M = (P,S) P involved in S = 2 = 2 N. |{ | }| · S handshakeX S handshakeX Together, we see that n(n 1) = M = 2N and hence we obtain the identity − n (n 1) N = · − . 2 Example. In the example above we have seen a first way how two count the number N (already utilizing double counting). Here we count N in a second way. Label the guests from 1 to n. To avoid counting a handshake twice, we count for guest i only the handshakes with guests of smaller . Then the total number of handshakes is

n n 1 n 1 − − (i 1) = i = i. − i=1 i=0 i=1 X X X Hence we obtain another identity

n 1 − n (n 1) i = N = · − . 2 i=1 X 1.2 Ordered Arrangements – Strings, Maps and Products Define [n] := 1, . . . , n to be the set of the first n natural number and let X be a finite set. { } An ordered arrangement of n elements of X is a map s :[n] X. Ordered arrangements are so common that they occur in many situations→ and are known under different names. We might be inclined to say that “Banana” is a string (or word) and that “0815422372” is a sequence even though they are both ordered arrangements of characters. Depending on the circumstances different notation may be in order as well. We will introduce the most common perspectives on what an ordered ar- rangement is and the accompanying names and notation. Throughout, we use the example X = , , , , and the ordered arrangement s of n = 7 ele- ments given as the{ table: }

12 i 1 2 3 4 5 6 7 s(i) : We call [n] the domain of s and s(i) the of i under s (i [n]). The set x X s(i) = x for some i [n] is the range of s. ∈ { ∈ | ∈ } In the example, the domain of s is 1, 2, 3, 4, 5, 6, 7 , the image of 3 is and the range of s is , , , . Note{ that is not} in the range of s. { } string: We call s an X-string of length n and write s = s(1)s(2)s(3) s(n). ··· The i-th position (or character) in s is denoted by si = s(i). The set X is an alphabet and its elements are letters. Often s is called a word. In the example, we would say that s = is a string (or word) of length n over the five-letter alphabet X. The fourth character of s is s4 = . : We can view s as an of the n-fold Cartesian product X1 ... X , where X = X for i [n]. We call s a tuple and write× it × n i ∈ as (s1, s2, . . . , sn). The element si is called the i-th coordinate (i [n]). Viewing arrangements as elements of products makes it easy to restrict∈ the number of allowed values for a particular coordinate (just choose Xi ( X). In the example we would write s = ( , , , , , , ). Its first coordinate is . sequence: If X is a set of numbers, then s is often called a sequence and si is the i-th position, term or element in the sequence. We would not use this terminology in the example above but could have, for instance, s(i) := 3i + 2/5. In that case the fourth term is equal to 12.4.

In the following we will mostly view arrangements as functions but will freely switch perspective when appropriate.

1.2.1 Permutations The most important ordered arrangements are those in which the mapping is injective, i.e., s(i) = s(j) for i = j. 6 6 Definition 1.1 (). Let X be a finite set. We define permutation: A permutation of X is a bijective map π :[n] X. Usually we → choose X = [n] and denote the set of all permutations of [n] by Sn. We tend to write permutations as strings if n < 10, take for example π = 2713546 by which we mean the function:

i 1 2 3 4 5 6 7 π(i) 2 7 1 3 5 4 6 k-Permutation: For 1 k X a k-permutation of X is an ordered arrange- ment of k distinct≤ elements≤ | | of X, i.e. an injective map π :[k] X. The set of all k-permutations of X = [n] is denoted by P (n, k). In→ particular we have Sn = P (n, n).

13 Circular k-Permutation: We say that two k-permutations π1, π2 P (n, k) are circular equivalents if there exists a shift s [k] such that the∈ following implication holds: ∈

i + s j (mod k) π (i) = π (j) ≡ ⇒ 1 2 This partitions P (n, k) into equivalence classes. A class is called a circular k-permutation. The set of all circular k-permutations is denoted by Pc(n, k).

Take for example π1 = 76123, π2 = 12376 and π3 = 32167. Then π1 and π2

6 2 2 1 3 1

π1 7 π2 1 π3 3 2 7 6 3 6 7

Figure 8: of the 5-permutations π1, π2 and π3 where values are arranged in a circular fashion (the first value being on the right of the cycle). Note that π1 and π2 can be transformed into one another by “turning”. They are therefore circular equivalents. Flipping is not allowed however, so π3 is not equivalent to π1 and π2.

are circular equivalents as witnessed by the shift s = 3. They are therefore representatives of the same circular 5-permutation (with elements from [7]). π3 belongs to a different circular 5-permutation. Consider Figure8 for a visualization. We now count the number of (circular) k-permutations. For this we need the notation of : n! := 1 2 n. · ····· Theorem 1.2. For any natural numbers 1 k n we have ≤ ≤ n! (i) P (n, k) = n (n 1) ... (n k + 1) = (n k)! . | | · − · · − − n! (ii) Pc(n, k) = k (n k)! . | | · − Proof. (i) A permutation is an injective map π :[k] [n]. We count the number of ways to pick such a map, picking the→ images one after the other. There are n ways to choose π(1). Given a value for π(1), there are n 1 ways to choose π(2) (we may not choose π(1) again). Continuing like− this, there are n i + 1 ways to pick π(i) and the last value we pick is π(k) with n k + 1 possibilities.− Every k-permutation can be constructed like this in exactly− one way. The total number of k-permutations is therefore given as the product: n! P (n, k) = n (n 1) ... (n k + 1) = . | | · − · · − (n k)! − (ii) We count P (n, k) in two ways:

14 n! First way: P (n, k) = (n k)! which we proved in (i). | | − Second way: P (n, k) = P (n, k) k because every in | | | c | · Pc(n, k) contains k permutations from P (n, k) (since there are k ways to rotate a k-permutation). n! From this we get (n k)! = Pc(n, k) k which implies the claim. − | | · 1.3 Unordered Arrangements – Combinations, Subsets and Multisets Let X be a finite set and k N. An unordered arrangement of k elements of X is a S of size k with∈ elements from X. Take for example X = , , , , , then an unordered arrangement of 7 elements could be S = , {, , , , , } . Order in sets and multisets does not matter, so we could write{ the same multiset} as S = , , , , , , . However, we prefer the following notation, where every{ type x X }occurring ∈ in S is given only once and accompanied by its repetition number rx that cap- tures how often the type occurs in the multiset. The example above is written as: S = 2 , 1 , 3 , 1 and we would say the type has repetition number 2,{ the· type· has· repetition· } number 1 and so on. We write S, r = 2, S, r = 1 and so on. ∈ ∈ The difference between ordered and unordered arrangements is that ordered arrangements are selections of elements of X that are done one at a time, while unordered arrangements are selections of objects done all at the same time. A typical example for unordered arrangements would be shopping lists: You may need three bananas and two pears, but this is the same as needing a pear, three bananas and another pear. In a sense, you need everything at once with no temporal ordering. This is in contrast to something like a telephone number (which is an ordered arrangement) where dialing a 5 first and a 9 later is differ- ent from dialing 9 first and 5 later.

As with ordered arrangements, the most important case for unordered ar- rangements is that all repetition numbers are 1, i.e. rx = 1 for all x S. Then S is simply a subset of X, denoted by S X. ∈ ⊆ Definition 1.3. k-: Let X be a finite set. A k-combination of X is an unordered arrangement of k distinct elements from X. We prefer the more standard term subset and use “combination” only when we want to emphasize the selection process. The set of all k-subsets of X is denoted by X and if X = n then we denote k | |  n X := . k k    

k-Combination of a Multiset: Let X be a finite set of types and let M be a finite multiset with types in X and repetition numbers r1, . . . , r X .A k-combination of M is a multiset with types in X and repetition numbers| | X s1, . . . , s X such that si ri, 1 i X , and i|=1| si = k. | | ≤ ≤ ≤ | | If for example M = 2 , 1 , 3 , 1 , then T = 1 , 2 is a { · · · · } P { · · } 3-combination of the multiset M, but T 0 = 3 , 0 is not. { · · }

15 k-Permutation of a Multiset: Let M be a finite multiset with set of types X. A k-permutation of M is an ordered arrangement of k elements of M where different orderings of elements of the same type are not distinguished. This is a an ordered multiset with types in X and repetition numbers X s1, . . . , s X such that si ri, 1 i X , and i|=1| si = k. | | ≤ ≤ ≤ | | Note that there might be several elements ofP the same type compared to a permutation of a set (where each repetition number equals 1). If for example M = 2 , 1 , 3 , 1 , then T = ( , , , ) is a 4-permutation of the multiset{M· . · · · }

Theorem 1.4. For 0 k n, we have ≤ ≤ n n n! P (n, k) = k! and therefore = . k · k k!(n k)!     − Proof. To build an element from P (n, k), we first choose k elements from [n], n n by our definition of k , there are exactly k ways to do so. Then choose an order of the k elements, there are P (k, k) = k! = k! ways to do so.  0! Every element from P (n, k) can| be constructed| like this in exactly one way n so P (n, k) = k k! which proves the claim. |Using Theorem| · 1.2(i) now gives the identity on the right as well.  n The numbers k for 0 k n are called binomial coefficients because these are the coefficients when expanding≤ ≤ the n-th power of a sum of two variables:  n n n k n k (x + y) = x y − . k kX=0   1.4 Multinomial Coefficients The binomial formula can be generalized for more than two variables.

Definition 1.5. For non-negative k1, . . . , kr with k1 + ... + kr = n the multinomial coefficients n are the coefficients when expanding the n-th k1,...,kr power of a sum of r variables. In other words, define them to be the unique  numbers satisfying the following identity:

n n k1 k2 kr (x1 + ... + xr) = x1 x2 . . . xr . k1, . . . , kr k1,...,kr   k1+...X+kr =n

Example. Verify by multiplying out:

(x + x + x )4 = 1 (x4 + x4 + x4) 1 2 3 · 1 2 3 + 4 (x3x + x3x + x x3 + x3x + x x3 + x x3) · 1 2 1 3 1 2 2 3 1 3 2 3 + 6 (x2x2 + x2x2 + x2x2) · 1 2 1 3 2 3 + 12 (x2x x + x x2x + x x x2). · 1 2 3 1 2 3 1 2 3 2 2 2 The coefficient in front of x1x2x3 is 12, the coefficient in front of x1x3 is 6 and 4 the coefficient in front of x3 is 1.

16 According to the last definition this means:

4 4 4 = 12, = 6, = 1. 2, 1, 1 2, 0, 2 0, 0, 4       Theorem 1.6 (). For non-negative integers k1, . . . , kr with k1 + ... + kr = n we have:

n n n k1 n k1 ... kr 1 n! = − ... − − − − = . k , . . . , k k · k · · k k ! k ! ... k !  1 r  1  2   r  1 · 2 · · r n Proof. The naive way to multiply out (x1 + ... + xr) would be:

r r r n (x1 + ... + xr) = ... xi1 xi2 . . . xin . i =1 i =1 i =1 X1 X2 Xn n signs

The x x . . . x is equal| to{zxk1 . . . x}kr if from the indices i , . . . , i i1 i2 in 1 r { 1 n} exactly k1 are equal to 1, k2 equal to 2 and so on. We count the number of assignments of values to the indices i1, . . . , in satisfying this. { }

Choose k indices to be equal to 1: There are n ways to do so. 1 k1

Choose k2 indices to be equal to 2: There are n k1 indices left to choose n k1 − from, so there are − ways to choose k of them. k2 2

Choose kj indices to be equal to j (j [r]): There n k1 ... kj 1 − ∈ n k1 ... kj−1 − − − indices left to choose from, so there are − − − ways to choose k kj j of them. 

k1 k2 kr Hence the multinomial coefficient (i.e. the coefficient of x1 x2 . . . xr ) is

n n n k1 n k1 k2 ... kr 1 = − ... − − − − − k , . . . , k k · k · · k  1 r  1  2   r  and the first identity is proved. Now use Theorem 1.4 to rewrite the binomial coefficients and obtain

n! (n k1)! (n k1 ... kr 1)! − ... − − − − k !(n k )! · k !(n k k )! · · k !(n k ... k )! 1 − 1 2 − 1 − 2 r − 1 − − r Conveniently, many of the terms cancel out, like this:  (((  ((( n! (n k1)! (n((k1( ... kr 1)!  − (((( ... − − − − . k1!(n k1)! · k2!(n((k1 k2)! · · kr!(n k1 ... kr)! − − − − − − Also, (n k ... k )! = (n n)! = 0! = 1 so we end up with: − 1 − − r − n! . k ! k ! ... k ! 1 · 2 · · r 5 5! 5 4 3 2 1 Example. = = · ·£·£·£ = 10. 3,0,2 3!0!2! 3 2 1 2 1 £·£·£· ·  17 Note that the last Theorem establishes that binomial coefficients are special cases of multinomial coefficients. We have for 0 k n: ≤ ≤ n n! n = = . k k!(n k)! k, n k   −  −  We extend the definition of binomial and multinomial coefficients by setting n = 0 if k = 1 for some i, and n = n = 0. This makes stating k1,...,kr i 1 n+1 the following lemma− more convenient. −    Lemma 1.7 (Pascal’s Formula). If n 1 and 0 k n, we have ≥ ≤ ≤ n n 1 n 1 = − + − . k k k 1      −  More generally, for n 1 and k , . . . , k 0 with k + ... + k = n, we have ≥ 1 r ≥ 1 r n r n 1 = − . k , . . . , k k , . . . , k 1, . . . , k 1 r i=1 1 i r   X  −  Proof. Note first, that in the case of binomial coefficients, the claim can be rewritten as: n n 1 n 1 = − + − k, n k k, n k 1 k 1, n k  −   − −   − −  so it really is a special case of the second claim, which we prove now. By definition of multinomial coefficients we have the identity:

n n k1 kr (x1 + ... + xr) = x1 . . . xr . k1, . . . , kr k1,...,kr 0 ≥   k1+...X+kr =n

n Exploring a different way to expand (x1 + ... + xr) , we obtain:

n n 1 (x + ... + x ) = (x + ... + x ) (x + ... + x ) − 1 r 1 r · 1 r 0 0 n 1 k1 kr = (x1 + ... + xr) − x1 . . . xr · 0 0 k0 , . . . , kr0 k ,...,k 0  1  1 r ≥ k0 +...X+k0 =n 1 1 r − r 0 0 0 n 1 k1 ki+1 kr = − x1 . . . xi . . . xr . 0 0 k0 , . . . , kr0 i=1 k ,...,k 0  1  1 r ≥ X k0 +...X+k0 =n 1 1 r −

By substituting indices as k := k0 + 1 and k := k0 (for j = i) this is equal to: i i j j 6 r n n 1 k1 kr (x1 + ... + xr) = − x1 . . . xr . k1, . . . , ki 1 . . . , kr i=1 k1,...,kr 0 ≥  −  X kiX1 0 k1+...−+≥kr =n

18 Note that the condition k 1 0 is not needed, because for the summands i − ≥ with ki 1 = 1 we defined the coefficient under the sum to be 0. We remove it and swap− the− summation signs, ending up with:

r n n 1 k1 kr (x1 + ... + xr) = − x1 . . . xr k1, . . . , ki 1 . . . , kr k1,...,kr 0 i=1 ≥  −  k1+...X+kr =n X

n Now we have two ways to write the coefficients of (x1 + ... + xr) and therefore the identity:

n r n 1 = − k , . . . , k k , . . . , k 1 . . . , k 1 r i=1 1 i r   X  −  as claimed. You may already know Pascal’s Triangle. It is a way to arrange binomial coefficients in the as shown in Figure9. It is possible to do something similar in the case of multinomial coefficients, however, when drawing all coef- ficients of the form n , the drawing will be r-dimensional. For r = 3, a k1...,kr “Pascal Pyramid” is given in Figure 10.  Now that we know what multinomial coefficients are and how to compute them, it is time to see how they can help us count things. Example. How many 6-permutations are there of the multiset , , , , , ? Trying to list them all (( , , , , , ), ( , , , , , ), ), ( ,{ , , , , ),...}) would be tedious. Well, there are 6 possible positions for . Then there are 5 positions left, 2 of which need to contain . The three need to go to the three remaining positions, so there is just one choice left for them. This means there 5 6 5 3 6 are 6 2 1 = 60 arrangements in total. This is equal to 1 2 3 = 1,2,3 , which· is no· coincidence:      Theorem 1.8. Let S be a finite multiset with k different types and repetition numbers r1, r2, . . . rk. Let the size of S be n = r1 + r2 + + rk. Then the number of n-permutations of S equals ···

n . r , . . . , r  1 k

Proof. Label the k types as a1, . . . , ak. In an n-permutation there are n positions that need to be assigned a type. First choose the r1 positions for the first type, there are n ways to do so. Then assign r positions for the second type, out r1 2 n r1 of the n r1 positions that are still left to choose. That amounts to −  r2 choices. Continuing− like this, the total number of choices will be: 

n n r1 n r1 r2 ... rk 1 Thm 1.6 n − ... − − − − − = . r · r · · r r , r , . . . , r  1  2   k   1 2 k

n n We have seen before how the coefficient k = k,n k counts the number of k-combinations of [n] (i.e. number of subsets of [n]).− Now we learned, that  

19 = 0 k

n = 0 1 = 1 k

n = 1 1 1 = 2 k

n = 2 1 2 1 = 3 k

n = 3 1 3 3 1 = 4 k

n = 4 1 4 6 4 1 = 5 k

n = 5 1 5 10 10 5 1 = 6 k

n = 6 1 6 15 20 15 6 1

n Figure 9: Arrangement of the binomial coefficients k . Lemma 1.7 shows that n n 1 n 1 the number is obtained as the sum of the two numbers − and − k  k 1 k directly above it. −   

1 1 1 1 1 2 1 2 1 3 2 1 3 1 3 4 6 1 6 3 1 4 3 4 5 12 3 1 5 6 1 10 12 20 12 10 4 30 6 5 10 4 20 1 30 4 5 10 1 30 20 10 5 10 5 1

Figure 10: Arrangement of the numbers n with k + k + k = n in a k1,k2,k3 1 2 3 pyramid. To make the picture less messy, the numbers in the back (with large  k3) are faded out. The three flanks of the pyramid are pascal triangles, one of which is the black triangle in the front with numbers n . The first number k1,k2,0 3 not on one of the flanks is 1,1,1 = 6.  

20 it also counts the number of n-permutations of a multiset with two types and repetition number k and n k. How come ordered and unordered things are counted by the same number?− We demystify this by finding a natural bijection: Consider the multiset M := k X, (n k) X with types X and X (the “chosen” type and the “unchosen”{ type).· Now− associate· } with an n-permutation of M the set of positions that contain X, for instance with n = 5, k = 2 and the permutation ( , X, X, , X), the corresponding set would be 1, 4 [5] X X { } ⊂ (since the first and fourth position received X). It is easy to see that every n-permutation of M corresponds to a unique k element subset of [n] and vice versa. Note that so far we only considered n-permutations of multisets of size n. What about general r-permutations of multisets of size n? For example, the number of 2 permutations of the multiset , , , , is 7, since there is: { } , , , , , , .

Note that and is not possible, since we have only of one copy of and at our disposal. The weird number 7 already suggests that general r- permutations of n element multisets may not be as easy to count. Indeed, there is no simple formula as in Theorem 1.8 but we will see an answer using the principle of inclusion and exclusion later. There is a special case other than r = n that we can handle, though: If all repetition numbers ri of a multiset with k types are bigger or equal than r, for instance when considering 2-permutations of M := , , , , , , , then those repetition numbers do not actually impose a restriction,{ since we} will never run out of copies of any type. We would sometimes sloppily write M = , , where the infinity indicates that there are “many” copies{∞· of the∞· corresponding∞· } elements. The number of r-permutations of M is then equal to kr, just choose one type of the k types for each of the r positions. After permutations of multisets, we now consider combinations. Example. Say you are told to bring two pieces of fruit from the supermarket and they got , and (large quantities of each). How many choices do you have? Well, there is: , , , , , , , , , , , , so six combinations. Note that{ bringing} { a }and{ an } {is the} same{ as} bringing{ } an and a (your selection is not ordered), so this option is counted only once. We now determine the number of combinations for arbitrary number of types and number of elements to choose.

Theorem 1.9. Let r, k N and let S be a multiset with k types and large repe- ∈ tition numbers (each r1, . . . , rk is at least r), then the number of r-combinations of S equals k + r 1 − . r   Proof. For clarity, we do the proof alongside an example. Let the types be a1, a2, . . . , ak, for instance k = 4 and a1 = , a2 = , a3 = , a4 = . Then imagine the r-combinations laid-out linearly, first all elements of type a1 then all of type a2 and so on. In our example this could be

for the combination 2 , 2 , 1 . { · · · }

21 Now for each i [k 1], draw a delimiter between types ai and ai+1, in our example: ∈ − .

Note that, since we have no elements of type a2 = , there are two delimiters directly after one another. Given these delimiters, drawing the elements is actually redundant, just replacing every element with “ ” yields: • . • • • • • The original k-combination can be reconstructed from this sequence of and : Just replace every with a until the first occurrence of , then use a until the• • 1 2 next , then a3 and so on. So every (r+k 1)-permutation of T := (k 1) , r corresponds to an r-combination of S and− vice versa. { − · ·•} We know how to count the former, by Theorem 1.8 the number of (r+k 1)- r+k 1 r+k 1 − permutations of T is r,k −1 = r− . − Counting r-combinations  of multisets where repetition numbers r1, . . . , rk may be smaller than r is more difficult. We will see later how the inclusion- exclusion principle provides an answer in this case.

1.5 The Twelvefold Way – Balls in Boxes The most classic combinatorial problem concerns counting arrangements of n balls in k boxes. There are four cases: U L, L U, L L, U U, for arrangements of Labeled or Unlabeled→ balls →in Labeled→ or Unlabeled→ boxes. Here “labeled” means distinguishable and “unlabeled”−→ means indistin- guishable. These four cases become twelve cases if we consider the following sub- variants:

(i) No box may contain more than one ball. Example: When assigning stu- dents to topics in a seminar, there may be at most one student per topic. (ii) Each box must contain at least one ball. Example: When you want to get 10 people and 5 cars to Berlin, you have some flexibility in distributing the people to cars, but a car cannot drive on its own.

(iii) No restriction. A summary of the results of this section is given in the end in Table1. Instead of counting the ways balls can be arranged in boxes, some people count ways that balls can being put into the boxes or being picked from the boxes, but this does not actually make a difference. We now systematically examine all 12 cases.

1.5.1 U L: n Unlabeled Balls in k Labeled Boxes → Example 1.10. Torsten decided there should always be k = 30 points to a worksheet and n = 5 problems per worksheet. Stefan believes some problems are easier than others and deserve more points. He wonders how many ways there are to distribute the points to the problems. In this setting, balls correspond

22 to points (points are not labeled, they are just points) and boxes correspond to problems (they are labeled: There is problem 1,2,3,4 and 5). In other words, we search for solutions to the equation

30 = x1 + x2 + x3 + x4 + x5 with non-negative integers x1, . . . , x5, for example: 30 = 8 + 4 + 5 + 5 + 7 or 30 = 10 + 5 + 8 + 4 + 3. The students say that they would like it if some of the problems are “bonus problems” worth zero points, so the partition 30 = 10 + 10 + 10 + 0 + 0 should be permissible. Torsten remains unconvinced. We come back to this later and examine three cases for general n and k in the balls-and-boxes formulation: 1 ball per box Of course, this is only possible if there are at most as many ≤ balls as boxes (n k). For n = 2 and k = 5 one arrangement would be: ≤

1 2 3 4 5

Each of the k boxes can have two states: Occupied (one ball in it) or empty (no ball in it) and exactly n boxes are occupied. The number of k ways to choose these occupied boxes is n . 1 ball per box Of course, this is only possible  if there are at least as many ≥ balls as boxes (n k). For example for n = 9 and k = 5: ≥

1 2 3 4 5

To count the number of ways to do this, arrange the balls linearly, like this:

and choose k 1 out of n 1 gaps between the balls that correspond to the beginning− of a new box.− In the example above this would look like this:

There is a bijection between the arrangements with no empty box and the choices of n 1 gaps for delimiters out of k 1 gaps in total. We know − n 1 − how to count the latter: There are k−1 possibilities. − 29 Going back to the example, we now know there are 5 solutions to the equation:  x1 + x2 + x3 + x4 + x5 = 30

where x1, . . . , x5 are positive integers.

23 arbitrary number of balls per box Now boxes are allowed to contain any number of elements, including 0. One example for n = 7 and k = 5 would be:

1 2 3 4 5

We count the number of such arrangements in three different ways:

1. From an arrangement we obtain a string by “reading” it from left to right, writing when we see a ball and when a new box starts. For the example above this yields the string

We see: The permutations of the multiset n , (k 1) corre- spond directly to the arrangements of the{ balls.· Note− the· difference} to the case of non-empty boxes we discussed before. We know how to count the permutations of the multiset, by Theorem n+k 1 1.8 there are k −1 of them. − 2. There is a bijection between: (i) Arrangements of n balls in k boxes (ii) Arrangements of n + k balls in k boxes with no empty box. For a map from (i) to (ii), just add one ball to each box. For the inverse map from (ii) to (i) just remove one ball from each box. n+k 1 We already know how to count (ii), there are k −1 such arrange- ments. −  3. Assume n, k 1.To count the arrangements, first choose the number i of boxes that≥ should be empty (0 i k 1), then choose which k ≤ ≤ − boxes that should be (there are i choices) and then distribute all n balls to the remaining boxes k i boxes such that non of those boxes  is empty (we already know how− to count this). This yields:

k 1 − k n 1 − . i k i 1 i=0 X   − −  This looks different from the other two results, which means we “ac- cidentally” proved the non-trivial identity:

k 1 n + k 1 − k n 1 − = − (n, k 1). k 1 i k i 1 ≥ i=0  −  X   − −  30+6 1 Going back to the example, we now know there are 6 1− solutions to the equation: −  30 = x1 + x2 + x3 + x4 + x5 with non-negative integers, i.e. that many assignments of points to the five exercises on exercise sheets such that the total number of points is 30.

24 Torsten thinks that no exercise should be worth more than 10 points. In the balls and boxes setting this limits the capacity ri of a box i. This makes counting much more difficult but we will see a way to address this in Chapter2 using the principle of inclusion and exclusion. (As for homework problems, it turns out that Jonathan put def points 6 into his latex preamble, which settles the issue.) \ \ { }

1.5.2 L U: n Labeled Balls in k Unlabeled Boxes → Example 1.11. There are n = 25 kids on a soccer field that want to form k = 4 teams. The kids are on different skill levels (e.g. a team of five bad players is quite a different thing than a team of five good players) but the teams are unlabeled (it wouldn’t make a difference if you swap all players of two teams). In how many ways can the n kids form k teams? In other words: In how many ways can the set of kids be partitioned into k parts? Here, kids correspond to balls and teams correspond to boxes. Again, we con- sider three subcases for general k and n. 1 ball per box Of course, this is only possible if there are at most as many ≤ balls as boxes (n k). Each ball will be in its own box. If the balls are labeled with numbers≤ from [n], there will be one box with the ball with label 1, one box with the ball with label 2 and so on. In other words, there is only 1 possibility, for n = 3 and k = 5 the only arrangement looks like this:

1 2 3

1 ball per box Of course, this is only possible if there are at least as many ≥ balls as boxes (n k). ≥ This is the same as the number of partitions of [n] into k non-empty parts II which we also call Stirling Numbers of the second kind and write as sk (n). Some values are easy to determine: sII (0) = 1: There is one way to partition the into non- • 0 empty parts: = X X. Each X is non-empty (because no such X exists).∅ ∈∅ ∈ ∅ II S s0 (n) = 0 (for n 1): There is no way to partition non-empty sets • into zero parts. ≥ II s1 (n) = 1 (for n 1): Every non-empty set X can be partitioned • into one non-empty≥ set in exactly one way: X = X. II 2n 2 n 1 s2 (n) = 2− = 2 − 1 (for n 1): We want to partition [n] into • two non-empty parts. If− we consider≥ the parts labeled (there is a first part and a second part), then choosing the first part fully determines the second and vice versa. Every subset of [n] is allowed to be the first part – except for and [n]. This amounts to 2n 2 possibilities, however, since the parts∅ are actually unlabeled (there− is no “first” or “second”) every possibility is counted twice so we need to divide by 2.

25 II sn (n) = 1: There is only one way to partition [n] into n parts: Every • number gets its own part.

II A formula for sk (n) is given as:

II II II sk (n) = ksk (n 1) + sk 1(n 1). − − − To see this, count the arrangements of the n labeled balls in unlabeled boxes with no empty box as follows:

The ball of label n may have its own box (with no other ball in • it). The number of such arrangements is equal to the number of arrangements of the remaining n 1 balls in k 1 boxes such that − II− none of those k 1 boxes is empty. There are sk 1(n 1) of those. − − − The box with the ball of label n contains another ball. Then, when • removing ball n, there is still at least one ball per box. So removing ball n gets us to an arrangements of n 1 balls in k non-empty boxes. II − There are sk (n 1) of those and for each there are k possibilities where ball n could− have been before removal (note that the boxes are distinguished by the balls that are already in it). So there are k sII (n 1) arrangements where ball n is not alone in a box. · k − The recursion formula follows from summing up the values for the two cases. Note that the Stirling Numbers fulfill a recursion similar to the recursion of binomial coefficients, so there is something similar to the Pascal Triangle. See Figure 11.

= 0 k

n = 0 1 = 1 k

n = 1 0 1 = 2 k

n = 2 0 1 1 = 3 k

n = 3 0 1 3 1 = 4 k

n = 4 0 1 7 6 1 = 5 k

n = 5 0 1 15 25 10 1 = 6 k

n = 6 0 1 31 90 65 15 1

II Figure 11: “Stirling Triangle”. A number sk (n) is obtained as the sum of the II II number sk 1(n 1) toward the top left and k times the number sk (n 1) towards the− top− right. E.g. for n = 6, k = 3: 90 = 15 + 3 25. − ·

26 II We will later see a closed form of sk (n) in Chapter2 using the principle of inclusion and exclusion. arbitrary number of balls per box Empty boxes are allowed. To count all arrangements, first choose the number i of boxes that should be non-empty (0 i k), then count the number of arrangements of n balls into the i boxes≤ such≤ that non of them is empty. This gives a total of:

k II si (n). i=0 X 1.5.3 L L: n Labeled Balls in k Labeled Boxes → Example 1.12. At the end of the semester, each of the n = 70 students of combinatorics will be assigned one of the k = 11 grades from 1.0, 1.3,..., 5.0 . Both Torsten and the students think that the outcome { }

C AB H F G I J K 1.0 1.3 1.7 5.0 ... would be preferable to the outcome

H F G I J K C AB 1.0 1.3 1.7 5.0 ... so the boxes (grades) are clearly labeled (we could not draw the boxes 2.0 till 4.0 due to lack of space) . Furthermore, Alice (A) and Bob (B) insist that they would notice the difference between the arrangement

A E CD F G B 1.0 1.3 1.7 3.0 ...... and the arrangement

B E CD F G A 1.0 1.3 1.7 3.0 ...... so the balls (students) are labeled as well. Such arrangements correspond di- rectly to functions f :[n] [k], mapping each of the n balls to one of the k boxes, or here: Mapping every→ student to their grade. As before, we consider three subcases: 1 ball per box Of course, this is only possible if there are at most as many ≤ balls as boxes (n k). ≤ Such arrangements correspond to injective functions f :[n] [k]. We first choose the image of 1 [n] (there are k possibilities), then→ the image ∈

27 of 2 [n] (there are k 1 possibilities left) and so on. Therefore, the number∈ of injective functions− (and therefore arrangements with at most one ball per box) is:

k! k k (k 1) (k n + 1) = = n!. · − ····· − (k n)! n −   1 ball per box Of course, this is only possible if there are at least as many ≥ balls as boxes (n k). ≥ These arrangements correspond to surjective functions from [n] to [k]. They can also be thought of as partitions of [n] into k non-empty distin- guishable (!) parts. So we count the number of ways to partition [n] into II k non-empty indistinguishable parts (there are sk (n)) and multiply this by the number of ways k! to assign labels to the parts afterwards. So in total, there are II k!sk (n) ways to assign n labeled balls to k non-empty labeled boxes. arbitrary number of balls per box There are k choices for each of the n balls, so kn arrangements in total.

1.5.4 U U: n Unlabeled Balls in k Unlabeled Boxes → Example 1.13. Adria used to play tennis but now wants to sell her old tennis balls (n in total) on the local flee market. She wants to sell them in batches since selling them individually is not worth the trouble. So she packs them into k bags. Some people may want to buy bigger batches than others so she figures it might be good to have batches of varying sizes ready. She wonders how many ways there are to pack the balls into bags. One such arrangement (n = 9, k = 4) could be:

Balls and boxes are unlabeled. Adria cannot distinguish any two balls and can also not distinguish boxes with the same number of balls. Even though boxes have no intrinsic ordering, we need to somehow arrange them on this two-dimensional paper. In order to not accidentally think that and are different, we use the convention of drawing boxes in decreasing order of balls. With this convention, an arrangement will look different on paper if and only if it is actually different in our sense. With this in mind we see the number of arrangements of n unlabeled balls in k unlabeled boxes is equal to the number of ways to partition the n into k non-negative summands. For example:

9 = 4 + 2 + 2 + 1.

Partitions where merely the order of the summands differ are considered the same, so again we use the convention of writing summands in decreasing order.

28 1 ball per box Of course, this is only possible if there are at most as many ≤ balls as boxes (n k). ≤ In that case, there is only one way to do it: Put every ball in its own box. Then there are n boxes with a ball and k n empty boxes. − 1 ball per box Of course, this is only possible if there are at least as many ≥ balls as boxes (n k). ≥ As discussed before, we count the number of ways in which the integer n can be partitioned into exactly k positive parts, i.e.

n = a + a + ... + a , where a a ... a 1. 1 2 k 1 ≥ 2 ≥ ≥ k ≥

This is counted by the partition function, denoted by pk(n). A few values are easy to determine:

p0(n) = 0 (for n 1): No positive number can be partitioned into • zero numbers. ≥

pn(n) = 1: To write n as the sum of n positive numbers, there is • exactly one choice: n = 1 + 1 + ... 1 . ntimes To compute other values, observe the| following{z } recursion:

pk(n) = pk(n k) + pk 1(n 1). − − − To see this, observe the two cases:

Either ak = 1. Then a1 + ... + ak 1 is a partition of n 1 and there • − − are pk 1(n 1) ways for this. − − Or ak 2. This means each ai is at least 2 (for 0 i k). There is • a bijection≥ between: ≤ ≤ (i) Partitions of n into k parts of size at least 2. (ii) Partitions of n k into k parts of size at least 1. − To go from (i) to (ii), just remove 1 from each part and to go back, just add 1 to each part. We already know how to count (ii), there are p (n k) such partitions. k − arbitrary number of balls per box Similar to what we did in the L U case, first decide how many boxes i should be non-empty and then count→ how many arrangements with exactly i non-empty boxes exist:

k pi(n). i=1 X This can be thought of as the number of integer partitions of n into at most k non-zero parts.

29 n balls k boxes 1 per box 1 per box arbitrary ≤ ≥ k n 1 n+k 1 UL n k−1 k −1 − − II k II LU 1s k (n) i=1 si (n) k II n LL n n! sk (n)k! P k k UU 1 pk(n) qk(n) = i=1 pi(n) P Table 1: The twelvefold way.

1.5.5 Summary: The Twelvefold Way The twelve variations of counting arrangements of balls in boxes are called the Twelvefold Way. A summary of our results is given in Table1. We also summarize the interpretations of n Labeled or Unlabeled balls in k Labeled or Unlabeled boxes: Arrangements Correspond to U L non-negati ve Integer solutions of x1 + ... + xk = n. L → U Partitions of the set [n] into k parts. L → L Functions from [n] to [k] U → U Partitions of the number n into k non-negative integers. → 1.6 Binomial Coefficients – Examples and Identities Example 1.14 (Lattice Paths). A monotone lattice paths in the grid L := 0, 1, . . . , m 0, 1, . . . , n is a path starting in the bottom left corner (0, 0) and ending{ in the}×{ top right corner} (m, n), taking single steps upwards or rightwards see Figure 12.

(7,4)

(0,0)

Figure 12: Here m = 7 and n = 4. The lattice path is shown. →↑→↑→→→↑→→↑

In other words, a lattice path is a sequence of “ ” (upward steps) and “ ” (rightward steps) with a total of n times and m times↑ . In yet other words,→ the lattice paths correspond to permutations↑ of the multiset→ m , n and by Theorem 1.8 their number is { · → · ↑} m + n . m  

30 Example (Cake Number). For positive integers m and n the cake number c(m, n) is the maximum number of pieces obtained when cutting an m-dimensional cake by n cuts, i.e. the maximum number of connected components of Rm n \ i=1 Hi, where Hi is an (m 1)-dimensional affine hyperplane. For m = 2, this simply means: Put n lines into− the plane and observe into how many pieces the planeS is cut. See Figure 13 for an example.

2 3 1 10 4 11 8 9 6 7 5

Figure 13: The plane (m = 2) is cut by n = 4 lines. In this particular configu- ration, this gives 11 pieces. No configuration with this n and m can achieve a larger number of pieces.

It turns out that m n c(m, n) = . i i=0 X   n Here i is defined to be zero for i > n. For instance  4 4 4 c(2, 4) = + + = 1 + 4 + 6 = 11, 0 1 2       so the example given in Figure 13 is tight.

The cake numbers arise in a seemingly unrelated setting: For m 3, consider m 1 ≥ any finite set X of n points in R − in general position (no m points on a hyperplane, no m + 1 on a (m 2)-dimensional sphere). Then the number of subsets Y of X that can be separated− from X Y by a (m 2)-dimensional sphere is equal to c(m, n). \ − For m = 3 this means: Consider n points in the plane where no 3 points are on a line and no 4 points on a circle2. Then the number of subsets Y of X that can be captured by a circle is equal to c(3, n). Consider Figure 14 for an example. We count all sets that can be captured there. size 0: The empty set. size 1: All 5 single element sets. size 2: All except for b, e and a, c , so 5 2 = 8. { } { } 2 − size 3: a, b, c , a, b, d , a, b, e , a, d, e , b, c, d , b, d, e , c, d, e , so 7 of them. { } { } { } { } { } { } { } size 4: All except for a, b, c, e , so 4 of them. { } 2These assumptions are very week: If you pick random points, say by throwing a bunch of darts onto the plane, the that the points will be in general position is 1.

31 e

d a c b

Figure 14: There are n = 5 points in the plane (m = 3). Some subsets can be captured by a circle (the pictures shows: a, d , b, d, e , c, d, e , c , ), some cannot, for example b, c, e (such a circle{ would} { also contain} { d).} { } ∅ { } size 5: The set a, b, c, d, e . { } That is 1 + 5 + 8 + 7 + 4 + 1 = 26 sets in total, exactly the cake number 5 5 5 5 c(3, 5) = 0 + 1 + 2 + 3 = 1 + 5 + 10 + 10 = 26. There is a seemingly  infinite  number of useful identities involving binomial co- efficients, we will show a few of them. Theorem 1.15 (Proofs by Picture). We have: n (n+1)2 (n+1) n+1 (i) k = 2− = 2 . k=1 Pn  (ii) (2k 1) = n2. k=1 − Proof.P (i) For the first identity, arrange 1+2+...+n dots in a triangle, mirror it, and observe that this fills all positions of a square of size (n+1) (n+1) except for the diagonal. Here is a picture for n = 6: ×

• • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • −→ • • • • • • • • • • • • • • • • • • • • • • • • • • • n Therefore 2 k = (n + 1)2 (n + 1) = n(n + 1). Dividing by 2 proves · 1 − the claim. P (ii) Note how tiles of with sizes of the first n odd numbers can be arranged to form a square of size n n, here a picture for n = 5: × 1 3 5 7 9

32 Theorem 1.16 (Proofs by Double Counting). We have n k n n (i) 2 k = 3 . k=0 P  m n + i n + m + 1 (ii) = . i m i=0 X     Proof. (i) We double count strings over the alphabet 0, 1, 2 of length n, in other words, the n permutations of the multiset { 0, } 1, 2 where there is infinite supply of the types 0, 1, 2. {∞ · ∞ · ∞ · } First way: There are three possibilities per character, so 3n possibilities in total. Second way: First choose the number of times k that the letter “0” should be used ( 0 k n). Then choose the positions for those ≤ n ≤ characters, there are k possibilities. Finally, choose for each of the remaining (n k) positions if they should be 1 or 2, there are 2n k  − choices. So in− total, there is this number of possibilities:

n n k n 2 − · k kX=0   By reversing the order of summation, this is equal to

n n n n 2k = 2k . · n k · k kX=0  −  kX=0   This proves the claim. (ii) We double count the lattice paths in the lattice 0, . . . , m 0, . . . , n of width m and height n. { } × { }

n+m First way: We already counted them in Example 1.14. There are m of them.  Second way: Any path must reach the last row (row n) eventually, using exactly one of the “ ”-steps from (k, n 1) to (k, n) where 0 k m. See Figure 15 for a↑ sketch. − ≤ ≤ There is only one way a lattice path can continue from (k, n), namely k+n 1 go rightwards until (m, n). There are k− ways to get from (0, 0) to (k, n 1) though, so the number of lattice paths is: −  m k + n 1 − . k kX=0   n+m m k+n 1 This gives the equality m = k=0 k− . In the claim, we merely replaced n by n + 1.  P  A few other identities can be derived by using the connection of multinomial coefficients to polynomials.

33 (7,4)

(0,0)

Figure 15: At some point, a lattice path must cross the dashed line, i.e. use one of the edges going to the last row. We count the number of paths using the 4+3 highlighted edge from (4, 3) to (4, 4): There are 4 ways to get from (0, 0) to (4, 3) and one way to get from (4, 4) to (7, 4), so 4+3 1 paths in total. 4 ·  Theorem 1.17 (Proofs by Analysis). We show three equations (i), (ii) and (iii), see below. Proof. Start with the binomial formula: n n n i n i (x + y) = x y − . i i=0   Setting y =X 1 yields:

n n (i)( x + 1)n = xi. i i=0   Deriving byXx gives: n n 1 n i 1 n(x + 1) − = i x − . i i=1   Now set x = 1X and get:

n n 1 n (ii) n 2 − = i . · i i=0   Taking theX second of (i) yields: n n 2 n i 2 n(n 1)(x + 1) − = i(i 1) x − . − − i i=2   Again, we set x = 1 X n n 2 n n(n 1)2 − = i(i 1) . − − i i=0   Adding (ii) to thisX gives:

n n 2 2 n (iii) n(n + 1)2 − = i . i i=0 X   1.7 Permutations of Sets Remember that n-permutations of [n] are π :[n] [n]. From now on we will just say “permutation of [n]” and drop the “n-”. →

34 For such a permutation, the pair (i, j) of two numbers from [n] is called an inversion if the numbers are ordered, but swapped by the permutation, i.e.

1 1 (i, j) is inversion i < j π− (i) > π− (j). ⇔ ∧ Take for example the permutation π = 31524. It has the inversions (1, 3), (2, 3), (2, 5), (4, 5). For i [n] define α := j [n] (i, j) is inversion . The inversion sequence ∈ i |{ ∈ | }| of π is the sequence α1α2 . . . αn. The inversion number or disorder of a permutation π is the number of its inversions, so α1 + +αn. For our example the inversion sequence is 1, 2, 0, 1, 0 and its disorder is··· 4. Since for any i [n], there are only n i numbers bigger than i, any inversion ∈ − sequence α1, . . . , αn satisfies:

0 α n i (i [n]). (?) ≤ i ≤ − ∈ There are exactly n! integer satisfying (?)(n choices for α , n 1 1 − choices for α2 . . . ), just as many as there are permutations. This is no coin- cidence: We now show that each permutation has a distinct such sequence as an inversion sequence. By we then also know that each such se- quence is the inversion sequence of some permutation, i.e. there is a one-to-one correspondence. Theorem 1.18. Any permutation can be recovered from its inversion sequence.

Proof. Let α = (α1, . . . , αn) be integers satisfying (?). We will construct a permutation π with this inversion sequence and it will be clear that we will not have any freedom in the construction of π, i.e. π will be the unique permutation with inversion sequence α. In the beginning, π has n free positions still to be determined:

π =

n positions

1 We determine the position containing| 1,{z i.e. we determine} π− (1): Remember that α1 is the number of elements that are bigger than 1 but arranged to the left of 1 by π. Since every element is bigger than 1, α1 is just the number of 1 elements arranged to the left of 1, in other words π− (1) = α1 + 1. Take for instance the inversion sequence α = (5, 3, 4, 0, 2, 1, 0, 0). Then from α1 = 5 we know that 1 is in the sixth position like this:

π = 1

Now recall that α2 is the number of elements bigger than 2 that are sorted to the left of 2, this means, α2 is exactly the number of (currently) unoccupied positions to the left of 2. In the example we have α2 = 3, so π must look like this: π = 2 1 For general: i [n], if we have already placed all number from 1 to i 1, we ∈ 1 − can derive that π− (i) must be the position of the free slot that has exactly αi free slots to the left of it.

35 In the example, since α3 = 4 we put 3 to the unoccupied position that has four unoccupied positions to the left of it:

π = 2 1 3

Continuing this way, we can reconstruct π completely:

π = 4 7 6 2 5 1 3 8 .

1.7.1 Generating Permutations To generate all permutations of [n] use the following .

Start: Write 1, 2, . . . , n with left arrows over them. Step: – Locate the largest mobile integer m, i.e., an integer such that it arrows to a smaller neighbouring number. – Swap m with that neighbour. – Change the direction of the arrows on all integers p, p > m.

Let us illustrate this method with the following example.

←−1 ←−2 ←−3 ←−4 ←−1 ←−2 ←−3 ←−1 ←−2 ←−4 ←−3 ←−1 ←−4 ←−2 ←−3 ←−4 ←−1 ←−2 ←−3

−→4 ←−1 ←−3 ←−2 ←−1 ←−3 ←−2 ←−1 −→4 ←−3 ←−2 ←−1 ←−3 −→4 ←−2 ←−1 ←−3 ←−2 −→4

←−3 ←−1 ←−2 ←−4 ←−3 ←−1 ←−2 ←−3 ←−1 ←−4 ←−2 ←−3 ←−4 ←−1 ←−2 ←−4 ←−3 ←−1 ←−2

−→4 −→3 ←−2 ←−1 −→3 ←−2 ←−1 −→3 −→4 ←−2 ←−1 −→3 ←−2 −→4 ←−1 −→3 ←−2 ←−1 −→4

←−2 −→3 ←−1 ←−4 ←−2 −→3 ←−1 ←−2 −→3 ←−4 ←−1

36 ←−2 ←−4 −→3 ←−1 ←−4 ←−2 −→3 ←−1

−→4 ←−2 ←−1 −→3 ←−2 ←−1 −→3 ←−2 −→4 ←−1 −→3 ←−2 ←−1 −→4 −→3 ←−2 ←−1 −→3 −→4

1.7.2 Cycle Decompositions Remark. The lecture mentioned a puzzle presented on the Youtube Channel MinutePhysics. It may have something to do with cycles (but we do not want to spoil the puzzle for you), feel free to check it out on http://youtube.com/watch?v=eivGlBKlK6M We already know several ways to specify a permutation. Write it as a string: π = 78546132. • Write it explicitly as a map: • i 1 2 3 4 5 6 7 8 π(i) 7 8 5 4 6 1 3 2

Another way to specify the map would be to connect i with π(i), maybe • like this: 1 2 3 4 5 6 7 8

1 2 3 4 5 6 7 8

In the last drawing every number occurs twice (once as a preimage and • once as an image), now we only take one copy of every number and give a direction to the edges (from preimage to image):

1 7 6 2 8 4 3 5 In other words, we draw an edge from i j if π(i) = j. Since π is a map, every node gets one outgoing edge and since→ π is bijective, every node gets exactly one incoming edge. From this it is easy to see that the image we get is a collection of cycles: When starting at any node i and following the arrows, i.e. walking along the path i π(i) π(π(i)) π(π(π(i))) ..., we must at some point – since there→ are only→ finitely→ many elements→ – come for the first time to an element where we already were. This must be i, since all other nodes already have an incoming edge. So i was indeed in a cycle.

37 Instead of drawing the picture we prefer to just list the cycles like this:

π = (17356)(28)(4)

where cycles are by “()” and elements within a cycle are given in order. The representation at hand is called a disjoint cycle decomposition. Note that every element occurs in exactly one cycle (some cycles may have length 1). Note that this composition is, strictly speaking, not unique, we could also write π = (4)(73561)(28), where we changed the order of two cycles and “rotated” 7 to the beginning of its cycle. In the following, we will not distinguish disjoint cycle decompositions that differ only in these ways. Before we generalize disjoint cycle decompositions to arbitrary cycle decom- positions, we define the product of two permutations: If π , π S are permutations, the product π π (or just π π for short) 1 2 ∈ n 1 · 2 1 2 is the permutation π with π(i) = π2(π1(i)) (i.e. first apply π1, then π2). Note, that when viewed as maps: π π = π π . 1 · 2 2 ◦ 1 Now if we write a single cycle, e.g. π S7, π = (137), we mean the permu- tation that permutes the elements in the∈ cycle along the cycle (here π(1) = 3, π(3) = 7, π(7) = 1) and leaves all other elements in place, in our example π is the permutation with the disjoint cycle decomposition (137)(2)(4)(5)(6). We can now talk about cycle decompositions where elements may occur more than once, e.g. π = (136)(435)(7452)(6)(23). It is the product of the involved cycles ( (136), (435), . . . ). If π Sn is a permutation given as a cycle decomposition, then π can be evaluated∈ by parsing, where parsing an element i [n] in a cycle decomposition means ∈ Have a current element c, initially c = i. • Go through cycles from left to right. • – If the current element c is part of a cycle, replace c with the next element in this cycle. In the end of this process π(i) is the current element c. • Take for instance π = (136)(435)(7452)(6)(23) and i = 3. Then we start with the current element c = 3. The first cycle (136) contains c = 3, so we change c to 6 and go on. The next cycle (435) does not contain c = 6, and neither does (7452) so we move past these cycles without changing c. The next cycle (6) contains c = 6 but does not alter it. We therefore end with π(3) = 6. As another example, consider how π(1) is evaluated:

c=1 c=3 c=5 c=2 c=2 c=3 ↓(136) ↓(435) ↓(7452) ↓(6) ↓(23) ↓ So π(1) = 3.

Theorem 1.19. If π1, π2 Sn are permutations given in cycle decomposition, then a cycle decomposition∈ of π π is given by concatenating the cycle decom- 1 · 2 positions of π1 and π2.

38 Proof. Let the cycle decompositions be π1 = C1C2 ...Ck, π2 = C10 C20 ...Cl0. When parsing C C ...C C0 C0 ...C0 with i [n] then the current element 1 2 k 1 2 l ∈ after C1C2 ...Ck will be π1(i) and therefore, after going through the remaining cycles C10 ...Cl0, the current element will be π2(π1(i)) = (π1 π2)(i). So the concatenation is indeed a cycle decomposition of π π . · 1 · 2 Note that non-disjoint cycle decompositions do not always commute, e.g. (435)(321) = (13542) = (15432) = (321)(435) 6 where the identities can be verified by parsing. The claims of the following theorem are easy and we will skip the proofs: Theorem 1.20. Let π S be given in cycle decomposition. ∈ n Swapping adjacent cycles has no effect if they are disjoint (i.e. no number • occurs in both cycles), e.g. (13)(524) = (524)(13).

1 A cycle decomposition of π− is obtained by swapping the order of all • 1 cycles and reversing the elements in each cycle, e.g. ((321)(435))− = (534)(123). Cyclic shifts in a cycle have no effect. (534) = (345) = (453). • Up to cyclic shifts and order of cycles, there is a unique decomposition • into disjoint cycles.

Let id Sn be the identity permutation ( id(i) = i for i [n]). Define the order of π ∈S to be the smallest k such that ∈ ∈ n πk = π π ... π = id. · · · k times Theorem 1.21. The order of π is| the{z least} common multiple of the lengths of the cycles in the disjoint cycle decomposition of π.

Proof. Assume π is composed of the disjoint cycles C1C2 ...Cm and an element i [n] is contained in a cycle Cj of length l. Then for any k N, the value πk∈(i) is obtained by parsing i through i ∈

C1C2 ...CmC1C2 ...Cm ...C1C2 ...Cm

k copies of C1 ...Cm

Only the copies of C|j can affect the current{z value if we start} with i, so the result is the same as when parsing i through k copies of just Cj. From this we see that πk(i) = i if and only if k is a multiple of l. This shows that πk = id if and only if k is a common multiple of the cycle lengths. So, the order of π is the least common multiple of them.

1.7.3 Transpositions A cycle of length 2 is a transposition. Define discriminant of π as N(π) := n #C where #C is the number of cycles in a disjoint cycle decomposition of π−. Note that we may not omit single element cycles now, they count towards #C: In S5 we have for example N(id) = N((1)(2)(3)(4)(5)) = n 5 = 0 and N((134)(25)) = 5 2 = 3. − −

39 Theorem 1.22. (i) Any permutation can be written as the product of transpositions.

(ii) If π is the product of k transpositions (k N), then N(π) and k have the same (i.e. N(π) k (mod 2)). ∈ ≡ Proof. (i) Let π be any permutation. We already know that we can write π as a product of cycles. So to show that π can be written as a product of transpositions, it suffices to show that any cycle can be written as the product of transpositions.

Let C = (a1a2 . . . al) be a cycle of length l (assume l 2, since cycles of length 1 correspond to identity permutations and can be≥ omitted from any cycle decomposition). Then we can write C as the product of l 1 transpositions: − C = (a1a2 . . . al) = (al 1al)(al 2al 1) ... (a2a3)(a1a2) − − − Verify this by parsing: If i = l, then parsing a yields: 6 i c=ai c=ai c=ai c=ai c=ai+1 c=ai+1 c=ai+1 c=ai+1 c=ai+1 ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ (al 1al) (al 2al 1) ... (aiai+1) (ai 1ai) ... (a2a3) (a1a2) − − − −

So parsing sends ai to ai+1 as desired. When parsing al we have:

c=al c=al−1 c=al−2 c=a3 c=a2 c=a1 ↓ ↓ ↓ ↓ ↓ ↓ (al 1al) (al 2al 1) ... (a2a3) (a1a2) − − − so we get a1 as desired. (ii) Let π be any permutation with k cycles in a disjoint cycle decomposition. Let (ab) be any transposition. We determine the number of cycles in a disjoint cycle decomposition of (ab) π depending on a and b. · Case 1: a and b are part of different cycles in π. Then there is a dis- joint cycle decomposition of π looking like this:

π = (aX)(bY )C3 ...Ck

where X and Y are sequences of elements and C3,..., Ck are cycles. By parsing we verify: (ab) π = (ab)(aX)(bY )C ...C = (aY bX)C ...C . · 3 k 3 k The last term is a disjoint cycle decomposition of (ab) π into k 1 cycles. · − Case 2: a and b are part of the same cycle in π. Then there is a dis- joint cycle decomposition of π looking like this:

π = (aXbY )C2 ...Ck

where X and Y are sequences of elements and C2,..., Ck are cycles. By parsing we verify: (ab) π = (ab)(aXbY )C ...C = (aY )(bX)C ...C . · 2 k 2 k The last term is a disjoint cycle decomposition of (ab) π into k + 1 cycles. ·

40 In both cases, adding a transposition changed the number of cycles by one, which means that the parity of the number of cycles changed. Therefore, the claim follows by induction.

I We call the number sk(n) of n-permutations of [n] with exactly k cycles in a disjoint cycle decomposition the unsigned of first kind. More precisely sI (n) = π S N(π) = n k . k |{ ∈ n | − }| Theorem 1.23. For all n, k 1 ≥ I I I (i) s0(0) = 1 and sk(0) = s0(n) = 0, I I I (ii) sk(n) = (n 1)sk(n 1) + sk 1(n 1). − − − − Proof. (i) The empty permutation is the only permutation of 0 elements. I I Thus s0(0) = 1 and sk(0) = 0, since the empty permutation has no cycle and k 1. ≥ On the other hand every permutation of n 1 elements has at least one I ≥ cycle and hence s0(n) = 0. k (ii) Let Sn Sn denote the set of permutations in Sn with exactly k cycles ⊂ k I in a disjoint cycle decomposition. Then Sn = sk(n). We define a map k k k 1 | | k φ : Sn Sn 1 Sn−1 as follows. Consider a permutation π Sn and let (AnB→) denote− ∪ the− cycle containing n (A and B may be empty).∈ If (AnB) = (n), remove the entire cycle (n). Otherwise replace the cycle (AnB) with (AB). In the first case there are k 1 cycles in a disjoint cycle decomposition of φ(π). In the latter case φ(π−) still has k cycles in a disjoint cycle decomposition. k k 1 Consider a disjoint cycle decomposition of some σ Sn 1 Sn−1. We count how many permutations π Sk are mapped to∈σ. − ∪ − ∈ n k 1 k Case 1: σ Sn−1. Then the only permutation in Sn mapped to σ is obtained∈ by− adding the cycle (n) to σ. k Case 1: σ Sn 1. In this case we may place n between any element x [n ∈ 1]− and its successor in σ (i.e., put it right behind x in the cycle∈ containing− x). Then n has different positions for distinct choices k of x [n 1]. Hence n 1 distinct elements from Sn are mapped to σ∈. − −

k k k 1 Combining these two cases we see that Sn = (n 1) Sn 1 + Sn−1 . | | − | − | | − |

II Recall that the Stirling numbers of second kind sk (n) count the number of partitions of [n] into k non-empty parts. For these numbers we had the formula II II II sk (n) = ksk (n 1) + sk 1(n 1). There is also a similar picture of the − − − I recursion, see Figure 16. From the picture we easily see that s1(n) = (n 1)! (for n 1). This is the number of different cycles of length n formed− by n elements.≥

41 = 0 k

n = 0 1 = 1 k

n = 1 0 1 = 2 k

n = 2 0 1 1 = 3 k

n = 3 0 2 3 1 = 4 k

n = 4 0 6 11 6 1 = 5 k

n = 5 0 24 50 35 10 1 = 6 k

n = 6 0 120 274 225 85 15 1

I I Figure 16: A number sk(n) is obtained as the sum of the number sk 1(n 1) I − − toward the top left and (n 1) times the number sk(n 1) towards the top right. E.g. for n = 6, k = 3:− 225 = 50 + 5 35. − ·

1.7.4 Derangements

A permutation π Sn is a derangement if it is fixpoint-free, i.e., i [n]: π(i) = i. Note that∈ this means that each cycle has length at least 2. The∀ ∈ set of 6 all derangements of [n] is denoted by Dn. We close this chapter by “counting” derangements in Theorem 1.24. This is also meant to demonstrate that there are several different ways to count. Depending on the purpose, different counts may be more or less helpful.

Remark. You might find the notation Dn =!n in the literature (sic!). Since we think this notation is too confusing we| won’t| use it here.

Before we start observe that D1 = 0 and D2 = 1. We may also agree on D = 1, since the empty permutation| | has no| fixpoint,| but we try to avoid | 0| using D0.

42 Theorem 1.24. For a natural number n 1 we have ≥ Dn = (n 1)( Dn 1 + Dn 2 ), if n 2, (Recursion (i)) − − | | − | | n | | ≥ Dn = n Dn 1 + ( 1) , (Recursion (ii)) | | | − | − n n 1 n − n n! = D , D = n! D , (Recursion (iii)) k | k| | n| − k | k| kX=0   kX=0   n n n ( 1)k D = ( 1)n+kk! = n! − , (Summation) | n| k − k! kX=0   kX=0 n! 1 D = + , (Explicit) | n| e 2  n n n log n Dn √2πn ( c for some c R), (Asymptotic) | | ∼ en+1 ∼ ∈ x Dn n e− | |x = x R 1 . () n! 1 x ∀ ∈ \{ } n 0 X≥ − Proof.

Recursion (i): We prove this statement using a map φ : Dn Dn 1 Dn 2 → − ∪ − as follows. Consider a derangement π Dn and let x, y [n] be the unique numbers such that π(x) = n and ∈π(n) = y. Written as∈ a string, π is of the form π = AnBy where A is a sequence of x 1 elements and B is a sequence of n x 1 elements. − − − If (case 1) x = y, then define φ(π) := AyB. Note that σ = φ(π) Dn 1, since σ(x) = y6 = x. For example, if π = 45123, then x = 2, y =∈ 3 and− φ(π) = 4312. 6

If (case 2) x = y, then define φ(π) := A∗B∗ where A∗ and B∗ are identical to A and B except that all numbers bigger than x are decreased by 1. We claim that σ = φ(π) Dn 2. If i < x, then σ(i) = φ(i) = i if φ(i) < x, and σ(i) = φ(i) 1 ∈x >− i if φ(i) > x. Otherwise i x6 and σ(i) = φ(i+1) < x < i if φ(−i+1)≥< x, and σ(i) = φ(i+1) 1 = i+1≥ 1 = i if φ(i + 1) > x. For example, if π = 45132, then x = y =− 2, AB6 = 413− and φ(π) = 312.

Claim. Each σ Dn 1 Dn 2 is the image of exactly (n 1) permutations π D . ∈ − ∪ − − ∈ n Case 1: σ Dn 1. Then there are n 1 choices to pick a position x [n ∈ 1].− Let σ(x) = y, write σ =− AyB and define π = AnBy. Then∈ φ−(π) = σ. Note that π D , since π(x) = n = x. ∈ n 6 Case 2: σ Dn 2. Then there are n 1 choices to put n on position x (as the∈ first− element, into a gap− or after the last element). Then there is a unique way to increase σ(i) by 1 for each i with σ(i) x. Finally put x on position n. We obtain π = AnBx, which is mapped≥ to σ. Note that π D due to similar arguments as above. ∈ n Recursion (ii): We will apply induction on n. An induction basis is given by D = 1 and D = 0 . Suppose n 3. We rewrite Recursion (i) as | 2| | 1| ≥ Dn n Dn 1 = ( Dn 1 (n 1) Dn 2 ). By induction hypothesis, | | − | − | − | − | − n −1 | −n | the right side is equal to ( 1) − = ( 1) which proves the claim. − − −

43 Remark. For the number S = P (n) = n! of permutations of [n] we have | n| a similar recursion Sn = (n 1)( Sn 1 + Sn 2 ), since n! = (n 1)((n | | − | − | | − | − − 1)! + (n 2)! and Sn = n Sn 1 . − | | | − | Recursion (iii): We count all permutations in S . On the one hand S = n!. n | n| On the other hand each π Sn induces a derangement on [n] F (π), where ∈ n \ F (π) is the set of all fixpoints of π. Thus there are Dn k different k | − | permutations in Sn with exactly k fixpoints each. Hence n! = Sn = n n  | | Dn k . k=0 k | − | P  1 ( 1)k summation: Proof by induction on n. If n = 1, then D1 = 0 = 1 k=0 −k! = 1(1 1). This gives an induction basis. | | − P Consider n 2. Then, using the induction hypothesis (IH): ≥ n 1 k n IH − ( 1) n Dn = n Dn 1 + ( 1) = n(n 1)! − + ( 1) | | | − | − − k! − kX=0 n 1 n − ( 1)k n! ( 1)k = n! − + ( 1)n = n! − . k! n! − k! kX=0 kX=0

z zk explicit: Recall that e = k 0 k! . Then ≥

P k n ( 1) n k Dn n! k=0 −k! ( 1) 1 1 lim | | = lim = lim − = e− = . n Sn n n! n k! e →∞ →∞ P →∞ k=0 | | X So for sufficiently large n we have D n! . Applying standard argu- | n| ∼ e ments for the speed of convergence of we obtain D = n! + 1 . | n| e 2 √ n n asymptotic: This follows immediately from Stirling’s formula n! 2πn e and the explicit result. ≈  Dn n generating function: Let F (x) := n 0 | n! | x . Then the derivative of F ≥ Dn n 1 Dn n 1 is given by F 0(x) = n 1 | n! | nx − = n 1 (n| 1)!| x − . From Recur- ≥ P ≥ − Dn+1 Dn Dn−1 sion (i) we have | n! P| (n| 1)!| = |(n 1)!| Pand thus − − −

Dn n 1 n (1 x)F 0(x) = | | (x − x ) − (n 1)! − n 1 X≥ − D D = | n+1| | n| xn n! − (n 1)! n 1   X≥ − Dn 1 n = | − | x (n 1)! n 1 X≥ − = xF (x).

e−x This differential equation with F (0) = 1 is solved by F (x) = 1 x . −

44 2 Inclusion-Exclusion-Principle and M¨obiusInversion

In the last theorem of the previous chapter, and in several other places, we have seen with alternating signs. This chapter will deal with such kind of results. Consider a finite set X and some properties P1,...,Pm such that for each x X and each i [m] it is known whether x has property Pi or not. We are∈ interested in the∈ number of elements from X satisfying none of the properties Pi.

Example. Let X be the set of students in the room and let P1 and P2 be the properties “being male” and “studies math”, respectively. Suppose there are 36 students, 26 of which are male and 32 of which study math. Among the male students 23 study math. We are interested in the number of students which are neither male nor study math. Let X1, X2 denote the set of male students and the set of math students, respectively. Then X (X X ) = X X X + X X = 36 26 32 + 23 = 1. | \ 1 ∪ 2 | | | − | 1| − | 2| | 1 ∩ 2| − − See Figure 17.

36 students

male math 26 23 32

1

Figure 17: When there are 26 male and 32 math students among 36 students in the class, and 23 of the male students study math, then there is exactly 1 female student who does not study math.

2.1 The Inclusion-Exclusion Principle

Let P1,...,Pm be some m properties and X be an n-element set, where for each x X and each i 1, . . . , m we have that x has property P or x does ∈ ∈ { } i not have property Pi. For a subset S of properties, let N(S) denote the set of elements in X that have properties P for all P S. Note that N( ) = X. ∈ ∅ Clearly, a property Pi just corresponds to the subset Xi of elements of X having this property and N(S) = i S Xi. However, taking properties instead of sets has the advantage that we can∈ take the same subset more than once. T Theorem 2.1 (Inclusion-Exclusion Principle). Let X be a finite set and P ,...,P properties. Further define for S [m] the 1 m ⊆ set N(S) = x X i S : x has Pi , i.e. the set of all elements from X having property{ ∈P for| all ∀ i∈ S. The number} of elements of X that satisfy none i ∈ of the properties P1,...,Pm is given by

S ( 1)| | N(S) . (1) − | | S [m] X⊆

45 Proof. Consider any x X. If x X has none of the properties, then x N( ) and x N(S) for any S∈ = . Hence∈ x contributes 1 to the sum (1). ∈ ∅ If x6∈ X has exactly6 k∅ 1 of the properties, call this set of properties T [m] ∈. Then x N(S) iff≥S T . ∈ k ∈ ⊆ The contribution of x to the sum (1) is ( 1) S = k k ( 1)i = 0.  S T | | i=0 i ⊆ − k −k In the last step we used that for any y R we have (1 y)k = ( y)i ∈P − P i=0 i − which implies (for y = 1) that 0 = k k ( 1)i. i=0 i − P  The previous theorem can alsoP be proved  inductively.

In some settings we might be interested in a set of elements having a certain set of properties A and none of the properties from a set of properties B. We can handle this by considering the opposites A¯ of the properties from A and searching for the elements having none of the properties from A¯ and B. In other settings we might be interested in the number of elements satisfying at least one of the properties. The following corollary answers this question.

Corollary 2.2. The number of elements of X that have at least one of the properties P1,...,Pm is given by

S S 1 X ( 1)| |N(S) = ( 1)| |− N(S). | | − − − S [m] =S [m] X⊆ ∅6 X⊆

Given a set X and three properties P1,P2,P3, the Inclusion-Exclusion-Principle states that the number of elements not in P P P is: 1 ∪ 2 ∪ 3 X P P P + P P + P P + P P P P P . | | − | 1| − | 2| − | 3| | 1 ∩ 2| | 1 ∩ 3| | 2 ∩ 3| − | 1 ∩ 2 ∩ 3| Figure 18 illustrates this: For example, the number of elements in P (P P ) 1 \ 2 ∪ 3

P1 P2 −+− − − +−+−+− −+− − −+−

− P3

Figure 18: An illustration of the inclusion-exclusion principle for three proper- ties. is subtracted once from the total number of elements. The number of elements in (P P ) P is subtracted twice in the beginning (since the elements are 1 ∩ 2 \ 3 in P1 as well as in P2) and then added back once. The number of elements

46 in P1 P2 P3 is subtracted three times in the beginning, then added back three∩ times∩ and finally subtracted yet again. The same holds for the other intersections. Altogether one can see that each element in P1 P2 P3 contributes exactly 1 to the total sum. ∪ ∪ − 2.1.1 Applications In the following we apply the principle of inclusion and exclusion (PIE) to count things. The arguments have a fairly rigid pattern: (i) Define “bad” properties: We identity the things to count as the elements of some universe X except for those having at least one of a set of prop- erties P1,...,Pm. The corresponding sets are denoted by X1,...Xm (i.e. Xi is the set of elements having property Pi). Given this reformulation of our problem we want to count X (X ... X ). \ 1 ∪ ∪ m (ii) Count N(S): For each S [m], determine N(S), the number of elements of X having all bad properties⊆ P for i S. i ∈ (iii) Apply PIE: Apply Theorem 2.1, i.e. the principle of inclusion and ex- clusion. This yields a closed formula for X (X1 ... Xm) , typically with one unresolved summation sign. | \ ∪ ∪ | Theorem 2.3 (Surjections). The number of surjections from [k] to [n] is:

n n ( 1)i (n i)k. − i − i=0 X   Proof. Define bad properties: Let X be the set of all maps from [k] to [n]. Define the “bad” property Pi for i [n] as “i is not in the image of f”, i.e. ∈ f :[k] [n] has property P : j [k]: f(j) = i. → i ⇐⇒ ∀ ∈ 6 With this definition, the surjective functions are exactly those functions that have no bad property, i.e. we need to count X (X ... X ). \ 1 ∪ ∪ n Count N(S): We claim N(S) = (n S )k, for any S [n]. To see this, observe that f has all properties with indices−| | from S if and⊆ only if f(i) / S for all i [k]. In other words, f must be a function from [k] to [n] S∈and there are∈ (n S )k of those. \ − | | Apply PIE: Using Theorem 2.1, the number of surjections is therefore:

pie S X (X ... X ) = ( 1)| |N(S) \ 1 ∪ ∪ n − S [n] X⊆ S k = ( 1)| |(n S ) − − | | S [n] X⊆ n n = ( 1)i (n i)k. − i − i=0 X   S k In the last step we used that ( 1)| |(n S ) only depends on the size of S and there are n sets S [−n] of size−i. | | i ⊆  47 Corollary 2.4. (i) Consider the case n = k. A function from [n] to [n] is a surjection if and only if it is a bijection. Since there are n! bijections on [n] (all permutations) we obtained the identity: n n n! = ( 1)i (n i)n. − i − i=0 X   (ii) A surjection from [k] to [n] can be seen as a partition of [k] into n non- empty distinguishable parts (the map assigns a part to each i [k]). Since the partitions of [k] into n non-empty indistinguishable parts is∈ counted by II sn (k) and there are n! ways to assign labels to the n parts, we obtain that II the number of surjections is equal to n!sn (k). This proves the identity: n n n!sII (k) = ( 1)i (n i)k. n − i − i=0 X   Theorem 2.5 (Derangements revisited). Recall that for n N, the derange- ∈ ments Dn on n elements are permutations of [n] without fixed points. We claim: n n D = ( 1)i (n i)! | n| − i − i=0 X   Proof. Define bad properties: Let X be the set of all permutations of [n]. We define the “bad property” Pi that means “π has a fixpoint i”: π X has property P : π(i) = i, (i [n]). ∈ i ⇐⇒ ∈ Derangements are exactly permutations that have none of these properties. Count N(S): We claim N(S) = (n S )! for any S [n]. − | | ⊆ Indeed, π X has all properties with indices from S if and only if all i S are fixed∈ points of π. On the other elements, i.e. on [n] S, π may be∈ an arbitrary bijection so there are (n S )! choices for π. \ − | | Apply PIE: Using Theorem 2.1, the number of derangements is therefore:

pie S X (X ... X ) = ( 1)| |N(S) \ 1 ∪ ∪ n − S [n] X⊆ S = ( 1)| |(n S )! − − | | S [n] X⊆ n n = ( 1)i (n i)! − i − i=0 X   S In the last step we used that ( 1)| |(n S )! only depends on the size of S and there are n sets S [n−] of size−i. | | i ⊆ Theorem 2.6 (Combinations  of multisets). Consider a multiset M with types 1, . . . , m and repetition numbers r1, . . . , rm. Then the number of k-combinations of M is: S m 1 + k i S(ri + 1) ( 1)| | − − ∈ − m 1 S [m]  P  X⊆ − a where we define b := 0 for a < b.  48 Proof. Define bad properties: Let X be the set of k-combinations where we disregard the restrictions the repetition numbers impose, in other words X is the set of k-combinations of M, where M is the multiset with the same m types as in M but infinite supply of each type (i.e. “r = ” for i ∞ each i [m]). f f ∈ m 1+k Recall that by Theorem 1.9, X = m− 1 . Define the bad property Pi as: | | −  A X has property P : type i is repeated in A at least r + 1 times. ∈ i ⇐⇒ i Then the k-combinations of M are exactly those k-combinations in X that have none of the bad properties. Count N(S): We claim that for any S [m]: ⊆ m 1 + k (r + 1) N(S) = − − i S i . m 1∈  −P  To see this, note that the repetition number of each type i S is at least ∈ ri+1. This fixes ri+1 elements of the k-combination. In total i S(ri+1) ∈ are fixed this way and what remains to be chosen is a (k i S(ri + 1))- P m −1+k P∈ (ri+1) combination of M. Again by Theorem 1.9 there are − − i∈S Pm 1 such combinations if k (r + 1) 0 and no such combinations− i S i  otherwise. f − ∈ ≥ P Apply PIE: Using Theorem 2.1, the number of k-combinations of M is there- fore:

pie S X (X ... X ) = ( 1)| |N(S) \ 1 ∪ ∪ m − S [m] X⊆ S m 1 + k i S(ri + 1) = ( 1)| | − − ∈ − m 1 S [m]  P  X⊆ −

In the special case where all the repetition numbers are equal, i.e. r1 = r2 = ... = rm = r, this can be simplified to:

m m m 1 + k (r + 1) i ( 1)i − − · . − i m 1 i=0 X   −  Before we study a more advanced example, we prove a small result needed there:

2n 2n r Lemma. There are 2n r r− binary sequences (with letters 0 and 1) such that: −  The sequence has length 2n and exactly r copies of 1 and 2n r copies of • 0. − No two copies of 1 are adjacent. Here the first and the last position of the • sequence count as adjacent (the sequence is cyclic). For example, if n = 3 and r = 2 there are 9 such sequences, namely:

101000, 100100, 100010, 010100, 010010, 010001, 001010, 001001, 000101.

49 Proof. Since no two copies of 1 may be adjacent, we know that after each 1 there must be a 0. So we can just imagine that one 0 is already “glued” to every 1 and we are actually building a sequence with 2n 2r copies of 0 and r copies of 10. − However, one copy of 10 might “spill” across the border, i.e. the 1 could be in position 2n and the 0 in position 1. We need to handle this case separately. Case 1: The last position of the sequence is 1. Then the first position is 0 and the remaining positions contain r 1 copies of 10 and 2n 2r copies of − 2n r 1 − 0, now without any further pitfalls. There are r− 1− ways to arrange them. −  Case 2: The last position of the sequence is 0. Then our special character 10 does not spill across the border and the sequence is any ordered ar- 2n r rangement of r copies of 10 and 2n 2r copies of 0. There are r− of them. −  In total we have: 2n r 1 2n r r 2n r 2n r 2n 2n r − − + − = − + − = − . r 1 r 2n r r r 2n r r  −    −     −  

Example 2.7 (Probl`emedes m´enages). There are n 2 married couples ≥ at a dinner party, a husband and a wife each, denoted by H1,...,Hn and W1,...,Wn. They should be seated on a round table with 2n (distinguishable) seats such that: Men and women alternate, i.e. no two men and no two women sit next • to each other. Equivalently, the seats with even labels are used either exclusively by women or exclusively by men. All couples are separated, i.e. no husband sits next to his wife. • We want to count how many such assignments exist. First count the number of ways to seat the women. They may be seated in the even or odd seats and once this is fixed, there are n! ways to seat them, so 2 n! ways in total. · Assuming the women are already seated, we now count the number of ways the men can be added. It is easy to see that this number does not depend on how the women are seated so we can assume without loss of generality that wife Wi sits in seat 2i and the odd-numbered seats are still free. We count the number of ways the men can join using PIE.

Define bad properties: Let X be the set of all ways to seat the men without paying attention to the rule that they must not sit next to their wives. There are X = n! ways to do it. | | We define the bad property Pi that captures that the husband Hi sits next to his wife Wi, i.e. on seat 2i 1 or on seat 2i + 1 (all seat numbers are taken modulo 2n, in particular,− 2n and 1 are adjacent). The permitted arrangements are exactly those with none of the bad prop- erties.

50 Count N(S): This time, calculating N(S) for arbitrary S [n] is tricky. In- stead we calculate ⊆

N ∗(r) := N(S), for 0 r n. ≤ ≤ S [n] SX⊆=r | | which will be just as helpful. The meaning of this number is a bit subtle: N ∗(r) is the number of pairs (π, S) where S [n] is of size r and π is a ⊆ seating plan such that (wife Wi sits at 2i and) the couples with indices in S are not separated. To better count these pairs, define a map f: Under f, the pair (π, S) is mapped to a binary sequence with 2n characters and exactly S copies of 1: For i S (remember that Hi and Wi will be assigned adjacent| | places in π) a one∈ should be put in the position of the husband Hi – if he sits left of his wife – or in the position of Wi in π, if she sits left of her husband. It is time for an example. Consider the pair (π, S) with S = 2, 5 and the seating { } 1 2 3 4 5 6 7 8 9 10 π = (H5W1H2W2H1W3H4W4H3W5).

Note that the second and fifth couple are indeed not separated (W5 and H5 are adjacent because the table is circular). Also note that the fourth couple is also not separated, this is allowed. The mapping f discussed above would map this pair to the sequence 0010000001 since H2 (in seat 3) and W5 (in seat 10) sit left of their spouse. It is clear that f will never produce sequences with two adjacent ones: That would mean two adjacent people are both sitting left of their spouse: Which is impossible (assuming n 2). However, any sequence with r copies of 1 and no two adjacent copies≥ of 1 is the image of (n r)! pairs (π, S): From the r copes of 1, the set S (of size r) can be reconstructed− as can be the position of the husbands with indices in S. The (n r) other husbands can be distributed arbitrarily onto the (n r) remaining− odd-numbered seats, there are (n r)! ways to do so. This− proves, using the Lemma above to count the number− of binary sequences of length n with r copies of 1 and no two adjacent copies of 1:

2n 2n r N ∗(r) = (n r)! − . − · 2n r r −   Apply PIE: Using Theorem 2.1, the number of valid ways to arrange the hus-

51 bands between the wives is:

pie S X (X ... X ) = ( 1)| |N(S) \ 1 ∪ ∪ n − S [n] X⊆ n S = ( 1)| |N(S) − r=0 S [n] X SX⊆=r | | n n r r = ( 1) N(S) = ( 1) N ∗(r) − − r=0 S [n] r=0 X SX⊆=r X | | n 2n 2n r = ( 1)r(n r)! − . − − · 2n r r r=0 X −   Multiplying this with 2n!, the number of ways to place the wives the total number of seatings is (for n 2): ≥ n 2n 2n r 2n! ( 1)r(n r)! − . − − · 2n r r r=0 X −   2.1.2 Stronger Version of PIE Theorem 2.1 can be strengthened as follows:

Theorem 2.8 (Stronger PIE). Assume f, g : 2[n] R are functions mapping subsets of [n] to real numbers and g can be written→ in terms of f as:

g(A) = f(S)(for A [n]). ⊆ S A X⊆ Then f can also be written in terms of g like this:

A S f(A) = ( 1)| |−| |g(S)(for A [n]). − ⊆ S A X⊆ Before we proof Theorem 2.8, we first observe how it is indeed a generalization of Theorem 2.1. So assume we have the setting of Theorem 2.1, i.e. properties P1,...,Pm with corresponding sets X1,...,Xm. Then for S [m] define f(S) to be the number of elements having all properties with indices⊆ not in S and none of the properties with indices in S, i.e. f(S) := X X i \ i i [m] S i S ∈\\ [∈

Note that f([m]) = X i [m] Xi is precisely the number we want to count in Theorem 2.1. | | − | ∈ | The function g fulfilling theS requirement of Theorem 2.8 is given as:

g(A) = f(S) = X = N([m] A) i \ S A i [m] A X⊆ ∈ \\

52 To see the second “=”, note that S A f(S) counts an element x if and only if the set S of all properties that x does⊆ not have is a subset of A. This means x is counted if and only if it has allP properties from [m] A. We now apply Theorem 2.8 to obtain: \

m S m S f([m]) = ( 1) −| |g(S) = ( 1) −| |N([m] S) − − \ S [m] S [m] X⊆ X⊆ S = ( 1)| |N(S). − S [m] X⊆ where in the last step we changed the order of summation, i.e. summed over [m] S instead of S. This concludes the proof of (Thm. 2.8 Thm. 2.1). \ ⇒ Theorem 2.8 establishes a similar result as M¨obius Inversion which we con- sider later. In both cases there are two “linked” functions f and g where g is given in terms of f. The assert that f can then also be given in terms of g. The subset relation “ ” will be replaced by the notion of divisibility “ ”. Both are order relations and,⊆ in fact, there are results even more general than| both Theorem 2.8 and M¨obiusInversions, dealing with general partial orders (but we will not consider them here). We now proof Theorem 2.8. Proof. Consider (for A [n]) the term that is claimed to be equal to f(A): ⊆ A S A S ( 1)| |−| |g(S) = ( 1)| |−| | f(T ) = c f(T ) − − T S A S A T S T A X⊆ X⊆ X⊆ X⊆ for appropriate cT (to be determined!), that captures how often f(T ) is en- countered (for T A). Observe cA = 1, since f(A) is only encountered for T = S = A. For a⊆ proper subset T A (and k := A T ) we have: ⊂ | | − | | k A S k i k c = ( 1)| |−| | = ( 1) − = 0. T − − i T S A i=0   ⊆X⊆ X where in the second step we observed that picking a set between T and A is equivalent to picking a subset of A T . The last step is an identity we already saw. \ This proves the claim.

2.2 M¨obiusInversion Formula Any positive number n N has a unique decomposition into primes, i.e. n can be written as n = pa1 ∈pa2 ... pak for primes p , . . . , p with multiplicities 1 · 2 · · k 1 k a1, . . . , ak. In other words, the factors of n are given as a multiset M(n) with types p1, . . . , pk and repetition numbers a1, . . . , ak. For example 360 = 23 32 5 and M(360) = 2, 2, 2, 3, 3, 5 = 3 2, 2 3, 1 5 (which admittedly looks a· bit· quaint written that{ way). } { · · · } We write n m if n divides m, meaning there is a number a N such that a n = m. In terms| of multisets this means M(n) M(m). ∈ · ⊆

53 The greatest common (gcd) of m and n, is the number k = gcd(m, n), with M(k) = M(m) M(n). For example gcd(12, 90) = 6 since M(6) = 2, 3 = 2, 2, 3 2, 3, 3, 5 ∩= M(12) M(90). { } { Note} ∩ that { M(1)} = (1 is the∩ result of the ). If gcd(m, n) = 1, then m and n are called∅ relatively prime (or coprime). We define the Euler’s φ-function for n 2 as: ≥ φ(n) = # k N 1 k n, gcd(n, k) = 1 { ∈ | ≤ ≤ } For example, φ(12) = # 1, 5, 7, 11 = 4, φ(9) = # 1, 2, 4, 5, 7, 8 = 6. A formula to compute Euler’s φ{-function} is given in the following{ Theorem.}

a1 a2 ak Theorem 2.9. If n = p1 p2 . . . pk then

k 1 φ(n) = n (1 ) − p i=1 i Y 2 1 1 Consider for instance n = 12 = 2 3. Then φ(12) = 12 (1 2 ) (1 3 ) = 12 1 2 = 4 confirming what we counted· by hand before. · − · − · 2 · 3 Proof. In the proof we use the principle of inclusion and exclusion.

Define bad properties: As ground set we use X = [n]. We say a number x X has the bad property P if x is divisible by p , i.e.: ∈ i i x [n] has property P : p x. (i [k]). ∈ i ⇐⇒ i| ∈ Then the numbers in [n] that are relatively prime to n are exactly those that have no bad property.

n Count N(S): For any S [k] we have N(S) = Q . ⊆ i∈S pi This is because, if x X is a multiple of all primes with indices from S, ∈ then x must be a multiple of their product i S pi (which is the least common multiple of those primes). Since this product∈ divides n, there are no rounding issues and the number of numbersQ x between 1 and n that

are multiples of i S pi is just given as the quotient. ∈ Apply PIE: Using TheoremQ 2.1, we can write φ(n) as:

S pie S ( 1)| | 1 φ(n) = ( 1)| |N(S) = n − = n (1 ). pi − · pi − S [k] S [k] i S i [k] X⊆ X⊆ ∈ Y∈ Q The last identity is best seen by multiplying out the right side: For each of the k factors we can either choose 1 or we can choose 1 . The indices − pi i where 1 was chosen are captured in the set S. For each S [k] we pi get exactly− the term under the sum. ⊆

We now show a few more number theoretic results leading up to M¨obiusInver- sions. Theorem 2.10. n = φ(d). d n P|

54 Proof. We claim that for a divisor d of n the sets x [n] gcd(x, n) = d and n n { ∈ | } y [ d ] gcd(y, d ) = 1 have the same cardinality. Indeed, it is easy to see { ∈ | x } that x y := d is a bijection. 7→ n This means # x [n] gcd(x, n) = d = φ( d ). Summing these identities for all d n yields: { ∈ | } | n n = φ( d ) d n X| This is almost identical to the claim, the only thing left to do is to change the n order of summation: Substitute d with d0 := d and note that d0 divides n iff d divides n. We now define the M¨obiusFunction for d 1 as: ≥ 1 d is the product of an even number of distinct primes µ(d) := 1 d is the product of an odd number of distinct primes −  0 otherwise

For example,µ(15) = 1, µ(7) = µ(30) = 1, µ(12) = 0, since 15 = 3 5 is the product of two distinct primes, 7 = 7 and− 30 = 2 3 5 are the product· of an odd number of primes and 12 = 2 2 3 is not the· product· of distinct primes: We need 2 twice. The numbers n with· ·µ(n) = 0 are also called square free since they do not have a square as a factor (12 is6 not square-free since it has 4 as a factor). Note that µ(1) = 1 since 1 is the product of 0 primes and 0 is even. Lemma 2.11. 1 if n = 1 µ(d) = 0 if n = 1 d n ( X| 6 Proof. If n = 1 this is obvious since d = 1 is the only divisor of 1 and µ(1) = 1. For n 2 we write n = pa1 . . . pak . Then we have: ≥ 1 k k D k i µ(d) = µ(d) = ( 1)| | = ( 1) = 0. − i − d n d (p1p2...pk) D [k] i=0   X| | X X⊆ X In the first step remember that µ(d) = 0 if d is not square-free. This means if d contains a prime factor more than once, it does not contribute to the sum. For the second step, realize that a divisor d of p1 . . . pk is just given by choosing a subset D of those k primes and multiplying them. Then µ(d) will be 1 if an even number of primes where chosen and 1 otherwise. The last two steps are identities we already encountered earlier. − Corollary 2.12. φ(n) µ(d) = . n d d n X| Proof. We use an identity that came up in the proof of Theorem 2.9 and argu- ments similar to those from the last Theorem.

k S 1| | µ(d) µ(d) φ(n) = n (1 1 ) = n − = n = n . pi − i S pi d d i=1 S [k] d p1...pk d n Y X⊆ ∈ | X X| Q 55 We now have all ingredients to prove the M¨obius Inversion Formula: Theorem 2.13 (M¨obiusInversion). Let f, g : N R be functions satisfying g(n) = f(d). Then: → d n P| n f(n) = µ(d)g( d ). d n X| n Proof. We start by changing the order of summation (d d ) and using the definition of g. → n µ(d)g( n ) = µ( )g(d) d d d n d n X| X| n = µ( d ) f(d0) d n d0 d X| X| = c 0 f(d0) d · d0 n X|

Where cd0 are numbers (to be determined!) that count how often f(d0) occurs as a summand. Note that cn = µ(1) = 1 since f(n) occurs only for d0 = d = n. For d0 n, d0 = n we have: | 6 n 0 cd = µ( d ) = µ(m) = 0. d0 d n m n X| | X| d0 n Where we substituted the summation index d m := d and used Lemma 2.11 in the last step. This proves the claim. 7→ Example 2.14 (Circular Sequences of 0’s and 1’s). Circular sequences are for example:

0 1 1 0 1 0 0 1 0 1 0 0 A = 0 ,B = 1 ,C = 1 1 0 0 0 0 0 0 1 0 1 1 0

Circular sequences are considered equal if they can by transformed into one another by rotation. We would write

A = 001011010 = 110100010 = B = C = 100100100. 6

Let Nn be the number of circular 0/1-sequences of length n and M(d) the number of aperiodic circular sequences of length d where a sequence is called aperiodic if it cannot be written as several times a shorter sequence. For exam- ple, C is not aperiodic since it consists of three copies of “100” while A (and therefore B) are aperiodic. For every circular sequence S of length n there is a unique way to describe it as “repetitions of S0” where S0 is an aperiodic circular sequence. This is not

56 trivial, but also not hard to see. Clearly, the length of S0 must divide n. So we found:

Nn = M(d). (?) d n X| Consider Table2 for an example with n = 6.

d M(d) corresponding sequences 1 0,1 000000,111111 2 01 010101 3 001,011 001001,011011 6 000001,000011,000101, 000001,000011,000101, 000111,001011,010011, 000111,001011,010011, 001111,010111,011111 001111,010111,011111

Table 2: Aperiodic sequences of all lengths that divide 6. There is a bijection between them and the circular sequences of length 6.

Another crucial observation is that the number of aperiodic linear sequence of length d is just d M(d) (every rotation of an aperiodic circular sequence) and therefore the number· of all linear sequences of length n can be written as “repetition of some aperiodic linear sequence” so as:

2n = d M(d). (∆) · d n X| Now define g(n) = 2n and f(d) = d M(d). These choices of f and g fulfill the requirement of the M¨obiusInversion· Formula so we obtain:

2.13 n f(n) = µ(d)g( d ) d n X| n nM(n) = µ(d)2 d ⇔ d n X| (?) d 1 l 1 d l so Nn = M(d) = d µ(l)2 = d µ( l )2 d n d n l d d n l d X| X| X| X| X| 1 d l 1 d l 1 l = d µ( l )2 = d µ( l )2 = k l µ(k)2 · l n l d n l n 1 d n l n k n | | | | l l | | l X X X X| | X (dX:=k l) · 2l µ(k) 2l φ( n ) 2l = 2=.12 l = φ( n ). l k l n n l l n k n l n l l n X| X| l X| X| While not overwhelmingly pretty, at least our result is an explicit formula only involving one sum and Euler’s φ-function. This is as good as it gets.

57 3 Generating Functions

In the following we consider sequences (an)n = a0, a1, a2,... of non-negative ∈N numbers. Typically, ak is the number of discrete structures of a certain type and “size” k. n The generating function for (an)n N is given as F (x) = n∞=0 anx . Despite the name, a generating function should∈ not be thought of as a thing you plug values into: It is not meaningful to compute something likePF (5): In fact, these values are often not well-defined because the sum would be divergent. We care about the thing as a whole: You can either think of generating functions as functions that are well-defined within their radius of convergence (some area close to zero) or you just think of the “x” in F (x) as an abstract thing (not a placeholder for a number) which makes F (x) an element of the of formal . In the following, we ignore all technicalities and boldly apply an- alytic methods as though a generating function were just a simple well-behaved function. And it just works. If you think this is all a bit arcane, try to look at it this way: At first, a generating function is just a silly way to write a sequence (instead 2 of a0 = 1, a1 = 42, a2 = 23,... you would write A(x) = 1 + 42x + 23x + ...). Some operations on the sequence have a natural correspondence: For example, shifting the sequence (a0 = 0, a1 = 1, a2 = 42, a3 = 23 ...) corresponds to multiplying A(x) by x. Some complicated operations on sequences suddenly become simple and natural in the world of generating functions where analytic tools are readily available. n When dealing with generating functions F (x) = n∞=0 anx and G(x) = n ∞ bnx , we freely use the following operations: n=0 P P Differentiate F term-wise •

∞ n 1 ∞ n F 0(x) = nanx − = (n + 1)an+1x n=1 n=0 X X Multiply F (x) by a scalar λ R term-wise • ∈

∞ n λF (x) = λanx . n=0 X Add F (x) and G(x) •

∞ n F (x) + G(x) = (an + bn)x . n=0 X Multiply F (x) and G(x) • n ∞ n F (x) G(x) = akbn k x . · − n=0 ! X kX=0 We will later see that the coefficients of the product sometimes count meaningful things if an and bn did (see Example 3.2 and 3.3).

58 Example 3.1 (Maclaurin series).

(i) Consider (an)n N with an = 1 for all n N. The corresponding generating function F (x) =∈ 1 + x + x2 + x3 + ... ∈is called the Maclaurin series. 1 We claim F (x) = 1 x , which looks a lot like the identity for an infinite geometric series. To− verify this, just observe that the product of F (x) and (1 x) is one: − (1 x)F (x) = (1 + x + x2 + x3 + ...) (x + x2 + x3 + ...) = 1. − − 1 (ii) Differentiating F (x) = 1 x with respect to x on both sides yields: −

∞ n 1 F 0(x) = (n + 1)x = . (1 x)2 n=0 X − So we found the generating function for (bn)n with bn = n + 1. ∈N (iii) Substituting x = y in the Maclaurin series gives the generating function of the alternating− sequence (i.e. a = ( 1)n): n − 1 ∞ = ( 1)nyn. 1 + y − n=0 X How do we count using Generating Functions?

(k) Example 3.2. Let an be the number of arrangements of n unlabeled balls in k labeled boxes and at least 1 ball per box. For each k, this gives a generating function: (k) ∞ (k) n B (x) = an x . n=0 X Considering the case with only one box, we have

0 for n = 0 a(1) = n 1 for n 1. ( ≥ So B(1)(x) is just the Maclaurin Series from 3.1(i) shifted by one position (i.e. multiplied by x), meaning:

∞ x B(1)(x) = 0 + x + x2 + x3 + ... = x xn = . · 1 x n=0 X − The key observation we make now is that multiplying two generating functions has a meaningful correspondence in our balls-in-boxes setting: For two numbers of boxes s and t, we have the identity:

n (s+t) (s) (t) an = al an l − Xl=0 This is merely the observation that arrangements of n balls in s + t boxes are given by arranging some l balls in the first s boxes and the remaining n l balls −

59 in the remaining t boxes. Looking at the right side of the equation, note that these numbers are exactly the coefficients of the product B(s)(x) B(t)(x) (check this!). So we obtained: ·

B(s+t)(x) = B(s)(x) B(t)(x) · and therefore k x k B(k)(x) = B(1)(x) = . 1 x    −  (k) (k) n However, we want to write B (x) in the form n an x to see the coefficients (k) 1 an . To do this, note that deriving k 1 times the term (1 x)− yields k − P − (k 1)!(1 x)− . Using this we obtain: − − xk xk dk 1 1 xk dk 1 ∞ B(k)(x) = = − = − xn (1 x)k (k 1)! · dxk 1 1 x (k 1)! dxk 1 − − n=0 ! − −  −  − X k x ∞ n k+1 = n (n 1) ... (n k + 2)x − (k 1)! · − · · − n=k 1 − X− ∞ n! = xn+1 (n k + 1)!(k 1)! n=k 1 X− − − ∞ n ∞ n 1 = xn+1 = − xn k 1 k 1 n=k 1   n=0   X− − X − a where in the last step use that b = 0 if a < b and additionally make an index shift of 1.  So we found a new proof that the number of arrangements of n unlabeled (k) n 1 balls in k labeled boxes and at least one ball per box is an = k−1 . − Example 3.3. We want to count integer solutions for a + b + c = n with a non- negative even integer a, a non-negative integer b and c 0, 1, 2 . Equivalently, we can think of this as counting arrangements of n unlabeled∈ { } balls in boxes labeled a, b and c where box a should receive an even number of balls and c at most 2 balls. We first consider the more simple situations where only one of the vari- ables/boxes exists:

Solutions to a = n with even a. Clearly, there is a unique solution for • even n and no solution for odd n. The corresponding generating function is: ∞ ∞ 1 A(x) = x2n = (x2)n = . 1 x2 n=0 n=0 X X − Solutions to b = n where b is an integer. Clearly, there is exactly one so- • lution for each n. The corresponding generating function is the Maclaurin series: ∞ 1 B(x) = xn = . 1 x n=0 X −

60 Solutions to c = n where c 0, 1, 2 . Clearly, there is exactly one solution • if n 0, 1, 2 and no solution∈ { otherwise.} The corresponding generating function∈ { is: } C(x) = 1 + x + x2.

With the same argument as in the previous example, multiplying the generating functions yields the generating function of the sequence we are interested in, so we calculate: (1 + x + x2) (1 + x + x2) F (x) = A(x) B(x) C(x) = = · · (1 x2)(1 x) (1 + x)(1 x)2. − − − We would like to write this as a linear combination of generating function we understand well, so we search for real numbers R, S, T with:

(1 + x + x2) R S T F (x) = = + + (1 + x)(1 x)2 1 + x 1 x (1 x)2 − − − Multiplying by the denominators yields:

(1 + x + x2) = R(1 x)2 + S(1 x2) + T (1 + x) − − We compare the coefficients in front of 1, x and x2 and get the equations 1 = R + S + T , 1 = 2R + T , 1 = R S. This system has the unique solution 1 3 − 3 − R = 4 , S = 4 , T = 2 . Using this− and the identities we obtained in Example 3.1 yields:

1 1 1 ∞ ∞ ∞ F (x) = 1 3 + 3 = 1 ( 1)nxn 3 xn+ 3 (n+1)xn. 4 1 + x− 4 1 x 2 (1 x)2 4 − − 4 2 n=0 n=0 n=0 − − X X X So the coefficient of xn, and therefore the number of solutions to a + b + c = n we want to count is: ( 1)n 3 3(n + 1) − + . 4 − 4 2

Example 3.5. We want to count the number an of well-formed parenthesis expressions with n pairs of parenthesis. For example (()())() is a well-formed expression with 4 pairs of parenthesis but )())(( is not. Formally, a permutation of the multiset n “(”, n “)” is well-formed if reading it from left to right and counting “+1”{ for· every· “(”} and “-1” for every “)” will never yield a negative number at any time (no prefix contains more closing than opening parenthesis). Every well-formed expression with n 1 pairs of parenthesis starts with “(” and there is a unique matching “)” such≥ that the sequence in between and the sequence after is a well-formed (possibly empty) expression. For example:

(())()() (()(())) ()(()(()))

In other words, a well-formed expression with n pairs of parenthesis is obtained by putting a well-formed expression with k pairs in between “(” and “)” and then append a well-formed expression with n k 1 pairs of parenthesis. This gives the equation: − − n 1 − an = akan k 1 − − kX=0

61 So if F (x) is the generating function belonging to an, then we know:

n 1 n ∞ n ∞ − n ∞ n+1 F (x) = anx = 1 + ( akan k 1)x = 1 + ( akan k)x − − − n=0 n=1 k=0 n=0 k=0 X n X X X X ∞ n 2 = 1 + x ( akan k)x = 1 + x F (x) . · − · n=0 X kX=0 And with analytic methods we can find a solution to this as:

1 √1 4x F (x) = − − . 2x So far it is unclear what this means for the coefficients of F (x) (and therefore for the number of well-formed expressions). It seems we need additional tools to understand the power series of √1 4x. − 3.1 Newton’s Recall the following special case of the Multinomial Theorem (Theorem 1.6).

n n n k ∞ n k (1 + x) = x = x , n N. k k ∀ ∈ kX=0   kX=0   n n Where, as usual, k = 0 for k > n. This shows that (1 + x) is the generating n function for the series (ak)k with ak = .  ∈N k We extend this result from natural numbers n N to any n R.  To this end we first extend the definition of binomial∈ coefficients. ∈ Definition 3.7 (Binomial Coefficients for Real Numbers). Recall that for in- tegers n, k N we have: ∈ n P (n, k) n(n 1) (n k + 1) n! = | | = − ····· − = k k! k! k!(n k)!   − It does not make sense to talk about permutations of sets of size n R and it is unclear what n! should be, but the formula in between is well-defined∈ for general n R. With this in mind, we define p(n, k) := n (n 1) ... (n k + 1) and n∈ p(n,k) · − · · − k := k! . Note that the new definition matches the old one if n is integer. We  can now talk about numbers such as “ 7/2 choose 5” by which me mean: − 7 9 11 13 15 7/2 − − − − − 9009 − = 2 · 2 · 2 · 2 · 2 = . 5 5! − 256   Note that for n R we have p(n, 0) = 1 and for k 1 the : ∈ ≥ p(n, k) = (n k + 1) p(n, k 1) (?) − · − = n p(n 1, k 1). · − − Given our extended definition of binomial coefficients, we can state the following Theorem (but will omit the proof).

62 Theorem 3.8 (Newton’s Binomial Theorem). For all non-zero n R we have: ∈ ∞ n (1 + x)n = xk. k kX=0   Setting n = 1/2 yields an identity for √1 + x that we require to proceed in Example 3.5. But before we can use it, we need to better understand the coefficients of the form 1/2 . · Lemma 3.9. For any integer  n 1 we have ≥ 1/2 2n 2 1 1 = ( 1)n+1 − . n − n 1 22n 1 · n    −  − Proof. We do induction on n. For n = 1:

1/2 1 1 1 = 2 = = 1 1 1 1 1! 2 · · 2 ·   For the induction step (n n + 1) we use recursion (?) from above: → 1/2 p(1/2, n + 1) (1/2 (n + 1) + 1)p(1/2, n) n 1/2 1/2 = = − = − n + 1 (n + 1)! (n + 1) n! − n + 1 n   ·   IH n 1/2 2n 2 1 1 = − ( 1)n+1 − − n + 1 − n 1 22n 1 · n  −  − 2n 2n 1 2n 2 1 1 = − ( 1)n+2 − 2n · 2n − n 1 22n 1 · n + 1  −  − (2n 2)!(2n 1)(2n) 1 1 = ( 1)n+2 − − . − (n 1)!(n 1) n n 22n+1 · n + 1 − − · · 2n ( n ) | {z } Proposition 3.10. Using the last Theorem and Lemma we obtain:

3.8 ∞ 1/2 3.9 ∞ 2n 2 1 1 √1 + x = xn = 1 + 2 − ( 1)n xn. n − n 1 − 22n · n · n=0 n=1 X   X  −  Example 3.5 (Continued). Using the proposition we are now able to find the coefficients of the generating function from Example 3.5 above, i.e. the number of well-formed parenthesis expressions.

1 √1 4x 3.10 1 ∞ 2n 2 1 1 F (x) = − − = 2 − ( 1)n ( 4x)n 2x 2x n 1 − 22n n − n=1 X  −  1 ∞ 2n 2 1 ∞ 2n 1 = − xn = xn. x n 1 n n n + 1 n=1 n=0 X  −  X   2n 1 The numbers Cn := n n+1 are called Catalan Numbers. They do not only count well-formed parenthesis expressions but occur in other situations as well. 
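The identity can be double-checked for small n (an added sketch, not from the notes): enumerate all strings of n "(" and n ")" directly, compute the recursion from Example 3.5, and compare both with the closed formula.

```python
from math import comb
from itertools import combinations

def well_formed_count(n):
    """Count well-formed expressions with n pairs of parentheses by brute force:
    choose the positions of the '(' and check no prefix has more ')' than '('."""
    count = 0
    for opens in combinations(range(2 * n), n):
        balance = 0
        for i in range(2 * n):
            balance += 1 if i in opens else -1
            if balance < 0:
                break
        else:
            count += 1
    return count

def catalan(n):
    return comb(2 * n, n) // (n + 1)

a = [1]                                      # recursion from Example 3.5
for n in range(1, 9):
    a.append(sum(a[k] * a[n - k - 1] for k in range(n)))

assert all(well_formed_count(n) == catalan(n) == a[n] for n in range(9))
print(a)   # [1, 1, 2, 5, 14, 42, 132, 429, 1430]
```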

3.2 Exponential Generating Functions
Until now we mapped sequences to functions like this:

∞ n (an)n anx R[x] ∈N 7→ ∈ n=0 X In other words, we used the coefficients an to obtain a linear combination of the n basis x n N of R[x]. Our{ choice} ∈ of a basis was useful in the cases we considered, but we can consider other bases as well that will be useful in other situations. The following sets are all bases of R[x]:

n n x x x 1 p(x, n) e− x n n! n n! n n n   ∈N   ∈N   ∈N   ∈N

In the last case we get from any sequence (an)n N a corresponding Dirichlet a ∈ ∞ n series n=0 ns . Such series are important in algebraic (in that setting the variable is typically called s instead of x). As an example, we state the followingP result (without proof).

Theorem 3.11 (Euler Product). The for (µ(n))n satisfies ∈N

∞ µ(n) 1 s = = (1 p− ) ns ζ(s) − n=0 p prime X Y 1 where is the : ∞ . ζ(s) ζ(s) = s n=0 n P We will not examine Dirichlet series further, dealing with exponential gener- ating functions instead. In contrast to ordinary generating functions (the kind n xn we considered before) the basis is not x n N but n! n N and instead of counting arrangements of unlabeled objects{ } (unlabeled∈ { balls} ∈ in labeled boxes, solutions to a1 + a2 + ... + al = n with constraints, indistinguishable parenthe- sis in an ordered string), exponential generating functions are useful to count arrangements of labeled objects (permutations, derangements, partitions, . . . ) as we will see shortly. Before we get started, note the exponential generating functions A(x) and B(x) of (an)n and (bn)n with an = 1 and bn = n!(n N) are ∈N ∈N ∈ ∞ xn A(x) = = ex n! n=0 X ∞ xn ∞ 1 B(x) = n! = xn = n! 1 x. n=0 n=0 X X − The following two observations show the connection of exponential generating functions to arrangement of labeled objects.

Observation 3.12. Say the numbers (an)n N and (bn)n N count arrangements of type A and B, respectively, using n labeled∈ objects. ∈

64 Assume arrangements of type C with n objects are obtained by a unique split of the n objects into two sets and then forming an arrangement of type A with the first set and an arrangement of type B with the second. Then cn, the number of arrangements of type C and size n, is given as

n n cn = ak bn k. k · − kX=0   Crucially, the exponential generating functions A(x),B(x),C(x) of the three sequences reflect this relationship as C(x) = A(x) B(x), which is easy to verify: · n n n ∞ x ∞ x ∞ ak bn k n A(x) B(x) = a b = − x · n n! · n n! k! (n k)! n=0 ! n=0 ! n=0 ! X X X kX=0 − n ∞ n xn = ak bn k = C(x). k · − n! n=0 ! X kX=0   Observation 3.13. An index shift by one in the sequence corresponds to a derivation, meaning

n n 1 n ∞ x derive ∞ x − ∞ x F (x) = a F 0(x) = a = a . n n! ⇒ n (n 1)! n+1 n! n=0 n=1 n=0 X X − X We already encountered Stirling numbers before. We distinguish three kinds:

The unsigned Stirling numbers of the first kind sI (n) fulfill the recursion • k I I I sk(n) = (n 1)sk(n 1) + sk 1(n 1) − − − − and count the number of n-permutations of [n] with k cycles. The signed Stirling numbers of the first kind are simply • I n k I s¯ (n) = ( 1) − s (n). k − k The Stirling numbers of the second kind sII (n) fulfill • k II II II sk (n) = ksk (n 1) + sk 1(n 1) − − − and count the number of partitions of [n] into k non-empty parts.
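For concreteness, here is a small Python tabulation of both recursions (an addition to the notes); the printed rows can be checked against the counting interpretations just given.

```python
def stirling_first_unsigned(nmax):
    """s^I_k(n): permutations of [n] with k cycles,
    via s^I_k(n) = (n-1) s^I_k(n-1) + s^I_{k-1}(n-1).  Stored as s[n][k]."""
    s = [[0] * (nmax + 1) for _ in range(nmax + 1)]
    s[0][0] = 1
    for n in range(1, nmax + 1):
        for k in range(1, n + 1):
            s[n][k] = (n - 1) * s[n - 1][k] + s[n - 1][k - 1]
    return s

def stirling_second(nmax):
    """s^II_k(n): partitions of [n] into k non-empty parts,
    via s^II_k(n) = k s^II_k(n-1) + s^II_{k-1}(n-1).  Stored as s[n][k]."""
    s = [[0] * (nmax + 1) for _ in range(nmax + 1)]
    s[0][0] = 1
    for n in range(1, nmax + 1):
        for k in range(1, n + 1):
            s[n][k] = k * s[n - 1][k] + s[n - 1][k - 1]
    return s

sI, sII = stirling_first_unsigned(5), stirling_second(5)
print(sI[4])    # [0, 6, 11, 6, 1, 0]: permutations of [4] by number of cycles
print(sII[4])   # [0, 1, 7, 6, 1, 0]: partitions of [4] by number of parts
```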

Theorem 3.14. For k N, we have the following identity for the exponential ∈ II generating function Fk(x) of the Stirling numbers (sk (n))n : ∈N

∞ xn 1 F (x) = sII (n) = (ex 1)k. k k n! k! − n=0 X Proof. We proceed by induction on k. For k = 1 we have:

II 1 for n 1 sk (n) = ≥ (0 for n = 0

65 ∞ xn x and the claim follows from the identity n! = e 1. n=1 − If k 2, we use the recursion of StirlingP numbers from above: ≥ II II II sk (n + 1) = ksk (n) + sk 1(n). − Taking the generating functions for each term (remember that derivations cor- respond to index shifts!) and then applying the induction hypothesis yields:

IH 1 x k 1 Fk0 (x) = k Fk(x) + Fk 1(x) = kFk(x) + (e 1) − . · − (k 1)! − − This differential equation has the solution 1 F (x) = (ex 1)k k k! − as claimed, where we used Fk(0) = 0. Note that solutions to differential equations may be hard to find but are easy to verify. Finding the solutions is not the topic of this lecture. It is perfectly alright if you use computers (or techniques from other lectures).

To later obtain the exponential generating function for Stirling numbers of the first kind, we first prove two identities: Lemma 3.15. n (i) sI (n)xk = p(x + n 1, n). k − kX=0 n I k (ii) s¯k(n)x = p(x, n). kX=0 Proof. Note that p(x + n 1, n) is a polynomial of degree n with variable x so − n k there are constants bn,k such that p(x + n 1, n) = k=0 bn,kx . They fulfill the following equation: − P n b xk = p(x + n 1, n) = (x + n 1) p(x + n 2, n 1) n,k − − · − − kX=0 n 1 n n 1 − k k − k = (x + n 1) bn 1,kx = bn 1,k 1x + (n 1) bn 1,kx . − − − − − − kX=0 kX=1 kX=0

Comparing the coefficients of xk we get:

bn,k = bn 1,k 1 + (n 1)bn 1,k − − − − where we define bn,k = 0 for k > n or k < 0. I So the numbers bn,k fulfill the same recursion as sk(n) and since the starting I I values s0(0) = b0,0 = 1 match as well, this proves bn,k = sk(n) and thus (i).

66 For (ii), plug x into the equation above: − n n ( 1)ksI (n)xk = p( x + n 1, n) = ( x + n k) − k − − − − k=0 k=1 X n Y = ( 1)n (x n + k) = ( 1)np(x, n). − − − kY=1 Multiplying by ( 1)n gives the desired result: − n n k I k ( 1) − s (n) x = p(x, n). − k k=0 I X s¯k(n)

| {z } I Theorem 3.16. The exponential generating function of (¯sk(n))n fulfills: ∈N ∞ xn 1 s¯I (n) = (log(1 + x))k. k n! k! n=0 X Proof. One way to expand (1 + x)z is:

∞ 1 (1 + x)z = ez log(1+x) = (log(1 + x))kzk k! kX=0 On the other hand, we can also expand (1 + x)z using Newton’s Binomial The- orem and the last Lemma: n ∞ z ∞ p(z, n) ∞ xn (1 + x)z 3.8= xn = xn 3.15= s¯I (n)zk n n! n! k n=0 n=0 n=0 X   X X kX=0 ∞ ∞ xn ∞ ∞ xn = s¯I (n) zk = s¯I (n) zk. k n! k n! n=0 k=0 k=0 n=0 ! I X X X X sk(↑n)=0 for k > n We have written (1 + x)z in two ways and the coefficients of zk in both repre- sentations must match. This proves the claim.

3.3 Recurrence Relations
We have already seen (in the case of Stirling numbers) that relations between numbers in a sequence may fully characterize the sequence (if some starting value is given). A sequence (a_n)_{n∈N} is defined recursively if

a_n = f(a_{n-1}, a_{n-2}, . . . , a_{n-k})

for some function f, a positive integer k, and all n ≥ k. This identity a_n = f(a_{n-1}, a_{n-2}, . . . , a_{n-k}) is called a recurrence or a recurrence relation or a recursion of order k. If a_0, a_1, . . . , a_{k-1} are known, they are called initial values or initial conditions. The sequence (a_n)_{n∈N} is then called a solution of the recursion f with initial values a_0, . . . , a_{k-1}. In the following we examine special cases of such recurrence relations and show how the sequence can be derived from them.

67 Example 3.17 (Fibonacci Numbers). Rabbits were first brought to Australia by settlers in 1788. Assume for simplicity, the first pair of young rabbits (male and female) arrives in January (at time n = 1). A month later (n = 2) this pair of rabbits reaches adulthood. Another month later (n = 3) they produced a new young pair of rabbits as offspring. In general, assume that in the course of a month every pair of young rabbits grows into adulthood and every pair of adult rabbits produces one pair of young rabbits as offspring. If Fn denotes the number of rabbits at time n, then we have Fn = Fn 1 + Fn 2, since exactly the rabbits that already existed at time n 2 will be− adults at− time n 1 and produce offspring between time n 1 and −n. This gives sequence: − −

F0 = 0,F1 = 1,F2 = 1,F3 = 2,F4 = 3,F5 = 5,F6 = 8,F7 = 13,F8 = 21,...

The sequence (Fn)n N is called the Fibonacci sequence and its elements are the famous Fibonacci∈ numbers. Rabbits in Australia are now a serious prob- lem, causing substantial damage to crops and have been combated with ferrets, fences, poison, firearms and the myxoma virus.

Photos taken by Daniel Oines and Esdras Calderan

Figure 19: Spiraling patterns can be seen in many plants. The pine cone on the left has 8 spiral arms spiraling out counterclockwise and 13 spiral arms spiraling out clockwise. In the sunflower, different spiral patterns stand out depending on where you look. Close to the center, a counterclockwise spiral with 21 arms and a clockwise spiral with 34 arms can be seen. Closer to the border another counterclockwise spiral with 55 arms emerges. Fibonacci numbers everywhere! We’re not making this up, zoom in and count for yourselves (or better yet, go outside and look at nature directly), the patterns are really there!

Fibonacci numbers also frequently occur in plants (see Figure 19). In case you do not know Vi Hart (seriously?!) you should definitely check out her Youtube Channel and watch her videos on this topic.

Another example where Fibonacci numbers occur is tilings of checkerboards of size n 2 with dominoes as shown in Figure 20. Let cn be the number of such tilings.× Note that there are two possibilities to cover the rightmost column (see Figure 21: Either with a vertical domino, then a board of size (n 1) 2 − ×

68 Figure 20: A tiling of the 9 2 board with dominoes. × remains to be tiled, or with horizontal dominoes, then a board of size (n 2) 2 − × remains to be tiled. This means cn = cn 1 + cn 2 and together with c1 = 1 and − − c2 = 2 (and setting c0 = 1) this implies cn = Fn+1.

Figure 21: The last column can either be covered by a vertical domino or by horizontal dominoes.

Example 3.18 (Ternary n-strings without substring 20). Consider strings over the alphabet {0, 1, 2} that do not contain the substring 20, for instance 0010221021 or 1111011102. Let t_n be the number of such strings of length n. We have t_0 = 1 (the empty string), t_1 = 3 (each string of length 1), and t_2 = 8 (all strings of length 2 except for 20). In general, an n-string without 20 consists of an (n−1)-string without 20 and an additional digit, which gives 3·t_{n-1} possibilities. However, this may produce strings that end in 20 (but have no 20 in other places): there are t_{n-2} of them. This gives t_n = 3·t_{n-1} − t_{n-2}.
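A brute-force check of this recursion (my addition, not part of the notes): enumerate all ternary strings of length n and discard those containing 20.

```python
from itertools import product

def t_brute(n):
    """Count ternary strings of length n that avoid the substring '20'."""
    return sum('20' not in ''.join(s) for s in product('012', repeat=n))

t = [1, 3]                       # t_0, t_1
for n in range(2, 10):
    t.append(3 * t[-1] - t[-2])  # t_n = 3 t_{n-1} - t_{n-2}

assert all(t_brute(n) == t[n] for n in range(8))
print(t)   # [1, 3, 8, 21, 55, 144, 377, 987, 2584, 6765]
```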

The Fibonacci sequence and the sequence (tn)n N from the last example have in common that they are solutions for some recurrence∈ relation of order 2, i.e., their elements can be described in terms of the previous two elements. Moreover, each element could be expressed as the sum of multiples of previous elements. We now develop a general theory dealing with such recurrence relations. Definition 3.19. A recurrence relation of the form

a_n = c_1(n) a_{n-1} + c_2(n) a_{n-2} + · · · + c_k(n) a_{n-k} + g(n)

for functions g : N → R and c_i : N → R, i = 1, . . . , k, is called a linear recurrence. In case g ≡ 0 (that is, g(n) = 0 for all n ∈ N), the recurrence is called linear homogeneous, otherwise linear non-homogeneous.

In this chapter we shall treat the case when the coefficients c_i are constants, so

a_n = c_1 a_{n-1} + c_2 a_{n-2} + · · · + c_k a_{n-k}.

We assume that c_k ≠ 0, otherwise we treat the recursion as one of a smaller order.

The Fibonacci sequence fulfills a homogeneous linear recurrence relation with constant coefficients

F_n = F_{n-1} + F_{n-2}

where k = 2, c_1 = 1, c_2 = 1, g ≡ 0. The same is true for t_n from Example 3.18:

t_n = 3t_{n-1} − t_{n-2}

where this time k = 2, c_1 = 3, c_2 = −1, g ≡ 0.

n 3.3.1 Special Solution an = x and the Characteristic Polynomial Throughout this section assume we are given a linear homogeneous recurrence relation f of the form

an = c1an 1 + c2an 2 + ... + ckan k − − − i of order k. If one replaces ai with x and plugs it in the recursion, the following is obtained: n n 1 n 2 n k x = c x − + c x − + + c x − . 1 2 ··· k n k Dividing by x − and bringing terms to one side of the equation gives:

k k 1 k 2 x c x − c x − c = 0. − 1 − 2 − · · · − k Solving this equation allows to find some solutions. The characteristic polyno- mial of f is given by

k k 1 k 2 p(x) = x c1x − c2x − ck 1x ck. − − − · · · − − − We see that a homogeneous linear recursion could be reconstructed from its i characteristic polynomial by replacing x with an k+i: −

an c1an 1 c2an 2 ckan k = 0, − − − k − k 1 −k 2 − · · · − x c1x − c2x − ck 1x ck = 0. − − − · · · − − − Note further, that we can assume that all the roots of the characteristic poly- nomial are non-zero because ck = 0. In this section we shall present6 some methods to solve linear homogeneous recurrences and some specific non-homogeneous recurrences. Lemma 3.20 (Linearity Property). Let f be a linear homogeneous recurrence with constant coefficients. If (bn)n is a solution of f with some initial values ∈N and (dn)n is a solution of f with some initial values, then for any constants ∈N β, δ R, (βbn + δdn)n is a solution of f with some initial values. ∈ ∈N Proof. Let an = f(an 1, . . . , an k) = c1an 1 + +ckan k. Thus bn = c1bn 1 + − − − ··· − − + ckbn k and dn = c1dn 1 + + ckdn k for all n k. Multiplying the first ··· − − ··· − ≥ equation by β, the second by δ and adding them gives βbn + δdn = c1(βbn 1 + − δdn 1) + + ck(βbn k + δdn k). − ··· − − Lemma 3.21 (Divisibility of Polynomials). Let p(x) be a polynomial. Then p(x) is divisible by (x q)m, for a positive integer m and a nonzero constant q if and only if the following− polynomials are divisible by (x q): p (x) = p(x), p (x) = − 0 1 xp0(x), p2(x) = xp10 (x),..., pk(x) = xpk0 1(x),..., pm 1(x) = xpm0 2(x). − − −

70 Proof. Assume first that p(x) is divisible by (x q)m. Then p(x) = (x q)mt(x), − − m k where t(x) is a polynomial. We use induction on k to show that (x q) − (m 0) − divides pk, k = 0, . . . , m 1. When k = 0, (x q) − divides p0(x) = p(x). m k+1− − m k Assume that (x q) − divides pk 1(x). Let us show that (x q) − divides − − m k+1 − pk(x). We have then that pk 1 = (x q) − s(x) for some polynomial s. Then − − m k m k+1 pk(x) = xpk0 1(x) = x (m k + 1)(x q) − s(x) + (x q) − s0(x) , so pk − m k − − − is divisible by (x q) − . −  Now, assume that each of p0, . . . , pm 1 is divisible by x q. We shall show k − − that pm k(x) is divisible by (x q) , k = 0, . . . , m 1, so in particular that p (x) is− divisible by (x q)m.− We do this by induction− on k. When k = 1, 0 − the statement holds by assumption. Now, assume that pm k(x) is divisible k − k+1 by (x q) , let us prove that pm k 1(x) is divisible by (x q) . Assume − − `− − not, i.e., that pm k 1(x) = (x q) t(x), t(q) = 0, ` k. Then pm k(x) = ` 1 − − − 6 ≤ − x(x q) − (`t(x) + t0(x)(x q)), so the highest power of (x q) dividing pm k − − −k − is ` 1 < k. We see that pm k(x) is not divisible by (x q) . − − − Lemma 3.22 (Fundamental Solution). Let f be a linear homogeneous recur- rence with constant coefficients. If q is a root of the characteristic polynomial n of f, then (q )n N is a solution of f for some initial values. Moreover, if q is ∈ i n a root of the characteristic polynomial of multiplicity s, s > 1, then (n q )n N, 0 i s 1 is a solution of f. ∈ ≤ ≤ − n Proof. Let an = f(an 1, . . . , an k) = c1an 1 + + ckan k and bn = q , where − − − ···k −k 1 0 q is a root of the characteristic polynomial, so q c1q − ckq = 0. n k n −n 1 − · · · −n k Multiply the last equality by q − to obtain q c1q − ckq − = 0. So n − − · · · − (q )n is a solution of f. ∈N Now, assume that q has multiplicity s, i.e., the characteristic polynomial s p1(x) has a form p1(x) = (x q) p(x), where p(x) is a polynomial. Observe − n k that q is a root of each of the following polynomials: p2(x) = p1(x)x − , p20 (x), p3(x) = xp20 (x), p30 (x), p4(x) = xp30 (x), etc. till ps+2(x). Indeed, the (x q) term remains after performing the product rule. So, in particular, for i s− ≤

pi+2(q) = 0. (2)

We have

n n 1 n k p2(x) = x c1x − ckx − , n− 1 − · · · −n 2 n k 1 p20 (x) = nx − c1(n 1)x − ck(n k)x − − , n − − n 1 − · · · − − n k p3(x) = nx c1(n 1)x − ck(n k)x − , 2 n−1 − 2 −n ·2 · · − − 2 n k 1 p30 (x) = n x − c1(n 1) x − ck(n k) x − − , 2 n − −2 n 1 − · · · − − 2 n k p (x) = n x c (n 1) x − c (n k) x − , 4 − 1 − − · · · − k − . . i n i n 1 i n k p (x) = n x c (n 1) x − c (n k) x − . i+2 − 1 − − · · · − k −

i n We need to verify that for bn = n q ,(bn)n gives a solution of the recurrence ∈N an c1an 1 ckan k = 0. Let us plug bn for an in the left hand side of − − − · · · − −

71 the equation:

bn c1bn 1 ckbn k = − − i n i n −1 − · · · − i n i n q c (n 1) q − c (n k) q − = − 1 − − · · · − k − pi+2(q) = 0. Here the last equality holds by (2). Theorem 3.23 (Existence and Uniqueness). Let f be a linear homogeneous recursion with constant coefficients and p(x) be its characteristic polynomial with roots q1, . . . , qr having multiplicities s1, . . . , sr, respectively. Then for any initial values, there is a solution (an)n of f written in the form ∈N n n s1 1 n n n sr 1 n a = (?q + ?nq + + ?n − q ) + + (?q + ?nq + + ?n − q ), (3) n 1 1 ··· 1 ··· r r ··· r where ?’s are some constants. In particular, if all roots of the characteristic polynomial are distinct, a = ?qn + ?qn + + ?qn. n 1 2 ··· r Proof. From Lemma 3.22, all the listed terms are solutions of the recursion with some initial values. From Lemma 3.20, any of their linear combinations is a solution of the recursion with some initial values. It remains to verify that there are coefficients (?’s) of the linear combination so that it is a solution of the recursion with the given initial values a0, a1, . . . , ak 1. Denote the consecutive coefficient in (3) in front of njqn by γ , i = 1, . . . , r−, j = 0, . . . , s 1. That is, i i,j i − the γi,j’s are the unknowns solving equation (3) for all n N, which can hence be written as ∈

r si 1 r si 1 − − a = γ njqn = γ nj qn. n i,j i  i,j  i i=1 j=0 i=1 j=0 X X X X   Recall that a0, a1, . . . , ak 1 are the known initial values. Plugging n = 0, n = 1, . . . , n = k 1 we get: − − a = (γ + 0 + + 0) + + (γ + 0 + + 0) 0 1,0 ··· ··· r,0 ··· a1 = (γ1,0 + γ1,1 + + γ1,s1 1)q1 + ··· − + (γr,0 + + γr,sr 1)qr ··· ··· − s1 1 2 a2 = (γ1,0 + 2γ1,1 + + 2 − γ1,s1 1)q1 + ··· − sr 1 2 + (γr,0 + + 2 − γr,sr 1)qr ··· ··· − . . (4)

s1 1 k 1 ak 1 = γ1,0 + (k 1)γ1,1 + + (k 1) − γ1,s1 1 q1− + − − ··· − − sr 1 k 1 + γr,0 + + (k 1) − γr,sr 1 qr − ··· ··· − − The of this system of linear equations with unknowns γi,j,i = 1, . . . , r, j = 1, . . . , s 1 is: i − 1 0 0 0 1 0 0 0 ··· ··· q1 q1 q1 q1 q2 q2 q2 q2  q2 2q2 22q2 23q2 ··· q2 2q2 22q2 23q2 ···  1 1 1 1 ··· 2 2 2 2 ···  q3 3q3 32q3 33q3 q3 3q3 32q3 33q3  (5)  1 1 1 1 ··· 2 2 2 2 ···   q4 4q4 42q4 43q4 q4 4q4 42q4 43q4   1 1 1 1 ··· 2 2 2 2 ···   ......   ......   ···    72 This is a square k k matrix. We shall show that it is non-singular by proving × that its rows r1, r2,..., rk are linearly independent. Assume that there are coefficients α1, . . . , αk, not all equal to zero, so that α1r1 +α2r2 + +αkr1 = 0. The components of this vector correspond to respective columns··· in the matrix in the following form:

2 k 1 α1 + α2q1 + α3q1 + + αkq1− = 0 2 ··· k 1 α10 + α2q1 + α32q1 + + αk(k 1)q1− = 0 2 2 ··· − 2 k 1 α 0 + α q + α 2 q + + α (k 1) q − = 0 1 2 1 3 1 ··· k − 1 . . 2 k 1 α1 + α2qr + α3qr + + αkqr − = 0 2 ··· k 1 α10 + α2qr + α32qr + + αk(k 1)qr − = 0 2 2 ··· − 2 k 1 α 0 + α q + α 2 q + + α (k 1) q − = 0 1 2 r 3 r ··· k − r . .

2 k 1 Let h(x) = α1 +α2x+α3x ...+αkx − . Then we see that q1 is a root of h(x), as well as a root of h2(x) = xh10 (x), h3(x) = xh20 (x),..., hs1 . By Lemma 3.21 h(x) s1 s2 sr is divisible by (x q1) . Similarly, h(x) is divisible by (x q2) ,..., (x qr) . − r si − − Thus h(x) is divisible by i=1(x qi) . Thus the degree of h(x) is at least r − k = si, a contradiction to the definition of h(x). Thus there are no such i=1 Q coefficients αi’s and thus our matrix is non-singular. Therefore the system (4) P has a solution for any given right-hand side (a0, a1, . . . , ak 1). − Theorem 3.23 gives us a tool to solve any linear homogeneous recursion with constant coefficients. The main steps are the following. 1. Derive the characteristic polynomial p(x).

2. Compute the roots q1, . . . , qr of p(x) with respective multiplicities s1, . . . , sr. 3. Solve the system of linear equations (4) (for example by Gaussian Elimi- nation).

r si 1 j n 4. Write down the explicit solution an = i=1 j=0− γi,jn qi . Let us illustrate this approach with a few examples.P P Example 3.24. We solve the recursion

a_n = −5a_{n-1} + 6a_{n-2},   a_0 = 3, a_1 = 10.

The characteristic polynomial is x^2 + 5x − 6, its roots are q_1 = 1 and q_2 = −6, both with multiplicity 1, i.e., s_1 = s_2 = 1. Thus the general solution is γ_{1,0} · 1^n + γ_{2,0} · (−6)^n. Plugging initial values, we obtain

a_0 = γ_{1,0} + γ_{2,0} = 3,
a_1 = γ_{1,0} − 6γ_{2,0} = 10.

Thus γ_{2,0} = −1, γ_{1,0} = 4, and the recursion is solved by (a_n)_{n∈N} with

a_n = 4 − (−6)^n.
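As a quick check (added, not part of the notes), the closed form can be compared with direct iteration of the recursion:

```python
def solve_by_iteration(n_terms):
    """Iterate a_n = -5 a_{n-1} + 6 a_{n-2} with a_0 = 3, a_1 = 10."""
    a = [3, 10]
    for n in range(2, n_terms):
        a.append(-5 * a[n - 1] + 6 * a[n - 2])
    return a

closed = [4 - (-6) ** n for n in range(15)]      # a_n = 4 - (-6)^n
assert solve_by_iteration(15) == closed
print(closed[:6])   # [3, 10, -32, 220, -1292, 7780]
```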

73 Example 3.25. We solve the recursion

an = 4an 1 4an 2, a0 = 3, a1 = 10. − − − − 2 The characteristic polynomial is x + 4x + 4, its roots are q1 = 2, with mul- n − n tiplicity 2, i.e., s1 = 2. Thus the general solution is γ1,0( 2) + γ1,1n( 2) . Plugging initial values, we obtain − −

a0 = γ1,0 = 3, a = γ ( 2) + γ ( 2) = 10. 1 1,0 − 1,1 −

Thus γ1,0 = 3, γ1,1 = 8, and the recursion is solved by (an)n with − ∈N a = 3( 2)n 8n( 2)n. n − − − Example 3.26. We solve the recursion

an = 4an 2, a0 = 6, a1 = 20. − − 2 The characteristic polynomial is x + 4, its roots are q1 = 2i and q2 = 2i, both − n with multiplicity 1, i.e., s1 = s2 = 1. Thus the general solution is γ1,0(2i) + γ ( 2i)n. Plugging initial values, we obtain 2,0 −

a0 = γ1,0 + γ2,0 = 6, a = γ (2i) + γ ( 2i) = 20. 1 1,0 2,0 − n n Thus γ1,0 = 3 5i, γ2,0 = 3+5i. Therefore an = (3 5i)(2i) +(3+5i)( 2i) = n − n n − n − (2i) (3 5i + ( 1) 3 + ( 1) 5i). So, if n is odd, an = (2i) ( 2 5i), if n is − n− − n n ln i πni/2 − · even an = (2i) 6. Note that i = e = e = cos(πn/2) + i sin(πn/2), so for n = 4k, in = 1, n = 4k + 2, in = 1, for n = 4k + 1, in = i, for n = 4k + 3, in = i. − − Hence the recursion is solved by (an)n with ∈N 5 2n+1, if n = 4k + 1 · 6 2n, if n = 4k + 2 an =  ·  5 2n+1, if n = 4k + 3 − · 6 2n, if n = 4k + 4. − ·   Recall from Example 3.17 that the Fibonacci sequence (Fn)n N solves the recursion ∈ Fn = Fn 1 + Fn 2,F0 = 1,F1 = 1. − − We shall now derive an explicit formula for Fn. Theorem 3.27 (Binet’s Formula). For the Fibonacci numbers we have:

n n 1 1 + √5 1 √5 Fn = − √5 2 ! − 2 ! !

1+√5 where Φ := 2 is called the Golden Ratio.

74 Figure 22: In a drawing like this one, obtained by squares “spiraling” out of the center, the ratio of width and height will approach the golden ratio. Of course, you already know all about spirals, since you watched all Vi Hart Videos, right?

Proof. The recurrence relation is Fn+2 = Fn+1 + Fn with starting values F0 = 2 0,F1 = 1. So the characteristic polynomial is p(x) = x x 1. The roots of 1 √5 − − p(x) are q1,2 = ±2 , i.e.

1 + √5 1 √5 p(x) = x x − . − 2 ! − 2 !

The general solution for Fn is therefore (by Theorem 3.23)

n n 1 + √5 1 √5 Fn = γ1,0 + γ2,0 − . 2 ! 2 !

3 We need to find γ1,0, γ2,0 from initial values .

0 0 1 + √5 1 √5 0 = F0 = γ1,0 + γ2,0 − = γ1,0 + γ2,0. 2 ! 2 ! 1 1 1 + √5 1 √5 1 = F1 = γ1,0 + γ2,0 − 2 ! 2 !

The unique solution is γ = 1 , γ = 1 . 1,0 √5 2,0 − √5 1 √5 n n In the explicit formula we have just shown, the term ( −2 ) ( 0.62) quickly goes to zero. This allows us to ignore it when computing≈ Fibonacci− numbers:

n Corollary 3.28. F is the integer closest to 1 Φn, i.e. F = Φ + 1 . n √5 n √5 2 n j k 1 1 √5 1 Proof. It suffices to show that − < . √5 2 2

3 Any two starting values will do, we could determine γ1,0, γ2,0 from F4 = 3,F5 = 5 if we wanted to. But using F0 and F1 yields the simplest system of equations.

75 n 1 1 √5 1 Φ n 1 Φ 1 2 2 1 − = | − | | − | = < = < . √5 2 √5 ≤ √5 Φ√5 √5√5 5 2

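Corollary 3.28 is easy to test numerically; the following sketch (my addition, not from the notes) compares the rounded value Φ^n/√5 with the Fibonacci numbers obtained from the recursion. Floating point arithmetic is precise enough here for moderate n.

```python
from math import sqrt

phi = (1 + sqrt(5)) / 2          # the Golden Ratio
psi = (1 - sqrt(5)) / 2

fib = [0, 1]
for n in range(2, 40):
    fib.append(fib[-1] + fib[-2])

for n in range(40):
    binet = (phi ** n - psi ** n) / sqrt(5)    # Theorem 3.27
    nearest = round(phi ** n / sqrt(5))        # Corollary 3.28
    assert round(binet) == nearest == fib[n]
print(fib[:11])   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```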
Example 3.29 (Ternary n-string without 20). In Example 3.18 we found start- ing values and the recurrence but we have yet to determine the values. Recall:

t_1 = 3,   t_2 = 8,   t_{n+2} = 3t_{n+1} − t_n.

So p(x) = x^2 − 3x + 1. The characteristic polynomial has the roots q_{1,2} = (3 ± √5)/2. So by Theorem 3.23 the general solution is

t(n) = γ_{1,0} ((3 + √5)/2)^n + γ_{2,0} ((3 − √5)/2)^n.

Again we use the initial values to determine γ_{1,0} and γ_{2,0}. For simplicity, use t_0 = 1 instead of t_2 = 8:

1 = t(0) = γ_{1,0} + γ_{2,0}
3 = t(1) = γ_{1,0} · (3 + √5)/2 + γ_{2,0} · (3 − √5)/2

Solving yields γ_{1,0} = (5 + 3√5)/10, γ_{2,0} = (5 − 3√5)/10 and therefore:

t_n = ((5 + 3√5)/10) ((3 + √5)/2)^n + ((5 − 3√5)/10) ((3 − √5)/2)^n.

3.3.2 Advancement Operator Remark. This section contains an alternative treatment of recurrence relations in terms of the so-called advancement operator. We present this material, even though it is repetitive and everything is already covered in Section 3.3.1 with the approach of the characteristic polynomial.

We can consider any sequence (f_n)_{n∈N} as a function f : N → R where f(n) = f_n. If f fulfills a linear recurrence relation with constant coefficients, then there is a unique way to extend f to Z such that the recurrence relation is still fulfilled. Take for instance the Fibonacci sequence: Given the starting values there is a unique way to calculate backwards using F(n − 2) = F(n) − F(n − 1):

n    . . .  −5  −4  −3  −2  −1   0   1   2   3   4   5  . . .
F(n) . . .   5  −3   2  −1   1   0   1   1   2   3   5  . . .

This makes f an element of the vector space of all functions from Z to R. We define the advancement operator A acting on this vector space. It maps a function f to the function Af with:

(Af)(n) = f(n + 1).

76 An advancement operator polynomial is a polynomial in A, for example 3A2 A + 6. This is also an operator which maps f to the function (3A2 A + 6)−f with: − ((3A2 A + 6)f)(n) = 3f(n + 2) f(n + 1) + 6f(n). − − This notation allows us to write linear recurrence equations with constant coef- ficients as: p(A)f = g where p is some polynomial. For the Fibonacci numbers we would write (A2 A 1)F = 0. − − In the following we try to solve equations of this kind. We start with the eas- iest case of a homogeneous equation and an advancement operator polynomial of degree one.

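Before doing so, here is a small illustration (mine, not from the notes) of how the operator notation can be modelled in code; the finite table of Fibonacci values extended to negative indices is the one from above.

```python
def advance(f):
    """The advancement operator A: (Af)(n) = f(n + 1)."""
    return lambda n: f(n + 1)

def operator_poly(coeffs, f):
    """Apply the advancement operator polynomial
    coeffs[0]*A^0 + coeffs[1]*A^1 + ... to the function f."""
    def g(n):
        result, shifted = 0, f
        for c in coeffs:
            result += c * shifted(n)
            shifted = advance(shifted)
        return result
    return g

# Fibonacci extended to Z satisfies (A^2 - A - 1)F = 0; we test it on a finite table.
F = {-5: 5, -4: -3, -3: 2, -2: -1, -1: 1, 0: 0, 1: 1, 2: 1, 3: 2, 4: 3, 5: 5, 6: 8, 7: 13}
p_of_A = operator_poly([-1, -1, 1], F.get)    # the polynomial A^2 - A - 1
print([p_of_A(n) for n in range(-5, 6)])      # all zeros
```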
Lemma 3.30. For a real number r = 0 the solution to (A r)f = 0 with initial value f(0) = c is given by f(n) = cr6 n. − Proof. (A r)f(n) = 0 f(n + 1) rf(n) = 0 − ⇔ − f(n + 1) = rf(n) ⇔ f(n) = rn f(0) = rn c ⇔ · · Raising the difficulty a bit, now consider the advancement operator polyno- mial p(A) = A2 + A 6 = (A + 3) (A 2) and the corresponding homogeneous recurrence p(A)f =− (A + 3)(A 2)· f =− 0. Note that solutions of (A 2)f = 0 or (A + 3)f are also solutions of− (A + 3)(A 2)f = 0. 4 − Actually, all solutions are sums of such− solutions, which means that, using the last Lemma, we know that f is of the form

f(n) = c ( 3)n + c 2n. 1 − 2

Here, c1 and c2 are constants that can be derived from two initial values for f. It is no coincidence that we found a two dimensional space of functions: Lemma 3.31. If p(A) has degree k and its constant term is non-zero, then the set of all solutions f to p(A)f = 0 is a k-dimensional subspace of functions from Z C, parametrized by c1, . . . , ck which can be determined by k initial values for→f. We do not give a formal proof. However, you should not be surprised by the statement. It is clear that set of all solutions form a subspace: If f, g are solutions and α, β C 0 then αf +βg is also a solution since p(A)(αf +βg) = αp(A)f + βp(A)g =∈ α\{0 +}β 0 = 0. It is also intuitive that there are k degrees of freedom: For every choice· for· values of f on 1, 2, . . . , k the recurrence gives a unique way to extend f (upwards and downwards){ to other} values. Generalizing our observation for the advancement operator polynomial (A+ 3)(A 2), we state (also without proof): − 4There is some low-level notational magic involved here: We need ((A + 3) · (A − 2))f = (A + 3)((A − 2)f) = (A − 2)((A + 3)f) where · is multiplication of polynomials and “nothing” is the application of an operator to a function.

77 Proposition 3.32. If p(A) = (A r ) (A r ) ... (A r ) for distinct − 1 · − 2 · · − k r1, . . . , rk, then all solutions of p(A)f = 0 are of the form

n n n f(n) = c1r1 + c2r2 + ... + ckrk . Every polynomial of degree k has k complex roots, but those roots are not necessarily distinct. The following Theorem handles the case of roots with multiplicity at least two and is therefore the last piece in the puzzle. From now on, instead of “all solutions are of the form. . . ” we say the “general solution is . . . ”. Theorem 3.33. If r = 0 and p(A) = (A r)k, then the general solution of the homogeneous system p6 (A)f = 0 is −

n n 2 n k 1 n f(n) = c r + n c r + n c r + ... + n − c r . 1 · 2 3 k · If p(A) = q (A) q (A) and q (A), q (A) have no root in common, then the 1 · 2 1 2 general solution of p(A)f = 0 is the sum of the general solutions of q1(A)f = 0 and q2(A)f = 0. Example. Let p(A) = (A 1)5 (A + 1)2 (A 3). Then the general solution to p(A)f = 0 is − · · −

f(n) = c + n c + n2 c + n3 c + n4 c + c ( 1)n + n c ( 1)n + c 3n. 1 · 2 · 3 · 4 · 5 6 − · 7 − 8 Note that the condition that r (the constant term of p(A)) may not be zero is no real restriction: An advancement operator polynomial of the form p(A) A describes the same recurrence as p(A), just at a different index. · Theorem 3.34 (Binet’s Formula). For the Fibonacci numbers we have:

n n 1 1 + √5 1 √5 Fn = − √5 2 ! − 2 ! !

1+√5 where Φ := 2 is called the Golden Ratio.

Proof. The recurrence relation is Fn+2 = Fn+1 + Fn with starting values F0 = 0,F1 = 1. Rewrite this in terms of advancement operator polynomial:

f(n + 2) f(n + 1) f(n) = 0 (A2 A 1) f(n) = 0. − − ⇐⇒ − − p(A)

1 √5 | {z } The roots of p(A) are ±2 , i.e.

1 + √5 1 √5 p(A) = A A − . − 2 ! − 2 !

The general solution for f(n) is therefore (by Proposition 3.32)

n n 1 + √5 1 √5 f(n) = c1 + c2 − . 2 ! 2 !

78 Figure 23: In a drawing like this one, obtained by squares “spiraling” out of the center, the ratio of width and height will approach the golden ratio. Of course, you already know all about spirals, since you watched all Vi Hart Videos, right?

5 We need to find c1, c2 from initial values .

0 0 1 + √5 1 √5 0 = f(0) = c1 + c2 − = c1 + c2. 2 ! 2 ! 1 1 1 + √5 1 √5 1 = f(1) = c1 + c2 − 2 ! 2 !

The unique solution is c = 1 , c = 1 . 1 √5 2 − √5 1 √5 n n In the explicit formula we have just shown, the term ( −2 ) ( 0.62) quickly goes to zero. This allows us to ignore it when computing≈ Fibonacci− numbers: n Corollary 3.35. F is the integer closest to 1 Φn, i.e. F = Φ + 1 . n √5 n √5 2 n j k 1 1 √5 1 Proof. It suffices to show that − < . √5 2 2

n 1 1 √5 1 Φ n 1 Φ 1 2 2 1 − = | − | | − | = < = < . √5 2 √5 ≤ √5 Φ√5 √5√5 5 2

Example 3.36 . (Ternary n-string without 20) In Example 3.18 we found start- ing values and the recurrence but we have yet to determine the values. Recall:

t = 3, t = 8, t = 3t t . 1 2 n+2 n+1 − n So p(A)t = (A2 3A + 1)t = 0. The advancement operator polynomial has the 3 √5 − roots ±2 . So by Proposition 3.32 the general solution is n n 3 + √5 3 √5 t(n) = c1 + c2 − . 2 ! 2 !

5 Any two starting values will do, we could determine c1, c2 from F4 = 3,F5 = 5 if we wanted to. But using F0 and F1 yields the simplest system of equations.

79 Again we use the initial values to determine c1 and c2. For simplicity, use t0 = 1 instead of t2 = 8:

1 = t(0) = c1 + c2 3 + √5 3 √5 3 = t(1) = c + c − 1 · 2 2 2

5+3√5 5 3√5 Solving yields c1 = 10 , c2 = −10 and therefore:

n n 5 + 3√5 3 + √5 5 3√5 3 √5 tn = − − . 10 ! 2 ! − 10 ! 2 !

3.3.3 Non-homogeneous Recurrences So far we ignored non-homogeneous linear recurrence relations, i.e. those of the form an = c1(n)an 1 + + ck(n)an k + g(n) − ··· − for some non-trivial function g : N R. →

Lemma 3.37. Any two solutions (an)n , (an0 )n of a non-homogeneous linear ∈N ∈N recurrence an = f(an 1, . . . , an k) + g(n) “differ by” a solution of the corre- − − sponding homogeneous recurrence an = f(an 1, . . . , an k), i.e. there is (an∗ )n − − ∈N such that an = an∗ + an0 for all n N with an∗ = f(an∗ 1, . . . , an∗ k) for all n k. ∈ − − ≥ Proof. Just set a∗ = an a0 for all n N. Then for all n k we have n − n ∈ ≥ k k f(an∗ 1, . . . , an∗ k) = ci(n)an∗ i = ci(n) an i an0 i − − − − − − i=1 i=1 X X k k 

= ci(n)an i ci(n)an0 i = (an g(n)) (an0 g(n)) − − − − − − i=1 i=1 X X = a a0 = a∗ , n − n n as desired.

This means that to find all solutions of an = f(an 1, . . . , an k)+g(n) it suffices − − to find a single such solution (an0 )n , the particular solution, and all solutions ∈N (an∗ )n N of an = f(an 1, . . . , an k), the general solution. Then∈ the general− solution of− the non-homogeneous recurrence is given by an = an∗ + an0 for n N. Since there is no∈ framework that always works to find particular solutions, this is actually the difficult part.

Example 3.38. Recall from Example 1.6 (and the corresponding problem on the exercise sheet) that when cutting a two dimensional cake into the maximum number of pieces, the n-th cut yields n additional pieces, i.e.

sn = sn 1 + n, s0 = 1 and therefore sn = c1(n)sn 1 + g(n) − − for c1(n) = 1 and g(n) = n for n N. ∈

80 1. Find general solution for the homogeneous recurrence:

sn = sn 1 p(x) = x 1 q1 = 1, s1 = 1 − ⇒ −n ⇒ a∗ = γ q = γ . ⇒ n 1,0 1 1,0

2. Find particular solution for non-homogeneous recurrence: sn = sn 1 + n. We guess that the solution is “similar” to the right hand side. Since− the right hand side (i.e. g(n) = n) is a polynomial of degree 1 over n, we guess that the solution could be a polynomial of degree 2 over n meaning

2 an0 = d1n + d2n + d3

where d1, d2, d3 are to be determined. We plug this guess into the desired recurrence:

d_1 n^2 + d_2 n + d_3 = a'_n = a'_{n-1} + n = d_1(n − 1)^2 + d_2(n − 1) + d_3 + n
                                            = d_1 n^2 + d_2 n + d_3 + (−2d_1 n + d_1 − d_2 + n),

which has the solution d_1 = d_2 = 1/2 (note that we "lost" d_3 on the way, so it is not needed and we just set it to zero). So we found one particular solution:

a'_n = (1/2) n^2 + (1/2) n = n(n + 1)/2 = (n + 1 choose 2).

3. The general solution for the non-homogeneous recurrence is the sum

s_n = a*_n + a'_n = γ_{1,0} + (n + 1 choose 2)

We still need to determine γ1,0 using the initial value:

1 = s_0 = γ_{1,0} + 0  ⇒  γ_{1,0} = 1
⇒  s_n = (n + 1 choose 2) + 1 = (n choose 2) + (n choose 1) + (n choose 0).

Example 3.39. We solve the recurrence:

a_n = 4a_{n-1} − 4a_{n-2} + 3^n + 2n

2 2 an = 4an 1 4an 2 p(x) = x 4x + 4 = (x 2) q1 = 2, s1 = 2 − − − ⇒ −n n − n⇒ n a∗ = γ q + γ nq = γ 2 + γ n2 . ⇒ n 1,0 1 1,1 1 1,0 1,1 2. Particular solution of the non-homogeneous recurrence:

n an = 4an 1 4an 2 + 3 + 2n − − − Guess something similar to the right hand side. Our attempt is6:

n a0 = d 3 + d n + d . n 1 · 2 3 6 2 We could have guessed another summand of d4 · n , but as it turns out, it is not needed.

81 Which means we need to find d1, d2, d3 such that:

n ! n d1 3 + d2n + d3 = an0 = 4an 1 4an 2 + 3 + 2n · − − − n 1 = 4d 3 − + 4d (n 1) + 4d 1 · 2 − 3 n 2 n 4d 3 − 4d (n 2) 4d + 3 + 2n − 1 · − 2 − − 3 n 2 n = 8d 3 − + 4d + 3 + 2n 1 · 2 = d 3n + d n + d 1 · 2 3 n 2 n + ( d 3 − d n d + 4d + 3 + 2n) − 1 · − 2 − 3 2 = d 3n + d n + d 1 · 2 3 n 2 + ((9 d )3 − + (2 d )n d + 4d ) − 1 − 2 − 3 2 and we get d1 = 9, d2 = 2, d3 = 8, i.e.

n+2 an0 = 3 + 2n + 8.

3. The general solution of non-homogeneous recurrence is therefore

n n n+2 an = an∗ + an0 = γ1,02 + γ1,1n2 + 3 + 2n + 8.

Starting values would now allow us to determine γ1,0 and γ1,1 but we omit this here.
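Both guesses can be verified mechanically (an added check, not from the notes): plug the claimed particular solutions back into the recurrences of Examples 3.38 and 3.39.

```python
def s(n):               # Example 3.38: s_n = 1 + n(n+1)/2
    return 1 + n * (n + 1) // 2

def a_part(n):          # Example 3.39: particular solution a'_n = 3^(n+2) + 2n + 8
    return 3 ** (n + 2) + 2 * n + 8

# s satisfies s_n = s_{n-1} + n with s_0 = 1 ...
assert s(0) == 1 and all(s(n) == s(n - 1) + n for n in range(1, 50))
# ... and a' satisfies a_n = 4 a_{n-1} - 4 a_{n-2} + 3^n + 2n
assert all(a_part(n) == 4 * a_part(n - 1) - 4 * a_part(n - 2) + 3 ** n + 2 * n
           for n in range(2, 30))
print("particular solutions verified")
```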

3.3.4 Solving Recurrences using Generating Functions We now examine one example of a recurrence relation with non-constant coef- ficients. Example 3.40 (Derangements yet again). Recall from Theorem 1.24 that for d = D , the number of derangements on [n], we have n | n| d0 = 1, d1 = 0, d2 = 1, dn = (n 1)(dn 1 + dn 2) (for n 2). − − − ≥ This constitutes a homogeneous linear recurrence relation with k = 2 and coef- ficients c (n) = c (n) = n 1, g 0. 1 2 − ≡ Let D(x) be the exponential generating function for (dn)n , i.e. ∈N ∞ xn D(x) = d n n! n=0 X Then using Observation 3.13 and the recurrence we find:

∞ xn ∞ xn ∞ xn D0(x) = dn+1 = ndn + ndn 1 n! n! − n! n=0 n=0 n=1 X X X n 1 n 1 ∞ x − ∞ x − = x dn + x dn 1 (n 1)! − (n 1)! n=1 n=1 X − X − = xD0(x) + xD(x).

This differential equation has (given d1 = 0, d2 = 1) the unique solution: e x D(x) = − 1 x. − 82 x Luckily, we already know the exponential generation functions for e− and for 1 1 x so we can write: − n n x 1 ∞ n x ∞ x D(x) = e− = ( 1) n! . · 1 x − n! · n! n=0 ! n=0 ! − X X n n 3.12 ∞ n n k x = k!( 1) − . k − n! n=0 ! X kX=0   n n n k So we found yet another proof that dn = k k!( 1) − . k=0 − P  Let us give two more examples showing how we can solve recurrences using generating functions. Example 3.41. Consider the recurrence

an = an 1 + 6n 2 (for n 2), a0 = 1, a1 = 3. − − − ≥ With the characteristic polynomial p(x) = x2 + x 6 in mind we find the − n following equation for the generating function F (x) = ∞ anx of (an)n . n=0 ∈N P 2 ∞ n ∞ n ∞ n F (x)(1 + x 6x ) = anx + an 1x 6 an 2x − − − − n=0 n=1 n=2 X0 1X 1 X = a0x + a1x + a0x = 1 + 4x.

Thanks to our approach and using the recurrence for n 2, all but a finite number of terms canceled out. In the last step we used the≥ initial values. Using partial fraction decomposition (not handled here) we can simplify the identity and obtain: 1 + 4x 1 + 4x 1 1 F (x) = = = 6 1 1 + x 6x2 (1 2x)(1 + 3x) 5 · 1 2x − 5 · 1 + 3x − − − 6 ∞ 1 ∞ = (2x)n ( 3x)n 5 − 5 − n=0 n=0 X X 1 n ∞ where in the last step we applied 1 y = n=0 y for y = 2x and y = 3x. We can now see the coefficients: − − P a = 6 2n 1 ( 3)n. n 5 · − 5 · − Example 3.42. Consider the following non-homogeneous recurrence:

n bn = bn 1 + 2bn 2 + 2 (for n 2), b0 = 2, b1 = 1. − − ≥ n n Multiplying the equivalent statement bn = bn 1 2bn 2 = 2 with x for each n 2 and adding yields a sum in which we can− rewrite− − the terms: ≥

∞ n ∞ n ∞ n ∞ n n bnx bn 1x 2 bn 2x = 2 x . − − − − n=2 n=2 n=2 n=2 X X X X 2 1 F (x) b0 xb1 x (F (x) b0) x F (x) 1 2x − − · − 1−2x − − | {z } | {z } | {z } | {z } 83 Solving for F (x) yields: 1 F (x)(1 x 2x2) = + 1 3x − − 1 2x − − 2 5x + 6x2 F (x) = − ⇒ (1 2x)(1 x 2x2) − − − From here, apply partial fraction decomposition with a method of your choice, to get 1 1 1 F (x) = 1 + 2 + 13 . − 9 1 2x 3 (1 2x)2 9 1 + x − − From this we can obtain the coefficients again using identities we know.

2 ∞ ∞ ∞ F (x) = 1 2nxn + 2 2nxn + 13 ( 1)nxn. − 9 3 9 − n=0 n=0 ! n=0 X X X ∞ ∞ ∞ = 1 2nxn + 2 (n + 1)2nxn + 13 ( 1)nxn. − 9 3 9 − n=0 n=0 n=0 X X X The coefficients are therefore:

b = 1 2n + 2 (n + 1)2n + 13 ( 1)n. n − 9 3 9 −
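As a final sanity check for this section (my addition), both closed forms can be compared with direct iteration of the recurrences from Examples 3.41 and 3.42.

```python
# Example 3.41: a_n = -a_{n-1} + 6 a_{n-2}, a_0 = 1, a_1 = 3.
a = [1, 3]
for n in range(2, 20):
    a.append(-a[n - 1] + 6 * a[n - 2])
assert all(a[n] == (6 * 2 ** n - (-3) ** n) // 5 for n in range(20))

# Example 3.42: b_n = b_{n-1} + 2 b_{n-2} + 2^n, b_0 = 2, b_1 = 1.
b = [2, 1]
for n in range(2, 20):
    b.append(b[n - 1] + 2 * b[n - 2] + 2 ** n)
assert all(b[n] == (-2 ** n + 6 * (n + 1) * 2 ** n + 13 * (-1) ** n) // 9
           for n in range(20))
print(a[:6], b[:6])   # [1, 3, 3, 15, 3, 87] [2, 1, 9, 19, 53, 123]
```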

4 Partitions
4.1 Partitioning [n] – the set on n elements

A partition of [n] is given by its parts A_1, . . . , A_k with

A_i ≠ ∅ (for i = 1, . . . , k),   A_i ∩ A_j = ∅ (for i ≠ j),   ∪_{i=1}^{k} A_i = [n].


Figure 24: A partition of [9] into 4 sets.

The parts are unlabeled, i.e. if two partitions use the same parts Ai only in a different order, we consider them to be identical. If we fix the number of parts, i.e. if we want to count partitions of [n] into exactly k non-empty parts, II then their number is given by sk (n) as we have already seen in Section 1.5.2 where we considered arrangements of n labeled balls in k unlabeled boxes with at least one ball per box. We define the as

n II Bn = sk (n). kX=0 It counts the total number of partitions of [n] (into an arbitrary number of sets). Don’t get confused over the special case of n = 0: There is exactly one partition of into non-empty parts: = A A. Every A is non-empty, since no ∅ ∅ ∈∅II ∈ ∅ such A exists. So we also have B0 = s0 (0) = 1. A different way to define the BellS numbers is to consider a square free number k N, meaning ∈ k = p p ... p for distinct primes p , . . . , p . 1 · 2 · · n 1 n The number of ways to write k as product of integers bigger than one is exactly B . Take for instance k = 2 3 5 7, then some ways of writing k would be: n · · · k = 6 35, k = 2 5 21, k = 210 · · · which directly corresponds to the partitions:

2, 3, 5, 7 = 2, 3 5, 7 , 2, 3, 5, 7 = 2 5 3, 7 , 2, 3, 5, 7 = 2, 3, 5, 7 . { } { }∪{ } { } { }∪{ }∪{ } { } { }

In the following we try to find an explicit formula for Bn. We start by finding a recursion:

85 Theorem 4.1. n 1 − n 1 B = − B (n 1). n k k ≥ kX=0   Proof. Every partition of [n] has one part that contains the number n. In ad- dition to n this part contains k other numbers (for some 0 k n 1). The remaining n 1 k elements are partitioned arbitrarily. From≤ this≤ correspon-− dence we obtain− − the desired identity:

n 1 n 1 n 1 − n 1 − n 1 − n 1 Bn = − Bn 1 k = − Bn 1 k = − Bk. k − − n 1 k − − k kX=0   kX=0  − −  kX=0   Theorem 4.2 (Dobinski’s Formula).

1 ∞ kn B = . n e k! kX=0

Proof. Consider the exponential generating function B(x) of (Bn)n . ∈N

∞ xn B(x) = B n n! n=0 X Using the recursion from Theorem 4.1 we see

n ∞ xn ∞ n xn B0(x) = B = B n+1 n! k k n! n=0 n=0 ! X X kX=0   ∞ xn ∞ xn 3.12= B 1 = B(x) ex. n n! · n! · n=0 ! n=0 ! ex X1 X B(x) = e − (using B = B(0) = 1). ⇒ 0 x k n x 1 ∞ (e ) 1 ∞ 1 ∞ (xk) = 1 ee = = e e k! e k! n! n=0 kX=0 kX=0 X ∞ 1 ∞ kn xn = e k! n! n=0 ! X kX=0 4.1.1 Non-Crossing Partitions Imagine the elements of [n] to be laid out in a circular way. Two A, B [n] are crossing if there are numbers i < j < k < l [n] such that i, k A,⊆ j, l B. ∈ A non-crossing{ partition} ⊆ { is a} partition ⊆ in which the parts are pairwise non- crossing. In cyclic drawings the notion is very intuitive. See Figure 25 for examples. We denote the number of non-crossing partitions of [n] by NCn. We prove now that NCn is equal to Cn, the n-th . We already came across these numbers in Example 3.5, where we counted well formed parenthesis expressions.

86 4 3 4 3 2 2 5 5 1 1 6 6 9 9 7 8 7 8

Figure 25: A non-crossing partition of [9] on the left and a crossing partition of [9] on the right.

Theorem 4.3. 1 2n NC = C = . n n n + 1 n   Proof. Recall the values C0 = 1,C1 = 1,C2 = 2 (corresponding to the paren- thesis expressions “”, “()”, “()()”, “(())”) and the recursion

n Cn+1 = CkCn k. − kX=0

It suffices to prove that the sequence (NCn)n N has the same starting values and satisfies the same recursion. ∈ It is easy to check that NC0 = 1, NC1 = 1, NC2 = 2. Actually those numbers are just the Bell numbers since partitions of [n] can only be crossing for n 4. We now have to prove the recursion ≥

n NCn+1 = NCk NCn k. · − kX=0 To this end, consider any non-crossing partition of [n + 1]. The last element n + 1 is in some part S [n + 1]. Let k be the biggestP number in S other than n + 1 if such an element⊆ exists and k = 0 if S = n + 1 . Now observe that in the partition , every part contains either only numbers{ } that are bigger than k or only numbersP that are at most k: Otherwise, such a part would cross S. This means that decomposes into a non-crossing partition of [k] and a non-crossing partitionP of k + 1, . . . , n . Here we ignored n + 1: It must be in the same part as k and will{ never produce} a crossing if there has not already been one. Such a decomposition is unique: Every non-crossing partition of [n + 1] uniquely decomposes and every pair of non-crossing partitions of [k] and k + 1, . . . , n corresponds to a non-crossing partition of [n + 1]. { This proves} the claimed recursion and therefore the Theorem.
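For small n the theorem can be verified by brute force; the sketch below (an addition, not from the notes) enumerates all set partitions of [n], discards the crossing ones, and compares the count with C_n.

```python
from math import comb
from itertools import combinations

def set_partitions(elements):
    """Yield all partitions of the list `elements` as lists of sets."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for k in range(len(rest) + 1):
        for others in combinations(rest, k):
            block = {first, *others}
            remaining = [e for e in rest if e not in block]
            for partition in set_partitions(remaining):
                yield [block] + partition

def is_crossing(partition):
    """Parts A, B cross if there are i < j < k < l with i, k in A and j, l in B."""
    return any(i < j < k < l
               for A in partition for B in partition if A is not B
               for i in A for k in A for j in B for l in B)

for n in range(1, 7):
    non_crossing = sum(not is_crossing(p)
                       for p in set_partitions(list(range(1, n + 1))))
    assert non_crossing == comb(2 * n, n) // (n + 1)   # the Catalan number C_n
print("NC_n = C_n verified for n = 1, ..., 6")
```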

4.2 Partitioning n – the natural number We are interested in ways to write n as the sum of positive natural numbers. We would say, for instance, that

n = 17 = 5 + 5 + 4 + 3.

87 is a partition of n = 17 into the (unlabeled) parts 5, 5, 4 and 3. Alternatively we write n = λ = (5, 5, 4, 3) and say λ is the partition of n (even though λ is a sorted tuple) hoping this will not be confusing. We already considered this in Section 1.5.4 were we counted arrangements of n unlabeled balls in k unlabeled boxes and at least one ball per box. The number of such arrangements is given by pk(n). We established a recursive formula 0 if k > n, 0 if n 1, k = 0, pk(n)  ≥ 1 if n = k = 0,  pk(n k) + pk 1(n 1) if 1 k n. − − − ≤ ≤  We define the total number of partitions of n (into an arbitrary number of parts) as the partition function n

p(n) = pk(n). kX=0 We illustrate a partition by its Ferrer diagram (also called Young diagram). It consists of rows of squares (left aligned) corresponding to the parts (largest part in the topmost row). Consider for instance the partition λ = (5, 5, 4, 3) of n = 17 which has the Ferrer diagram

Observation 4.4. The number of partitions of n into at most k parts is equal to the number of partitions of n with parts of size at most k. Proof. The Ferrer diagrams for partitions with at most k parts are those with at most k rows. The Ferrer diagrams for partitions with parts of size at most k are those with at most k columns. Clearly there is a bijection between those sets of diagrams: Flip them along the diagonal!

For instance, flipping the diagram of the partition λ = (7, 5, 3, 3, 3, 2) along the diagonal yields the diagram of the conjugate partition λ* = (6, 6, 5, 2, 2, 1, 1).

Formal Proof. The bijection on the diagrams corresponds to a bijection on the partitions that maps the partition n = (λ_1, . . . , λ_k) with biggest part l to the conjugate partition n = (λ_1^*, . . . , λ_l^*) defined as

λ_i^* = |{ j | λ_j ≥ i }|.

Theorem 4.5 (Euler).

Σ_{n=0}^{∞} p(n) x^n = Π_{k=1}^{∞} 1/(1 − x^k).

88 Proof. We boldly rewrite the infinite product as an infinite product of infinite 1 n ∞ sums using the identity: 1 y = n=0 y . − P ∞ 1 1 1 1 = ... 1 xk 1 x · 1 x2 · 1 x3 · kY=1 − − − − ∞ ∞ ∞ = xn1 x2n2 x3n3 ... n =0 ! n =0 ! n =0 ! X1 X2 X3 In an expansion of this, the coefficient in front of xn is just the number of ways we can choose n1, n2, n3,... such that i 0 i ni = n. But such choices exactly correspond to partitions of n, i.e. the coefficient≥ · is p(n). Take for instance our favorite partitionP n = 17 = 5+5+4+3. It corresponds to choosing n5 = 2, n4 = 1, n3 = 1 and all other indices ni as 0. This proves the claim.

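Numerically, the recursion for p_k(n) recalled at the start of this section and the truncated product of Theorem 4.5 indeed agree; a short Python check (my addition, not from the notes):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def p_parts(n, k):
    """p_k(n): partitions of n into exactly k non-empty parts."""
    if k > n:
        return 0
    if n == 0 or k == 0:
        return 1 if n == k == 0 else 0
    return p_parts(n - k, k) + p_parts(n - 1, k - 1)

def partition_function(n):
    return sum(p_parts(n, k) for k in range(n + 1))

def euler_product_coeffs(N):
    """Coefficients of prod_{k=1}^{N} 1/(1 - x^k), truncated at x^N."""
    coeffs = [1] + [0] * N
    for k in range(1, N + 1):
        for n in range(k, N + 1):        # multiply by 1/(1 - x^k) in place
            coeffs[n] += coeffs[n - k]
    return coeffs

N = 20
assert [partition_function(n) for n in range(N + 1)] == euler_product_coeffs(N)
print(euler_product_coeffs(10))   # [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
```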
Define podd(n) as the number of partitions of n into odd parts and pdist(n) as the number of partitions of n into distinct parts. Take for instance n = 7. We have podd(7) = 5 since

7 = 1+1+1+1+1+1+1, 7 = 1+1+1+1+3, 7 = 1+3+3, 7 = 1+1+5, 7 = 7, and pdist(7) = 5 since

7 = 3 + 4, 7 = 1 + 2 + 4, 7 = 2 + 5, 7 = 1 + 6, 7 = 7.

This is no coincidence, in fact, in the problem class you already saw:

∞ ∞ 1 ∞ ∞ p (n)xn = = (1 + xk) = p (n)xn. odd 1 x2k+1 dist n=0 n=0 X kY=0 − kY=0 X This proves podd = pdist, but we will prove it yet again, this time by constructing a bijection.

Theorem 4.6. podd = pdist.

Proof. Let n = λ1 + λ2 + ... + λk be a partition of n into distinct parts. We separate the powers of 2 from the λi, i.e. we write:

a1 a2 ak n = u12 + u22 + ... + uk2

ai for odd numbers ui and λi = ui2 . Note that the ui need no longer be distinct, for example if λ1 = 5, λ2 = 10 we would have u1 = u2 = 5. We sort the summands according to the ui which gives

α α α β β β n = µ (2 1 + 2 2 + ... + 2 l1 ) + µ (2 1 + 2 2 + ... + 2 l2 ) + ... (?) 1 · 2 · r1 r2 where the values| µi are{z all distinct} and take| the roles{z of ui, in} particular, the following multisets coincide

u , u , . . . , u = µ , µ , . . . , µ , µ , µ , . . . , µ ,... . { 1 2 k} { 1 1 1 2 2 2 } l1 times l2 times | {z } | {z } 89 Note that the values ri are sums of distinct powers of 2. For the bijection, we map the original partition into distinct parts to the following partition into odd parts

n = µ1 + µ1 + ... + µ1 + µ2 + µ2 + ... + µ2 + ....

r1 times r2 times We still need to argue| that{z this constitutes} | a bijection.{z } To do so, we explain the inverse mapping. Given a partition into odd numbers with repetition numbers ri, there is a unique way to write these ri as sums of distinct powers of 2, the binary expansion of ri. This gets us to the situation (?) and from there we get back to a partition into distinct parts by multiplying out. Example. To illustrate the last Theorem, take the partition into this distinct parts: 26 = 12 + 6 + 4 + 3 + 1. separate the powers of two and sort the terms according to the odd numbers that remain:

26 = 3 22 + 3 21 + 1 22 + 3 20 + 1 20 · · · · · = 3(20 + 21 + 22) + 1(20 + 22) = 3 7 + 1 5 · · which yields the partition

26=3+3+3+3+3+3+3+1+1+1+1+1.

All steps are reversible, as there is only one way to write 5 and 7 as sums of distinct powers of two. In terms of Ferrer diagrams we have mapped:

(Ferrer diagram of (12, 6, 4, 3, 1) ↦ Ferrer diagram of (3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1).)
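The bijection from the proof of Theorem 4.6 is easy to implement; the sketch below (an addition, not from the notes) maps a partition into distinct parts to the corresponding partition into odd parts and reproduces the example above.

```python
from collections import Counter

def distinct_to_odd(parts):
    """Map a partition into distinct parts to a partition into odd parts,
    following the proof of Theorem 4.6: write each part as u * 2^a with u odd,
    then let each odd value u be repeated (sum of the corresponding 2^a) times."""
    repetitions = Counter()
    for part in parts:
        u, power = part, 1
        while u % 2 == 0:          # strip the powers of two
            u //= 2
            power *= 2
        repetitions[u] += power
    result = []
    for u in sorted(repetitions, reverse=True):
        result.extend([u] * repetitions[u])
    return result

print(distinct_to_odd([12, 6, 4, 3, 1]))
# [3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1]  -- the partition of 26 from the example
assert sum(distinct_to_odd([12, 6, 4, 3, 1])) == 26
```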

even odd Now define pd (n) and pd (n) as the numbers of partitions of n into an even number of distinct parts and an odd number of distinct parts, respectively. We have, for instance

peven(7) = # “1+6”, “2+5”, “3+4” = 3, d { } podd(7) = # “1+2+4”, “7” = 2. d { } Apparently, these numbers can differ, but we now show that they can differ by at most 1, and characterize when this is the case. (3k 1)k To this end, define for k Z the wk := −2 . Note ∈ (3k+1)k that with this definition w k = . Some values are: − 2

k    . . .  −3  −2  −1   0   1   2   3  . . .
w_k  . . .  15   7   2   0   1   5  12  . . .

Lemma 4.7. We have

p_d^even(n) − p_d^odd(n) = (−1)^k if n = w_k for some k ∈ Z, and 0 otherwise.

Proof. Consider the Ferrer diagrams for a partition into distinct parts. In it, two rows may never have the same length. Define the slope S of such a diagram to be a maximal staircase going diagonally down starting from the top-right square and define the bottom B as the last row of the diagram. The following diagram has a slope of length 3 (highlighted in red) and a bottom of size 2 (highlighted in blue):

Note that, in a few special diagrams, B and S may have a single square in common. We define the set ∆ := B S, it contains the single common square if it exists, and is empty otherwise.∩ We will be sloppy in notation using B, S and ∆ to simultaneously denote the set and the size of the set. Now distinguish three types partitions corresponding to three types of dia- grams where: Type 1: S B + ∆ • ≥ Type 2: S < B ∆ • − Type 3: B ∆ S < B + ∆ • − ≤ Note that the latter case can only occur for ∆ = 1 and S B,B 1 . ∈ { − } Claim. (i) Type 3 partitions can only occur if n = wk for some k Z. ∈ (ii) Conversely, if n = wk for some k Z, then there is exactly one Type 3 partition of n. ∈

Proof of (i). Consider the case k := S = B 1 first, where the diagram looks like this (for k = 4): −

which gives

k 2 2 (k + 1)k (3k + 1)k n = k + i = k + = = w k. 2 2 − i=1 X The other case is k := S = B meaning the diagram looks like this (for k = 4)

91 which gives

(3k + 1)k 2k (3k 1)k n = w k k = = − = wk. − − 2 − 2 2

Proof of (ii). The existence is already clear from our proof of (i): We found for arbitrary k N a type 3 partition of w k and wk. For the uniqueness, note that no two diagrams∈ of type 3 have the same− size: We can iterate through all of them by alternatingly adding a column and a row. Next we prove the following claim: Claim (iii). For every k N there is a bijection between Type 1 partitions on k rows and Type 2 partitions∈ on k 1 rows. − Note that from this the Theorem follows, since it guarantees a bijection that maps every partition to a partition with one row more or one row less, so every partition into an even number of parts is mapped to a partition into an odd number of parts and vice versa. This shows that the number of even and odd partitions must coincide. The only disturbance can be the single type 3 partition that exists if n = wk and is even if k is even and odd if k is odd.

Proof of (iii). Consider a Type 1 partition, its slope is at least as large as its bottom, maybe like this:

We take away the bottom and distribute the squares among the first B rows of the diagram, like this: | |

There is enough room to do this (since the slope was at least as big as the bottom) and the resulting partition is a partition into distinct parts. The size of the bottom has increased and the size of the new slope is the size of the old bottom. Therefore the new diagram is of type 2 or type 3 and, looking more closely, type 3 can actually not occur as result of our operation, so it really is of type 2. The inverse operation is to take the current slope and create a new row from it. After checking that this maps type 2 partitions to type 1 partitions we are done. Theorem 4.8 (Euler’s Pentagonal Number Theorem).

∞ ∞ ∞ (1 xk) = ( 1)kxwk = 1 + ( 1)k(xwk + xw−k ). − − − k=1 k= k=1 Y X−∞ X

92 Proof. Multiply out the left hand side. The coefficient for xn counts the number of partitions of n into distinct parts, however, partitions into an odd number of parts are counted with negative sign. Therefore

∞ ∞ Thm 4.7 ∞ (1 xk) = (peven(n) podd(n))xn = ( 1)kxwk . − d − d − k=1 n=0 k= Y X X−∞
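A numerical illustration of Theorem 4.8 (added, not from the notes): expand the truncated product and compare its coefficients with the sparse ±1 pattern at the numbers w_k.

```python
def product_coeffs(N):
    """Coefficients of prod_{k=1}^{N} (1 - x^k), truncated at x^N."""
    coeffs = [1] + [0] * N
    for k in range(1, N + 1):
        for n in range(N, k - 1, -1):     # multiply by (1 - x^k) in place
            coeffs[n] -= coeffs[n - k]
    return coeffs

def pentagonal_side(N):
    """Right-hand side of the Pentagonal Number Theorem, truncated at x^N."""
    coeffs = [0] * (N + 1)
    for k in range(-N, N + 1):            # far more values of k than needed
        w = (3 * k - 1) * k // 2          # w_k = (3k - 1)k / 2
        if 0 <= w <= N:
            coeffs[w] = (-1) ** abs(k)
    return coeffs

N = 60
assert product_coeffs(N) == pentagonal_side(N)
print([n for n, c in enumerate(product_coeffs(N)) if c != 0])
# [0, 1, 2, 5, 7, 12, 15, 22, 26, ...] -- the (generalized) pentagonal numbers
```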

We have yet to explain why wk are called pentagonal numbers. It is basically analogous to square numbers and triangle numbers: All occur naturally when filling a two-dimensional region with dots.

We start with one dot, and say it is “layer 0”, then add layer by layer filling an area of the respective kind. For pentagonal numbers, there are four dots in layer 1, seven dots in layer 2 and so on. Generally, in layer i there are 3i + 1 dots since there are three sides with i + 1 dots each, but two dots are shared by two sides. In a drawing with k layers there are:

k 1 k 1 − − k (3k 1)k 3i + 1 = k + 3( i) = k + 3 = − = w . 2 2 k i=0 i=0 X X   4.3 Young Tableau A Young tableau T is a Ferrer diagram of a partition λ of n that is filled with all numbers 1, . . . , n. For instance if λ = (7, 6, 5, 3, 2, 2, 1) is a partition of 26:

11 26 23 13 19 14 24
 8 18  9 15 22 21
 5  1  2  3  6
17  4  7
10 12
16 25
20

We say T is a tableau of λ or T has shape λ. There are n! tableaux of shape λ. Definition 4.9. A standard Young tableau is a Young tableau where the num- bers in the rows are increasing from left-to-right and the numbers in the columns are increasing from top-to-bottom, take for instance the following standard

93 Young tableau: 1 2 3 6 12 16 25 4 7 8 17 18 21 5 11 13 19 26 9 15 22 10 23 14 24 20

From now on we only consider standard Young tableaux. Theorem 4.10 (Robinson-Schensted-Correspondence). There is a bijection be- tween the permutations of [n] and the set

(T ,T ) T and T are standard Young tableaux of shape λ . { 1 2 | 1 2 } is partition of n λ [

i.e. the set of ordered pairs (T1,T2) where T1 and T2 are standard Young tableaux of the same shape.

Proof. We will see a geometric construction that builds, given a permutation π, a corresponding pair of standard Young tableaux. Let π be a permutation of [n] and X(π) = (i, π(i)) i = 1, . . . , n the corre- sponding point set in the plane. For instance, for{ the permutation| π }= 3271546 of [7] the point set X(π) is:

π(x)

7 6 5 4 3 2 1 x 1 2 3 4 5 6 7

A point p is minimal if there is no other point that is both to the left and below of p. The set of all minimal points in X is therefore

min(X) = (x, y) X (x0, y0) X x, y : x0 > x or y0 > y . { ∈ | ∀ ∈ \{ } } The shadowline S(X) for a point set X is the weakly decreasing rectilinear line through all points in min(X) and convex bends exactly in min(X). Here, by convex we mean and by concave we mean . In the following drawing,

94 min(X) (red) and S(X) (black) are shown.

π(x)

7 6 5 4 3 2 1 x 1 2 3 4 5 6 7

In case you like pseudo-code, a concise algorithmic description of our con- struction is given below. It does not yet provide the pair of tableaux we want, but all the auxiliary objects we require to define them. A detailed description in words follows.

Algorithm 1 Geometric Variant of Robinson-Schensted-Correspondence

X1 X(π) i ←0 ← while Xi+1 = do i i + 16 ∅ ← Xi0 Xi j ←0 ← while Xi0 = do j j +6 1∅ j← S S(X0) i ← i Xi0 Xi0 min(Xi0) end while← \ ni j ← 1 ni Xi+1 convex bends of Si ,...,Si end while← m i ←

The algorithm proceeds in phases (counted by the variable i), the total number of phases m is not known beforehand (but bounded by n). We start every phase i with a non-empty point set Xi. For Xi we construct a sequence 1 ni of shadowlines Si ,...,Si where the j-th shadowline is taken for the point set consisting of those points from Xi that were not used for previous shadowlines. 1 2 1 j j 1 k S = S(X ),S = S(X S ). In general: S = S(X − S ). i i i i \ i i i \ k=1 i The i-th phase ends as soon as all points from Xi were containedS in one of the 1 2 ni shadowlines Si ,Si ,...,Si . The shadowlines of phase i determine the i-th row of the tableaux T1 and j T2 we want to construct. Let xi be the x-coordinate of first segment of the j j shadowline Si , i.e. the x-coordinate at which Si leaves the picture on the top j j j and yi the y-coordinate of last segment of Si , i.e. the y-coordinate at which Si leaves the picture on the right. Then the i-th row of T1 and T2 consists of the 1 ni 1 ni numbers xi , . . . , xi and yi , . . . , yi , respectively.

95 Shadowlines may contain concave bends (they do iff they have two or more convex bends). The set Xi+1 is defined as the set of all concave bends occurring 1 ni in the shadowlines Si ,...,Si . If Xi+1 is empty, we are done, otherwise, it serves as point set for phase i + 1. (proof continues later) Before we verify that the construction yields the bijection we desire, we give an example. Example. Consider again π = 3271546. The first phase of the algorithm will 1 2 3 find three shadowlines S1 ,S1 ,S1 . They leave the diagram on x positions 1, 3 and 7 and on y positions 1, 4 and 6, giving rise to the partial tableaux as shown.

π(x)

7 (1) 6 T1 = 1 3 7 5 4

3 (1) 2 T2 = 1 4 6 1 x 1 2 3 4 5 6 7 There are four concave bends on these shadowlines which give the point set for the next phase: X2 = (2, 3), (4, 2), (5, 7), (6, 5) . In phase 2 we get two shadowlines and add corresponding{ second lines to the} tableaux.

π(x)

7 (2) 1 3 7 T = 6 1 2 5 5 4 3 (2) 1 4 6 2 T2 = 1 2 5 x 1 2 3 4 5 6 7 There are still convex bends, so we proceed with phase 3

π(x) 1 3 7 7 T1 = 2 5 6 4 6 5 4 3 1 4 6 2 T2 = 2 5 1 3 7 x 1 2 3 4 5 6 7 No concave bends were made this time, our construction is done. The pair (T1,T2) is the result.

96 We now establish, in a series of claims, that the construction constitutes a bijection between permutations and pairs of Young tableaux of the same shape as desired. We use the notion of a chain, which is a subset Y of a point set X such that Y is increasing, i.e. Y = (x , y ), (x , y ),..., (x , y ) { 1 1 2 2 k k } with y1 < y2 < . . . < yk and x1 < x2 < . . . < xk.

Claim. The number ni of shadows lines in phase i is the length of the longest chain in Xi.

Proof of Claim: Let Y be a longest chain in Xi. No shadowline can contain more than one point of Y , since shadowlines are decreasing while Y is increasing. This shows ni Y . Now observe≥ | | that for the element µ Y with smallest x- and y-coordinate ∈ we have µ min(Xi), otherwise there would be a point to the left and below ∈ 1 of µ enabling us to extend Y . Therefore, the first shadowline Si will contain µ 1 (and, indeed, an element of every longest chain) so the longest chain in Xi Si will be of size Y 1. The next shadowline will contain another point of Y and,\ | |− by induction, we conclude that ni Y since X1+ Y will only contain chains ≤ | | | | of size 0, meaning X1+ Y is empty. (claim) | | Claim. The tableaux have valid shape, i.e. n n for all i = 1, . . . , m 1. i+1 ≤ i − Proof of Claim: Let Y be a longest chain in Xi+1, consisting of some positions of concave bends of the shadowlines from the previous phase i. Since the concave bends of every shadowline are decreasing, Y can only contain one concave bend from each shadowline of phase i. This means Y ni. We already know Y = n from the previous claim, so we are done.| | ≤ (claim) | | i+1  Claim. The rows of T1 and T2 are increasing. Proof of Claim: This is equivalent to saying: Shadowlines constructed later in the phase will leave the picture further to the right and further to the top. This is obvious by construction: Every shadowline goes through the unused point with smallest x coordinate and the unused point with smallest y coordinate, making that point unavailable for later shadowlines. (claim)

Claim. The columns of T1 and T2 are increasing. Proof of Claim: We consider two consecutive phases i and i + 1 and show that in the step from row i to row i + 1 every column is increasing. 1 ni In phase i the shadowlines Si ,...Si where constructed with corresponding 1 ni sets of concave bends Bi ,...,Bi (some of which may be empty) that together form the set Xi+1. In the following we implicitly use that shadow lines of 1 the same phase are non-crossing. We have Bi min(Xi+1), meaning that 1 ⊆ 1 the points Bi will be “consumed” by the first shadowline Si+1 of phase i + 1. 2 2 ni Therefore Si+1 will be constructed from a subset of Bi ,...,Bi , and will use up 2 j all remaining points from Bi (if any remain). Generally, Si+1 will be constructed j ni j from a subset of Bi ,...,Bi . The x-coordinate of the leftmost point of Si+1 j is therefore at least the x-coordinate of the leftmost concave bend of Si and j therefore to the right of the leftmost point of Si . This proves that columns of T1 are increasing, for T2 do the same argument for y coordinates. (claim)

97 So far we proved that our map is well-defined, i.e. it gives rise to a pair of standard Young tableaux for any permutation π Sn. The final step is to show that the map constitutes a bijection. ∈

Claim. There is an inverse map, i.e. from any pair (T1,T2) of standard Young tableaux of the same shape we can recover a corresponding permutation π S . ∈ n Proof. We demonstrate the inverse procedure with an example, hoping the gen- eral case will be apparent from this. Consider the following pair of Young tableaux: 1 2 4 8 1 3 6 7 3 6 2 4 T := ,T := 1 5 2 5 7 8 These tableaux contain eight numbers in four rows, so we try to recover a four-phased construction and annotate the x and y coordinates of an 8 8 grid with the index of the phases at which we wish a corresponding shadowlines× to leave the picture. π(x)1 1 2 1 3 2 4 1 8 4 7 1 6 1 5 3 4 2 3 1 2 2 1 1 x 1 2 3 4 5 6 7 8

We claim there is a unique way to draw the shadowlines of phases 4, 3, 2 and 1 (one after the other) that correspond to a forward construction (see Figure 26). First connect the indices of the last phase with L-shaped shadowlines (the lines of the last phase do not have convex bends). In our example, there is only one such line (drawn in red). Now draw the shadowlines for phase 3, again there is only one in our example. It must connect the corresponding numbers at the border of the picture and have a concave bend at the convex bend of the shadowline of phase 4 on the way, there is only one way to do it. The lines for phase 2 (shown in blue) are added one after the other (right-most first). The first line must visit the convex bend at (7, 5) if it didn’t, then this point would not be reachable for the next blue shadowline (since they must not cross). In the last step the shadowlines of phase 1 are added (yellow). Convince yourself that there is, again, only one way to draw them: Every yellow shadowline must visit exactly those convex bends of the blue lines that where not visited by a previous yellow shadowline and that would otherwise become unreachable for subsequent yellow shadowlines. Now the permutation can be obtained by taking the convex bends of the shadowlines of phase 1. It is: π = 28561437. With techniques similar to what we have already seen one can show that this backwards procedure works for arbitrary tableaux.

98 π(x)1 1 2 1 3 2 4 1 π(x)1 1 2 1 3 2 4 1 8 4 8 4 7 1 7 1 6 1 6 1 5 3 5 3 4 2 4 2 3 1 3 1 2 2 2 2 1 1 1 1 x x 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8

π(x)1 1 2 1 3 2 4 1 π(x)1 1 2 1 3 2 4 1 8 4 8 4 7 1 7 1 6 1 6 1 5 3 5 3 4 2 4 2 3 1 3 1 2 2 2 2 1 1 1 1 x x 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8

Figure 26: Reverse construction of the Robinson-Schensted-Correspondence.

4.3.1 Counting Tableaux Given the geometric variant of the Robinson-Schensted-Correspondence (the original construction, although equivalent, was formulated very differently), it is not hard to count all Young tableaux with n elements. 2 We say a permutation π Sn is an if π = id, i.e. applying π twice yields the identity permutation.∈ This is equivalent to saying that the disjoint cycle decomposition of π contains only cycles of size 2 (transpositions) and cycles of size 1 (fixed points). For instance:

i: 1 2 3 4 5 6 7 8

π(i): 1 2 3 4 5 6 7 8

π2(i): 1 2 3 4 5 6 7 8

We would write this involution as π = (1 4)(2 6)(7 5).

RSC Lemma 4.11. Let π (T1,T2), i.e. π is mapped to (T1,T2) via the Robinson- Schensted-Correspondence.7→ Then

1 RSC (i) π− (T ,T ). 7→ 2 1 (ii) π is an involution iff T1 = T2.

99 (iii) The length of the longest increasing subsequence in π is the length of the first row in T1. Proof. (i) Recall that the construction starts with

1 X(π) = (i, π(i)) i [n] = (π− (j), j) j [n] . { | ∈ } { | ∈ } 1 The point set X(π− ) is therefore obtained by flipping the point set X(π) along the diagonal x = y, in other words, changing the roles of x and y. The shadowlines we obtain will also be flipped, so every x-coordinate we would have written into T1 is now a y-coordinate and therefore written into T2 and vice versa. This just means that the roles of T1 and T2 are swapped.

2 1 (i) (ii) π = id π = π− (T ,T ) = (T ,T ) T = T . ⇔ ⇔ 1 2 2 1 ⇔ 1 2 (iii) Consider an increasing subsequence, i.e.

i1 < i2 < . . . ik with π(i1) < π(i2) < . . . π(ik).

The corresponding set Y = (ij, π(ij)) j [k] is a chain in X(π). Now we can use a sub claim from{ the last| theorem∈ } which asserted that the length of the i-th row of the tableaux is the length of a longest chain in Xi, using it for i = 1.

Note that by Lemma 4.11(ii) we have that the number in of involutions of [n] and the number of standard Young tableaux with n squares coincide. The following Lemma therefore gives a recurrence for the number of Young tableaux.

Lemma 4.12. For the number in of involutions of [n] we have

i1 = 1, i2 = 2, in = in 1 + (n 1)in 2. − − − Proof. Consider involutions in their disjoint cycle decomposition. The cycles contain one or two elements, so think of an involution as an arrangement of n labeled balls in unlabeled boxes and one or two balls per box. To count them, we distinguish two cases:

Case 1: The ball with label n is in its own box. The rest is an arrangement with n 1 balls. − Case 2: The ball with label n is in a box together with another ball x. There are n 1 choices for x and the rest is an arrangement with n 2 balls. − − This proves the claim.

4.3.2 Counting Tableaux of the Same Shape Let (λ) be the set of all standard Young tableaux with shape λ. Take for instanceT

λ = , (λ) = 1 2 3 , 1 2 4 , 1 2 5 , 1 3 4 , 1 3 5 . T 4 5 3 5 3 4 2 5 2 4  

100 In general, let λ = (n , n , . . . , n ) where n n ... n are numbers 1 2 m 1 ≥ 2 ≥ ≥ m that sum up to n. With this in mind we define the function t : Zm N as →

((n1, . . . , nm)) if n1 n2 ... nm 0, t(n1, . . . , nm) = |T | ≥ ≥ ≥ ≥ (0 otherwise. For technical reasons, we allow trailing zero-length rows, for instance we could write ((3, 2, 0, 0)) = ((3, 2, 0)) = ((3, 2)) = 5. We claim the function is fully characterizedT by theT following identities:T

(1) If the numbers ni fail to be weakly decreasing then

t(n1, . . . , nm) = 0.

(2) If the last number is zero, then it can be removed

t(n1, . . . , nm, 0) = t(n1, . . . , nm).

(3) If the numbers are weakly decreasing and none is zero then

m

t(n1, . . . , nm) = t(n1, n2, . . . , ni 1, ni 1, ni+1, . . . , nm) − − i=1 X = t(n 1, . . . , n ) + t(n , n 1, . . . , n ) + .... 1 − m 1 2 − m (4) If only one number n 0 is left then ≥ t(n) = 1.

It is obvious that t fulfills (1), (2) and (4). And (3) is just the observation that the biggest number n must be the last number in one of the m rows. Consider for instance λ = (5, 4, 4, 3) then in any Young tableau the number n = 16 will be in one of the following positions:

n n n n

For each case, we count the number of Young diagrams of the shape where the corresponding square was removed, i.e. we consider the shapes

, , ,

So we compute t(4, 4, 4, 3)+t(5, 3, 4, 3)+t(5, 4, 3, 3)+t(5, 4, 4, 2). You may object that the second shape is not valid, but that’s no problem since t(5, 3, 4, 3) was defined to be zero. Note that (1)-(4) gives a complete recursion, i.e. for each input the value is either defined explicitly or given in terms of values for smaller inputs (in (2) the number of terms is decreased, in (3) the sum of the inputs is decreased). This

101 means there is a unique solution. While we do not try the long and arduous journey of discovering the solution ourselves, we can, given the solution, verify that it is indeed correct. To this end, we need to first examine the Vandermonde which is defined7 as ∆(x , . . . , x ) := (x x ). 1 m i − j 1 i

2 m 1 1 x1 x1 x1 − 2 ··· m 1 1 x2 x x − (m)  2 ··· 2  ∆(x1, . . . , xm) = ( 1) 2 det . . . . . − . . . · · · .    2 m 1   1 x x . . . x −   m m m    Before we can get back to counting tableaux, we need to prove a technical Lemma:

Lemma 4.13. We have the following identity on polynomials over x1, x2, . . . , xm, y: m m xk∆(x1, . . . , xk + y, . . . , xm) = x1 + x2 + ... + xm + 2 y ∆(x1, . . . , xm). k=1 X   Proof. Let g be the polynomial given as the sum on the left hand side. Observe what happens if we swap the roles of xi and xj (with i < j) in g. The summands for k / i, j will just change sign (by our previous observation). The summand with k∈= { i turns} into:

x ∆(x , . . . , x + y, . . . , x , . . . , x ) = x ∆(x , . . . , x , . . . , x + y, . . . , x ). j 1 j i m − j 1 i j m ↑ ↑ position↑ i position j position i position↑ j So the term for k = i became the negated term with k = j, and this works vice versa as well — altogether, the value of g changes sign if xi and xj swap roles. Now consider the case of xi = xj, then swapping xi and xj obviously doesn’t change anything. The only number that doesn’t change if its sign changes is zero, so g = 0 whenever two x and x coincide (for i = j). i j 6 Now think of all the variables x2, . . . , xm, y as some (distinct) integer con- stants and of x1 as the only actual variable, then g is a polynomial of degree 7Our definition differs from the usual one in that it may have a different sign.

102 n over x1. It has several zeroes, one of which is x1 = x2. This allows us to divide g by the degree 1 polynomial x1 x2 (using polynomial long division) and we obtain g = p (x x ) for some− polynomial p. In the same way we · 1 − 2 separate the other zeroes getting g = p0 (x x )(x x ) ... (x x ) for · 1 − 2 1 − 3 1 − m some polynomial p0. Then, looking at p0, we switch perspective thinking of x2 as the variable and of the other values as (suitable distinct) constants. Looking at it that way we find the zeroes x2 = x3, x2 = x4, . . . , x2 = xm of p0 and can separate corresponding factors from p0. We repeat this for the remaining m 2 variables as well. With this we eventually obtain: −

g =p ˆ (x x ). (?) · i − j 1 i

g = (a x + a x + ... + a x + by) (x x ) 1 1 2 2 m m · i − j 1 i

8 for some constants a1, . . . , am, b still to be determined . But recall that g was originally defined to be:

m

g = xk∆(x1, . . . , xk + y, . . . , xm). kX=1 Since the equation must hold for the special case of y = 0, we quickly see that a1 = a2 = ... = am = 1. Now to determine b, we multiply out g and collect the sum of all containing y with multiplicity 1. We start by analyzing the k-th summand of g:

x ∆(x , . . . , x + y, . . . , x ) k · 1 k m = x (x x ) (x x y)(x x y) ... (x + y x ). k · i − j 1 − k − 2 − k − k − m  1 i

8You may object thatp ˆ may have a constant term. We call a polynomial homogeneous, if all monomials have full degree. Since g and ∆(x1, . . . , xm) are homogeneous, we can conclude thatp ˆ must be homogeneous as well.

103 m 1 summands (one for each occurrence of y) add up to − y y y xk (xi xj) − + − + ... + ·  −  x1 xk x2 xk xk xm 1 i

(2) t(n1, . . . , nm 1, 0) = t(n1, . . . , nm 1). − − Since nm = 0 means xm = 0, we calculate: m 1 − ∆(x1, . . . , xm 1, 0) = (xi xj) = (xi xj) xi − − − 1 i

104 With this it is easy to verify:

∆(x1, . . . , xm 1, 0)n! t(n1, . . . , nm 1, 0) = − − x1! . . . xm 1!0! − ∆(x1 1, . . . , xm 1 1)n! = − − − = t(n1, . . . , nm 1). (x 1)! ... (x 1)! − 1 − m − The hard part is (3), but we did most of the work already in the last Lemma. (3) t(n , . . . , n ) = m t(n , . . . , n 1, . . . , n ). 1 m k=1 1 k − m First note: P

m m m 1 − m x = (n + m i) = n + j = n + . i i − 2 i=1 i=1 j=0 X X X   Now Lemma 4.13 gives for y = 1 − m m x ∆(x , . . . , x 1, . . . , x ) = (x + ... + x ) ∆(x , . . . , x ) k 1 k − m 1 m − 2 1 m kX=1   n Hence: | {z } m ∆(x1, . . . , xm)n! k=1 xk∆(x1, . . . , xk 1, . . . , xm)(n 1)! t(n1, . . . , nm) = = − − x1! . . . xm! x1! . . . xm! m P m ∆(x1, . . . , xk 1, . . . , xm)(n 1)! = − − = t(n1, . . . , nk 1, . . . , nm) x1! ... (xk 1)! . . . xm! − kX=1 − kX=1

The last property is again easy to verify. For m = 1 and n = n1, we have:

∆(x1)n! ∆(n1)n1! (4) t(n1) = = = ∆(n1) = 1. x1! n1! where ∆(n1) = 1 because it is an empty product. With the last Theorem it would already be possible to count Young tableaux of any shape fairly conveniently. However, with a geometric interpretation, it gets even easier. This geometric interpretation uses hooks. Consider a partition λ = (n , . . . , n ) and a square (i, j) λ, by which we 1 m ∈ mean the square in row i and column j. The hook hi,j rooted in (i, j) is the union of all squares to the right of the square in the same row and below of the square in the same column. Here is a picture, a Ferrer diagram where the hook h2,3 is highlighted, it has length 7.

105 Theorem 4.15 (). Let λ be a partition of n. Then

n! t(λ) = hi,j . (i,j) λ | | Q∈ Proof. Let λ = (n1, . . . , nm). We multiply the lengths of all the hooks, going through them row by row. For row i, the first hook hi,1 starts at square (m, 1) then goes upwards and rightwards and ends at square (i, ni). It has length (ni + m i) = xi. The other hooks in row i also end in (i, ni) but they start in other− positions. We go through these positions from left to right as shown in the following figure (for i = 3).

• • • • Now if there was a hook starting in each integer position this line passes through, the product of the hook lengths would just be xi!. However, the positions marked with a dot do not correspond to starts of hooks so for each dot we have to divide by the length that a hook starting there would have. The dot in line j (j > i) is in position (j, nj + 1) so a hook going to (i, ni) has length (j i + ni nj) = (xi xj). So the product of all valid hooks for line i is − xi! − − Q . Now for the product of all hook lengths of all rows we get: (xi xj ) j>i −

xi! x1! . . . xm! x1! . . . xm! 4.14 n! hi,j = = = = | | (xi xj) (xi xj) ∆(x1, . . . , xm) t(λ). (i,j) λ 1 i m Y∈ ≤Y≤ j>i − i

7 6 3 1 5 4 1 → 3 2 2 1

The number of Young tableaux of this shape is by the Hook length formula 11! divided by all the hook lengths, so: 11! 11 10 9 8 = · · · = 11 10 3 4 = 1320. 7 6 3 1 5 4 1 3 2 2 1 3 2 · · · · · · · · · · · · · ·

106 5 Partially Ordered Sets

Example 5.1. As was accurately observed by Randall Munroe, creator of xkcd, different fruit not only differ in their tastiness, but also in the difficulty of preparing them. Some types fruit are clearly superior to others, for instance

Figure 27: https://xkcd.com/388/: Coconuts are so far down to the left they couldn’t be fit on the chart. Ever spent half an hour trying to open a coconut with a rock? – Randall Munroe seedless grapes are both more tasty and easier to prepare than oranges. However, if you compare pineapples with bananas then pineapples are more tasty but harder to prepare, so there is no clear winner. To model these situations where some things are bigger/better/above/dominating others things but some pairs of things may also be incomparable/on the same level/of equal rank, we introduce the concept of partial orders.

Definition 5.2. A or poset or partial order is a pair P = (X, ) where X is a (for us usually finite) set and is a binary re- lation that is:≤ ≤ (i) reflexive: x x, x X. (ii) transitive: (x ≤ y and y z) x z, ∀x,∈ y, z X. (iii) antisymmetric: (x ≤ y and y ≤ x) ⇒ x ≤= y, ∀x, y X∈ . ≤ ≤ ⇒ ∀ ∈ You probably already know some (partial) orders, and recognize a few ex- amples from Table3. Note that a (where for any x, y X we have x y or y x) is a special case of a partial order. We now introduce∈ some notation.≤ ≤ If x y and x = y then we write x < y. • ≤ 6 If x y we also write y x. • ≤ ≥

107 X Example Relations ≤ natural numbers order by value 23 42, 42  23 ≤ natural numbers order by divisibility 6  7, 7  6, 13 91 polynomials order by divisibility X 7 X2 10≤X + 21 words over A, . . . , Z order lexicographically ELEPHANT− ≤ − MOUSE { } ≤ real intervals order by inclusion [3, 5] [2, 7], [2, 3]  [4, 6] ≤ real intervals “completely left of” [3, 5]  [2, 7], [2, 3] [4, 6] ≤ real valued functions point-wise domination 1 + x  x2, sin(x) 2 subsets of [N] inclusion 1, 3 1, 2, 3, 5 ≤ { } ≤ { }

Table 3: Some posets P = (X, ) with informal definitions of their order rela- tions and examples. ≤

If neither x y nor y x then x, y are incomparable denoted by x y. • ≤ ≤ k If x < y and there is no z with x < z < y, then x < y is a cover relation • (or cover for short), denoted by x l y. We depict a poset by its . In it, every x X corresponds to a ∈ point in the plane and for every x, y X with xly the corresponding points are connected by a y-monotone line where∈ x is the lower (with smaller y-coordinate) endpoint. In particular “transitive edges” are omitted meaning x, y P are not directly connected if there is a z such that x < z < y. The poset from∈ Example 5.1 has the Hasse diagram shown in Figure 28.

Figure 28: Hasse Diagram for the poset originating from Figure 5.1. For instance lemons and bananas are connected by a line since bananas are both tastier and easier to prepare. Lemons and cherries are not connected by a direct line since they are already connected via bananas. Bananas and watermelons are not connected at all since they are incomparable.

108 Sometimes it may be easier to define the cover relations than to define the entire relation.

Example. Let X be the set of states of a Rubik’s . For x X define r(x) to be the minimum number of moves needed to solve the cube from∈ state x. We want x l y if and only if r(x) < r(y) and x and y can be transformed into one another by one move. This is the cover relation of a poset (without proof).

We now define some key concepts and corresponding notation. Definition 5.3. Let P = (X, ) poset. ≤ A chain is a sequence of elements in increasing order, i.e. y1 < y2 < . . . < • y for y X. It has length k. k i ∈ The height of P is the length of a longest chain in P . • A set Y X where elements are pairwise incomparable (y1 y2 for all • y = y ⊆Y ) is an antichain. k 1 6 2 ∈ The width of P is the size of a largest antichain in P . • The sets of minimal and maximal elements of P are given as • min(P ) := x X y X : x y or x y , { ∈ | ∀ ∈ ≤ k } max(P ) := x X y X : x y or x y . { ∈ | ∀ ∈ ≥ k } Looking at the poset P from Figure 28 again, an example for a chain would be Watermelons, Pears, Strawberries . There are several longest chains, one of which{ is Grapefruit, Oranges, Bananas,} Plums, Pears, Blueberries, Seed- less Grapes {. The height of P is therefore 7. An example for an antichain is Pineapple,} Cherries, Plums, Red Apples . No other antichain is longer so the{ width of P is 4. The maximal elements} are max(P ) = Seedless Grapes, Peaches and the minimal elements are min(P ) = Pineapples,{ Pomegranates, Grapefruit,} Lemons . We now study partitions of{ posets into chains and an- tichains. We start with} the easier case. Theorem 5.4 (Antichain Partitioning). The elements of every poset P = (X, ) can be partitioned into h(P ) antichains (and not less). ≤ Proof. Note first that we cannot partition P into fewer antichains: No antichain can contain two elements of a chain, since elements of chains are pairwise com- parable and elements of antichains are pairwise incomparable. Since P contains a chain Y of size h(P ), at least h(P ) antichains are needed. To see that h(P ) antichains suffice, first note that min(X) is an antichain: Two minimal elements are always unrelated, otherwise the “bigger” minimal element would not be minimal at all. Also, every maximal chain in P contains an element from min(X): If Y = y , . . . , y is a maximal chain with y < . . . < y { 1 k} 1 k and y1 is not minimal, then we would find y0 < y1 and therefore a longer chain. So we take the first antichain to be min(P ), then we still need to partition X min(P ) into h(P ) 1 antichains. Since the maximal chains in X min(P ) have\ size at most h(P )− 1 we can do this by induction. \ −

109 Figure 29: Partition of the poset into 7 antichains obtained by iteratively putting the minimal remaining elements into a new antichain.

To get back to our fruit example, Figure 29 shows how the poset can be partitioned into 7 antichains. Theorem 5.5 (Dilworth’s Theorem). Every poset P = (X, ) can be parti- tioned into w(P ) chains (and not less). ≤ Proof. Clearly, w(P ) chains are necessary: P contains an antichain of size w(P ) and no two of its elements can be contained in the same chain. To show w(P ) chains suffice, we do induction on X . Consider a maximum | | antichain A = x1, . . . , xw(P ) . The idea is{ to split P along} A into two parts, take for instance our fruit poset and A = Seeded Grapes, Cherries, Plums, Red Apples , then the two parts are shown{ in Figure 30. } Formally we define P = (X , ) and P = (X , ) with elements 1 1 ≤ 2 2 ≤ X := y X x A: x y ,X := y X x A: y x , 1 { ∈ | ∃ ∈ ≤ } 2 { ∈ | ∃ ∈ ≤ } and the same relations as before. Note that with this definition X1 X2 = A, since if y (X X ) then there is x A and x A with x y ∩x . Since ∈ 1 ∩ 2 1 ∈ 2 ∈ 1 ≤ ≤ 2 A is an antichain this implies x1 = x2 = y so y A. ∈1 1 Now if we can partition P1 into chains C1 ,...,C A and P2 into chains 2 2 1 2 | | C1 ,...,C A where Ci and Ci both contains xi then we can attach the chains | | to one another obtaining a partition of P into chains C1,...,C A . We can find | | these partitions by induction unless P1 or P2 fail to be smaller than P . Convince yourself that P1 = P A = min(P ) and P2 = P A = max(P ). So our only problem is the case where⇔ there is no largest antichain⇔ except for min(P ), max(P ) or both. In that case, let C be any maximal chain. In the same way as in the previous Theorem we argue that min(P ) C = and max(P ) C = . Then P C has ∩ 6 ∅ ∩ 6 ∅ \

110

Figure 30: We split the poset P along an antichain into P1 (the upper half) and P2 (the lower half). The elements of the antichain are contained in both posets after the split. width w(P ) 1 (since we removed an element from each largest antichain) so it can be partitioned− into w(P ) 1 chains by induction and we are done. − 5.1 Subposets, Extensions and Dimension

We now define when a poset is part of another poset, that is, when P = (X, P ) is a subposet of Q = (Y, ). This should be the case if X Y and x , x≤ ≤Q ⊆ ∀ 1 2 ∈ X :(x1 P x2 x1 Q x2). In that case we say P is the subposet of Q induced by X. But≤ since⇔ we≤ do not care about relabeling of elements, the notion of subposet is actually a bit more generous: P is a subposet of Q if there is an f : X Y such that → x1, x2 X :(x1 P x2 f(x1) Q f(x2)). The function f is an embedding ∀and the∈ image of f≤is a copy⇔ of P in≤ Q. We say Q = (X, ) is an extension of P = (X, ) (note that they use the ≤Q ≤P same ground set X) if x1, x2 X : x1 P x2 x1 Q x2. When interpreting relations as subsets of X∀ X we∈ could≤ write ⇒ ≤ . × ≤P ⊆ ≤Q We say L = (X, L) is a linear extension of P if it is an extension of P and a total order (meaning≤ no two elements are incomparable in ). ≤L Lemma 5.6. Let P = (X, ),P = (X, ) be two posets on the same ground 1 ≤1 2 ≤2 set X. Then we define P = P1 P2 := (X, ) with the relation := 1 2, meaning ∩ ≤ ≤ ≤ ∩ ≤ x y (x y and x y). ≤ ⇔ ≤1 ≤2 We claim P is a poset and P1 and P2 are extensions of P .

111 e z c d Q := P := x y P 0 := w a b

Figure 31: With the posets as shown (via their Hasse diagrams) P is a subposet of Q as witnessed by the map w a, x c, y d, z e. Actually, there are four different embeddings, since w7→could7→ also be7→ mapped7→ to b and x and y can also be mapped to c and d the other way round. These two maps correspond to two copies of P in Q, namely a, c, d, e and b, c, d, e . The poset P 0 is not { } { } a subposet of Q. However, Q is an extension of P 0.

Proof. It is easy to verify that is an order relation. It is also clear that all pairs ≤ that are related via are related via 1 and 2, so 1 and 2 are extensions of . ≤ ≤ ≤ ≤ ≤ ≤ Two-dimensional posets. Recall the poset P from the fruit example. It has two natural linear extensions: The first is the total order LTaste in which the fruit are arranged in order of increasing tastiness:

Lemons Grapefruit Tomatoes ... Peaches. ≤LTaste ≤LTaste ≤LTaste ≤LTaste

In other words, LTaste is the order obtained when projecting all elements to the tastiness-axis (note how this is an extension of P and a total order, assuming no two fruit have exactly the same tastiness) The second is the total order LEase, in which the fruits are arranged in order of increasing ease of preparing them:

Pineapples Pomegranates ... Seedless Grapes. ≤LEase ≤LEase ≤LEase Fruit are ordered if and only if they are ordered according to both linear orders so P = L1 L2. We want∩ to capture this property in a new notion and define: A poset is 2- dimensional if it has a two dimensional picture9, i.e. an assignment f : P R2 of positions in the plane to each element such that x < y in P if and only→ if f(y) is above and right of f(x).

Not every poset is 2-dimensional. Consider the spider poset with three legs, the Hasse Diagram is shown on the left of Figure 32. This spider has a head H three knees K1,K2,K3 considered bigger than the head and three feet F1,F2,F3 considered smaller than the corresponding knee. There are no further relations. We try to find a two dimensional embedding into the plane, i.e. assign positions in the plane to each element (see right side of Figure 32). The head H has to go somewhere which partitions the plane into four quad- rants (above and right of H, below and right of H, above and left of H, below and left of H). The knees must all go above and right of H since they are bigger than H. Since knees are incomparable, the ones with bigger x coordinate must

9Strictly speaking we would have to say: It has a two-dimensional picture but no one- dimensional picture, for details refer to the general definition later.

112 K1 K2 K1 K3 K2 K3 F2 F1 H F3 H

Figure 32: On the left: The spider with three legs. On the right: Sketch for the proof that the spider is not two-dimensional. have smaller y coordinate so the knees must lie on a decreasing curve as shown. Without loss of generality, K2 is the second point on this decreasing curve. Now observe that there is no suitable point to put F2: It cannot be in the quadrant below and left of H nor in the quadrant above and right of H since H F . k 2 Since it must also be below and left of K2, it must be in the shaded area. But points in that area are below and left of either K1 or K3 (or both), which is not allowed since F K and F K . This completes the proof. 2 k 1 2 k 3

Sums of Chains. We define an additional bit of notation. By i k for i, k N we mean the poset that is the union of a chain of length i and a chain⊕ of length∈ k and no comparabilities between the chains. For instance 1 1 is an antichain of size 2. The poset important in the following is 1 2 = ⊕, let’s label with c a ⊕ b . It consists of three elements a, b, c with b < c and a b, a c. Note that has three linear extensions, a < b < c, b < c < a, kb < ak < c, the last of which will have a special standing in the following: Given a linear extension L of some poset P and a copy of in P then we say L orders this copy of if the lonely element (corresponding to a above) does not come in between the other two elements (corresponding to b and c above). Take for instance the linear ordering L : a

Theorem 5.7. A poset P is 2-dimensional if and only if there is a linear extension L of P ordering all copies of 1 2 in P . ⊕ Proof. “ ” Assume P is 2-dimensional. Then P = L1 L2 for the two linear ⇒ ∩ 2 extension L1,L2 of P corresponding to an embedding into R . We claim that any of the linear extensions, say L1, orders every copy of .

113 Assume not, then we find elements a, b, c in P that form an unordered c a copy of , i.e. we have the situation b with b

c b L1

We know from L1 that, horizontally, a is between b and c, but vertically it must be below b (since a b) and it must be above c (since a c). Clearly this is not possible (it wouldk have to be in both shaded areask at once) which contradicts the assumption that L1 does not order a copy of . “ ” Let L be a linear extension of P ordering all copies of . From this ⇐ we construct an embedding into R2. First label the elements of P in accordance with L, i.e. x1

xj x

We must have put xi so high for a reason: There must be some xk (k < i) with xk

Higher dimensional posets. There is a natural way to generalize the notion from the previous few pages to arbitrary dimension. First note that for d N ∈ the set Rd becomes an (infinite) poset via the dominance order which simply d d means that for x = (x1, . . . , xd) R , y = (y1, . . . , yd) R we have ∈ ∈ x y : i [d]: x y . ≤dom ⇔ ∀ ∈ i ≤ i

114 d We say a poset P is d-dimensional if it is a subposet of (R , dom) and not a d 1 ≤ subposet of (R − , dom). ≤ Theorem 5.8. A poset P is d-dimensional if and only if d is smallest number of linear extensions of P whose intersection is P .

d d Proof. It suffices to show that P is a subposet of R if and only if P = i=1 Li for some linear extensions L1,...,Ld of P . T “ ” We can assume, without loss of generality, that no two points share a ⇒ coordinate (we always break ties without changing the relations in the poset). Then define Li as the order of the elements on the projection of P to the i-th coordinate. This is a set of linear extensions with intersection P .

“ ” Given linear extensions L1,...,Ld we can take these to assign coordinates. ⇐ d The coordinate of x P will be (r1, . . . , rd) R where ri is the rank of x in the i-th linear extension,∈ i.e. r = y ∈P y x . It is easy to i |{ ∈ | ≤Li }| see that this gives an embedding of P into Rd.

Next we observe that every poset has a well-defined dimension, in other words, every poset can be written as the intersection of d linear orders for some large enough d. We call such a set of linear extensions a realizer of P . We claim that taking all linear extensions = L L is linear extension of P L { | } works, i.e. P = L L. It is clear that “ ” holds. We leave the other direction as an exercise. ∈L ⊆ Our next resultT shows that the dimension does not only exist but is actually bounded by the width of the poset. We require some preparation though. Definition 5.9. Let P = (X, ) be a poset. ≤ For S X define the open downset D(S) = y X x S : y < x and • the closed⊆ downset D[S] = D(S) S. { ∈ | ∃ ∈ } ∪ For single elements sets S = s we will simply write D(s) and D[s] • instead of D( s ) and D[ s ]. { } { } { } In the same way define open and closed upsets: • U(S) = y X x S : x < y ,U[S] = U(S) S, { ∈ | ∃ ∈ } ∪ U(s) = U( s ),U[s] = U[ s ]. { } { } Consider the left of Figure 33 for an example. Theorem 5.10. For any poset P its dimension is at most its width, i.e.

dim(P ) w(P ). ≤ Proof. Our goal is to find a realizer on w = w(P ) . By Dilworth’s Theorem we can partition P into w(P ) chains C1,...,Cw. Claim. For any chain C there is a linear extension L of P such that for any x C and x y we have y < x. ∈ k L

115 X5

X4 X3 ∅ X2

X1

X0

Figure 33: On the left: The open upset of Cherries, Red Apples, Blueberries is shown in red and the closed downset of{ Pineapples Bananas is shown in} dashed blue. { } On the right: Sketch for the claim from Theorem 5.10. Consider the chain C = Oranges, Watermelons, Cherries, Pears, Strawberries . Then L is a linear extension{ that puts the elements from C as late as possible,} for instance like this (elements from C highlighted): L : Grapefruit, Lemons, Tomatoes, Pineapples, Pomegranates, Oranges, Bananas, Red Apples, Watermelons, Plums, Green Apples, Seeded Grapes, Cherries, Pears, Peaches, Blueberries, Strawberries, Seedless Grapes

Note first that once we have proved this claim, we have proved the theorem since the linear extensions L1,...,Lw we get for the chains C1,...,Cw are a w realizer of P , meaning P = i=0 Li. It is clear that the right side is an extension of the left side. Now consider if x y. Then x Ci for some i [w], and y Cj T k ∈ ∈ ∈ for some j = i. Then we have y

  • Proof of claim. Denote the elements of C by x1 < x2 < . . . < xk and consider their upsets. Note that clearly U(x ) U(x ) ... U(x ). 1 ⊂ 2 ⊂ ⊂ k Now define X0 := X U[x1], Xj := U(xj) U[xj+1] (for 1 j < k) and X := U(x ) and let P :=\ (X , ) be the poset\ induced by X (0≤ j k). k k j j ≤ j ≤ ≤ Given a linear extension Lj for each Pj (0 j k) we now define the linear extension L of P as: ≤ ≤

    L : L0 x1 L1 x2 L2 . . . xk Lk. First, convince yourself that L really is a linear extension of P , i.e. every element occurs exactly once and if x

    116 Now assume x C and x y. Say x = xl. Then y / U(xl) so y is not part of any X for j l∈. This meansk y < x = x as claimed.∈ j ≥ L l

    Example 5.11 (Standard Examples). For a positive integer n define the poset Sn = ( a1, . . . , an, b1, . . . , bn , ) where ai bj i = j and no (non-reflexive) relations{ within a , . . . , a }and≤ b , . . . , b≤ respectively.⇔ 6 For instance: { 1 n} { 1 n}

    b1 b1 b2 b1 b2 b3 b1 b2 b3 b4 S1 = ,S2 = S3 = S4 = a1 a1 a2 a1 a2 a3 a1 a2 a3 a4

    We call these posets the standard examples. We claim that dim(Sn) = n = w(Sn) (for n 2), so the standard examples make the bound from the last Theorem tight.≥ To see that dim(Sn) n, consider a realizer of Sn (consisting of linear extensions). Since a b ≥for i [n], some linear extension L in the realizer i k i ∈ must have bi

    5.2 Capturing Posets between two Lines

    l2 T S E

    l1 P O

    Figure 34: On the left we drew five shapes between two horizontal lines. From this we obtain a poset (shown on the right) by considering a shape less than another shape if it is left of it.

    2 Imagine l1 and l2 are two horizontal lines in the plane. An object O R ⊆ is spanned between l1, l2 if it is contained in the strip in between l1 and l2, is connected and touches both l1 and l2. With a set of such objects we associate a poset where the strict relation is given as “left of”, meaning that O O if and only if O O = and the 1 ≤ 2 1 ∩ 2 ∅ leftmost point of O1 is left of the leftmost point of O2. This clearly gives an order relation (once we add the reflexive relations). Note that since the objects touch the top and bottom line we avoid situations like this

    l2

    O1 O2 O3 l1 where one might be inclined to say that O1 is left of O3 is left of O2 but O1 is not left of O2. Since this is forbidden our “left of” is really consistent with all reasonable intuitions one might have about leftness.

    117 In the following we ask: What kind of posets can be represented in such a way and what shapes are required? Observation 5.12 (Straight Lines). If P is 2-dimensional then P can be rep- resented by straight segments spanned between the two lines. To see this, assume P = L L for two linear orders L ,L . Then place 1 ∩ 2 1 2 the elements of P on l1 in the order given by L1, also place them onto l2 in the order given by L2 and connect the points corresponding to the same element. This give a line segment sx for each x X. Now observe: ∈

    x < y (x < y and x < y) s left of s . P ⇔ L1 L2 ⇔ x y The reverse holds as well, i.e. any poset represented by straight segments spanned between l1 and l2 is at most two-dimensional and a realizer is given by sorting the segments according to their endpoints on l1 and l2. The second type of object we consider are axis aligned rectangles. Note that since those rectangles must be tangential to l1 and l2, they are already uniquely determined by their leftmost and rightmost x-coordinate. In fact, the setting is not “really” two-dimensional and is easily seen to be equivalent to the setting of interval orders where: An interval order P = (X, ) is given by a set X of open bounded intervals ≤ in R with (a, b)

    If P could be written as an interval order, then a, b, c, d would be rep- resented by intervals (al, ar), (bl, br), (cl, cr), (dl, dr) with the following properties: a < b (since a < b) • r R l P b < c (since c b) • l R r kP c < d (since c < d) • r R l P d < a (since a d) • l R r kP

    This requires ar

    118 Given the observation, we can order the downsets = D(x) x X by strict inclusion, i.e. = D D D ... DD . { | ∈ } ∅ 0 ⊂ 1 ⊂ 2 ⊂ ⊂ µ For x X we choose the interval (a , b ) such that ∈ x x

    min β x Dβ if x / max(P ) D(x) = Da , bx = { | ∈ } ∈ x µ + 1 if x max(P ). ( ∈

    Observe that ax < bx. Also, if x

    h h g f g e f d c e b d a

    a b c 0 1 2 3 4 5

    = D = ,D = ab,D = abc,D = abcde,D = abcdeg D { 0 ∅ 1 2 3 4 } D(a)=D(b)=D(c) D(d) D(e) D(f)=D(g) D(h) | {z } | {z } | {z } | {z } | {z } Figure 35: Example for the construction in Fishburn’s Theorem. The poset on the left contains no 2 2. We determine all downsets D0 ... D4. We pick the interval orders as⊕ the Theorem suggests, for instance,⊆ the interval⊆ for d is (1, 3) since D1 is the downset of d and D3 is the first downset containing d.

    We now consider the case where the objects are (open) triangles spanned between l1 and l2 with the base on l1 and tip on l2 (see pictures below). Theorem 5.14. P is a triangle order if and only if there is a linear extension of P that orders all copies of 2 2. ⊕ Here we say a 2 2 is ordered by a linear order L, if L puts the element of one chain both before⊕ the elements of the other chain. In other words, if 2 2 is labeled like this: ⊕ b d a c then we want either a

    119 Then a, b, c, d correspond to triangles Ta,Tb,Tc,Td and Ta is left of Tb.

    Ta Tb

    Assume for contradiction that the tip of Tc is in between the tips of Ta and T . Since c b we need T to intersect T , maybe like this: b k c b

    Ta Tc Tb

    Now there is no way to add Td: It must be right of Tc but not right of Ta which is clearly impossible.

    “ ” Assume L : x1 < x2 < . . . < xn is a linear extension of P that orders ⇐ all copies of 2 2. We argue that P is a triangle order. We construct a triangle T for⊕ every x P , given by the horizontal position of the three x ∈ points, namely the tip tx and base (ax, bx).

    The tips should simply be ordered according to L, so txi = i for each i [n]. We pick the bases of the triangles one after the other, going through∈ the elements of P from left to right according to L.

    For x1 we define its base arbitrarily. Having defined triangles for x1, . . . , xk, k 1, we now show how to add Tx for x = xk+1 with the desired rela- tionships≥ to the already existing triangles. Consider the downset D(x). We choose a = max b y D(x) , so as x { y | ∈ } far left as possible while still right of every triangle that Tx should be right of. The right end point bx is chosen somewhere far right.

    y1 u y2 y3 s x

    a↑x far right↑

    In the picture we assume D(x) = y , y , y , so a = max b , b , b = { 1 2 3} x { y1 y2 y3 } by3 . This ensures that x >P y implies that Tx is right of Ty (for any y). Also (by choice of the tips) Tx is not left of an existing triangle. However, the pictures shows that there might be u / D(x) such that T is right of ∈ x Tu. We need to fix this.

    We do this by shifting ax to the left. But remember that ax needs to be right of by for each y D(x), so as soon as ax would overtake such a by, we just take it with us,∈ shifting it to the left as well. Now we need to argue that this does not cause Ty to be suddenly left of some Tu for y  u. So assume ax and by are about to overtake the left endpoint au of Tu, maybe like this: t y u x

    ↑ shifting by and ax left, almost overtaking au

    120 All Orders Convex Orders Triangle Orders

    dim(P ) 3 dim(P ) 2 Interval Orders ≤ ≤

    Figure 36: Venn diagram showing the relationships between some types of orders we consider here and in the following.

    Now remember: The reason we are shifting is that there is some t  x with Tt still left of Tx. Since we almost overtook au already, this triangle Tt is also left of Tu so t < u (otherwise we would have made a mistake earlier). Combining what we know (y < x, t < u, y  u, t  x) we see that y, x, t, u forms a copy of 2 2 like this: ⊕ x u y t

    Since x is the most recent element, it comes last in L, and since L orders every copy of 2 2 we have t

    t u y x

    Here, shifting ax and by past au is no problem as Tu and Ty will still intersect. So we can shift as far as we need which proves the claim. As an overview over the results of this section consider the Venn diagram in Figure 36. The class of convex and triangle orders are already defined geo- metrically as orders arising from convex sets or triangles spanned between lines, respectively. But remember that interval orders and orders of dimension at most two also arise from shapes in that way: From axis aligned boxes and segments. In Theorem 5.15 we will furthermore see that the set of all orders arises from y-monotone curves. The containments are easy to see: Any straight line is a (degenerated) tri- angle and every triangle is convex. To see the inclusion of interval orders in triangle orders, observe that if we only allow “boring” triangles that have the tip above their base, then triangles intersect iff their bases intersect, so these triangles behave like intervals. Alternatively, this relation immediately follows from Theorem 5.13 and Theorem 5.14. In Lemma 5.16 we show that there is indeed a non-convex three dimensional order. We remark without proof that the “small” circle of interval orders al- ready contains posets of arbitrary dimension: The interval order defined by all

    121 1 closed intervals with end points in [n] has dimension at least log log n + ( 2 + o(1)) log log log n. Theorem 5.15. Every poset P can be represented by y-monotone curves spanned between l1 and l2.

    Proof. Take any realizer of P , i.e. P = L1 L2 ... Lk for linear orders L ,...,L . Assume without loss of generality ∩k 2∩ and introduce∩ k 2 lines in 1 k ≥ − between l1 and l2. This gives k lines in total. On the i-th line we distribute the element of P in increasing order according to Li. We then connect all points belonging to the same element with a straight segments.

    ordered by L5: l2 ordered by L4: ordered by L3: ordered by L2: ordered by L1: l x w y z 1

    Now it is obvious that x

  • d1 d2 d3

    c1 c2 c3

    b1 b2 b3

    a1 a2 a3

    Assume it can be represented by convex shapes Ca1 ,Ca2 ,...,Cd3 that are spanned between l1 and l2. Note that this implies that each such shape Cx contains a straight line Sx that is spanned between l1 and l2.

    Since ai di for i = 1, 2, 3 the shapes Cai and Cdi intersect. So take a point x Ck C . Assume without loss of generality that x has the middle i ∈ ai ∩ di 1 y-coordinate. With respect to the straight line from x2 to x3, we have that x1 is either left of it (red area) or right of it (blue) area:

    l2 x2

    x3 l1

    Assume x1 is on the left. Then since c1 > a2 and c1 > a3, the shape Cc1 must be right of x and x (since x C , x C ), but since c < d it must also 2 3 2 ∈ a2 3 ∈ a3 1 1 be left of x1 (since x1 Cd1 ). The same holds for the straight line Sc1 . Clearly such a straight line does∈ not exist.

    If x1 is on the right instead, we run into the same problem with the line Sb1 : It has to be right of x1 but left of x2 and x3.

    122 5.3 Sets of Sets and Multisets – Lattices We now consider posets whose elements are sets ordered by inclusion meaning A B iff A B (we will prefer the latter notation). Usually the sets we consider ≤ ⊆ are subsets of [n] and the most important example is the Boolean lattice Bn, the family of all subsets of [n]. Take for instance B3:

    1, 2, 3 { }

    1, 2 2, 3 1, 3 { } { } { }

    2 1 3 { } { } { }

    ∅ Two sets are incomparable if they are pairwise not included in one another. For instance 1 2, 3 and 1, 2 1, 3 . Note also that 1, 2 , 2, 3 , 1, 3 is { } k { } { } k { } {{ } { } { }} a largest antichain in B3. The next Theorem generalizes this observation.

    n Theorem 5.17 (Sperner’s Theorem). w(Bn) = n . d 2 e Proof. Note that two different sets of the same size  are never included in one another. So taking all k-subsets of [n] (i.e. all subsets of size k) yields an n n antichain of size k . Choosing k = 2 maximizes its size and proves the lower bound. d e  To prove the upper bound, we introduce a new notion: For a permutation π of [n] we say that π meets a set A [n] if the elements of A form a prefix of π, meaning: ⊆ A = π(1), π(2), . . . , π( A ) . { | | } For instance, the permutation π = 24513 meets 2, 4 and 1, 2, 4, 5 but does not meet 4, 5 . { } { } Now consider{ } a antichain . We count the number of permutations that meet an element of . Note thatF the sets met by a single permutation π form a chain, so no π canF meet several elements of . So clearly the sets π π meets A are disjointF for different A and we have: { | } ∈ F π π meets A n! |{ | }| ≤ A X∈F For any given A [n] there are A !(n A )! permutations that meet A: We know that the first⊆ A elements of| such| − a π | are| given by A and can be arranged in A ! ways, and the| | remaining n A elements can be arranged in (n A )! | | − | | − | |

    123 ways. So we have:

    A !(n A )! n! | | − | | ≤ A X∈F 1 1 ⇔ n ≤ A A X∈F | | 1  n 1 ⇒ n ≤ A X∈F d 2 e  n . ⇔ |F| ≤ n d 2 e n In the first step we divided both sides by n!, then we used that k = 2 maxi- n d e mizes k and thus made the term under the sum independent of A.

    Remark. Note that if actually is a largest antichain in Bn, then all inequalities F n used in the above proof must be tight. In particular, each A must actually n n n | | n have been equal to n = n . So only sets of size 2 or 2 are used in 2 2 b c d e . From this it is easyb c to seed that:e F   [n] If n is even, then = n is the unique largest antichain. • F 2  [n] [n] If n is odd, then all largest antichains are contained in n n . • b 2 c ∪ d 2 e Next we consider so-called intersecting families. We say 2n isintersect- ing if A, B : A B = . F ⊆ ∀ ∈ F ∩ 6 ∅ [n] n 1 Observation 5.18. If 2 is intersecting, then 2 − and this bound can be attained. F ⊆ |F| ≤

    Proof. For any element A , the complement [n] A is disjoint from A and cannot be in . So taking complements∈ F is an injection\ that maps elements from F n n 1 to elements not in . Therefore 2 and thus 2 − . F F |F| ≤ − F |F| ≤ To attain this bound, consider x = A [n] x A for some fixed F { ⊆ |n 1 ∈ } x [n]. Clearly this is an intersecting family of size 2 − . ∈ So this problem has a fairly straightforward and boring solution. But what about the maximum cardinality of an intersecting k-family, i.e. we only allow sets of size k? If k = 1, then we can clearly only pick one set (any two different sets of size 1 do not intersect). If k = 2, then it is easy to see that n 1 is best possible using the sets 1, 2 , 1, 3 , 1, 4 ,..., 1, n . Is the best strategy− still the “obvious” one in general?{ } { It turns} { out} the answer{ } is “yes”, but the proof is a bit more involved.

    n Theorem 5.19 (Erd˝os-Ko-Rado). For two integers n and 0 < k 2 the n 1 ≤ maximum size of an intersecting k-family of [n] is k−1 . − Note that we have the requirement of 2k n since the problem becomes trivial otherwise: Two sets of size bigger than ≤n/2 always intersect.

    124 k Proof. The lower bound is easy, just take x = A [n] x A, A = k , for k F { ⊆ | ∈ | | } some fixed x [n]. Clearly, x is an intersecting family of the desired size. For the upper∈ bound weF use a similar approach as in Sperner’s Theorem. Recall that a circular permutation σ of [n] is an arrangement of the numbers 1, . . . , n on a circle where we do not distinguish between circular shifts, e.g. 53142 31425. The number of circular permutations of [n] is (n 1)!. We≡ say a circular permutation σ meets a set A [n] if the elements− of A appear consecutively on σ. For instance A = 1, 2, 4⊆is met by σ = 25341 but B = 2, 3, 4 is not. { } Now{ fix } to be an intersecting k-family. F Claim. For any circular permutation σ, the number of sets A met by σ is at most k. ∈ F

    Proof by Picture. Fix one k-set A0 met by σ, we draw it in red (here k = 5, n = 12):

    Consider a different k-set B that is also met by σ (here drawn in blue), it must intersect A . There∈ are F 2(k 1) choices for such a B,(k 1) for 0 − − intersecting A0 at the “clockwise” side and (k 1) for intersecting A0 at the “counterclockwise” side. But note that if B − intersects some part of A ∈ F 0 until some gap between two consecutive elements on σ, then the set B0 that intersects the opposite part of A starting from that gap cannot be in (shown 0 F dashed in the picture). This is because B0 B = . Here we use that 2k n ∩ ∅ ≤ so B and B0 do not intersect on the opposite side either. This shows that there are at most 2(k 1)/2 = k 1 sets met by σ other than A0, so at most k in total.− − (end claim) Just like in the proof of Sperner’s Theorem, the number of circular permu- tations meeting A is A !(n A )! = k!(n k)!. We now double count the set ∈ F | | − | | − := (A, σ) A and σ is circular permutation meeting A . S { | ∈ F } First Way:

    = A σ meets A k = (n 1)! k. |S| |{ ∈ F | }| ≤ − · σ σ X X Second Way:

    = σ σ meets A = k!(n k)! = k!(n k)! |S| { | } − |F| · − A A X∈F X∈F Putting this together we obtain: (n 1)!k (n 1)! n 1 F − = − = − . | | ≤ k!(n k)! (k 1)!(n k)! k 1 − − −  − 

    125 5.3.1 Symmetric Chain Partition We have already seen in Sperner’s Theorem that the width of the Boolean lattice n Bn is n . Using Dilworth’s Theorem, this means Bn can be partitioned into 2 n d e n chains. We are going to prove a version of this that is stronger in two ways. 2  Firstly,d e we are going to look at a generalization of Boolean lattices. Secondly, we  find a particularly interesting partition into chains. The next definition captures what we mean by “interesting”. Definition 5.20. Let P = (X, ) be a poset. Then the rank of x X is the size of a largest • chain beneath≤x, i.e. ∈

    rank(x) = max{ |C| : C is a chain with max(C) = x },   x ∈ X.

Consider for instance the following poset (figure not reproduced here) in which we annotated each element with its rank; its seven elements have ranks 1, 1, 2, 2, 3, 3 and 4.

• Define rank(P) to be the maximum rank of an element of P (this is just a new name for the height of the poset). The above poset has rank 4.

• A poset P is ranked if all maximal chains ending in an element x have the same size. The poset shown above is not ranked since there are maximal chains of size 2, 3 and 4 all ending in the same element. Another way to think about it: a poset is ranked iff for any cover relation x ⋖ y the elements x and y have adjacent ranks.

• A chain C is unrefinable if there is no z ∈ X − C with min(C) < z < max(C) such that C ∪ {z} is a chain. Note that if P is ranked, then C is unrefinable if and only if it does not skip any rank.

• A chain C is symmetric if it is unrefinable and

    rank(min(C)) + rank(max(C)) = rank(P) + 1.

In Boolean lattices this is a very intuitive concept.

• A symmetric chain decomposition is a partition of the elements of P into symmetric chains. Figure 37 shows decompositions of some Boolean lattices.

We define B(m_1, m_2, ..., m_k) to be the poset of all submultisets of M = { m_1·1, m_2·2, ..., m_k·k } ordered by inclusion. Here, m_i is the repetition number of the element i. The elements 1, 2, ..., k do not really need to be numbers; we might as well have chosen any other distinguishable symbols instead of 1 and 2. The elements just need to be distinguishable, and numbers happen to be convenient. Recall that if A, B are multisets with types 1, ..., k and repetition numbers a_1, ..., a_k for A and b_1, ..., b_k for B, then by A ⊆ B we mean a_i ≤ b_i for all i.

Note that B(1, ..., 1), with k ones, equals B_k, since in that case M = {1, ..., k}. Figure 38 depicts B(1, 2, 1), a three-dimensional poset that is similar to B_3 except that it has an additional "plane", since we can have the element 2 not only zero or one times, but also twice. There are many other ways to explain what these posets are; maybe you prefer one of the following perspectives:

• The poset B(m_1, ..., m_k) is isomorphic to the poset of all divisors of p_1^{m_1} · p_2^{m_2} · ... · p_k^{m_k} ordered by divisibility, where p_1, ..., p_k are distinct primes. This is illustrated for B(1, 2, 1) and p_1 = 2, p_2 = 3, p_3 = 5 on the right of Figure 38.

• B(m_1, ..., m_k) is the product of chains:

    B(m_1, ..., m_k) = C_1 × ... × C_k

where each C_i is a chain on m_i + 1 elements. By the product P × Q of two posets P and Q we mean the poset with elements { (x, y) | x ∈ P, y ∈ Q } and dominance order, so

    (x, y) ≤_{P×Q} (x′, y′)  :⇔  x ≤_P x′ and y ≤_Q y′.

• Using particularly easy chains we can view B(m_1, ..., m_k) as the set [m_1 + 1] × [m_2 + 1] × ... × [m_k + 1] with componentwise (dominance) order.

Note that B(m_1, ..., m_k) is ranked. A multiset A = { r_1·1, ..., r_k·k } has rank rank(A) = r_1 + ... + r_k + 1 and the rank of the entire poset is rank(B(m_1, ..., m_k)) = m_1 + ... + m_k + 1.
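For experimenting with these posets the coordinate view is convenient. The following Python sketch is only an illustration (not part of the notes); it represents the elements of B(m_1, ..., m_k) as tuples of repetition numbers and checks the rank formula for B(1, 2, 1):

    from itertools import product

    def elements(ms):
        """All submultisets of {m_1*1, ..., m_k*k} as tuples of repetition numbers."""
        return list(product(*(range(m + 1) for m in ms)))

    def leq(a, b):
        """Multiset inclusion = componentwise (dominance) order."""
        return all(x <= y for x, y in zip(a, b))

    def rank(a):
        """Size of a largest chain ending in a (drop one element at a time)."""
        return sum(a) + 1

    ms = (1, 2, 1)                      # the poset B(1, 2, 1) from Figure 38
    elems = elements(ms)
    print(len(elems))                   # 2 * 3 * 2 = 12 elements
    print(max(rank(a) for a in elems))  # rank of the poset: m_1 + m_2 + m_3 + 1 = 5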

    Theorem 5.21. B(m1, . . . , mk) has a symmetric chain decomposition.

Proof. We do induction on k. If k = 1, then B(m_1) is a (symmetric) chain of length m_1 + 1, so the trivial partition works. For k ≥ 2, consider the subposet P = B(m_1, ..., m_{k−1}, 0) of B(m_1, ..., m_k), consisting of those multisets with repetition number 0 for type k. Clearly P is isomorphic to B(m_1, ..., m_{k−1}) and has, by induction hypothesis, a symmetric chain decomposition.


    Figure 37: Symmetric chain decompositions of B1, B2, B3 and B4.

Figure 38: Left: the Hasse diagram of the poset B(1, 2, 1). Right: the divisors of 90 = 2^1 · 3^2 · 5^1 ordered by divisibility.

Now the idea is really simple, see Figure 39. Given a symmetric chain decomposition of P, we first partition B(m_1, ..., m_k) into "curtains" that run along the chains in P and then we partition the curtains (which are essentially two-dimensional grids) into symmetric chains.

Now for the formal argument. If P = C_1 ∪ ... ∪ C_R is a partition into symmetric chains, then B(m_1, ..., m_k) = Cur_1 ∪ ... ∪ Cur_R is a partition into R sets, where

    Cur_i := { A ∪ { j·k } | A ∈ C_i, 0 ≤ j ≤ m_k }.

Now focus on one of the chains C = C_i with elements A_1 ⊂ A_2 ⊂ ... ⊂ A_l. The corresponding curtain forms a grid with l columns and m_k + 1 rows, standing on its tip as shown in Figure 40. Note that it is not necessarily square. The grid can easily be partitioned into unrefinable chains that look like hooks as shown. If m_k + 1 = l the last hook will be a chain of size 1; otherwise, the last hook will be straight in the picture, either pointing to the top left if l > m_k + 1 or to the top right if l < m_k + 1. We only need to check that all these "hook-chains" are symmetric. Consider the first hook

    A_1 ⊂ A_2 ⊂ ... ⊂ A_l ⊂ A_l ∪ {k} ⊂ ... ⊂ A_l ∪ {m_k·k}.

Its minimum has rank |A_1| + 1 and its maximum has rank |A_l| + m_k + 1. Remember that C was symmetric in P, so we had |A_1| + 1 + |A_l| + 1 = rank(P) + 1 and get:

    |A_1| + 1 + |A_l| + m_k + 1 = rank(P) + 1 + m_k = rank(B(m_1, ..., m_k)) + 1.

This proves that the first hook is a symmetric chain. Subsequent hooks have their minima at higher ranks, but the ranks of the maxima are correspondingly lower, so it is easy to see that they are symmetric as well.

    Figure 37 shows the results of this construction for the Boolean lattices B1, B2, B3 and B4. You may want to verify your understanding by constructing these symmetric chain partitions yourself. Note that in the case of Boolean lattices all curtains have height 2 and will therefore be partitioned into only one or two hooks.

Figure 39: On the left: Partition of B(2, 3) into three symmetric chains. On the right: The corresponding partition of B(2, 3, 5) into three "curtains".

Since B_n grows exponentially in n, it may be impractical to compute the symmetric chain partition of B_n explicitly for large n. However, there is a simple implicit characterization that allows us to compute, for any set A ⊆ [n], what the other sets in the symmetric chain containing A are. We state it here without proof.

The first step is to associate with A ⊆ [n] the characteristic zero-one string c_A that encodes for each i ∈ {1, ..., n}, in increasing order, whether or not i is in A. For A = {1, 2, 7, 8, 10, 11} ⊆ [12] this would be c_A = 110000110110.

Now, roughly speaking, think of the 0s as opening parentheses and the 1s as closing parentheses; then c_A is a (not necessarily well-formed) parenthesis expression in which some matching pairs can be found. In our example this would be:

    c_A:       1  1  0  0  0  0  1  1  0  1  1  0
    position:  1  2  3  4  5  6  7  8  9 10 11 12

The matched positions are those elements that do not vary within the chain. The matched 1s form the minimum m of the chain, here m = {7, 8, 10, 11}; the matched 0s are the elements missing from the maximum of the chain, so here M = [12] \ {4, 5, 6, 9} = {1, 2, 3, 7, 8, 10, 11, 12}. The unmatched positions, here {1, 2, 3, 12}, vary within the chain and are added from left to right. So in the symmetric chain partition of B_n the set A will be part of the symmetric chain displayed after Figure 40 below.
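This matching rule is easy to turn into a small program. The following Python sketch is an illustration of the rule just described (it is not part of the notes); it computes the symmetric chain containing a given set A ⊆ [n]:

    def symmetric_chain(A, n):
        """Return the symmetric chain of B_n containing A, via the parenthesis rule:
        0s act as opening and 1s as closing parentheses in the characteristic string."""
        c = [1 if i in A else 0 for i in range(1, n + 1)]
        stack, matched = [], set()
        for pos, bit in enumerate(c, start=1):
            if bit == 0:                    # opening parenthesis
                stack.append(pos)
            elif stack:                     # closing parenthesis with a partner
                matched.add(stack.pop())
                matched.add(pos)
        unmatched = [p for p in range(1, n + 1) if p not in matched]
        minimum = {p for p in matched if p in A}       # the matched 1s
        chain = [frozenset(minimum)]
        for p in unmatched:                            # add unmatched positions left to right
            chain.append(chain[-1] | {p})
        return chain

    for S in symmetric_chain({1, 2, 7, 8, 10, 11}, 12):
        print(sorted(S))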

Figure 40: The elements of a curtain can be partitioned into symmetric chains as shown.

    {7, 8, 10, 11} ⊂ {1, 7, 8, 10, 11} ⊂ {1, 2, 7, 8, 10, 11} ⊂ {1, 2, 3, 7, 8, 10, 11} ⊂ {1, 2, 3, 7, 8, 10, 11, 12}.

5.4 General Lattices

A poset is a lattice if any two elements x, y have a least upper bound, i.e.

    ∀ x, y ∈ P  ∃ z : ( z ≥ x and z ≥ y and ( ∀ z′ : (z′ ≥ x and z′ ≥ y) ⇒ z ≤ z′ ) ).

We write z = x ∨ y and call z the join of x and y (it is necessarily unique). We also require that any two elements have a greatest lower bound, i.e.

    ∀ x, y ∈ P  ∃ z : ( z ≤ x and z ≤ y and ( ∀ z′ : (z′ ≤ x and z′ ≤ y) ⇒ z ≥ z′ ) ).

We write z = x ∧ y and call z the meet of x and y (it is necessarily unique). Note that this implies that finite lattices have a unique minimum, which we call 0, and a unique maximum, which we call 1.

The following poset P is not a lattice although it has a maximum and a minimum. Note that a and b have no join: the upper bounds of a and b are {c, d, 1}, but there is no least upper bound (since c and d are incomparable). For similar reasons, c and d have no meet. If we modify P by adding an element x as shown we obtain a lattice L. In it we have, for instance, a ∨ b = x and c ∧ d = x.

(Figure: the Hasse diagram of P with elements 0, a, b, c, d, 1, and of L, which additionally contains the element x between {a, b} and {c, d}.)

Note that for two elements x, y of a lattice with x ≤ y we always have x ∧ y = x and x ∨ y = y. This also implies that 1 is the neutral element of the ∧-operation and 0 is the neutral element of the ∨-operation. The Boolean lattices B_n really are lattices with ∧ = ∩ and ∨ = ∪, since A ∪ B really is the least set containing A and B, and A ∩ B is the largest set contained in both A and B.

We now consider the Young lattice Y(m, n). Its elements are Ferrers diagrams with at most m rows and at most n columns, ordered by inclusion. Figure 41 shows Y(2, 3).

Figure 41: The Young lattice Y(2, 3).

    Young lattices are ranked. Diagrams with s squares have rank s + 1.

6 Designs

    Assume you are the leader of a local brewery and have just invented seven kinds of new beers. You are pretty sure all of them are awesome but are curious whether other people share your opinion. There are some experts, each of which can judge 3 beers (beyond that point they become tipsy and you do not trust their judgment). You want to make sure that each beer is evaluated in contrast to each other beer, i.e. for each pair of beers there should be one expert that tries both beers. Actually, make sure that there is exactly one expert for each pair, since experts are expensive and you don’t have any money to waste. So can you assign beers to the experts and meet this requirement? It turns out your problem has a solution with seven experts as shown in Figure 42.


    Figure 42: The beers b1, . . . , b7 are represented by points. Each set of beers that one expert tries is represented by a straight line or circle. Note that any pair of points uniquely determines a line or circle containing that pair.

    In terms of the following definition we have just found a 2-(7, 3, 1)-design: v = 7 beers, k = 3 beers per expert, and each pair (t = 2) of beers tasted by λ = 1 of the experts. This particular design is known as the .

Definition 6.1. For numbers t, v, k, λ ∈ N, a t-(v, k, λ) design with point set V is a multiset B of sets ("blocks") of points with

(i) |V| = v,

(ii) |B| = k for each B ∈ B,

(iii) each set of t points is a subset of exactly λ blocks.

A t-(v, k, λ) design is also denoted by S_λ(t, k, v) and is sometimes called a balanced incomplete block design.

Example 6.2.

• Designs with k = v, i.e. where each block contains all points, are called trivial. This gives a t-(v, v, |B|)-design for any t ∈ [v]. From now on we assume k < v.

• Taking all point sets of size k, i.e. B = \binom{[v]}{k}, yields a t-(v, k, \binom{v-t}{k-t})-design for any t ∈ [k] (verify this!).

• Consider the point set V = F_2^4 \ {~0}, so all bit-strings of length four with addition modulo 2, except for ~0 = (0, 0, 0, 0). We have for instance (1, 1, 0, 0) + (1, 0, 1, 0) = (0, 1, 1, 0). Now consider the blocks B = { {x, y, z} | x, y, z ∈ V, x + y + z = ~0 }.

Note that any pair of points x ≠ y uniquely determines a third point z such that their sum is zero, namely z = x + y (note that x = −x in F_2^4, so x ≠ y implies z = x + y ≠ ~0). So B is a 2-(v = 15, k = 3, λ = 1)-design.

6.1 (Non-)Existence of Designs

As a first step we want to understand under which conditions on λ, t, k and v designs with those parameters may exist. In other words, we are looking for necessary conditions on the parameters λ, t, k and v.

Theorem 6.3. If B is a t-(v, k, λ)-design, then

(i) the number of blocks in B is |B| = λ · \binom{v}{t} / \binom{k}{t},

(ii) if I ⊆ V is a set of size i ≤ t, then the number of blocks containing I is r_i = λ · \binom{v-i}{t-i} / \binom{k-i}{t-i}.

Note that for i = t we get r_t = λ, as it should be. Also note that the fact that every point is contained in r_1 blocks means that B is an r_1-regular hypergraph (you can safely ignore this remark if you don't know what hypergraphs are).

    Proof. (i) We double count the set

    S := { (T, B) | B ∈ B, T ⊆ B, |T| = t }.

Firstly, each set T ⊆ V of size t (and there are \binom{v}{t} of those) is contained in exactly λ blocks, since that is what a design requires. So |S| = \binom{v}{t} · λ. Secondly, each block contains \binom{k}{t} subsets of size t, so |S| = |B| · \binom{k}{t}. Together we get |B| · \binom{k}{t} = \binom{v}{t} · λ, from which the claim follows.

(ii) Fix an i-set I and double count the set

    S := { (T, B) | B ∈ B, I ⊆ T ⊆ B, |T| = t }.

Firstly, there are \binom{v-i}{t-i} sets T of size t containing I, and for each such T there are λ blocks B containing T. This gives |S| = \binom{v-i}{t-i} · λ. Secondly, for each of the r_I blocks B with I ⊆ B there are \binom{k-i}{t-i} sets T of size t with I ⊆ T ⊆ B. This gives |S| = r_I · \binom{k-i}{t-i}.

Together we obtain \binom{v-i}{t-i} · λ = r_I · \binom{k-i}{t-i}. From this we see that r_I actually only depends on |I| = i, and the claim follows.

Remark. We consider t = 2 to be the default case. Sometimes a 2-(v, k, λ) design is just called a (v, k, λ)-design. In the other notation the parameter λ = 1 can be omitted, so an S_1(t, k, v)-design is simply an S(t, k, v)-design. In that case we also call it a Steiner system (hence the "S").

Corollary 6.4. If B is a (v, k, λ)-design (so t = 2), the results of the last Theorem become

(i) |B| = λ · v(v − 1)/(k(k − 1)) = r · v/k, where r is:

(ii) r = r_1 = λ · (v − 1)/(k − 1).

Remark 6.5. For any t-(v, k, λ)-design all the numbers we derived above, such as |B| and r_i (1 ≤ i ≤ t), are integers. In particular, if some choice of parameters t, k, v, λ does not yield integers, no design with those parameters exists. For example, in any (v, k = 3, λ = 1)-design we have r_1 = (v − 1)/2, which is an integer only if v is odd, and |B| = (v − 1)v/6, which is an integer only if v ≡ 0, 1, 3, 4 (mod 6); so together we need v ≡ 1, 3 (mod 6). Note that for v = 3 we get the trivial design and for v = 7 we get the Fano plane, so things seem to fit so far.

However, the necessary conditions for the existence of designs are not yet sufficient. In the following Theorem we derive another necessary condition, this time a lower bound on v.

Theorem 6.6. In any non-trivial t-(v, k, λ = 1)-design we have v ≥ (t + 1)(k − t + 1).

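Before we prove Theorem 6.6, note that the integrality conditions of Remark 6.5 are easy to check mechanically. The following Python sketch is only an illustration (not part of the notes); it computes |B| = r_0 and r_1, ..., r_t from Theorem 6.3 and reports whether they are all integers.

    from math import comb
    from fractions import Fraction

    def design_counts(t, v, k, lam):
        """r_0 = |B|, r_1, ..., r_t for a hypothetical t-(v, k, lam) design (Theorem 6.3)."""
        return [lam * Fraction(comb(v - i, t - i), comb(k - i, t - i)) for i in range(t + 1)]

    def divisibility_ok(t, v, k, lam):
        """Necessary condition of Remark 6.5: all of the counts must be integers."""
        return all(x.denominator == 1 for x in design_counts(t, v, k, lam))

    print(divisibility_ok(2, 7, 3, 1))     # True  -- the Fano plane exists
    print(divisibility_ok(2, 8, 3, 1))     # False -- v = 8 is not 1 or 3 mod 6
    print(divisibility_ok(10, 72, 16, 1))  # True, yet Theorem 6.6 rules this design out

The last call shows that the parameters λ = 1, t = 10, v = 72, k = 16 pass the divisibility test even though, as we will see below, Theorem 6.6 excludes them.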
Proof. First observe that for A, B ∈ B with A ≠ B we have |A ∩ B| ≤ t − 1. Otherwise some set T ⊆ A ∩ B of size t would be contained in both A and B, contradicting λ = 1.

Now choose some B_0 ∈ B and a set S of size t + 1 such that |S ∩ B_0| = t. This implies that S is not contained in any B ∈ B (otherwise the t-set S ∩ B_0 would be contained in both B_0 and B). Still, each of the t-sets T ⊆ S must be contained in some block B_T. Consider two such blocks B_T and B_{T′}. Each contains k elements, t of which are in S and k − t of which are outside of S. The intersection B_T ∩ B_{T′} certainly contains T ∩ T′, which is already of size t − 1. This means (by the observation at the beginning of the proof) that B_T and B_{T′} cannot have any further intersection outside of S.

Figure 43: Consider the case t = 3 and k = 9. After choosing some B_0 ∈ B we find a set S = {a, b, c, d} of size t + 1 = 4 that intersects B_0 in exactly t = 3 points a, b, c. Now no block may contain S (as argued above), but each of the sets {a, b, c}, {b, c, d}, {c, d, a}, {d, a, b} must be contained in some block. These four blocks must be disjoint outside of S (they already intersect in t − 1 elements within S). So we count k − t = 6 elements for each of them plus the elements of S, so v ≥ 24 + 4 = 28.

This means for each of the blocks B_T (and there are t + 1 of them) that it has k − t elements outside of S that are not contained in any other B_{T′}.

Together with the elements from S this gives

    v ≥ (t + 1) + (t + 1)(k − t) = (t + 1)(k − t + 1),

where the first summand counts the elements of S and the second the k − t private elements of each of the t + 1 blocks B_T.

So we can conclude, for instance, that no design with parameters λ = 1, t = 10, v = 72 and k = 16 exists (even though the divisibility conditions of Theorem 6.3 are satisfied).

Remark. In a recent, so far unpublished paper (http://arxiv.org/abs/1401.3665, submitted on 15 Jan 2014) Peter Keevash shows that for any parameters λ, t, k there exist t-(v, k, λ) designs for all sufficiently large v satisfying the divisibility conditions. The proof is very sophisticated, so we will not include it here.

In the next Theorem we find a lower bound on the number of blocks of a design.

Theorem 6.7 (Fisher's inequality). Let B be a non-trivial (v, k, λ)-design (so k < v). Then |B| ≥ v.

Proof. Consider the incidence matrix A with v rows, |B| columns and entries (a_{p,B})_{p ∈ V, B ∈ B} defined as

    a_{p,B} := 1 if p ∈ B, and 0 otherwise.

If a_p, a_q are rows of A, then a_p · a_q^T is the number of blocks containing both p and q. This value is λ for p ≠ q and r = r_1 for p = q. So A · A^T is the v × v matrix with r on the diagonal and λ in every off-diagonal position:

    A · A^T = (r − λ) · I + λ · J,

where I is the identity matrix and J the all-ones matrix. The determinant is det(A · A^T) = rk(r − λ)^{v−1} (verify this!). By Corollary 6.4 we have r = λ(v − 1)/(k − 1) > λ, using k < v. So the determinant is not zero and thus the matrix A · A^T has full rank v. This means that the matrix A must also have rank at least v. In particular A has at least v columns and therefore |B| ≥ v as claimed.
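The matrix identity at the heart of this proof can be checked on a concrete example. The following Python sketch is an illustration (not from the notes); it builds the incidence matrix of the Fano plane from the translates of {1, 2, 4} in Z_7 (a difference set in the sense of Definition 6.9 below) and verifies that A · A^T has r = 3 on the diagonal and λ = 1 everywhere else.

    # Blocks of the Fano plane as translates of the (7,3,1)-difference set {1,2,4} in Z_7.
    blocks = [frozenset((a + d) % 7 for d in (1, 2, 4)) for a in range(7)]

    # Incidence matrix: rows indexed by points 0..6, columns by blocks.
    A = [[1 if p in B else 0 for B in blocks] for p in range(7)]

    # Entry (p, q) of A * A^T = number of blocks containing both p and q.
    AAT = [[sum(A[p][j] * A[q][j] for j in range(len(blocks))) for q in range(7)]
           for p in range(7)]

    for p in range(7):
        for q in range(7):
            expected = 3 if p == q else 1      # r = 3 on the diagonal, lambda = 1 off it
            assert AAT[p][q] == expected
    print("A * A^T = (r - lambda) I + lambda J, as used in Fisher's inequality")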


6.2 Construction of Designs

So far we talked about necessary conditions for the existence of designs without actually constructing any. Constructing designs is our goal in the following. We have to start with a few definitions, though.

Definition 6.8.

• A (v, k, λ)-design with |B| = v is called symmetric. For those designs the incidence matrix A discussed above is square.

• A symmetric (v, k, 1)-design is called a projective plane. The Fano plane from Figure 42 is such a projective plane, with 7 points and 7 blocks ("lines").

• A (v, k = 3, λ = 1)-design or, equivalently, a Steiner system with parameters S(t = 2, k = 3, v), is called a Steiner Triple System.

    In the following we write Zn for the with n elements, in other words, the numbers modulo n.

Definition 6.9. Let 2 ≤ k < v and λ ≥ 1. A (v, k, λ)-difference set is a set D = {d_1, ..., d_k} ⊆ Z_v such that each element of Z_v \ {0} has multiplicity λ in the multiset { d_i − d_j | i, j ∈ [k], i ≠ j }. In other words, each non-zero number can be written as the difference of two numbers from D in exactly λ ways.

Example 6.10. The set {1, 3, 4, 5, 9} ⊆ Z_11 is an (11, 5, 2)-difference set, since each non-zero number can be written as a difference in exactly two ways, as follows:

    Number | First Way | Second Way
    1      | 4 − 3     | 5 − 4
    2      | 3 − 1     | 5 − 3
    3      | 1 − 9     | 4 − 1
    4      | 5 − 1     | 9 − 5
    5      | 9 − 4     | 3 − 9
    6      | 9 − 3     | 4 − 9
    7      | 1 − 5     | 5 − 9
    8      | 1 − 4     | 9 − 1
    9      | 1 − 3     | 3 − 5
    10     | 3 − 4     | 4 − 5

Theorem 6.11. If D = {d_1, ..., d_k} is a (v, k, λ)-difference set, then

    B := { D, 1 + D, 2 + D, ..., (v − 1) + D }

is a symmetric (v, k, λ)-design. Here, by a + D we mean the set { a + d | d ∈ D }.

Proof. Note that there are k · (k − 1) ways to form terms i − j with i ≠ j and i, j ∈ D. On the other hand, each non-zero number can be written in exactly λ ways like this, so λ(v − 1) = k · (k − 1), which implies λ = k(k − 1)/(v − 1) < k.

From this we conclude that the blocks we proposed above are all distinct: Assume a + D = b + D for a ≠ b. Then we have a permutation π ∈ S_k such that a + d_i = b + d_{π(i)} for all i ∈ [k]. Then a − b = d_{π(i)} − d_i for each i ∈ [k], so we have k > λ ways to write a − b as a difference, contradicting the property of the difference set.

By construction we have v points in total and blocks of size k. The only thing left to check is that each pair of distinct elements x, y ∈ Z_v is contained in exactly λ of the blocks. Let d = x − y be the difference of those two elements. Then there are exactly λ ways to pick two elements x_i, y_i ∈ D with difference d, and for each such pair there is a unique shift a such that x = x_i + a, y = y_i + a. This implies that x, y ∈ a + D for exactly λ choices of a.
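Both Definition 6.9 and the construction of Theorem 6.11 translate directly into code. The following Python sketch is an illustration (not from the notes); it checks the difference-set property and builds the resulting symmetric design from Example 6.10.

    from itertools import combinations
    from collections import Counter

    def is_difference_set(D, v, lam):
        """Each non-zero residue mod v must arise exactly lam times as d_i - d_j, i != j."""
        diffs = Counter((a - b) % v for a in D for b in D if a != b)
        return all(diffs[x] == lam for x in range(1, v))

    def design_from_difference_set(D, v):
        """Blocks D, 1 + D, ..., (v-1) + D as in Theorem 6.11."""
        return [frozenset((a + d) % v for d in D) for a in range(v)]

    D, v, k, lam = {1, 3, 4, 5, 9}, 11, 5, 2
    print(is_difference_set(D, v, lam))            # True

    blocks = design_from_difference_set(D, v)
    # every pair of distinct points lies in exactly lam blocks
    assert all(sum(p in B and q in B for B in blocks) == lam
               for p, q in combinations(range(v), 2))
    print(len(blocks), "blocks of size", k)        # 11 blocks of size 5: a symmetric design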

    136 Example. If v is a with v 3 (mod 4) then ≡ 2 2 a a Zv, a = 0 { | ∈ 6 } v 1 v 3 is a (v, k, λ)-difference set with k = −2 and λ = −4 . We do not prove this.

6.3 Projective Planes

We should mention that projective planes actually come from geometry. There, a projective plane of order q is defined as a set of q² + q + 1 points together with a family of subsets of points (lines) such that (i) each line has size q + 1 and (ii) each pair of points is contained in a unique line. In our setting, projective planes of order q are symmetric (v, k, λ = 1)-designs with k = q + 1. In that case we can indeed conclude

    v = |B| (symmetric)  and  |B| = r · v/k (Corollary 6.4)  ⇒  r = k = q + 1,

so each point is contained in q + 1 lines, and

    r = (v − 1)/(k − 1) (Corollary 6.4, with λ = 1)  ⇒  v = (q + 1)q + 1 = q² + q + 1,

so the number of points matches as well.

Before we continue, note the following property:

Claim. Any two lines (blocks) of a projective plane intersect in a unique point.

Proof of Claim. Consider two distinct lines L_1, L_2 and some x ∈ L_1 \ L_2. Then consider the family of lines (L_y)_{y ∈ L_2}, where L_y is the unique line containing x and y. These lines are all distinct (otherwise we would find some y_1, y_2 contained in two lines: L_2 and L_{y_1}), so we found q + 1 lines that all contain x. This means we must have counted L_1 in the process, otherwise there would be q + 2 lines containing x (contradicting r = q + 1). We conclude L_1 ∩ L_2 ≠ ∅. Clearly |L_1 ∩ L_2| ≤ 1, otherwise we would have a pair of points contained in both L_1 and L_2 (contradicting λ = 1).

    Construction of Projective Planes

In the following construction of a projective plane of order q we use F_q to denote a field of size q, where q is a prime power.

The points of the plane are equivalence classes of the set X = { (x_0, x_1, x_2) | x_i ∈ F_q, (x_0, x_1, x_2) ≠ (0, 0, 0) }, where two vectors are equivalent if they are scalar multiples of one another. Formally, a point [x_0, x_1, x_2] is the set:

    [x_0, x_1, x_2] = { c · (x_0, x_1, x_2) | c ∈ F_q \ {0} }.

Note that the same point can be described in different ways, for instance [x_0, x_1, x_2] = [2x_0, 2x_1, 2x_2] (for q ≠ 2). Since |X| = q³ − 1 and q − 1 elements represent the same point (are in the same class), we have (q³ − 1)/(q − 1) = q² + q + 1 points in total, as desired.

For (a_0, a_1, a_2) ∈ F_q³ \ {~0} we define the line L([a_0, a_1, a_2]) as

    L([a_0, a_1, a_2]) = { [x_0, x_1, x_2] | a_0 x_0 + a_1 x_1 + a_2 x_2 = 0 }.

Note that lines are well defined, that is, the definition respects the equivalence classes (either all elements representing a point satisfy the equation of the line or none of them do).

The number of solutions of a_0 x_0 + a_1 x_1 + a_2 x_2 = 0 is q², since, without loss of generality, a_2 ≠ 0, and thus x_0 and x_1 can be chosen arbitrarily and x_2 is then uniquely determined. Disregarding the solution x_0 = x_1 = x_2 = 0 (which does not represent any point) we have q² − 1 solutions in X that are part of the line, meaning (q² − 1)/(q − 1) = q + 1 points are part of the line, as desired.

Now consider two points [x_0, x_1, x_2] ≠ [y_0, y_1, y_2]. We show that they are contained in a unique line. Indeed, if L([a_0, a_1, a_2]) contains both, then we have a_0 x_0 + a_1 x_1 + a_2 x_2 = 0 and a_0 y_0 + a_1 y_1 + a_2 y_2 = 0. This is a homogeneous system with two equations and three variables a_0, a_1, a_2, so there exists a solution (a_0, a_1, a_2) ≠ 0, and all solutions are of the form c · (a_0, a_1, a_2). All those solution triples define the same line, so L([a_0, a_1, a_2]) is uniquely determined. This concludes the construction (and the verification thereof).

Remark. It is conjectured that the order of every projective plane is a prime power, but no proof is known.
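For prime q the construction just given is short enough to test by machine. The following Python sketch is an illustration (under the assumption that q is prime, so that F_q is simply arithmetic modulo q); it builds the points and lines and checks the design properties.

    from itertools import product, combinations

    def projective_plane(q):
        """Points and lines of the projective plane of order q, for a prime q."""
        vectors = [v for v in product(range(q), repeat=3) if v != (0, 0, 0)]

        def normalize(v):
            # scale by the inverse of the first non-zero coordinate -> canonical representative
            c = next(x for x in v if x)
            inv = pow(c, -1, q)
            return tuple(x * inv % q for x in v)

        points = sorted(set(normalize(v) for v in vectors))
        lines = []
        for a in points:                  # one line per equivalence class [a0, a1, a2]
            line = frozenset(p for p in points
                             if sum(ai * pi for ai, pi in zip(a, p)) % q == 0)
            lines.append(line)
        return points, lines

    q = 3
    points, lines = projective_plane(q)
    print(len(points), len(lines))                           # both q^2 + q + 1 = 13
    assert all(len(L) == q + 1 for L in lines)               # each line has q + 1 points
    assert all(sum(p in L and r in L for L in lines) == 1    # lambda = 1: unique line per pair
               for p, r in combinations(points, 2))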

Theorem 6.12 (Bruck-Ryser-Chowla). If v, k, λ ∈ N with λ(v − 1) = k(k − 1) and a symmetric (v, k, λ)-design exists, then

• if v is even, then k − λ is a square,

• if v is odd, then z² = (k − λ)x² + (−1)^{(v−1)/2} λ y² has a solution (x, y, z) ≠ (0, 0, 0).

6.4 Steiner Triple Systems

We leave projective planes for now and come back to Steiner Triple Systems, i.e. (v, k = 3, λ = 1)-designs. From Remark 6.5 we know that v ≡ 1, 3 (mod 6). The following Theorem shows that this necessary condition for existence is also sufficient.

Theorem 6.13. For each v ∈ N with v ≡ 1, 3 (mod 6) a Steiner Triple System with v points exists.

Proof. The construction is different for v ≡ 1 and v ≡ 3 (mod 6). We start with the latter case.

The point set is V = Z_{2n+1} × Z_3, so v = (2n + 1) · 3 = 6n + 3 ≡ 3 (mod 6). There are two types of blocks:

type 1: {(x, 0), (x, 1), (x, 2)} for each x ∈ Z_{2n+1},

type 2: {(x, i), (y, i), ((x + y)/2, i + 1)} for all x ≠ y and each i ∈ Z_3.

Here (x + y)/2 is computed in Z_{2n+1}, where 2 is invertible since 2n + 1 is odd. We need to show that each pair of elements is contained in a unique block. So consider a pair (x, i) ≠ (y, j).

Case 1: x = y (hence i ≠ j): the pair is contained in the block {(x, 0), (x, 1), (x, 2)} (of type 1) and in no other block.

Case 2: x ≠ y, i = j: the pair is contained in the block {(x, i), (y, i), ((x + y)/2, i + 1)} (of type 2) and in no other block.

Case 3: x ≠ y, i ≠ j: Assume without loss of generality that j = i + 1 (since i, j ∈ Z_3 we always have this or j = i − 1). Then the pair is contained in the block {(x, i), (y′, i), ((x + y′)/2, i + 1)}, where y′ = 2y − x, so that (x + y′)/2 = y. The pair is contained in no other block.

This concludes the case v ≡ 3 (mod 6); we proceed with v ≡ 1 (mod 6), where the construction and verification are more complicated.

As point set, choose V = (Z_{2n} × Z_3) ∪ {∞}, so v = 6n + 1. To simplify notation we write x_i for the pair (x, i) ∈ Z_{2n} × Z_3. The element ∞ is special and we assume ∞ + x = ∞ for any x ∈ V. Before we define the blocks of the Triple System, we define four types of base blocks first:

• {0_0, 0_1, 0_2},

• {∞, 0_0, n_1}, {∞, 0_1, n_2}, {∞, 0_2, n_0},

• {0_0, x_1, (−x)_1}, {0_1, x_2, (−x)_2}, {0_2, x_0, (−x)_0}, for each x ∈ [n − 1],

• {n_0, x_1, (1 − x)_1}, {n_1, x_2, (1 − x)_2}, {n_2, x_0, (1 − x)_0}, for each x ∈ [n].

The blocks of the Steiner Triple System are

    B = { a_0 + B | a ∈ {0, 1, ..., n − 1}, B is a base block },

where the addition of elements is defined coordinatewise, in particular a_0 + x_i = (a + x)_i. We need to check that there is a unique block containing u, v with u ≠ v. We do not consider all cases here, just two important ones.

Case 1: u = ∞, v = (x, i): If x ≤ n − 1, then the block is x_0 + {∞, 0_i, n_{i+1}}. If x ≥ n, then we find (x − n)_0 + {∞, 0_{i−1}, n_i}.

Case 2: u = x_i, v = y_i: Without loss of generality x < y. Either y − x = 2s (the difference is even); then {x_i, y_i} is contained in the block (y − s)_0 + {0_{i−1}, s_i, (−s)_i}. Or y − x = 2s − 1; then {x_i, y_i} is contained in the block (y − s)_0 + {n_{i−1}, s_i, (1 − s)_i}.

We omit the remaining cases.
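The v ≡ 3 (mod 6) half of this construction is easy to implement and test. The following Python sketch is an illustration (not part of the notes) and covers only that case:

    from itertools import combinations

    def steiner_triple_system(n):
        """Blocks of an STS on V = Z_{2n+1} x Z_3 (so v = 6n + 3), as in Theorem 6.13."""
        m = 2 * n + 1
        half = pow(2, -1, m)              # 2 is invertible modulo the odd number m
        blocks = [frozenset((x, i) for i in range(3)) for x in range(m)]          # type 1
        for i in range(3):
            for x, y in combinations(range(m), 2):                                # type 2
                blocks.append(frozenset({(x, i), (y, i),
                                         ((x + y) * half % m, (i + 1) % 3)}))
        return blocks

    n = 2                                 # v = 15
    blocks = steiner_triple_system(n)
    points = [(x, i) for x in range(2 * n + 1) for i in range(3)]
    # every pair of distinct points lies in exactly one block
    assert all(sum(u in B and w in B for B in blocks) == 1
               for u, w in combinations(points, 2))
    print(len(points), "points,", len(blocks), "blocks")   # 15 points, 35 blocks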

6.5 Resolvable Designs

Consider a game played by three players (like skat). Assume there is a tournament for this game with n players. Each set of three players should play exactly λ times against each other. There are several time slots during which the games are scheduled. Of course every player can only play one game at a time. In an optimal schedule all players have a game in each time slot. In the sense of the following definition, such a schedule corresponds to a resolvable (n, 3, λ)-design, where the parallel classes correspond to the time slots.

Definition 6.14. A parallel class of a (v, k, λ)-design B is a set of disjoint blocks forming a partition of the point set V. A partition of B into parallel classes is called a resolution. A design is resolvable if it has a resolution.

Note that projective planes are examples of non-resolvable designs, since no two disjoint blocks exist. An example of a resolvable (v = 4, k = 2, λ = 1)-design with the corresponding parallel classes is shown in the next picture, where the points are A, B, C, D and the blocks are depicted by edges.

(Figure: the six edges of the complete graph on {A, B, C, D}, split into three parallel classes, each consisting of two disjoint edges.)

This is a special case of a (v = q², k = q, λ = 1)-design. Such designs are called affine planes and are, as we show now, always resolvable.

Theorem 6.15. Any (v = q², k = q, λ = 1)-design is resolvable.

Proof. The main observation is the following:

Claim. For each block B and each point x ∉ B there is a unique block B′ with x ∈ B′ and B ∩ B′ = ∅.

Proof of Claim. For each y ∈ B there is a unique block B_y such that {x, y} ⊆ B_y. These blocks are pairwise distinct (since |B_y ∩ B| ≤ 1) and all contain x. Because r = λ(v − 1)/(k − 1) = (q² − 1)/(q − 1) = q + 1 (Corollary 6.4), there is exactly one block left containing x that is different from each B_y. It is disjoint from B.

With the claim proved, everything falls into place. Start with a maximal set of pairwise disjoint blocks. This set forms a parallel class (this is not obvious, but easy to prove). Then, in the next phase, take some other set of disjoint blocks not yet considered. This forms a parallel class in the same way. Continue like this until all blocks are handled (it is easy to verify that this works).

6.6 Latin Squares

In the following we consider a structure that seems unrelated to designs at first, but the final Theorem will establish an interesting connection to affine planes.

A Latin square of order n ≥ 1 is an n × n array filled with numbers from the cyclic group Z_n = {0, ..., n − 1} such that each row and each column contains each number from Z_n exactly once. Consider for instance the following two Latin squares:

        1 0 2 3           1 0 3 2
    A = 0 1 3 2       B = 3 2 1 0
        2 3 1 0           0 1 2 3
        3 2 0 1           2 3 0 1

We denote the set of positions containing the number i in a Latin square L by L(i); in the example above we have for instance B(3) = {(1, 3), (2, 1), (3, 4), (4, 2)}. Another way to think about Latin squares is to say that they are partitions L(0) ∪ ... ∪ L(n − 1) of [n] × [n] such that each set L(i) is a set of positions on which rooks (from the game chess) placed onto those positions cannot attack each other.

Every solution to a Sudoku puzzle is a Latin square with n = 9, although the reverse is not true: Latin squares need not respect the constraints for the 3 × 3 boxes.

There are many ways to transform a Latin square into similar Latin squares: You could permute the rows or columns or "rename" the numbers (swap A(1)

and A(0), for instance). At least the latter operation seems uninteresting, as renaming numbers yields the same partition, just with different labels. To capture the idea of "entirely different" Latin squares, we propose the notion of orthogonality defined in the following.

Let A, B be Latin squares of order n with entries A = (a_{ij})_{i,j}, B = (b_{ij})_{i,j}. The juxtaposition of A and B is the n × n array in which each position simply contains the corresponding pair of numbers of A and B, i.e.

    A ⊗ B = ((a_{ij}, b_{ij}))_{i,j}, with entries in Z_n × Z_n.

For instance, with A and B from the example above we get

            (1,1) (0,0) (2,3) (3,2)
    A ⊗ B = (0,3) (1,2) (3,1) (2,0)
            (2,0) (3,1) (1,2) (0,3)
            (3,2) (2,3) (0,0) (1,1)

We say A, B are orthogonal if A ⊗ B contains each pair from Z_n × Z_n exactly once. The two Latin squares above are not orthogonal, because some pairs occur twice (e.g. (2, 0)) and, necessarily, some pairs do not occur at all (e.g. (1, 3)). An example of two Latin squares A and C with A orthogonal to C is

        1 0 2 3        1 0 2 3            (1,1) (0,0) (2,2) (3,3)
    A = 0 1 3 2    C = 3 2 0 1    A ⊗ C = (0,3) (1,2) (3,0) (2,1)
        2 3 1 0        0 1 3 2            (2,0) (3,1) (1,3) (0,2)
        3 2 0 1        2 3 1 0            (3,2) (2,3) (0,1) (1,0)
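Orthogonality is a purely mechanical condition, so it is easy to test. The following Python sketch is an illustration (not from the notes), with the squares A, B, C from above hard-coded:

    def is_latin(L):
        """Each row and column contains every symbol 0..n-1 exactly once."""
        n = len(L)
        symbols = set(range(n))
        return (all(set(row) == symbols for row in L)
                and all({L[i][j] for i in range(n)} == symbols for j in range(n)))

    def are_orthogonal(A, B):
        """The juxtaposition of A and B must contain every ordered pair exactly once."""
        n = len(A)
        pairs = {(A[i][j], B[i][j]) for i in range(n) for j in range(n)}
        return len(pairs) == n * n

    A = [[1, 0, 2, 3], [0, 1, 3, 2], [2, 3, 1, 0], [3, 2, 0, 1]]
    B = [[1, 0, 3, 2], [3, 2, 1, 0], [0, 1, 2, 3], [2, 3, 0, 1]]
    C = [[1, 0, 2, 3], [3, 2, 0, 1], [0, 1, 3, 2], [2, 3, 1, 0]]

    print(all(map(is_latin, (A, B, C))))   # True
    print(are_orthogonal(A, B))            # False, as observed above
    print(are_orthogonal(A, C))            # True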

Note that this notion of orthogonality has little to do with the geometric notion of orthogonality you may be familiar with. Observe the following:

• If A is orthogonal to B, then B is orthogonal to A.

• If A is orthogonal to B, then "renaming" numbers in A or B (for instance replacing every 0 with a 1 and vice versa) preserves orthogonality.

Remark. For n ∈ {2, 6} there are no two orthogonal Latin squares of order n. For n = 2 this is easy to see, since the only Latin squares of order 2 are

    L_1 = 0 1        L_2 = 1 0
          1 0              0 1

They are not orthogonal to each other nor to themselves (only for n = 1 can a Latin square be orthogonal to itself). For n = 6 the argument is non-trivial. For n ∈ N \ {2, 6} a pair of orthogonal Latin squares of order n exists. The proof of this is highly non-trivial; in fact it took over a hundred years to show that there is a pair of orthogonal Latin squares of order 10.

Our goal in the following is to construct a large set A_1, ..., A_k of Latin squares of order n that are MOLS (mutually orthogonal Latin squares), meaning A_i is orthogonal to A_j for i ≠ j.

Theorem 6.16. Let n be a positive integer and r ∈ [n − 1] non-zero and co-prime to n, i.e. gcd(n, r) = 1. Then L_n^r = (r · i + j (mod n))_{i,j} is a Latin square.

To clarify how L_n^r looks, consider the example n = 5 and r = 2. "Going right" corresponds to +1 and "going down" corresponds to +2, which gives

            3 4 0 1 2
            0 1 2 3 4
    L_5^2 = 2 3 4 0 1
            4 0 1 2 3
            1 2 3 4 0
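Theorem 6.16 (and, for prime n, the mutual orthogonality proved in Theorem 6.17 below) is easy to verify computationally. The following Python sketch is an illustration (not part of the notes); it uses indices 1, ..., n as in the example above.

    from itertools import combinations
    from math import gcd

    def L(n, r):
        """The Latin square L_n^r from Theorem 6.16, with rows and columns indexed 1..n."""
        assert gcd(n, r) == 1
        return [[(r * i + j) % n for j in range(1, n + 1)] for i in range(1, n + 1)]

    def are_orthogonal(A, B):
        n = len(A)
        return len({(A[i][j], B[i][j]) for i in range(n) for j in range(n)}) == n * n

    print(L(5, 2))   # matches the 5 x 5 example above

    # For prime n the squares L_n^1, ..., L_n^(n-1) are mutually orthogonal (Theorem 6.17).
    n = 7
    squares = [L(n, r) for r in range(1, n)]
    assert all(are_orthogonal(A, B) for A, B in combinations(squares, 2))
    print(len(squares), "MOLS of order", n)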

Proof. We have to show that each row and each column contains each number exactly once. For rows this is clear, since the i-th row contains r·i + 1, r·i + 2, ..., r·i + n, which traverses all numbers modulo n. The column j contains r + j, 2r + j, ..., nr + j, all modulo n. Assume two of those numbers are identical, say i_1·r + j ≡ i_2·r + j; then (i_1 − i_2)·r ≡ 0 (mod n). Since gcd(n, r) = 1 (meaning r has an inverse modulo n) we actually had i_1 = i_2. So no column contains a number twice and L_n^r really is a Latin square of order n.

Theorem 6.17. If n is prime, then L_n^1, ..., L_n^{n−1} are n − 1 MOLS of order n.

Proof. Since n is prime, gcd(n, i) = 1 for all i ∈ [n − 1], so by Theorem 6.16 L_n^1, ..., L_n^{n−1} are Latin squares of order n.

Now consider two of those squares L_n^r and L_n^s with r ≠ s. We need to show that they are orthogonal, so suppose some pair of numbers from Z_n × Z_n appears in L_n^r ⊗ L_n^s in positions (i, j) and (k, l). By definition of L_n^r and L_n^s this gives the identity

    (r·i + j, s·i + j) = (r·k + l, s·k + l).

So r·(i − k) = l − j = s·(i − k), which implies (r − s)(i − k) = 0. The numbers modulo n form a field, and a product can only be zero if one of the factors is zero. Since r ≠ s we obtain i = k and therefore also l = j. In particular we showed that the same pair cannot appear in two distinct positions of L_n^r ⊗ L_n^s, so L_n^r is orthogonal to L_n^s as desired.

Remark. For a prime power n = p^k there is a field F_n = {α_0 = 0, α_1, ..., α_{n−1}} of order n. For it we can define L_n^{α_r} = (α_r · α_i + α_j)_{i,j} and generalize Theorem 6.16 and Theorem 6.17 accordingly.

Lemma 6.18. For any n ≥ 2 there are at most n − 1 MOLS of order n.

Proof. Assume we have a set A_1, ..., A_k of MOLS. As orthogonality is not affected by "renaming" the numbers in a Latin square, we may assume without loss of generality that each A_i has 0 1 2 ... n−1 as its first row. Now consider the entries at position (2, 1). If A is orthogonal to B, then a_{21} ≠ b_{21}, because all pairs (0, 0), ..., (n − 1, n − 1) already appear in row 1 of A ⊗ B. So the entries at position (2, 1) must be mutually distinct. They must also be non-zero, since the first column of each A_i already contains a zero at position (1, 1). Therefore, we started with at most k ≤ n − 1 Latin squares.

And now for the theorem that establishes the connection to designs. We will only prove the equivalence of (i) and (iv), but we will do so constructively.

Theorem 6.19. For n ≥ 2 each of the following is equivalent:

(i) There exist n − 1 mutually orthogonal Latin squares of order n.

(ii) There exists a finite field of order n.

(iii) n is a prime power, i.e. n = p^k for a prime p and some k ≥ 1.

(iv) There exists a (v = n², k = n, λ = 1)-design (an affine plane).

Proof. (i) ⇒ (iv). Let A_1, ..., A_{n−1} be n − 1 MOLS of order n. We need to construct a (v = n², k = n, λ = 1)-design. Recall (from Corollary 6.4) that the number of blocks in such a design is necessarily |B| = n(n + 1), and from Theorem 6.15 that the design is necessarily resolvable, so it splits into n + 1 parallel classes of n disjoint blocks each (this resolution will be apparent). The points of the design are [n] × [n], i.e. the set of positions in a Latin square of order n.

Recall that each Latin square A is a partition of the positions [n] × [n] = A(0) ∪ ... ∪ A(n − 1), where A(i) is the set of positions containing the number i ∈ Z_n. These will be blocks of the design. In addition, each row R(i) = { (i, j) | j ∈ [n] } and each column C(j) = { (i, j) | i ∈ [n] } is a block. So in total the blocks are:

    B = { A_r(s) | r ∈ [n − 1], s ∈ Z_n } ∪ { R(i) | i ∈ [n] } ∪ { C(j) | j ∈ [n] }.

We need to show that each pair of distinct points (i, j) and (k, l) appears in exactly one block. We first show that they are contained in at most one block.

Case 1: i = k. Both points are contained in the i-th row, so both are in R(i). They cannot be contained in the same column (otherwise they would not be distinct) and they are not both contained in any A_r(s), since that would mean that the Latin square A_r contains the number s twice in the i-th row. In particular, no block other than R(i) contains both points.

Case 2: j = l. Similar to Case 1: the points are contained in C(j) and in no other block.

Case 3: i ≠ k, j ≠ l. The points are not in the same row or column, so they lie in no R(i) or C(j). But assume (for contradiction) that we have (i, j), (k, l) ∈ A_r(s) ∩ A_t(u) for (r, s) ≠ (t, u). Since the blocks originating from the same Latin square are disjoint (those blocks form a partition), we have r ≠ t. So in A_r the number s appears in both positions (i.e. at (i, j) and (k, l)) and in A_t the number u appears in both positions. This means A_r ⊗ A_t has (s, u) in both positions, contradicting the fact that A_r and A_t are orthogonal.

So each pair of positions is contained in at most one block.

    To see that each pair of positions is contained in at least one block we double count the set:

    S := { ({(i, j), (k, l)}, B) | B ∈ B, (i, j) ≠ (k, l), B contains (i, j) and (k, l) }.

Firstly,

    |S| = Σ_{(i,j) ≠ (k,l)} |{ B ∈ B | B contains (i, j) and (k, l) }| ≤ Σ_{(i,j) ≠ (k,l)} 1 = \binom{n²}{2},

where the sums run over unordered pairs of distinct points and the inequality uses "at most one". Secondly,

    |S| = Σ_{B ∈ B} |{ {(i, j), (k, l)} | (i, j) ≠ (k, l), both contained in B }| = Σ_{B ∈ B} \binom{|B|}{2} = (n² + n) · \binom{n}{2} = \binom{n²}{2}.

So we realize that the "≤" above is actually an equality, so in particular each pair of points is contained in exactly one block.
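The construction used in this direction of the proof is completely explicit, so it can be tested. The following Python sketch is an illustration (not part of the notes); it builds the blocks from the MOLS L_n^1, ..., L_n^{n−1} of Theorem 6.17 for a prime n and verifies λ = 1.

    from itertools import combinations

    def mols(n):
        """n - 1 MOLS of prime order n, via L_n^r = (r*i + j mod n), cf. Theorem 6.17."""
        return [[[(r * i + j) % n for j in range(n)] for i in range(n)] for r in range(1, n)]

    def affine_plane_from_mols(squares, n):
        """Blocks of an (n^2, n, 1)-design on the point set [n] x [n] (proof of Theorem 6.19)."""
        rows = [frozenset((i, j) for j in range(n)) for i in range(n)]
        cols = [frozenset((i, j) for i in range(n)) for j in range(n)]
        levels = [frozenset((i, j) for i in range(n) for j in range(n) if A[i][j] == s)
                  for A in squares for s in range(n)]
        return rows + cols + levels

    n = 5
    blocks = affine_plane_from_mols(mols(n), n)
    points = [(i, j) for i in range(n) for j in range(n)]
    print(len(blocks))                                    # n(n + 1) = 30 blocks of size n
    assert all(sum(p in B and q in B for B in blocks) == 1
               for p, q in combinations(points, 2))       # lambda = 1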

(iv) ⇒ (i). Assume we have a (v = n², k = n, λ = 1)-design. As argued before (Corollary 6.4), there are |B| = (n + 1) · n blocks in total. Since affine planes are resolvable by Theorem 6.15, we get n + 1 parallel classes B = B_1 ∪ ... ∪ B_{n+1} consisting of n blocks each. We use two of these parallel classes, B_1 and B_2, to find "coordinates" for the n² points, and the remaining n − 1 parallel classes to define MOLS.

We label the blocks from B_1 and B_2 as B_1 = {R(1), R(2), ..., R(n)} and B_2 = {C(1), C(2), ..., C(n)}. For i, j ∈ [n] we know that every pair of blocks R(i) and C(j) intersects in exactly one point p_{ij}: because of t = 2 and λ = 1 they cannot intersect in two points, and because of the claim from the proof of Theorem 6.15 they cannot be disjoint. This means that the blocks from B_1 and B_2 intersect as shown in Figure 44, so

    R(1)

    R(2)

    R(3)

    R(4)

    R(5)

    Figure 44: The two different parallel classes 1 and 2 intersect like this when arranging the points accordingly. B B

we can interpret the blocks R(i) as rows and C(j) as columns (i, j ∈ [n]). To simplify notation, identify the point p_{ij} of the design with the position (i, j) ∈ [n] × [n] in a Latin square. Then for each l ∈ {3, ..., n + 1} we interpret the n blocks B_l(0), B_l(1), ..., B_l(n − 1) of the parallel class B_l as a Latin square in which the positions containing a number r ∈ Z_n are given by B_l(r). This really is a valid Latin square, since each B_l(r) intersects each R(i) and each C(j) in exactly one point, so each number occurs exactly once in every row and in every column.

It is also easy to verify that the n − 1 Latin squares we get from B_3 to B_{n+1} are mutually orthogonal, since for any distinct s, t ∈ {3, ..., n + 1} and r_1, r_2 ∈ Z_n we know that |B_s(r_1) ∩ B_t(r_2)| = 1, which just means that the pair (r_1, r_2) ∈ Z_n × Z_n occurs exactly once in the juxtaposition of the Latin squares for B_s and B_t.

