
MAT5107 : Combinatorial Enumeration

Course Notes (this version: December 2018)

Mike Newman

These notes are intended for the course MAT5107. They somewhat follow approaches in the two (excellent!) books generatingfunctionology by Herbert Wilf [2] and Analytic Combinatorics by Philippe Flajolet and Robert Sedgewick [1]. Both of these are available in relatively inexpensive print versions, as well as free downloadable .pdf files.

W: https://www.math.upenn.edu/~wilf/DownldGF.html
F&S: http://algo.inria.fr/flajolet/Publications/AnaCombi/anacombi.html

Both of these books are highly recommended and both could be considered unofficial textbooks for the course (and both are probably not explicitly cited as much as they should be in these notes). The origin of these notes traces back to old course notes scribed by students in a course given by Jason and/or Daniel.

I typically cover most of the material in these notes during one course. Wilf’s book covers more than a course’s worth of material; Flajolet and Sedgewick cover much more.

These notes are available on my webpage. Please don’t post them elsewhere. Feel free to share the link; the current url is http://web5.uottawa.ca/mnewman/notes/.

Despite my best efforts to eliminate typos, there “may” be some that remain. Please let me know if you find any mistakes, big or small. Thanks to those who have pointed out typos in the past!

References

[1] Philippe Flajolet and Robert Sedgewick. Analytic Combinatorics. Cambridge University Press, Cambridge, 2009.
[2] Herbert S. Wilf. generatingfunctionology. A K Peters, Ltd., Wellesley, MA, third edition, 2006.

∗ These notes are intended for students in mike’s MAT5107. For other uses please say “hi” to [email protected].

1. generating functions

basic ideas and setup

Combinatorial enumeration is about counting combinatorial (discrete) structures such as sequences, graphs, polynomials over finite fields, and partitions. We start with a set A of objects (graphs, permutations, etc.), as well as a function ω : A → N that measures the “size” of an object. The exact meaning of “size” depends on the objects and how we wish to treat them. Then we set A_n to be the preimage of n under ω and a_n = |A_n|; that is, A_n is the set of objects of size n, and a_n is the number of such objects. The sequence

a_0, a_1, a_2, ··· , a_n, ···

is an answer to the question “How many objects are there of size n?” We might have an explicit formula for the a_n, we might have an asymptotic formula that is close for large n, or we might only know a relation with some other sequence. In this course we will be dealing with both exact enumeration and asymptotic enumeration.

For example, let a_n be the number of permutations of n elements. Then a_n = n! gives an exact answer. Using Stirling’s formula, we have a_n ∼ √(2πn) (n/e)^n, which gives an asymptotic expression for a_n. Here f_n ∼ g_n means f_n/g_n → 1 as n → ∞. So the asymptotic expression of a_n gives the main order of magnitude of a_n. Of special interest is the fact that there are many cases where we are unable to get an exact formula for a_n but we can get an asymptotic one.

The unifying tool is the generating function. Given a sequence of numbers (a_n)_{n≥0}, its ordinary generating function (OGF) and exponential generating function (EGF) are, respectively,

a_0 + a_1 x + a_2 x² + a_3 x³ + ··· = ∑_{n≥0} a_n x^n

a_0 + a_1 x + a_2 x²/2 + a_3 x³/6 + ··· = ∑_{n≥0} a_n x^n/n!

These are formal sums, so the question of convergence does not arise (for the moment!); it is just a quirky way of writing the sequence a_0, a_1, a_2, ··· . As we will see later, ordinary generating functions are usually associated with unlabelled structures while exponential generating functions are usually associated with labelled structures (though the reasons for that might seem a little mysterious at the moment).

formal power series

We define a formal power series in the indeterminate x to be an expression of the form

A(x) = a_0 + a_1 x + a_2 x² + ··· = ∑_{n≥0} a_n x^n.

The coefficients a_n are typically integers or rational numbers; this is the most useful case for us. Formal power series form a ring; in particular, if the coefficients are rational then this ring is denoted by Q[[x]].


The “x” that appears is not really a variable in the traditional sense; it might more properly be called an indeterminate or even a marker. It is a way of associating the numerical value a_n that appears as the coefficient, with the value n that appears as the exponent. In other words, you should think of a formal power series as being a strange way of representing a sequence. In fact there will come a point where it will be useful to consider a formal power series as a “function” of its variable x, but this will not alter its formal nature.

Note that the exponents that appear are exactly the non-negative integers; this corresponds with the choice to start indexing our sequences at n = 0. This is a somewhat arbitrary but useful choice, motivated by the fact that the exponent usually represents the size of a discrete object, and most natural ways of measuring the size of a discrete object are as a non-negative integer.

It is elementary but still worth noticing that the letter n is not really part of A(x); it is just a dummy variable of summation. So in particular we have things like the following.

∑_{n≥0} a_n x^n = ∑_{k≥3} a_{k−3} x^{k−3} = a_0 + a_1 x + ∑_{i≥0} a_{i+2} x^{i+2}

These are somewhat spurious examples, but there will be other instances where it will be useful to “re-index” a formal power series.

Note that both ordinary generating functions and exponential generating functions are elements of Q[[x]]. In particular, whether a generating function is ordinary or exponential is really a question of interpretation. For instance, is the following an ordinary generating function or an exponential one?

∑_{n≥0} x^n = 1 + x + x² + x³ + ··· = ∑_{n≥0} a_n x^n        an OGF with a_n = 1
∑_{n≥0} x^n = 1 + x + 2·x²/2 + 6·x³/6 + ··· = ∑_{n≥0} a_n x^n/n!        an EGF with a_n = n!

We use the notation [x^n] F(x) to mean the coefficient of x^n in F(x). So given an OGF F(x), the corresponding sequence is exactly ([x^n] F(x))_{n≥0}. If we know F(x), then we know (implicitly at least) [x^n] F(x) for every n ≥ 0, and vice versa. Note that what we typically want is not the coefficient of x^n, but rather the value a_n. So for exponential generating functions there is a slight twist.

Problem 1.1. Let A(x) = ∑_{n≥0} a_n x^n/n! be an exponential generating function. Explain why a_n = [x^n/n!] A(x) = n! [x^n] A(x).
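To make the OGF/EGF distinction concrete, here is a small Python sketch (the list representation and the helper `coeff` are mine, not from the notes): a truncated formal power series is stored as a list of coefficients, and the same list encodes different sequences depending on whether we read it as an OGF or an EGF.

```python
from math import factorial

# A truncated formal power series as a plain list of coefficients.
# coeff(F, n) plays the role of the coefficient extractor [x^n] F(x).
def coeff(F, n):
    return F[n] if n < len(F) else 0

# The series 1 + x + x^2 + ... , read two ways:
F = [1] * 10

# As an OGF it encodes a_n = [x^n] F(x) = 1.
ogf_terms = [coeff(F, n) for n in range(5)]
# As an EGF it encodes a_n = n! [x^n] F(x) = n!  (the "slight twist").
egf_terms = [factorial(n) * coeff(F, n) for n in range(5)]

assert ogf_terms == [1, 1, 1, 1, 1]
assert egf_terms == [1, 1, 2, 6, 24]
```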

We will sometimes commit a small but heinous abuse of notation. Namely, we will write down expressions like the left-hand equation, when what we “should” write is something like the right-hand expression.

[x^n] ∑_{n≥0} a_n x^n = a_n        [x^n] ∑_{k≥0} a_k x^k = a_n

We are using n for two different things: both as a dummy variable, and as an operator that extracts the coefficient of something (namely x^n). So n has two meanings, in the same equation. This is clearly insane. . . but in fact presents no practical difficulty. The index-of-summation n has no existence outside the sum, so there is no risk of confusion with the coefficient-extractor n. So we will tolerate this. In fact it is surprisingly natural, helpful even, and you quite possibly would not have even noticed without prompting.

operations on formal power series

We can add or subtract formal power series in a natural way, as we might add sequences:

∑_{n≥0} a_n x^n ± ∑_{n≥0} b_n x^n = ∑_{n≥0} (a_n ± b_n) x^n

We can also multiply formal power series (which was not an obvious operation for sequences):

(∑_{n≥0} a_n x^n) (∑_{n≥0} b_n x^n) = ∑_{n≥0} (∑_{k=0}^{n} a_k b_{n−k}) x^n

Note that determining the coefficient of x^n in the product is a finite process. Of course this process takes longer and longer as n increases, but for each n it is finite. This means we can determine coefficients in the product, and thus the product is well-defined as a formal power series. We emphasize that this has nothing to do with convergence.

Problem 1.2. Consider the arithmetic of Q[[x]].
• Show that addition and multiplication are commutative and associative.
• Show that the additive identity is 0 + 0x + 0x² + ··· .
• Show that the unique additive inverse of ∑_{n≥0} a_n x^n is ∑_{n≥0} (−a_n) x^n.
• Show that the multiplicative identity is 1 + 0x + 0x² + ··· .
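The product formula is directly computable; here is a minimal Python sketch (the function name is mine). Note that each coefficient of the product uses only finitely many terms, exactly as claimed above.

```python
def fps_mul(A, B):
    """Cauchy product of two truncated formal power series:
    [x^n] (A * B) = sum_{k=0}^{n} a_k b_{n-k}."""
    N = min(len(A), len(B))
    return [sum(A[k] * B[n - k] for k in range(n + 1)) for n in range(N)]

# (1 - x)(1 + x + x^2 + ...) = 1, checked up to degree 7:
one_minus_x = [1, -1] + [0] * 6
geometric = [1] * 8
assert fps_mul(one_minus_x, geometric) == [1] + [0] * 7
```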

For simplicity, we often write 0 and 1 for the additive and multiplicative identities, even though they are indeed formal power series. For that matter, we permit ourselves to omit any terms in a formal power series whose coefficients are zero; in particular, all polynomials are formal power series. There remains the question of which elements of Q[[x]] have multiplicative inverses, i.e., reciprocals. Given a formal power series A(x), the formal power series B(x) is said to be the reciprocal of A(x) if A(x)B(x) = 1.

Lemma 1.3. A(x) has a reciprocal if and only if [x⁰] A(x) ≠ 0. Furthermore, the reciprocal is unique.

Proof. We need only consider whether or not the equation A(x)B(x) = 1 has a solution for B(x). Extracting coefficients on each side we get the following equations, where the n-th equation is [xn] A(x)B(x) = [xn] 1.

a_0 b_0 = 1

a_1 b_0 + a_0 b_1 = 0

a_2 b_0 + a_1 b_1 + a_0 b_2 = 0

a_3 b_0 + a_2 b_1 + a_1 b_2 + a_0 b_3 = 0
. . .

The zeroth equation shows that a_0 ≠ 0 is necessary. Given this, we find b_0 = a_0^{−1}. A simple induction shows that all b_n are uniquely determined. So a_0 ≠ 0 is also sufficient. This specifies uniquely the coefficients of B(x), and so B(x) exists. Thus A(x) has a unique reciprocal.
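The proof is constructive, and easy to carry out by machine. A sketch (helper names mine), in exact rational arithmetic since we are working in Q[[x]]:

```python
from fractions import Fraction
from math import factorial

def fps_reciprocal(A, N):
    """First N coefficients of B(x) with A(x)B(x) = 1, by back-substitution.
    Requires a_0 != 0, as in Lemma 1.3."""
    a = [Fraction(A[n]) if n < len(A) else Fraction(0) for n in range(N)]
    if a[0] == 0:
        raise ValueError("no reciprocal: constant term is zero")
    b = [1 / a[0]]
    for n in range(1, N):
        # n-th equation: a_0 b_n + a_1 b_{n-1} + ... + a_n b_0 = 0
        b.append(-sum(a[k] * b[n - k] for k in range(1, n + 1)) / a[0])
    return b

# Reciprocal of 1 + x + x^2 + ... is 1 - x:
assert fps_reciprocal([1] * 8, 8) == [1, -1, 0, 0, 0, 0, 0, 0]

# Reciprocal of the series with coefficients 1/n! (i.e. e^x) has
# coefficients (-1)^n / n!  (i.e. e^{-x}):
exp_series = [Fraction(1, factorial(n)) for n in range(8)]
assert fps_reciprocal(exp_series, 8) == \
    [Fraction((-1) ** n, factorial(n)) for n in range(8)]
```

Running this on the two series of Problem 1.4 confirms the answers coefficientwise, though the problem of course asks you to see why by hand.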

Note that this proof also gives a way to compute the reciprocal of A(x). In fact, this is exactly the method of solving a triangular system of linear equations by back-substitution.

Problem 1.4. Use the algorithm implicit in the proof of Lemma 1.3 to find the reciprocal of ∑_{n≥0} x^n and of ∑_{n≥0} x^n/n!.

We define the derivative of a formal power series as follows.

d/dx ∑_{n≥0} a_n x^n = ∑_{n≥0} n a_n x^{n−1}

Notice that (assuming we are taking the sum over n ≥ 0) there is not actually a term of x^{−1} in this sum. It is often useful to combine the previous two, by first differentiating and then multiplying by x (which is itself a formal power series!):

x · d/dx ∑_{n≥0} a_n x^n = ∑_{n≥0} n a_n x^n

We define the integral of a formal power series as well.

∫ ∑_{n≥0} a_n x^n dx = ∑_{n≥0} (a_n/(n+1)) x^{n+1}

We see that (d/dx) ∫ A dx = A for any formal power series A, but that ∫ (d/dx) A dx = A − a_0. Though we use the language and notation of calculus, there is no suggestion of limits or convergence. We have defined these operations as applying the power rule from calculus on a term-by-term basis. In particular, every formal power series is infinitely differentiable and integrable.

composition of formal power series

One further operation we can do with formal power series is to compose them. That is, given A(x) = ∑_{n≥0} a_n x^n and B(x) = ∑_{n≥0} b_n x^n we could consider the expression

C(x) = A(B(x)) = ∑_{n≥0} a_n (∑_{k≥0} b_k x^k)^n

Unlike the sum and product of two formal power series, this may not be defined for every A and B. The given expression for C(x) is meaningful exactly when the expression uniquely determines coefficients for each power of x.

Lemma 1.5. Let A(x) = ∑_{n≥0} a_n x^n and B(x) = ∑_{n≥0} b_n x^n. Then A(B(x)) is a well-defined formal power series if and only if A(x) is a polynomial or b_0 = 0.

Proof. Let c_m = [x^m] ∑_{n≥0} a_n (∑_{k≥0} b_k x^k)^n. We need to show that the c_m are well-defined. First, we note that [x^m] (∑_{k≥0} b_k x^k)^n is well-defined for any fixed n, since it is the coefficient of x^m in a finite product of formal power series. So if A(x) is a polynomial then c_m is obtained as a finite sum and hence A(B(x)) is a well-defined formal power series. To put it another way, if A(x) is a polynomial then A(B(x)) is a finite sum of formal power series.

Now let b_0 = 0. Then

[x^m] (∑_{k≥0} b_k x^k)^n = [x^m] x^n (∑_{k≥0} b_{k+1} x^k)^n = { [x^{m−n}] (∑_{k≥0} b_{k+1} x^k)^n   if m ≥ n
                                                                { 0                                    if m < n

So in order to compute c_m, we can ignore terms in A(x) with degree greater than m, and treat A(x) as if it were a polynomial of degree m. Again c_m is a finite sum, so A(B(x)) is well-defined.

For the converse, assume that A(x) is not a polynomial and that b_0 ≠ 0. Then we see that

c_0 = [x⁰] ∑_{n≥0} a_n (∑_{k≥0} b_k x^k)^n = ∑_{n≥0} a_n (b_0)^n.

It is possible for this series to converge. However, as an algebraic object, the question of convergence makes no sense. What we really require is that [x^m] A(B(x)) can be determined as an arithmetic process, i.e., a finite one.

Therefore A(B(x)) is well-defined if and only if A(x) is a polynomial or b0 = 0.
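The truncation idea in the proof (when b_0 = 0, terms of A beyond degree m cannot affect [x^m]) can be carried out mechanically. A sketch, with names of my own choosing:

```python
def fps_compose(A, B, N):
    """First N coefficients of A(B(x)), assuming b_0 = 0 (Lemma 1.5).
    Only a_0, ..., a_{N-1} can contribute, since B^n starts at degree n."""
    if len(B) > 0 and B[0] != 0:
        raise ValueError("need b_0 = 0 (or A a polynomial)")

    def mul(P, Q):  # product truncated at degree N-1
        return [sum(P[k] * Q[n - k] for k in range(n + 1)
                    if k < len(P) and n - k < len(Q)) for n in range(N)]

    result = [0] * N
    power = [1] + [0] * (N - 1)      # B(x)^0
    for n in range(N):
        a_n = A[n] if n < len(A) else 0
        result = [r + a_n * p for r, p in zip(result, power)]
        power = mul(power, B)        # now B(x)^(n+1)
    return result

# A(x) = 1/(1-x) = 1 + x + x^2 + ... composed with B(x) = 2x gives 1/(1-2x):
assert fps_compose([1] * 8, [0, 2], 8) == [2 ** n for n in range(8)]
```

For instance, composing 1/(1−x) with x + x² produces the coefficients of 1/(1 − x − x²), the Fibonacci-type series that reappears later in these notes.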

Note that Lemma 1.5 not only gives conditions for the composition to be well-defined, but gives an algorithm for computing it. A compositional inverse of a formal power series A(x) is a formal power series B(x) such that A(B(x)) = B(A(x)) = x.

Problem 1.6. Show that in order for A(B(x)) and B(A(x)) to both be well-defined, one of the following must hold.

• A and B are both polynomials

• A is a polynomial and a0 = 0

• B is a polynomial and b0 = 0

• a0 = b0 = 0

Further, show that if A(B(x)) and B(A(x)) are both well-defined and A(B(x)) = B(A(x)) = x then a0 = b0 = 0 and a1b1 = 1.

Lemma 1.7. Assume B(x) is a formal power series with b_0 = 0 and b_1 ≠ 0. Then there exists a unique A(x) such that A(B(x)) = x. Furthermore, a_0 = 0 and a_1 = 1/b_1 ≠ 0.

Proof. The composition is well-defined since b_0 = 0. We consider A(B(x)) = x as a system of equations of the form [x^n] A(B(x)) = [x^n] x, much as we did for Lemma 1.3. We write the equations by grouping terms according to the a_n, since these are the unknowns.

0 = a_0

1 = a_1 b_1
0 = a_1 b_2 + a_2 b_1²
0 = a_1 b_3 + a_2 (2 b_1 b_2) + a_3 b_1³
0 = a_1 b_4 + a_2 (2 b_1 b_3 + b_2²) + a_3 (3 b_1² b_2) + a_4 b_1⁴
. . .

The zeroth equation is straightforward. The first equation gives a_1 = 1/b_1. Then an induction gives that all further a_n are uniquely determined. So A(x) is uniquely determined.

As in Lemma 1.3, this is the method of back-substitution applied to a triangular linear system. The algorithm for finding the coefficients of A(x) might be termed a “computational recurrence”, in that it gives an in terms of all of the ak with k < n. It can’t be solved in the way a linear recurrence can, but it is straightforward to compute it (preferably by computer). Lemma 1.7 doesn’t seem to actually tell us when a formal power series has a compositional inverse. Except that it turns out that if A(B(x)) = x, then B(A(x)) = x also. We leave this fact as an interesting exercise.
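The “computational recurrence” can be sketched as follows (helper names are mine). As an illustration it computes the first terms of the compositional inverse of x + x²; one of the exercises below asks for exactly this computation.

```python
from fractions import Fraction

def fps_comp_inverse(B, N):
    """First N coefficients of A(x) with A(B(x)) = x, given b_0 = 0 and
    b_1 != 0, by back-substitution on the triangular system of Lemma 1.7."""
    b = [Fraction(B[n]) if n < len(B) else Fraction(0) for n in range(N)]
    assert b[0] == 0 and b[1] != 0
    # powers[k][m] = [x^m] B(x)^k, truncated at degree N-1
    powers = [[Fraction(0)] * N for _ in range(N)]
    powers[0][0] = Fraction(1)
    for k in range(1, N):
        for m in range(N):
            powers[k][m] = sum(powers[k - 1][m - j] * b[j]
                               for j in range(m + 1))
    a = [Fraction(0)] * N
    for n in range(1, N):
        # n-th equation: sum_{k=1}^{n} a_k [x^n] B^k = (1 if n == 1 else 0)
        target = 1 if n == 1 else 0
        a[n] = (target - sum(a[k] * powers[k][n]
                             for k in range(1, n))) / powers[n][n]
    return a

# The compositional inverse of x + x^2 starts x - x^2 + 2x^3 - 5x^4 + 14x^5:
assert fps_comp_inverse([0, 1, 1], 6) == [0, 1, -1, 2, -5, 14]
```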

Lemma 1.8. A formal power series A(x) has a compositional inverse if and only if a_0 = 0 and a_1 ≠ 0.

formal and analytic power series

We are interested in the set of formal power series, but we will frequently pretend that they are analytic functions. Why can we do this? Consider the set of all analytic power series, thought of as functions, and let φ be the mapping that takes an analytic power series to the corresponding formal power series.

a_0 + a_1 x + a_2 x² + ···   −φ→   a_0 + a_1 x + a_2 x² + ···

Although φ looks like the identity mapping, it isn’t. On the left, x is a variable, whose value is constrained by the region of convergence of the power series. On the right, x is an indeterminate which never takes any value at all and serves only to mark different coefficients ak. Of course φ is not in general invertible (many formal power series do not correspond to analytic power series), but if we consider the image of φ, then we see that all of the operations on formal power series that we have defined are exactly consistent with the corresponding operations on analytic power series.

Theorem 1.9 (meta-theorem). Let A, B, C, ··· be formal power series, and let f_A = φ^{−1}(A), f_B = φ^{−1}(B), f_C = φ^{−1}(C), etc., when these exist. Then valid operations on A, B, C, ··· are compatible with the corresponding operations on f_A, f_B, f_C, ··· .

The notation is supposed to evoke the fact that A is some generating function and f_A is the lexicographically identical expression viewed as a function. Of course not all generating functions A have an f_A. Valid operations include addition, subtraction, multiplication, division, differentiation, integration, composition, and evaluation. Of course division doesn’t always work, but Lemma 1.3 tells us when it does. Likewise for composition. Note that evaluation at a particular value for x corresponds to finding the value of a series.

If A(x) is a formal power series that we care about, and it happens that A(x) = φ(f(x)) for some analytic power series f(x), then we can choose to either operate on A(x), or to operate on f(x) and then take φ of the result. In many cases we don’t even need to know what the radius of convergence is, just that f(x) is analytic on some open region — we don’t even need to know what that region is!

For example, here is the operation “integration” applied to a formal power series, and to its image under φ^{−1}. The point is that when we consider the image under φ^{−1}, we are free to make use of any analytic property we want. In other words, this diagram (and others like it) commutes.

∑_{n≥0} x^n   −φ^{−1}→   1/(1−x),  |x| < 1

∫ ∑_{n≥0} x^n = ∑_{n≥0} x^{n+1}/(n+1)   −φ^{−1}→   ∫ 1/(1−x) dx = log(1/(1−x)),  |x| < 1

If we want to manipulate the formal power series ∑_{n≥0} x^{n+1}/(n+1), then we are free to manipulate the analytic function log(1/(1−x)), and then consider the analytic power series of the result.

In essence, we are making use of theorems like “the derivative of an analytic power series is the term-by-term derivative” and the like. Our formal power series operations are all lexicographically identical to the term-by-term operations on analytic power series, so as long as our formal power series corresponds to an analytic one we may go back and forth at will.

Example 1.10. Describe the OGF and EGF for the sequences a_n = n! and b_n = 2^n. Do they correspond to functions?

The OGF for a_n = n! is ∑_{n≥0} n! x^n, which is not an analytic power series. It does not correspond to any function. The EGF for a_n = n! is ∑_{n≥0} n! x^n/n! = 1/(1−x), which is a function.

For the sequence b_n = 2^n, n ≥ 0, the OGF and EGF are ∑_{n≥0} 2^n x^n = 1/(1−2x) and ∑_{n≥0} 2^n x^n/n! = exp(2x).

The symbol “=” is being used in two (very!) different ways. In particular, in each case the final equality should be seen in a different light. Equality of formal power series simply means that the coefficients are all equal. The fact that generating functions are sometimes “equal to” analytic functions is a bonus.

exercises

1. Write down some sequences whose OGF and/or EGF corresponds to an analytic function, and some that do not.
2. Is there a sequence whose OGF gives an analytic function but whose EGF does not?
3. Find the reciprocal of the formal power series 1 + x and ∑_{n≥0} x^n/n!.
4. Verify that x/(1−x) and x/(1+x) are compositional inverses as formal power series. You need to first verify that the composition is legal (in both directions!). Check that the composition gives x in both directions, first by substituting one closed form into the other, and next by expressing both as actual formal power series, and composing them.
5. Characterize all polynomials A(x) and B(x) that are compositional inverses. (hint: degree)
6. In the notes it was stated that the set of formal power series Q[[x]] is a ring. Prove this. If necessary, look up the definition of a ring.
7. Prove that if F(x) is a formal power series such that F′(x) = 0, then F(x) is a constant.
8. Consider the functional equation F′(x) = F(x). Show that for every f_0 there is a unique formal power series that satisfies the equation with [x⁰] F = f_0. Note that we are not presuming that F corresponds to an analytic power series, so calculus is useless here.
9. Consider the functional equation ∑_{j=0}^{t} α_j F^{(j)}(x) = 0, where F^{(j)} means the j-th derivative of F and the α_j are constants. Show that given any values f_0, f_1, ··· , f_{t−1} there is a unique formal power series F that satisfies the equation with [x^j] F = f_j for 0 ≤ j ≤ t − 1. Again, no calculus!
10. Show that if A(x) and B(x) are formal power series with b_0 = 0 and A(B(x)) = x, then a_0 = 0 and B(A(x)) = x.
11. “Find” the compositional inverse of x + x². By this is meant: write out the first few terms, and explain how you could go on computing them forever if you cared.
12. Consider the formal power series corresponding to e^x and log(x). Explain why these are not compositional inverses of each other. Show that the formal power series corresponding to e^x − 1 and log(x + 1) are compositional inverses of each other.

2. some examples: generating functions and counting

We are typically interested in counting the number of objects of some particular type according to their “size”. The “size” of an object is a deliberately vague term; in fact, we will often be interested in several different notions of “size”.

strings

Example 2.1. Let an be the number of 01-strings of length n. Find an, using generating functions.

We notice that a_0 = 1 and that a_n = 2a_{n−1} for n ≥ 1. This gives

∑_{n≥1} a_n x^n = ∑_{n≥1} 2a_{n−1} x^n
−a_0 + ∑_{n≥0} a_n x^n = x ∑_{n≥1} 2a_{n−1} x^{n−1}
−a_0 + ∑_{n≥0} a_n x^n = x ∑_{n≥0} 2a_n x^n
−1 + A(x) = 2x A(x)

Thus we see that A(x) = 1/(1−2x) = ∑_n 2^n x^n and therefore a_n = [x^n] A(x) = 2^n.

Of course, in this case we could have just noticed that a_n = 2^n right away. But this method will prove useful for situations that aren’t so simple.

Example 2.2. Let bn be the number of 01-strings of length n with no adjacent 0’s. Find bn, using generating functions.

It is easy to see b0 = 1, b1 = 2 and b2 = 3. Consider such a string of length n. If it ends with 1 then after deleting the final 1 we obtain a string of length n − 1 with no adjacent 0’s. Clearly all such strings of length n − 1 arise in this way so the number of strings of length n with no adjacent 0’s that end with 1 is the same as the number of strings of length n − 1 with no adjacent 0’s. If such a string of length n ends with 0 then it must end with 10. Removing the final 10 leaves a string of length n − 2, and all such arise in this way. So the number of strings of length n with no adjacent 0’s that end in 0 is the same as the number of strings of length n − 2 that have no adjacent 0’s.

Therefore for n ≥ 2, we have bn = bn−1 + bn−2.


Now we find the OGF B(x) = ∑_{n≥0} b_n x^n using the recursion.

b_n = b_{n−1} + b_{n−2}
∑_{n≥2} b_n x^n = ∑_{n≥2} b_{n−1} x^n + ∑_{n≥2} b_{n−2} x^n
∑_{n≥2} b_n x^n = x ∑_{n≥2} b_{n−1} x^{n−1} + x² ∑_{n≥2} b_{n−2} x^{n−2}
∑_{n≥2} b_n x^n = x ∑_{n≥1} b_n x^n + x² ∑_{n≥0} b_n x^n
B(x) − 1 − 2x = x (B(x) − 1) + x² B(x)

Alternatively, we just plug the recursion into the OGF directly.

B(x) = ∑_{n≥0} b_n x^n = 1 + 2x + ∑_{n≥2} b_n x^n = 1 + 2x + ∑_{n≥2} (b_{n−1} + b_{n−2}) x^n
     = 1 + 2x + x ∑_{n≥2} b_{n−1} x^{n−1} + x² ∑_{n≥2} b_{n−2} x^{n−2}
     = 1 + 2x + x ∑_{n≥1} b_n x^n + x² ∑_{n≥0} b_n x^n
     = 1 + 2x + x (B(x) − 1) + x² B(x)

Hence B(x) = (1 + x)/(1 − x − x²), or B(x) = (1 + x)/((1 − γx)(1 − γ̄x)) where γ = (1 + √5)/2 and γ̄ = (1 − √5)/2.

We use partial fraction decomposition:

(1 + x)/((1 − γx)(1 − γ̄x)) = a/(1 − γx) + b/(1 − γ̄x) = (γ²/(γ − γ̄)) · 1/(1 − γx) − (γ̄²/(γ − γ̄)) · 1/(1 − γ̄x).

Using geometric series, we have b_n = (1/√5) (((1 + √5)/2)^{n+2} − ((1 − √5)/2)^{n+2}).

Notice that a partial fraction decomposition gives the generating function as a sum of terms of the form α/(1 − rx), and since this corresponds to the geometric series ∑_{n≥0} α r^n x^n, we can easily extract coefficients. The above should convince you that in fact for any sequence determined by a linear recurrence (of fixed order!) we can easily derive the ordinary generating function.

Problem 2.3. Let a_n be a sequence satisfying 0 = ∑_{t=0}^{s} p_t a_{n+t} for n ≥ 0, where the p_t are given constants. Give a nice closed form expression for ∑_{n≥0} a_n x^n.
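As a quick numerical sanity check of the partial-fraction computation (floating point, so we compare with isclose), the closed form for b_n can be tested against the recurrence b_0 = 1, b_1 = 2, b_n = b_{n−1} + b_{n−2}:

```python
from math import sqrt, isclose

def b_rec(n):
    """b_n computed directly from the recurrence."""
    b = [1, 2]
    for _ in range(2, n + 1):
        b.append(b[-1] + b[-2])
    return b[n]

def b_closed(n):
    """b_n from the closed form obtained by partial fractions."""
    gamma, gamma_bar = (1 + sqrt(5)) / 2, (1 - sqrt(5)) / 2
    return (gamma ** (n + 2) - gamma_bar ** (n + 2)) / sqrt(5)

assert all(isclose(b_rec(n), b_closed(n)) for n in range(25))
```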

In fact we can handle more general recurrences too. As long as we can deal with the result of applying ∑_n (·) x^n to the expression, we can find a generating function.

Example 2.4. Suppose that a_n = 2a_{n−1} + n for n ≥ 1 with a_0 = 1. Find A(x) = ∑_{n≥0} a_n x^n.

From the recurrence we deduce that A(x) − 1 = 2xA(x) + ∑_{n≥1} n x^n. We have two ways (at least) of dealing with the un-simplified part. We can observe that the coefficient of x^n in (1 + x + x² + ···)² is n + 1, which leads to the following.

∑_{n≥1} n x^n = ∑_{n≥0} n x^n = x(1 + x + x² + ···)² = x/(1 − x)²

Or we can see the n as the consequence of a derivative.

∑_{n≥1} n x^n = x · d/dx ∑_{n≥0} x^n = x · d/dx 1/(1 − x) = x/(1 − x)²

In either case we obtain the following.

A(x) = 1/(1 − 2x) + x/((1 − 2x)(1 − x)²)

Since our next step would be to expand in terms of partial fractions and extract coefficients, it seems pointless to add the fractions. Note that already we can see that a_n is the sum of 2^n and “something”, which is perhaps not surprising (or maybe it is?) given the initial recurrence.
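A quick coefficientwise check (a throwaway sketch, not part of the derivation) that this A(x) really does satisfy the original recurrence a_0 = 1, a_n = 2a_{n−1} + n: expand each factor as a known series and convolve.

```python
N = 12
geom2 = [2 ** n for n in range(N)]        # coefficients of 1/(1-2x)
inv_sq = [n + 1 for n in range(N)]        # coefficients of 1/(1-x)^2

def mul(P, Q):
    """Truncated Cauchy product."""
    return [sum(P[k] * Q[n - k] for k in range(n + 1)) for n in range(N)]

tail = mul(geom2, inv_sq)                 # 1/((1-2x)(1-x)^2)
# A(x) = 1/(1-2x) + x * tail, so a_n = 2^n + tail[n-1] for n >= 1:
a = [geom2[n] + (tail[n - 1] if n >= 1 else 0) for n in range(N)]

assert a[0] == 1
assert all(a[n] == 2 * a[n - 1] + n for n in range(1, N))
```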

derangements

We use the notation [n] = {1, . . . , n}. A derangement of [n] is a permutation σ : [n] → [n] with no fixed points; that is, σ(i) ≠ i for all i ∈ [n]. Let d_n be the number of derangements of [n].

Lemma 2.5. For n ≥ 1, we have d_{n+1} = n(d_n + d_{n−1}).

Proof. Let σ be a derangement of [n + 1], and let j = σ^{−1}(n + 1). If σ(n + 1) = j, then σ induces a derangement of the set [n + 1] \ {j, n + 1}. By relabelling this set, we see that σ corresponds to a unique derangement of [n − 1]. This gives a total of d_{n−1} derangements that swap j and n + 1. There are n choices for j, so this gives n d_{n−1} derangements of [n + 1] that swap n + 1 with something.

If σ(n + 1) ≠ j, then construct a derangement σ′ on [n] according to the following rule: σ′(i) = σ(i) for i ≠ j and σ′(j) = σ(n + 1). We leave it as an exercise to show that this gives a bijection between the derangements of [n + 1] that map j to n + 1 but not n + 1 to j, and the derangements of [n]. So the number of such derangements of [n + 1] is n d_n (again, n choices for j).
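Lemma 2.5 is easy to check by brute force for small n (an illustration only; the enumeration is factorial in n):

```python
from itertools import permutations

def d(n):
    """Count derangements of [n] by direct enumeration."""
    return sum(all(s[i] != i for i in range(n))
               for s in permutations(range(n)))

assert [d(n) for n in range(7)] == [1, 0, 1, 2, 9, 44, 265]
# Lemma 2.5: d_{n+1} = n (d_n + d_{n-1})
assert all(d(n + 1) == n * (d(n) + d(n - 1)) for n in range(1, 6))
```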

We will derive the EGF for derangements, so let D(x) = ∑_{n≥0} d_n x^n/n!. By Lemma 2.5 we have

d_{n+1} = n d_n + n d_{n−1}

(d_{n+1}/n!) x^n = (n d_n/n!) x^n + (n d_{n−1}/n!) x^n

∑_{n≥1} (d_{n+1}/n!) x^n = ∑_{n≥1} (n d_n/n!) x^n + ∑_{n≥1} (n d_{n−1}/n!) x^n

Why is the sum over n ≥ 1? Why can’t we take n ≥ 0? We now try to identify each term as a function of D.

∑_{n≥1} (d_{n+1}/n!) x^n = ∑_{n≥1} d/dx [d_{n+1} x^{n+1}/(n + 1)!] = d/dx ∑_{n≥1} d_{n+1} x^{n+1}/(n + 1)! = d/dx ∑_{n≥2} d_n x^n/n! = D′(x)

∑_{n≥1} (n d_n/n!) x^n = x ∑_{n≥1} d/dx [d_n x^n/n!] = x · d/dx ∑_{n≥1} d_n x^n/n! = x D′(x)

∑_{n≥1} (n d_{n−1}/n!) x^n = x ∑_{n≥1} (d_{n−1}/(n − 1)!) x^{n−1} = x ∑_{n≥0} (d_n/n!) x^n = x D(x)

Continuing with the example, we have a differential equation D′(x) = x D′(x) + x D(x).

D′(x)/D(x) = x/(1 − x)
d/dx log D(x) = x/(1 − x)
log D(x) = ∫ x/(1 − x) dx = log(1/(1 − x)) − x
D(x) = e^{−x}/(1 − x)

There are a number of things we can do with D(x). Notably, these are operations on the formal power series itself, but they give us information about the coefficients. The point being that even if the sequence is what interests us, it might be profitable to study the generating function.

• We can get an exact formula for d_n as

d_n = [x^n/n!] D(x) = n! [x^n] e^{−x} · 1/(1 − x) = n! ∑_{k=0}^{n} [x^k] e^{−x} · [x^{n−k}] 1/(1 − x) = n! ∑_{k=0}^{n} (−1)^k/k!

• We can also use the EGF to get a nice identity in terms of the d_n. First rewrite as

e^x ∑_{n≥0} d_n x^n/n! = 1/(1 − x)

Taking [x^n] of both sides we get ∑_{k=0}^{n} (1/(n − k)!) (d_k/k!) = 1, or nicer as ∑_{k=0}^{n} C(n, k) d_k = n!.

• We can also use the EGF to get another recursion for the sequence d_n. First rewrite as

(1 − x) ∑_{n≥0} d_n x^n/n! = e^{−x}

Taking [x^n] of both sides we get d_n/n! − d_{n−1}/(n − 1)! = (−1)^n/n!, or nicer as d_n = n d_{n−1} + (−1)^n.
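All three consequences of D(x) = e^{−x}/(1 − x) can be cross-checked in a few lines (exact integer arithmetic, since n!/k! is an integer for k ≤ n):

```python
from math import comb, factorial

def d_exact(n):
    """The exact formula: d_n = n! * sum_{k=0}^{n} (-1)^k / k!."""
    return sum((-1) ** k * (factorial(n) // factorial(k))
               for k in range(n + 1))

N = 10
d = [d_exact(n) for n in range(N)]

# The identity sum_k C(n, k) d_k = n!:
assert all(sum(comb(n, k) * d[k] for k in range(n + 1)) == factorial(n)
           for n in range(N))
# The short recursion d_n = n d_{n-1} + (-1)^n:
assert all(d[n] == n * d[n - 1] + (-1) ** n for n in range(1, N))
```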

direct approach to generating functions : partition the set of objects

Instead of thinking of a generating function as a formal sum of terms a_n x^n over all possible sizes n, we can think of it as a sum of terms x^{|α|}, one term for each object α, so A(x) = ∑_{α∈A} x^{|α|}. Previously, we thought of a generating function as a formal way of writing a sequence; now we think of it as a formal way of writing the set of all objects (well, a set that is in bijective correspondence with the set of all objects). Thus the GF of a disjoint union is the sum of the GFs.

This suggests that instead of trying to write a recurrence for the sequence an, we should write the set of all objects as a disjoint union, where we can easily determine the GF for each part. Example 2.6. Let A be the set of all 01-strings. Find the ordinary generating function for A.

For convenience, let ε denote the empty string (i.e., the string of length zero). Also, if σ is a string and S is a set of strings, we will use the notation σS to denote the set of all strings obtained by concatenating σ with some string in S. This is just a special case of the cartesian product of two sets: σS = {σ} × S. We can write A as a disjoint union as follows, which then translates directly

into a functional equation for the GF.

A = ε ∪̇ 0A ∪̇ 1A
A(x) = 1 + xA(x) + xA(x) = 1 + 2xA(x)

Solving, we obtain A(x) = 1/(1 − 2x).

Example 2.7. Let B denote the set of all 01-strings with no adjacent 0’s. Find the ordinary gener- ating function for B.

We have the following disjoint union, and the resulting functional equation for the GF.

B = ε ∪̇ 1B ∪̇ 0 ∪̇ 01B
B(x) = 1 + xB(x) + x + x²B(x)

Solving this gives, as before, B(x) = (1 + x)/(1 − x − x²).

Example 2.8. Let B denote the set of all 01-strings with no adjacent 0’s. Find the ordinary gener- ating function for B.

Let b_{n,k} be the number of 01-strings of length n with k 1’s and with no adjacent 0’s. This is an example where each object has two “sizes”: the length and the number of 1’s. This means we will have two indeterminates: x for length and y for weight (number of 1’s). So we will have B(x, y) = ∑_n ∑_k b_{n,k} x^n y^k, and we want [x^n y^k] B(x, y).

Let B denote the set of all 01-strings with no adjacent 0’s. We have the same disjoint union, but now we have a different GF, so the functional equation is a little different.

B = ε ∪̇ 1B ∪̇ 0 ∪̇ 01B
B(x, y) = 1 + xyB(x, y) + x + x²yB(x, y)

Hence we have the generating function

B(x, y) = (1 + x)/(1 − (x + x²)y) = (1 + x) ∑_{k≥0} (x + x²)^k y^k = ∑_{k≥0} x^k (1 + x)^{k+1} y^k

This gives the coefficients:

b_{n,k} = [x^n y^k] B(x, y) = [x^{n−k}] (1 + x)^{k+1} = C(k + 1, n − k)

Notice that we have

b_n = ∑_{k=0}^{n} b_{n,k} = ∑_{k=0}^{n} C(k + 1, n − k)

Compare this with the expression for b_n that we obtained in Example 2.2.
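The formula b_{n,k} = C(k+1, n−k) can be checked against a brute-force count for small n (a throwaway check, not part of the notes):

```python
from math import comb
from itertools import product

def brute(n, k):
    """Number of 01-strings of length n, with k 1's and no adjacent 0's."""
    return sum(1 for s in product("01", repeat=n)
               if s.count("1") == k and "00" not in "".join(s))

for n in range(9):
    for k in range(n + 1):
        assert brute(n, k) == comb(k + 1, n - k)
```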

The previous two examples can be done with a recursion on the coefficients; here we would have a system of two recursions in two indices. The direct approach bypasses this completely: the generating function does the work for us! Sometimes one needs to be clever in partitioning the set of objects in a nice way. Here “nice” means that we can easily relate the generating function of each of the parts to the generating function of the whole thing and/or something known.

Example 2.9. Let s_{n,k} be the number of 01-strings of length n with k occurrences of 001. Find the ordinary generating function.

Firstly, we note that the set of objects is just the set of all 01-strings. But because we care about occurrences of 001, we need to in some way make them stand out. In particular, if a string ends in 0, then that 0 is not involved in an occurrence of 001, but if it ends in a 1 then it may or may not be. So we need to further partition the strings ending in 1 into those that end in 001 and those that don’t.

S = ε ∪̇ S0 ∪̇ (S1 \ S001) ∪̇ S001

The rest of the details are left as an exercise (see below).

a quick “application”

Suppose we wish to know the average number of 1’s in a 01-string with no adjacent 0’s. The answer (using previous notation) is given by the following expression.

μ_n = (1/b_n) ∑_k k b_{n,k}

Why do we not need to worry about the right-hand side converging? Using the OGF directly we see that:

X X n k B(x, y) = bn,kx y n k d X X B(x, y) = kb xnyk−1 dy n,k n k

d X X n B(x, y) = kbn,kx dy y=1 n k

n d X [x ] B(x, y) = kbn,k dy y=1 k The summations are over “all n and k”. What are the ranges, precisely? Is the evaluation at y = 1 really valid? Since we already found a closed form for B(x, y), we can evaluate this sum directly by applying the same operations to the closed form.

$$\mu_n = \frac{1}{b_n}\,[x^n] \left.\frac{d}{dy}\, \frac{1+x}{1-(x+x^2)y}\right|_{y=1} \qquad (1)$$
The last step is left as an exercise.

Problem 2.10. Find a closed-form expression for $\mu_n$. That is, evaluate equation (1).

You will discover in the previous exercise some messy-looking partial fractions. Here is one way of cutting the work in half. If you have an irreducible rational quadratic with irrational roots, then those roots come in “conjugate pairs”. What that means is that the roots lie in an extension field of Q that has an automorphism that fixes Q and swaps the two roots. This means that if you take any equation in the extension field (for instance, a partial fraction decomposition) then you can “conjugate” that equation and it remains valid. An example of this that you’ve already seen: if a real polynomial has complex roots, then these come in complex conjugate pairs.
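While working Problem 2.10, it is easy to compute $\mu_n$ exactly by brute force and compare with whatever closed form you obtain; the helper name below is ours.

```python
from itertools import product
from fractions import Fraction

def mu(n):
    """Exact average number of 1's over all 01-strings of length n with no adjacent 0's."""
    strings = ["".join(s) for s in product("01", repeat=n) if "00" not in "".join(s)]
    return Fraction(sum(s.count("1") for s in strings), len(strings))

# for n = 2 the strings are 01, 10, 11, so the average is 4/3
assert mu(2) == Fraction(4, 3)
```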

exercises

1. Considering the OGF $A(x)$ in Example 2.1, let $B(x) = 1 + xA(x)$. Is $B(x)$ an OGF? Does it count anything?
2. Try to find the EGF for the sequences in Example 2.1 and Example 2.2. What goes wrong? Can you find a way around it?

3. Find an expression for $a_n$ in Example 2.4. That is, use a partial fraction decomposition of the generating function found there and extract coefficients.
4. Consider the sequence $(b_n)_{n\ge0}$, where $b_n = 2b_{n-1} - b_{n-2} + n^2 + 3^n$ for $n \ge 2$.
a) Give the OGF, in terms of $b_0$ and $b_1$.
b) Give the OGF as a partial fraction decomposition. You don't need to find the constants in the numerator, but you should have the right form.
c) Using your partial fraction decomposition, give an expression for $b_n$ in terms of $n$. Your answer will be in terms of the unknown constants of your partial fraction decomposition.
d) For $b_0 = b_1 = 1$, find the constants in your partial fraction decomposition, and find an expression for $b_n$.
5. Give the derangements of $\{1, 2, 3, 4\}$, according to the classification suggested by Lemma 2.5.
6. In deriving the equation $D'(x) = xD'(x) + xD(x)$, we twice wrote $D'(x)$ when what we had was the derivative of a function that was not $D(x)$. Explain this.

The next two questions are not meant to suggest that GFs are superfluous. Using GFs, we did not just prove these relations, we derived them. Also sometimes direct proofs are much harder, or even not known at all.

7. Prove by a direct counting argument (without GFs) that $\sum_k \binom{n}{k} d_k = n!$, where $d_k$ is the number of derangements on $[k]$.
8. Prove by a direct counting argument (without GFs) that the number of 01-strings of length $n$ with no adjacent 0's is $\sum_k \binom{k+1}{n-k}$.
9. (⋆) We found that the number of 01-strings of length $n$ with no consecutive 0's is given by the following.

$$\frac{1}{\sqrt5}\left(\left(\frac{1+\sqrt5}{2}\right)^{n+2} - \left(\frac{1-\sqrt5}{2}\right)^{n+2}\right) = \frac{1}{2^{n+2}\sqrt5}\left((1+\sqrt5)^{n+2} - (1-\sqrt5)^{n+2}\right)$$

Using the fact that $(1+x)^m = \sum_k \binom{m}{k} x^k$, "simplify" this to show that the number of 01-strings of length $n$ with no consecutive 0's is given by the following.

$$\frac{1}{2^{n+1}} \sum_{j=0}^{\lfloor (n+1)/2 \rfloor} \binom{n+2}{2j+1}\, 5^j$$

Conclude that $\displaystyle\sum_{k=\lfloor n/2\rfloor}^{n} \binom{k+1}{n-k} = \frac{1}{2^{n+1}} \sum_{j=0}^{\lfloor (n+1)/2 \rfloor} \binom{n+2}{2j+1}\, 5^j$.
10. Let $D(x) = \sum_n d_n x^n$ be the OGF for derangements. The point of this exercise is that one can use the generating function approach without even finding the generating function.
a) Using the recurrence $d_{n+1} = nd_n + nd_{n-1}$, obtain a functional equation for $D(x)$.
b) By rearranging this equation and extracting coefficients, prove that $d_n = nd_{n-1} + (-1)^n$.

11. Consider the GF $A(x,y)$ above for 01-strings with no consecutive zeros. In extracting the coefficients of $A(x,y)$ in Example 2.8 we did not use partial fractions. Could we have used the same method when we considered these strings in Example 2.2?
12. As part of finding the average number of 1's in a 01-string with no adjacent 0's we evaluated a generating function at $y = 1$. The expression $\frac{d}{dy}B(x,y)$ above is certainly a bivariate generating function. Why does evaluating this at $y = 1$ result in a well-defined single-variable generating function? (hint: look at $[x^n]\,\frac{d}{dy}B(x,y)$)

13. Let $c_{n,t}$ be the number of 01-strings of length $n$ with at most $t$ consecutive 0's. Find the OGF for $c_{n,t}$. For $t = 1$ verify that this is consistent with Example 2.2. For $t = 0$, extract coefficients to find $c_{n,0}$ and verify that it makes sense.

14. Let $c_{n,k,t}$ be the number of 01-strings of length $n$ with $k$ 1's and at most $t$ consecutive 0's. Find the (bivariate) OGF for $c_{n,k,t}$. For $t = 1$ verify that this is consistent with Example 2.8. For $t = 0$, extract coefficients to find $c_{n,k,0}$ and verify that it makes sense.
15. Find the average number of 1's in a 01-string with at most $t$ consecutive 0's.

16. Let $s_{n,k}$ be the number of strings of length $n$ with $k$ occurrences of 001. Let $S(x,y) = \sum_{n\ge0}\sum_{k\ge0} s_{n,k}\, x^n y^k$.
a) Without calculating $S(x,y)$, what should $S(x,1)$ be? Why should evaluating this at $y = 1$ make sense?
b) Find a simple closed form for $S(x,y)$. Your answer should be a rational function. (hint: partition the set of objects)

17. Let $s_{n,k,t}$ be the number of strings of length $n$ with $k$ occurrences of $t$ 0's followed by a 1, for some positive integer $t$. So $t = 2$ corresponds to the previous question. Let $S(x,y) = \sum_{n\ge0}\sum_{k\ge0} s_{n,k,t}\, x^n y^k$.
a) For $t = 1$, give another interpretation of $k$.
b) Find a simple closed form for $S(x,y)$.
c) What should happen for $t = 0$? Does it?

3. set partitions

many generating functions for one sequence

Consider the sequence $\binom{n}{k}$, by which symbol we will mean the number of $k$-subsets of an $n$-set. There are three different OGFs that we could reasonably associate with this sequence.
$$A_n(y) = \sum_k \binom{n}{k} y^k \qquad B_k(x) = \sum_n \binom{n}{k} x^n \qquad C(x,y) = \sum_n \sum_k \binom{n}{k} x^n y^k$$

For the moment, the only reason to write $A_n(y)$ instead of $A_n(x)$ is to match nicely with $C(x,y)$.
Problem 3.1. Prove that $\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$. For which values of $n$ and $k$ is this recursion valid? Recall that binomial numbers are by definition expressions that count subsets; we do not yet know any "formula" for them.¹

Problem 3.2. By taking $\sum_k (\cdot)\, y^k$ of the recurrence you proved in the previous problem, derive a functional equation for $A_n(y)$. Solve this to give an explicit formula for $A_n(y)$. Remember to only apply the recursion for values of $n$ and $k$ for which it is valid.

Of course we know what the answer is: $A_n(y) = (1+y)^n$. But the method outlined in the previous problems will be useful.
We can use $A_n(y)$ to find $C(x,y)$. This is because $A_n(y) = [x^n]\, C(x,y)$.
$$\sum_n \sum_k \binom{n}{k} x^n y^k = \sum_n x^n \sum_k \binom{n}{k} y^k = \sum_n x^n A_n(y) = \sum_n x^n (1+y)^n = \frac{1}{1-x(1+y)}$$

Now we can find $B_k(x)$ from $C(x,y)$.
$$B_k(x) = [y^k]\, C(x,y) = [y^k]\, \frac{1}{1-x(1+y)} = [y^k]\, \frac{1}{1-x}\cdot\frac{1}{1-\frac{x}{1-x}y} = \frac{1}{1-x}\left(\frac{x}{1-x}\right)^k = \frac{x^k}{(1-x)^{k+1}}$$

It is useful to start with $B_{k-1}(x)$ and manipulate things slightly.
$$B_{k-1}(x) = \sum_n \binom{n}{k-1} x^n = \frac{x^{k-1}}{(1-x)^k}$$
$$x^{-k+1}B_{k-1}(x) = \sum_n \binom{n}{k-1} x^{n-k+1} = \frac{1}{(1-x)^k}$$
$$x^{-k+1}B_{k-1}(x) = \sum_n \binom{n+k-1}{k-1} x^n = \frac{1}{(1-x)^k}$$
Are you sure you understood the last step?

For convenience we notice what our knowledge of $A_n(y)$ and $B_k(x)$ (and modified $B_{k-1}(x)$) gives us in terms of coefficients.

¹ Well, we are pretending that we don't!

Proposition 3.3. We have the following generating functions for binomial coefficients.
$$\sum_k \binom{n}{k} y^k = (1+y)^n \quad\longleftrightarrow\quad [y^k]\,(1+y)^n = \binom{n}{k}$$
$$\sum_n \binom{n}{k} x^n = \frac{x^k}{(1-x)^{k+1}} \quad\longleftrightarrow\quad [x^n]\,\frac{x^k}{(1-x)^{k+1}} = \binom{n}{k}$$
$$\sum_n \binom{n+k-1}{k-1} x^n = \frac{1}{(1-x)^k} \quad\longleftrightarrow\quad [x^n]\,\frac{1}{(1-x)^k} = \binom{n+k-1}{k-1}$$
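Proposition 3.3 can also be checked numerically by expanding $1/(1-x)^k$ as a truncated series; multiplying by $1/(1-x)$ is just taking prefix sums of the coefficient list. This sketch (function name ours) checks the second and third identities.

```python
from math import comb

def inv_pow_series(k, N):
    """Coefficients of 1/(1-x)^k up to x^N: start from the series 1 and take
    prefix sums k times (each prefix sum multiplies by 1/(1-x))."""
    c = [1] + [0] * N
    for _ in range(k):
        for i in range(1, N + 1):
            c[i] += c[i - 1]
    return c

N, k = 12, 4
c = inv_pow_series(k, N)
assert all(c[n] == comb(n + k - 1, k - 1) for n in range(N + 1))   # third identity
c2 = inv_pow_series(k + 1, N)                                      # for x^k/(1-x)^{k+1}
assert all(c2[n - k] == comb(n, k) for n in range(k, N + 1))       # second identity
```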

You might have noticed that we've stopped putting explicit limits on the summations. In such cases, we might think of this as meaning that the limits of summation are "obvious". A better point of view is to say that there are no limits, so $\sum_j f(j)$ means $\sum_{j=-\infty}^{\infty} f(j)$. This only makes sense if the meaning of $f(j)$ is such that it is only non-zero where it is supposed to be. This typically causes no difficulties and sometimes simplifies things.

Problem 3.4. Verify that the unconstrained indices of summation above give what they should. As part of this, verify that the last step obtained above in the simplification of $B_{k-1}(x)$ is correct.

set partitions

A set partition of a set $S$ is a collection of pairwise disjoint, nonempty subsets of $S$ such that their union is $S$. The subsets of the partition are sometimes called classes. For convenience, we use $[n]$ to denote the set $\{1, 2, \ldots, n\}$. The number of set partitions of $[n]$ into $k$ classes is denoted by ${n \brace k}$ and is called the Stirling number of the second kind.
Learning from our experience with Problem 3.1, it is convenient to define ${n \brace k}$ for all integers $n, k$. From the definition it is evident that ${n \brace k} = 0$ if either $k < 0$ or $k > n > 0$ or $n > k = 0$, and also that ${0 \brace 0} = 1$. We also set ${n \brace k} = 0$ when $n < 0$ or $k < 0$ (whether this follows from the definition or whether it is a notational convention is a metaphysical question that we will leave unresolved).

Lemma 3.5. For $(n,k) \ne (0,0)$ we have ${n \brace k} = {n-1 \brace k-1} + k\,{n-1 \brace k}$.

Proof. We prove this in the case where $n \ge k > 0$, leaving the other cases as an exercise.
If $\{n\}$ is one class of the partition, then removing this class gives a partition of $[n-1]$ into $k-1$ classes. So there are ${n-1 \brace k-1}$ partitions with the singleton class $\{n\}$.
If $\{n\}$ is not a class of the partition, then $n$ appears in a class with something else. So removing $n$ gives a partition of $[n-1]$ into $k$ classes. This defines a surjective $k$-to-1 mapping from the partitions of $[n]$ into $k$ classes where $\{n\}$ is not a class to the partitions of $[n-1]$ into $k$ classes. So there are $k\,{n-1 \brace k}$ partitions with $n$ not appearing as a singleton.

Problem 3.6. Verify that the mapping in the above proof is surjective, and is (exactly) k-to-1.
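The recursion of Lemma 3.5, together with the boundary conventions above, already determines every value; a memoized Python sketch (the function name is ours) makes this concrete.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling numbers of the second kind via the recursion of Lemma 3.5,
    with the boundary conventions adopted in the text."""
    if n < 0 or k < 0:
        return 0
    if n == 0:
        return 1 if k == 0 else 0
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)

# sanity checks against small values counted by hand
assert stirling2(4, 2) == 7 and stirling2(5, 3) == 25
```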

Now we use this recursion to find the generating function for the Stirling numbers of the second kind. You should notice that in principle, this is much like using the well-known analogue to Lemma 3.5 for binomial coefficients to find the generating function for the binomial numbers (a.k.a. the Binomial Theorem). Of course you did notice this in Problem 3.1 and Problem 3.2.

We have three options for the OGF.
$$A_n(y) = \sum_k {n \brace k} y^k \qquad B_k(x) = \sum_n {n \brace k} x^n \qquad C(x,y) = \sum_n \sum_k {n \brace k} x^n y^k$$

set partitions with k classes

We'll start with $B_k(x)$. This generating function answers the question "for $k = 19$, how many set partitions of $[n]$ are there with $k$ classes", where 19 is an arbitrary non-negative integer. More precisely, the number of set partitions of $[n]$ with $k$ classes is exactly $[x^n]\, B_k(x)$.

Problem 3.7. Write down $B_k(x)$ explicitly, for $k = 0, 1, 2$. You should obtain "nice" expressions for the coefficients, and maybe even the generating function as well. Are they still "nice" for $k > 2$?

In order to find $B_k(x)$, we multiply the recursion of Lemma 3.5 by $x^n$ and sum over $n$ to get:
$$\sum_n {n \brace k} x^n = \sum_n {n-1 \brace k-1} x^n + k \sum_n {n-1 \brace k} x^n$$

Now we identify each term in terms of $B_k(x)$. Recall that $k$ is a parameter, meaning a fixed but unknown number, as opposed to an index of summation like $n$ or an indeterminate like $x$.
$$\sum_n {n \brace k} x^n = B_k(x)$$
$$\sum_n {n-1 \brace k-1} x^n = x \sum_n {n-1 \brace k-1} x^{n-1} = x \sum_n {n \brace k-1} x^n = xB_{k-1}(x)$$
$$\sum_n k\,{n-1 \brace k} x^n = kx \sum_n {n-1 \brace k} x^{n-1} = kx \sum_n {n \brace k} x^n = kxB_k(x)$$
Notice that it was convenient to take the sum over all $n$. We can do this because our conventions for ${n \brace k}$ make the recursion valid for all $n$ (really?). Now we have the functional equation $B_k(x) = xB_{k-1}(x) + kxB_k(x)$. This is equivalent to $B_k(x) = \frac{x}{1-kx} B_{k-1}(x)$. Applying this recursively we get
$$B_k(x) = \frac{x^k}{(1-kx)\cdots(1-2x)(1-x)}\, B_0(x) = \frac{x^k}{(1-kx)\cdots(1-2x)(1-x)}$$
Notice that we had to assume that $k > 0$ throughout (we didn't say so explicitly). This is because Lemma 3.5 is not valid when $n = k = 0$, and since we want to include $n = 0$ we have to exclude $k = 0$. Also, we used the "nice" expression for $B_0(x)$ from Problem 3.7 to get the final form.
Problem 3.8. Based on Problem 3.7, make a conjecture about what the smallest power term in $B_k(x)$ is. Prove this conjecture using the closed form for $B_k(x)$ that we just derived. Prove this by directly determining the smallest $n$ for which ${n \brace k} > 0$. Also, give a nice combinatorial justification.
Problem 3.9. Using the closed form for $B_k(x)$, write down $B_k(x)$ explicitly for $k = 0, 1, 2, 3$. Compare with Problem 3.7.

Returning to the matter at hand, we have $B_k(x)$ as a finite product (with a pre-factored denominator!), so we can use partial fractions to write it as a sum of geometric series.
$$\frac{1}{(1-x)\cdots(1-kx)} = \sum_{j=1}^k \frac{\alpha_j}{1-jx}$$
Multiply both sides by $(1-tx)$ for some $1 \le t \le k$ and then substitute $x = \frac{1}{t}$. This gives
$$\alpha_t = \frac{(-1)^{k-t}\,t^{k-1}}{(t-1)!\,(k-t)!}$$

This gives us a form for $B_k(x)$ that is easy to work with (i.e. extract coefficients).
$$B_k(x) = x^k \prod_{j=1}^k \frac{1}{1-jx} = x^k \sum_{j=1}^k \alpha_j\,\frac{1}{1-jx}.$$
Now we can extract coefficients directly, using geometric series:

$${n \brace k} = [x^n]\, B_k(x) = \sum_{j=1}^k \alpha_j\, [x^{n-k}]\, \frac{1}{1-jx} = \sum_{j=1}^k \alpha_j\, j^{n-k} = \sum_{j=1}^k \frac{(-1)^{k-j}\, j^{k-1}}{(j-1)!\,(k-j)!}\, j^{n-k} = \frac{1}{k!} \sum_{j=1}^k (-1)^{k-j} \binom{k}{j} j^n$$
Problem 3.10. Notice that the final expression shows that ${n \brace k}$ can be written as $\frac{1}{k!}$ times some integer expression. What does this expression count?
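The alternating-sum formula is easy to test; the sketch below (names ours) checks it against a few known values and against the Bell number $b_5 = 52$. As derived, it is valid for $n, k \ge 1$.

```python
from math import comb, factorial

def stirling2_formula(n, k):
    """(1/k!) * sum_{j=1}^{k} (-1)^{k-j} C(k,j) j^n, as derived above (n, k >= 1)."""
    total = sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(1, k + 1))
    return total // factorial(k)

assert stirling2_formula(4, 2) == 7 and stirling2_formula(5, 3) == 25
assert sum(stirling2_formula(5, k) for k in range(1, 6)) == 52   # the Bell number b_5
```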

set partitions on n elements

Now let's consider $A_n(y)$. This answers the question "for $n = 23$, how many set partitions of $[n]$ are there with $k$ classes", where 23 is an arbitrary non-negative integer. More precisely, the number of set partitions of $[n]$ with $k$ classes is exactly $[y^k]\, A_n(y)$. Compare this with the analogous statement for $B_k(x)$.

Problem 3.11. Write down $A_n(y)$ explicitly, for $n = 0, 1, 2, 3$. Are the expressions "nice" in some way? How is this different from Problem 3.7? How are they related?

To find $A_n(y)$, we again start with the recursion of Lemma 3.5 and obtain the following.
$$\sum_k {n \brace k} y^k = \sum_k {n-1 \brace k-1} y^k + \sum_k k\,{n-1 \brace k} y^k$$
This gives the following functional equation.
$$A_n(y) = yA_{n-1}(y) + y\,\frac{d}{dy}A_{n-1}(y) = y\left(1 + \frac{d}{dy}\right) A_{n-1}(y)$$

We apply this recursively to write $A_n(y)$ in terms of $A_0(y)$. We can't use this for $n = 0$ because we summed over all $k$ and the recursion fails when $n = k = 0$.
$$A_n(y) = \left(y\left(1 + \frac{d}{dy}\right)\right)^n A_0(y) = \left(y\left(1 + \frac{d}{dy}\right)\right)^n 1$$
(Recall that you already showed that $A_0(y) = 1$.) We can't use this to get a formula for ${n \brace k}$ in any simple way, but we can use this to show the sequence is unimodal in $k$ (see Wilf for details).
Problem 3.12. Based on Problem 3.11, make a conjecture about what the smallest and largest power terms in $A_n(y)$ are. Prove this conjecture by induction using the functional equation for $A_n(y)$ that we just derived. Prove this by directly determining the smallest and the largest values of $k$ for which ${n \brace k} > 0$.

Problem 3.13. Using the functional equation for $A_n(y)$, write down $A_n(y)$ explicitly for $n = 0, 1, 2, 3$. Compare with Problem 3.11.

EGF for set partitions

A set partition is a labelled object (what would the “unlabelled” version look like?), so we might have suspected that an EGF is more appropriate.

$$\sum_{n\ge0} {n \brace k} \frac{x^n}{n!} = \sum_{n\ge0} \frac{1}{k!} \sum_{j=1}^k (-1)^{k-j} \binom{k}{j} j^n\, \frac{x^n}{n!} = \frac{1}{k!} \sum_j (-1)^{k-j} \binom{k}{j} \sum_{n\ge0} \frac{(jx)^n}{n!} = \frac{1}{k!} \sum_j (-1)^{k-j} \binom{k}{j} e^{jx} = \frac{1}{k!}(e^x - 1)^k$$

This is a more "compact" generating function than $B_k(x)$, but there are other reasons for looking at the EGF. Consider the following bivariate generating function (part exponential, part ordinary).
$$G(x,y) = \sum_{k\ge0} \sum_{n\ge0} {n \brace k} y^k\, \frac{x^n}{n!} = \sum_{k\ge0} \frac{1}{k!}(e^x - 1)^k y^k = e^{y(e^x-1)}$$
Let $b_n = \sum_k {n \brace k}$. These are the Bell numbers, which count the number of ways of partitioning $[n]$ into any number of classes. We have an explicit formula for these, in the sense that we have an explicit formula for ${n \brace k}$ and the sum is really only for $1 \le k \le n$. However, we notice that setting $y = 1$ in $G(x,y)$ gives exactly the exponential generating function for the Bell numbers.
$$F(x) = G(x,1) = \sum_{n\ge0} b_n\, \frac{x^n}{n!} = e^{e^x - 1}$$
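The single-$k$ identity $\sum_n {n \brace k}\, x^n/n! = (e^x-1)^k/k!$ can be checked with truncated series arithmetic over exact rationals; everything below is our own scaffolding, not notation from the notes.

```python
from fractions import Fraction
from math import factorial

N = 10
em1 = [Fraction(0)] + [Fraction(1, factorial(n)) for n in range(1, N + 1)]  # e^x - 1

def mul(a, b):
    """Product of two truncated series (coefficient lists up to degree N)."""
    c = [Fraction(0)] * (N + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= N:
                c[i + j] += ai * bj
    return c

k = 3
p = [Fraction(1)] + [Fraction(0)] * N
for _ in range(k):
    p = mul(p, em1)                  # p is now the truncation of (e^x - 1)^k

# n! [x^n] (e^x - 1)^k / k! should be the Stirling number {n brace k}
stirling = [p[n] * factorial(n) / factorial(k) for n in range(N + 1)]
assert stirling[3] == 1 and stirling[4] == 6 and stirling[5] == 25
```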

We can use $F(x)$ to get a recursion for the $b_n$, by applying the operator $x\frac{d}{dx}\log(\cdot)$.
$$x\frac{d}{dx}\log\left(\sum_{n\ge0} \frac{b_n}{n!} x^n\right) = x\frac{d}{dx}\log e^{e^x - 1}$$

$$\frac{\sum_{n\ge0} \frac{b_n}{(n-1)!} x^n}{\sum_{n\ge0} \frac{b_n}{n!} x^n} = xe^x$$
$$\sum_{n\ge0} \frac{b_n}{(n-1)!} x^n = xe^x \sum_{n\ge0} \frac{b_n}{n!} x^n$$
Now we extract coefficients on each side. This gives
$$b_n = \sum_k \binom{n-1}{k} b_k.$$

We can also use $F(x)$ to obtain another formula for the $b_n$, by directly extracting coefficients.
$$b_n = n!\,[x^n]\, e^{e^x - 1} = n!\,[x^n]\, \frac{1}{e}\, e^{e^x} = n!\,[x^n]\, \frac{1}{e} \sum_{j\ge0} \frac{e^{jx}}{j!} = n!\,[x^n]\, \frac{1}{e} \sum_{j\ge0} \frac{1}{j!} \sum_{t\ge0} \frac{(jx)^t}{t!} = \frac{1}{e} \sum_{j\ge0} \frac{j^n}{j!}$$
Check this numerically! (with a computer, most likely)
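Here is the suggested numerical check: Bell numbers computed from the recursion just derived, versus a partial sum of the series (function names ours).

```python
from functools import lru_cache
from math import comb, e, factorial

@lru_cache(maxsize=None)
def bell(n):
    """Bell numbers via the recursion b_n = sum_k C(n-1,k) b_k derived above."""
    if n == 0:
        return 1
    return sum(comb(n - 1, k) * bell(k) for k in range(n))

def dobinski(n, terms=60):
    """Partial sum of (1/e) * sum_{j>=0} j^n / j!."""
    return sum(j ** n / factorial(j) for j in range(terms)) / e

assert [bell(n) for n in range(6)] == [1, 1, 2, 5, 15, 52]
assert abs(dobinski(5) - bell(5)) < 1e-8
```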

Problem 3.14. The $\frac{1}{e}$ means we are slipping into the analytic world. Can you extract coefficients to get this expression for $b_n$ without pulling out $\frac{1}{e}$ first?

questions

1. In Problem 3.2 we applied $\sum_k (\cdot)\, y^k$ to the recursion of Problem 3.1 to obtain a formula for $A_n(y)$, and only found $B_k(x)$ via the intermediary of $C(x,y)$. Now apply $\sum_n (\cdot)\, x^n$ to the recursion of Problem 3.1 to obtain a formula for $B_k(x)$.
2. Inspired by Proposition 3.3,
a) Find $a_n$ such that $\frac{1}{(1-x)^2} = \sum_{n\ge0} a_n x^n$.
b) Find $b_n$ such that $\frac{x}{(1-x)^2} = \sum_{n\ge0} b_n x^n$.
c) Let $s, t$ be integers with $t \ge 0$ and $t - s \ge 0$. Give the generating function for $\sum_{n\ge0} \binom{n+s}{t} x^n$.
3. We extended the definition of ${n \brace k}$ to all $n, k$. Check that these values make sense. Check that Lemma 3.5 is valid for all $n, k$ except $n = k = 0$ (the proof assumed that $n \ge k > 0$).
4. Write out a small table of values of $\binom{n}{k}$. Verify that the rows and columns "are" the GFs $A_n(y)$, $B_k(x)$ and that the whole table "is" $C(x,y)$. Explain what "are" and "is" mean.
5. Write out a small table of values of ${n \brace k}$. Verify that the rows and columns "are" the GFs $A_n(y)$ and $B_k(x)$, and that the whole table "is" $C(x,y)$. Explain what "are" and "is" mean.

6. For the binomial numbers, we used the closed form for $A_n(y)$ to find a closed form for $C(x,y)$. Can we do something similar for set partitions?
7. In deriving the EGF for the Bell numbers, we applied the operator $\frac{d}{dx}\log(\cdot)$ to a formal power series with non-zero constant term. This is clearly illegal.
a) Explain why $\frac{d}{dx}\log(x)$ composed with $A(x)$ does not make sense when $a_0 = 1$. (there are two reasons!)
b) Explain why $\frac{d}{dx}\log(x+1)$ composed with $A(x) - 1$ does make sense when $a_0 = 1$.
c) Show that $\frac{d}{dx}\log(x+1)$ composed with $A(x) - 1$ gives $A'(x)/A(x)$ when $a_0 = 1$. You may use the fact that the chain rule is valid for formal power series.
d) (⋆) Show that the above works just as well for any $a_0 \ne 0$. This is a little technical, and in most interesting cases where $a_0 \ne 0$ we have $a_0 = 1$ anyway.
8. Define ${n \brace k}^{\mathrm{ord}}$ to be the number of ordered partitions of $[n]$ into $k$ parts. So the parts are still (unordered) subsets of $[n]$, but there is an order among the parts. Show that:

$$\sum_{k\ge0} \sum_{n\ge0} {n \brace k}^{\mathrm{ord}}\, y^k\, \frac{x^n}{n!} = \sum_{k\ge0} \sum_{n\ge0} {n \brace k}\, k!\, y^k\, \frac{x^n}{n!}$$
9. In deriving the series representation for $b_n$ we pulled out a factor of $\frac{1}{e}$. We could justify this by taking a broader view of formal power series (this would require us to upgrade Lemma 1.5)... We could justify this by saying that we moved from formal power series to analytic ones... Or, we could extract coefficients without pulling out a factor of $\frac{1}{e}$. Do this. You should discover that $b_n$ is the coefficient of $x^n$ in a rather imposing-looking triple sum, which should simplify nicely.
10. We found that the EGF for set partitions with $k$ parts (where $k$ is a fixed parameter) is:

$$\sum_{n\ge0} {n \brace k} \frac{x^n}{n!} = \frac{1}{k!}(e^x - 1)^k$$

Can we directly conclude the EGF for the Bell numbers as
$$F(x) = \sum_n b_n \frac{x^n}{n!} = \sum_n \left(\sum_k {n \brace k}\right) \frac{x^n}{n!} = \sum_k \sum_n {n \brace k} \frac{x^n}{n!} = \sum_k \frac{1}{k!}(e^x - 1)^k$$
In other words, instead of explaining why it is legal to plug $y = 1$ into the expression for $G(x,y)$ we got above, explain why we can (in this case at least) add an infinite number of formal power series. This question is not asking you to consider some kind of metric on generating functions in order to show that the partial sums converge.
11. (slightly ⋆) Recall that $B_k(x) = \sum_{n\ge0} {n \brace k} x^n = x^k / \prod_{j=1}^k (1-jx)$.
Let $C(x) = \sum_{n\ge0} c_n x^n$ with $c_n = 2{n \brace 2} + 2{n \brace 1} + {n \brace 0}$.
a) Apply $\sum_{n\ge0} (\cdot)\, x^n$ in order to express $C(x)$ in terms of various $B_k(x)$.
b) Simplify the expression for $C(x)$ using the known expression for $B_k(x)$ and compute $c_n$.
c) Having overcome your shock at the surprisingly simple formula you obtained for $c_n$, you now realize that the defining equation for $c_n$ has become a curious identity of various ${n \brace k}$. Give a combinatorial proof of this identity, by identifying a particular interpretation of your nice formula for $c_n$.
d) What can you say about $6{n \brace 3} + 6{n \brace 2} + 3{n \brace 1} + {n \brace 0}$?
e) (⋆) Generalize this to an identity among ${n \brace t}$ for $0 \le t \le k$ where $k$ is some fixed nonnegative integer.
12. We've looked at binomial coefficients $\binom{n}{k}$ and Stirling numbers of the second kind ${n \brace k}$. Consider now the Stirling numbers of the first kind ${n \brack k}$. Without going into details, they obey the following recursion:
$${n \brack k} = {n-1 \brack k-1} + (n-1){n-1 \brack k}$$
Find out as much as you can about ${n \brack k}$ using this recursion and the techniques we've seen here. Note that we have not said what (if anything) they count, simply that they obey the above recursion. In fact we will encounter these "first" Stirling numbers later, where they will lead us into another method.

4. manipulations: formal power series and sequences

We know how to manipulate formal power series, but some manipulations are more useful than others, given that what we are really interested in is the sequence of the coefficients. So we want to look at operations on formal power series that are useful in terms of the way they act on the underlying sequence. Or, put another way, given a particular “useful” operation on sequences, what is the corresponding effect on the formal power series?

We will use consistent notation, in the sense that $\mathcal{A}$, $a_n$, $A$ represent a set of objects, the number of objects of size $n$, and the generating function (ordinary or exponential). Then $\mathcal{B}$, $b_n$, $B$ correspond to another set of objects, etc. In case we want to refer to many different sets of objects we will sometimes use bracketed superscripts to denote them: $\mathcal{A}^{(t)}$, $a_n^{(t)}$, $A^{(t)}$, for the $t$-th type of object.
Note that some of these "rules" apply in the same way to ordinary and exponential generating functions, while some of them apply differently. It's worth remembering that any particular formal power series is neither ordinary nor exponential; it's more a matter of the way we use it. The element of the sequence that we care about is either the coefficient of $x^n$, or that coefficient multiplied by $n!$; in either case we have a formal power series. However, we are really relating operations on the generating function to effects on the actual sequence, so we should expect differences sometimes.

sums

Our first observation is trivial, so we state it in three different ways.

Lemma 4.1. We have the following correspondence for adding two sequences.

$$a_n = b_n + c_n \quad\Longleftrightarrow\quad A = B + C$$
$$\sum_n (b_n + c_n)\, x^n = B + C$$
$$[x^n]\,(B + C) = b_n + c_n$$

This is not a surprise and in fact we used it many times already. It corresponds to $\mathcal{A} = \mathcal{B} \,\dot\cup\, \mathcal{C}$. It is useful to note that Lemma 4.1 is in fact a statement about how to extract coefficients.
Problem 4.2. Show that the three different versions of Lemma 4.1 are equivalent. This is more a question of understanding the equivalence of different points of view than it is about truly "proving" anything.
Problem 4.3. Lemma 4.1 is stated as if it were for OGFs. Or is it? Does Lemma 4.1 (really) apply differently to OGFs and EGFs?

Although Lemma 4.1 is stated in terms of the sum of two generating functions, it applies equally well to the sum of $t$ generating functions. In fact, it can even apply to an infinite sum of generating functions, as long as for any fixed $n$ only finitely many of them have non-zero coefficients for $x^n$. This condition is just equivalent to saying that the sum of these generating functions is well defined, which is just equivalent to saying that in the corresponding infinite union of sets, there are only finitely many objects of any given size.
Example 4.4. Determine the OGF for 01-strings with no consecutive 0's, by partitioning the set according to the number of 0's.


Let $\mathcal{B}$ be the set of all 01-strings with no consecutive 0's. Then we have the following partition of $\mathcal{B}$, where $\mathcal{S} = \{\epsilon, 1, 11, 111, 1111, \ldots\}$ is the set of all strings with no zeros at all. We use this to derive the OGF $B(x)$ for the class $\mathcal{B}$.
$$\mathcal{B} = \mathcal{S} \;\dot\cup\; \mathcal{S}0\mathcal{S} \;\dot\cup\; \mathcal{S}0\mathcal{S}10\mathcal{S} \;\dot\cup\; \mathcal{S}0\mathcal{S}10\mathcal{S}10\mathcal{S} \;\dot\cup\; \mathcal{S}0\mathcal{S}10\mathcal{S}10\mathcal{S}10\mathcal{S} \;\dot\cup\; \cdots$$
$$B = \frac{1}{1-x} + \frac{x}{(1-x)^2} + \frac{x^3}{(1-x)^3} + \frac{x^5}{(1-x)^4} + \frac{x^7}{(1-x)^5} + \cdots$$
Note that the generating function for $\mathcal{S}$ is "easy", which is what makes this partition useful. Also notice that for $t \ge 2$, the $t$-th part of this partition contains no string of length less than $2t-1$, meaning that in the sum of generating functions the coefficient of $x^n$ is nonzero only in the first $\lceil (n+3)/2 \rceil$ terms. So the infinite sum of generating functions is legal.
We can recover the OGF that we already know by recognizing the above as a geometric series.
$$B = \frac{1}{1-x} + \frac{x}{(1-x)^2} \sum_{k\ge0} \left(\frac{x^2}{1-x}\right)^k = \frac{1}{1-x} + \frac{x}{(1-x)^2} \cdot \frac{1}{1 - \frac{x^2}{1-x}} = \cdots = \frac{1+x}{1-x-x^2}$$
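The infinite sum of generating functions above can be checked coefficientwise: $[x^n]\, x^{2t-1}/(1-x)^{t+1} = \binom{n-t+1}{t}$, so summing over $t$ should reproduce the brute-force count (function names below are ours).

```python
from itertools import product
from math import comb

def count_no00(n):
    """Brute-force count of 01-strings of length n with no two consecutive 0's."""
    return sum(1 for s in product("01", repeat=n) if "00" not in "".join(s))

def count_by_zeros(n):
    """[x^n] of 1/(1-x) + sum_{t>=1} x^{2t-1}/(1-x)^{t+1}; the part with
    t zeros contributes C(n-t+1, t)."""
    return 1 + sum(comb(n - t + 1, t) for t in range(1, (n + 1) // 2 + 1))

assert all(count_no00(n) == count_by_zeros(n) for n in range(12))
```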

This of course works for any partition, not just the "obvious" ones.
Problem 4.5. Let $\mathcal{A}$ be the set of all 01-strings, and consider the partition of $\mathcal{A}$ given by the following.
$$\mathcal{A} = \epsilon \;\dot\cup\; 0 \;\dot\cup\; 1 \;\dot\cup\; 00\mathcal{A} \;\dot\cup\; 01\mathcal{A} \;\dot\cup\; 10\mathcal{A} \;\dot\cup\; 11\mathcal{A}.$$
Use this to find a functional equation for the OGF $A(x)$, and hence find $A(x)$. Without simplifying $A(x)$, extract the coefficient of $x^n$. Is the answer correct? Does $A(x)$ simplify?

Just in case you are still in doubt as to the number of unconstrained 01-strings, we will derive it in yet another manner. Example 4.6. Find the number of unconstrained 01-strings in yet another manner.

Let $\mathcal{A}$ be the set of 01-strings. Note that if there are $a_n$ strings of length $n$ in the set $\mathcal{A}$ then there are $a_{n-1}$ strings of length $n$ in $0\mathcal{A}$ (this is perhaps not what you were expecting). In any case, we get the following, where $A(x)$ is the EGF for $\mathcal{A}$.
$$\mathcal{A} = \epsilon \;\dot\cup\; 0\mathcal{A} \;\dot\cup\; 1\mathcal{A}$$
$$\sum_{n\ge0} a_n \frac{x^n}{n!} = 1 + \sum_{n\ge1} a_{n-1} \frac{x^n}{n!} + \sum_{n\ge1} a_{n-1} \frac{x^n}{n!}$$
$$A(x) = 1 + 2\int A(x)\,dx$$

$$A'(x) = 2A(x) \quad (\text{and } a_0 = 1) \qquad\Longrightarrow\qquad A(x) = e^{2x}$$
Extracting coefficients we see that $a_n = n!\,[x^n]\,A = n!\,[x^n]\,e^{2x} = n!\,2^n/n! = 2^n$.
Note that the EGF for $0\mathcal{A}$ is not the EGF for $0$ times the EGF for $\mathcal{A}$. This is perhaps worth thinking about; we'll get a better idea when we explore products shortly.

At the risk of going on far too long on a straightforward point, it is worth noticing that sums can also mean differences; see Example 2.9. As formal power series we can just as well have negative as positive terms (although as generating functions that enumerate something this is unlikely). From the point of view of the sets, we have not disjoint union but the deletion of a subset. Note that just as the terms in a disjoint union must be disjoint for it to truly be a sum, the term being subtracted must be a subset of the first, so as to truly be a difference.

shifts

It is often useful to “shift” a sequence. We accomplish this differently for ordinary and exponential generating functions, since we want to keep the n! lined up with the xn.

Lemma 4.7. We have the following correspondences for shifting the underlying sequence.
$$\text{for OGF } A = \sum_{n\ge0} a_n x^n:\qquad \sum_n a_{n+t}\, x^n = \frac{A - (a_0 + a_1x + \cdots + a_{t-1}x^{t-1})}{x^t}$$
$$\text{for EGF } A = \sum_{n\ge0} a_n \frac{x^n}{n!}:\qquad \sum_n a_{n+t}\, \frac{x^n}{n!} = \frac{d^t}{dx^t} A$$

This is easy to check (which you should do). In particular, you should convince yourself that we are not actually multiplying by the formal power series reciprocal of $x^t$ (which is illegal by Lemma 1.3). Rather this is a notation that indicates that we can cancel $x^t$ cleanly into the numerator. The point of Lemma 4.7 is that we can go effortlessly from recurrences for coefficients to functional equations for generating functions. We'll start with an OGF example.
Example 4.8. Let $f_{n+2} - f_{n+1} - f_n = 0$ for $n \ge 0$. Find a closed form for $F = \sum_{n\ge0} f_n x^n$.

Applying $\sum_{n\ge0} (\cdot)\, x^n$ to the recurrence gives the following.
$$0 = \sum_{n\ge0} f_{n+2} x^n - \sum_{n\ge0} f_{n+1} x^n - \sum_{n\ge0} f_n x^n$$
$$0 = \frac{F - (f_0 + f_1x)}{x^2} - \frac{F - f_0}{x} - F$$
$$F = \frac{(f_0 + f_1x) - x f_0}{1 - x - x^2}$$
Given any linear recurrence on the coefficients, we will obtain a rational function as the generating function (see the exercises).
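As a check of Example 4.8, one can expand a rational generating function by formal long division and compare with the recurrence; `series_from_rational` is our own helper, and it assumes the denominator has constant term 1.

```python
def series_from_rational(num, den, N):
    """First N+1 power-series coefficients of num(x)/den(x); den[0] must be 1."""
    num = num + [0] * (N + 1 - len(num))
    c = []
    for n in range(N + 1):
        c.append(num[n] - sum(den[j] * c[n - j]
                              for j in range(1, min(n, len(den) - 1) + 1)))
    return c

f0, f1 = 2, 5                                    # arbitrary initial values
coeffs = series_from_rational([f0, f1 - f0], [1, -1, -1], 10)
f = [f0, f1]
while len(f) <= 10:
    f.append(f[-1] + f[-2])                      # the recurrence f_{n+2} = f_{n+1} + f_n
assert coeffs == f
```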

Problem 4.9. Describe precisely the OGF arising from any fixed-length linear recurrence. You should explain exactly how the denominator and the numerator are determined by the recurrence (the numerator is a little more technical).
Problem 4.10. Suppose a recurrence is only valid for (say) $n \ge 17$. Does this method still work to find the ordinary generating function?

For EGFs we get differential equations. But nice ones.
Example 4.11. Let $g_{n+2} - g_{n+1} - g_n = 0$ for $n \ge 0$. Find a closed form for $G = \sum_{n\ge0} g_n \frac{x^n}{n!}$.

Applying $\sum_{n\ge0} (\cdot)\, \frac{x^n}{n!}$ to the recurrence gives the following.
$$0 = \sum_{n\ge0} g_{n+2} \frac{x^n}{n!} - \sum_{n\ge0} g_{n+1} \frac{x^n}{n!} - \sum_{n\ge0} g_n \frac{x^n}{n!}$$
$$0 = G'' - G' - G$$

Given any homogeneous linear recurrence on the coefficients, we will obtain a homogeneous differential equation with constant coefficients (see the exercises). These types of differential equations have solutions obtained as linear combinations of terms of the form $x^d e^{rx}$, where $r$ is a root of the characteristic equation and $d$ is a non-negative integer less than the multiplicity of the corresponding root. The characteristic equation is the polynomial obtained by formally replacing the $j$-th derivative with the $j$-th power of a variable $\lambda$. For the example above we get the following, where $c_1$ and $c_2$ are constants determined by the initial values $g_0$ and $g_1$, and $\gamma_1, \gamma_2$ are the two roots of $\lambda^2 - \lambda - 1$.
$$G(x) = c_1 e^{\gamma_1 x} + c_2 e^{\gamma_2 x}$$
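A quick floating-point check that such a linear combination really does solve the recurrence: with the two roots of $\lambda^2 - \lambda - 1 = 0$, the EGF coefficients $c_1\gamma_1^n + c_2\gamma_2^n$ satisfy $g_{n+2} = g_{n+1} + g_n$ for any constants (the values below are arbitrary, chosen only for illustration).

```python
g1 = (1 + 5 ** 0.5) / 2      # the two roots of the characteristic
g2 = (1 - 5 ** 0.5) / 2      # equation lambda^2 - lambda - 1 = 0
c1, c2 = 0.7, -1.3           # arbitrary constants

def g(n):
    """n-th EGF coefficient of c1*exp(g1*x) + c2*exp(g2*x), i.e. c1*g1^n + c2*g2^n."""
    return c1 * g1 ** n + c2 * g2 ** n

# the coefficients satisfy the original recurrence g_{n+2} - g_{n+1} - g_n = 0
assert all(abs(g(n + 2) - g(n + 1) - g(n)) < 1e-9 for n in range(20))
```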

Problem 4.12. Describe precisely the coefficients of the differential equation in the EGF arising from any fixed-length linear recurrence.

Problem 4.13. Suppose a recurrence is only valid for (say) n ≥ 17. Does this method still work to find the (exponential) generating function?

Problem 4.14. Of course Example 4.8 and Example 4.11 describe the same sequence; we used fn and gn only to distinguish the two methods. Finish the details of Example 4.8 and Example 4.11 to show that we get the same expressions in each case.

The following is more of a conceptual project based on the preceding, but is worth at least thinking about in principle.

Problem 4.15. Consider a homogeneous linear recurrence.

• Explain how this corresponds exactly to an OGF F (x) and an EGF G(x). You should explain how any one of the recurrence or F (x) or G(x) determines the other two. • Explain how using partial fractions on F (x) one can obtain a solution for the recurrence. You should explain how (at least in principle) one can obtain F (x) from the solution. • Explain how using the theory of homogeneous linear differential equations one can obtain a solution for the recurrence. You should explain how (at least in principle) one can obtain G(x) from the solution.

Putting this together, explain (in principle at least) how one can use the solution for the recurrence obtained from $F(x)$ in order to prove that $G(x)$ is the solution to the differential equation. Notice that this is, at least in principle, more general than describing how to solve linear differential equations, since we never assumed that the power series we obtain correspond to actual functions.

weightings

We might want to weight each an by a function of the index n. In fact we already did this, for instance in deriving the expression for µn leading up to Problem 2.10. Using this idea, it is straightforward to weight each an by n, or indeed any polynomial function of n.

Lemma 4.16. We can weight $a_n$ as follows.
$$\text{for OGF } A = \sum_{n\ge0} a_n x^n:\qquad \sum_{n\ge0} n\,a_n x^n = x\frac{d}{dx} A \qquad\qquad \sum_{n\ge0} p(n)\,a_n x^n = p\!\left(x\frac{d}{dx}\right) A$$
$$\text{for EGF } A = \sum_{n\ge0} a_n \frac{x^n}{n!}:\qquad \sum_{n\ge0} n\,a_n \frac{x^n}{n!} = x\frac{d}{dx} A \qquad\qquad \sum_{n\ge0} p(n)\,a_n \frac{x^n}{n!} = p\!\left(x\frac{d}{dx}\right) A$$

This applies in the same way to both types of generating functions, but notice the eerie similarities between weighting and shifting for EGFs. The main use of this is in computing averages (and variances...). But we give some other examples here.

Example 4.17. Suppose we know a nice closed form for $A(x) = \sum_n a_n x^n$, and we are interested in $\sum_n (n^2 - 2n)\, a_n x^n$.

We find the generating function for the new series in terms of A(x).
$$\left(\left(x\frac{d}{dx}\right)^2 - 2\left(x\frac{d}{dx}\right)\right)A(x) = x\frac{d}{dx}\left(x\frac{d}{dx}A(x)\right) - 2x\frac{d}{dx}A(x)$$
$$= x\frac{d}{dx}\left(xA'(x)\right) - 2xA'(x) = xA'(x) + x^2A''(x) - 2xA'(x)$$
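In coefficient terms the operator $x\frac{d}{dx}$ just multiplies $a_n$ by $n$, so the computation above is easy to check numerically. Here is a small Python sketch (mine, not part of the notes), using $A(x) = 1/(1-x)$, i.e. $a_n = 1$:

```python
# Coefficients of a truncated power series as a list: A[n] = a_n.
def x_d_dx(coeffs):
    """Apply the operator x * d/dx: multiplies the n-th coefficient by n."""
    return [n * c for n, c in enumerate(coeffs)]

# Take A(x) = 1/(1 - x), i.e. a_n = 1 for all n (truncated at degree 9).
A = [1] * 10

# ((x d/dx)^2 - 2 (x d/dx)) A should have n-th coefficient (n^2 - 2n) a_n.
weighted = [u - 2 * v for u, v in zip(x_d_dx(x_d_dx(A)), x_d_dx(A))]
assert weighted == [n * n - 2 * n for n in range(10)]
```

Any other coefficient sequence works the same way, since the operator acts on each coefficient separately.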

As another application, we can use this to evaluate series.

Example 4.18. Find the value of the series $\sum_{n\ge0} p(n)\,a^n$ where $|a| < 1$.

We build the desired series from a known generating function.
$$\sum_n x^n = \frac{1}{1-x}$$
$$\sum_n p(n)\,x^n = p\!\left(x\frac{d}{dx}\right)\frac{1}{1-x}$$

$$\sum_n p(n)\,a^n = \left.p\!\left(x\frac{d}{dx}\right)\frac{1}{1-x}\right|_{x=a}$$
Note that this is using the fact that the generating function is an analytic function (of course this needs $|a| < 1$...).
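For a concrete instance (my own check, with $p(n) = n^2$ and $a = 1/2$): applying $\left(x\frac{d}{dx}\right)^2$ to $\frac{1}{1-x}$ gives $\frac{x(1+x)}{(1-x)^3}$, so the series should sum to $\frac{a(1+a)}{(1-a)^3}$:

```python
a = 0.5
# Closed form from (x d/dx)^2 applied to 1/(1-x): sum_n n^2 a^n = a(1+a)/(1-a)^3.
closed = a * (1 + a) / (1 - a) ** 3
# Compare against a large partial sum of the series itself.
partial = sum(n * n * a ** n for n in range(200))
assert abs(closed - partial) < 1e-9
```

With $a = 1/2$ both sides come out to 6.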

A similar idea holds for evaluating finite sums. The method works for any polynomial, but even just $p(n) = n^2$ gives something interesting.

Example 4.19. Find a “formula” for the sum of squares.

$$\sum_{k=0}^n k^2 = \left.\sum_{k=0}^n k^2 x^k\right|_{x=1} = \left.\left(x\frac{d}{dx}\right)^2 \sum_{k=0}^n x^k\right|_{x=1} = \left.\left(x\frac{d}{dx}\right)^2 \frac{1-x^{n+1}}{1-x}\right|_{x=1}$$
Evaluating this at x = 1 (after the differentiation, of course!) is a little less risqué, since it is actually a finite sum. Note that in this example we don’t really get a formula, we get a “formula”, since in order to evaluate we need to simplify and cancel. Try evaluating the final expression to see what happens. If the technical details seem overwhelming, first use this as an example to derive an expression for $\sum_{k=0}^n k$ and evaluate that. If the final result is a little disappointing, then you can look forward to a better formula for the sum of squares, which we will see shortly.

products

Multiplying generating functions corresponds to considering ordered pairs of objects. There is a slight twist with EGFs, but actually it’s the kind of twist we might have expected had we thought of them as enumerating labelled objects.

Lemma 4.20.
$$\text{for OGF } A = \sum_{n\ge0} a_n x^n,\ B = \sum_{n\ge0} b_n x^n:\qquad \sum_{n\ge0}\left(\sum_{k=0}^n a_k b_{n-k}\right) x^n = A(x)B(x)$$
$$\text{for EGF } A = \sum_{n\ge0} a_n \frac{x^n}{n!},\ B = \sum_{n\ge0} b_n \frac{x^n}{n!}:\qquad \sum_{n\ge0}\left(\sum_{k=0}^n \binom{n}{k} a_k b_{n-k}\right) \frac{x^n}{n!} = A(x)B(x)$$
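A quick numerical illustration of both rules (the helper names are mine): the OGF rule is the ordinary convolution of the coefficient sequences, while the EGF rule is the binomial convolution.

```python
from math import comb

def ogf_product(a, b):
    """Coefficient sequence of A(x)B(x): ordinary convolution."""
    n = min(len(a), len(b))
    return [sum(a[k] * b[m - k] for k in range(m + 1)) for m in range(n)]

def egf_product(a, b):
    """Coefficients (the a_n, not a_n/n!) of the product of two EGFs:
    binomial convolution."""
    n = min(len(a), len(b))
    return [sum(comb(m, k) * a[k] * b[m - k] for k in range(m + 1))
            for m in range(n)]

ones = [1] * 8
# OGF check: 1/(1-x) * 1/(1-x) has coefficients n + 1.
assert ogf_product(ones, ones) == [n + 1 for n in range(8)]
# EGF check: e^x * e^x = e^{2x}, i.e. a_n = b_n = 1 gives 2^n.
assert egf_product(ones, ones) == [2 ** n for n in range(8)]
```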

This is just the rule for multiplying formal power series, but consider the right-hand expression in each case. For OGF, the generating function is for objects that are ordered pairs, with the first object being from A, the second object being from B, and the weight of the ordered pair being the sum of the weights of the parts. Assume we have two families of objects A and B. Define the family C = A × B, where the size of an object (α, β) ∈ C is the size of α plus the size of β. Then a straightforward argument shows that the number of objects of size n in C is $\sum_{k=0}^n a_k b_{n-k}$. In other words, C(x) = A(x)B(x). Taking cartesian products of sets with size additive corresponds to multiplying OGFs.

We often informally think of cartesian products as being a kind of concatenation. But cartesian products are not actually associative: we informally think of U × V × W as a set of ordered triples, but in fact (U × V) × W ≠ U × (V × W).

Problem 4.21. Consider ordered pairs of 01-strings, each string having no adjacent 0’s, where the size of an ordered pair is defined to be the sum of the lengths of the two strings. Find the OGF. Your answer is not the same as the OGF for 01-strings with no repeated 0s. Why not? Isn’t such an ordered pair just a big 01-string with no repeated 0s?

For EGF, the right-hand side has a strange factor of $\binom{n}{k}$. This is best understood if A and B represent labelled objects. Consider an object to be made of “atoms”, where the weight of the object is the number of atoms. The weight of an ordered pair is again the sum of the weights of the parts, meaning the total number of atoms. A labelling of an object α ∈ A of weight k means an assignment of {1, ··· , k} to the atoms. Likewise a labelling of an object β ∈ B of weight n − k means an assignment of {1, ··· , n − k} to the atoms of β. If we consider the object (α, β) then it would have weight (k) + (n − k) = n, but what of the labelling? We want to assign {1, ··· , n} to the atoms of (α, β), in such a way that the relative order of the atoms on α respects the original labelling, and

likewise for β. Note that if we choose any k labels from {1, ··· , n} to be applied to α, there is exactly one way to assign them consistently with the original ordering, and exactly one way to assign the remaining labels to β in a manner consistent with its labelling. This means that for every α and β of sizes k and n − k, there are $\binom{n}{k}$ labelled ordered pairs that are consistent with these two. In other words, if we fix labelled α and β, there are $\binom{n}{k}$ labelled ordered pairs (α, β) that can be built from them. As an example, the following two connected graphs α and β can be put together as one graph in many different labelled ways, one of which is shown corresponding to the choice of {2, 4, 5} for the labels for the first object. Having decided on a set of labels for α in (α, β), the original labels of α and β determine the rest.

[Figure: two labelled connected graphs α and β combined into the relabelled pair (α, β) on {1, ··· , 7}, with {2, 4, 5} chosen as the set of labels for α.]

The individual objects don’t change, nor does the relative order of their labels. The only possible variation is in the actual set of labels. So for each pair of objects α and β there are $\binom{n}{k}$ different relabelled ordered pairs. We sometimes use A ⋆ B for the set of all relabelled ordered pairs of labelled objects. This is the reason why exponential generating functions are often useful in enumerating labelled objects: because their multiplication rule corresponds to the labelled version of cartesian products of the underlying sets.

Consider again Example 4.6. We saw that the EGF for the cartesian product 0A = {0} × A was not the EGF for {0} times the EGF for A. This is because the cartesian product was not a “relabelled” cartesian product.

Example 4.22. Find the EGF for {0} ⋆ A as a “relabelled” cartesian product. What does it mean?

If we are thinking of a 01-string as a labelled object, then we are thinking of a string as a multiset of 0’s and 1’s, each one labelled by its position in the string. Then {0} × A is a collection of ordered pairs, where the first element is from {0} and the second element is from A. There is only one string in {0}; as a labelled object it consists of a single zero labelled by 1. A string of length n from A consists of n symbols (either 0 or 1) labelled by {1, 2, ··· , n}. Relabelling then means choosing one label from {1, 2, ··· , n + 1} to be the label for the single 0, and the others are for the length-n string. It might help to think of the element from {0} as being red and the element of A to be blue. Then the relabelled cartesian product consists of inserting the single red 0 somewhere in the length-n blue string. There are (n + 1)2^n such ordered pairs of size n + 1, as you can find from a direct counting argument (right?).

Now the EGF for the relabelled cartesian product {0} ⋆ A is straightforward: it is the product of the two EGFs. Let’s define B = {0} ⋆ A, so we are looking for B(x).
$$B(x) = \frac{x^1}{1!}\left(\sum_{n\ge0} 2^n \frac{x^n}{n!}\right) = \sum_{n\ge0} 2^n \frac{x^{n+1}}{n!} = \sum_{n\ge0} (n+1)2^n \frac{x^{n+1}}{(n+1)!}$$
Then $b_{n+1} = (n+1)!\,[x^{n+1}]\,B(x) = (n+1)2^n$, as you already found out. We will see this idea again when we talk about admissible constructions.
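A small Python check (mine): the binomial convolution of the two EGF coefficient sequences reproduces the direct count, namely $m \cdot 2^{m-1}$ pairs of total size m (a red 0 plus a blue string of length m − 1, with m choices of label for the 0).

```python
from math import comb

# EGF coefficient sequences (the a_n, with A(x) = sum a_n x^n / n!).
strings = [2 ** n for n in range(10)]   # 01-strings: a_n = 2^n
single = [0, 1] + [0] * 8               # {0}: one object, of size 1

# Binomial convolution = the EGF product rule of Lemma 4.20.
pairs = [sum(comb(m, k) * single[k] * strings[m - k] for k in range(m + 1))
         for m in range(10)]

# Direct count: 2^(m-1) blue strings of length m-1, and m choices of the
# label for the single red 0.
assert pairs == [0] + [m * 2 ** (m - 1) for m in range(1, 10)]
```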

Multiplication extends to more than two terms. The idea is the same but we will write it down anyway; notice that the inner sum is in general over several indices, not a single one.

Lemma 4.23.
$$\text{for OGF:}\qquad \sum_{n\ge0}\left(\sum_{n_1+\cdots+n_k=n} a^{(1)}_{n_1} a^{(2)}_{n_2} \cdots a^{(k)}_{n_k}\right) x^n = A^{(1)}(x)\,A^{(2)}(x)\cdots A^{(k)}(x)$$
$$\text{for EGF:}\qquad \sum_{n\ge0}\left(\sum_{n_1+\cdots+n_k=n} \binom{n}{n_1\ n_2\ \cdots\ n_k}\, a^{(1)}_{n_1} a^{(2)}_{n_2} \cdots a^{(k)}_{n_k}\right) \frac{x^n}{n!} = A^{(1)}(x)\,A^{(2)}(x)\cdots A^{(k)}(x)$$

powers

Powers of a generating function are just a special case of products, but they are a useful special case.

Lemma 4.24.
$$\text{for OGF:}\qquad \sum_{n\ge0}\left(\sum_{n_1+\cdots+n_k=n} a_{n_1} a_{n_2} \cdots a_{n_k}\right) x^n = \left(A(x)\right)^k$$
$$\text{for EGF:}\qquad \sum_{n\ge0}\left(\sum_{n_1+\cdots+n_k=n} \binom{n}{n_1\ n_2\ \cdots\ n_k}\, a_{n_1} a_{n_2} \cdots a_{n_k}\right) \frac{x^n}{n!} = \left(A(x)\right)^k$$

This is already useful for the “trivial” constant sequence with $a_n = 1$ for all n.

Example 4.25. Let A = ℕ, the natural numbers, with the size of an integer defined as its value. The OGF for A is simply $A(x) = \sum_n x^n = \frac{1}{1-x}$ (there is one non-negative integer of every size). The cartesian product of A with itself t times gives (ordered) t-tuples of integers, with the size of each t-tuple being its sum. So the OGF for $A^{\times t}$ counts the number of ordered sums of non-negative integers according to their sum. But this OGF is just $A^t(x) = \frac{1}{(1-x)^t}$.

We can extract coefficients from this, using $\sum_n \binom{n}{k} y^n = \frac{y^k}{(1-y)^{k+1}}$ that we derived earlier.
$$[x^n]\,\frac{1}{(1-x)^t} = [x^{n+t-1}]\,\frac{x^{t-1}}{(1-x)^t} = [x^{n+t-1}]\,\frac{x^{t-1}}{(1-x)^{(t-1)+1}} = \binom{n+t-1}{t-1}$$
This tells us that the number of ways of writing n as an ordered sum of t non-negative integers is $\binom{n+t-1}{t-1}$. It also tells us how to extract coefficients from $(1-x)^{-t}$, which generalizes our ability to extract coefficients from geometric series.
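The count is easy to confirm by brute force (a small sketch of mine):

```python
from math import comb

def ordered_sums(n, t):
    """Count ordered sums of t non-negative integers totalling n, by brute force."""
    if t == 0:
        return 1 if n == 0 else 0
    return sum(ordered_sums(n - first, t - 1) for first in range(n + 1))

# Compare the brute-force count with the binomial formula C(n+t-1, t-1).
for n in range(8):
    for t in range(1, 5):
        assert ordered_sums(n, t) == comb(n + t - 1, t - 1)
```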

partial sums

Lemma 4.26.
$$\text{for OGF } A = \sum_{n\ge0} a_n x^n:\qquad \sum_{n\ge0}\left(\sum_{k=0}^n a_k\right) x^n = \frac{1}{1-x}\,A(x)$$
$$\text{for EGF } A = \sum_{n\ge0} a_n \frac{x^n}{n!}:\qquad \sum_{n\ge0}\left(\sum_{k=0}^n \binom{n}{k} a_k\right) \frac{x^n}{n!} = e^x A(x)$$

This is a special case of multiplying A(x) × B(x) (what is B(x)? what is $b_{n-k}$?), but a very useful one. Given an OGF A(x) where $a_n$ counts the number of objects of size n, we consider the OGF S(x), where $s_n$ counts the number of objects of size at most n. This rule says that $S(x) = \frac{1}{1-x}A(x)$.

In other words, multiplication by $\frac{1}{1-x}$ maps the OGF of a sequence to the OGF of its partial sums. In this sense one can think of multiplication by $\frac{1}{1-x}$ as being a partial-sums operator for OGFs. The partial-sums operator for EGFs is multiplication by $e^x$. For instance, we can use this to compute the sum of squares (or any other polynomial). Compare this with Example 4.19.
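In coefficient terms, multiplying by $\frac{1}{1-x}$ is convolution with the all-ones sequence; a small check (mine), using $a_n = 2^n$, whose partial sums are $2^{n+1}-1$:

```python
def ogf_product(a, b):
    """Ordinary convolution = coefficient sequence of the product of two OGFs."""
    return [sum(a[k] * b[m - k] for k in range(m + 1))
            for m in range(min(len(a), len(b)))]

powers = [2 ** n for n in range(10)]   # OGF 1/(1 - 2x)
ones = [1] * 10                        # OGF 1/(1 - x): the partial-sums operator
# Partial sums of 2^n are 2^{n+1} - 1 (a finite geometric series).
assert ogf_product(powers, ones) == [2 ** (n + 1) - 1 for n in range(10)]
```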

Example 4.27. Find a formula for the sum of squares.

We use the partial sum operator to find the OGF for the sum of squares. Consider:
$$b_n = \sum_{k=0}^n k^2 \qquad\qquad B(x) = \sum_n b_n x^n = \sum_n \left(\sum_{k=0}^n k^2\right) x^n$$
Then we can describe B in terms of the partial sum operator and weightings.
$$B = \frac{1}{1-x}\sum_n n^2 x^n = \frac{1}{1-x}\left(x\frac{d}{dx}\right)^2 \sum_n x^n = \frac{1}{1-x}\left(x\frac{d}{dx}\right)^2 \frac{1}{1-x}$$

Problem 4.28. Expand the final formula in Example 4.27 (ie, compute the derivatives, etc). Be careful with $\left(x\frac{d}{dx}\right)^2$; it means apply the operator $x\frac{d}{dx}$ two times, which does not mean multiply the second derivative by $x^2$. Then extract coefficients to get a closed form expression for $\sum_{k=0}^n k^2$. Notice that in computing derivatives on the right-hand side you get terms like $(1-x)^{-t}$; we can use the result from Example 4.25 to extract coefficients from them.
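As a numerical check on the end result (a computation of mine; it gives away part of Problem 4.28): expanding the formula yields $\frac{x(1+x)}{(1-x)^4}$, and extracting coefficients as in Example 4.25 gives $\binom{n+2}{3} + \binom{n+1}{3}$:

```python
from math import comb

# [x^n] x(1+x)/(1-x)^4 = C(n+2,3) + C(n+1,3),
# using [x^n] (1-x)^{-t} = C(n+t-1, t-1) from Example 4.25.
coeffs = [comb(n + 2, 3) + comb(n + 1, 3) for n in range(20)]

for n in range(20):
    assert coeffs[n] == sum(k * k for k in range(n + 1))   # matches the partial sums
    assert coeffs[n] == n * (n + 1) * (2 * n + 1) // 6     # the classical formula
```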

As another example, consider the harmonic numbers $h_n = \sum_{k=1}^n \frac{1}{k}$. We can find an OGF similarly.

Example 4.29. We find the OGF for $h_n = \sum_{k=1}^n 1/k$. Briefly, it consists of the partial sum operator applied to the OGF for the sequence $1/k$.

$$H(x) = \sum_{n\ge1} h_n x^n = \frac{1}{1-x}\sum_{n\ge1}\frac{1}{n}\, x^n = \frac{1}{1-x}\log\frac{1}{1-x}$$

We recognized the series for log. Alternatively, proceed as for the sum of squares but integrate instead of differentiate.
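A quick check with exact arithmetic (mine): the coefficients of $\log\frac{1}{1-x}$ are $1/n$ for $n \ge 1$, and convolving with the all-ones sequence should produce the harmonic numbers.

```python
from fractions import Fraction

# log(1/(1-x)) has coefficients 1/n (n >= 1); multiplying by 1/(1-x)
# takes partial sums, which should give the harmonic numbers.
N = 10
log_coeffs = [Fraction(0)] + [Fraction(1, n) for n in range(1, N)]
h = [sum(log_coeffs[: m + 1]) for m in range(N)]  # convolution with all-ones
harmonic = [sum(Fraction(1, k) for k in range(1, m + 1)) for m in range(N)]
assert h == harmonic
```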

One way of understanding a partial sum is that we build a size-n object out of some size-k object plus “filler” of size n − k. Seen in this light the exponential version might make more sense: the size-k object needs to be relabelled with a k-subset of {1, 2, ··· , n}. So it is a “relabelling partial sum”.1 Here are two examples taken from things we’ve seen before.

Example 4.30. Recall that the Bell numbers give the number of partitions. So $b_n$ counts the number of partitions of {1, 2, ··· , n} into any number of non-empty classes. One can show directly that $b_{n+1} = \sum_{k=0}^n \binom{n}{k} b_k$ (see Exercise 4.15). On the right we recognize a relabelling partial sum; on the left a shifted sequence. So we get $B' = e^x B$.

Example 4.31. Recall that a derangement is a permutation that has no fixed points. So $d_n$ counts the number of derangements on n points. One can show directly that $n! = \sum_{k=0}^n \binom{n}{k} d_k$. On the right we recognize a relabelling partial sum; on the left the sequence corresponding to a well known EGF. So we get $1/(1-x) = e^x D$.
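The identity is easy to verify numerically (my sketch; the recurrence $d_k = (k-1)(d_{k-1} + d_{k-2})$ used to generate the derangement numbers is the standard one, not derived in this section):

```python
from math import comb, factorial

# Derangement numbers via the standard recurrence d_k = (k-1)(d_{k-1} + d_{k-2}).
d = [1, 0]
for k in range(2, 12):
    d.append((k - 1) * (d[k - 1] + d[k - 2]))

# Every permutation = choose the fixed points, derange the rest:
# n! = sum_k C(n,k) d_k.
for n in range(12):
    assert factorial(n) == sum(comb(n, k) * d[k] for k in range(n + 1))
```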

1 What a horrible name. Suggestions accepted.

exercises

1. Consider a sequence $a_n$ that satisfies a linear recurrence of order t, so that $\sum_{j=0}^t \alpha_j a_{n+j} = 0$ for n ≥ 0, with $a_j$ given for j < t. Show that the OGF A(x) is a rational function and describe exactly the denominator. In fact you could just give A(x) explicitly, the numerator being best expressed as a double sum. This exercise is shamelessly plagiarized from Problem 2.3.

2. Suppose that an OGF A(x) is rational (the ratio of two polynomials). Show that the $a_n$ satisfy a linear recurrence for sufficiently large n. How big does n have to be? What, exactly, is the recurrence?

3. Explain how the study of linear recurrences (of fixed order) is “equivalent” to the study of rational functions. (hint: This is really just asking if you’ve done the previous two questions.)

4. Consider the functional equation $\sum_{j=0}^t \alpha_j F^{(j)}(x) = 0$, where $F^{(j)}$ means the j-th derivative of F. Show that given any values $f_0, f_1, \cdots, f_{t-1}$ there is a unique formal power series F that satisfies the equation with $[x^j]\,F = f_j$ for 0 ≤ j ≤ t − 1. No calculus! This exercise is shamelessly plagiarized from Exercise 1.9.

5. Suppose that an EGF A(x) satisfies a homogeneous differential equation with constant coefficients. Show that the $a_n$ satisfy a linear recurrence for sufficiently large n. How big does n have to be? What, exactly, is the recurrence?

6. Explain how the study of linear recurrences (of fixed order) is “equivalent” to the study of homogeneous differential equations with constant coefficients. (hint: This is really just asking if you’ve done the previous two questions.)

7. Show that $\sum_{k=0}^n k^r$ is a polynomial in n of degree r + 1. Do this by finding the generating function inspired by the method of Problem 4.28, and by considering the form of the partial fraction decomposition. You do not necessarily need to work out all the details of the partial fraction decomposition, just enough to conclude that the given sum is a polynomial in n of degree r + 1.

8. a) Let $A(x) = \sum_{n\ge0} a_n x^n$. Give an expression in terms of A for $\sum_{n\ge t} a_{n-t} x^n$.
   b) Let $B(x) = \sum_{n\ge0} b_n x^n/n!$. Give an expression in terms of B for $\sum_{n\ge t} b_{n-t} x^n/n!$.

9. In Example 4.25 we found the number of ways of writing n as an ordered sum of t non-negative integers. Inspired by this, give an OGF and a formula for the number of ways of writing n as an ordered sum of t positive integers.

10. Give an OGF and a formula for the number of ways of writing n as an ordered sum of t integers each of which is at least 7.

11. Give an OGF and a formula for the number of ways of writing n as an ordered sum of at most t positive integers. Explain why the number of ways of writing n as an ordered sum of exactly t non-negative integers is not the same as the number of ways of writing n as an ordered sum of at most t positive integers.

12. In Exercise 4.9 you found the OGF for the number of ways of writing n as an ordered sum of t positive integers. Give the OGF for the number of ways of writing n as an ordered sum of any number of positive integers, and a closed form for the coefficient. The answer should be a bit of a surprise. Or perhaps not? Can you prove it directly?

13. We will use the idea of multiplying generating functions in what might be an unexpected way. Let $a_n$ and $b_n$ be sequences for n ≥ 0. We will show that
$$\sum_{k=0}^n a_k = b_n,\ \forall n \ge 0 \iff a_n = b_n - b_{n-1},\ \forall n \ge 0$$

a) Let A(x) and B(x) be the OGFs for the two sequences, and for convenience let $b_{-1} = 0$. Consider that $\sum_{k=0}^n a_k = [x^n]\,R(x)A(x)$ for some FPS R(x). What is R(x)?
b) Explain why R(x)A(x) = B(x) if and only if $A(x) = R^{-1}(x)B(x)$, where $R^{-1}(x)$ is the reciprocal of R(x).
c) Extract coefficients from $A(x) = R^{-1}(x)B(x)$ to show that $a_n = b_n - b_{n-1}$.
d) Of course one may show the given identity directly fairly trivially (please do so!), but notice that the generating function approach allowed us to deduce the right-hand side. Given the expression for the b’s in terms of the a’s, we found an expression for the a’s in terms of the b’s.

14. Using the approach of the previous question, find an expression for $a_n$ in terms of various $b_k$ given that $\sum_{k=0}^n \binom{n}{k} a_k = b_n$.

15. Let $b_n$ be the Bell numbers, that is the number of partitions of {1, 2, ··· , n} into non-empty parts. Show directly that $b_{n+1} = \sum_{k=0}^n \binom{n}{k} b_k$. Explain how the “smaller” partition gets “relabelled”. (hint: In a partition of {1, 2, ··· , n + 1} the element n + 1 is in some part. What is left over if you remove that part? How many ways are there to choose that part?)

16. Let $d_n$ be the number of derangements, that is the number of permutations on {1, 2, ··· , n} with no fixed points. Show directly that $n! = \sum_{k=0}^n \binom{n}{k} d_k$. Explain how the “smaller” derangement gets “relabelled”. (hint: Every permutation consists of a derangement on some points and the identity on the rest. How many ways are there to choose the “deranged” points?)

5. admissible constructions

We have seen generating functions as formal power series. We have seen examples of using them to derive formulas and identities for various combinatorial functions. We have seen ways to relate operations on sequences to operations on generating functions. Now we want to look at relationships between constructions of objects and the generating functions corresponding to them. This will in many cases allow us to write down a formalized description of the class of objects we are interested in, and deduce directly from this a generating function. This approach is well-documented in Flajolet & Sedgewick [1].

framework

Assume that $B^{(1)}, B^{(2)}, \cdots, B^{(m)}, \cdots$ are combinatorial classes of objects, each equipped with some weight function. By combinatorial class we mean that there are a finite number of objects of any given weight in each class. In other words, the numbers $b_n^{(j)}$ are well defined for each j. Let A be some class of objects with some weight function that is built from the various $B^{(j)}$. We might write $A = \Psi[B^{(1)}, B^{(2)}, \cdots, B^{(m)}, \cdots]$ where Ψ represents the details of the process. By this is meant nothing more than that the number of objects of A of each size is determined by the number of objects of each size of the various $B^{(j)}$. So the numbers $a_n$ are determined by the various $b_n^{(j)}$. In this case we say that Ψ is an admissible construction. We conclude that there exists some function Φ such that $A(x) = \Phi[B^{(1)}(x), B^{(2)}(x), \cdots, B^{(m)}(x), \cdots]$. The generating function A(x) is determined by the various generating functions $B^{(j)}(x)$.

One might be tempted to call this a theorem, but in fact tautology would be a better word. A generating function is completely determined by the sequence of coefficients it encodes; indeed, it is not a bad idea to think of the generating function and the sequence as essentially the same object. So if the sequences underlying the $B^{(j)}(x)$ determine the sequence underlying A(x), then the $B^{(j)}(x)$ determine A(x). This viewpoint however will turn out to be very powerful. Our plan will be to identify useful ways of deriving a class A from various $B^{(j)}$, and then use these as building blocks to construct a wide variety of classes. So our real goal is not to determine which constructions are admissible, but rather to identify, among the admissible constructions Ψ, those that are combinatorially useful and for them determine the corresponding Φ. This approach is due to Philippe Flajolet.
It’s called the symbolic method, since it consists in writing down a symbolic description of a class of objects, which translates directly into an expression for the generating function.

Since we will look at things from the point of view of constructions, we will distinguish between labelled objects and unlabelled objects. We will generally take the point of view that the objects under consideration are composed of “atoms”, where the size of an object is the number of atoms. When we speak of labellings, we are referring to a bijective assignment of the labels {1, 2, ··· , n} to the n atoms of an object (which of course has size n). Here are some examples of this.

• The set of strings on v different symbols. Such a string can be thought of as a sequence of n symbols, unlabelled. In this sense it would be represented by an OGF.


• The set of strings on v different symbols. Such a string of length n can be thought of as a collection of n marbles, of v different types, labelled by {1, 2, ··· , n}. The labels correspond to position and the type corresponds to symbol. In this sense it would naturally be represented by an EGF.

• The set of labelled graphs. Such a graph of size n consists of n vertices, labelled by {1, 2, ··· , n}, and a set of edges where each edge joins two different vertices. In this sense it would naturally be represented by an EGF.

Note that the key distinction is not EGF or OGF, but labelled or unlabelled.

warm-up: sums

The first way one might combine classes to make a new class is by disjoint union. The weight in the new class is the weight in whichever of the old classes the object lived in. This works the same way for unlabelled and labelled. One sometimes writes “+” instead of “ ∪˙ ” for a disjoint union of sets, as it behaves much like an addition operator.

unlabelled: A = B ∪˙ C ←→ OGF: A = B + C
labelled: A = B ∪˙ C ←→ EGF: A = B + C

When specifying a construction we also need to specify the weight function. Usually this will be obvious but for completeness, the weight of an object in A is its weight according to the weight function for B if the object is in B or its weight according to the weight function for C if it is in C.1 Note that this works the same for labelled and unlabelled: we just add the generating functions. In fact there is nothing new here, except for the fact that we are now referring to this as an “admissible construction”.

warm-up: products

The second natural way to combine classes is by taking pairs of objects. The weight of the new compound object is the sum of the weights of the two subobjects. For unlabelled objects, we have the cartesian product. The set of new objects is the set of all unlabelled ordered pairs, where each element is an unlabelled structure. For labelled objects we have the ?-product (which we might call the relabelling cartesian product). The set of objects is the set of all labelled ordered pairs, where each element is a labelled structure. The overall labelling is done in a way that is order-consistent with each part.

unlabelled: A = B × C ←→ OGF: A = BC
labelled: A = B ⋆ C ←→ EGF: A = BC

It is worth emphasizing that we are thinking of the objects as being composed of “atoms” which contribute to the weight, and it is the atoms that are labelled. It is also worth repeating that despite the similarity in the way the labelled and unlabelled cases behave, they are not the same. This generalizes to products of more than two classes in the obvious way. Note that we consider objects of type $B^{(1)} \times B^{(2)} \times \cdots \times B^{(k)}$ to be ordered k-tuples, as opposed to nested ordered pairs.

1 A function on a domain X can be thought of as a set of ordered pairs, so f is identified with the set of ordered pairs {(x, f(x)) : x ∈ X}. From this point of view, the weight function for the disjoint union of two sets is the (disjoint) union of the two weight functions. Nice!

In other words we make no distinction between (B × C) × D and B × (C × D), and other similar expressions.

Problem 5.1. What is the difference between (B × C) × D and B × (C × D)? Why are they not “equal”? Can you think of an instance where you would truly want to make the distinction between them?

composition of constructions

If we needed to treat every construction separately there would be no point in having a unified point of view. But in fact we can compose constructions in a way which allows us to compose the generating functions. Again we are assuming that the object is built of atoms, which are what provide the “weight” or “size” of the object. The following assumes that we can, in some manner, tell individual atoms apart. For instance, in a sequence of atoms, we may speak of the “first” element, “second” element, etc. We’ve been doing this all along, when we considered 01-strings. All of the 0s are “the same”, but it still makes sense to talk about the first 0 as opposed to the fourth 0.

In this context, consider two unlabelled classes B and C. We imagine taking an object from B and replacing each of its atoms with an element of C. We call this process composition of classes and denote it by B ◦ C. As a particular example, imagine that B is the set of all RB-strings, and C is the set of all graphs. We identify R with red and B with blue. Then B ◦ C would be the set of all ordered sequences of graphs, where each graph is either red or blue. The weight of the resulting object is the sum of the weights of the C-objects that have been “substituted” for atoms of the B-object. If we consider two labelled classes B and C, the process is largely similar. But now we must remove the labels on B, and then relabel the objects of C. We will have more to say about this later. This gives the following. Note that despite the similarity (identicality?) of the labelled and unlabelled cases, they are not the same.

unlabelled: A = B ◦ C ←→ OGF: A(x) = B(C(x))
labelled: A = B ◦ C ←→ EGF: A(x) = B(C(x))

We have in fact been tacitly assuming that there are no C-objects of size zero. If there were such objects, then we could construct infinitely many A-objects of any given size. This is what we already saw in terms of when formal power series can be composed.
For another analogy, if one were to declare the integer 1 to be prime, then every positive integer would have an infinite number of distinct prime factorizations. All this to say that in such compositions, we will always be assuming that $c_0 = 0$.

As proof that composition of constructions corresponds to composition of generating functions, we offer the following. We simply extract coefficients to see what $a_n$ is in terms of the $b_j$ and $c_k$, and observe that it is what it should be.
$$\text{unlabelled:}\qquad a_n = [x^n]\,A(x) = [x^n]\,B(C(x)) = \sum_{j\ge0} b_j \sum_{n_1+\cdots+n_j=n} c_{n_1} c_{n_2} \cdots c_{n_j}$$
$$\text{labelled:}\qquad a_n = n!\,[x^n]\,A(x) = n!\,[x^n]\,B(C(x)) = \sum_{j\ge0} \frac{b_j}{j!} \sum_{n_1+\cdots+n_j=n} \binom{n}{n_1\ n_2\ \cdots\ n_j} c_{n_1} c_{n_2} \cdots c_{n_j}$$

Problem 5.2. Where did the n! go in the labelled expression? Explain why in the unlabelled and labelled case the right-hand side counts what the above discussion says it does. Recall that atoms of B-objects are “distinguishable”, and also that $c_0 = 0$ (which means we may consider each $n_i > 0$, right?).
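A small unlabelled check (mine): take B with OGF $B(y) = \frac{1}{1-y}$ (so $b_j = 1$ for all j) and C = {0, 1}, two atoms, so C(x) = 2x. The coefficient formula should then reproduce $2^n$, the number of 01-strings of length n.

```python
def compositions(n, j):
    """All ordered tuples of j positive integers summing to n."""
    if j == 0:
        return [()] if n == 0 else []
    return [(first,) + rest
            for first in range(1, n + 1)
            for rest in compositions(n - first, j - 1)]

b = [1] * 11          # OGF coefficients of B(y) = 1/(1-y)  (sequences)
c = [0, 2] + [0] * 9  # OGF coefficients of C(x) = 2x       (two atoms; c_0 = 0)

def composed_coeff(n):
    """a_n = sum_j b_j * sum over compositions n_1+...+n_j = n of c_{n_1}...c_{n_j}."""
    total = 0
    for j in range(n + 1):
        for parts in compositions(n, j):
            term = b[j]
            for p in parts:
                term *= c[p]
            total += term
    return total

assert [composed_coeff(n) for n in range(8)] == [2 ** n for n in range(8)]
```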

It’s worth making one final observation. If B-type objects are built of atoms that can always be distinguished (eg: sequences), then the generating function for objects of type B ◦ C is just B(C(x)).

36 If the B-type objects are not built in this way, or the atoms are not distinguishable, it doesn’t mean that B ◦ C is not an admissible class, it just means that it’s generating function isn’t (necessarily) B(C(x)).

pointing (or: distinguishing an atom)

Another useful construction is what is sometimes called “pointing”, or perhaps a better word would be “rooting”. Given an object, make one particular atom special by “pointing” to it. Again, we are assuming that the object is composed of atoms which are ultimately distinguishable. As a concrete example, imagine the set of 01-strings, where each string has one particular symbol that has a dot written above it.

unlabelled: A = pointed-B ←→ OGF: $A = x\frac{d}{dx}B$
labelled: A = pointed-B ←→ EGF: $A = x\frac{d}{dx}B$

Example 5.3. Find the OGF for 01-strings with a distinguished element.

Consider the set of all 01-strings with a distinguished element. These are the pointed 01-strings. Note that in particular there are no such objects of length zero (why not?). The number of such strings of length n, as well as the generating function, should be obvious from direct consideration (how many 01-strings are there? how many ways are there to distinguish one symbol in the string?). For illustrative purposes we’ll derive it in two other ways.

We know that the ordinary generating function for 01-strings is $(1-2x)^{-1}$. Thus the ordinary generating function for pointed 01-strings is $x\frac{d}{dx}(1-2x)^{-1} = 2x/(1-2x)^2$.

On the other hand we can regard the set of 01-strings with a distinguished element as follows. Let B be the set of all RB-strings with a single R. Clearly there are n such strings of length n. Thus $B(x) = x/(1-x)^2$ (Exercise 3.2 might help). Let C = {0, 1}. Clearly C(x) = 2x. Then the set of 01-strings with a distinguished element (the “red” one) is B ◦ C. This gives the ordinary generating function for pointed 01-strings as $B(C(x)) = 2x/(1-2x)^2$.
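Both derivations can be checked against the direct count (a sketch of mine): squaring the series for $\frac{1}{1-2x}$ by convolution and multiplying by 2x should give coefficients $n \cdot 2^n$, i.e. $2^n$ strings of length n times n ways to point.

```python
def convolve(a, b):
    """Coefficientwise product of two truncated power series."""
    n = min(len(a), len(b))
    return [sum(a[k] * b[m - k] for k in range(m + 1)) for m in range(n)]

geom2 = [2 ** n for n in range(12)]            # 1/(1 - 2x)
squared = convolve(geom2, geom2)               # 1/(1 - 2x)^2
pointed = [0] + [2 * c for c in squared[:11]]  # multiply by 2x

# Direct count of pointed strings: 2^n strings of length n, n ways to point.
assert pointed == [n * 2 ** n for n in range(12)]
```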

sequences of objects

A fundamental construction is that of sequences. Given a class B, we wish to consider sequences of B-type objects. The simplest case of this is perhaps when B is a finite set of distinct atoms (perhaps better called “letters” or “symbols”). For instance if B = {0, 1} then Seq(B) is the set of all 01-strings.

First we consider sequences of a fixed length k, denoted by Seqk. This is simply an ordered k-tuple, which is just like a k-fold cartesian product. So the following is immediate. k unlabelled: A = Seqk(B) ←→ OGF: A = (B(x)) k labelled: A = Seqk(B) ←→ EGF: A = (B(x))

Note that if b0 6= 0 then this operation is still defined, and gives a well-defined generating function. However, we will “usually” only want to use this when b0 = 0. See the exercises for a discussion of this, and a possible example of when having b0 6= 0 might be useful after all. Now consider a sequence with any number of terms, denoted by Seq. This is just the disjoint union over all k ∈ N of Seqk. Is this union really disjoint? Could the same compound object appear as a 0 sequence of length k and a sequence of length k 6= k? No because the structure of a Seqk means that we know that it is of length k, we can pick out the j-th element for each 1 ≤ j ≤ k. This is

true whether or not $b_0 = 0$. However, a more fundamental issue arises with Seq(B) with $b_0 \ne 0$. It gives an infinite number of sequences of any weight, since given any sequence of weight n > 0, if we prefix that sequence with an empty object, then we have another different object with weight n. So we insist that $b_0 = 0$ for Seq(B). At the end of this discussion, you might have noticed that in fact a Seq(B) is just a composition, so $b_0 = 0$ should not be a surprise. In any case, we have the following.

unlabelled: $A = \mathrm{Seq}(B) = \bigcup_{k\ge0} \mathrm{Seq}_k(B)$ ←→ OGF: $A(x) = \frac{1}{1-B(x)}$
labelled: $A = \mathrm{Seq}(B) = \bigcup_{k\ge0} \mathrm{Seq}_k(B)$ ←→ EGF: $A(x) = \frac{1}{1-B(x)}$

Example 5.4. Let A be the 012-strings with no consecutive 0s. Find the OGF.

Such a string starts with some (possibly empty) 12-string, followed by a possibly empty sequence of units, where each unit is a 0 followed by a non-empty 12-string, followed by either a 0 or by nothing. Then we have the following, which instantly gives the generating function.
$$A = \mathrm{Seq}(\{1,2\}) \times \mathrm{Seq}\big(\{0\} \times \{1,2\} \times \mathrm{Seq}(\{1,2\})\big) \times \{\epsilon, 0\}$$
$$A(x) = \frac{1}{1-2x} \times \frac{1}{1 - x \cdot 2x \cdot \frac{1}{1-2x}} \times (1+x) = \frac{1+x}{1-2x-2x^2}$$

Sometimes in the context of strings we write $B^k$ instead of $\mathrm{Seq}_k(B)$ and $B^*$ instead of $\mathrm{Seq}(B)$. Also we often write cartesian products by omitting the symbol “×”. Furthermore, for singleton sets we omit the “{}”. Round brackets are used to indicate precedence of operations. So we have the slightly more compact form.
$$A = \{1,2\}^* \big(0\,\{1,2\}\{1,2\}^*\big)^* \{\epsilon, 0\}$$
0 Problem 5.5. In considering Seq we seem to have tacitly assumed that Seq0 is in fact x = 1. This looks like a special case. Is it still correct? Are we not suddenly allowing “empty objects” in the sequence? Is it really a special case? Problem 5.6. Explain why (using the compact notion for strings) we have 11∗ = 1∗ \{}. Note that what is being postulated is an equality of (infinite) sets.  ∗ Thus, explain why {1, 2}∗ 0({1, 2}∗ \{}) {, 0} also gives the set of all 012-strings with no repeated 0s. Write down the generating function corresponding to this new description. Is it equal to the one we found above?

summary (so far. . . )

It’s useful to keep track of the constructions we know so far. In this table (and others like it to follow) we won’t always give the weight function explicitly if it is obvious from the context and/or the previous discussion. Also, we could give the coefficient of x^n in the constructed class in terms of the various a_n, b_n, etc., but we don’t. Think of this as an exercise if you like! It is useful to have a special marker for an “atom”; we use the symbol “•” for that. So {•} can be thought of as a set containing a single object that consists of exactly one atom, i.e., that has weight one. The set {•} is the basic building-block for many different objects. The generating function for this set is of course x.

Theorem 5.7. The following gives some admissible constructions and the corresponding operation on generating functions. If the objects are unlabelled then the generating functions are ordinary; if the objects are labelled then the generating functions are exponential.

disjoint union ←→ A ∪˙ B ←→ A(x) + B(x)
cartesian product ←→ A × B ←→ A(x) · B(x)
composition ←→ A ◦ B ←→ A(B(x))
sequence of length k ←→ Seq_k({•}) ←→ x^k
sequence of any length ←→ Seq({•}) = ∪_{k≥0} Seq_k({•}) ←→ 1/(1 − x)
sequence of length k ←→ Seq_k(B) = B × B × ··· × B ←→ (B(x))^k
sequence of any length ←→ Seq(B) = ∪_{k≥0} Seq_k(B) ←→ 1/(1 − B(x))

The expressions for Seq(B) and Seq_k(B) can be regarded as combining A ◦ B with the results for Seq({•}) and Seq_k({•}). Alternatively, Seq({•}) and Seq_k({•}) are special cases of Seq(B) and Seq_k(B). Typically things like Seq(B) will be more useful, but things like Seq({•}) might be easier to prove, at least conceptually. This uses the fact that the objects in a sequence can be distinguished by their position in the sequence alone. More generally in A ◦ B we assume that the atoms of an A-object can be somehow distinguished.

Problem 5.8. In Theorem 5.7 the unlabelled/OGF and labelled/EGF cases are “identical”. It’s nice and convenient that they look the same. But explain why they are not really identical. At some point in your explanation you should be asking yourself why Seq_k({•}) for labelled objects doesn’t have an x^k/k!. This is a good exercise, as sometimes labelled and unlabelled will not be “identical”.
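To make the dictionary concrete, here is a minimal Python sketch (ours, not part of the notes; the function names are ad hoc) of the unlabelled operations on truncated coefficient lists, where index n holds [x^n].

```python
N = 10  # truncation order: we keep coefficients of x^0 .. x^(N-1)

def add(A, B):
    # disjoint union  <->  A(x) + B(x)
    return [a + b for a, b in zip(A, B)]

def mul(A, B):
    # cartesian product  <->  A(x) * B(x), truncated at order N
    return [sum(A[j] * B[n - j] for j in range(n + 1)) for n in range(N)]

def seq(B):
    # Seq(B)  <->  1/(1 - B(x)), valid only when b_0 = 0
    assert B[0] == 0
    S = [1] + [0] * (N - 1)
    for n in range(1, N):
        S[n] = sum(B[j] * S[n - j] for j in range(1, n + 1))
    return S

atom = [0, 1] + [0] * (N - 2)        # the class {•}, OGF x
assert add(atom, atom)[1] == 2       # two disjoint copies of the atom
assert mul(atom, atom)[2] == 1       # Seq_2({•}) has OGF x^2
assert seq(atom) == [1] * N          # Seq({•}) has OGF 1/(1 - x)
```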

restricted constructions

We will often have a “basic” form of a construction and several “restricted” forms. The simplest example of this is that Seq allows sequences of any length, while Seqk allows only sequences of length exactly k. There are some further generalizations of this in the exercises. You should be able to deduce them using the techniques above. We will see other things like this throughout the course.

exercises

1. (This question is deliberately open-ended.) Consider two classes of labelled objects A and B. Explain why the number of objects of size n in A × B is less than the number of objects of size n in A ⋆ B. Describe a natural correspondence (which is not one-to-one) between these two sets, which naturally demonstrates that the ⋆-product is “bigger” than the cartesian product.

2. Describe a construction that is not admissible. One can answer this question by cheating, i.e., let A be the set of all graphs, let B be the set of connected graphs and let C be the set of prime irreducible polynomials over a fixed field. Then there exists no admissible construction that gives A = Ψ[C]. . . because the admissible construction is of the form A = Ψ[B]. It’s not that graphs cannot be constructed, it’s that irreducible polynomials are the wrong building blocks. Can you find a “really not admissible” construction?

3. The construction Seq_{k,ℓ}(B) denotes the set of all objects composed of a sequence of either k or ℓ B-objects. Give the generating function for Seq_{k,ℓ}(B). If you’re unsure, then start with Seq_{k,ℓ}({•}).

4. Let D be a set of non-negative integers. Give the generating function for Seq_D({•}), sequences whose length is an element of D. Deduce from this the generating function for Seq_D(B) in terms of B(x).

5. Let 0 ≤ r < m. Give the generating function for Seq_{r/m}({•}), sequences whose length is congruent to r (mod m). Deduce from this the generating function for Seq_{r/m}(B) in terms of B(x).

6. Let B = {ε, 0, 1}, and A_k = Seq_k(B). So B consists of an empty symbol ε (i.e., one that has weight 0), a “0” symbol, and a “1” symbol (both of which have weight 1). One can imagine ε as being like a space (but it might be more helpful to draw it explicitly just so we know it’s there).
a) List all of the 01-sequences of weight 2 (recall that the weight of a sequence is the sum of the weights of the objects in that sequence, not the length of the sequence).
b) List all of the 01-sequences of weight 2.
c) Give the OGFs B(x) and A_3(x).
d) Explain the difference between A_k and Seq_k({0, 1}). Both are well-defined and admissible, but they are not the same. How would you obtain one from the other? Compare with Lemma 1.5.

7. a) Consider sentences of length N written with the letters a, b, c, ··· , z and blank. Length means the number of non-blank symbols. Write down the ordinary generating function for such sentences, where the weight of a sentence is the number of non-blank symbols in the sentence.
b) Explain why the previous part fails if we replace “of length N” with “of any length”.

6. Catalan numbers

rooted unlabelled plane trees

A rooted unlabelled plane tree is a tree with a distinguished vertex, and an ordering on the children of each vertex. Every vertex is assigned a level: a non-negative integer. There is a unique vertex at level 0 (the root). Every vertex at level j ≥ 0 has an ordered (possibly empty) list of neighbours at level j + 1 (children). Every vertex at level j > 0 has exactly one neighbour at level j − 1 (parent). These are all the neighbours. Note that the children are ordered but not labelled. The ordering is on the subtrees, as opposed to being an ordering induced by labels on the vertices. So the following three isomorphic trees are all non-isomorphic as rooted unlabelled plane trees:

[figure: three trees that are isomorphic as trees, but pairwise non-isomorphic (≇) as rooted unlabelled plane trees]

Each rooted unlabelled plane tree may be decomposed as a root vertex, a set of d_root edges (where d_root is the degree of the root vertex), and an ordered list of d_root rooted unlabelled plane trees. It is permissible for d_root to be zero, or for any of the “smaller” rooted unlabelled plane trees to be empty. We get the following partition for T.

T = ε ∪˙ |T ∪˙ |T|T ∪˙ |T|T|T ∪˙ ···

The “|” are supposed to represent edges incident with the root. Each successive term on the right represents rooted unlabelled plane trees with d_root equal to 0, 1, 2, 3, . . . , thus giving as a (disjoint) union the set of all rooted unlabelled plane trees. This gives a functional equation for the ordinary generating function, where x marks the edges of the rooted unlabelled plane tree.

T(x) = 1 + xT(x) + (xT(x))^2 + (xT(x))^3 + (xT(x))^4 + ··· = 1/(1 − xT(x))

Therefore the OGF satisfies xT(x)^2 − T(x) + 1 = 0, and thus we can find T(x).

T(x) = (1 − √(1 − 4x))/(2x) = Σ_{n≥0} 1/(n+1) · C(2n, n) · x^n

Recall that n represents the size of the tree, which in this case is the number of edges (or if you prefer, the number of non-root vertices). We took the negative sign in the quadratic formula because the positive sign would give a formal power series with “infinite constant term”. This can be seen from t_0 = [x^0] T(x) = T(0). The extraction of the coefficients is left as an exercise. The coefficients are called the Catalan numbers.

Problem 6.1. Show that [x^n] (1 − √(1 − 4x))/(2x) = 1/(n+1) · C(2n, n). Do this by finding the Taylor series for √(1 − 4x) (and then dividing through by 2x). You should find a “simple” formula for the n-th derivative, evaluated at x = 0. It might help to notice that (2n)! = (1)(3)(5) ··· (2n − 1) · n! · 2^n.
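As a sanity check (a Python sketch of ours, not part of the notes), the functional equation T(x) = 1/(1 − xT(x)) can be iterated on truncated coefficient lists; each pass fixes at least one more coefficient, and the fixed point matches the Catalan formula.

```python
from math import comb

N = 12
T = [1] + [0] * (N - 1)                  # initial guess T = 1
for _ in range(N):
    xT = [0] + T[:N - 1]                 # coefficients of x*T(x)
    new = [1] + [0] * (N - 1)            # build 1/(1 - xT(x)): V = 1 + (xT)V
    for n in range(1, N):
        new[n] = sum(xT[j] * new[n - j] for j in range(1, n + 1))
    T = new                              # each pass fixes one more coefficient

assert T == [comb(2 * n, n) // (n + 1) for n in range(N)]
```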

It’s worth noticing that the symbolic method provides a more direct route to the generating func- tion. We observe that a rooted unlabelled plane tree consists of a sequence of any number of pairs (edge,subtree). Of course there is a root vertex but we don’t “count” that.1 This gives the following

1 There is always a root vertex. We don’t “count” the root because it does not contribute to the size of the tree.

expression for T, where again “|” represents an edge joining a subtree to the root. The equation for the ordinary generating function T follows immediately.

T = Seq({|} × T)
T(x) = 1/(1 − xT(x))

rooted unlabelled plane trees redux

What happens if we define the size of a tree as the number of vertices? Let S(x) = Σ_{n≥0} s_n x^n where s_n is the number of rooted unlabelled plane trees on n vertices. Notice that s_0 = 0; why? A rooted unlabelled plane tree consists of a vertex (the root), and a sequence of rooted unlabelled plane subtrees. The subroot of the subtree indicates where it attaches to the main root. This gives the following, from which we get the ordinary generating function S(x).

S = {•} × Seq(S)
S(x) = x · 1/(1 − S(x))

We solve this to find that S(x)^2 − S(x) + x = 0. This gives the following expression for S(x).

S(x) = (1 − √(1 − 4x))/2

We notice that S(x) = xT(x). This gives s_n.

[x^n] S(x) = [x^{n−1}] T(x) = 1/n · C(2(n − 1), n − 1)

This tells us that s_n = t_{n−1}. If we somehow manage to prove that the number of edges in a tree determines the number of vertices and vice versa, all without actually proving what the relations are, then we have proved that a tree on n vertices has n − 1 edges. Needless to say, there are much easier ways to see this.

Problem 6.2. Explain the details in the above (ridiculous) proof that a tree on n vertices has n − 1 edges.

What is the point of the previous exercise? Given two unknown sequences, if we know something about the relationship of their generating functions we can deduce relationships between the terms of the two sequences. We’ll see other examples of this later (which are not so ridiculous). Perhaps marginally more interesting is the following. Problem 6.3. Prove (in any way you like) that a tree on n vertices has n − 1 edges. Using this, deduce that S(x) = xT (x). This is perhaps a little less ridiculous than Problem 6.2.

bracketings

A bracketing is a string of “(” and “)” such that the number of opens is equal to the number of closes, and reading from left to right, there are always at least as many accumulated opens as closes.2 The size of a bracketing is the number of opens (or equivalently, the number of closes or the number of pairs). We can decompose bracketings as follows. Either the bracketing is empty, or else it begins with a “(”. In the latter case, this opening “(” will have a corresponding “)” somewhere: the smallest prefix

2 More prosaically, if one uses curly braces instead it would compile cleanly as LaTeX code.

that is itself a legal bracketing. Between these two, we may have any legal bracketing, and after the close, we may also have any legal bracketing.

B = {ε} ∪˙ {(} × B × {)} × B
B(x) = 1 + xB(x)^2

This is the same equation as before! We have proved that the number of rooted unlabelled plane trees with n edges is the same as the number of legal bracketings with n “(”. This is because both of these quantities are equal to the coefficient of x^n in the same OGF.
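The bracketing count can also be checked by brute force; the following Python sketch (ours, not from the notes) enumerates all strings of n opens and n closes and tests legality by tracking the running depth.

```python
from itertools import product
from math import comb

def legal(s):
    depth = 0
    for c in s:
        depth += 1 if c == "(" else -1
        if depth < 0:          # more closes than opens in some prefix
            return False
    return depth == 0          # equal numbers of opens and closes overall

def count(n):
    return sum(1 for s in product("()", repeat=2 * n) if legal(s))

assert all(count(n) == comb(2 * n, n) // (n + 1) for n in range(7))
```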

walks above y = x

We consider the set W of walks from (0, 0) to (n, n) that are composed of two types of steps, up and right, such that the path never passes under the line y = x (though it may touch it). Such a path starts and ends on the line y = x. Consider the first point at which the path returns to the line y = x after (0, 0). Call this point (t, t). Then clearly the first 2t steps consist of t up-steps and t right-steps, the first step is an up-step, and the last of these 2t steps is a right-step. This means that the first 2t steps consist of an up-step, followed by a walk from (0, 1) to (t − 1, t) that never passes below the line y = x + 1, followed by a right-step. For every walk there is such a point t. Furthermore the middle part of the walk from (0, 1) to (t − 1, t) can be thought of as a shifted version of a walk from (0, 0) to (t − 1, t − 1) that never passes below y = x. The shifting is clearly a bijection, so this gives a decomposition for every such walk of positive length. There is one exception: the path of length zero does not fit the above decomposition. So we must include the empty walk, and we get the following decomposition.

W = {ε} ∪˙ {↑} × W × {→} × W
W(x) = 1 + xW(x)^2

Again, we already know the answer.

others

Here are some other things that the Catalan numbers enumerate. See Richard Stanley’s book for many more examples.3 • The number of ways of joining 2n points on a line pairwise with arcs that are drawn above the line. • The number of involutions on [2n] with no fixed points, and where j between i and σ(i) implies σ(j) between i and σ(i). • The number of triangulations of a convex (n + 2)-sided polygon. • ...

questions

1. Given that A(x) is a formal power series that satisfies xA(x)2 − A(x) + 1 = 0, show that A exists and is uniquely determined. (hint: start by extracting coefficients)

3 Richard P. Stanley, “Catalan numbers”. Cambridge University Press, New York (2015).

2. Find a bijection between rooted unlabelled plane trees with n edges and legal bracketings with n “(”. (hint: imagine the tree is in fact a hedge maze, as seen from above. Start at the root and walk around the hedge, always staying adjacent to the hedge.)

3. Find a bijection between legal bracketings with n “(” and walks above the line y = x.

4. Since rooted unlabelled plane trees and bracketings have the same OGF, we should be able to give the same description for each. Find a description for walks analogous to A = Seq({|} × A).

5. Since rooted unlabelled plane trees and bracketings have the same OGF, we should be able to give the same description for each. Find a description for rooted unlabelled plane trees analogous to B = {ε} ∪˙ {(} × B × {)} × B.

6. We want to know the ordinary generating function for rooted unlabelled plane trees where every vertex has an even number of children. Find a relatively simple functional equation for this generating function. (hint: Exercise 5.4, or even Exercise 5.5) You don’t need to give the generating function explicitly. You should realize that you could write it down explicitly: explain how.

7. Give the ordinary generating function for rooted unlabelled plane trees where every vertex has an odd number of children. Do this first by writing down a specification of this class using a partition or Seq-type constructions, and then deduce a functional equation for the generating function. Solve this equation. After noticing that the answer is ridiculously simple, explain how you could have seen this right from the beginning. Even if you did see it right from the beginning, follow the steps to see how they do give the correct answer.

8. Let D be a set of non-negative integers. Let T be the set of all rooted unlabelled plane trees where the number of child vertices of every node is an element of D. Give a functional equation for the ordinary generating function T(x). You don’t need to give the generating function explicitly.
Based only on your functional equation, explain why your generating function always has a unique solution, and why this solution is radically different depending on whether 0 ∈ D. For which sets D is it possible to write down T explicitly?

7. multi-section formulæ

This is a short technical section, showing how to extract from a formal power series all terms a_n x^n where n is even, odd, or in general where n ≡ t (mod r). Here’s a quick example.

Example 7.1. Given A(x), find an expression for Σ_k a_{2k} x^{2k} and Σ_k a_{2k+1} x^{2k+1}.

Σ_k a_{2k} x^{2k} = Σ_n a_n · (1 + (−1)^n)/2 · x^n = (1/2) (Σ_n a_n x^n + Σ_n a_n (−x)^n) = (1/2) (A(x) + A(−x))

Σ_k a_{2k+1} x^{2k+1} = Σ_n a_n · (1 − (−1)^n)/2 · x^n = (1/2) (Σ_n a_n x^n − Σ_n a_n (−x)^n) = (1/2) (A(x) − A(−x))
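A small numerical illustration (a sketch of ours, not part of the notes): for a convergent series we can evaluate both sides of the bisection identity at a real point inside the radius of convergence and compare. Here we use the Fibonacci OGF A(x) = 1/(1 − x − x^2) as the example.

```python
# a = coefficients of A(x) = 1/(1 - x - x^2) (the Fibonacci OGF)
a = [1, 1]
for _ in range(40):
    a.append(a[-1] + a[-2])

def A(x):
    return 1.0 / (1 - x - x * x)

x = 0.1  # well inside the radius of convergence (about 0.618)
even_part = sum(a[n] * x**n for n in range(0, len(a), 2))
odd_part  = sum(a[n] * x**n for n in range(1, len(a), 2))
assert abs(even_part - (A(x) + A(-x)) / 2) < 1e-12
assert abs(odd_part  - (A(x) - A(-x)) / 2) < 1e-12
```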

More generally, we have the following

Lemma 7.2. Let ω = e^{2πi/r}. Then 1 + ω^n + ω^{2n} + ··· + ω^{(r−1)n} = r if r | n, and 0 otherwise.

The proof is left as an exercise, based on the geometry of the r-th roots of unity in the complex plane.1 Lemma 7.2 gives the following useful corollary, which picks out the terms of index t mod r.

Corollary 7.3. Let ω = e^{2πi/r} and let A(x) = Σ_n a_n x^n. Then the following holds.

Σ_k a_{rk+t} x^{rk+t} = (1/r) (A(x) + ω^{−t} A(ωx) + ··· + ω^{−(r−1)t} A(ω^{r−1} x)) = (1/r) Σ_{j=0}^{r−1} ω^{−jt} A(ω^j x)
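Corollary 7.3 can be checked numerically on a convergent example; the following Python sketch (ours; the choice A(x) = e^x with r = 3, t = 1 is just an illustration) uses complex arithmetic for the roots of unity.

```python
import cmath
from math import factorial

r, t = 3, 1
w = cmath.exp(2j * cmath.pi / r)   # a primitive r-th root of unity
x = 0.7

# left side: sum of x^n/n! over n congruent to t mod r (truncated tail is tiny)
direct = sum(x**n / factorial(n) for n in range(t, 60, r))
# right side: (1/r) * sum_j w^{-jt} A(w^j x) with A = exp
multisected = sum(w**(-j * t) * cmath.exp(w**j * x) for j in range(r)) / r

assert abs(direct - multisected) < 1e-12
```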

The example at the beginning of this section corresponds to r = 2 and t = 0.

Problem 7.4. Using Corollary 7.3, give each of the following in terms of the stated (well-known) formal power series.
• Σ_{n≥0} x^{2n}/(2n)! in terms of e^x
• Σ_{n≥0} x^{2n+1}/(2n+1)! in terms of e^x
• Σ_{n≥1} (−1)^{5n+3} x^{5n+2}/(5n+2) in terms of log(1 + x)

Example 7.5. Evaluate Σ_j C(n, 3j) (where C(n, k) denotes the binomial coefficient).

1 This appears in all sorts of places, for instance the following. A. M. Turing, “The chemical basis of morphogenesis”, Philosophical Transactions of the Royal Society of London B. 237 (641): 37–72 (1952). doi:10.1098/rstb.1952.0012

Certainly this exists since it is a finite sum of integers, but let’s pretend we didn’t notice that. The sequence C(n, 3j) for j corresponds to every third term of the sequence C(n, k) for k, so we start with the generating function Σ_{k≥0} C(n, k) x^k = (1 + x)^n and ω = e^{2πi/3}.

Σ_j C(n, 3j) = Σ_j C(n, 3j) x^{3j} |_{x=1}
= (1/3) (Σ_k C(n, k) x^k + Σ_k C(n, k) (ωx)^k + Σ_k C(n, k) (ω^2 x)^k) |_{x=1}
= (1/3) ((1 + x)^n + (1 + ωx)^n + (1 + ω^2 x)^n) |_{x=1}
= (1/3) (2^n + (1 + e^{2πi/3})^n + (1 + e^{4πi/3})^n)
= (1/3) (2^n + (e^{−πi/3} + e^{πi/3})^n e^{nπi/3} + (e^{−2πi/3} + e^{2πi/3})^n e^{2nπi/3})
= (1/3) (2^n + cos(nπ/3) + (−1)^n cos(2nπ/3))

(In the last step, (e^{−πi/3} + e^{πi/3})^n = (2cos(π/3))^n = 1 and (e^{−2πi/3} + e^{2πi/3})^n = (2cos(2π/3))^n = (−1)^n; since the total is real we may take real parts.)

Now let’s justify. In the first case, F(x) = (1 + x)^n is an analytic function over the whole complex plane (why?). Therefore the sum (1/3)(F(x) + F(ωx) + F(ω^2 x)) is also analytic everywhere. Therefore the evaluation at x = 1 really did give us the value of the series.
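A quick check of the final formula (a sketch of ours, not from the notes): compare the finite sum with the closed form for small n.

```python
from math import comb, cos, pi

def lhs(n):
    # the finite sum of binomial coefficients C(n, 3j)
    return sum(comb(n, 3 * j) for j in range(n // 3 + 1))

def rhs(n):
    # closed form (1/3)(2^n + cos(n*pi/3) + (-1)^n cos(2n*pi/3))
    return (2**n + cos(n * pi / 3) + (-1)**n * cos(2 * n * pi / 3)) / 3

assert all(lhs(n) == round(rhs(n)) for n in range(30))
```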

Example 7.6. Evaluate Σ_j C(3j, k) (−1)^{3j}.

Certainly this does not exist since it is an infinite sum of integers unbounded in absolute value, but let’s pretend we didn’t notice that. This time we want every third coefficient in Σ_{n≥0} C(n, k) y^n = y^k/(1 − y)^{k+1}. So we set ω = e^{2πi/3}.

Σ_j C(3j, k) (−1)^{3j} = Σ_j C(3j, k) y^{3j} |_{y=−1}
= (1/3) (Σ_n C(n, k) y^n + Σ_n C(n, k) (ωy)^n + Σ_n C(n, k) (ω^2 y)^n) |_{y=−1}
= (1/3) (y^k/(1 − y)^{k+1} + (ωy)^k/(1 − ωy)^{k+1} + (ω^2 y)^k/(1 − ω^2 y)^{k+1}) |_{y=−1}
= (1/3) ((−1)^k/2^{k+1} + (−e^{2πi/3})^k/((e^{−πi/3} + e^{πi/3})^{k+1} e^{(k+1)πi/3}) + (−e^{4πi/3})^k/((e^{−2πi/3} + e^{2πi/3})^{k+1} e^{2(k+1)πi/3}))
= (1/3) ((−1)^k/2^{k+1} + (−1)^k cos((k − 1)π/3) − cos(2(k − 1)π/3))

This time G(y) = y^k/(1 − y)^{k+1} is analytic on |y| < 1; hence the sum is also analytic on |y| < 1. So we can not evaluate this series at y = −1. In general, it is possible for an analytic power series to converge on the boundary, but this would have to be shown separately (it is not the case here: exercise!). The illegal move was exactly the substitution of y = −1 into the closed form function at a point where the closed form and power series do not agree. Everything after that is garbage.

questions

1. Prove Lemma 7.2. As a hint, think about (e^{2πi/r})^n.

2. Prove Corollary 7.3.

3. Evaluate Σ_j C(n, 3j + 1). Explain how to evaluate Σ_j C(n, rj + t) for r, t ∈ N. (using generating functions of course!)

4. Show that Σ_j C(3j, k) (−1)^{3j} does not converge.

5. What is the domain of (1 + x)^n? For what values of x ∈ R does (1 + x)^n have an analytic power series in x? For what values of x ∈ R and r ∈ N does Σ_j C(n, rj) x^{rj} converge? For what values of x ∈ R and r ∈ N does Σ_j C(n, rj) x^j converge?

6. What is the domain of y^k/(1 − y)^{k+1}? For what values of y ∈ R does y^k/(1 − y)^{k+1} have an analytic power series in y? For what values of y ∈ R and r ∈ N does Σ_j C(rj, k) y^{rj} converge? For what values of y ∈ R and r ∈ N does Σ_j C(rj, k) y^j converge?

7. Consider the power series Σ_n x^n, which converges for |x| < 1. Let |c| < 1.
a) Explain why Σ_n c^{2n} = 1/(1 − x) |_{x=c^2}; do this by thinking of the RHS as a generating function.
b) Explain why Σ_n c^{2n} = (1/2) (1/(1 − x) |_{x=c} + 1/(1 − x) |_{x=−c}); do this by thinking of the RHS as a generating function.
c) Show directly that 1/(1 − x) |_{x=c^2} = (1/2) (1/(1 − x) |_{x=c} + 1/(1 − x) |_{x=−c}); do this using arithmetic (in particular forget that you just showed they are already equal).
d) Generalize this to the case of Σ_n c^{rn}. Do these two approaches give the same result when applied to some other power series? Explain.

8. labelled objects by components

permutations and cycles

The Stirling number of the first kind is defined to be the number of permutations of [n] having exactly k cycles. It is denoted by [n k].1 We start with an analogue to Lemma 3.5.

Lemma 8.1. For (n, k) ≠ (0, 0) we have [n k] = [n−1 k−1] + (n − 1) · [n−1 k].

Proof. We prove this in the case where n ≥ k > 0, leaving the other cases as an exercise. If (n) is one cycle of the permutation, then removing this cycle gives a permutation of [n − 1] with k − 1 cycles. So there are [n−1 k−1] permutations where (n) is one of the cycles. If (n) is not a cycle of the permutation, then n appears in a cycle with something else. So excising n gives a permutation of [n − 1] with k cycles. This defines a surjective (n − 1)-to-1 mapping from the permutations of [n] with k cycles where (n) is not a cycle to the permutations of [n − 1] with k cycles. So there are (n − 1) · [n−1 k] permutations where (n) is not one of the cycles.

Problem 8.2. Verify that the mapping in the above proof is surjective, and is (exactly) (n − 1)-to-1. Problem 8.3. Compare the above with what we did for Stirling numbers of the second kind. In particular, how are the Stirling numbers of the first kind defined for all integers n and k?
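The recurrence can be confirmed against a brute-force cycle count; this Python sketch (ours, with ad hoc names) enumerates all permutations of [n] for small n.

```python
from itertools import permutations

def num_cycles(p):
    # number of cycles of a permutation given in one-line notation (0-based)
    seen, count = set(), 0
    for i in range(len(p)):
        if i not in seen:
            count += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = p[j]
    return count

def stirling1(n, k):
    # unsigned Stirling numbers of the first kind via the Lemma 8.1 recurrence
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return stirling1(n - 1, k - 1) + (n - 1) * stirling1(n - 1, k)

for n in range(1, 7):
    tally = {}
    for p in permutations(range(n)):
        c = num_cycles(p)
        tally[c] = tally.get(c, 0) + 1
    assert all(tally.get(k, 0) == stirling1(n, k) for k in range(n + 1))
```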

One could find an OGF directly from the recursion, and then extract coefficients to obtain a formula for the Stirling numbers of the first kind, analogously to what we did with the second kind. We’ll do something different here, as a way of introducing an important technique (but see Exercise 8.1). Let B_k(x) = Σ_n [n k] x^n/n!. First notice that there are exactly (n − 1)! permutations of [n] that have exactly one cycle. This means that

B_1(x) = Σ_n (n − 1)! x^n/n! = Σ_n x^n/n = log 1/(1 − x)

Problem 8.4. Explain why the number of permutations of [n] with exactly one cycle is (n − 1)!. Of course you already did this instinctively, right?

Now let’s decompose a permutation with two cycles. It’s not correct to say that it consists of two permutations, because these are labelled: by “permutation” we really mean “permutation on the set [n]”, for some n. Otherwise there are an infinite number of permutations on two objects! So we need to describe two cycles that together make up a permutation of [n]. First an example. Let’s start with the permutation (1, 5, 4)(2, 6, 3, 7). We can describe this by first stating the set of labels that correspond to the first cycle, and then giving the cycles in “reduced” form: relabelling them but keeping the relative order.

(1, 5, 4)(2, 6, 3, 7) ↦ {1, 4, 5} ; (1, 3, 2) ; (1, 3, 2, 4)

1 The notation, like the notation {n k} for Stirling numbers of the second kind, is due to Donald Knuth, and evokes the similarities both of these expressions have with the binomial coefficients.

We can recover the original as follows: It has a 3-cycle and a 4-cycle, so it is a permutation of [7]. We know that there is one cycle on the set {1, 4, 5}, and that it maps from smallest to largest to middle, so it must be (1, 5, 4). The other cycle must be on the remaining elements {2, 3, 6, 7}, and it maps smallest to second largest to second smallest to largest, so it must be (2, 6, 3, 7). So permutations with exactly two cycles consist of a choice of t, a choice of a t-subset, a permutation on this t-subset with one cycle, and a permutation on the remaining (n − t)-subset with one cycle. There is one detail: the cycles in a permutation are not ordered, so we must divide by 2.

[n 2] = (1/2) Σ_t C(n, t) [t 1] [n−t 1]

This is exactly the multiplication rule for EGFs. It says that B_2(x) = (1/2)(B_1(x))^2. More generally, we would have:

[n k] = (1/k!) Σ_{n_1+···+n_k = n} n!/(n_1! n_2! ··· n_k!) [n_1 1] [n_2 1] ··· [n_k 1]

and therefore B_k(x) = (1/k!)(B_1(x))^k. The binomial (or in general multinomial) coefficient in the multiplication rule for EGFs corresponds to a relabelling. This is why EGFs generally correspond to labelled objects. Actually, let’s rephrase that: this is why EGFs are so useful, because their multiplication naturally corresponds to relabelled disjoint unions of labelled objects.

So far we have expressions for B_k(x).

B_k(x) = Σ_{n≥0} [n k] x^n/n! = (1/k!)(B_1(x))^k = (1/k!)(log 1/(1 − x))^k

We define two further generating functions.

F(x, y) = Σ_n Σ_k [n k] y^k x^n/n!        G(x) = Σ_n Σ_k [n k] x^n/n!

Note that F is an EGF from the point of view of x and an OGF from the point of view of y. This is a reflection of the fact that a permutation acts on points which are labelled (x marks points) and contains cycles which are unlabelled (y marks cycles). Also we have G(x) = F(x, 1). There is no question of convergence: the coefficient of x^n in F(x, y) is a polynomial in y (why?) so we can sum all of the coefficients in that polynomial by evaluating it at y = 1. These two generating functions can be expressed in terms of B_1(x).

F(x, y) = Σ_k B_k(x) y^k = Σ_k (1/k!)(log 1/(1 − x))^k y^k = exp(y log 1/(1 − x)) = (1 − x)^{−y}

G(x) = Σ_k B_k(x) = Σ_k (1/k!)(log 1/(1 − x))^k = exp(log 1/(1 − x)) = 1/(1 − x)

Notice that the expression for G(x) seems to collapse to something quite simple. This will not always be the case. On the other hand, we see that in some sense the “work” is all in determining B_1(x); after that the procedure is automated. We can now obtain:

[n k] = [x^n y^k/n!] F(x, y) = n! [x^n y^k] (1 − x)^{−y} = n! [y^k] C(n + y − 1, n) = [y^k] (n + y − 1)(n + y − 2) ··· (y)
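The last identity, that [n k] is the coefficient of y^k in the rising product (y)(y+1)···(y+n−1), is easy to test; this Python sketch (ours, not part of the notes) multiplies out the rising factorial and compares with the Lemma 8.1 recurrence.

```python
def rising(n):
    # coefficients (in y, low degree first) of y(y+1)(y+2)...(y+n-1)
    poly = [1]
    for m in range(n):
        new = [0] * (len(poly) + 1)
        for i, c in enumerate(poly):
            new[i] += m * c      # multiply by the constant m
            new[i + 1] += c      # multiply by y
        poly = new
    return poly

def stirling1(n, k):
    # unsigned Stirling numbers of the first kind via the recurrence
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return stirling1(n - 1, k - 1) + (n - 1) * stirling1(n - 1, k)

assert rising(4) == [0, 6, 11, 6, 1]   # y(y+1)(y+2)(y+3)
assert all(rising(n)[k] == stirling1(n, k)
           for n in range(8) for k in range(n + 1))
```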

Problem 8.5. Give a direct explanation for why G(x) = Σ_n Σ_k [n k] x^n/n! has such a simple form. Start by looking at Σ_k [n k].

set partitions redux

We’ve already seen generating functions for {n k}. Now let’s look at them from this perspective. We first write down the EGF for partitions into one class. This is hopefully easy. Think of these one-class partitions as one “component” of the real k-class partition. Then we compute the EGF for partitions into k classes. Then we get the bivariate generating function for {n k}, or the EGF for partitions into any number of classes (Bell numbers).

B_1(x) = Σ_{n≥1} x^n/n! = e^x − 1
B_k(x) = (1/k!)(B_1(x))^k = (1/k!)(e^x − 1)^k
F(x, y) = Σ_n Σ_k {n k} y^k x^n/n! = Σ_k B_k(x) y^k = e^{y(e^x − 1)}
G(x) = Σ_n Σ_k {n k} x^n/n! = Σ_k B_k(x) = e^{e^x − 1}

The only “work” we had to do is to figure out on our own how many ways there are to partition [n] into 1 class, and find some nice expression for this EGF.
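As a check (a Python sketch of ours, not from the notes), extracting coefficients of e^{e^x − 1} by formal exponentiation recovers the Bell numbers; we exponentiate a truncated series via the derivative identity G' = C'·G.

```python
from fractions import Fraction
from math import factorial

N = 8
# C(x) = e^x - 1 (EGF for a single nonempty block); G(x) = exp(C(x)) should
# be the EGF for set partitions. From G' = C'G: n*g_n = sum_j j*c_j*g_{n-j}.
C = [Fraction(0)] + [Fraction(1, factorial(n)) for n in range(1, N)]
G = [Fraction(1)] + [Fraction(0)] * (N - 1)
for n in range(1, N):
    G[n] = sum(j * C[j] * G[n - j] for j in range(1, n + 1)) / n

bell = [G[n] * factorial(n) for n in range(N)]   # n![x^n] G(x)
assert bell == [1, 1, 2, 5, 15, 52, 203, 877]
```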

exponential formula for labelled objects : Set

Consider what we did for [n k]. First we have that each element of B_1 of size n is a cycle (permutation) of the set {1, 2, ··· , n}. Denote by G the set of all permutations. Then every element of G is a set of cycles, where the elements of the union of the cycles are relabelled consistently with the labellings on each cycle. In other words we have the following.

labelled: B_1 = Cyc({•}) ←→ EGF: B_1(x) = log 1/(1 − x)
labelled: G = Set(Cyc({•})) ←→ EGF: G(x) = exp(B_1(x)) = 1/(1 − x)

There was nothing particularly special about cycles. Denote by C some set of components, whose EGF is C(x). Then what we really did can be restated as follows.

labelled: C ←→ EGF: C(x)
labelled: G_k = Set_k(C) ←→ EGF: G_k(x) = (1/k!)(C(x))^k
labelled: G = ∪_k Set_k(C) = Set(C) ←→ EGF: G(x) = Σ_k G_k(x) = exp(C(x))

For comparison, here is the analogous process for sequences.

labelled: C ←→ EGF: C(x)
labelled: G_k = Seq_k(C) ←→ EGF: G_k(x) = (C(x))^k
labelled: G = ∪_k Seq_k(C) = Seq(C) ←→ EGF: G(x) = Σ_k G_k(x) = 1/(1 − C(x))

Note that we are not really requiring that C be anything in particular. It is simply a set of “components”. Then the generating function of objects is the exponential of the generating function for components. What does it mean for C to be a set of components? It means that the construction Set(C) is meaningful. That is, it means that if we take a set of atoms, and replace each atom with an element of C in all possible ways, then this process is injective. It means that for any given object, we can uniquely determine the components in that object. Alternatively we can think of the class of (labelled) objects Set({•}).

labelled: {•} ←→ EGF: x
labelled: G_k = Set_k({•}) ←→ EGF: G_k(x) = (1/k!) x^k
labelled: G = Set({•}) = ∪_k Set_k({•}) ←→ EGF: G(x) = Σ_k G_k(x) = exp(x)

Extracting coefficients we discover (unsurprisingly) that the number of labelled k-sets is k![x^k] e^x = 1. Put another way, it says that there is exactly one way to make a k-component object out of k components. The fact that Set({•}) has e^x as EGF explains why we call this the “exponential formula”. Typically we would use this by considering a collection C of components, and obtaining the EGF of Set(C) as exp(C(x)). One caveat: the number of components of size zero must always be zero. If we had a component of size zero, then we could take as many copies of it as we liked in a decomposition of any object, which would mean there are an infinite number of objects of size n for every n. One happy side-effect of this is that exp(C(x)) is automatically well-defined (why?). Of course we can still have an object of size zero. In fact there is necessarily a unique such object, corresponding to an empty set (i.e., no components). We can also make bivariate versions of these, where y counts the number of components. We need to attach some kind of marker to each component, and count each marker with a factor of y. So we consider sets of not just components, but sets of components with a special marker attached to them. Just like we used a dot to indicate an “atom” in an object, we will use the symbol “◦” to indicate a “component”. So {◦} is a set containing a single object, which marks one component. Its generating function is y. Note that this isn’t the component, it’s just a marker for the component. For instance, instead of specifying a graph as a vertex set and an incidence relationship (the edges), we specify it by giving the vertices, the incidence relationship, and a completely redundant extra component-marker symbol for each component. So if we have any class of labelled “components” C whose (exponential) generating function is C(x) then we have the following.

labelled: F = Set({◦} × C) ←→ EGF: F(x, y) = exp(yC(x))

Note that the separate components in a G-object are always distinguishable, because of the labels. From the point of view of constructing, we allow ourselves to take a set of C-objects, even though some of them might not be distinct, because we “know” that once we relabel the overall set of atoms in the set of C-objects then they will be distinct. We can imagine that the Set(C) construction first makes an infinite number of copies of every C-object, each with a unique tag of weight zero whose only purpose is to distinguish individual C-objects, then picks a set (a genuine and proper set) of tagged C-objects, then relabels the atoms in the set of tagged C-objects, then removes the tags since they are now redundant. At the risk of some redundancy, we restate our main result as a theorem.

Theorem 8.6. If the generating function for labelled components is C(x) = Σ_n c_n x^n/n! then the generating function for labelled objects formed of components is given by the following.

F = Set({◦} × C) ←→ F(x, y) = exp(yC(x))
G = Set(C) ←→ G(x) = exp(C(x))

It’s best to think of this as a process rather than as a final result to be memorized. Although it is an important result as given, we will often want to use it with various modifications.
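The process view invites a sanity check (a Python sketch of ours, not from the notes): exponentiating the cycle EGF log 1/(1 − x) formally should return 1/(1 − x), whose n-th EGF coefficient is n!, the number of permutations of [n].

```python
from fractions import Fraction

N = 8
# C(x) = log 1/(1-x) = sum_{n>=1} x^n / n, the EGF for cycles
C = [Fraction(0)] + [Fraction(1, n) for n in range(1, N)]
# exponentiate via G' = C'G:  n*g_n = sum_{j=1}^{n} j*c_j*g_{n-j}
G = [Fraction(1)] + [Fraction(0)] * (N - 1)
for n in range(1, N):
    G[n] = sum(j * C[j] * G[n - j] for j in range(1, n + 1)) / n

# exp(C(x)) should be 1/(1-x): every plain coefficient is 1, so the EGF
# coefficient n![x^n] equals n!, the number of permutations of [n]
assert all(g == 1 for g in G)
```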

variations and combinations

As with all such admissible constructions, the proof is as important as the result. We can apply the same idea to slightly different situations to get quite different-looking results. One such variation is to consider objects that are composed of components, but where the components are themselves ordered, independently of the labelling. If C is the set of components, then Seq(C) is exactly the set of all objects whose components are in C and come in some fixed order. Of course we already know how the generating function for Seq(C) and C(x) relate, but let's look at this again for comparison with Set.

Seq(C) ←→ 1/(1 − C(x)) = Σ_{k≥0} (C(x))^k
Set(C) ←→ exp(C(x)) = Σ_{k≥0} (1/k!) (C(x))^k

The difference between Set(C) and Seq(C) is the factor of k!. Indeed, since every component is necessarily different (because of the labelling), every one of the k! orderings of k components is genuinely different. This would not be true for unlabelled objects, and is why we will need a different approach when we consider Set for unlabelled objects shortly. Another variation is to restrict the number of components. If we look at the formula for G(x) when G = Set(C) or G = Seq(C) then we see this amounts to restricting the values of k over which we sum. Another variation is to restrict the allowed components. This corresponds to restricting the objects of C, and hence the generating function of C itself. This is (hopefully!) easy since the objects of C are typically simpler than the objects of G. Furthermore, one can ask what interesting combinations we can make with these constructions. We have already seen one combination. A permutation of {1, 2, ··· , n} can be thought of as an arrangement of the set {1, 2, ··· , n}. Here is an example with a permutation of n = 6 elements, shown as an arrangement, a permutation (function), and a union of (disjoint) cycles.

3, 5, 2, 6, 1, 4 ←→ σ(1) = 3 , σ(2) = 5 , σ(3) = 2 , σ(4) = 6 , σ(5) = 1 , σ(6) = 4 ←→ σ = (1 3 2 5)(4 6)

This bijection amounts to a combinatorial proof of the following.

Proposition 8.7. For labelled objects we have Set ◦ Cyc ≅ Seq

We can also prove this using generating functions (which is essentially what we did when we discovered that G(x) for the numbers [n k] collapsed to a very simple form). We just compare the generating functions for Set(Cyc({•})) and Seq({•}).

G = Set( Cyc({•}) ) ←→ G(x) = exp( log 1/(1 − x) ) = 1/(1 − x)
F = Seq({•}) ←→ F(x) = 1/(1 − x)

What do we get from Set ◦ Set? The answer is nothing useful, since if our objects are sets of possibly empty sets, then we have an infinite number of objects of any given size. But Set ◦ Set≥1 is a valid construction, where Set≥1 is the constructor for sets of size at least 1 (i.e., non-empty sets).

Set≥1({•}) = Set({•}) \ {∅} ←→ e^x − 1
B = Set( Set≥1({•}) ) ←→ B(x) = exp(e^x − 1)

A = Set( {◦} × Set≥1({•}) ) ←→ A(x, y) = exp( y(e^x − 1) )

We've rediscovered the generating function for the Bell numbers. But this is not a surprise, since the Bell number b_n is the number of partitions of an n-set into non-empty parts, which is exactly what a set of non-empty sets of total weight n is. Also a_{n,k} = n![x^n y^k] A(x, y) gives a more refined count of Bell numbers according to the number of parts. This is better known by another name (such as?), but it serves as an example of how to modify a generating function to count objects by size and number of components.
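As a sanity check we can expand B(x) = exp(e^x − 1) as a truncated series and recover the Bell numbers. Here is a Python sketch (not from the notes; the helper name is ours). It uses the fact that G = exp(C) implies G′ = C′G, i.e. n g_n = Σ_{k=1}^{n} k c_k g_{n−k} for the ordinary coefficients.

```python
from fractions import Fraction
from math import factorial

def series_exp(c, N):
    """Ordinary coefficients g_n of exp(C(x)) mod x^(N+1), where c[k] = [x^k] C(x),
    c[0] = 0.  Uses the recurrence n*g_n = sum_{k=1}^{n} k*c[k]*g[n-k]."""
    g = [Fraction(1)] + [Fraction(0)] * N
    for n in range(1, N + 1):
        g[n] = sum(k * c[k] * g[n - k] for k in range(1, n + 1)) / n
    return g

N = 8
# C(x) = e^x - 1 has ordinary coefficients [x^k] = 1/k! for k >= 1
c = [Fraction(0)] + [Fraction(1, factorial(k)) for k in range(1, N + 1)]
g = series_exp(c, N)
bell = [int(factorial(n) * g[n]) for n in range(N + 1)]   # b_n = n![x^n] B(x)
print(bell)  # [1, 1, 2, 5, 15, 52, 203, 877, 4140]
```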

As another example, we consider Seq ◦ Set≥1.

S = Seq( Set≥1({•}) ) ←→ S(x) = 1/(1 − (e^x − 1)) = 1/(2 − e^x)
R = Seq( {◦} × Set≥1({•}) ) ←→ R(x, y) = 1/(1 − y(e^x − 1))

This looks quite different from the generating function for Bell numbers, but if we recall the above comparison of Set and Seq then we expect the generating functions to "look quite different" even though they count something similar. In fact s_n = n![x^n] S(x) counts the number of ordered partitions of an n-set into non-empty parts. This is the number of surjections from an n-set onto a set of some size. For a more refined enumeration, r_{n,k} = n![x^n y^k] R(x, y) counts the number of surjections from an n-set to a k-set, and of course s_n = Σ_k r_{n,k}.
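Since a surjection from an n-set onto a k-set amounts to an ordered partition into k non-empty parts, we have r_{n,k} = k!·S(n,k), where S(n,k) is a Stirling number of the second kind. A Python sketch (helper names ours) computes the s_n this way:

```python
from math import comb, factorial

def stirling2(n, k):
    # Stirling numbers of the second kind via inclusion-exclusion
    return sum((-1)**j * comb(k, j) * (k - j)**n for j in range(k + 1)) // factorial(k)

def surjection_total(n):
    # s_n = sum over k of r_{n,k} = k! * S(n,k): ordered set partitions of an n-set
    return sum(factorial(k) * stirling2(n, k) for k in range(n + 1))

print([surjection_total(n) for n in range(7)])  # [1, 1, 3, 13, 75, 541, 4683]
```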

permutations with restricted cycles

We consider some restricted versions of enumerating permutations, by restricting the type and/or number of their cycles. First let's enumerate permutations with no small cycles, that is, all cycles must have length at least r, for some fixed r. We'll refer to this construction as Cyc≥r.

C = Cyc≥r({•}) ←→ C(x) = Σ_{n≥r} (n − 1)! x^n/n! = Σ_{n≥r} x^n/n
  = Σ_{n≥1} x^n/n − Σ_{n=1}^{r−1} x^n/n = log 1/(1 − x) − Σ_{n=1}^{r−1} x^n/n

The set of such permutations is Set(Cyc≥r({•})), so we just apply Set to Cyc≥r. The bivariate function F(x, y) results from a straightforward modification of the specification for G(x); also, G(x) = F(x, 1).

G = Set( Cyc≥r({•}) ) ←→ G(x) = exp( log 1/(1 − x) − Σ_{n=1}^{r−1} x^n/n )
  = (1/(1 − x)) exp( − Σ_{n=1}^{r−1} x^n/n )
F = Set( {◦} × Cyc≥r({•}) ) ←→ F(x, y) = exp( y ( log 1/(1 − x) − Σ_{n=1}^{r−1} x^n/n ) )
  = (1/(1 − x))^y exp( −y Σ_{n=1}^{r−1} x^n/n )

Problem 8.8. What do we get in the case r = 2, permutations all of whose cycles are of length at least r = 2? What about r = 1? What should we get?
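To explore Problem 8.8 experimentally, we can expand G(x) with exact arithmetic. Here is a Python sketch (function name ours), using the coefficient recurrence n g_n = Σ_k k c_k g_{n−k}, which holds for the ordinary coefficients whenever G = exp(C):

```python
from fractions import Fraction
from math import factorial

def perms_min_cycle_len(r, N):
    """n![x^n] exp( log 1/(1-x) - sum_{j<r} x^j/j ): permutations of an n-set
    all of whose cycles have length >= r."""
    # ordinary coefficients of C(x): c_j = 1/j for j >= r, else 0
    c = [Fraction(0)] * (N + 1)
    for j in range(r, N + 1):
        c[j] = Fraction(1, j)
    # exp via n*g_n = sum k*c_k*g_{n-k}
    g = [Fraction(1)] + [Fraction(0)] * N
    for n in range(1, N + 1):
        g[n] = sum(k * c[k] * g[n - k] for k in range(1, n + 1)) / n
    return [int(factorial(n) * g[n]) for n in range(N + 1)]

print(perms_min_cycle_len(2, 7))  # [1, 0, 1, 2, 9, 44, 265, 1854]
print(perms_min_cycle_len(1, 5))  # [1, 1, 2, 6, 24, 120]  (all permutations)
```

The r = 1 output confirms what we "should" get: no restriction at all.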

Next, we enumerate permutations containing only even cycles. Necessarily these will act on n = 2m points. We use Cyc_even for this construction. So we want Set(Cyc_even({•})). We get the following generating function.

Cyc_even({•}) ←→ Σ_{n even} (n − 1)! x^n/n! = Σ_m ((2m − 1)!/(2m)!) x^{2m}
  = (1/2) Σ_m x^{2m}/m = (1/2) log 1/(1 − x^2) = log 1/√(1 − x^2)

We could also have used the multi-section technique to pick out the even terms. In any case we just need to apply Set.

Set( Cyc_even({•}) ) ←→ G(x) = exp( (1/2) log 1/(1 − x^2) ) = 1/√(1 − x^2)
Set( {◦} × Cyc_even({•}) ) ←→ F(x, y) = exp( (y/2) log 1/(1 − x^2) ) = 1/(1 − x^2)^{y/2}

Now let’s enumerate permutations containing only odd cycles. For reasons that will become apparent shortly, we will tweak this question by insisting that the number of cycles is even. Since we have an even number of odd cycles, necessarily we will have n even.

As usual we start out by looking at the case where we only have one cycle. We denote by Cyc_odd the construction of cycles of odd length.

Cyc_odd({•}) ←→ Σ_{n odd} (n − 1)! x^n/n! = Σ_m ((2m)!/(2m + 1)!) x^{2m+1} = Σ_m x^{2m+1}/(2m + 1)
  = Σ_n x^n/n − Σ_m x^{2m}/(2m)
  = log 1/(1 − x) − log 1/√(1 − x^2) = log √((1 + x)/(1 − x))

We could have used the multi-section technique to pick out the odd terms as well.

This time we don't want Set, we want Set_even, the construction of sets of even size. Here is Set_even, and Set_even applied to Cyc_odd (of course the latter is just the composition). We use the simplest case of the multi-section technique to pick out the even powers.

Set_even({•}) ←→ Σ_j x^{2j}/(2j)! = (1/2)(e^x + e^{−x}) = cosh(x)

Set_even( Cyc_odd({•}) ) ←→ Σ_j (1/(2j)!) ( log √((1 + x)/(1 − x)) )^{2j} = cosh( log √((1 + x)/(1 − x)) )
  = (1/2) ( exp( log √((1 + x)/(1 − x)) ) + exp( − log √((1 + x)/(1 − x)) ) )
  = (1/2) ( √((1 + x)/(1 − x)) + √((1 − x)/(1 + x)) )
  = 1/√(1 − x^2)

This is the same EGF as before! Specifically, if n is even, then the number of permutations of n points that have all even cycles is the same as the number of permutations of n points that have all odd cycles. In terms of constructions:

Proposition 8.9. Set ◦ Cyc_even ≅ Set_even ◦ Cyc_odd

The generating function proof we saw above amounts to the following.

Set ◦ Cyc_even ←→ 1/√(1 − x^2) ←→ Set_even ◦ Cyc_odd

Of course we needed to actually show that the given constructions both have the same generating function, but the logic of the proof is quite simple. We can find the number of such permutations (all even cycles, or an even number of odd cycles) by extracting coefficients. As usual, it helps to be clever, which means to start with a series we already know that "looks similar". If we happen to know the series for 1/√(1 − 4y), we could substitute y = x^2/4. If we happen to know the Binomial Theorem, we can write down the answer as

1/√(1 − x^2) = Σ_n (−1/2 choose n) (−x^2)^n = Σ_n (2n choose n) (1/4^n) x^{2n} = Σ_n (2n choose n) ((2n)!/4^n) x^{2n}/(2n)!

In dealing with binomial coefficients, it is sometimes handy to know that (1)(3)···(2n − 1) · 2^n n! = (2n)!, a fact that you no doubt discovered in Problem 6.1. Note that we used the binomial theorem for a non-integer exponent. In fact, we just used the Taylor series for (1 − x^2)^{−1/2}, which one can determine directly, and show that it gives exactly what the binomial theorem says it should.
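Proposition 8.9 can also be confirmed by brute force for small n. A Python sketch (helper name ours):

```python
from itertools import permutations

def cycle_lengths(p):
    """Cycle lengths of a permutation of {0,...,n-1} given in one-line notation."""
    seen, out = set(), []
    for s in range(len(p)):
        if s in seen:
            continue
        length, j = 0, s
        while j not in seen:
            seen.add(j)
            j = p[j]
            length += 1
        out.append(length)
    return out

for n in (2, 4, 6):
    even = sum(all(l % 2 == 0 for l in cycle_lengths(p))
               for p in permutations(range(n)))
    odd = sum(all(l % 2 == 1 for l in cycle_lengths(p))
              for p in permutations(range(n)))
    print(n, even, odd)  # the two counts agree, e.g. "4 9 9"
```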

Problem 8.10. Derive the Taylor series for (1 − x^2)^{−1/2}. Derive the Taylor series for 1/√(1 − 4y). Show that by using the latter function as a starting point, we get the same series expansion for (1 − x^2)^{−1/2}.

Now we notice that for an arbitrary EGF we have the following.

[x^m] Σ_{m≥0} a_m x^m/m! = a_m/m!

Since m! is the number of all permutations on an m-set, this tells us that the following gives the proportion of all permutations on a 2n-set that have only even cycles (alternatively, the proportion that have only odd cycles).

[x^{2n}] (1 − x^2)^{−1/2} = (2n choose n) / 4^n

This is the probability that a fair coin tossed 2n times will give exactly n heads and n tails (in any order). One intuitively (and correctly!) senses that this number is small (see exercises), and so most permutations on a 2n-set contain both even and odd cycles.

labelled graphs

As an example we first enumerate labelled 2-regular graphs. Components here are cycles, but cycles as in graphs, not cycles as in permutations. The two are related, but there are differences. In graphs (by this we mean simple graphs) there are no cycles of length one or two. We already know how to deal with this. Also, a cycle graph is the same if we consider its mirror-image in the plane. In other words, reversing the order of the labels around the cycle graph does not change the cycle graph. Let's refer to the set of labelled cycle graphs (including 1-cycles and 2-cycles) as GCyc. Then our components are GCyc≥3, and the set of labelled 2-regular graphs is Set(GCyc≥3({•})).

GCyc≥3({•}) ←→ Σ_{n≥3} ((n − 1)!/2) x^n/n! = (1/2) ( log 1/(1 − x) − (x + x^2/2) )
Set( GCyc≥3({•}) ) ←→ exp( (1/2) ( log 1/(1 − x) − (x + x^2/2) ) ) = (1/√(1 − x)) exp( −x/2 − x^2/4 )

Note that in this example it was easy to enumerate the components directly but less obvious (without generating functions) to enumerate the objects. Problem 8.11. Give the EGF for 2-regular graphs all of whose cycles are of length at least r, for some r ≥ 3.
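The coefficient recurrence n g_n = Σ_k k c_k g_{n−k} for G = exp(C) (valid for ordinary coefficients) lets us expand this EGF. A Python sketch (function name ours); for instance on 6 vertices there are 70 labelled 2-regular graphs, namely sixty 6-cycles plus ten pairs of triangles:

```python
from fractions import Fraction
from math import factorial

def two_regular_counts(N):
    """n![x^n] exp( sum_{r>=3} x^r/(2r) ): labelled 2-regular graphs on n vertices."""
    # ordinary coefficients of GCyc_{>=3}({.}): c_r = 1/(2r) for r >= 3
    c = [Fraction(0)] * (N + 1)
    for r in range(3, N + 1):
        c[r] = Fraction(1, 2 * r)
    g = [Fraction(1)] + [Fraction(0)] * N
    for n in range(1, N + 1):
        g[n] = sum(k * c[k] * g[n - k] for k in range(1, n + 1)) / n
    return [int(factorial(n) * g[n]) for n in range(N + 1)]

print(two_regular_counts(7))  # [1, 0, 0, 1, 3, 12, 70, 465]
```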

Now we get (finally, one might say!) to one of the most natural questions of enumerating objects by components: graphs, where component means exactly connected component. We let C be the set of labelled connected graphs. Note that for us, connected graphs must have at least one vertex: they would be better described as connected components of arbitrary graphs. We let G be the set of labelled graphs. Then G = Set(C). This time though, it's G(x) that is easy to write down, while C(x) is a bit of a mystery. The number of labelled graphs on n vertices is precisely g_n = 2^(n choose 2), so we get the following.

G(x) = Σ_n 2^(n choose 2) x^n/n! = exp(C(x))

We can find C(x) by applying x (d/dx) log to the equation G(x) = exp(C(x)). We temporarily forget that we know what g_n is, in order to get a more generic result.

Σ_n g_n x^n/n! = exp( Σ_n c_n x^n/n! )
Σ_n n g_n x^n/n! = ( Σ_n g_n x^n/n! ) ( Σ_n n c_n x^n/n! )

Check this! Now apply the rule for multiplying EGFs to extract coefficients. This gives

Lemma 8.12. If G(x) = exp(C(x)) then n g_n = Σ_{k=1}^{n} (n choose k) k c_k g_{n−k}.

Notice that Lemma 8.12 is quite general, and is not limited to graphs. It is a restatement of what Set(C) means in terms of coefficients rather than in terms of generating functions. We first solve for c_n to get a recursive description of c_n in terms of c_k with k < n.

c_n = g_n − (1/n) Σ_{k=1}^{n−1} k (n choose k) c_k g_{n−k} = g_n − Σ_{k=1}^{n−1} (n−1 choose k−1) c_k g_{n−k}

Respecializing to graphs, we can compute the number of connected graphs on n vertices.

c_n = 2^(n choose 2) − Σ_{k=1}^{n−1} (n−1 choose k−1) c_k 2^(n−k choose 2)

(c_n)_{n≥0} = (0, 1, 1, 4, 38, 728, 26704, ...)
C(x) = 0 + x + (1/2) x^2 + (2/3) x^3 + (19/12) x^4 + (91/15) x^5 + (1669/45) x^6 + ···

Here it was easy to enumerate the objects directly, but less obvious (without generating functions) to enumerate the connected components.
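The recurrence for c_n is immediate to implement with exact integer arithmetic. A Python sketch (function name ours):

```python
from math import comb

def connected_graph_counts(N):
    """c_n for n <= N, from g_n = 2^C(n,2) and
    c_n = g_n - sum_{k=1}^{n-1} C(n-1,k-1) c_k g_{n-k}."""
    g = [2**comb(n, 2) for n in range(N + 1)]
    c = [0] * (N + 1)
    for n in range(1, N + 1):
        # equivalent form: c_n = g_n - (1/n) sum k*C(n,k)*c_k*g_{n-k}
        c[n] = g[n] - sum(k * comb(n, k) * c[k] * g[n - k]
                          for k in range(1, n)) // n
    return c

print(connected_graph_counts(7))
# [0, 1, 1, 4, 38, 728, 26704, 1866256]
```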

rooted labelled trees

We will enumerate rooted labelled trees on n vertices. These are trees with a distinguished vertex called the root, and the vertices are labelled with {1, 2, ··· , n}. It is convenient to define a rooted labelled forest as a union of rooted labelled trees. Let T be the collection of rooted labelled trees and F be the collection of rooted labelled forests. Clearly F = Set(T). If we delete the root of a rooted labelled tree, we are left with a set of trees. Each of these trees has exactly one vertex that was adjacent to the root vertex we deleted. Call this the root of the smaller tree. Thus deleting the root of a rooted labelled tree gives a rooted labelled forest. This process is reversible: given a rooted labelled forest we add a new vertex, make this vertex adjacent to the root of each tree in the forest, and call this new vertex the root of the resulting tree. Thus T ≅ {•} × F. Combining these we have the following specification.

T ≅ {•} × F = {•} × Set(T)

We don't have an easy formula for either F(x) or T(x), but the dual relationships will allow us to enumerate both of them.

T(x) = x e^{T(x)}

We will solve this in a nice way using Lagrange inversion... but not yet. This time we don't have an easy direct way to enumerate the objects or the components, but still the relationship between them will allow us to count either. We leave it as an exercise to show how to use T(x) = x exp(T(x)) to compute the t_n recursively (Exercise 8.12).
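Even without Lagrange inversion, the functional equation pins down the t_n numerically: iterating T ↦ x·exp(T) on truncated series fixes one more coefficient per pass. A Python sketch (names ours); the values that emerge follow Cayley's formula n^{n−1}:

```python
from fractions import Fraction
from math import factorial

def rooted_tree_counts(N):
    """Solve T(x) = x*exp(T(x)) by series iteration; return t_n = n![x^n] T."""
    t = [Fraction(0)] * (N + 1)           # ordinary coefficients of T
    for _ in range(N):                    # each pass fixes one more coefficient
        # e = exp(T), via n*e_n = sum k*t_k*e_{n-k}
        e = [Fraction(1)] + [Fraction(0)] * N
        for n in range(1, N + 1):
            e[n] = sum(k * t[k] * e[n - k] for k in range(1, n + 1)) / n
        t = [Fraction(0)] + e[:N]         # T = x * exp(T), truncated at x^N
    return [int(factorial(n) * t[n]) for n in range(N + 1)]

print(rooted_tree_counts(6))  # [0, 1, 2, 9, 64, 625, 7776]
```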

rooted labelled plane trees

A rooted labelled plane tree is a rooted labelled tree where the child vertices are ordered.2 It is again convenient to define a rooted labelled plane forest to be a collection of rooted labelled plane trees, but here the trees within the forest are ordered. This means that if we define T to be the set of

2 We met the unlabelled version when investigating Catalan numbers.

rooted labelled plane trees and F to be the set of rooted labelled plane forests, then F = Seq(T). Again we have T ≅ {•} × F. Combining these we get the following specification.

T ≅ {•} × F = {•} × Seq(T)

Putting this together we get

T(x) = x/(1 − T(x))

We can solve this functional equation directly. We get

T(x) = (1 ± √(1 − 4x))/2

This is an analytic function, so the required formal power series is actually an analytic power series. Since we know that 0 = t_0 = T(0), we take the "−" option. Then we can extract coefficients, starting from the known series for the Catalan numbers.

t_n = n![x^n] T(x) = n![x^n] (1 − √(1 − 4x))/2 = n![x^{n−1}] (1 − √(1 − 4x))/(2x)
  = n! · (1/((n − 1) + 1)) (2(n − 1) choose (n − 1)) = (2n − 2)!/(n − 1)!

exercises

1. Let A_n(y) = Σ_k [n k] y^k. Using Lemma 8.1, derive a functional equation for A_n(y), find A_n(y) and extract coefficients to get a formula for [n k]. Note: there is no "simple" formula for these.
2. It is often convenient to extend the binomial coefficients as (r choose k) = r(r − 1)···(r − k + 1)/k!, valid for any r ∈ R. We still insist that (r choose k) = 0 if k is not a non-negative integer. Use the Binomial Theorem to expand (1 − x^2)^{−1/2} and (1 − y)^{−k} (k ∈ N) as power series. Check that your answer is consistent with our previous results!
3. Using Taylor series, expand (1 + x)^r as a power series in x, for r ∈ R, thus proving the Binomial Theorem for any exponent.
4. Prove Lemma 8.12.

5. We saw that Seteven({•}) corresponds to cosh(x). a) Show that Setodd({•}) corresponds to sinh(x). b) Give a combinatorial (bijective) proof that Seteven({•}) ∪˙ Setodd({•}) = Set({•}). Derive from this an equality of generating functions. c) Look up the definition of sinh(x) and cosh(x) as real-valued functions and see if your identity makes sense. d) Let C = Seteven({•}) × Seteven({•}) and S = Setodd({•}) × Setodd({•}). Find nice expressions for the exponential generating functions C(x) and S(x). By extracting coefficients, find a nice identity involving C(x) and S(x). Does your identity make sense in terms of real-valued functions? e) Write down the elements of C and of S of size n for 1 ≤ n ≤ 4. Can you find a relationship that gives a combinatorial proof of your identity involving C(x) and S(x)?

6. Let 0 ≤ t < r. What is the generating function for Sett mod r({•}), the set of sets whose size is t mod r? You will likely want the multi-section technique for this. There isn’t necessarily a “nice” form.

7. Let D ⊆ N. What is the generating function for SetD({•}), the set of sets whose size is an element of D? There isn’t necessarily a “nice” form.

8. a) Show, using Proposition 8.9, that the number of permutations on 2n symbols that are composed only of odd cycles is the same as the number of permutations on 2n symbols that are composed only of even cycles.
b) Can you find a direct bijection between these two collections of permutations?
9. "Recall" that m!! is the product of all positive integers less than or equal to m and having the same parity as m. So (2k)!! = 2k(2k − 2)···(2) and (2k − 1)!! = (2k − 1)(2k − 3)···(1).
a) Show that the number of permutations on 2n symbols such that all cycles have the same parity is ((2n − 1)!!)^2.
b) Show that the number of permutations on 2n symbols such that all cycles have length two is (2n − 1)!!. (Such a permutation is equivalent to a perfect matching on a graph with 2n vertices.)
10. We saw that the proportion of permutations on a 2n-set that have all even cycles is (2n choose n)/4^n. Look up Stirling's approximation of n! and use this to give an estimate of the given proportion. Is it actually small? How large would 2n have to be so that a randomly chosen permutation on a 2n-set has at least a 99% probability of having cycles of both parities?
11. Let T(x) = Σ_n t_n x^n/n! and G(x) = Σ_n g_n x^n/n!. Show that t_n = n g_{n−1} if and only if T(x) = xG(x).
12. Let T(x) be the EGF for rooted labelled trees. We showed that T(x) = x e^{T(x)}.
a) Try the "x (d/dx) log and then extract coefficients" approach here. Show that you do get a "formula" for t_n in terms of t_j for j < n. Compute t_n for a few small values (a computer might be useful) and try to guess the pattern.
b) Do the same thing, but start by dividing T(x) = x e^{T(x)} by x. (Actually, start by explaining why this is legal.)
13. Let T be the set of rooted labelled trees and F be the set of rooted labelled forests. We combined T = {•} × F and F = Set(T) to get a functional equation for T(x). Combine them differently to get a functional equation for F(x). Can you use this to compute the coefficients f_n?
(Of course we can find the f_n by first finding the t_n and then using F(x) = exp(T(x)); you are being asked to find the f_n while ignoring the t_n.)
14. There are some glaring similarities between what we did for rooted labelled plane trees now, and what we did previously for rooted unlabelled plane trees. Let A(x) = Σ_n a_n x^n and T(x) = Σ_n t_n x^n/n!, where a_n is the number of rooted unlabelled plane trees on n edges and t_n is the number of rooted labelled plane trees on n vertices. Note that A(x) is an OGF where x marks edges and T(x) is an EGF where x marks vertices.
a) Explain why a_{n−1} is the number of rooted plane trees on n vertices.
b) Explain why a_{n−1} n! is the number of rooted labelled plane trees on n vertices. (This is weird: it says that every assignment of labels to vertices results in a distinct labelled object. But don't some trees have automorphisms?)
c) Combining these results, show that xA(x) = T(x). (We already did this by independently deriving A(x) and T(x).)

15. Let 0 ≤ t < r, and let sn the number of rooted labelled trees such that each vertex has t mod r children. P n a) Derive a functional equation for S(x) = n≥0 snx /n!. You will likely want the multi- section technique for this. b) Show that you can, in principle, use your equation to compute sn.

59 16. Let D ⊆ N \{0}, and let sn the number of rooted labelled trees such that each vertex has d children for some d ∈ D P n a) Derive a functional equation for S(x) = n≥0 snx /n!. You will likely want the multi- section technique for this. b) Show that you can, in principle, use your equation to compute sn.

9. unlabelled objects by components

unlabelled objects

We now turn to enumerating unlabelled objects by components. Recall that in deriving the expo- nential formula for labelled objects, we naturally encountered the multiplicative rule for EGFs. In fact, the binomial coefficient is naturally thought of as a relabelling factor: how many ways can we relabel the components. That is our motivation for using OGFs for unlabelled objects (of course the real motivation is that it works!)

We use c_n for the number of "connected" objects of size n, g_{n,k} for the number of objects of size n with k components, and g_n for the number of objects of size n. We define three OGFs of interest:

C(x) = Σ_n c_n x^n
F(x, y) = Σ_n Σ_k g_{n,k} x^n y^k
G(x) = Σ_n g_n x^n = Σ_n ( Σ_k g_{n,k} ) x^n = F(x, 1)

The approach we used for labelled objects fails completely for unlabelled objects. The key observation was that we could write B_k directly in terms of B_1. This worked because labelled components are never identical: the labels always distinguish them. Even isomorphic labelled components are distinguishable. In the labelled context, we could choose to consider unordered components (e.g., for graphs) or ordered components (e.g., plane trees) by adding an extra factor of k!. In other words, the number of orderings of the components depended only on how many components there are, not on what the components actually are. If the underlying ground set is unlabelled, then this all falls apart. The extra factor is now something between 1 and k!, depending on what the components are. So we can't write B_k directly in terms of k and B_1. We will use another approach; in fact this approach could have been used for labelled objects also (as you are later invited to verify). We want to know what generating function corresponds to Set for unlabelled objects. But there is a slight wrinkle: a set of unlabelled components must be a set, so it can not have repeated components. Two unlabelled components are isomorphic if and only if they are identical, so this means that a set of unlabelled components necessarily contains no two isomorphic components. This might be what we want, but it won't always be. So instead of Set we define two constructions: MSet is the constructor for multisets (repetition allowed) and PSet is the constructor for proper sets (no repetition allowed). We will see that the generating functions are surprisingly similar.
Note that this is a non-issue for labelled objects, since isomorphic labelled components are not identical, due to the labels. The components of a labelled object always form a set, never a multiset. As a warm-up, here is the easiest case. You are strongly encouraged to do the following problem before continuing.

Problem 9.1. Determine the generating functions for MSet({•}) and PSet({•}) directly. Let M(x) and P(x) be the corresponding ordinary generating functions. Determine, by hand, the value of m_n and p_n for all n ∈ N. Thus, give a nice closed form for M(x) and P(x).


We cannot use Problem 9.1 to determine the generating functions for MSet(C) and PSet(C) by composition, because in order to use the composition of constructions we would need the atoms of an object in a multiset or proper set to be ultimately distinguishable by the way in which they fit into the set. This is not possible for unlabelled objects. But Problem 9.1 will turn out to be an essential step anyway, despite its apparent simplicity. First some notation, which we will recycle for both MSet and PSet (and in fact for labelled Set in the exercises). We use C for the set of "components" (which just means that MSet(C) is the set of objects we are interested in). We use C^(r) for the set of all components of weight r, and C^(r)_γ for the set of all components of weight r that are equal to one particular component, denoted by the placeholder γ. In other words we have partitioned C first by size, and then by component.

C = ∪̇_{r≥1} C^(r) = ∪̇_{r≥1} ( ∪̇_γ C^(r)_γ )

You are strongly encouraged to do the following problem before continuing.

Problem 9.2. How many objects are in C^(r)? How many objects are in C^(r)_γ? Explicitly give the list of elements of all C^(r) and C^(r)_γ for unlabelled graphs for each r ≤ 3. Do the same for labelled graphs for each r ≤ 3.

We need to know how this partition interacts with MSet. So we need to know how MSet distributes over a disjoint union. This turns out to be a nice result of its own.

Lemma 9.3. Let A and B be disjoint sets. Then MSet(A ∪˙ B) = MSet(A) × MSet(B) and PSet(A ∪˙ B) = PSet(A) × PSet(B).

Each assertion of Lemma 9.3 is simply an assertion that two sets are equal. To prove it, it suffices to show that each is contained in the other. We leave the proof as an exercise. As an illustration of Lemma 9.3, consider C to be the set of connected graphs (on at least one vertex). So we have C as follows.

C = { [pictures of the connected graphs, listed in order of size] , ··· }

Then MSet(C) is the collection of all multisets of connected graphs. By Lemma 9.3 this splits first as a product over sizes, and then as a product over the individual connected graphs of each size:

MSet(C) = MSet(C^(1)) × MSet(C^(2)) × MSet(C^(3)) × ··· = Π_{r≥1} Π_γ MSet(C^(r)_γ)

[pictures of the small connected graphs omitted]

Consider a particular graph (so a multiset of connected components). We can specify this graph by giving the collection of components (with repetition of course). Or, we can specify this graph by giving for each r the collection of components of size r. Or, we can specify this graph by giving for each r and for each connected graph of size r the number of times this particular connected component appears in the graph. This is effectively how we will use Lemma 9.3. We have one further tool.

Lemma 9.4. Let A be a set containing exactly one object, of weight r.
Then MSet(A) ←→ 1/(1 − x^r) and PSet(A) ←→ 1 + x^r.
Furthermore, assume that we have two different interpretations of weight ω_x and ω_y, marked by indeterminates x and y. Suppose that A = {α} with ω_x(α) = r and ω_y(α) = s.
Then MSet(A) ←→ 1/(1 − x^r y^s) and PSet(A) ←→ 1 + x^r y^s.

We leave the proof as an exercise. It should look strangely reminiscent of Problem 9.1. The bivariate version should be reminiscent of the idea that an object α has size r and s components (in this interpretation, if α is “connected” then we will have s = 1).

exponential formula for unlabelled objects : MSet and PSet

Now we consider MSet(C) in terms of the partition of C into the various C^(r), and of each C^(r) into the various C^(r)_γ. Recall from your diligent work on Problem 9.2 that the individual C^(r)_γ are single-element sets, and that C^(r) has size c_r. We have a slight deviation from our usual lexico-consistency, so that we use "g" for objects in both the univariate G(x) and the bivariate F(x, y).

F^(r)_γ = MSet( {◦} × C^(r)_γ ) ←→ F^(r)_γ(x, y) = Σ_{n≥0} Σ_{k≥0} g^(r)_{γ,n,k} x^n y^k

F^(r) = MSet( {◦} × C^(r) ) ←→ F^(r)(x, y) = Σ_{n≥0} Σ_{k≥0} g^(r)_{n,k} x^n y^k = Π_γ F^(r)_γ(x, y)
F = MSet( {◦} × C ) ←→ F(x, y) = Σ_{n≥0} Σ_{k≥0} g_{n,k} x^n y^k = Π_{r≥1} F^(r)(x, y)
G = MSet(C) ←→ G(x) = Σ_{n≥0} ( Σ_{k≥0} g_{n,k} ) x^n = F(x, 1)

This might appear to be a blizzard of notation, but the superscripts and subscripts are intended to have a consistent meaning. The important consequence to notice is that we can build everything up from the g^(r)_{γ,n,k}, using Lemma 9.3.

We start by finding g^(r)_{γ,n,k}, which is the number of objects of size n with k components that have all of their components of size r and all identical to one particular component γ.

g^(r)_{γ,n,k} = { 1 if n = kr ; 0 otherwise }  ←→  F^(r)_γ(x, y) = Σ_n Σ_k g^(r)_{γ,n,k} x^n y^k = Σ_k x^{kr} y^k = 1/(1 − y x^r)

This is in fact nothing more than Lemma 9.4 (which is itself really just Problem 9.1). Then we use Lemma 9.3 to get the result we want.

F^(r)(x, y) = Π_γ F^(r)_γ(x, y) = Π_γ 1/(1 − y x^r) = 1/(1 − y x^r)^{c_r}
F(x, y) = Π_{r≥1} F^(r)(x, y) = Π_{r≥1} 1/(1 − y x^r)^{c_r}
G(x) = F(x, 1) = Π_{r≥1} 1/(1 − x^r)^{c_r}

Problem 9.5. Make sure you understand the step where cr first appears in the above equation.

It doesn't look particularly exponential, and C(x) is a little hidden. In fact it's there: the numbers c_r are the coefficients of C(x). We can make the C(x) more obvious, and make it look more exponential too.

Π_{r≥1} 1/(1 − y x^r)^{c_r} = exp( log Π_{r≥1} 1/(1 − y x^r)^{c_r} ) = exp( Σ_r c_r log 1/(1 − y x^r) )
  = exp( Σ_{r≥1} c_r Σ_{k≥1} (y x^r)^k / k )
  = exp( Σ_{k≥1} (1/k) y^k Σ_r c_r (x^k)^r ) = exp( Σ_{k≥1} (1/k) y^k C(x^k) )

We can do the same thing for PSet.

F^(r)_γ = PSet( {◦} × C^(r)_γ ) ←→ F^(r)_γ(x, y) = Σ_{n≥0} Σ_{k≥0} g^(r)_{γ,n,k} x^n y^k

F^(r) = PSet( {◦} × C^(r) ) ←→ F^(r)(x, y) = Σ_{n≥0} Σ_{k≥0} g^(r)_{n,k} x^n y^k = Π_γ F^(r)_γ(x, y)
F = PSet( {◦} × C ) ←→ F(x, y) = Σ_{n≥0} Σ_{k≥0} g_{n,k} x^n y^k = Π_{r≥1} F^(r)(x, y)
G = PSet(C) ←→ G(x) = Σ_{n≥0} ( Σ_{k≥0} g_{n,k} ) x^n = F(x, 1)

Again, we determine g^(r)_{γ,n,k} (which amounts to an application of Lemma 9.4).

g^(r)_{γ,n,k} = { 1 if n = kr and k ≤ 1 ; 0 otherwise }  ←→  F^(r)_γ(x, y) = Σ_n Σ_k g^(r)_{γ,n,k} x^n y^k = 1 + y x^r

Applying Lemma 9.3 according to the partition of C again gives the result.

F^(r)(x, y) = Π_γ F^(r)_γ(x, y) = (1 + y x^r)^{c_r}
F(x, y) = Π_{r≥1} F^(r)(x, y) = Π_{r≥1} (1 + y x^r)^{c_r}
G(x) = F(x, 1) = Π_{r≥1} (1 + x^r)^{c_r}

Again it doesn’t look particularly exponential, but the same approach as before will make it more obviously so (Exercise 9.6). For convenience we summarize the correspondences in the following.

Theorem 9.6. If the generating function for unlabelled components is C(x) = Σ_n c_n x^n then the generating function for unlabelled objects formed of components is given by the following.

F = MSet({◦} × C) ←→ F(x, y) = Π_r 1/(1 − y x^r)^{c_r} = exp( Σ_{k≥1} (1/k) y^k C(x^k) )
G = MSet(C) ←→ G(x) = Π_r 1/(1 − x^r)^{c_r} = exp( Σ_{k≥1} (1/k) C(x^k) )
F = PSet({◦} × C) ←→ F(x, y) = Π_r (1 + y x^r)^{c_r} = exp( Σ_{k≥1} ((−1)^{k+1}/k) y^k C(x^k) )
G = PSet(C) ←→ G(x) = Π_r (1 + x^r)^{c_r} = exp( Σ_{k≥1} ((−1)^{k+1}/k) C(x^k) )

As with the labelled case, it is best to think of this as a process and not a final result, since we may be interested in various restrictions. The idea is to determine g^(r)_{γ,n,k} and thus F^(r)_γ(x, y), and then

F^(r)(x, y) = Π_γ F^(r)_γ(x, y) = ( F^(r)_γ(x, y) )^{c_r}
F(x, y) = Π_r F^(r)(x, y)

Problem 9.7. We can derive the exponential formula for labelled objects the same way. We start from the following.

g^(r)_{γ,n,k} = { (1/k!) (n choose r, r, ··· , r) if n = kr ; 0 otherwise }

The rest of the process is exactly as before. In particular, note that we are, in some sense, done. The details are left as an exercise.
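To see Theorem 9.6 in action, take c_r = 1 for every r ≥ 1 (exactly one component of each size). Then MSet gives the integer partition numbers and PSet the numbers of partitions into distinct parts. A Python sketch (function names ours):

```python
def mset_coeffs(c, N):
    """[x^n] of prod_r (1 - x^r)^(-c[r]) (the MSet product), for n <= N."""
    g = [1] + [0] * N
    for r in range(1, N + 1):
        for _ in range(c[r]):
            for n in range(r, N + 1):      # multiply by 1/(1 - x^r)
                g[n] += g[n - r]
    return g

def pset_coeffs(c, N):
    """[x^n] of prod_r (1 + x^r)^(c[r]) (the PSet product), for n <= N."""
    g = [1] + [0] * N
    for r in range(1, N + 1):
        for _ in range(c[r]):
            for n in range(N, r - 1, -1):  # multiply by (1 + x^r)
                g[n] += g[n - r]
    return g

N = 10
c = [0] + [1] * N                       # c_r = 1 for all r >= 1
print(mset_coeffs(c, N))  # partitions: [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
print(pset_coeffs(c, N))  # distinct parts: [1, 1, 1, 2, 2, 3, 4, 5, 6, 8, 10]
```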

relating MSet and PSet

Let C be a set of components of some sort. Consider a multiset of components. Each component has a certain multiplicity, which might be either even or odd. If we remove one copy of each component of odd multiplicity, we are left with a multiset of components each of even multiplicity. This multiset of components of even multiplicities can be regarded as a doubling of the multiset with multiplicities halved. In order to give an example we write a multiset using {{ }} for brackets and exponents for multiplicities.

{{α, α, α, α, β, β, γ, γ, γ, δ}} = {{α^4, β^2, γ^3, δ^1}} ≅ ( {γ, δ} , {{α^4, β^2, γ^2}} ) ≅ ( {γ, δ} , {{α^2, β^1, γ^1}} )

In other words every multiset can be thought of as a set of objects whose multiplicity is odd, plus another multiset of objects that each count double. The second multiset is "half" of what remains once we have removed one of each odd-multiplicity element from our original multiset. This gives a weight preserving bijection between the following.
• multisets, where the total weight is the sum of the weights of the elements of the multiset.
• a set and a multiset, where the total weight is the sum of the weights of the elements of the set plus twice the sum of the weights of the elements of the multiset.

65 In other words we have the following.

Proposition 9.8. Let C be some set of unlabelled components. Let M(x) be the ordinary generating function for MSet(C) and P(x) the ordinary generating function for PSet(C). Then M(x) = P(x) M(x^2).

This allows us to write the generating functions for MSet and PSet in terms of each other.

Corollary 9.9. Let C be some set of unlabelled components. Let M(x) be the ordinary generating function for MSet(C) and P(x) the ordinary generating function for PSet(C). Then

$$P(x) = \frac{M(x)}{M(x^2)} \qquad\text{and}\qquad M(x) = P(x)P(x^2)P(x^4)\cdots = \prod_{k\ge0}P\big(x^{2^k}\big).$$

We leave the proof of the iterated version as an exercise. It is closely related to expressing the multiplicity of each component in binary.
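The iterated identity is easy to check numerically. Here is a small Python sketch (our own illustration, not part of the notes' development; the truncation order N = 30 and the helper names are arbitrary choices) verifying M(x) = P(x)M(x²) for the components with c_r = 1 for every r, i.e. integer partitions versus partitions into distinct parts:

```python
N = 30  # truncation order; all series are coefficient lists [x^0 .. x^N]

def mul(a, b):
    """Multiply two truncated power series."""
    c = [0] * (N + 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                if i + j > N:
                    break
                c[i + j] += ai * bj
    return c

def geometric(r):
    """Series 1/(1 - x^r), truncated at x^N."""
    return [1 if n % r == 0 else 0 for n in range(N + 1)]

# Components: c_r = 1 for every r >= 1, so MSet gives integer partitions
# and PSet gives partitions into distinct parts.
M = [1] + [0] * N
P = [1] + [0] * N
for r in range(1, N + 1):
    M = mul(M, geometric(r))
    one_plus_xr = [0] * (N + 1)
    one_plus_xr[0] = 1
    one_plus_xr[r] = 1
    P = mul(P, one_plus_xr)

M2 = [M[n // 2] if n % 2 == 0 else 0 for n in range(N + 1)]  # M(x^2)
assert mul(P, M2) == M  # Proposition 9.8: M(x) = P(x) M(x^2)
```

The same check works for any other choice of the c_r, since the proposition holds for any set of unlabelled components.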

computational recurrences

The exponential formula is fundamentally a relation between the sequences c_n and g_{n,k}. We can unravel it as an actual formula (albeit not necessarily a nice compact one). We first apply y d/dy log to the formula for F(x, y).

$$\sum_{n\ge0}\sum_{k\ge0} g_{n,k}x^ny^k = \exp\Big(\sum_{k\ge1}\frac{1}{k}y^k\sum_{r\ge1}c_rx^{kr}\Big)$$

$$\sum_{n\ge0}\sum_{k\ge0} kg_{n,k}x^ny^k = \Big(\sum_{n\ge0}\sum_{k\ge0} g_{n,k}x^ny^k\Big)\Big(\sum_{k\ge1}\sum_{r\ge1} y^kc_rx^{kr}\Big)$$

We extract [x^n y^k] from both sides to get the formula explicitly. Note that on the right-hand side we obtain products like (g_{p,q}x^py^q)(y^ic_jx^{ij}) for integers i, j, p, q. In order to get x^ny^k we need to choose p = n − ij and q = k − i. This gives the following.

$$kg_{n,k} = \sum_{i,j\ge1} g_{n-ij,\,k-i}\,c_j$$

This can be used to recursively compute the gn,k’s given the cj’s, or the cj’s given the gn,k’s.
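As an illustration (our own, with hypothetical helper names), the recurrence can be run directly. Taking c_j = 1 for all j makes g_{n,k} the number of partitions of n into exactly k parts, which gives an easy sanity check:

```python
from fractions import Fraction

def components_to_gnk(c, N):
    """Given c[j] = number of connected objects of size j (j >= 1),
    compute g[n][k] = number of objects of size n with k components,
    via the recurrence  k*g(n,k) = sum_{i,j>=1} g(n-i*j, k-i) * c[j]."""
    g = [[Fraction(0)] * (N + 1) for _ in range(N + 1)]
    g[0][0] = Fraction(1)
    for n in range(1, N + 1):
        for k in range(1, N + 1):
            total = Fraction(0)
            for i in range(1, k + 1):
                for j in range(1, n // i + 1):
                    total += g[n - i * j][k - i] * c[j]
            g[n][k] = total / k
    return g

# Sanity check with c_j = 1 for all j: objects are integer partitions,
# and g[n][k] counts partitions of n into exactly k parts.
N = 10
c = [0] + [1] * N
g = components_to_gnk(c, N)
assert g[5][2] == 2   # 4+1 and 3+2
assert sum(g[6][k] for k in range(N + 1)) == 11  # p(6) = 11
```

We compute over the rationals since the recurrence divides by k; the results are of course integers whenever the c_j count something.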

rooted unlabelled trees

We let T(x) and G(x) be the OGFs that enumerate rooted unlabelled trees and forests, respectively. So T(x) = Σ_n t_n x^n and G(x) = Σ_n g_n x^n, where there are t_n rooted unlabelled trees on n vertices and g_n rooted unlabelled forests on n vertices. Each component of a forest is a rooted unlabelled tree, so a rooted unlabelled forest is a multiset of rooted unlabelled trees. In other words G = MSet(T). On the other hand, given a rooted unlabelled tree, if we delete the root we get a rooted unlabelled forest, where the (new) root of each component is the former attachment point to the original root. This process is reversible, so we have the equivalence T ≅ {•} × G. Putting this together we get the following.

$$\mathcal T \cong \{\bullet\}\times\mathcal G = \{\bullet\}\times\mathrm{MSet}(\mathcal T)$$

Again this gives us a functional equation for T(x).

$$T(x) = x\exp\Big(\sum_{k\ge1}\frac{T(x^k)}{k}\Big)$$

This equation doesn't have any "nice" answers, but we can still use it to compute t_n for any n. By taking x d/dx log of both sides (then multiplying by T(x) and rearranging) we get

$$\sum_n nt_{n+1}x^n = \Big(\sum_n t_{n+1}x^n\Big)\Big(\sum_k\sum_r rt_rx^{kr}\Big) \qquad (1)$$

Extracting [x^n] from both sides we get:

$$nt_{n+1} = \sum_{i,j\ge1} jt_jt_{n-ij+1}$$
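Running this recurrence is a quick way to tabulate the t_n; the sketch below (function name ours) reproduces the familiar counts 1, 1, 2, 4, 9, 20, 48, ... of rooted unlabelled trees:

```python
def rooted_tree_counts(N):
    """t[n] = number of rooted unlabelled trees on n vertices, via the
    recurrence  n * t(n+1) = sum_{i,j>=1, i*j<=n} j * t(j) * t(n - i*j + 1)."""
    t = [0] * (N + 1)
    t[1] = 1  # the single-vertex tree
    for n in range(1, N):
        s = 0
        for i in range(1, n + 1):
            for j in range(1, n // i + 1):
                s += j * t[j] * t[n - i * j + 1]
        assert s % n == 0  # the sum is always divisible by n
        t[n + 1] = s // n
    return t

t = rooted_tree_counts(8)
assert t[1:] == [1, 1, 2, 4, 9, 20, 48, 115]
```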

rooted unlabelled binary trees

A rooted binary tree is one where all vertices have 0 or 2 children. In particular, such a tree always has at least one vertex (the root) and will always have an odd number of vertices. It will be convenient to let x mark the number of vertices that have two children (as opposed to zero); we call such vertices internal. There is a simple relationship between the number of vertices and the number of internal vertices; we leave this to the exercises. We will define a rooted unlabelled binary forest to be a collection of rooted unlabelled binary trees. In other words, if B is the set of rooted unlabelled binary trees then F = MSet(B). We have the exponential formula for unlabelled objects:

$$F(x,y) = \exp\Big(\sum_{k\ge1}\frac{1}{k}y^kB(x^k)\Big)$$

where B(x) is the OGF that enumerates rooted unlabelled binary trees. Note that deleting the root of a B-object gives an F-object, but the converse is not true: adding a new vertex to a rooted unlabelled binary forest and making all of the old roots adjacent to the new root gives a tree that has an arbitrary number of children at the root (but either 0 or 2 for every other vertex). The generating function for rooted unlabelled binary forests with exactly two components is [y²]F(x, y); we might define this as F₂. A rooted unlabelled binary tree is either a single childless root vertex (which is not internal), or else it is a root vertex with two children, each of which is a rooted unlabelled binary tree. Thus if we delete the root of a nontrivial rooted unlabelled binary tree we obtain a 2-component rooted unlabelled binary forest having in total one less internal vertex than the tree we started with.

$$\mathcal B = \{\circ\}\ \dot\cup\ \{\bullet\}\times\mathcal F_2$$

$$B(x) = 1 + x[y^2]F(x,y) = 1 + x[y^2]\exp\Big(\sum_{k\ge1}\frac{1}{k}y^kB(x^k)\Big) = 1 + \frac{x}{2}\big(B(x^2) + B^2(x)\big)$$

This in fact determines all the bn, as we see by extracting coefficients.

$$[x^0]\,B(x) = b_0 = 1$$

$$[x^{2m}]\,B(x) = b_{2m} = \frac{1}{2}\sum_{k=0}^{2m-1} b_kb_{2m-1-k}$$

$$[x^{2m+1}]\,B(x) = b_{2m+1} = \frac{1}{2}b_m + \frac{1}{2}\sum_{k=0}^{2m} b_kb_{2m-k}$$

The fractions might seem a bit worrying. In fact the expressions on the right-hand side are always integers since they count something (rooted unlabelled binary trees to be precise).
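The recurrences are easy to run; the following sketch (function name ours) computes the first few b_n, and the assertion checks along the way that the halved sums are indeed integers:

```python
def binary_tree_counts(N):
    """b[n] = number of rooted unlabelled binary trees with n internal
    vertices, extracted from B(x) = 1 + (x/2)(B(x^2) + B(x)^2)."""
    b = [0] * (N + 1)
    b[0] = 1  # the single leaf
    for n in range(1, N + 1):
        conv = sum(b[k] * b[n - 1 - k] for k in range(n))  # [x^{n-1}] B(x)^2
        even = b[(n - 1) // 2] if (n - 1) % 2 == 0 else 0  # [x^{n-1}] B(x^2)
        assert (conv + even) % 2 == 0  # the halved expression is an integer
        b[n] = (conv + even) // 2
    return b

assert binary_tree_counts(6) == [1, 1, 1, 2, 3, 6, 11]
```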

Problem 9.10. Prove that the expressions $\sum_{k=0}^{2m-1} b_kb_{2m-1-k}$ and $b_m + \sum_{k=0}^{2m} b_kb_{2m-k}$ are always even, by observing that the terms pair up nicely. (Of course we already know this since dividing them by two gives the cardinality of a set.) Now prove this algebraically, by directly manipulating the expression.

restricted integer partitions

A partition of n is a multiset of positive integers that add to n; note that a multiset can have repeats but is not ordered. We can consider a multiset to be an object, and define "connected" to mean a multiset of size 1. Since a partition uniquely decomposes into the collection of singleton multisets of its elements, this is a valid use of "connected component" for our purposes.

7 = 1 + 1 + 2 + 3 ↔ {{1, 1, 2, 3}} ↔ {{1} , {1} , {2} , {3}}

Note that c_r = 1 for r ≥ 1, or C(x) = x/(1 − x), since there is exactly one integer of size r for each value r. A more elegant perspective might be to observe that C is just the positive integers and partitions are just multisets of them. So if P(x) = Σ_n p_n x^n where p_n is the number of partitions of n then we have the following specification, where the weight of a positive integer is just itself.

$$\mathcal P = \mathrm{MSet}(\mathbb N\setminus\{0\}) \quad\longleftrightarrow\quad P(x) = \prod_{r\ge1}\frac{1}{1-x^r}$$
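Multiplying in the factors 1/(1 − x^r) one at a time gives a simple way to tabulate the p_n; here is a short sketch (ours):

```python
def partition_counts(N):
    """p[n] = number of partitions of n, by multiplying out
    prod_{r>=1} 1/(1-x^r) one factor at a time (truncated at x^N)."""
    p = [1] + [0] * N
    for r in range(1, N + 1):
        # multiplying by 1/(1-x^r) means p[n] += p[n-r], in increasing n
        for n in range(r, N + 1):
            p[n] += p[n - r]
    return p

p = partition_counts(10)
assert p[5] == 7 and p[10] == 42
```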

A partition with odd parts is one which uses only odd integers. A partition with distinct parts is one where all the integers used are distinct. For instance 10 = 1 + 1 + 3 + 5 is a partition of 10 with odd parts, 10 = 1 + 2 + 7 is a partition of 10 with distinct parts, and 10 = 3 + 7 is a partition with odd parts and distinct parts. Let A be the set of partitions into odd parts and B be the set of partitions into distinct parts. Then we have the following specifications, which lead directly to generating functions.

$$\mathcal A = \mathrm{MSet}\big(\{2s+1 : s\in\mathbb N\}\big) \quad\longleftrightarrow\quad A(x) = \prod_{s\ge0}\frac{1}{1-x^{2s+1}}$$

$$\mathcal B = \mathrm{PSet}(\mathbb N\setminus\{0\}) \quad\longleftrightarrow\quad B(x) = \prod_{r\ge1}(1+x^r)$$

However it is perhaps surprising that these two generating functions are the same:

$$\prod_{s\ge1}\frac{1}{1-x^{2s-1}} = \prod_{s\ge1}\frac{1}{1-x^{2s-1}}\cdot\frac{1-x^{2s}}{1-x^{2s}}$$

$$= \frac{(1-x^2)(1-x^4)(1-x^6)\cdots}{(1-x)(1-x^2)(1-x^3)(1-x^4)(1-x^5)(1-x^6)\cdots}$$

$$= \frac{(1-x)(1+x)(1-x^2)(1+x^2)(1-x^3)(1+x^3)\cdots}{(1-x)(1-x^2)(1-x^3)(1-x^4)(1-x^5)(1-x^6)\cdots}$$

$$= (1+x)(1+x^2)(1+x^3)\cdots = \prod_{r\ge1}(1+x^r)$$

It is not true in general that PSet(C) ≅ MSet(C_odd), but see the exercises for a slightly more general formulation.
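We can verify this identity numerically to any truncation order; in the sketch below (helper names ours) both products are expanded as truncated series and compared coefficientwise:

```python
N = 40  # truncation order

def expand(factors):
    """Multiply truncated series given as (r, kind) factors:
    kind 'geom' means 1/(1-x^r), kind 'plus' means (1+x^r)."""
    s = [1] + [0] * N
    for r, kind in factors:
        if kind == 'geom':
            for n in range(r, N + 1):       # multiply by 1/(1-x^r)
                s[n] += s[n - r]
        else:
            for n in range(N, r - 1, -1):   # multiply by (1+x^r), in place
                s[n] += s[n - r]
    return s

A = expand([(r, 'geom') for r in range(1, N + 1, 2)])  # odd parts
B = expand([(r, 'plus') for r in range(1, N + 1)])     # distinct parts
assert A == B
assert A[10] == 10  # ten partitions of 10 into odd parts (and into distinct parts)
```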

exercises

1. Using the generating functions M(x) and P(x) you found for Problem 9.1, compute M(C(x)) and P(C(x)) for some C(x) that corresponds to an interesting class of unlabelled components C. Do you get the generating function for MSet(C) and PSet(C)? Explain.
2. We identified the partition $\mathcal C = \dot\bigcup_{r\ge1}\mathcal C^{(r)}$.
a) Give the generating functions $C^{(r)}(x)$ and $\bar C^{(r)}(x)$.
b) Verify that $C(x) = \sum_{r\ge1}C^{(r)}(x)$ and $\bar C(x) = \sum_{r\ge1}\bar C^{(r)}(x)$.
3. Prove Lemma 9.3.
4. In the derivation of MSet and PSet we asserted that G(x) = F(x, 1). Explain why the substitution of y = 1 is always legal in this context.
5. In the derivation of MSet and PSet we found an expression for G(x) by evaluating F(x, 1). Show that we could have avoided the bivariate versions altogether, and computed G(x) directly (without $F^{(r)}$ or $\bar F^{(r)}$). In fact we want both the univariate and the bivariate versions, so our derivation was "optimal" for that purpose.
6. Verify that PSet has the exp(···) formulation given in Theorem 9.6.
7. Complete the derivation of the exponential formula for labelled objects using the present method and the expression for $g^{(r)}_{n,k}$ given in Problem 9.7. That is, determine the generating function for Set(C) where C is some set of labelled components. Of course we know what the final answer should be. (This is in fact the method used by Wilf [2] for both labelled and unlabelled objects, in a different presentation.)
8. Prove Corollary 9.9: given M(x) = P(x)M(x²) from Proposition 9.8, derive $M(x) = \prod_{k\ge0}P\big(x^{2^k}\big)$.
9. For a given integer b, let b₀, b₁, b₂, ... be the digits of b in binary, so that $b = \sum_{j\ge0}b_j2^j$. For a given multiset B, define a sequence of sets A₀, A₁, A₂, ... where the multiplicity of α in B is b if and only if the multiplicity of α in A_j is b_j for every j.
a) Explain how the above gives a bijection between the collection of multisets with elements from a fixed universe and sequences of sets with elements from the same universe.
You may assume that all multiplicities are finite.

b) Use this bijection to prove $M(x) = \prod_{k\ge0}P\big(x^{2^k}\big)$ from Proposition 9.8.
10. Give an alternate proof of Proposition 9.8 using the following idea. Let M(x) and P(x) be the OGFs of MSet(C) and PSet(C) respectively.
a) Using the fact that $P(x) = \prod_r(1+x^r)^{c_r} = \prod_r\Big(\frac{1-x^{2r}}{1-x^r}\Big)^{c_r}$, show that we can write $P(x) = \exp\Big(\sum_{k\ge1}\frac{1}{k}\big(C(x^k)-C(x^{2k})\big)\Big)$.
b) Using the relationship you just derived, show that $P(x) = \frac{M(x)}{M(x^2)}$.
11. a) Apply x d/dx log to both sides of equation (1) to derive the given result on coefficients.
b) Explain why we can cancel an x from both sides of equation (1), and then apply x d/dx log to both sides of the result. Show that you get the same result on the coefficients.
c) Compare this with Exercise 8.12.
12. Show directly that a rooted binary tree always has an odd number of vertices.
13. Show directly that a rooted binary tree with n internal vertices has 2n + 1 vertices in total.

14. Let B be the set of unlabelled trees all of whose vertices have 0 or d children, and F_d be the set of forests with exactly d components, each component of which is a rooted unlabelled tree where each vertex has 0 or d children.
a) Derive an expression for the (ordinary) generating function F_d(x).
b) Derive an expression for B in terms of F_d, and hence for B(x) in terms of F_d(x).
c) Setting d = 3, extract coefficients to find a formula for b_n.
d) Can you find a formula for b_n for arbitrary d?
15. (⋆) Define odd(n) to be the largest odd divisor of n. Consider a set of unlabelled objects S such that the number of connected objects of size n is odd(n). So the generating function for connected objects is C(x) = Σ_n odd(n)x^n.
a) Determine the OGF for objects all of whose components have odd sizes.
b) Determine the OGF for objects all of whose components are distinct.
c) Show that these two OGFs are equal.

10. polynomials

finite fields : background

We apply enumeration of unlabelled objects to something seemingly less combinatorial: polynomials over finite fields. We start with a brief review of some theory of rings and fields. A polynomial is irreducible if it is of positive degree and cannot be factored into polynomials of strictly smaller degree. So for instance every polynomial of degree one is irreducible. In fact, every polynomial can be uniquely factored into irreducible polynomials (possibly repeated). More precisely, if R is a ring then we have the following definitions. An element u ∈ R is a unit if there exists some v ∈ R with uv = 1; so the units are exactly those elements with multiplicative inverses. A non-zero non-unit element a ∈ R is irreducible if whenever a = bc then either b or c is a unit. A non-zero non-unit element a ∈ R is prime if whenever a | bc then either a | b or a | c. If R is a field then every non-zero element is a unit (and there are no irreducibles or primes). If R = Z then the units are ±1, and a is prime if and only if a is irreducible; these are what you might call "the" primes. If R = K[x], the ring of polynomials over a field K, then again a is prime if and only if it is irreducible; the primes here are exactly the polynomials of nonzero degree that cannot be factored over K. Both Z and K[x] are examples of unique factorization domains, in which prime and irreducible are synonyms and every element can be uniquely factored into a multiset of irreducibles, up to multiplication by a unit. Every finite field has q = p^k elements for some prime p and positive integer k. In fact we can say quite a bit more.

Theorem 10.1. Let F be a finite field. Then for some prime p and some irreducible polynomial f of degree k over Z_p, we have that F ≅ Z_p[x]/(f). Furthermore, different irreducible polynomials of the same degree give isomorphic fields.

In light of this, we denote by F_q "the" field of order q. In particular this says that F_q has q = p^k elements. We may consider the elements of F_q ≅ Z_p[x]/(f) to be polynomials of degree less than k. Addition is performed as you would expect for polynomials, with coefficients in Z_p. Multiplication is done modulo f, so we take f ≡ 0, which we can think of as "x^k ≡ ···" and use it to reduce the product to a polynomial of degree less than k. Note that all irreducible polynomials over F_p of degree k result in fields that are isomorphic, so for algebraic purposes the choice of f is arbitrary. However, for purposes of doing arithmetic efficiently, the choice of f can make a difference. This has applications anywhere finite fields are used (eg, coding theory, cryptography). The particular representation of F_q chosen (i.e. the particular polynomial f) can have significant consequences. We won't investigate this aspect here.

Problem 10.2. Consider the polynomial f = x² + x + 1 in Z₂[x]. Show that this polynomial is irreducible (hint: there aren't too many polynomials of smaller degree, so you can just try them all as factors). Then we see that Z₂[x]/(f) = {0, 1, x, x + 1}. Write down the addition table and multiplication table of this field (which is, up to isomorphism, the unique field of order 4). Note that since we are taking f ≡ 0 we have x² ≡ −x − 1 = x + 1 (the last because we are in Z₂), so if x² occurs in a product we can reduce it.


Now try the same thing for f = x³ + x² + 1 in Z₂[x]. Note that to show this is irreducible it suffices to show that it is not the product of a linear and a quadratic, so it suffices to show that it is not divisible by a linear factor, so it suffices to show that it has no roots in Z₂.

polynomials over Fq

Let us fix some (arbitrary) prime-power q; all polynomials will be over F_q. For technical reasons, we will only consider monic polynomials. This means that factorization truly is unique. So for instance over F₇ we have the following factorization, which is the unique factorization of the given monic polynomial into monic irreducible polynomials.

$$x^5 + x^4 + 4x^3 + 4x + 3 = (x-2)(x^2+2)(x^2+3x+1)$$

This is analogous to only using positive irreducibles in the factorization of integers. Instead of having 60 = 2² × 3 × 5 = (−2) × 2 × (−3) × 5 = −(−2)² × 3 × (−5) = (−2)² × (−3) × (−5) = ··· we only have 60 = 2² × 3 × 5.

We define G(x) = Σ_n g_n x^n and C(x) = Σ_n c_n x^n, where g_n is the number of polynomials of degree n and c_n is the number of irreducible polynomials of degree n. It's easy to see that g_n = q^n (remember our polynomials are monic), meaning G(x) = 1/(1 − qx). On the other hand unique factorization means that there is a natural bijection between polynomials and multisets of irreducible polynomials (the factors). This gives the following relation between the generating functions.

$$\mathcal G \cong \mathrm{MSet}(\mathcal C) \quad\longleftrightarrow\quad G(x) = \exp\Big(\sum_{k\ge1}\frac{1}{k}C(x^k)\Big)$$

This implicitly determines all of the c_r.

$$\frac{1}{1-qx} = \exp\Big(\sum_{k\ge1}\frac{1}{k}C(x^k)\Big)$$

$$\log\frac{1}{1-qx} = \sum_{k,r\ge1}\frac{c_r}{k}x^{kr}$$

$$\sum_{n\ge1}\frac{q^n}{n}x^n = \sum_{k,r\ge1}\frac{c_r}{k}x^{kr}$$

Extracting [x^n] on the left-hand side is simple; on the right-hand side we must have n = kr and so the sum is over all r | n, which means that k = n/r. We get the following.

$$q^n = \sum_{r\mid n} rc_r \qquad (1)$$

We can solve this for c_n in terms of the c_r for r < n, and then use it to recursively compute the number of irreducible polynomials of any degree. But there is a better way, using Möbius inversion. Let µ be the function on the positive integers defined by the following:

$$\mu(1) = 1 \qquad \mu(n) = (-1)^j \ \text{if $n$ is the product of $j$ distinct primes} \qquad \mu(n) = 0 \ \text{if $t^2\mid n$ for some $t > 1$}$$

This is exactly what we need to solve equation (1) efficiently.

Lemma 10.3. If a_n and b_n are two sequences then the following holds.

$$a_n = \sum_{r\mid n} b_r \quad\Longleftrightarrow\quad b_n = \sum_{r\mid n} \mu(r)\,a_{n/r}$$

We will not prove this for the moment. But it is exactly the tool needed to solve for c_n in equation (1). We directly obtain c_n in terms of the g_n (powers of q).

$$nc_n = \sum_{r\mid n}\mu(r)\,q^{n/r}$$

This means that we now have an expression for C(x) also.

$$C(x) = \sum_n c_nx^n = \sum_n \frac{1}{n}\sum_{r\mid n}\mu(r)q^{n/r}x^n = \sum_{k,r\ge1}\frac{\mu(r)}{kr}q^kx^{kr} = \sum_{r\ge1}\frac{\mu(r)}{r}\sum_{k\ge1}\frac{(qx^r)^k}{k} = \sum_{r\ge1}\frac{\mu(r)}{r}\log\frac{1}{1-qx^r}$$

It is straightforward to extract coefficients in order to determine c_n for any particular n (the intermediate form above is more practical), although in this case it is probably easier to use the expression for c_n directly.

$$c_1 = q \qquad c_2 = \tfrac{1}{2}\big(q^2 - q\big) \qquad c_3 = \tfrac{1}{3}\big(q^3 - q\big)$$

We notice that the answer will always be a sum (±) of powers of q, where each exponent divides n, all divided by n. This means that as q gets large, the leading term, q^n, will dominate to give c_n ≈ q^n/n. Thus for large q, the proportion of polynomials of degree n that are irreducible is approximately 1/n, independent of q. This is an analogue of the prime number theorem, which gives the limiting density of prime numbers. Note however, that there are some important differences. We measure the size of a polynomial by its degree, which is a non-starter for integers (they all have degree zero).
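Both routes to c_n — solving equation (1) recursively, and the Möbius-inversion formula — are easy to implement and can be checked against each other; the sketch below (function names ours) does this over F₂:

```python
def mobius(n):
    """Mobius function by trial factorization."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0          # a squared prime factor
            result = -result      # one more distinct prime factor
        d += 1
    return -result if n > 1 else result

def irreducible_counts(q, N):
    """c[n] = number of monic irreducible polynomials of degree n over F_q,
    solved recursively from  q^n = sum_{r|n} r*c[r]."""
    c = [0] * (N + 1)
    for n in range(1, N + 1):
        s = sum(r * c[r] for r in range(1, n) if n % r == 0)
        c[n] = (q ** n - s) // n
    return c

q, N = 2, 12
c = irreducible_counts(q, N)
# Cross-check against the Mobius formula  n*c_n = sum_{r|n} mu(r) q^{n/r}
for n in range(1, N + 1):
    s = sum(mobius(r) * q ** (n // r) for r in range(1, n + 1) if n % r == 0)
    assert n * c[n] == s
assert c[2] == 1 and c[3] == 2  # x^2+x+1; x^3+x+1 and x^3+x^2+1
```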

squarefree polynomials

We enumerate the squarefree polynomials of degree n. These are polynomials whose factorization gives a set of distinct irreducible polynomials. Let S(x) = Σ_n s_n x^n, where s_n is the number of squarefree polynomials of degree n. We recognize squarefree polynomials as being equivalent to a proper set of irreducibles (the prime factors, which are all distinct).

$$\mathcal S \cong \mathrm{PSet}(\mathcal C) \quad\longleftrightarrow\quad S(x) = \exp\Big(\sum_{k\ge1}\frac{(-1)^{k+1}}{k}C(x^k)\Big) \qquad (2)$$

The above derivation determines all of the s_n in terms of the known c_n, as s_n = [x^n] S(x). In extracting coefficients from the right-hand side, it is helpful to first imagine which terms of the expansion of exp will contribute (it helps to explicitly write out the terms of the "exp" and the terms of the inner sum inside each of these). We find that

$$s_0 = 1 \qquad s_1 = c_1 = q \qquad s_2 = c_2 - \tfrac{1}{2}c_1 + \tfrac{1}{2}c_1^2 = q^2 - q$$

There is an alternative derivation of S(x) that produces a more usable form. We apply Corollary 9.9. Since S ≅ PSet(C) and G ≅ MSet(C) we have

$$S(x) = \frac{G(x)}{G(x^2)} = \frac{1-qx^2}{1-qx} = 1 + qx + \sum_{n\ge2}\big(q^n - q^{n-1}\big)x^n \qquad (3)$$

This gives s₀ = 1, s₁ = q, and s_n = q^n − q^{n−1} for n ≥ 2. As a quick reminder, the equation G(x) = S(x)G(x²) is based on a correspondence. We first separate off one of each factor whose multiplicity is odd. Then we divide the multiplicities of the remaining factors by two. This gives a bijection between a polynomial on the one hand and a pair consisting of a set of factors and a multiset of factors on the other. The polynomial is the product of all the factors, but the ones in the multiset count twice. As an example, consider a polynomial p with four distinct irreducible factors which we call p₁, p₂, p₃, p₄.

$$p = p_1^5p_2^6p_3p_4^2 = (p_1p_3)\big(p_1^4p_2^6p_4^2\big) = (p_1p_3)\big(p_1^2p_2^3p_4\big)^2 \quad\longleftrightarrow\quad \big(\{p_1,p_3\},\ \{\{p_1^2,p_2^3,p_4\}\}\big) = (p_{\mathrm{odd}},\,p_{\mathrm{square}})$$

G(x) counts the number of things on the left, while S(x)G(x²) counts the number of things on the right with the weight of the second half counting double.
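Equations (2) and (3) can be checked against each other numerically; the sketch below (names ours) expands the exp in (2) as a truncated series over F₃ and compares with q^n − q^{n−1}:

```python
from fractions import Fraction

def series_exp(a, N):
    """exp of a truncated power series with a[0] = 0, via E' = a'E,
    i.e.  n*e_n = sum_{k=1}^{n} k*a_k*e_{n-k}."""
    e = [Fraction(0)] * (N + 1)
    e[0] = Fraction(1)
    for n in range(1, N + 1):
        e[n] = sum(Fraction(k) * a[k] * e[n - k] for k in range(1, n + 1)) / n
    return e

q, N = 3, 10
# c_n solved recursively from  q^n = sum_{r|n} r*c_r
c = [0] * (N + 1)
for n in range(1, N + 1):
    c[n] = (q ** n - sum(r * c[r] for r in range(1, n) if n % r == 0)) // n

# Inner sum of equation (2):  sum_{k>=1} (-1)^{k+1} C(x^k)/k, truncated at x^N
inner = [Fraction(0)] * (N + 1)
for k in range(1, N + 1):
    for r in range(1, N // k + 1):
        inner[k * r] += Fraction((-1) ** (k + 1), k) * c[r]

S = series_exp(inner, N)
assert S[0] == 1 and S[1] == q
assert all(S[n] == q ** n - q ** (n - 1) for n in range(2, N + 1))
```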

questions

1. Compute s3 (and s4 if you dare) using equation (2), and verify that it is in agreement with equation (3).

2. Can we use Möbius inversion to determine s_n for squarefree polynomials, analogously to what we did for polynomials? What goes wrong?

3. Fix some prime-power q, and a positive integer d. Let c_n be the number of irreducible (monic) polynomials of degree n over F_q. Let s_n be the number of (monic) polynomials of degree n over F_q that contain no non-trivial d-th power as a factor. Let C(x) = Σ_{n≥1} c_n x^n and S(x) = Σ_{n≥0} s_n x^n.
a) Using a suitable generalization of PSet, find a formula for S(x) in terms of C(x) (and maybe q and d). You should specify what your suitable generalization of PSet is, and find the generating function by starting from $s^{(r)}_{n,k}$ and finishing with something of the form S(x) = exp(···).
b) Using a suitable generalization of the relationship between the generating functions for MSet and PSet that we saw in the course, determine S(x) in terms of only q and d. This is not the same method you used in the previous part.
c) Using either one of your expressions (your choice) for S(x), give an explicit simple formula for s_n.
4. (⋆) Try and prove Lemma 10.3, or at least imagine how you might prove it. Note that we will see a proof of this in the next chapter.

11. Dirichlet generating functions

Dirichlet generating functions

We have seen ordinary generating functions and exponential generating functions; each corresponds naturally to a particular method of multiplying sequences. There are other multiplications possible, depending on the combinatorial constructions or applications.

Given a sequence (a_n)_{n≥1} we define its Dirichlet generating function to be A(x) = Σ_{n≥1} a_n/n^x. Notice that the indeterminate x appears in the exponent. Such series are useful in number theory (where typically s is used as the indeterminate instead of x). For instance, the sequence 1, 1, 1, ... corresponds to the Dirichlet generating function

$$\zeta(x) = 1 + \frac{1}{2^x} + \frac{1}{3^x} + \frac{1}{4^x} + \cdots$$

What is the multiplication for Dirichlet generating functions? More precisely, how do we compute the c_n in the following?

$$\Big(\sum_{n\ge1}\frac{a_n}{n^x}\Big)\Big(\sum_{n\ge1}\frac{b_n}{n^x}\Big) = \sum_{n\ge1}\frac{c_n}{n^x}$$

The answer is that we need to multiply a term from each bracket in all possible ways so as to obtain n^{−x}. In other words we need indices i and j such that the product ij = n. The weight of an ordered pair of objects is the product of the weights of the parts.

$$\alpha\in\mathcal A,\ \omega(\alpha)=i \quad\text{and}\quad \beta\in\mathcal B,\ \omega(\beta)=j \quad\longrightarrow\quad (\alpha,\beta)\in\mathcal A\times\mathcal B,\ \omega(\alpha,\beta)=\omega(\alpha)\omega(\beta)=ij$$

Multiplication of Dirichlet generating functions corresponds to a multiplicative weight. In contrast, ordinary and exponential generating functions both combine weight additively, with the multiplication rule for exponential generating functions additionally naturally adding a factor corresponding to relabelling. So the c_n in the product of Dirichlet generating functions is computed as follows.

$$\Big(\sum_{n\ge1}\frac{a_n}{n^x}\Big)\Big(\sum_{n\ge1}\frac{b_n}{n^x}\Big) = \sum_{n\ge1}\Big(\sum_{d\mid n}a_db_{n/d}\Big)\frac{1}{n^x} \quad\longleftrightarrow\quad c_n = \sum_{d\mid n}a_db_{n/d}$$

As an example, let's compute ζ²(x).

$$\zeta^2(x) = \Big(\sum_{n\ge1}\frac{1}{n^x}\Big)\Big(\sum_{n\ge1}\frac{1}{n^x}\Big) = \sum_{n\ge1}\Big(\sum_{d\mid n}1\cdot1\Big)\frac{1}{n^x} = \sum_{n\ge1}\frac{d(n)}{n^x}$$

where d(n) is the number of divisors of n. Note that this sequence is not monotonic, far from it! This is in contrast to our usual generating functions, where the coefficients have (mostly) increased with n. Or to put it another way, for OGF or EGF we often (but not always) have that i + j = n implies that a_n is larger than a_i or a_j. With Dirichlet generating functions we often (but not always) have that ij = n implies that a_n is larger than a_i or a_j. A more correct (but still heuristic) observation might be that the sequences from OGF or EGF are ordered according to their indices under the total

∗ These notes are intended for students in mike’s MAT5107. For other uses please say “hi” to [email protected].

ordering of "≤" on N, while sequences from DGF are ordered according to their indices under the partial ordering of "|" on N∖{0}. So in this sense ζ² is monotonic.
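The multiplication rule is a few lines of code; the sketch below (names ours) convolves the coefficients of ζ with themselves and recovers the divisor counts d(n), including their non-monotonicity:

```python
def dirichlet_convolve(a, b, N):
    """c[n] = sum_{d|n} a[d]*b[n//d] for 1 <= n <= N (index 0 unused)."""
    c = [0] * (N + 1)
    for d in range(1, N + 1):
        for m in range(1, N // d + 1):
            c[d * m] += a[d] * b[m]
    return c

N = 24
ones = [0] + [1] * N                      # coefficients of zeta(x)
d = dirichlet_convolve(ones, ones, N)     # coefficients of zeta^2(x)
# d(n) is the number of divisors of n -- and it is far from monotonic:
assert d[12] == 6 and d[13] == 2 and d[16] == 5
```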

multiplicativity

A function f(n) defined on the positive integers is said to be multiplicative if whenever gcd(a, b) = 1 then f(ab) = f(a)f(b). Note that we require nothing when the factors are not coprime. Both d(n) and µ(n) defined above are multiplicative, as you are invited to show in the exercises. Prime powers play a special role for multiplicative functions.

Lemma 11.1. If f(n) is a multiplicative function then it is determined by its values on prime powers.

This has an important consequence. If f(n) is multiplicative then it behaves very nicely as the coefficients of a DGF.

Lemma 11.2. Let f be a multiplicative function. Then we have the following equivalent form for its Dirichlet generating function.

$$\sum_{n\ge1}\frac{f(n)}{n^x} = \prod_{\text{primes }p}\Big(1 + \frac{f(p)}{p^x} + \frac{f(p^2)}{p^{2x}} + \frac{f(p^3)}{p^{3x}} + \cdots\Big)$$

To show this, extract [n^{−x}] from both sides, and note that by uniqueness of factoring there is only one choice of term in each factor of the infinite product on the right-hand side. Note that equality of generating functions means equality of every coefficient. In particular, in expanding the product on the right-hand side we only ever take products for which all but finitely many of the terms is 1. In fact this is not quite true: we never actually "expand" anything, we simply ask what the coefficient of n^{−x} is on each side, for all n ∈ N. The idea of taking a product where infinitely many terms are not 1 does not even arise: this would correspond to asking for the coefficient of "∞^{−x}", which we would not do for a generating function. We saw this before without commenting on it, when we considered the generating function for MSet and PSet as a product before "simplifying" it to something of the form exp(···). The terms in the infinite product that correspond to x^∞ never arise, since the ordinary generating function is defined exactly by the coefficient of x^n for n ∈ N.

Back to the business at hand, the function 1 (whose value is 1 for all n) is multiplicative (a short exercise!), so we get

$$\zeta(x) = \sum_{n\ge1}\frac{1}{n^x} = \prod_p\Big(1 + \frac{1}{p^x} + \frac{1}{p^{2x}} + \cdots\Big) = \prod_p\frac{1}{1 - \frac{1}{p^x}}$$

We also know that the function µ(n) is multiplicative, so

$$\sum_{n\ge1}\frac{\mu(n)}{n^x} = \prod_p\Big(1 + \frac{\mu(p)}{p^x} + \frac{\mu(p^2)}{p^{2x}} + \frac{\mu(p^3)}{p^{3x}} + \cdots\Big) = \prod_p\Big(1 - \frac{1}{p^x}\Big)$$

We conclude that these two DGFs are reciprocals of each other since their product is the DGF 1.

M¨obiusinversion

Given $a_n = \sum_{r\mid n} b_r$, we want to invert, and solve for the b_n. How do we do it? We first identify the finite sum as the coefficient in a product of DGFs. Then we multiply by the reciprocal of the

convoluting function. Then we extract coefficients, using the multiplication rule backwards.

$$a_n = \sum_{r\mid n}b_r = \sum_{r\mid n}1\cdot b_r \quad\Longleftrightarrow\quad A(x) = \zeta(x)B(x)$$

$$\frac{1}{\zeta(x)}A(x) = B(x) \quad\Longleftrightarrow\quad \sum_{r\mid n}\mu(n/r)\,a_r = b_n$$

This is the proof of Lemma 10.3. Compare this with the following analogue, which is based on an EGF convolution:

$$a_n = \sum_{k=0}^n\binom{n}{k}b_k \quad\Longleftrightarrow\quad b_n = \sum_{k=0}^n(-1)^{n-k}\binom{n}{k}a_k$$

You might remember this from Exercise 4.14.
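Möbius inversion itself is easy to test: start from an arbitrary sequence b, form a_n = Σ_{r|n} b_r, and recover b. The sketch below (names ours) does exactly this:

```python
import random

def mobius(n):
    """Mobius function by trial factorization."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result

N = 60
random.seed(0)
b = [0] + [random.randrange(100) for _ in range(N)]
a = [0] + [sum(b[r] for r in range(1, n + 1) if n % r == 0)
           for n in range(1, N + 1)]
# Mobius inversion recovers b from a:
for n in range(1, N + 1):
    recovered = sum(mobius(n // r) * a[r]
                    for r in range(1, n + 1) if n % r == 0)
    assert recovered == b[n]
```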

cyclotomic polynomials

We offer a short example of another application of Möbius inversion, as a way of studying primitive n-th roots of unity. The n-th roots of unity are exactly the roots of the polynomial 1 − x^n. These form a group with respect to multiplication. Among these roots, there are some that multiplicatively generate the group of all n-th roots: these are the primitive roots.

Definition 11.3. α = e2πir/n is a primitive n-th root of unity if α, α2, . . . , αn = 1 are all distinct.

We have the following characterization of primitive roots.

Lemma 11.4. The primitive n-th roots of unity are exactly the complex numbers α = e^{2πir/n} with gcd(r, n) = 1. Furthermore, every n-th root of unity is a primitive d-th root of unity for exactly one d, and d | n.

We are interested in the polynomial whose roots are exactly the primitive n-th roots of unity: this is the cyclotomic polynomial Φ_n(x). Certainly Φ_n(x) | 1 − x^n. In fact, Lemma 11.4 gives the following.

Lemma 11.5. $\displaystyle\prod_{d\mid n}\Phi_d(x) = 1 - x^n$

We can find Φ_n(x) using Möbius inversion. But first we need to transform the product to a sum.

$$\prod_{d\mid n}\Phi_d(x) = 1 - x^n$$

$$\sum_{d\mid n}\log\Phi_d(x) = \log(1-x^n)$$

$$\log\Phi_n(x) = \sum_{d\mid n}\mu(n/d)\log(1-x^d)$$

$$\Phi_n(x) = \prod_{d\mid n}\big(1-x^d\big)^{\mu(n/d)}$$

Recall that µ(k) ∈ {0, ±1}. So for instance we have

$$\Phi_{18}(x) = \big(1-x^{18/1}\big)^{\mu(1)}\big(1-x^{18/2}\big)^{\mu(2)}\big(1-x^{18/3}\big)^{\mu(3)}\big(1-x^{18/6}\big)^{\mu(6)}\big(1-x^{18/9}\big)^{\mu(9)}\big(1-x^{18/18}\big)^{\mu(18)}$$

$$= \frac{(1-x^{18})(1-x^3)}{(1-x^9)(1-x^6)} = 1 - x^3 + x^6$$

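The product formula makes Φ_n easy to compute with exact integer polynomial arithmetic; the sketch below (helper names ours) multiplies out the numerator and denominator and performs the exact division, recovering Φ₁₈ = 1 − x³ + x⁶:

```python
def mobius(n):
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result

def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def poly_divexact(a, b):
    """Exact division of integer polynomials (b[0] must be +-1 here)."""
    a = list(a)
    q = [0] * (len(a) - len(b) + 1)
    for i in range(len(q)):
        q[i] = a[i] // b[0]
        for j, bj in enumerate(b):
            a[i + j] -= q[i] * bj
    assert all(r == 0 for r in a[len(q):])  # remainder is zero
    return q

def cyclotomic(n):
    """Phi_n(x) = prod_{d|n} (1 - x^d)^{mu(n/d)}."""
    num, den = [1], [1]
    for d in range(1, n + 1):
        if n % d == 0:
            factor = [1] + [0] * (d - 1) + [-1]  # 1 - x^d
            m = mobius(n // d)
            if m == 1:
                num = poly_mul(num, factor)
            elif m == -1:
                den = poly_mul(den, factor)
    return poly_divexact(num, den)

assert cyclotomic(18) == [1, 0, 0, -1, 0, 0, 1]   # 1 - x^3 + x^6
assert cyclotomic(12) == [1, 0, -1, 0, 1]         # 1 - x^2 + x^4
```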
Note that the unsimplified form of Φ18(x) does not at first glance appear to be a polynomial. The formula for Φn(x) given by M¨obiusinversion is obviously a rational function. The fact that it is a polynomial follows from the proof that it is a polynomial whose roots are exactly the primitive n-th roots of unity (in fact, among all such polynomials it is the one with constant term 1). So as a “corollary” of the fact that Φn(x) is a polynomial we have the following.

$$\prod_{\substack{r\mid n\\ \mu(r)=-1}}\big(1-x^{n/r}\big) \ \Bigg|\ \prod_{\substack{r\mid n\\ \mu(r)=1}}\big(1-x^{n/r}\big)$$

This is analogous to proving that the binomial coefficient can be expressed in terms of factorials, and then concluding that the apparently rational expression is an integer.

$$\binom{n}{k} = \frac{n!}{k!(n-k)!} \implies k!(n-k)!\mid n!$$

This works because we define $\binom{n}{k}$ as the number of k-subsets of an n-set and then prove the factorial formula. Had we defined the binomial coefficient as the factorial formula it would not have been so obviously an integer.

exercises

1. State the multiplication rule for products of three (or better yet t) DGFs.
2. Verify that d(n) and µ(n) really are multiplicative. Also verify that the constant function 1(n) = 1 is multiplicative. Can you think of other multiplicative functions?
3. Prove that if f is a multiplicative function then f(1) = 1. This explains the leading term in Lemma 11.2.
4. Prove Lemma 11.1. (hint: factor n)
5. Prove that if we arbitrarily assign a positive integer value to f on each prime power, then there is a unique extension to a multiplicative function f : N∖{0} → N∖{0}. This is something of a converse to Lemma 11.1, and can be seen as analogous to defining a function on a basis of a vector space and extending by linearity.
6. Both Möbius inversion and the EGF analogue mentioned above use the sequence 1, 1, 1, ... (and its reciprocal) to establish an equivalence between identities. What would the OGF analogue be, using the sequence 1, 1, 1, ...? (Of course, one can use other sequences as well, provided they have a reciprocal.) (hint: if you're stuck, recall Exercise 4.13 and Exercise 4.14.)
7. Prove that if f is multiplicative then $\sum_{n\ge1}\frac{f(n)}{n^x}$ has a DGF-reciprocal. (hint: start with Exercise 11.3, but note that Lemma 1.3 does not directly apply.)
8. Prove Lemma 11.4.
9. Prove Lemma 11.5.

10. Compute $\Phi_{p^k}(x)$ where p is a prime number. Draw the roots in the complex plane for Φ₄(x) and Φ₉(x). Can you give a nice geometric description?

12. moments

probability theory: five minute review

In general, consider a (finite) set Ω and a function f defined on the objects in Ω. We choose an object α ∈ Ω according to some probability distribution p and let Y = f(α) be a random variable. So Y is the value of f on an object α ∈ Ω chosen randomly according to p. Then the expected value of Y is the sum over all α ∈ Ω of f(α) weighted by p(α).

$$E[Y] = \sum_{\alpha\in\Omega} f(\alpha)p(\alpha)$$

A useful case is when the probability distribution p is uniform (meaning p(α) = 1/|Ω| for all α ∈ Ω) in which case we have

$$E[Y] = \frac{1}{|\Omega|}\sum_{\alpha\in\Omega} f(\alpha)$$

We see that E[Y] is then the average of Y (i.e. the average value of f(α)) over the set Ω. We can compute the k-th moment of Y as follows, for k ≥ 1.

$$E\big[Y^k\big] = \sum_{\alpha\in\Omega}\big(f(\alpha)\big)^kp(\alpha)$$

More commonly we use the shifted k-th moment.

$$E\big[(Y - E[Y])^k\big] = \sum_{\alpha\in\Omega}\big(f(\alpha) - E[Y]\big)^kp(\alpha)$$

We sometimes encounter the k-th factorial moments.

$$E\big[Y(Y-1)\cdots(Y-k+1)\big] = \sum_{\alpha\in\Omega} f(\alpha)\big(f(\alpha)-1\big)\cdots\big(f(\alpha)-k+1\big)p(\alpha)$$

In principle one can take the expectation of any function of Y. The more traditional variance can be written in terms of the moments or factorial moments. Expectation is a linear operator (Exercise 12.1), which makes the following calculations an exercise.

$$\mathrm{Var}[Y] = E\big[(Y-E[Y])^2\big] = E[Y^2] - (E[Y])^2 = E[Y(Y-1)] + E[Y] - (E[Y])^2 \qquad (1)$$

In fact the three types of moments are "equivalent" (Exercise 12.2). For p uniform we define the average µ and variance σ² of Y as follows. The standard deviation is the square root of the variance.

$$\mu = E[Y] \qquad\qquad \sigma^2 = E\big[(Y - E[Y])^2\big]$$

Knowing µ and σ² is a reasonable brief description of the behaviour of Y. The probability distribution determines the moments (by definition) but knowing the moments (shifted, unshifted, or factorial) does not necessarily determine the probability distribution. The moments of Y (with respect to some probability distribution p) can be seen as roughly analogous to the terms in a Taylor series of a function. We won't need the details of these relationships in this course.
Suffice it to say that for many useful situations the moments of Y provide ways of “approximating” Y , much like for many useful functions the Taylor polynomials provide useful approximations.
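For concreteness, the three expressions for the variance in equation (1) can be checked numerically. The following sketch uses a made-up example (not from the notes): Ω = {0, ..., 5} with f(α) = α², p uniform, and exact rational arithmetic.

```python
from fractions import Fraction

# Toy example: Omega = {0,...,5}, Y = f(alpha) = alpha^2, p uniform.
Omega = range(6)
f = lambda a: a * a
p = {a: Fraction(1, len(Omega)) for a in Omega}

def E(g):
    # expectation of g(alpha) under the distribution p
    return sum(g(a) * p[a] for a in Omega)

EY = E(f)
var_shifted   = E(lambda a: (f(a) - EY) ** 2)                  # E[(Y - E[Y])^2]
var_unshifted = E(lambda a: f(a) ** 2) - EY ** 2               # E[Y^2] - (E[Y])^2
var_factorial = E(lambda a: f(a) * (f(a) - 1)) + EY - EY ** 2  # via E[Y(Y-1)]
assert var_shifted == var_unshifted == var_factorial
```

All three agree, as equation (1) and linearity of expectation promise.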


bivariate generating functions and moments

Let $f_{n,k}$ be the number of objects of size $n$ with $k$ components. Then we see that $f_n := \sum_k f_{n,k}$ is the number of objects of size $n$. Consider sampling uniformly at random an object of size $n$ from $\mathcal{F}$, and let $X_n$ be the random variable that measures the number of components of such an object.

Then we can compute various moments of $X_n$ as follows:
\[ f_n = \sum_k f_{n,k} \qquad E[X_n] = \frac{1}{f_n} \sum_k k f_{n,k} \]
\[ E[(X_n)^2] = \frac{1}{f_n} \sum_k k^2 f_{n,k} \qquad E[X_n(X_n-1)] = \frac{1}{f_n} \sum_k k(k-1) f_{n,k} \]

Suppose $F(x,y) = \sum_{n,k} f_{n,k}\, x^n y^k$; for instance the bivariate generating function for unlabelled objects. Then, leaving aside the division by $f_n$, the expectations correspond to derivatives.

\[ \sum_k f_{n,k} = [x^n]\, F(x,y)\Big|_{y=1} \]

\[ \sum_k k f_{n,k} = [x^n]\, \frac{d}{dy} F(x,y)\bigg|_{y=1} \]

\[ \sum_k k^2 f_{n,k} = [x^n]\, \frac{d}{dy}\, y\, \frac{d}{dy} F(x,y)\bigg|_{y=1} \qquad \sum_k k(k-1) f_{n,k} = [x^n]\, \left(\frac{d}{dy}\right)^{\!2} F(x,y)\bigg|_{y=1} \]

If we were dealing with labelled objects and so had the bivariate generating function $F(x,y) = \sum_{n,k} f_{n,k}\, x^n y^k / n!$, then the right-hand side of each expression would be multiplied by $n!$. Note that in either case, we have the same formula for the moments.

\[ E[X_n] = \frac{[x^n]\, \frac{d}{dy} F(x,y)\big|_{y=1}}{[x^n]\, F(x,y)\big|_{y=1}} \]

\[ E[(X_n)^2] = \frac{[x^n]\, \frac{d}{dy}\, y\, \frac{d}{dy} F(x,y)\big|_{y=1}}{[x^n]\, F(x,y)\big|_{y=1}} \qquad E[X_n(X_n-1)] = \frac{[x^n]\, \left(\frac{d}{dy}\right)^{\!2} F(x,y)\big|_{y=1}}{[x^n]\, F(x,y)\big|_{y=1}} \]

Note that in some moral sense, there is a factor of $n!$ top and bottom in each of the preceding expressions, since these are exponential generating functions. Of course they cancel, so it becomes an aesthetic choice to include it or not. Note that in order for the denominator to actually equal $f_n$, we should include the $n!$.
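As a sanity check on these formulas (a sketch, not part of the derivation): for permutations counted by cycles, $F(x,y) = (1-x)^{-y}$, so $\frac{d}{dy}F\big|_{y=1} = \frac{1}{1-x}\log\frac{1}{1-x}$, whose coefficients are easy to compute exactly, since $[x^m]\,\log\frac{1}{1-x} = 1/m$ and multiplying by $\frac{1}{1-x}$ takes partial sums of coefficients.

```python
from fractions import Fraction

n = 4
# coefficients of log(1/(1-x)) up to degree n: 0, 1, 1/2, 1/3, ...
log_coeffs = [Fraction(0)] + [Fraction(1, m) for m in range(1, n + 1)]
numerator = sum(log_coeffs)   # [x^n] log(1/(1-x)) / (1-x), i.e. [x^n] (d/dy F)|_{y=1}
denominator = Fraction(1)     # [x^n] F(x,1) = [x^n] 1/(1-x) = 1
mean = numerator / denominator
# the ratio formula gives E[X_n] = h_n, the n-th harmonic number
assert mean == sum(Fraction(1, k) for k in range(1, n + 1))
```

(This anticipates the computation of the next section, where the same ratio is worked out by hand.)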

cycles in permutations

Recall that for permutations we had the following.

\[ F(x,y) = \sum_n \sum_k \begin{bmatrix} n \\ k \end{bmatrix} \frac{x^n}{n!}\, y^k = \frac{1}{(1-x)^y} \]

We can compute $f_n = n!\,[x^n]\, F(x,1) = n!$, but of course we knew that already. Next we compute the average.

\[ E[X_n] = \frac{n!\,[x^n]\, \frac{d}{dy} F(x,y)\big|_{y=1}}{n!\,[x^n]\, F(x,1)} \]
\[ n!\,[x^n]\, \frac{d}{dy} F(x,y)\bigg|_{y=1} = n!\,[x^n]\, \frac{1}{(1-x)^y} \log\frac{1}{1-x}\bigg|_{y=1} = n!\,[x^n]\, \frac{1}{1-x} \log\frac{1}{1-x} = n! \sum_{k=1}^{n} \frac{1}{k} = h_n\, n! \]
We included the $n!$ here, but of course it just cancelled. Recall that $h_n = \sum_{k=1}^{n} 1/k$. So the average number of cycles in a permutation of $[n]$ is $h_n$. It can be shown (integral test) that $h_n \approx \log n$. More precisely, $h_n = \log n + \gamma + o(1)$, where $\gamma \approx 0.5772$ is Euler's constant. Next we can compute $E[(X_n)^2]$.

\[ E[(X_n)^2] = \frac{n!\,[x^n]\, \frac{d}{dy}\, y\, \frac{d}{dy} F(x,y)\big|_{y=1}}{n!\,[x^n]\, F(x,1)} = [x^n]\, \frac{d}{dy}\, y\, \frac{d}{dy} F(x,y)\bigg|_{y=1} \]

\[ = [x^n] \left( \frac{1}{(1-x)^y} \log\frac{1}{1-x} + y\, \frac{1}{(1-x)^y} \left( \log\frac{1}{1-x} \right)^{\!2} \right)\Bigg|_{y=1} = [x^n] \left( \frac{1}{1-x} \log\frac{1}{1-x} + \frac{1}{1-x} \left( \log\frac{1}{1-x} \right)^{\!2} \right) \]

\[ = \sum_{k=1}^{n} \frac{1}{k} + \sum_{k=1}^{n} \sum_{j=1}^{k-1} \frac{1}{j(k-j)} \]

\[ = \sum_{k=1}^{n} \frac{1}{k} + \sum_{k=1}^{n} \frac{1}{k} \sum_{j=1}^{k-1} \left( \frac{1}{j} + \frac{1}{k-j} \right) \]

\[ = \sum_{k=1}^{n} \frac{1}{k} + \sum_{k=1}^{n} \frac{1}{k} \sum_{j=1}^{n} \frac{1}{j} - \sum_{k=1}^{n} \frac{1}{k^2} = h_n + (h_n)^2 + O(1) \]

So the variance is

\[ \mathrm{Var}[X_n] = E[(X_n)^2] - (E[X_n])^2 = \big( h_n + (h_n)^2 + O(1) \big) - (h_n)^2 = h_n + O(1) \]
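These statements are easy to confirm by brute force for small $n$. The following sketch enumerates all of $S_n$, counts cycles directly, and checks that the mean is exactly $h_n$ and that the variance is exactly $h_n - \sum_{k\le n} 1/k^2$ (making the $O(1)$ term above explicit).

```python
from fractions import Fraction
from itertools import permutations

def cycle_count(perm):
    # number of cycles of a permutation given as a tuple with perm[i] = image of i
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return cycles

n = 5
counts = [cycle_count(p) for p in permutations(range(n))]
mean = Fraction(sum(counts), len(counts))
mean_sq = Fraction(sum(c * c for c in counts), len(counts))
h1 = sum(Fraction(1, k) for k in range(1, n + 1))       # h_n
h2 = sum(Fraction(1, k * k) for k in range(1, n + 1))   # sum of 1/k^2
assert mean == h1                        # E[X_n] = h_n exactly
assert mean_sq - mean ** 2 == h1 - h2    # Var[X_n] = h_n - sum 1/k^2 for this n
```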

In principle we can apply this to any set of objects where we know F (x, y). In practice, the coefficient extraction can get a little messy. We will see later that Lagrange inversion will help!

single variable generating functions and moments

Similar ideas hold for single-variable GFs as well. Again let $f_{n,k}$ be the number of objects with size $n$ and $k$ components, and let $F_n(y) = \sum_k f_{n,k}\, y^k$. Notice that $F_n(y) = [x^n]\, F(x,y)$. Then we can compute the number of objects as $f_n = \sum_k f_{n,k} = F_n(1)$. So the average weight and variance would be as follows. Since $F_n$ is in terms of $y$ (the indeterminate that marked components), the derivatives $F_n'$ and $F_n''$ are with respect to $y$.
\[ \mu = \frac{\sum_k k f_{n,k}}{\sum_k f_{n,k}} = \frac{F_n'(1)}{F_n(1)} \tag{2} \]
\[ \sigma^2 = \frac{\sum_k (k-\mu)^2 f_{n,k}}{\sum_k f_{n,k}} = \frac{\sum_k k^2 f_{n,k}}{F_n(1)} - \left( \frac{\sum_k k f_{n,k}}{F_n(1)} \right)^{\!2} = \frac{F_n''(1) + F_n'(1)}{F_n(1)} - \left( \frac{F_n'(1)}{F_n(1)} \right)^{\!2} \tag{3} \]
In fact this is nothing new; the single-variable approach consists of observing that "$[x^n]$" commutes with "$\frac{d}{dy}(\cdot)\big|_{y=1}$":

\[ [x^n]\, \frac{d}{dy} F(x,y)\bigg|_{y=1} = \frac{d}{dy}\, [x^n]\, F(x,y)\bigg|_{y=1} = \frac{d}{dy} F_n(y)\bigg|_{y=1} = F_n'(1) \]
In effect we precompute $F_n(y) = [x^n]\, F(x,y)$ and work with that rather than with the bivariate version. Whether or not this is easier depends on whether we'd rather avoid having to extract coefficients from the bivariate generating function or avoid having to compute derivatives on the bivariate version.

cycles in permutations redux

There are $\begin{bmatrix} n \\ k \end{bmatrix}$ permutations on $[n]$ with $k$ cycles. So we have $F_n(y) = \sum_k \begin{bmatrix} n \\ k \end{bmatrix} y^k$. We know from previous work that
\[ F_n(y) = n!\,[x^n]\, \frac{1}{(1-x)^{y}} = n! \binom{y+n-1}{n} = (y+n-1)(y+n-2)\cdots(y) \]

Note that $F_n(1) = n!$, which is what we expected. We can compute the derivatives as follows.
Problem 12.1. Show that the derivatives of $F_n$ are given by
\[ F_n'(y) = n! \binom{y+n-1}{n} \sum_{k=1}^{n} \frac{1}{y+k-1} \]
\[ F_n''(y) = n! \binom{y+n-1}{n} \left( \sum_{k=1}^{n} \frac{1}{y+k-1} \sum_{k=1}^{n} \frac{1}{y+k-1} - \sum_{k=1}^{n} \frac{1}{(y+k-1)^2} \right) \]

This gives us $F_n'(1) = F_n(1)\, h_n$ and $F_n''(1) = F_n(1)\big( h_n^2 + O(1) \big)$. So the average number of cycles in a permutation is $h_n$, and the variance is $h_n + O(1)$, as we found before.
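Both evaluations can be confirmed by building $F_n(y) = y(y+1)\cdots(y+n-1)$ as an explicit coefficient list and reading off $F_n(1)$ and $F_n'(1)$; a quick sketch (not part of the notes):

```python
from fractions import Fraction
from math import factorial

def rising_factorial_poly(n):
    # coefficient list (lowest degree first) of F_n(y) = y(y+1)...(y+n-1)
    poly = [Fraction(1)]
    for a in range(n):
        out = [Fraction(0)] * (len(poly) + 1)
        for i, c in enumerate(poly):
            out[i] += a * c   # multiply by the constant a
            out[i + 1] += c   # multiply by y
        poly = out
    return poly

n = 6
poly = rising_factorial_poly(n)
F1 = sum(poly)                                # F_n(1)
dF1 = sum(i * c for i, c in enumerate(poly))  # F_n'(1)
h_n = sum(Fraction(1, k) for k in range(1, n + 1))
assert F1 == factorial(n)            # F_n(1) = n!
assert dF1 == factorial(n) * h_n     # F_n'(1) = F_n(1) h_n
```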

exercises

1. Show that the expectation is a linear operator. That is, if $Y$ and $Z$ are random variables and $a$ and $b$ are constants then show that $E[aY + bZ] = a\,E[Y] + b\,E[Z]$.
2. Using linearity, or otherwise, show that if we know any of the following three things, we can compute the others.
   • $E[Y^k]$ for every positive $k$
   • $E[(Y - E[Y])^k]$ for every positive $k$
   • $E[Y(Y-1)\cdots(Y-k+1)]$ for every positive $k$
3. Explain why in computing the second moment it would be "nicer" to compute $\left( y\frac{d}{dy} \right)^2 (\cdot)$ rather than $\frac{d}{dy}\, y\, \frac{d}{dy}(\cdot)$. Explain why they both give the same result.
4. Show that if we compute the variance using the $k$-th factorial moments we get the same thing. Explain how the fact that $\frac{d}{dy}\, y\, \frac{d}{dy}(\cdot) = y \left(\frac{d}{dy}\right)^2(\cdot) + \frac{d}{dy}(\cdot)$ corresponds to the equality of the two expressions of equation (1).
5. For the number of cycles in a permutation, we computed the variance based on the second expression of equation (1), to obtain $h_n + O(1)$. Now, compute $E[X_n(X_n-1)]$ and determine the variance based on the first expression of equation (1). Verify that the final answer is the same.
6. Verify equation (2) and equation (3). In particular, show that the expressions involving the derivatives are correct.
7. We will prove a special case of the Not-Burnside Lemma, so named because it is both often and incorrectly attributed to Burnside. The case we will prove gives the average number of fixed points over all permutations of $\{1, 2, \cdots, n\}$.
   a) Determine $p_{n,k}$, the number of permutations on $\{1, 2, \cdots, n\}$ that fix exactly $k$ points. Your answer should be in terms of $d_{n-k}$, the number of derangements on $n-k$ points.
   b) Find a closed form for $\sum_n \sum_k p_{n,k}\, x^n y^k / n!$. (hint: expand as product of GFs)
   c) Find the average and standard deviation of the number of fixed points over all permutations on $\{1, 2, \cdots, n\}$.
8. Recall that the EGF for set partitions is $\sum_{n,k} \begin{Bmatrix} n \\ k \end{Bmatrix} x^n y^k / n! = \exp(y(e^x - 1))$. Determine the average number of parts in a partition of $n$. Your answer should be in terms of the Bell numbers, and should not include any summations.

13. inclusion-exclusion

counting objects with properties

Consider a set Ω of objects, and a set P of properties. Each object satisfies some of the properties. The idea is that it is easy to determine how many objects possess at least certain properties (and perhaps others), but we want to know how many satisfy exactly a certain number of properties. This chapter closely follows Wilf [2]. Consider some subset of properties R ⊂ P. We define the following.

$e_R$ = number of objects that have exactly the properties of $R$ and no others

$m_R$ = number of objects that have at least the properties of $R$ and perhaps others as well
\[ e_r = \sum_{|R|=r} e_R \qquad m_r = \sum_{|R|=r} m_R \]

Problem 13.1. Show that $e_r$ is the number of objects that have exactly $r$ properties. Show that $m_r$ is not the number of objects that have at least $r$ properties. Explain, in words, what $m_r$ counts.

Lemma 13.2. Given the definitions of $m_r$ and $e_t$, we have $\displaystyle m_r = \sum_{t \ge 0} \binom{t}{r} e_t$.

Proof. We note that $m_R = \sum_{T \supseteq R} e_T$. This gives:
\[ m_r = \sum_{|R|=r} m_R = \sum_{|R|=r} \sum_{T \supseteq R} e_T = \sum_{t \ge r} \sum_{|T|=t} \sum_{\substack{|R|=r \\ R \subseteq T}} e_T = \sum_{t \ge r} \sum_{|T|=t} \binom{t}{r} e_T = \sum_{t \ge r} \binom{t}{r} e_t = \sum_{t \ge 0} \binom{t}{r} e_t \]
It is convenient to take the summation over $t \ge 0$; in other words, we needn't worry about the lower limit of summation since the binomial coefficient will automatically zero out the unwanted terms.
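Lemma 13.2 is easy to test empirically on a randomly generated family of objects and properties (a made-up example, not from the notes): record each object as the set of properties it possesses, compute the $e_t$ and $m_r$ directly, and compare.

```python
from itertools import combinations
from math import comb
import random

random.seed(0)
properties = range(4)
# a random family of objects; each object = the set of properties it has
objects = [frozenset(p for p in properties if random.random() < 0.5)
           for _ in range(60)]

def e_of(R):  # objects with exactly the properties of R
    return sum(1 for o in objects if o == frozenset(R))

def m_of(R):  # objects with at least the properties of R
    return sum(1 for o in objects if frozenset(R) <= o)

for r in range(len(properties) + 1):
    m_r = sum(m_of(R) for R in combinations(properties, r))
    rhs = sum(comb(t, r) * sum(e_of(T) for T in combinations(properties, t))
              for t in range(len(properties) + 1))
    assert m_r == rhs   # m_r = sum_t C(t,r) e_t
```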

We want to invert this; that is, write the et in terms of the mr. We’ve seen this kind of thing before, so we turn it into a question about generating functions.

Theorem 13.3. Let $M(x) = \sum_r m_r x^r$ and $E(x) = \sum_t e_t x^t$. Then $E(x) = M(x-1)$.

Proof. Using Lemma 13.2 we see that
\[ M(x) = \sum_r m_r x^r = \sum_r \sum_t \binom{t}{r} e_t\, x^r = \sum_t e_t \sum_r \binom{t}{r} x^r = \sum_t e_t (x+1)^t = E(x+1) \]
So $E(x) = M(x-1)$.


Theorem 13.3 is what we might call inclusion-exclusion. Wilf [2] calls this "the sieve". For instance $e_0 = E(0) = M(-1) = \sum_r (-1)^r m_r$. This is the number of objects satisfying exactly zero properties, in a form one might recognize from undergraduate textbooks. Notice also that $|\Omega| = m_0 = \sum_t e_t$ and $m_1 = \sum_t t\, e_t$. So the average number of properties that an object possesses is $m_1/m_0$. The proof of these assertions is left as an exercise.

Corollary 13.4. Given the definitions of $m_r$ and $e_t$ above, we have $\displaystyle e_t = \sum_{r \ge t} (-1)^{r-t} \binom{r}{t} m_r$.

Proof. We have $e_t = [x^t]\, E(x) = [x^t]\, M(x-1)$. The details are left as an exercise.

fixed points

The objects are permutations on $\{1, 2, \cdots, n\}$; thus there are $n!$ of them. There are $n$ properties: a permutation satisfies $P_i$ if it fixes $i$ (or if you prefer, $P_i$ is the set of permutations that fix $i$). We can easily write down the number of permutations that fix a particular $r$-set: $(n-r)!$. Hence $m_r = \binom{n}{r}(n-r)! = \frac{n!}{r!}$. So we have
\[ M(x) = \sum_{r=0}^{n} \frac{n!}{r!}\, x^r = n! \sum_{r=0}^{n} \frac{x^r}{r!} \]
Then we have
\[ E(x) = M(x-1) = n! \sum_{r=0}^{n} \frac{(x-1)^r}{r!} \]

Notice that $m_1/m_0 = 1$, which shows that on average a permutation has one fixed point. The number of permutations with no fixed points is
\[ e_0 = E(0) = M(-1) = n! \sum_{r=0}^{n} \frac{(-1)^r}{r!} = n! \left( 1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^n \frac{1}{n!} \right) \]
So as $n \to \infty$, we have $d_n/n! \to e^{-1}$. The proportion of derangements among all permutations approaches $e^{-1}$. Even stronger, for all $n > n_0$ it follows that $d_n$ is the closest integer to $n!\,e^{-1}$. This follows from the (proof of the) alternating series test. What is $n_0$? We can find the number of permutations with $t$ fixed points by expanding $M(x-1)$.

\[ E(x) = M(x-1) = n! \sum_{r=0}^{n} \frac{(x-1)^r}{r!} = n! \sum_{r=0}^{n} \sum_{k \ge 0} \binom{r}{k} \frac{(-1)^{r-k}}{r!}\, x^k = \sum_{k \ge 0} x^k\, n! \sum_{r=0}^{n} \binom{r}{k} \frac{(-1)^{r-k}}{r!} = \sum_{k \ge 0} x^k\, \frac{n!}{k!} \sum_{r=k}^{n} \frac{(-1)^{r-k}}{(r-k)!} = \sum_{k \ge 0} x^k\, \frac{n!}{k!} \sum_{j=0}^{n-k} \frac{(-1)^j}{j!} \]

Then extracting coefficients gives the $e_t$.
\[ e_t = [x^t]\, M(x-1) = \frac{n!}{t!} \left( 1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^{n-t} \frac{1}{(n-t)!} \right) \]

Of course we could have applied Corollary 13.4 to get $e_0$ and $e_t$ directly, but this assumes we've done the exercise already!
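Both the sieve formula for $e_0$ and the closest-integer observation are easy to confirm by brute force for small $n$; the following sketch (not part of the notes) does so, and the closest-integer claim happens to hold for every $n$ tested here.

```python
from fractions import Fraction
from itertools import permutations
from math import e, factorial

def count_derangements(n):
    # brute-force count of permutations of [n] with no fixed point
    return sum(1 for p in permutations(range(n))
               if all(p[i] != i for i in range(n)))

for n in range(1, 8):
    sieve = factorial(n) * sum(Fraction((-1) ** r, factorial(r))
                               for r in range(n + 1))
    assert count_derangements(n) == sieve               # e_0 = n! sum (-1)^r / r!
    assert count_derangements(n) == round(factorial(n) / e)   # closest integer to n!/e
```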

Stirling set numbers

Let Ω be the set of all ways of arranging $n > 0$ labelled balls in $k > 0$ labelled bins. The properties will be $\{P_i\}_{i=1}^{k}$, where $P_i$ is the property of the $i$-th bin being empty. Thus for instance $e_k = 0$, since no such arrangement leaves all bins empty, and $e_{k-1} = k$, since there are $k$ ways to leave all but one box empty. More generally, if $T$ is a particular set of $t$ boxes and $R$ is a particular set of $r$ boxes then we have the following.
\[ e_T = \begin{Bmatrix} n \\ k-|T| \end{Bmatrix} (k-|T|)! \qquad m_R = (k-|R|)^n \]
\[ e_t = \binom{k}{t} \begin{Bmatrix} n \\ k-t \end{Bmatrix} (k-t)! = \frac{k!}{t!} \begin{Bmatrix} n \\ k-t \end{Bmatrix}, \qquad m_r = \binom{k}{r} (k-r)^n. \]
Theorem 13.3 tells us that $M(x)$ and $E(x)$ are related by $E(x) = M(x-1)$.
\[ \sum_{t \ge 0} \frac{k!}{t!} \begin{Bmatrix} n \\ k-t \end{Bmatrix} x^t = \sum_{r \ge 0} \binom{k}{r} (k-r)^n (x-1)^r \]

Since $e_0 = [x^0]\, E(x) = E(0) = M(-1)$ we immediately get a formula for the Stirling numbers. We could have applied Corollary 13.4 also (although the $t = 0$ case of Corollary 13.4 is the easiest to prove).
\[ \begin{Bmatrix} n \\ k \end{Bmatrix} = \frac{1}{k!} \sum_{r \ge 0} (-1)^r \binom{k}{r} (k-r)^n \]
We saw this previously — or did we? It looks a little different this time.
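One way to see that this really is the same object seen previously is to compare the sieve formula against the standard Stirling set recurrence; a quick sketch (not part of the notes):

```python
from math import comb, factorial

def stirling2_recurrence(n, k):
    # {n brace k} via {n,k} = k*{n-1,k} + {n-1,k-1}
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return k * stirling2_recurrence(n - 1, k) + stirling2_recurrence(n - 1, k - 1)

def stirling2_sieve(n, k):
    # the inclusion-exclusion formula derived above
    total = sum((-1) ** r * comb(k, r) * (k - r) ** n for r in range(k + 1))
    assert total % factorial(k) == 0
    return total // factorial(k)

for n in range(1, 8):
    for k in range(n + 1):
        assert stirling2_recurrence(n, k) == stirling2_sieve(n, k)
```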

More generally, we can create a pair of bivariate GFs from the $e_t$ and the $m_r$. If the first line looks odd, recall that $e_t$ and $m_r$ are really functions of $n$ and $k$; we just haven't been writing that each time, in order to keep the notation simple.

\[ \sum_{k \ge 0} \sum_{t \ge 0} e_t\, x^t \frac{y^k}{k!} = \sum_{k \ge 0} \sum_{r \ge 0} m_r\, x^r \frac{y^k}{k!} \]
\[ \sum_{k \ge 0} \sum_{t \ge 0} \frac{k!}{t!} \begin{Bmatrix} n \\ k-t \end{Bmatrix} x^t \frac{y^k}{k!} = \sum_{k \ge 0} \sum_{r \ge 0} \binom{k}{r} (k-r)^n (x-1)^r \frac{y^k}{k!} \]
\[ \sum_{t \ge 0} \frac{x^t}{t!} \sum_{k \ge 0} \begin{Bmatrix} n \\ k-t \end{Bmatrix} y^k = \sum_{k \ge 0} \left( \sum_{r=0}^{k} \binom{k}{r} (k-r)^n (x-1)^r \right) \frac{y^k}{k!} \]
\[ \sum_{t \ge 0} \frac{x^t}{t!} \sum_{k \ge 0} \begin{Bmatrix} n \\ k-t \end{Bmatrix} y^k = \left( \sum_{k \ge 0} k^n \frac{y^k}{k!} \right) \left( \sum_{k \ge 0} (x-1)^k \frac{y^k}{k!} \right) \]
In the last line we noticed that the EGF in $y$ was the product of two EGFs, according to the multiplication rule for EGFs. Now extract $[x^0]$ to get

\[ \sum_{k \ge 0} \begin{Bmatrix} n \\ k \end{Bmatrix} y^k = e^{-y} \sum_{k \ge 0} \frac{k^n y^k}{k!} \]

The left-hand side is an OGF for Stirling set numbers; in particular it is a polynomial. Substituting $y = 1$ into the left-hand side gives the Bell number $b_n$, which we saw previously.
\[ b_n = \sum_{k \ge 0} \begin{Bmatrix} n \\ k \end{Bmatrix} = \frac{1}{e} \sum_{k \ge 0} \frac{k^n}{k!} \]
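This is Dobinski-type formula territory, and the convergence is fast enough to check numerically against the Bell numbers computed by recurrence (a sketch, not from the notes; the truncation point is an arbitrary choice):

```python
from math import comb, factorial, isclose

def bell(n):
    # Bell numbers via b_{m+1} = sum_k C(m,k) b_k
    b = [1]
    for m in range(n):
        b.append(sum(comb(m, k) * b[k] for k in range(m + 1)))
    return b[n]

def dobinski(n, terms=50):
    # truncated version of b_n = (1/e) * sum_k k^n / k!
    e_approx = sum(1 / factorial(k) for k in range(terms))
    return sum(k ** n / factorial(k) for k in range(terms)) / e_approx

for n in range(8):
    assert isclose(dobinski(n), bell(n), rel_tol=1e-9)
```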

exercises

1. a) Show that $|\Omega| = m_0 = \sum_{t \ge 0} e_t$ and $m_1 = \sum_{t \ge 0} t\, e_t$. Conclude that $m_1/m_0$ is the average number of properties that a (randomly chosen) object possesses.
   b) What is $m_2$ in terms of the $e_t$? Can you use it to compute the variance of the number of properties that a (randomly chosen) object possesses?
2. Fix a particular $t \ge 0$. Observe that $e_t = [x^t]\, E(x) = [x^t]\, M(x-1)$. Use this to give a formula for $e_t$ in terms of various $m_r$. In other words, prove Corollary 13.4; notice that it is easier to prove the case $t = 0$ than $t > 0$.
3. Fix a particular $r \ge 0$. Observe that $m_r = [x^r]\, M(x) = [x^r]\, E(x+1)$. Use this to give a formula for $m_r$ in terms of various $e_t$. (hint: this should look familiar.)
4. Combine the two previous exercises (or Lemma 13.2 and Corollary 13.4 if you prefer) to get a result of the form:
   $e_t$ = "some function of various $m_r$" $\iff$ $m_r$ = "some other function of various $e_t$"
   Compare this with similar results (Exercises 4.13, 4.14 and the proof of Lemma 10.3).
5. We (re-)derived a formula for the Stirling set numbers above by defining
   \[ e_t = \binom{k}{t} \begin{Bmatrix} n \\ k-t \end{Bmatrix} (k-t)! = \frac{k!}{t!} \begin{Bmatrix} n \\ k-t \end{Bmatrix} \]
   and using the fact that $e_0 = k! \begin{Bmatrix} n \\ k \end{Bmatrix}$. Give a formula for $e_t$ in terms of the $m_r$. Explain, in words, what $e_t$ means in this case, as a function of $t$. What is special about $e_0$?
6. Let $B$ be the (infinite) matrix with $B_{ij} = (-1)^i \binom{i}{j}$, where the rows and columns are indexed by $\mathbb{N}$ (which of course includes 0). Using Theorem 13.3, we will show that $B^2 = I$, making the binomial numbers "self-inverse". Notice that any particular entry of this product is calculated as a finite sum.
   a) Consider a (very boring) family of objects and properties where there is exactly one object, and it has $i$ properties. Give directly $E(x)$.
   b) Using this expression for $E$, expand $M(x) = E(x+1)$ as a sum in terms of powers of $x$ (do not simplify, just give it as a sum).
   c) Using this expression for $M$, expand $E(x) = M(x-1)$ as a double sum in terms of powers of $x$ (do not simplify, just give it as a double sum).
   d) By equating your two expressions for $E(x)$, show that $\sum_t B_{it} B_{tj}$ is 1 for $i = j$ and 0 otherwise. Conclude that $B^2 = I$.
7. Reinterpret Exercise 13.6 as follows. Observe that $E(x) = M(x-1) = E((x-1)+1)$. Start with the same boring family of objects, so $E(x) = x^i$. Without simplifying $(x-1)+1$, expand $E((x-1)+1)$ in terms of powers of $x$ and equate with $E(x) = x^i$.
8. Let $C_{ij} = (-1)^j \binom{i}{j}$, with the rows indexed by $\mathbb{N}$.
   a) Apply the same strategy of Exercise 13.6 to show that $C^2 = I$.
   b) By observing that $C = B^T$ (from Exercise 13.6) show that $C^2 = I$ follows as a quick corollary of Exercise 13.6.

14. using generating functions to prove identities

snake oil

We will use GFs to simplify sums. The idea is that we will regard our sum as a function of a parameter, say $n$, write down the GF (ordinary or exponential) associated with the sequence $\{f(n)\}$ that we wish to sum, and then exchange the order of summation. At this point, we hope (!) that the inner sum will be easy, we will be able to obtain the GF, and read off the $f(n)$ as coefficients. If that sounds either too far fetched or too good to be true, then you've understood why Wilf [2] calls this method "snake oil". Indeed, this chapter closely follows Wilf [2].

More specifically, the idea is the following. If we want to evaluate a sum $\sum_k f(n,k)$ then we do so by considering it as a coefficient of a generating function.
\[ \sum_k f(n,k) = [x^n] \sum_n \left( \sum_k f(n,k) \right) x^n = [x^n] \sum_k \sum_n f(n,k)\, x^n \]
Then we hope that some part of $f(n,k)$ can be factored out of the inner sum, making the inner sum into a simple generating function that we know a closed form for. This then becomes the summand of the outer sum, which hopefully will again be something we recognize. This explains why we want $n$ to occur infrequently in $f(n,k)$: so that we can factor something out. It will be helpful for the moment to remember some GFs that we already know (but in fact any GF we can recognize could be useful).
\[ \sum_b \binom{a}{b} x^b = (1+x)^a \qquad \sum_a \binom{a}{b} x^a = \frac{x^b}{(1-x)^{b+1}} \]
Example 14.1. Evaluate $\displaystyle \sum_{k \ge 0} \binom{k+1}{n-k} y^k$, where $y$ is a fixed real number.

It seems the only candidate for a parameter is $n$.
\[ \begin{aligned} \sum_{n \ge 0} \left( \sum_{k \ge 0} \binom{k+1}{n-k} y^k \right) x^n &= \sum_{k \ge 0} \sum_{n \ge 0} \binom{k+1}{n-k} y^k x^n \\ &= \sum_{k \ge 0} y^k x^k \sum_{n \ge 0} \binom{k+1}{n-k} x^{n-k} \\ &= \sum_{k \ge 0} y^k x^k \sum_{r \ge 0} \binom{k+1}{r} x^r \\ &= \sum_{k \ge 0} y^k x^k (1+x)^{k+1} \\ &= (1+x) \sum_{k \ge 0} y^k x^k (1+x)^k = \frac{1+x}{1 - yx(1+x)} \end{aligned} \]
For $y = 1$, we can evaluate this using partial fractions to obtain a sum of powers of the roots of the denominator. In fact, we already saw this sum as an expression for the number of 01-sequences


with no adjacent 0's, added up according to the number of 1's. In fact for general $y$ the procedure is exactly the same.
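Before doing the partial fractions, it is reassuring to check that the closed form really does have the right coefficients. The following sketch (not part of the notes) expands $(1+x)/(1 - yx(1+x))$ as a truncated geometric series with exact rationals and compares against the original sum.

```python
from fractions import Fraction
from math import comb

def direct_sum(n, y):
    # the original sum: sum_k C(k+1, n-k) y^k
    return sum(comb(k + 1, n - k) * y ** k for k in range(n + 1))

def series_coeff(n, y):
    # [x^n] (1+x) / (1 - y x (1+x)) via the geometric series sum_j (y x (1+x))^j
    N = n + 1
    q = [Fraction(0), Fraction(y), Fraction(y)]      # y*x*(1+x)
    geom = [Fraction(0)] * N
    geom[0] = Fraction(1)
    power = [Fraction(1)] + [Fraction(0)] * (N - 1)  # current q^j, truncated
    for _ in range(n):
        new = [Fraction(0)] * N
        for i, a in enumerate(power):
            if a:
                for j, b in enumerate(q):
                    if b and i + j < N:
                        new[i + j] += a * b
        power = new
        geom = [g + p for g, p in zip(geom, power)]
    # finally multiply by (1+x) and read off the x^n coefficient
    return geom[n] + (geom[n - 1] if n >= 1 else Fraction(0))

y = Fraction(1, 3)
for n in range(9):
    assert direct_sum(n, y) == series_coeff(n, y)
```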

Problem 14.2. Finish the previous example. That is, using partial fractions, extract coefficients in the previous to get a closed form for the original sum.

Example 14.3. Evaluate $\displaystyle \sum_{k \ge 0} \binom{3k}{n-2k-1} y^k$, where $y$ is a fixed real number.

Although it appears messier, the method is identical. This time we get
\[ \sum_{n \ge 0} \left( \sum_{k \ge 0} \binom{3k}{n-2k-1} y^k \right) x^n = \cdots = \frac{x}{1 - yx^2(1+x)^3} \]
The details are left as an exercise. Now we want $[x^n]$ in this expression. But this is an exercise in partial fractions, since $y$ is a constant. Note that this time the partial fractions are somehow less fun, since the denominator is a polynomial of degree 5 in $x$. You should convince yourself that in principle you could give an expression for the original sum in terms of the roots of this polynomial, and an approximation in terms of approximations to the roots.

Example 14.4. Evaluate $\displaystyle \sum_k \binom{n}{k}^{\!2}$.

This can be done by direct counting. Notice that it is easier to use Snake Oil on a more general identity. The key is that when we exchange the order of summation, we would like to factor something out, which we can only do if the parameter appears in "few" places, ideally one. One way to do this is to consider the more general identity
\[ \sum_k \binom{n}{k} \binom{n}{r-k}, \]
and then substitute $r = n$ into the solution. First we solve, with $r$ as the parameter.
\[ \begin{aligned} \sum_r \left( \sum_k \binom{n}{k} \binom{n}{r-k} \right) x^r &= \sum_k \binom{n}{k} \sum_r \binom{n}{r-k} x^r \\ &= \sum_k \binom{n}{k} x^k \sum_r \binom{n}{r-k} x^{r-k} \\ &= \sum_k \binom{n}{k} x^k (1+x)^n \\ &= (1+x)^n (1+x)^n = (1+x)^{2n} \end{aligned} \]
The solution is then
\[ \sum_k \binom{n}{k} \binom{n}{r-k} = [x^r]\, (1+x)^{2n} = \binom{2n}{r} \]
Substituting $r = n$ gives our special case.
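Both the special case and the general identity are trivial to check numerically (a quick sketch, not from the notes):

```python
from math import comb

# special case: sum_k C(n,k)^2 = C(2n,n)
for n in range(12):
    assert sum(comb(n, k) ** 2 for k in range(n + 1)) == comb(2 * n, n)

# general identity used above: sum_k C(n,k) C(n,r-k) = C(2n,r)
for n in range(8):
    for r in range(2 * n + 1):
        assert sum(comb(n, k) * comb(n, r - k) for k in range(r + 1)) == comb(2 * n, r)
```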

Problem 14.5. Evaluate $\displaystyle \sum_k \binom{n}{k}^{\!2}$ by considering it as the $m = n$, $r = m$ special case of $\displaystyle \sum_k \binom{n}{k} \binom{m}{r-k}$, which you evaluate by snake oil with parameter $n$. Can this be done with $r$ as the parameter?

Problem 14.6. Show by a direct counting argument that $\displaystyle \sum_k \binom{n}{k} \binom{m}{r-k} = \binom{m+n}{r}$.

These next two from Wilf can’t be done directly; see his book for more examples.

Example 14.7. Show that $\displaystyle \sum_k \binom{m}{k} \binom{n+k}{m} = \sum_k \binom{m}{k} \binom{n}{k} 2^k$ (taken from [2]).

We will show that, by considering each side as a function of $n$, the GFs of the sequences so obtained are identical. Then it will follow that the claimed equality holds for all $n$.

\[ \begin{aligned} \text{LHS:}\quad \sum_n \left( \sum_k \binom{m}{k} \binom{n+k}{m} \right) x^n &= \sum_k \binom{m}{k} \sum_n \binom{n+k}{m} x^n \\ &= \sum_k \binom{m}{k} x^{-k} \sum_n \binom{n+k}{m} x^{n+k} \\ &= \sum_k \binom{m}{k} x^{-k}\, \frac{x^m}{(1-x)^{m+1}} \\ &= \frac{x^m}{(1-x)^{m+1}} \left( 1 + x^{-1} \right)^m = \frac{(1+x)^m}{(1-x)^{m+1}} \end{aligned} \]

\[ \begin{aligned} \text{RHS:}\quad \sum_n \left( \sum_k \binom{m}{k} \binom{n}{k} 2^k \right) x^n &= \sum_k \binom{m}{k} 2^k \sum_n \binom{n}{k} x^n \\ &= \sum_k \binom{m}{k} 2^k\, \frac{x^k}{(1-x)^{k+1}} \\ &= \frac{1}{1-x} \left( 1 + \frac{2x}{1-x} \right)^m = \frac{(1+x)^m}{(1-x)^{m+1}} \end{aligned} \]
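The identity itself can also be confirmed numerically for small $m$ and $n$ (a quick sketch, not from the notes):

```python
from math import comb

# sum_k C(m,k) C(n+k,m) = sum_k C(m,k) C(n,k) 2^k, for small m, n
for m in range(7):
    for n in range(7):
        lhs = sum(comb(m, k) * comb(n + k, m) for k in range(m + 1))
        rhs = sum(comb(m, k) * comb(n, k) * 2 ** k for k in range(m + 1))
        assert lhs == rhs
```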

Example 14.8. Evaluate $\displaystyle \sum_k \binom{n}{k} \binom{2k}{k} y^k$ (taken from [2]).

Following the usual procedure, we get
\[ \begin{aligned} \sum_n \left( \sum_k \binom{n}{k} \binom{2k}{k} y^k \right) x^n &= \sum_k \binom{2k}{k} y^k \sum_n \binom{n}{k} x^n \\ &= \sum_k \binom{2k}{k} y^k\, \frac{x^k}{(1-x)^{k+1}} \\ &= \frac{1}{1-x} \sum_k \binom{2k}{k} \left( \frac{xy}{1-x} \right)^{\!k} \\ &= \frac{1}{1-x} \cdot \frac{1}{\sqrt{1 - 4\,\frac{xy}{1-x}}} = \frac{1}{\sqrt{(1-x)(1 - x(1+4y))}} \end{aligned} \]

We used the fact that $\displaystyle \sum_k \binom{2k}{k} z^k = \frac{1}{\sqrt{1-4z}}$. In general, we can't easily extract coefficients for this. But for certain values of $y$ we can.

If $y = -1/4$, then we have $\displaystyle \frac{1}{\sqrt{1-x}} = \frac{1}{\sqrt{1 - 4(x/4)}} = \sum_n \binom{2n}{n} \frac{1}{4^n}\, x^n$, which gives for our sum
\[ \sum_k \binom{n}{k} \binom{2k}{k} \left( -\frac{1}{4} \right)^{\!k} = [x^n]\, \frac{1}{\sqrt{1-x}} = \binom{2n}{n} 4^{-n} \]
If $y = -1/2$, then we have $\displaystyle \frac{1}{\sqrt{1-x^2}} = \frac{1}{\sqrt{1 - 4(x^2/4)}} = \sum_n \binom{2n}{n} \frac{1}{4^n}\, x^{2n}$, which gives for our sum
\[ \sum_k \binom{n}{k} \binom{2k}{k} \left( -\frac{1}{2} \right)^{\!k} = [x^n]\, \frac{1}{\sqrt{1-x^2}} = \begin{cases} \binom{n}{n/2}\, 2^{-n} & n \text{ even} \\ 0 & n \text{ odd} \end{cases} \]
Lastly, if $y = 0$ we get $(1-x)^{-1}$. But actually this doesn't tell us much. . .
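Both special values can be verified exactly with rational arithmetic; a quick sketch (not from the notes):

```python
from fractions import Fraction
from math import comb

for n in range(12):
    # y = -1/4: the sum equals C(2n,n)/4^n
    s = sum(comb(n, k) * comb(2 * k, k) * Fraction(-1, 4) ** k for k in range(n + 1))
    assert s == Fraction(comb(2 * n, n), 4 ** n)

    # y = -1/2: the sum equals C(n, n/2)/2^n for even n, and 0 for odd n
    s = sum(comb(n, k) * comb(2 * k, k) * Fraction(-1, 2) ** k for k in range(n + 1))
    expected = Fraction(comb(n, n // 2), 2 ** n) if n % 2 == 0 else Fraction(0)
    assert s == expected
```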

exercises

1. Evaluate $\displaystyle \sum_{t=0}^{n} \binom{t+k-1}{k-1} \binom{k}{n-t} (-1)^{n-t}$ using snake oil. Looking at your proof, you might find another point of view that gives a different proof.
2. Evaluate $\displaystyle \sum_{j=0}^{n} \binom{j}{t}$ using snake oil, with parameter $n$. Careful when interchanging the order of summation. Afterwards, give a short proof by induction.
3. Redo Exercise 13.6 using snake oil. That is, evaluate $\displaystyle \sum_{t \ge 0} (-1)^{i+t} \binom{i}{t} \binom{t}{j}$ using snake oil.
4. Here is the "Stirling" version of Exercise 13.6. Let $M$ and $N$ be infinite matrices with $M_{ij} = (-1)^i \begin{bmatrix} i \\ j \end{bmatrix}$ and $N_{ij} = (-1)^i \begin{Bmatrix} i \\ j \end{Bmatrix}$. We will show that $MN = I$, making the two types of Stirling numbers "inverse". Notice that any particular entry of this product is calculated as a finite sum.
   a) Show that $\displaystyle y(y-1)\cdots(y-i+1) = \sum_{j \ge 0} (-1)^{i+j} \begin{bmatrix} i \\ j \end{bmatrix} y^j$ using snake oil.
   b) Show that $\sum_t M_{it} N_{tj}$ is 1 if $i = j$ and 0 otherwise. Conclude that $MN = I$. (hint: you should find the previous part useful)
5. Consider $\displaystyle \sum_{k \ge 0} (-1)^k \binom{k+4}{m} \binom{2m+1}{n-k-1}$.
   a) Use snake oil to evaluate this when $m \ge 4$. Where do you use $m \ge 4$?
   b) Now modify your proof to allow $m \ge 0$. Your answer will be a little messier, but not much. Where do you use $m \ge 0$?

15. cycle index of the symmetric group

cycle types and cycle index

Consider the question "what is the probability that a permutation has no fixed points?". In one sense, we have answered this already, but there are obvious generalizations: what about the probability that a permutation has one fixed point, or the probability that a permutation has no 3-cycles, or no odd cycles? In another sense, we might wish to change our point of view slightly: the probability that a permutation on n = 10 points has three 6-cycles is zero, but this is because n is too small. We need to clarify the types of questions we are dealing with. We will say that the probability that a permutation has property P is the limit as n tends to infinity of the probability that a permutation on n symbols has property P. If this limit does not exist, then there is no such probability for all permutations. So for instance, the probability that a permutation consists of exactly 17 3-cycles and no other cycles is zero, since for all n ≠ 51 the number of such permutations is zero. But the probability that a permutation has exactly 17 3-cycles and perhaps some cycles of other lengths is not obvious.

We use the notation $\mathbf{k}$ for a vector of constants $(k_1, k_2, k_3, \ldots)$. The size of such a vector will be $\|\mathbf{k}\| = \sum_j j k_j$. For $\mathbf{k}$ with $\|\mathbf{k}\| < \infty$, $c_{\mathbf{k}}$ will denote the number of permutations with exactly $k_j$ cycles of length $j$, for each $j$. These permutations are said to have cycle-type $\mathbf{k}$; necessarily, they act on exactly $n = \|\mathbf{k}\| = \sum_j j k_j$ points.

Let $y_1, y_2, y_3, \ldots$ be indeterminates; if $\|\mathbf{k}\| < \infty$, then by $y^{\mathbf{k}}$ we mean $y_1^{k_1} y_2^{k_2} y_3^{k_3} \cdots$; this monomial is in fact a finite product. Define the generating functions
\[ \phi_n(y) = \sum_{\|\mathbf{k}\|=n} c_{\mathbf{k}}\, y^{\mathbf{k}} \qquad C(x,y) = \sum_n \phi_n(y)\, \frac{x^n}{n!} \]
So for instance the coefficient of $y_1^3 y_2^1 y_3^4$ in $\phi_{17}$ is the number of permutations of [17] with exactly 3 fixed points, 1 2-cycle and 4 3-cycles. Its coefficient in $\phi_n$ is zero for $n \ne 17$ (why?). The coefficient of $x^{17} y_1^3 y_2^1 y_3^4$ in $C(x,y)$ and the coefficient of $x^n y_1^3 y_2^1 y_3^4$ in $C(x,y)$ are, respectively, the same as the first and second examples in this paragraph.

finding C(x, y)

We start by finding an exact formula for $c_{\mathbf{k}}$ using a direct approach. This section follows Wilf [2]; in the next section we'll see how the symbolic method accomplishes the same thing.

Lemma 15.1. The number of ways of choosing $jk$ symbols from a set of $m$ symbols, and arranging them into $k$ cycles of length $j$, is
\[ \frac{m!}{(m-jk)!\; j^k\, k!} \]

Proof. By direct counting there are $\binom{m}{jk}(jk)!$ ordered sequences. If we take these $j$ at a time we get the required cycles, overcounted by a factor of $j^k k!$.


Corollary 15.2. If $\|\mathbf{k}\| = n$, then
\[ c_{\mathbf{k}} = \frac{m!}{(m-1k_1)!\; 1^{k_1} k_1!} \times \frac{(m-k_1)!}{(m-1k_1-2k_2)!\; 2^{k_2} k_2!} \times \cdots = \frac{n!}{1^{k_1} k_1!\; 2^{k_2} k_2!\; 3^{k_3} k_3! \cdots} \]
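Corollary 15.2 can be checked by brute force: classify all of $S_n$ by cycle type and compare each class size with the formula. A sketch (not from the notes):

```python
from collections import Counter
from itertools import permutations
from math import factorial

def cycle_type(perm):
    # k_j = number of cycles of length j, returned as the tuple (k_1, ..., k_n)
    n = len(perm)
    seen, k = set(), [0] * (n + 1)
    for s in range(n):
        if s not in seen:
            length, j = 0, s
            while j not in seen:
                seen.add(j)
                j = perm[j]
                length += 1
            k[length] += 1
    return tuple(k[1:])

def c_formula(ktype):
    # n! / (1^{k_1} k_1! 2^{k_2} k_2! ...)
    n = sum(j * kj for j, kj in enumerate(ktype, start=1))
    denom = 1
    for j, kj in enumerate(ktype, start=1):
        denom *= j ** kj * factorial(kj)
    return factorial(n) // denom

n = 5
counts = Counter(cycle_type(p) for p in permutations(range(n)))
for ktype, count in counts.items():
    assert count == c_formula(ktype)
assert sum(counts.values()) == factorial(n)
```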

Now we have the generating function C(x, y).

Theorem 15.3. $\displaystyle C(x,y) = \exp\left( \sum_{j \ge 1} \frac{y_j x^j}{j} \right)$

Proof. We just plug in $c_{\mathbf{k}}$ from Corollary 15.2.
\[ \begin{aligned} C(x,y) = \sum_n \phi_n(y)\, \frac{x^n}{n!} = \sum_n \frac{x^n}{n!} \sum_{\|\mathbf{k}\|=n} c_{\mathbf{k}}\, y^{\mathbf{k}} &= \sum_{k_1 \ge 0} \frac{(x y_1)^{k_1}}{1^{k_1} k_1!} \sum_{k_2 \ge 0} \frac{(x^2 y_2)^{k_2}}{2^{k_2} k_2!} \sum_{k_3 \ge 0} \frac{(x^3 y_3)^{k_3}}{3^{k_3} k_3!} \cdots \\ &= \exp\left( \frac{x y_1}{1} \right) \exp\left( \frac{x^2 y_2}{2} \right) \exp\left( \frac{x^3 y_3}{3} \right) \cdots = \exp\left( \sum_{j \ge 1} \frac{x^j y_j}{j} \right) \end{aligned} \]

The above looks wrong. In the second line and beyond, there are monomials with an infinite number of terms (in fact, almost all of them). However, it is true that if $\|\mathbf{k}\| = n < \infty$, then
\[ [x^n y^{\mathbf{k}}]\, C(x,y) = [x^n y^{\mathbf{k}}]\, \exp\left( \sum_{j \ge 1} \frac{x^j y_j}{j} \right), \]
and this is all that we need. In fact this is exactly what it means for two formal power series to be equal: that the coefficients of any fixed (finite!) monomial are equal. So in fact, it's not wrong at all. In computing the infinite product of exponentials we only ever take finitely many terms with indeterminates in them. This is something we've seen before. . .

It's worth comparing this to what we did for counting permutations by cycles. We had previously derived the following.
\[ \sum_{n \ge 0} \sum_{k \ge 0} \begin{bmatrix} n \\ k \end{bmatrix} \frac{x^n y^k}{n!} = \exp\left( y \log\frac{1}{1-x} \right) = \frac{1}{(1-x)^y} \]
Here $y$ counts the number of cycles, irrespective of length. This was our first discovery of the "exponential formula", namely the generating function that corresponds to $\mathrm{Set}(\mathcal{C})$ (in this particular example to $\mathrm{Set}(\mathrm{Cyc}(\{\bullet\}))$). The final simplification was a "lucky coincidence".

What we are doing now for the cycle index is assigning a separate indeterminate yj for each length of cycle. Or, more generally, instead of counting the number of components, we are separately counting the number of components of every size.

Problem 15.4. In our final expression for $C(x,y)$, substitute $y$ for every $y_j$. So if you were to look at the expansion, then terms like $y_2^3 y_7^2 y_9$ would become $y^3 y^2 y = y^6$.

What do you get? What should you get?

alternative: using Set to find C(x, y)

For an alternative viewpoint, consider a set of (connected) objects $\mathcal{C}$, and let $\mathcal{C}_r$ be the set of objects in $\mathcal{C}$ of size $r$. We will consider $\mathcal{C}$ to be the "connected components", and hence $\mathcal{C}_r$ is the set of connected components of size $r$. Let's now mark each component by its size. So we consider the following.
\[ \dot{\bigcup_{r \ge 1}} \{r\} \times \mathcal{C}_r \]
This is basically the set $\mathcal{C}$, except each component is tagged by a marker that records its size. Recall how in Chapter 9 we used $\{\bullet\} \times \mathcal{C}$ in order to enumerate objects by size and number of components. Now we are using $\dot{\bigcup}_{r \ge 1} \{r\} \times \mathcal{C}_r$ in order to enumerate objects by size and number of components of each size. We can apply the $\mathrm{Set}$ constructor to $\dot{\bigcup}_{r \ge 1} \{r\} \times \mathcal{C}_r$ as easily as to anything else. The objects themselves are labelled, so the "$x$" is an "exponential generating function". To be clear:
\[ \mathcal{C}_r \longleftrightarrow C_r(x) = c_r\, \frac{x^r}{r!} \]
\[ \{r\} \times \mathcal{C}_r \longleftrightarrow y_r\, C_r(x) = c_r\, y_r\, \frac{x^r}{r!} \]
\[ \dot{\bigcup_{r \ge 1}} \{r\} \times \mathcal{C}_r \longleftrightarrow \sum_{r \ge 1} y_r\, C_r(x) = \sum_{r \ge 1} c_r\, y_r\, \frac{x^r}{r!} \]
\[ \mathrm{Set}\Big( \dot{\bigcup_{r \ge 1}} \{r\} \times \mathcal{C}_r \Big) \longleftrightarrow \exp\left( \sum_{r \ge 1} y_r\, C_r(x) \right) = \exp\left( \sum_{r \ge 1} c_r\, y_r\, \frac{x^r}{r!} \right) \]

If we set $c_r$ to be the number of labelled permutation-cycles of length $r$ then we should obtain $C(x,y)$ as above.
Problem 15.5. Verify that if we let $\mathcal{C}$ be the set of permutation-cycles then this construction gives the cycle-index $C(x,y)$ as above.
Problem 15.6. Verify that if we consider the generating function for $\mathrm{Set}(\dot{\bigcup}_{r \ge 1} \{r\} \times \mathcal{C}_r)$ and set each $y_r = y$ then we get the generating function for $\mathrm{Set}(\mathcal{C})$. This shows that what we have really done is a multivariate version of $\mathrm{Set}$, where the different variables mark the number of components of each size. On the other hand, it also shows that this generalization of $\mathrm{Set}$ can also be seen as a special case of $\mathrm{Set}(\mathcal{C})$, with $\mathcal{C} = \dot{\bigcup}_{r \ge 1} \{r\} \times \mathcal{C}_r$.

probabilities

Notice that since $c_{\mathbf{k}} = n!\,[x^n y^{\mathbf{k}}]\, C(x,y)$, then $[x^n y^{\mathbf{k}}]\, C(x,y)$ is exactly the probability that a permutation on $\{1, 2, \cdots, n\}$ has cycle-type $\mathbf{k}$. If we want to compute the number of permutations on $n = 17$ with exactly one 3-cycle, we are looking for monomials containing $x^{17} y_3^1$ in $C(x,y)$. However, this is not the same thing as asking for $[x^{17} y_3^1]\, C(x,y)$, which is of course zero (right?). What we want is to look for the sum of all coefficients of all monomials of the form $x^{17} y_3^1 (\cdots)$ where the $(\cdots)$ represents any powers of any other indeterminates $y_j$ with $j \ne 3$. We can do this by first substituting $y_j = 1$ for all $j \ne 3$ into $C(x,y)$, and then extracting $[x^{17} y_3^1]$.

In fact we've seen this already. If we take the bivariate generating function for Stirling cycle numbers and "don't care" about how many cycles a permutation contains, then we get the generating function for permutations.
\[ \sum_{n \ge 0} \sum_{k \ge 0} \begin{bmatrix} n \\ k \end{bmatrix} \frac{x^n}{n!}\, y^k = \exp\left( y \log\frac{1}{1-x} \right) = \frac{1}{(1-x)^y} \;\xrightarrow{\;y\,=\,1\;}\; \sum_{n \ge 0} \left( \sum_{k \ge 0} \begin{bmatrix} n \\ k \end{bmatrix} \right) \frac{x^n}{n!} = \frac{1}{1-x} \]
Of course this example told us very little: we already knew the number of permutations and the generating function for them. So the following "result" isn't really news.
\[ \sum_{k \ge 0} \begin{bmatrix} n \\ k \end{bmatrix} = [x^n/n!]\, \frac{1}{1-x} = n! \]

A slightly more meaningful example is the Stirling set numbers, where this process gives us a generating function for the Bell numbers.
\[ \sum_{n \ge 0} \sum_{k \ge 0} \begin{Bmatrix} n \\ k \end{Bmatrix} \frac{x^n}{n!}\, y^k = \exp\left( y(e^x - 1) \right) \;\xrightarrow{\;y\,=\,1\;}\; \sum_{n \ge 0} \left( \sum_{k \ge 0} \begin{Bmatrix} n \\ k \end{Bmatrix} \right) \frac{x^n}{n!} = \exp\left( e^x - 1 \right) \]
Of course one can still argue that we could get this directly without the bivariate generating function. But it illustrates the correspondence between summing over all values of whatever $y$ counts, and substituting $y = 1$.

Back to cycle-types. Let's phrase our permutation questions generically. Let $S \subseteq \mathbb{N} \setminus \{0\}$ be a set of cycle lengths whose multiplicity we wish to specify, and assume we don't care about the other lengths. Let $\mathbf{k}$ be a vector such that $k_j$ is the desired number of cycles of length $j$, for each $j \in S$. We allow the desired number of cycles of length $j$ to be zero, so $k_j \in \mathbb{N}$ for $j \in S$. We set $k_j = 0$ if $j \notin S$ (although it might be better to think of these as "blank"). Now we ask the question "what is the probability that a permutation has the property that it has exactly $k_j$ $j$-cycles for each $j \in S$?" The answer is $[x^n y^{\mathbf{k}}]\, C_S(x,y)$, where $C_S(x,y)$ is obtained by setting $y_j = 1$ for all $j \notin S$.

\[ C_S(x,y) = \exp\left( \sum_{j \in S} \frac{x^j y_j}{j} + \sum_{j \notin S} \frac{x^j}{j} \right) \tag{1} \]
\[ = \exp\left( \sum_{j \in S} \frac{x^j y_j}{j} - \sum_{j \in S} \frac{x^j}{j} + \sum_{j \ge 1} \frac{x^j}{j} \right) \]
\[ = \exp\left( \sum_{j \in S} \frac{x^j y_j}{j} - \sum_{j \in S} \frac{x^j}{j} + \log\frac{1}{1-x} \right) = \frac{1}{1-x} \prod_{j \in S} \frac{\exp\left( x^j y_j / j \right)}{\exp\left( x^j / j \right)} \]

See Exercise 15.14 for an alternative perspective on $C_S(x,y)$, based on the Set construction. Now we want $\lim_{n \to \infty} [x^n][y^{\mathbf{k}}]\, C_S(x,y)$. To start, let's extract $[y^{\mathbf{k}}]\, C_S(x,y)$.
\[ [y^{\mathbf{k}}]\, \frac{1}{1-x} \prod_{j \in S} \frac{\exp\left( x^j y_j / j \right)}{\exp\left( x^j / j \right)} = \frac{1}{1-x} \prod_{j \in S} \frac{[y_j^{k_j}]\, \exp\left( x^j y_j / j \right)}{\exp\left( x^j / j \right)} = \frac{1}{1-x} \prod_{j \in S} \frac{\left( x^j / j \right)^{k_j} / k_j!}{\exp\left( x^j / j \right)} = \frac{1}{1-x}\, P(x) \]

95 n 1 n We want [x ] 1−x P (x). We could (conceptually) get this by truncating P (x) at x ; that is dropping all higher power terms, and then evaluating at x = 1. This is clearly well defined. To get the limit, we truncate at higher and higher powers, which is surprisingly tractable.

\[
\lim_{n\to\infty} [x^n][y^k]\, C_S(x, y) = \lim_{n\to\infty} [x^n]\, \frac{1}{1-x}\, P(x) = P(1) = \prod_{j\in S} \frac{1}{e^{1/j}\, j^{k_j}\, k_j!}
\]

It’s perhaps worth clarifying what is happening with the limit in the above.

Lemma 15.7. For a (convergent) series \sum_{j\ge0} a_j define A(x) = \sum_{j\ge0} a_j x^j. Then \sum_{j\ge0} a_j = A(1).

Proof. Define the partial sums s_n = \sum_{j=0}^{n} a_j. In terms of generating functions we have the following characterization of s_n.
\[
\sum_{n\ge0} s_n x^n = \frac{1}{1-x} \sum_{j\ge0} a_j x^j = \frac{1}{1-x}\, A(x)
\]
This gives the partial sums in terms of the generating function.
\[
s_n = [x^n]\, \frac{1}{1-x}\, A(x) = \sum_{j=0}^{n} [x^j]\, A(x) = \sum_{j=0}^{n} a_j
\]
Then we get the series.

\[
\sum_{j\ge0} a_j = \lim_{n\to\infty} s_n = \lim_{n\to\infty} [x^n]\, \frac{1}{1-x}\, A(x) = A(1)
\]
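The mechanism in the proof, that multiplying by 1/(1−x) turns a series into the series of its partial sums, is easy to see concretely; the coefficients below are arbitrary illustrative values:

```python
from itertools import accumulate

a = [1, -2, 5, 0, 3, -1]  # coefficients of A(x), chosen arbitrarily

# [x^n] A(x)/(1-x) is the convolution of (a_j) with the all-ones
# sequence 1/(1-x) = 1 + x + x^2 + ..., i.e. the n-th partial sum.
s = [sum(a[: n + 1]) for n in range(len(a))]

partial_sums = list(accumulate(a))
```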

We have proved the following.

Theorem 15.8. If S ⊆ N \ {0} and k_j ∈ N for each j ∈ S, then the probability that a random permutation has k_j j-cycles for each j ∈ S is
\[
\prod_{j\in S} \frac{1}{e^{1/j}\, j^{k_j}\, k_j!}\,.
\]
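Since the theorem is a statement about the n → ∞ limit, it can only be checked approximately at a fixed n, but even n = 8 is convincing. Here is a brute-force sketch for one choice of data, S = {2} and k₂ = 1 (chosen purely for illustration), where the theorem predicts probability 1/(e^{1/2}·2¹·1!) ≈ 0.3033:

```python
from itertools import permutations
from math import exp, factorial

def cycle_lengths(p):
    """Cycle lengths of a permutation given in one-line notation."""
    seen, out = [False] * len(p), []
    for i in range(len(p)):
        if not seen[i]:
            j, c = i, 0
            while not seen[j]:
                seen[j], j, c = True, p[j], c + 1
            out.append(c)
    return out

n = 8
# Count permutations of [n] with exactly one 2-cycle.
hits = sum(1 for p in permutations(range(n)) if cycle_lengths(p).count(2) == 1)
empirical = hits / factorial(n)
predicted = 1 / (exp(1 / 2) * 2)  # Theorem 15.8 with S = {2}, k_2 = 1
```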

Example 15.9. Find the probability that a permutation has no fixed points.

The probability that a permutation has no fixed points corresponds to S = {1} and k_1 = 0. So the probability is
\[
\frac{1}{e^{1/1}\, 1^0\, 0!} = \frac{1}{e}
\]
as we saw before. Interestingly enough, the probability of having exactly one fixed point is
\[
\frac{1}{e^{1/1}\, 1^1\, 1!} = \frac{1}{e}\,.
\]
We saw both of these results in Chapter 13.

Example 15.10. Find the probability that a permutation has no cycles of length a perfect square.

Consider the probability that a permutation contains no cycles of perfect-square length. Thus S = {1, 4, 9, 16, ...} and k_j = 0 for all j. The probability is then
\[
\prod_{j\in S} \frac{1}{e^{1/j}\, j^{k_j}\, k_j!} = \prod_{k\ge1} \frac{1}{e^{1/k^2}\, (k^2)^0\, 0!} = \exp\left(-\sum_{k\ge1} \frac{1}{k^2}\right) = e^{-\pi^2/6}
\]
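A quick numerical check of the truncated product against the claimed limit (the cutoff K = 10⁶ is arbitrary):

```python
from math import exp, pi

# k_j = 0 for every perfect-square length j, so each factor of the
# product is e^{-1/k^2}; truncating at K factors should approach
# exp(-pi^2/6) ~ 0.1930, strictly between zero and one.
K = 10**6
partial_product = exp(-sum(1.0 / k**2 for k in range(1, K + 1)))
limit = exp(-pi**2 / 6)
```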

It is perhaps surprising in the previous example to find a value strictly between zero and one. The squares are just sparse enough that we aren’t sure to have a cycle of square length, but dense enough that we aren’t sure to avoid them altogether. On the other hand, the fact that we are specifying zero cycles of those lengths is also significant.

Example 15.11. Find the probability that a permutation has exactly one cycle of each perfect-square length.

Consider the probability that a permutation contains exactly one cycle of each perfect-square length. This would be
\[
\prod_{j\in S} \frac{1}{e^{1/j}\, j^{k_j}\, k_j!} = \prod_{k\ge1} \frac{1}{e^{1/k^2}\, (k^2)^1\, 1!} < \prod_{k\ge1} \frac{1}{k^2}
\]
As the partial products of this last expression are eventually less than any ε > 0, the probability must be zero. In fact a slight generalization of this argument shows that k can only be nonzero on finitely many elements of S (though S need not be finite). Of course you knew this already, right?

Example 15.12. Find the probability that a permutation has all cycles even.

Consider the probability that a permutation has all cycles even, which is to say no odd cycles. Thus S = {1, 3, 5, ...} and k_j = 0 for all j. The probability is
\[
\prod_{j\in S} \frac{1}{e^{1/j}\, j^{k_j}\, k_j!} = \prod_{k\ge1} \frac{1}{e^{1/(2k-1)}} = \exp\left(-\sum_{k\ge1} \frac{1}{2k-1}\right)
\]
As the sum inside the final expression is unbounded, the exponential is arbitrarily close to zero and so the probability is zero. We stated this before without proof, as an "obvious fact" in Chapter 8 (see also Exercise 8.10).

Problem 15.13. Find the probability that a permutation has all cycles odd. Use the methods of this chapter, not what we learned in Chapter 8.

ichthyology

If this seems fishy to you, then consider the Poisson distribution. Recall that the average number of fixed points of a permutation is 1 (exactly, for any n). Were they Poisson distributed with mean µ, then the probability of having k of them would be

\[
e^{-\mu}\, \frac{\mu^k}{k!}
\]
In fact Theorem 15.8 states that the number of cycles of length s is asymptotically Poisson with mean 1/s, and that if we consider a subset S of cycle-lengths then they are, as a set, asymptotically independent.
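For fixed points this is easy to check exactly: the probability that a permutation of [n] has exactly k fixed points is C(n,k)·D_{n−k}/n!, where D_m is the number of derangements, and already at n = 10 this is very close to Poisson with mean 1. A sketch:

```python
from math import comb, exp, factorial

def derangements(m):
    """D_m via the recurrence D_m = m*D_{m-1} + (-1)^m, with D_0 = 1."""
    d = 1
    for i in range(1, m + 1):
        d = i * d + (-1) ** i
    return d

n = 10
# Exact probability of exactly k fixed points in a permutation of [n].
exact = [comb(n, k) * derangements(n - k) / factorial(n) for k in range(n + 1)]
# Poisson with mean 1.
poisson = [exp(-1) / factorial(k) for k in range(n + 1)]
```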

square roots

As an application of the cycle index, we consider the probability that a permutation has a square root. That is, given a permutation σ, what is the probability that there exists another permutation τ such that τ² = σ? Necessarily σ and τ both act on the same number of points.

Lemma 15.14. Odd cycles always have square roots.

Proof. This is perhaps best seen pictorially: if the elements of a (2m+1)-cycle σ are arranged in a circle, then σ corresponds to shifting clockwise by one. This is the same as twice shifting clockwise by m+1, since 2(m+1) ≡ 1 (mod 2m+1). Here is an odd cycle expressed as the square of another permutation (which is itself a cycle of the same length).
\[
\left(k_1\; k_2\; \cdots\; k_{2m+1}\right) = \left(k_1\; k_{m+2}\; k_2\; k_{m+3}\; \cdots\; k_{m+1}\right)^2
\]
Of course all subscripts are to be taken modulo 2m+1, so adding m+1 to an index is done modulo 2m+1.

Lemma 15.15. Even cycles never have square roots.

Proof. Again a picture helps. Let σ = (k_1 k_2 ··· k_{2m}) be an even cycle and suppose τ is a permutation with τ² = σ. Define j such that τ(k_1) = k_j. Applying σ to this equation gives the following.
\[
\begin{aligned}
\sigma(\tau(k_1)) &= \sigma(k_j)\\
\tau^3(k_1) &= \tau^2(k_j)\\
\tau(\sigma(k_1)) &= \sigma(k_j)\\
\tau(k_2) &= k_{j+1}
\end{aligned}
\]
Applying σ a total of j−1 times we find that τ(k_j) = k_{2j−1}. But we know that τ(k_j) = τ(τ(k_1)) = σ(k_1) = k_2, and so 2j−1 ≡ 2 (mod 2m), which is impossible: the left side is odd and the right side is even.

Lemma 15.16. The product of two disjoint even cycles of the same length always has a square root.

Proof. By direct construction:
\[
\left(k_1\; k_2\; \cdots\; k_{2m}\right)\left(b_1\; b_2\; \cdots\; b_{2m}\right) = \left(k_1\; b_1\; k_2\; b_2\; \cdots\; k_{2m}\; b_{2m}\right)^2
\]

Theorem 15.17. A permutation has a square root if and only if the number of cycles of each even length is even.

This follows from the previous lemmas. Strictly speaking, it follows from the proofs of the lemmas. For instance, the statements of the lemmas are silent on why a product of three 4-cycles has no square root. The details are left to the reader! (Exercise 15.12)
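For small n the theorem can be verified by sheer brute force: compute the set of all squares τ² and compare it with the set of permutations satisfying the cycle condition. A sketch for n = 6:

```python
from itertools import permutations
from collections import Counter

def compose(p, q):
    """(p o q)(i) = p[q[i]], one-line notation on {0, ..., n-1}."""
    return tuple(p[q[i]] for i in range(len(p)))

def cycle_lengths(p):
    seen, out = [False] * len(p), []
    for i in range(len(p)):
        if not seen[i]:
            j, c = i, 0
            while not seen[j]:
                seen[j], j, c = True, p[j], c + 1
            out.append(c)
    return out

def even_cycles_in_pairs(p):
    """Theorem 15.17: an even number of cycles of each even length."""
    counts = Counter(cycle_lengths(p))
    return all(counts[l] % 2 == 0 for l in counts if l % 2 == 0)

n = 6
perms = list(permutations(range(n)))
squares = {compose(t, t) for t in perms}          # permutations with a root
predicted = {p for p in perms if even_cycles_in_pairs(p)}
```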

As a short digression we mention the following related result.

Theorem 15.18. If α and β are two permutations each consisting of disjoint transpositions, then their product contains an even number of cycles of each length. Furthermore, if a permutation on 2n symbols has the property that it has an even number of cycles of each length, then it is the product of two permutations each consisting of disjoint transpositions.

This result is interesting because of its history. It first appeared as one of the steps that Marian Rejewski and colleagues used to break the ENIGMA cipher.¹

Knowing Theorem 15.17, we can use the cycle index generating function to answer the question. Recall that we have the following expression.
\[
C(x, y) = \exp\left(\sum_{j\ge1} \frac{x^j y_j}{j}\right) = \exp\left(\frac{x y_1}{1}\right) \exp\left(\frac{x^2 y_2}{2}\right) \exp\left(\frac{x^3 y_3}{3}\right) \exp\left(\frac{x^4 y_4}{4}\right) \cdots
\]

Previously, we set y_j = 1 for all the values j that we didn't care about, and then extracted the coefficient of y^k. Instead of extracting a coefficient, we could extract every monomial that contains the right powers of the y_j that we care about, and then set y_j = 1 for the other y_j. Another way to do that is to remove every monomial that we don't want and then set every y_j = 1. This is of course just a roundabout way of doing things, but on occasion it's easier to extract what we want by throwing away what we don't want.

Back to our square root extraction. For each n, we want to add up every coefficient containing exactly x^n, any power of each y_{2i+1}, and any even power of each y_{2i} (of course the weighted sum of the powers of the y_j will automatically add to n).

We can do this by first removing all the odd powers of each y_{2i}, and then setting all y_j = 1. Removing the odd powers amounts to a multisection on every second factor, which replaces exp by cosh.
\[
R(x, y) = \exp\left(\frac{x y_1}{1}\right) \cosh\left(\frac{x^2 y_2}{2}\right) \exp\left(\frac{x^3 y_3}{3}\right) \cosh\left(\frac{x^4 y_4}{4}\right) \cdots
\]

Setting each y_j = 1 gives
\[
\begin{aligned}
R(x) &= \exp\left(\frac{x}{1}\right) \cosh\left(\frac{x^2}{2}\right) \exp\left(\frac{x^3}{3}\right) \cosh\left(\frac{x^4}{4}\right) \cdots\\
&= \prod_{m\ge0} \exp\left(\frac{x^{2m+1}}{2m+1}\right) \prod_{m\ge1} \cosh\left(\frac{x^{2m}}{2m}\right)\\
&= \exp\left(\sum_{m\ge0} \frac{x^{2m+1}}{2m+1}\right) \prod_{m\ge1} \cosh\left(\frac{x^{2m}}{2m}\right)\\
&= \exp\left(\frac{1}{2}\left(\log\frac{1}{1-x} - \log\frac{1}{1+x}\right)\right) \prod_{m\ge1} \cosh\left(\frac{x^{2m}}{2m}\right)\\
&= \sqrt{\frac{1+x}{1-x}}\, \prod_{m\ge1} \cosh\left(\frac{x^{2m}}{2m}\right)\\
&= \frac{1}{1-x}\, \sqrt{1-x^2}\, \prod_{m\ge1} \cosh\left(\frac{x^{2m}}{2m}\right)
\end{aligned}
\]
See Exercise 15.15 for an alternative perspective on R(x, y) and R(x).

¹ Marian Rejewski, "An Application of the Theory of Permutations in Breaking the Enigma Cipher", Applicationes Mathematicae 16, No. 4, Warsaw (1980). https://www.impan.pl/Great/Rejewski/article.html. Fascinating reading; see the concluding remarks.

So the probability that a permutation on [n] has a square root is p(n) = [x^n] R(x) = r_n/n!. One can compute these values for small n by hand quite readily, or for somewhat less small n by computer.
\[
R(x) = 1 + 1x + \frac{1}{2!}x^2 + \frac{3}{3!}x^3 + \frac{12}{4!}x^4 + \frac{60}{5!}x^5 + \cdots
\]
Notice that p(2n+1) = p(2n). This can be seen clearly from the last expression given for R(x), where it is seen to be 1/(1−x) times a function of x²; recall that multiplying by 1/(1−x) converts a generating function into the generating function of its partial sums.

Problem 15.19. Write out a proof that p(2n) = [x^{2n}] R(x) and p(2n+1) = [x^{2n+1}] R(x) are exactly equal for any fixed n.
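A brute-force computation (using the characterization of Theorem 15.17 rather than searching for an actual root) reproduces these coefficients and the equality p(2n) = p(2n+1):

```python
from itertools import permutations
from collections import Counter
from fractions import Fraction
from math import factorial

def has_square_root(p):
    """Theorem 15.17: an even number of cycles of each even length."""
    seen, counts = [False] * len(p), Counter()
    for i in range(len(p)):
        if not seen[i]:
            j, c = i, 0
            while not seen[j]:
                seen[j], j, c = True, p[j], c + 1
            counts[c] += 1
    return all(counts[l] % 2 == 0 for l in counts if l % 2 == 0)

# r_n = number of permutations of [n] with a square root; p(n) = r_n / n!
r = [sum(1 for p in permutations(range(n)) if has_square_root(p))
     for n in range(8)]
p_prob = [Fraction(r[n], factorial(n)) for n in range(8)]
```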

exercises

1. Find the probability that a permutation has exactly. . .
   a) two 3-cycles
   b) zero fixed points and two 3-cycles and four 2-cycles
   c) three fixed points and zero 2-cycles and four 3-cycles and zero 4-cycles
2. a) Find a (simple) expression for the probability that every cycle of a permutation has length greater than r, for some fixed r ≥ 1.
   b) Show that as r → ∞ your expression goes to zero; of course the result must be true, but you are asked to show this from your expression.
3. What is the probability that a permutation has no cycle of length a power of 2? A power of s, for some s ≥ 2?
4. What is the probability that a permutation has no cycle of prime length?
5. a) Find the probability that a permutation has no m-cycles.
   b) Find the limit as m → ∞ of the probability that a permutation has no m-cycles.
   c) Find the probability that a permutation has no cycles of length a multiple of m.
6. Let p_r be the probability that a permutation has no cycles of length r^t for any positive integer t.
   a) Explain why we already know that p_1 = e^{−1}.
   b) Find a simple exact formula for p_r for r > 1.
   c) Explain why your formula for p_r when r > 1 is not valid for r = 1.
   d) Find lim_{r→∞} p_r.

7. Assume that S ⊆ N \ {0}, and for infinitely many j ∈ S we have k_j > 0. Prove that the probability that a permutation has k_j j-cycles for each j ∈ S is zero. Does this make intuitive sense? If S is infinite but k_j > 0 for only finitely many j ∈ S (so S might be infinite but then all but finitely many k_j = 0), does it still follow that the probability is zero?
8. Assume that S ⊆ S′ ⊆ N \ {0}, and let k_j ∈ N for each j ∈ S and k′_j ∈ N for each j ∈ S′, with k′_j = k_j whenever j ∈ S. Show that the probability that a permutation has k_j j-cycles for every j ∈ S is at least as great as the probability that a permutation has k′_j j-cycles for every j ∈ S′. When does equality hold?
9. Assume that S ⊆ N \ {0} and let k_j ≤ k′_j for every j ∈ S. Show that the probability that a permutation has k_j j-cycles for every j ∈ S is at least as great as the probability that a permutation has k′_j j-cycles for every j ∈ S. When does equality hold?

10. Let S = S_1 ∪̇ S_2, where S ⊆ N \ {0}. Let p_i be the probability that a permutation has k_j j-cycles for every j ∈ S_i, and p be the probability that a permutation has k_j j-cycles for every j ∈ S. Show that p_1 p_2 = p.

11. Let p(t, j) = \frac{1}{e^{1/j}\, j^t\, t!} be the probability that a permutation contains t j-cycles.
    a) Find the value of \sum_{t\ge0} p(t, j).
    b) The expected number of j-cycles is \sum_{t\ge0} t\, p(t, j). Find this value.
    c) The expected number of points that are contained in j-cycles is \sum_{t\ge0} jt\, p(t, j). Find this value.
    If you already know these, you should still derive them from the p(t, j). All values are simple.
12. Prove Theorem 15.17 as follows. Let τ be a permutation on [n] and compute σ = τ². What is the structure of σ?
13. Assume that we want to know \lim_{n\to\infty} [x^n]\, A(x). Explain what goes wrong in the following.
\[
\lim_{n\to\infty} [x^n]\, A(x) = \lim_{n\to\infty} [x^n]\, \frac{1}{1-x}\, \bigl((1-x)A(x)\bigr) = \Bigl[(1-x)A(x)\Bigr]_{x=1} = 0
\]

14. Let C be some set of objects (to be thought of as "components") and C_r be the set of objects of C of size r. Let S ⊆ N \ {0}. The following can be thought of as the set C with only components of size j ∈ S "marked".
\[
\left(\;\dot\bigcup_{j\in S}\; \{j\}\times C_j \right) \;\dot\cup\; \left(\;\dot\bigcup_{j\notin S}\; C_j \right)
\]
Show that this gives C_S(x, y) in the following sense. (This is slightly more general than what we had above: there C was the set of permutation-cycles, here C is some "arbitrary" notion of component.)
\[
\mathrm{Set}\left( \left(\;\dot\bigcup_{j\in S}\; \{j\}\times C_j \right) \;\dot\cup\; \left(\;\dot\bigcup_{j\notin S}\; C_j \right) \right) \;\longleftrightarrow\; C_S(x, y)
\]

15. Let C be some set of objects (to be thought of as "components") and C_r be the set of objects of C of size r.
    a) Show that Set is distributive in the following sense. Note that this is an equality of sets, not of generating functions. The symbol × means take the cartesian product (like Π-notation but with cartesian products of sets).
\[
\mathrm{Set}\left(\;\dot\bigcup_{j\ge1}\; C_j \right) = \bigtimes_{j\ge1} \mathrm{Set}(C_j)
\]
    b) Suppose we want to consider objects of Set(C) that have the additional property that they have an even number of elements from C_j whenever j is even. Show that this is the same as asking for the following (again you are being asked to show an equality of sets).
\[
\left( \bigtimes_{j\ge0} \mathrm{Set}(C_{2j+1}) \right) \times \left( \bigtimes_{j\ge1} \mathrm{Set}_{\mathrm{even}}(C_{2j}) \right)
\]
    c) Give an alternative derivation of R(x, y) and R(x), the generating functions for the number of permutations with square roots.

16. Define g_k to be the number of 2-regular labelled graphs that have k_j j-cycles for each j ∈ N (graph-cycles, not permutation-cycles). Note that we always have k_0 = k_1 = k_2 = 0 since we are dealing with simple graphs. Let
\[
G(x, y) = \sum_{n\ge3} \sum_{\|k\|=n} g_k\, y^k\, \frac{x^n}{n!}
\]
where \|k\| = \sum_j j\, k_j.
    a) Find an expression for G(x, y) analogous to Theorem 15.3.
    b) Find an expression for the generating function for 2-regular graphs with all cycles of size at least r; all cycles of even length; other variations you might think of.

16. Lagrange inverse function theorem

analytic functions: five minute review

A complex function is said to be holomorphic at z_0 if it is complex differentiable in some open neighbourhood containing z_0. Note that this condition is stronger than real differentiability, since it says that
\[
\lim_{z\to z_0} \frac{f(z) - f(z_0)}{z - z_0}
\]
exists and is independent of the path z → z_0 in C. If we write z = x + iy, then f is complex-differentiable at z_0 if and only if
\[
\frac{\partial}{\partial x} f(z)\bigg|_{z=z_0} = -i\, \frac{\partial}{\partial y} f(z)\bigg|_{z=z_0}
\]
This is the Cauchy–Riemann equation. Equivalently, it says that f(z) is conformal (wherever f′(z) ≠ 0).

A complex function is said to be analytic at z_0 if it can be expressed as a power series about z_0, convergent in some open neighbourhood of z_0:
\[
f(z) = \sum_{n\ge0} a_n (z - z_0)^n
\]
For complex functions, being analytic and being holomorphic are the same thing. In particular, it follows that f(z) is infinitely differentiable and that it is represented by its Taylor series. For real functions, being analytic and being differentiable are not the same thing, primarily because a function can be differentiable along all paths z → z_0 in R while failing at many of the paths in C. Notice that a real function can be thought of as a complex function (if we extend the domain); real functions that are (real) differentiable but not analytic are not (complex) differentiable when we extend the domain. In a nutshell, this is why real analysis is technical and hard and ugly, while complex analysis is elegant and simple and beautiful.¹

Example 16.1. The following functions are analytic at all points in the complex plane where they are defined: polynomials, rational functions, sin(z), cos(z), log(z); the sum, difference, product and quotient of analytic functions; compositions.

Problem 16.2. Verify, using the Cauchy–Riemann equation, that any polynomial function of z is analytic at every point of C. Using the chain rule will help, as in \frac{\partial}{\partial x} f = \frac{\partial f}{\partial z}\, \frac{\partial z}{\partial x} where z = x + iy. Verify that f(z) = \bar{z} is not analytic at any point of C.

Lagrange inverse function theorem

We consider the case where we have a formal power series u(z) whose coefficients we wish to know, where it satisfies some functional equation of the form u = zφ(u). It is not hard to see that if φ(u) is a formal power series (in u), then this equation does in fact uniquely determine u(z). In other words, u = u(z) is implicitly defined by the equation u = zφ(u) even if we have no explicit

¹ This is a completely objective and unbiased fact.

form for u(z). We have used this fact, and even extracted coefficients by hand from equations such as these (for instance trees).

Problem 16.3. Let φ(u) be a formal power series in u, and u be a formal power series in z. Assume that u(z) = zφ(u(z)). Show that every coefficient u_n = [z^n] u(z) is uniquely determined.

We can certainly compute coefficients based on the previous problem. We can extract a "formula" of sorts for the u_n. But in fact the process can be automated, and even generalized.

Theorem 16.4 (LIFT). Let φ(u) be a formal power series with φ(0) ≠ 0, and f(u) be a formal power series. Let u(z) be the formal power series implicitly defined by u(z) = zφ(u(z)). Then
\[
[z^n]\, f(u(z)) = \frac{1}{n}\, [u^{n-1}]\, f'(u)\, \varphi^n(u)
\]

Proof. To prove this it suffices to prove it for the case where φ and f are polynomials. This is because the assertion of the theorem remains unchanged if we truncate φ and f at the n-th degree term.

We know that u = u(z) is uniquely defined as a formal power series, but we don't have an expression for it as a function. On the other hand we can directly derive an expression for z in terms of u, as z = z(u) = u/φ(u). Notice that z(u) = u/φ(u) is an analytic function, and z(0) = 0, z′(0) ≠ 0. So it has an inverse which is analytic near the origin. This means that u(z), and hence f(u(z)), is an analytic function in a neighbourhood of the origin (recall that f is assumed to be a polynomial). We really don't care what z = z(u) looks like as a function, but since it is an invertible analytic function, its inverse is also analytic. And its inverse is u = u(z), which is the function we really do care about.

We use the Cauchy integral formula, integrating around a curve C = C(z) which circles the origin within the region of analyticity. Since z = z(u) is at least locally invertible, we can change coordinates from z to u (obtaining the curve C′ in terms of u). Then we re-express the resulting integral and apply the Cauchy integral formula backwards.
\[
\begin{aligned}
[z^n]\, f(u(z)) &= \frac{1}{2\pi i} \int_{C} \frac{f(u(z))}{z^{n+1}}\, dz\\
&= \frac{1}{2\pi i} \int_{C'} \frac{f(u)\, \varphi^{n+1}(u)}{u^{n+1}} \cdot \frac{\varphi(u) - u\varphi'(u)}{\varphi^2(u)}\, du\\
&= \frac{1}{2\pi i} \int_{C'} \frac{f(u)\, \varphi^n(u)}{u^{n+1}}\, du - \frac{1}{2\pi i} \int_{C'} \frac{f(u)\, \varphi^{n-1}(u)\, \varphi'(u)}{u^n}\, du\\
&= [u^n]\, f(u)\varphi^n(u) - [u^{n-1}]\, f(u)\varphi^{n-1}(u)\varphi'(u)\\
&= \frac{1}{n}\, [u^{n-1}]\, \bigl(f(u)\varphi^n(u)\bigr)' - [u^{n-1}]\, f(u)\varphi^{n-1}(u)\varphi'(u)\\
&= \frac{1}{n}\, [u^{n-1}]\, f'(u)\varphi^n(u) + \frac{1}{n}\, [u^{n-1}]\, f(u)\, n\varphi^{n-1}(u)\varphi'(u) - [u^{n-1}]\, f(u)\varphi^{n-1}(u)\varphi'(u)\\
&= \frac{1}{n}\, [u^{n-1}]\, f'(u)\varphi^n(u)
\end{aligned}
\]
Note that in order to express the result as a single coefficient extraction, we used the fact that for any power series P(u) we have [u^n] P(u) = [u^{n-1}] P′(u)/n, which follows directly from the definition of the (formal power series) derivative. If you prefer, any of the last four lines could have been taken as "the" conclusion of the theorem.
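The theorem can be checked on a case where everything is explicit. Take φ(u) = (1+u)², so that u = z(1+u)² (the shifted Catalan equation of Exercise 2b below); LIFT with f(u) = u predicts [zⁿ]u = C(2n, n−1)/n. A sketch using exact rational series arithmetic (the truncation order N = 12 is arbitrary):

```python
from fractions import Fraction
from math import comb

N = 12  # series are kept modulo z^N

def mul(a, b):
    """Product of two truncated series (lists of coefficients)."""
    c = [Fraction(0)] * N
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                if i + j < N:
                    c[i + j] += ai * bj
    return c

# Solve u = z*(1+u)^2 by fixed-point iteration; the coefficient of z^k
# is stable after k iterations.
z = [Fraction(0)] * N
z[1] = Fraction(1)
u = [Fraction(0)] * N
for _ in range(N):
    one_plus_u = [Fraction(1)] + u[1:]
    u = mul(z, mul(one_plus_u, one_plus_u))

# LIFT with f(u) = u: [z^n] u = (1/n) [u^{n-1}] (1+u)^{2n} = C(2n, n-1)/n.
lift = [Fraction(comb(2 * n, n - 1), n) for n in range(1, N)]
```

As expected, both sides give the Catalan numbers.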

In fact this proof is somewhat misleading, as it seems to depend heavily on analytic complex functions. One can prove this without recourse to complex analysis, but we won't do that here.

rooted labelled trees

Recall that if T is the set of rooted labelled trees (where size is measured by number of vertices) then we found that T ≅ {•} × Set(T), which gave us the exponential generating function T(x) = x exp(T(x)). For u = T(x), φ(u) = exp(u) and f(u) = u, LIFT gives:

\[
n!\, [x^n]\, u = n!\, \frac{1}{n}\, [u^{n-1}]\, (e^u)^n = n!\, \frac{1}{n}\, [u^{n-1}]\, e^{nu} = n!\, \frac{1}{n}\, \frac{n^{n-1}}{(n-1)!} = n^{n-1}
\]
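This count is easy to confirm by solving T = x·exp(T) as a truncated power series with exact rational coefficients (the truncation order N = 9 is arbitrary):

```python
from fractions import Fraction
from math import factorial

N = 9  # series are kept modulo x^N

def mul(a, b):
    c = [Fraction(0)] * N
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                if i + j < N:
                    c[i + j] += ai * bj
    return c

def exp_series(s):
    """exp of a truncated series with zero constant term: sum of s^k/k!."""
    out = [Fraction(0)] * N
    out[0] = Fraction(1)
    term = out[:]
    for k in range(1, N):
        term = [t / k for t in mul(term, s)]          # term = s^k / k!
        out = [o + t for o, t in zip(out, term)]
    return out

# Solve T = x*exp(T) by fixed-point iteration; the coefficient of x^k
# is stable after k iterations.
x = [Fraction(0)] * N
x[1] = Fraction(1)
T = [Fraction(0)] * N
for _ in range(N):
    T = mul(x, exp_series(T))

# n! [x^n] T should be n^{n-1}, the number of rooted labelled trees.
trees = [factorial(n) * T[n] for n in range(1, N)]
```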

Every unrooted labelled tree corresponds to exactly n rooted labelled trees (the labelling effectively kills the automorphisms). So the number of labelled trees (unrooted) is n^{n−2}. As another example, let's consider the average root degree of a rooted labelled tree.

Let f_{n,k} be the number of rooted labelled trees on n vertices whose root has degree k. We already saw (again using Set) the bivariate generating function.

\[
F(x, y) = \sum_{n,k} f_{n,k}\, y^k\, \frac{x^n}{n!} = x\, \exp\bigl(y\, T(x)\bigr)
\]

We analyze the random variable X_n, which measures the root degree of a random rooted labelled tree on n vertices. Recall that the moments are given by the following (there "should" be a factor of n! top and bottom, to account for the fact that we want to extract coefficients from an exponential generating function, but we have issued a pre-emptive retaliatory cancellation).

\[
E[X_n] = \frac{[x^n]\, \dfrac{\partial}{\partial y} F(x, y)\Big|_{y=1}}{[x^n]\, F(x, 1)}
\qquad\qquad
E[X_n^2] = \frac{[x^n]\, \dfrac{\partial}{\partial y}\Bigl( y\, \dfrac{\partial}{\partial y} F(x, y) \Bigr)\Big|_{y=1}}{[x^n]\, F(x, 1)}
\]

So it is useful to calculate the following quantities.

\[
\begin{aligned}
[x^n]\, F(x, 1) &= [x^n]\, x\exp(T(x)) = [x^n]\, T(x)\\
[x^n]\, \frac{\partial}{\partial y} F(x, y)\Big|_{y=1} &= [x^n]\, x\, T(x)\exp(T(x)) = [x^n]\, T^2(x)\\
[x^n]\, \frac{\partial}{\partial y}\Bigl( y\, \frac{\partial}{\partial y} F(x, y) \Bigr)\Big|_{y=1} &= [x^n]\, \bigl( x\, T(x)\exp(T(x)) + x\, T^2(x)\exp(T(x)) \bigr) = [x^n]\, \bigl( T^2(x) + T^3(x) \bigr)
\end{aligned}
\]

We already worked out the first one, but in fact LIFT allows us to easily evaluate any of these. We have u = T , and φ(u) = eu; notice that we already used the implicit function to rewrite our expression as a coefficient of a function of u alone, by eliminating the x on the right-hand side. This

time the function is not just f(u) = u, but f(u) = u² or f(u) = u² + u³. Now using LIFT we obtain
\[
\begin{aligned}
[x^n]\, T(x) &= [x^n]\, u = \frac{1}{n}\, [u^{n-1}]\, (e^u)^n = \frac{1}{n}\, [u^{n-1}]\, e^{nu} = \frac{1}{n}\, \frac{n^{n-1}}{(n-1)!}\\
[x^n]\, T^2(x) &= [x^n]\, u^2 = \frac{1}{n}\, [u^{n-1}]\, 2u\, (e^u)^n = \frac{2}{n}\, [u^{n-2}]\, e^{nu} = \frac{2}{n}\, \frac{n^{n-2}}{(n-2)!}\\
[x^n]\, \bigl(T^2(x) + T^3(x)\bigr) &= [x^n]\, (u^2 + u^3) = \frac{1}{n}\, [u^{n-1}]\, (2u + 3u^2)(e^u)^n\\
&= \frac{2}{n}\, [u^{n-2}]\, e^{nu} + \frac{3}{n}\, [u^{n-3}]\, e^{nu} = \frac{2}{n}\, \frac{n^{n-2}}{(n-2)!} + \frac{3}{n}\, \frac{n^{n-3}}{(n-3)!}
\end{aligned}
\]
This gives us the moments.
\[
\begin{aligned}
E[X_n] &= \frac{2}{n}\, \frac{n^{n-2}}{(n-2)!} \div \frac{n^{n-1}}{n!} = \frac{2(n-1)}{n} = 2 - \frac{2}{n}\\
E[X_n^2] &= \left( \frac{2}{n}\, \frac{n^{n-2}}{(n-2)!} + \frac{3}{n}\, \frac{n^{n-3}}{(n-3)!} \right) \div \frac{n^{n-1}}{n!} = \frac{2(n-1)}{n} + \frac{3(n-1)(n-2)}{n^2} = 5 - \frac{11}{n} + \frac{6}{n^2}
\end{aligned}
\]
The average and variance are then
\[
\begin{aligned}
\mu &= E[X_n] = 2 - \frac{2}{n}\\
\sigma^2 &= \frac{2(n-1)}{n} + \frac{3(n-1)(n-2)}{n^2} - \frac{4(n-1)^2}{n^2} = \frac{(n-1)(n-2)}{n^2} = 1 - \frac{3}{n} + \frac{2}{n^2}
\end{aligned}
\]
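The same truncated-series machinery confirms the mean and variance for small n; everything here is exact rational arithmetic, so the comparison is equality, not approximation:

```python
from fractions import Fraction

N = 9  # series are kept modulo x^N

def mul(a, b):
    c = [Fraction(0)] * N
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                if i + j < N:
                    c[i + j] += ai * bj
    return c

def exp_series(s):
    """exp of a truncated series with zero constant term."""
    out = [Fraction(0)] * N
    out[0] = Fraction(1)
    term = out[:]
    for k in range(1, N):
        term = [t / k for t in mul(term, s)]
        out = [o + t for o, t in zip(out, term)]
    return out

x = [Fraction(0)] * N
x[1] = Fraction(1)
T = [Fraction(0)] * N
for _ in range(N):
    T = mul(x, exp_series(T))

T2 = mul(T, T)
T3 = mul(T2, T)

# E[X_n] = [x^n]T^2 / [x^n]T,  E[X_n^2] = [x^n](T^2+T^3) / [x^n]T.
mean = {n: T2[n] / T[n] for n in range(2, N)}
var = {n: (T2[n] + T3[n]) / T[n] - (T2[n] / T[n]) ** 2 for n in range(2, N)}
```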

exercises

1. In the proof of Theorem 16.4, we insisted on φ(0) ≠ 0, which is to say that φ has a nonzero constant term. What happens if φ(0) = 0, so that the constant term is zero? The proof of Theorem 16.4 fails, but what can we say about u(z) when u(z) = zφ(u(z)) and φ(0) = 0?
2. Recall that the Catalan numbers have ordinary generating function C(x), where C = 1 + xC².
   a) Explain why we can't (directly) apply Lagrange inversion to the equation C = 1 + xC²; in other words, why taking u = C does not (directly) work.
   b) Set u = C(x) − 1, and show that u = x(u+1)². Show that Lagrange inversion does apply here and use it to derive the coefficients of u(x) and hence the coefficients of C(x). Of course you know the final answer already. . .
   c) Set u = xC(x), and show that u = x/(1−u). Show that Lagrange inversion does apply here and use it to derive the coefficients of u(x) and hence the coefficients of C(x). Of course you know the final answer already. . .
3. Consider the expression u⁵ − u + x = 0, where we imagine that u is a power series in x.
   a) Show that this gives u = x/(1 − u⁴), and explain why Lagrange inversion applies here.
   b) Using Lagrange inversion, find an expression for u as a formal power series in x.
   c) Substitute the constant a for the variable x, and observe that you have a series solution to the quintic polynomial u⁵ − u + a = 0.
   d) Substitute u = xT into u⁵ − u + x = 0, and show that T = 1/(1 − x⁴T⁴). Explain why T is the generating function for a certain class of trees and describe this class precisely.
   You might notice that this only gives one root of u⁵ − u + a = 0; as a → 0 the five roots converge to ±1, ±i, 0, and this expression gives the root that converges to 0 as a → 0. Solving a general quintic can be reduced to solving the Bring–Jerrard form u⁵ − u + a, so this gives (with some preprocessing!) a power series solution to (one root of) an arbitrary quintic, where the coefficients count certain trees.
4. Let T be the set of rooted labelled plane trees where size is measured by number of vertices.
   a) Recall (or rederive!) a functional equation for the corresponding exponential generating function T.
   b) Check that this equation is amenable to applying Theorem 16.4, and use Theorem 16.4 to find the number of such trees on n vertices. (Note that we already found this number without Theorem 16.4.)
   c) Using Theorem 16.4, find the average root degree of such a tree, and the variance.
5. Let T be the set of rooted labelled plane trees where each vertex has either 0 or d children, and size is measured by number of vertices.
   a) For d = 2, show that T = x(1 + T²). Solve this directly using the quadratic formula, and notice that we know the power series of the resulting closed-form expression for T. Hence find t_n.
   b) For d = 2, find t_n using Theorem 16.4. Of course this should be the same answer as before.
   c) For arbitrary d, give the corresponding functional equation for T. Using Theorem 16.4 find t_n.
   d) For arbitrary d, notice that many of the t_n are zero. Explain.
   e) Without really calculating anything, give the average root degree of such a tree, and the variance (notice that n = 1 is a special case).
   f) Using Theorem 16.4, find the average root degree of such a tree, and the variance. Compare with the "obvious" answer.
6. Let T be the set of rooted labelled plane trees where the number of children is a multiple of d, and size is measured by number of vertices.
   a) Recall (or rederive!) a functional equation for the exponential generating function T(x).
   b) Using Theorem 16.4, find t_n.
   c) Using Theorem 16.4, find the average root degree of such a tree, and the variance.

17. money changing (introduction to asymptotics)

making change and integer partitions

Given positive integers r_1, r_2, ..., r_m we want to determine the number of non-negative integer solutions to \sum_i r_i x_i = n. Let this number be g_n. Then g_n is the number of ways that one can "make change" for n using coins from the set C = {r_1, r_2, ..., r_m}. In terms of generating functions, this is just our old friend MSet, or more precisely, a restricted integer partition, with c_r = 1 for r ∈ C and c_r = 0 otherwise.
\[
\begin{aligned}
C = \{r_1, \dots, r_m\} &\;\longleftrightarrow\; C(x) = \sum_{j=1}^{m} x^{r_j}\\
G = \mathrm{MSet}(C) &\;\longleftrightarrow\; G(x) = \prod_{j=1}^{m} \frac{1}{1 - x^{r_j}}
\end{aligned}
\]
So each element of G is a multiset of coins, or a way of "making change".

Since G(x) is a finite product, we could decompose it by partial fractions, and obtain g_n as a linear combination of exponentials with polynomial coefficients. This is in principle straightforward, and in principle could even be done symbolically, in terms of solving a linear system. In principle, or perhaps even in reality, you've already done this.

Schur’s theorem

Consider the question: when is g_n > 0? In other words, when is it possible to make change for n? Of course if gcd(r_1, r_2, ..., r_m) = d > 1, then g_n = 0 whenever d ∤ n. There is a more-or-less complete solution known when m = 2, which we state without proof.

Theorem 17.1. Let a and b be relatively prime positive integers, and set N = (a−1)(b−1). Let G(x) = \frac{1}{(1-x^a)(1-x^b)}. Then
• [x^n] G(x) > 0 for all n ≥ N,
• [x^{N−1}] G(x) = 0,
• [x^n] G(x) = 0 for exactly half the values n in 0, 1, ..., N−1.

Note that "exactly half" means just that, since N is always even. For general m, the specifics are not known. We will prove Schur's Theorem, which shows that something similar happens for any m. This is by way of demonstrating the usefulness of analytic techniques.
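Before the general proof, the m = 2 statement is easy to check by brute force; the pairs (3, 5) and (4, 7) below are arbitrary coprime examples:

```python
from math import gcd

def change_counts(a, b, limit):
    """g_n = #{(x, y) >= 0 : a*x + b*y = n} for n = 0, ..., limit-1."""
    g = [0] * limit
    for x in range(limit // a + 1):
        for y in range((limit - 1 - a * x) // b + 1):
            g[a * x + b * y] += 1
    return g

results = {}
for a, b in [(3, 5), (4, 7)]:
    assert gcd(a, b) == 1
    Nab = (a - 1) * (b - 1)
    g = change_counts(a, b, 3 * Nab)
    results[(a, b)] = (
        all(g[n] > 0 for n in range(Nab, 3 * Nab)),  # g_n > 0 for n >= N
        g[Nab - 1] == 0,                             # g_{N-1} = 0
        sum(1 for n in range(Nab) if g[n] == 0),     # zeros below N
    )
```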

Theorem 17.2. Let r_1, r_2, ..., r_m be positive integers with gcd(r_1, r_2, ..., r_m) = 1. Then there exists an N such that [x^n] \prod_{j=1}^{m} \frac{1}{1-x^{r_j}} > 0 for all n ≥ N.


The proof uses the decomposition into partial fractions. First note that all roots of the denominator are roots of unity. Furthermore, the roots of the factor (1 − x^{r_j}) are exactly the complex numbers e^{2πit/r_j}, where 0 ≤ t < r_j. Note that each factor has the root e^{2πi·0/r_j} = 1, so the factored form of the denominator contains (1 − x)^m. It is a short exercise to show that no other root is common to every factor of the denominator. So the factorization looks something like:
\[
\prod_{j=1}^{m} \frac{1}{1 - x^{r_j}} = \prod_{j=1}^{m} \prod_{t=0}^{r_j - 1} \frac{1}{1 - x\, e^{-2\pi i t/r_j}} = \frac{1}{(1-x)^m\, (1 - x/\omega_1)^{m_1}\, (1 - x/\omega_2)^{m_2} \cdots}
\]
where the ω_i are the other roots, and the m_i are their multiplicities. The key observation is that m > m_i for each i, since the only root common to every factor 1 − x^{r_j} is 1.

Problem 17.3. Show that if e^{2πit/r} (with gcd(t, r) = 1) is a root of 1 − x^a, then r | a. Conclude that if gcd(r_1, r_2, ..., r_m) = 1, then the only common root of the polynomials 1 − x^{r_1}, 1 − x^{r_2}, ..., 1 − x^{r_m} is 1; put another way,
\[
\gcd(r_1, r_2, \dots, r_m) = 1 \implies \gcd\left(1 - x^{r_1},\, 1 - x^{r_2},\, \dots,\, 1 - x^{r_m}\right) = 1 - x.
\]

So the partial fraction expansion looks like the following.

\[
\begin{aligned}
\prod_{j=1}^{m} \frac{1}{1 - x^{r_j}} &= \left( \frac{c}{(1-x)^m} + \frac{c'}{(1-x)^{m-1}} + \frac{c''}{(1-x)^{m-2}} + \cdots \right) \qquad (1)\\
&\quad + \left( \frac{c_1}{(1-x/\omega_1)^{m_1}} + \frac{c_1'}{(1-x/\omega_1)^{m_1-1}} + \frac{c_1''}{(1-x/\omega_1)^{m_1-2}} + \cdots \right)\\
&\quad + \left( \frac{c_2}{(1-x/\omega_2)^{m_2}} + \frac{c_2'}{(1-x/\omega_2)^{m_2-1}} + \frac{c_2''}{(1-x/\omega_2)^{m_2-2}} + \cdots \right)\\
&\quad + \cdots
\end{aligned}
\]

It is a simple matter to extract coefficients. Giving explicit details for the first term only, we get:

\[
[x^n] \prod_{j=1}^{m} \frac{1}{1 - x^{r_j}} = c\, \binom{n+m-1}{m-1} + \cdots
\]

This term is of order nm−1, which is greater than the order of any other term. Recall that all roots are roots of unity, and that m > mj for every j. Specifically, the term we did write down is a polynomial in n of degree m − 1; every other term is a polynomial in n of strictly smaller degree multiplied by the n-th power of some root of unity.

Problem 17.4. Show that when we extract coefficients from equation (1), the term shown above does indeed dominate. First show that [x^n]\, \frac{1}{(1 - x/\omega_i)^{m_i - j}} is \omega_i^{-n} times a polynomial in n of degree m_i − j − 1. Then identify the term(s) of largest degree.

So for large enough n, we have an approximation.

\[
[x^n] \prod_{j=1}^{m} \frac{1}{1 - x^{r_j}} \approx c\, \binom{n+m-1}{m-1}
\]

It is easy to see that this term tends to infinity as n grows, and so it will eventually dominate all other terms and be at least 1. Thereafter we have g_n > 0; that is, g_n > 0 for all n ≥ N, for some N.

We can explicitly compute the value of c in the dominant term, even if the exact meaning of "large enough n" is a little unclear. In the partial fraction expansion, multiply by (1 − x)^m to get
\[
\begin{aligned}
(1-x)^m \prod_{j=1}^{m} \frac{1}{1 - x^{r_j}} &= (1-x)^m\, \frac{c}{(1-x)^m} + (1-x)^m\, (\text{other terms})\\
\prod_{j=1}^{m} \frac{1-x}{1 - x^{r_j}} &= c + (1-x)^m\, (\text{other terms})\\
\prod_{j=1}^{m} \frac{1}{1 + x + \cdots + x^{r_j - 1}} &= c + (1-x)\, (\text{big mess})
\end{aligned}
\]
The denominator in each "other term" has a power of 1 − x that is strictly less than m. So no denominator in the "big mess" has a factor of 1 − x. So we set x = 1 to obtain c = (r_1 r_2 \cdots r_m)^{-1}.

In fact, we've shown something stronger than the result as stated: an asymptotic formula for g_n.
\[
g_n = \frac{1}{r_1 r_2 \cdots r_m} \binom{n+m-1}{m-1} + O(n^{m-2}) \approx \frac{1}{r_1 r_2 \cdots r_m} \binom{n+m-1}{m-1}
\]
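The asymptotic formula is easy to watch converge: compute gₙ by the standard coin-change recurrence and compare with the leading term. The coin set {1, 2, 5} is an arbitrary example:

```python
from math import comb

coins = [1, 2, 5]
N = 2000

# dp[n] = number of multisets of coins summing to n, i.e. [x^n] G(x);
# this is the classic unordered coin-change recurrence.
dp = [0] * (N + 1)
dp[0] = 1
for c in coins:
    for n in range(c, N + 1):
        dp[n] += dp[n - c]

m = len(coins)
# Leading term: binom(n+m-1, m-1) / (r_1 r_2 ... r_m).
approx = comb(N + m - 1, m - 1) / (1 * 2 * 5)
ratio = dp[N] / approx
```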

exercises

1. (slightly  )
   a) Let r_1 < r_2 < ··· < r_m. Show that in Theorem 17.2 we have N ≤ (r_1 − 1)(r_2 − 1). As a consequence of this, give an algorithm for finding the value of N for any particular r_1, r_2, ..., r_m.
   b) Let m = 3 with r_1 = 3, r_2 = 5, r_3 = 8. Find the value of N from Theorem 17.2.
   c) Show that for fixed r_1 and r_2, there is a value b such that if r_j ≥ b for all j > 2, then we can easily write down the exact value of N in Theorem 17.2.

2.  Can you say anything about the value of N as a function of m and/or r1, r2, ··· , rm? The complete answer is not known for m > 2, so this is really an open-ended project.1

1 There are some results known in: Nijenhuis and Wilf, “Representations of Integers by Linear Forms in Nonnegative Integers”, Journal of Number Theory. 4:98–106 (1970).

18. asymptotics from singularities : poles

We explore some basic asymptotic techniques. The chapter closely follows Wilf [2]. We first review some facts about convergence and radius of convergence, so as to be better able to understand the estimates we get.

radius of convergence

Let {xn}n≥0 be any sequence of real numbers. We say that lim sup xn = L if one of the following holds:

• L = +∞ and for all M there exists an n such that xn > M, or

• L = −∞ and for all M there exist only finitely many n such that xn > M, or

• −∞ < L < +∞ and
  – for all ε > 0 there exist only finitely many n such that xn > L + ε, and
  – for all ε > 0 there exist infinitely many n such that xn > L − ε.

An alternate definition (in fact, the usual one) is that lim sup_{n→∞} xn = lim_{n→∞} sup_{k≥n} xk.

We collect some facts about lim sup.

Fact 18.1. Every real sequence has a unique lim sup.

Fact 18.2. If limn→∞ xn exists, then it is equal to lim sup xn.

Fact 18.3. Let S be the set of limits of convergent subsequences of {xn}. Then lim sup xn is the least upper bound of S.

We care about lim sup because of the following theorem.

Theorem 18.4. Let f(z) = ∑_{n≥0} an z^n and define R by R^{−1} = lim sup |an|^{1/n}. Then f(z) is convergent for |z| < R and divergent for |z| > R.

We call R the radius of convergence of f(z).

Proof. Consider the case 0 < R < ∞.

First let z be such that |z| < R. We need to show that f converges for this z. We can choose ε > 0 such that 1/|z| > 1/R + ε. Now we know that there exists an N such that for all n ≥ N we have |an|^{1/n} < 1/R + ε/2. Thus

|an z^n| < ( (1/R + ε/2) / (1/R + ε) )^n = ( (1 + Rε/2)/(1 + Rε) )^n = α^n

where 0 < α < 1. It follows that ∑_{n≥0} an z^n is absolutely convergent.


Now let z be such that |z| > R. We need to show f diverges for this z. We can choose ε > 0 such that 1/|z| < 1/R − ε. We know that for infinitely many n, |an|^{1/n} > 1/R − ε/2. Thus

|an z^n| > ( (1/R − ε/2) / (1/R − ε) )^n = ( (1 − Rε/2)/(1 − Rε) )^n = α^n

where α > 1. So infinitely many of the terms an z^n have modulus greater than 1; since the terms do not tend to zero, the series diverges.

If R = 0, then there exists an n1 such that |a_{n1}|^{1/n1} = m1 > 1. Then we successively find n_{j+1} such that |a_{n_{j+1}}|^{1/n_{j+1}} = m_{j+1} > 1 + mj. For these n, we see that if z ≠ 0 then |an z^n| → ∞ along this subsequence. The series diverges.

If R = ∞ then choose any z ≠ 0. For all but finitely many n, |an|^{1/n} < 1/(2|z|). So for all but finitely many n, |an z^n| < 2^{−n}. The series converges absolutely.

We also have the following result, whose proof we omit.

Theorem 18.5. Let f(z) = ∑_{n≥0} an z^n have radius of convergence R. Then f(z) has a singularity at some z1 with |z1| = R.

So for an analytic function we can find the radius of convergence of the power series about the origin quickly: it is the modulus of the singularity closest to the origin. For us Theorem 18.4 is not in practice a way to compute R, but a way to control the coefficients an.

bounds from radius of convergence

Example 18.6. Describe the convergence of ∑_{n≥0} z^n.

Since an = 1 for all n, the radius of convergence is R = (lim sup |an|^{1/n})^{−1} = 1. The power series ∑_{n≥0} z^n converges for |z| < 1 and diverges for |z| > 1. Therefore there is a function, analytic in the disc {z ∈ C : |z| < 1}, that agrees with this power series on the disc, and furthermore this function must have at least one singularity on the boundary of the disc. Of course we know what the function is, and it behaves as advertised. Note that we make no claims about what this function looks like outside of the disc, and in fact it bears no resemblance to the power series there.

Example 18.7. Describe the convergence of the power series of 1/(1 + z²) about the origin.

The function 1/(1 + z²) is analytic everywhere except at its singularities, which are at ±i. Therefore its power series about z = 0 must have radius of convergence R = min{ |i − 0| , |−i − 0| } = 1. This means that we can write the function as a power series

1/(1 + z²) = ∑_{n≥0} an z^n

which converges for |z| < 1. Furthermore, we know that lim sup |an|^{1/n} = 1. Choose any ε > 0. For all but finitely many n, |an| < (1 + ε)^n and for infinitely many n, |an| > (1 − ε)^n. Put another way, the coefficients are 1^n times a sub-exponential function. This is also true of the previous example (the sub-exponential function in the previous case is just 1).

Problem 18.8. Using partial fractions, compute [z^n] (1 + z²)^{−1} exactly, and verify the claims made about these coefficients above.

Example 18.9. Consider the function f(z) = z/(e^z − 1) (in fact this is the EGF for the Bernoulli numbers). It has singularities at z = 2πit for any t ∈ Z. But for z → 0 we find by l’Hospital’s rule that f(z) → 1. Strictly speaking, f(z) has a discontinuity at z = 0, but it is a removable discontinuity: if we define f(0) = 1, then the function becomes continuous there. We will always make this kind of definition for removable discontinuities, and hence ignore them. So effectively f(z) has singularities at z = 2πit for any t ∈ Z \ {0}.

Then we have that f(z) has a convergent power series around z = 0, with radius of convergence R = |2πi| = 2π. Thus lim sup |an|^{1/n} = (2π)^{−1}. In other words, for any ε > 0, there are only finitely many n with |an| > (1/(2π) + ε)^n, and there are infinitely many n with |an| > (1/(2π) − ε)^n.

We will see later that we can get more precise asymptotic estimates for an.

We can understand these examples as a general result.

Theorem 18.10. Let f(z) = ∑_{n≥0} an z^n be an analytic power series with radius of convergence R.

• If 0 < R < ∞ then an = O((R^{−1} + ε)^n) for any ε > 0.

• If R = ∞ then an = O(ε^n) for any ε > 0, and an → 0 as n → ∞.

• If R = 0 then an eventually grows faster than any exponential function.

The proof of this is essentially a restatement of the definition of lim sup (indeed, the moniker “Theorem” is perhaps an overstatement). Note that we cannot set ε = 0, even in the case of 0 < R < ∞. It says that the coefficients are of the order R^{−n} multiplied by a sub-exponential function (eg, polynomial, logarithm, . . . ).

Problem 18.11. Show that if an = c r^n for constants c, r, then lim sup |an|^{1/n} = r. Is the converse true? If not, give an example of an an that is not of the form c r^n yet has lim sup |an|^{1/n} = r.

Problem 18.12. Give an example of a sequence an such that an = O((0.5 + ε)^n) for any ε > 0, yet an ≠ O(0.5^n). Note that this question is intimately connected with the previous one.

Cauchy integral

Recall Cauchy’s integral formula: if f(z) is analytic at z = 0, then we can compute the power series coefficients by

[z^n] f(z) = (1/(2πi)) ∮_C f(z)/z^{n+1} dz

where C is a curve around the origin, oriented positively, entirely within the region of convergence.

Example 18.13. Cauchy’s integral formula gives an expression for the coefficients, albeit one that may not be easy to compute exactly. We show how to use it in a simple way to bound coefficients.

Taking C to be a circle of radius r, let M be the maximum value attained by |f(z)| over all points with |z| = r, ie, over all points on the circle.

|[z^n] f(z)| = | (1/(2πi)) ∮_C f(z)/z^{n+1} dz | ≤ (1/(2π)) ∮_C (|f(z)|/r^{n+1}) |dz| ≤ (1/(2π)) ∫_0^{2π} (M/r^{n+1}) r dθ = M/r^n

This gives another proof of Theorem 18.10. It also suggests ways it could be improved: bounding f(z) by its maximum is likely to give a very poor approximation. We will see better ways to use this in Chapter 20.

counting derangements

In fact we can do more with Theorem 18.10 than determine the asymptotic behaviour. We can use it to get more accurate estimates as well.

Example 18.14. The EGF for derangements is e^{−z}/(1 − z). If we use Theorem 18.10 we obtain the rather unhelpful estimate that dn = n! O((1 + ε)^n). In fact we can do better with some clever rearrangement.

e^{−z}/(1 − z) = e^{−1}/(1 − z) + (e^{−z} − e^{−1})/(1 − z)

The second term on the right-hand side has a singularity, but it is removable. In other words we can define a function which is identical to this one except at z = 1, and the new function has no singularities. This means that

[z^n] (e^{−z} − e^{−1})/(1 − z) = O(ε^n)

Now we can compute

dn = n! [z^n] e^{−z}/(1 − z) = n! [z^n] e^{−1}/(1 − z) + n! [z^n] (e^{−z} − e^{−1})/(1 − z) = n! (e^{−1} + O(ε^n))

This shows (once again. . . ) that the probability that a random permutation is a derangement is asymptotically e^{−1}.

Problem 18.15. Show that the singularity at z = 1 of (e^{−z} − e^{−1})/(1 − z) is removable. That is, show that the limit as z → 1 of this function exists.
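As a quick check (my own sketch, not part of the notes; it uses the standard recurrence dn = (n − 1)(d_{n−1} + d_{n−2})), the ratio dn/n! converges to e^{−1} very fast:

```python
from math import exp, factorial

def derangements(n_max):
    # d_n = (n - 1)(d_{n-1} + d_{n-2}), with d_0 = 1 and d_1 = 0
    d = [1, 0]
    for n in range(2, n_max + 1):
        d.append((n - 1) * (d[n - 1] + d[n - 2]))
    return d

d = derangements(20)
for n in (5, 10, 20):
    print(n, d[n] / factorial(n))    # approaches exp(-1) = 0.367879...
```

The error term O(ε^n) is so small that dn is in fact n!/e rounded to the nearest integer.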

meromorphic functions

A function is entire if it is analytic on the whole complex plane. Put another way, it has a convergent power series with R = ∞. From Theorem 18.10 we see that entire functions have sub-exponential coefficients — coefficients that grow slower than any exponential function.

Note that if a function has a removable singularity, then we remove it. A singularity at z0 is removable if limz→z0 f(z) exists while f(z0) does not. In such cases we tacitly redefine f(z0) to be the value of the limit. This means that we will regard functions like (e−z − e−1)/(1 − z) as having “no” singularities.

Let f be a complex function and m1 be a positive integer. The function f(z) has a pole of order m1 at z1 if the following holds:

lim_{z→z1} |(z − z1)^r f(z)| = ∞ for r < m1

lim_{z→z1} (z − z1)^r f(z) exists and is finite for r = m1

In other words, the limit exists when r = m1 and diverges when r < m1. It is useful to think as if f had a term (z − z1)^{m1} in the denominator (although this is not necessarily correct). We insist here that m1 is a positive integer. If m1 = 1 then we call z1 a simple pole.

Problem 18.16. Show that e^{−z}/(1 − z) has a simple pole at z = 1.

A function is meromorphic in a region U if it has only finitely many singularities in U, all of which are poles. A function could easily have infinitely many poles in the complex plane yet be meromorphic in any disc centred at the origin. You might try and construct such an example. The typical example (for us!) of a function being meromorphic is that the poles can be separated: they do not have any cluster points in the complex plane.

Suppose f(z) has a pole of order m1 at z1, and furthermore that every other pole is strictly further from the origin than z1. Then the function (z − z1)^{m1} f(z) is analytic at z = z1, and so we can write its Taylor series about z = z1.

(z − z1)^{m1} f(z) = ∑_{j≥0} b_{j−m1} (z − z1)^j

We’ve chosen a strange convention to index the coefficients. The reason is that we want to divide by (z − z1)^{m1} to obtain the Laurent expansion for f(z) at z = z1.

f(z) = ∑_{j≥−m1} bj (z − z1)^j = ∑_{j=1}^{m1} b_{−j}/(z − z1)^j + ∑_{j≥0} bj (z − z1)^j

We call the finite sum on the right-hand side the principal part of f(z) at z = z1, and write PP(f; z1). Suppose for the moment that the power series for f(z) at the point z = 0 (this is the GF we actually care about) has radius of convergence R = |z1|, and furthermore that z1 is the only singularity of f with that absolute value. Then the infinite series on the right-hand side has a power series about the origin with a radius of convergence R′ > R (note that the series above is written about z1, not the origin). This means that

[z^n] ∑_{j≥0} bj (z − z1)^j = O((1/R′ + ε)^n)

Contrast this with the naive estimation for f(z), which is exponentially larger.

[z^n] f(z) = O((1/R + ε)^n)

We can extract coefficients directly from the principal part, giving us an improved estimate for the coefficients of f(z).

[z^n] f(z) = ∑_{j=1}^{m1} [z^n] b_{−j}/(z − z1)^j + [z^n] ∑_{j≥0} bj (z − z1)^j
 = ∑_{j=1}^{m1} [z^n] b_{−j}/(z − z1)^j + O((1/R′ + ε)^n)

If we are lucky, then f(z) − PP(f; z1) is an entire function (z1 being the only singularity of f), and we have an O(ε^n) estimate. This is exactly what occurred for the derangements in Example 18.14. Here it is again, but with perhaps some better motivation as to where the “clever rearrangement” came from.

Problem 18.17. Set f(z) = e^{−z}/(1 − z). We already know that f(z) has a simple pole at z = 1. Using the calculations that showed it is a simple pole, determine the constant b_{−1} in e^{−z}/(1 − z) = b_{−1}/(1 − z) + ∑_{j≥0} bj (1 − z)^j. Hence, find a closed form for ∑_{j≥0} bj (1 − z)^j, and determine what (non-removable) singularities this function has.

using poles to estimate coefficients

We can find the coefficients bj by finding the Taylor series for (z − z0)^{m1} f(z) about the point z0. This might be a little messy, but it is always doable, since we are (usually) only interested in a finite number of terms of this Taylor series. Note that if the pole is of order m1, then finding b_{−m1} is, in terms of work required, a side-effect of having shown that it is of order m1. This is especially interesting in the case of simple poles, where b_{−1} is the only coefficient we care about!

A more pressing concern is that in order to get any practical use out of this idea, we need to have our error term correspond to a function whose radius of convergence is strictly larger. In other words we need to remove all the poles with absolute value R. First we would do

f(z) = PP(f; z1) + g(z)

Now we need to find the smallest singularity of g and remove its principal part.

f(z) = PP(f; z1) + PP(g; z2) + h(z)

This means that we need to find the principal part of the “leftover” function g; continuing on, we need each time to find the principal part of a new function. In fact we can avoid this. The singularities of g are exactly the singularities of f, except for z1. And furthermore we have the following result.

Proposition 18.18. Let f(z) be a function with distinct poles at z1 and z2. Let g(z) be defined by f(z) = PP(f; z1) + g(z). Then PP(g; z2) = PP(f; z2).

Proof. By definition PP(f; z1) is analytic everywhere except at z = z1. So in particular it is analytic at z2. Therefore PP(PP(f; z1); z2) = 0. Furthermore PP(·) is a linear operator.

If the singularities of f(z) with absolute value R are exactly z1, z2, . . . , zt (remember that we are dealing with meromorphic functions!) then

f(z) = ∑_{j=1}^{t} PP(f; zj) + f^{(2)}(z)

This serves to define f^{(2)}(z) as the “leftovers” after the principal parts have been removed. It has a radius of convergence R2 about the origin strictly larger than that of f(z). In fact, that is the only thing about f^{(2)}(z) that we care about. There’s no reason to stop after one radius-expansion. We could now find the principal parts of all the poles of f^{(2)}(z) on its radius of convergence. Each time we do this we will obtain a better estimate, and the error term will be asymptotically better. In practice, we need not consider the poles nor the principal part of f^{(2)}; Proposition 18.18 guarantees that we need only consider the original function f(z).

The general situation can be described as follows. Let f(z) be meromorphic on some disc centred at the origin. Let 0 < R1 < R2 < R3 < ··· be the values such that the poles of f lie on one of the circles |z| = Ri (so in particular, R1 is the radius of convergence of f). Denote the poles on the circle |z| = Ri by z_j^{(i)}, for 1 ≤ j ≤ ti, where ti is the number of poles on |z| = Ri.

By considering the principal parts of all poles up to RN, we obtain an estimate whose error term is O((1/R_{N+1} + ε)^n).

[z^n] f(z) = ∑_{j=1}^{t1} [z^n] PP(f; z_j^{(1)}) + ∑_{j=1}^{t2} [z^n] PP(f; z_j^{(2)}) + ··· + ∑_{j=1}^{tN} [z^n] PP(f; z_j^{(N)}) + O((1/R_{N+1} + ε)^n)

The superscripts refer to which set of poles one is removing (ie, the radius of convergence we are currently working at). If this all looks like a huge mess, then you are right. . . but only in a technical sense. This is exactly partial fractions, but applied to a function that is not rational. Each principal part corresponds to the contribution of one pole. If the original function is rational, then the principal part is exactly the set of fractions corresponding to a particular root of the denominator.

If there are a finite number of poles in all, then this process terminates and we have an estimate whose error term decays faster than ε^n for arbitrarily small ε > 0. If there are infinitely many poles but the function is meromorphic on every disc centered at the origin, then by taking larger and larger discs we can get estimates to any desired accuracy. If the function is meromorphic on some finite disc but not on any bigger disc, then that means there are an infinite number of singularities on that disc: we can’t subtract them all off, so we would have to stop and we would be unable to get a better estimate.

One final observation. We obtain the actual estimated coefficient by extracting coefficients from the principal parts. But this is always an easy task, since these are all known series.

[z^n] b_{−j}/(z − z0)^j = (b_{−j}/(−z0)^j) [z^n] 1/(1 − z/z0)^j = (b_{−j}/(−z0)^j) binom(n + j − 1, n) (1/z0)^n = ((−1)^j b_{−j}/z0^{n+j}) binom(n + j − 1, n)
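The last identity is easy to probe numerically. The sketch below (the function name and the choices z0 = 2, j = 3 are mine) builds the Taylor coefficients of 1/(z − z0)^j by convolving geometric series, then compares them with the closed form.

```python
from math import comb

def taylor_inverse_power(z0, j, n_max):
    # Taylor coefficients about the origin of 1/(z - z0)^j, obtained by
    # convolving j copies of 1/(z - z0) = -(1/z0) * 1/(1 - z/z0)
    base = [-(1 / z0) ** (n + 1) for n in range(n_max + 1)]
    coeffs = [1.0] + [0.0] * n_max
    for _ in range(j):
        coeffs = [sum(coeffs[k] * base[n - k] for k in range(n + 1))
                  for n in range(n_max + 1)]
    return coeffs

z0, j = 2.0, 3
c = taylor_inverse_power(z0, j, 12)
formula = [(-1) ** j * comb(n + j - 1, n) / z0 ** (n + j) for n in range(13)]
print(c[:4])
print(formula[:4])
```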

Bernoulli numbers

Consider the function f(z) = z/(e^z − 1). We showed in Example 18.9 that

[z^n] z/(e^z − 1) = O((1/(2π) + ε)^n)

This was based on the radius of convergence to give an estimate of the terms. Now we will obtain a better approximation by considering the principal parts. Evidently f(z) has singularities exactly at 2πit for t ∈ Z \ {0}. We claim these are all simple poles. If r < 1 then l’Hospital’s rule gives the following.

lim_{z→2πit} (z − 2πit)^r · z/(e^z − 1) = lim_{z→2πit} ( r(z − 2πit)^{r−1} z + (z − 2πit)^r ) / e^z
 = lim_{z→2πit} ( rz/((z − 2πit)^{1−r} e^z) + (z − 2πit)^r/e^z )
 = ∞

If r = 1 then l’Hospital’s rule gives the somewhat simpler expression below.

lim_{z→2πit} (z − 2πit) · z/(e^z − 1) = lim_{z→2πit} (2z − 2πit)/e^z = 2πit

Notice that there was no explicit factor of (z − 2πit) in the denominator, yet these are still simple poles. The limit we have just calculated is also the first term (zeroth term?) in the Taylor series of (z − 2πit)f(z) around the point z = 2πit. In other words

PP(f; 2πit) = 2πit/(z − 2πit)

In order to improve the precision of our estimate, we need to remove all poles at the radius of convergence. That means the poles z = 2πi and z = −2πi. So we would have

f(z) = 2πi/(z − 2πi) + (−2πi)/(z − (−2πi)) + f^{(2)}(z)

We don’t really care what f^{(2)}(z) is, except to notice that its radius of convergence (as a power series about the origin) is 4π (actually, we can regard this equation as defining f^{(2)}(z)). Now we have the

following estimate:

[z^n] f(z) = [z^n] (−1)/(1 − z/(2πi)) + [z^n] (−1)/(1 − z/(−2πi)) + [z^n] f^{(2)}(z)
 = −(1/(2πi))^n − (1/(−2πi))^n + O((1/(4π) + ε)^n)

We could repeat this for the next two poles, obtaining

[z^n] f(z) = −(1/(2πi))^n − (1/(−2πi))^n − (1/(4πi))^n − (1/(−4πi))^n + O((1/(6π) + ε)^n)

Problem 18.19. Verify explicitly that the poles at ±4πi are simple. In doing so determine PP(f; 4πi) and PP(f; −4πi), and check that the above expression is correct. While you’re at it, determine PP(f; 2πit) for t ∈ Z \ {0}.

In fact we can do better in this case, because the poles are so nicely arranged (conjugate pairs, purely imaginary). Notice that

2πi/(z − 2πi) + (−2πi)/(z − (−2πi)) = −2 · 4π²/(z² + 4π²) = −2/(1 − z²/(−4π²))

Each pair of poles will combine in a similar way, giving

f(z) = −2/(1 − z²/(−(2π)²)) − 2/(1 − z²/(−(4π)²)) − ··· − 2/(1 − z²/(−(2Nπ)²)) + f^{(N+1)}(z)

The leftover function f^{(N+1)}(z) is analytic in the disc |z| < R_{N+1} = 2(N + 1)π. We can extend this to any N we want.

[z^{2m}] f(z) = −2 ∑_{t=1}^{N} 1/(2πit)^{2m} + O((1/(2π(N + 1)) + ε)^{2m}) = −2 ∑_{t=1}^{N} 1/(2πit)^{2m} + O(δ^{2m})

[z^{2m+1}] f(z) = 0 + O((1/(2π(N + 1)) + ε)^{2m+1}) = O(δ^{2m+1})

We can choose δ as small as we like by taking N big enough. So we can get an estimate with arbitrarily small (but not necessarily zero) error. In particular it says that the odd terms seem to approach zero. This doesn’t quite mean that f(z) is an even function, but it is suspicious. Using the multi-section technique, we select the odd terms.

(1/2)(f(z) − f(−z)) = (1/2) ( z/(e^z − 1) − (−z)/(e^{−z} − 1) )
 = (1/2) ( z/(e^z − 1) − z e^z/(e^z − 1) )
 = (z/2) (1 − e^z)/(e^z − 1)
 = −z/2

This tells us that [z] f(z) = −1/2, and [z^{2m+1}] f(z) = 0 for m > 0. Which is certainly O(δ^{2m+1}). The function is not even, since not all of the odd terms are zero. But all except finitely many odd terms are zero, and it is this “eventual evenness” that the asymptotic estimates are detecting.
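The even-coefficient estimate can be checked numerically. The sketch below is my own (not from the notes); it computes Bernoulli numbers from the standard recurrence ∑_{k=0}^{n} binom(n+1, k) Bk = 0, so that [z^{2m}] z/(e^z − 1) = B_{2m}/(2m)!, and compares this against the truncated pole sum with N = 3.

```python
from fractions import Fraction
from math import comb, factorial, pi

def bernoulli(n_max):
    # B_0 = 1 and sum_{k=0}^{n} C(n+1, k) B_k = 0; these are the
    # coefficients in z/(e^z - 1) = sum_n B_n z^n / n!  (so B_1 = -1/2)
    B = [Fraction(1)]
    for n in range(1, n_max + 1):
        B.append(-sum(comb(n + 1, k) * B[k] for k in range(n)) / (n + 1))
    return B

B = bernoulli(10)
m, N = 5, 3
exact = float(B[2 * m] / factorial(2 * m))       # [z^10] z/(e^z - 1)
# pole estimate: -2 * sum_t 1/(2*pi*i*t)^{2m} = -2 * sum_t (-1)^m/(2*pi*t)^{2m}
pole_sum = -2 * sum((-1) ** m / (2 * pi * t) ** (2 * m) for t in range(1, N + 1))
print(exact, pole_sum)
```

The agreement is striking because the omitted poles contribute only O((2π(N+1))^{−2m}).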

ordered set partitions

Consider the ordered Bell numbers; they count the number of ways of partitioning [n] into some number of classes with the classes being sorted in some order. So for instance {1, 3}, {2, 4} is now different than {2, 4}, {1, 3}. Let fn be the number of ordered set partitions of [n] and F(z) its exponential generating function. We can work out this EGF “by hand” based on what we (first) did for set partitions.

F(z) = ∑_{n≥0} fn z^n/n! = ∑_{k≥0} k! (e^z − 1)^k/k! = 1/(1 − (e^z − 1)) = 1/(2 − e^z)

Alternatively, we saw this in Chapter 8, where we worked out the EGF corresponding to Seq(Set≥1({•})). In fact, ordered set partitions are just surjections.

Problem 18.20. Show that the EGF corresponding to Seq(Set≥1({•})) is 1/(2 − e^z).

Considered as a complex function, F(z) has simple poles at z = log 2 + 2πit for t ∈ Z. It can also be shown that

PP(F; log 2) = (−1/2)/(z − log 2) = (1/(2 log 2)) · 1/(1 − z/log 2)

This gives

fn = n! ( 1/(2 (log 2)^{n+1}) + O((R^{−1} + ε)^n) )    where R^{−1} = ((log 2)² + (2π)²)^{−1/2} ≈ 0.158

So we get the following estimate (since the 0.158 is approximate anyway, we permit ourselves to ignore the ε in the error estimate for clarity).

fn ≈ n!/(2 (log 2)^{n+1}) + n! O((0.158)^n) ≈ n!/(2 (log 2)^{n+1})

For comparison, here is the estimate we would have had without “removing” the singularity. The singularity is at z = log 2, so the best we could have said was the following, for some unknown constant K.

fn = n! · O((1/log 2 + ε)^n) ≲ K n!/(log 2)^n

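As a numerical sanity check (my own sketch; the recurrence fn = ∑_k binom(n, k) f_{n−k} comes from choosing which elements form the first class), the one-pole estimate is already extremely accurate for modest n:

```python
from math import comb, factorial, log

def ordered_bell(n_max):
    # f_n = sum_{k=1}^{n} C(n, k) f_{n-k}: pick the first class,
    # then order-partition the remaining elements
    f = [1]
    for n in range(1, n_max + 1):
        f.append(sum(comb(n, k) * f[n - k] for k in range(1, n + 1)))
    return f

f = ordered_bell(12)
for n in (4, 8, 12):
    est = factorial(n) / (2 * log(2) ** (n + 1))
    print(n, f[n], f[n] / est)       # ratio tends to 1 very quickly
```

In fact the relative error decays like (log 2 / |log 2 + 2πi|)^n ≈ 0.11^n, so rounding the estimate recovers fn exactly for quite a while.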
exercises

1. Let f(z) = (1 + z)/(1 − z − z²). Use Theorem 18.10 to give an asymptotic estimate for [z^n] f(z). Compare this with the exact value determined in Example 2.2.

2. Let g(z) be an entire function such that g(z) and 1 − z − z² have no common roots.
a) Using one pole, show that [z^n] g(z)/(1 − z − z²) = c r^n + O((s + ε)^n) for any ε > 0. You should find explicitly c, r, s.
b) Show that if g(z) = 1 + z then your result is consistent with the exact answer.
c) Using two poles, what estimate do you get for [z^n] g(z)/(1 − z − z²)?
d) Show that if g(z) = 1 + z then your result is consistent with the exact answer.

3. a) Show that PP(·) is linear. That is, show that PP(f + g; z0) = PP(f; z0) + PP(g; z0).
b) The proof of Proposition 18.18 is a little terse. Expand it in more detail.

4. Let f(z) = e^z/(e^z + 1).
a) Determine the singularities of f(z). Give R, the radius of convergence of the power series of f about the origin.
b) Show that the singularities are all simple poles, and determine their principal parts.
c) Give an asymptotic estimate of [z^n] f(z) with error term less than O(15^{−n}). Give the error term of your estimate. Simplify your estimate so that it contains only real numbers (i.e., combine terms that are complex conjugates).

5. Let f(z) = 1/(2 − e^z) = ∑_{n≥0} bn z^n/n!.
a) Find all singularities of f(z) and show that they are all simple poles.
b) Show that PP(f; log 2) = (−1/2)/(z − log 2).
c) Give the approximation for bn that results from the principal parts with respect to the poles log 2, log 2 + 2πi, log 2 − 2πi.
d) Determine (with a calculator) the smallest n for which the absolute value of the term corresponding to the pole z = log 2 + 2πi is greater than 0.25. (This is the smallest n for which the approximation using only the dominant pole is conceivably out by as much as 0.5, and so gives some sense of the accuracy of this approximation.)

6. Let f(z) be a formal power series such that as a complex function f is analytic except for a finite number of singularities, all of which are poles of finite order. Suppose that [z^n] f(z) is an integer for all n. Is it true that we can write down a “simple approximation formula” for [z^n] f(z); that is, one which is, for large enough n, exact when rounded? What would this “simple approximation formula” look like? Can we weaken the condition “finite number of singularities”?

7. Let an ∈ N for all n ≥ 0, so the an form a sequence of non-negative integers.
a) Let A(x) = ∑_{n≥0} an x^n and assume that as a complex-valued function, A(x) is analytic on some circle of radius R > 1 centred at the origin. Prove that A(x) is a polynomial.
b) Now let A(x) = ∑_{n≥0} an x^n/n! and assume that as a complex-valued function, A(x) is analytic on some circle of radius R > 1 centred at the origin. Need A(x) be a polynomial? Proof or counterexample.

19. algebraic singularities

We consider briefly what happens with singularities that are like poles but with non-integral multiplicities: algebraic singularities. The chapter closely follows Wilf [2].

singularities

A complex function f(z) has an algebraic singularity at z = z1 if

lim_{z→z1} (z − z1)^r f(z)

e−z/2−z2/4 √ . 1 − z

If the denominator were an integer power of (1 − z), we could remove the pole and get a very precise estimate, since this is the only singularity. But as the power is not an integer, our theory does not hold. Briefly, if we were to write down the “Laurent series”, it would be in half-integer powers of (1 − z). In fact, this is essentially what we will do, but the proof is a little more demanding, and the error estimate not as good as the integer case.

Darboux’s theorem

We will start with some tools.

Lemma 19.1. Let α ∉ {0, 1, 2, . . .}. Then [z^n] (1 − z)^α = O(n^{−α−1}).

Our main interest is of course when α < 0.


Proof. This amounts to estimating a binomial coefficient, since [z^n] (1 − z)^α = binom(n − α − 1, n). We can do this using the Gamma function, and Stirling’s estimate for it.

[z^n] (1 − z)^α = binom(n − α − 1, n) = Γ(n − α)/(Γ(n + 1) Γ(−α))
 ∼ (1/Γ(−α)) √(2π(n − α)) ((n − α)/e)^{n−α} ÷ ( √(2π(n + 1)) ((n + 1)/e)^{n+1} )
 = (e^{1+α}/Γ(−α)) ((n − α)/(n + 1))^{n+1} √((n − α)/(n + 1)) (n − α)^{−1−α}
 = (e^{1+α}/Γ(−α)) (1 − (1 + α)/(n + 1))^{n+1} √((n − α)/(n + 1)) (n − α)^{−1−α}
 = (n − α)^{−1−α}/Γ(−α) + smaller terms
 = n^{−1−α}/Γ(−α) + smaller terms
 = O(n^{−1−α})

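A quick numerical probe of the lemma (my own sketch, not from the notes; log-gamma avoids overflow, and α = −1/2 is an arbitrary choice):

```python
from math import exp, lgamma

def coeff(alpha, n):
    # [z^n](1 - z)^alpha = Gamma(n - alpha) / (Gamma(n + 1) * Gamma(-alpha)),
    # computed via log-gamma; all Gamma arguments here are positive
    return exp(lgamma(n - alpha) - lgamma(n + 1) - lgamma(-alpha))

alpha = -0.5
for n in (10, 100, 1000):
    leading = n ** (-alpha - 1) / exp(lgamma(-alpha))
    print(n, coeff(alpha, n), coeff(alpha, n) / leading)   # ratio tends to 1
```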
Lemma 19.2. Let A(z) and B(z) be formal power series with [z^n] A(z) = O(n^{−α}) and [z^n] B(z) = O(r^n), where 0 < r < 1. Then [z^n] A(z)B(z) = O(n^{−α}).

Proof. We show a bound on |∑_{k=0}^{n} ak b_{n−k}|. We write (#) for various constants that we needn’t (and won’t) calculate explicitly.

If k is small (say k ≤ n/2), then b_{n−k} is bounded by (#)r^{n/2}. This means that

|∑_{0≤k≤n/2} ak b_{n−k}| ≤ (max_{0≤k≤n/2} |ak|) ∑_{0≤k≤n/2} (#) r^{n−k} ≤ (#) r^{n/2} = O((√r)^n) = O(n^{−α})

If k is large, then the ak terms are enough to guarantee the estimate.

|∑_{n/2<k≤n} ak b_{n−k}| ≤ (#) n^{−α} ∑_{n/2<k≤n} |b_{n−k}| ≤ (#) n^{−α} = O(n^{−α})

Lemma 19.3. Let h(z) be analytic for |z| < R, for some R > 1. Then [z^n] (1 − z)^α h(z) = O(n^{−α−1}).

Proof. Set A(z) = (1 − z)^α and B(z) = h(z). We have [z^n] h(z) = O(r^n) for some r < 1. By Lemma 19.1, [z^n] A(z) = O(n^{−α−1}). Apply Lemma 19.2.

Now we are ready for Darboux’s Theorem.

Theorem 19.4. Let f(z) = g(z)/(1 − z)^β where β is a positive non-integer and g(z) is analytic in |z| < R, for some R > 1. Define bj by g(z) = ∑_{j≥0} bj (1 − z)^j. Then

[z^n] f(z) = ∑_{j=0}^{N} bj binom(n + β − j − 1, n) + O(n^{β−N−2})

We view N as a parameter of the estimation; we can choose it as a function of how precise we want the estimate to be. Setting N = 0 gives the leading term only, which gives the asymptotic value. With meromorphic functions the same was true: the “most negative” term of the Laurent series dominated. The actual proof is straightforward. Exactly as in the meromorphic case, we expand the function about the pole (z = 1) and read off the precise values from the leading terms and leave the rest as error.

Proof.

g(z) = ∑_{j=0}^{N} bj (1 − z)^j + (1 − z)^{N+1} ∑_{j>N} bj (1 − z)^{j−N−1}

(1 − z)^{−β} g(z) = ∑_{j=0}^{N} bj (1 − z)^{j−β} + (1 − z)^{N+1−β} ∑_{j>N} bj (1 − z)^{j−N−1}

[z^n] (1 − z)^{−β} g(z) = ∑_{j=0}^{N} bj binom(n + β − j − 1, n) + O(n^{β−N−2})

In the second line we observed that the infinite sum is, like g(z), an analytic function for |z| < 1 + δ. In the third line we used Lemma 19.3.

2-regular graphs

We have the exponential generating function

F(z) = e^{−z/2 − z²/4} / √(1 − z) = (1 − z)^{−1/2} e^{−z/2 − z²/4}

This is of the right form for Darboux’s theorem, with β = 1/2 and g(z) = e^{−z/2 − z²/4}. We need to rewrite g(z) as a Taylor series about the point z = 1. Note that it is really an alternating Taylor series since we are taking powers of (1 − z) instead of (z − 1).

bj = ((−1)^j / j!) (d/dz)^j e^{−z/2 − z²/4} |_{z=1}

This gives the expansion

g(z) = e^{−3/4} + e^{−3/4}(1 − z) + (1/4) e^{−3/4} (1 − z)² + ···

The only thing we really need are the coefficients bj. Then we have our estimate. Here is what we would get for N = 0, 1, 2.

N = 0 : [z^n] F(z) = e^{−3/4} binom(n − 1/2, n) + O(n^{−3/2})

N = 1 : [z^n] F(z) = e^{−3/4} binom(n − 1/2, n) + e^{−3/4} binom(n − 3/2, n) + O(n^{−5/2})

N = 2 : [z^n] F(z) = e^{−3/4} binom(n − 1/2, n) + e^{−3/4} binom(n − 3/2, n) + (1/4) e^{−3/4} binom(n − 5/2, n) + O(n^{−7/2})

Clearly we can obtain an explicit approximation to any desired polynomial degree of precision.
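These estimates can be compared against exact coefficients computed by elementary power-series arithmetic. The sketch below is mine (truncation order and names are arbitrary): it builds the series of e^{−z/2 − z²/4} from g′ = p′g, convolves with the series of (1 − z)^{−1/2}, and checks the N = 0 estimate at n = 50.

```python
from math import comb, exp, lgamma

N = 60
# g(z) = exp(p(z)) with p(z) = -z/2 - z^2/4, via n*g_n = sum_k k*p_k*g_{n-k}
p = [0.0, -0.5, -0.25] + [0.0] * (N - 2)
g = [1.0] + [0.0] * N
for n in range(1, N + 1):
    g[n] = sum(k * p[k] * g[n - k] for k in range(1, n + 1)) / n
# (1 - z)^(-1/2) has coefficients C(2n, n) / 4^n
s = [comb(2 * n, n) / 4.0 ** n for n in range(N + 1)]
F = [sum(g[k] * s[n - k] for k in range(n + 1)) for n in range(N + 1)]

n = 50
# N = 0 estimate: e^{-3/4} * binom(n - 1/2, n), via log-gamma
leading = exp(-0.75) * exp(lgamma(n + 0.5) - lgamma(n + 1) - lgamma(0.5))
print(F[n], leading, F[n] / leading)
```

The small coefficients also reproduce the familiar counts of 2-regular graphs: F[3] = 1/6 and F[4] = 3/24 correspond to 1 graph on 3 vertices and 3 graphs on 4 vertices.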

generalizing

Darboux’s bound applies only for the singularity z = 1. But we can rescale it to suit whatever singularity we have. For instance given a generating function of the form

h(z)/(z − z1)^m

where h(z) is analytic on the disc |z| < R for some R > |z1|, we would substitute w = z/z1 to achieve

h(z1 w)/(z1 w − z1)^m = (1/(−z1)^m) h(z1 w)/(1 − w)^m = (1 − w)^{−β} h̃(w)

with β = m and h̃(w) = h(z1 w)/(−z1)^m analytic on |w| < R̃, for some (different) R̃ > 1. There is a version of Darboux’s Theorem that deals with multiple singularities with the same absolute value, but we won’t go into details here.

finite fields again

Let f_{n,k} be the number of polynomials of degree n over Fq with exactly k irreducible factors (counting multiplicities). Let cn be the number of irreducible polynomials of degree n, and C(x) = ∑_n cn x^n. As a consequence of enumerating unlabelled objects by components we found that

F(x, y) = ∑_{n,k} f_{n,k} x^n y^k = ∏_{r≥1} 1/(1 − y x^r)^{cr} = exp( ∑_{k≥1} (y^k/k) C(x^k) )

We also know that F(x, 1) = (1 − qx)^{−1} by a direct argument, and this enabled us to determine C(x), at least in a computational manner.

C(x) = ∑_{k≥1} (µ(k)/k) log(1/(1 − q x^k)) = log(1/(1 − qx)) + g1(x)

The function g1(x) is just shorthand for the remaining terms in the sum; note that it has radius of convergence R1 ≥ 1/√q, unlike C(x) which has radius of convergence R = 1/q.

We can compute the average number of irreducible factors and the variance.

(d/dy) F(x, y) |_{y=1} = (1/(1 − qx)) ∑_{k≥1} C(x^k)
 = (1/(1 − qx)) log(1/(1 − qx)) + g2(x)/(1 − qx)

(d²/dy²) F(x, y) |_{y=1} = (1/(1 − qx)) ( (∑_{k≥1} C(x^k))² + ∑_{k≥1} (k − 1) C(x^k) )
 = (1/(1 − qx)) ( log(1/(1 − qx)) + g2(x) )² + g3(x)/(1 − qx)
 = (1/(1 − qx)) (log(1/(1 − qx)))² + (2 g2(x)/(1 − qx)) log(1/(1 − qx)) + (g2²(x) + g3(x))/(1 − qx)   (1)

Again, g2(x) and g3(x) are shorthand for the remaining terms of the sums. They both have radii of convergence at least 1/√q, so their contribution is negligible. Now we get

[x^n] F(x, 1) = q^n

[x^n] (d/dy) F(x, y) |_{y=1} = hn q^n + g2(q^{−1}) q^n + O(q^{n/2})

[x^n] (d²/dy²) F(x, y) |_{y=1} = hn² q^n + 2 g2(q^{−1}) hn q^n + O(q^n)   (2)

So we can compute the average and variance of Xn, the number of irreducible factors in a polynomial of degree n.

E(Xn) = hn + g2(q^{−1}) + O(q^{−n/2})

E(Xn(Xn − 1)) = hn² + 2 g2(q^{−1}) hn + O(1)

So the average is asymptotically hn and the variance is asymptotically hn² − (hn)² + hn = hn.

estimating hn

We saw similar results before involving harmonic numbers in considering the number of cycles in a random permutation. We stated that hn is asymptotically log n. We can prove this now. Start with Lemma 19.1.

[x^n] (1 − x)^{−α} ∼ (1/Γ(α)) n^{α−1}

Now differentiate with respect to α, once and then twice, to obtain

[x^n] (1 − x)^{−α} log(1/(1 − x)) ∼ (1/Γ(α)) n^{α−1} log n

[x^n] (1 − x)^{−α} (log(1/(1 − x)))² ∼ (1/Γ(α)) n^{α−1} (log n)²

Note that we neglected smaller order terms since we want an asymptotic result, not an equality! We saw previously that

[x^n] (1 − x)^{−1} log(1/(1 − x)) = hn

[x^n] (1 − x)^{−1} (log(1/(1 − x)))² = hn² + O(1)

Setting α = 1 gives

E(Xn) ∼ hn ∼ log n

E(Xn(Xn − 1)) ∼ hn² ∼ (log n)²

So in summary, a random polynomial of degree n has a probability of being irreducible of about 1/n, and has on average log n irreducible factors with a variance of log n. None of these depend (asymptotically) on the choice of finite field!
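The 1/n claim can be checked directly from the Möbius-inversion formula cn = (1/n) ∑_{d|n} µ(d) q^{n/d} (this standalone sketch and its helper names are mine, not part of the notes):

```python
def mobius(n):
    # Moebius function by trial division
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0           # squared prime factor
            result = -result
        d += 1
    if n > 1:
        result = -result
    return result

def irreducible_count(q, n):
    # c_n = (1/n) * sum_{d | n} mu(d) * q^(n/d)
    return sum(mobius(d) * q ** (n // d)
               for d in range(1, n + 1) if n % d == 0) // n

for q in (2, 3, 5):
    n = 20
    c = irreducible_count(q, n)
    print(q, n, c, c * n / q ** n)   # last ratio tends to 1, for any q
```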

20. saddle-point analysis

This chapter is devoted to improvements in the way we use Cauchy's integral formula to get estimates for coefficients. Recalling Example 18.13, we will look at better ways to bound the integral than by simply taking the maximum of the integrand. The result is due to Hayman. This chapter closely follows Wilf [2].

Stirling’s formula

We'll develop an estimate for $n!$ as an initial example. More specifically, we will develop asymptotic estimates for the coefficients $[z^n]\, e^z$. We start with Cauchy's integral formula applied to the function $e^z$:
$$\frac{1}{n!} = [z^n]\, e^z = \frac{1}{2\pi i}\oint_C \frac{e^z}{z^{n+1}}\, dz$$
Take $C$ to be a circle of (fixed) radius $r$, and make the substitution $z = re^{i\theta}$, $dz = ire^{i\theta}\, d\theta$:
$$\frac{1}{n!} = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{\exp(re^{i\theta})}{r^n e^{ni\theta}}\, d\theta = \frac{1}{2\pi}\,\frac{1}{r^n}\int_{-\pi}^{\pi} \exp(re^{i\theta} - ni\theta)\, d\theta \leq \frac{1}{2\pi}\,\frac{1}{r^n}\int_{-\pi}^{\pi} e^r\, d\theta = \frac{e^r}{r^n}$$

To obtain the inequality, we bounded $\exp(re^{i\theta} - ni\theta)$ by the maximum of its modulus over $-\pi \leq \theta \leq \pi$; the maximum occurs at $\theta = 0$. Now we may choose any value of $r$ with $0 < r < R$ and thus obtain a bound on $1/n!$; since $e^z$ is entire, this means any $r$ we wish. We'd like to choose the $r$ that gives the best bound, which means the smallest bound. It is a short exercise in calculus to see that this bound is minimized when $r = n$. We have shown that
$$\frac{1}{n!} \leq \frac{e^n}{n^n}, \qquad\text{that is,}\qquad n! \geq \frac{n^n}{e^n}$$
Problem 20.1. Show that over $0 < r < R = \infty$, the minimum value of $e^r/r^n$ occurs at $r = n$.
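As a quick numeric illustration of Problem 20.1 (this check is mine, not from the notes), we can minimize the logarithm of the bound, $r - n\log r$, over a grid of radii and confirm the crude lower bound on $n!$:

```python
from math import exp, factorial, log

n = 20
# log of the bound e^r / r^n is r - n log r; scan radii on a grid
rs = [0.5 * k for k in range(1, 121)]              # r = 0.5, 1.0, ..., 60.0
best = min(rs, key=lambda r: r - n * log(r))
print(best)                                        # the minimizing radius, which is r = n
print(factorial(n) >= (n / exp(1)) ** n)           # the resulting bound n! >= (n/e)^n
```

The function $r - n\log r$ is convex with derivative $1 - n/r$, so the grid search lands exactly on $r = n$.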

better estimate

The weakness in the previous estimate is in the evaluation of the integral. We "estimated" the function $e^z$ over the circle by its maximum value. The problem is that for $\theta$ far from zero, the function is much smaller. Specifically, $\exp(re^{i\theta} - ni\theta)$ is maximized at $\theta = 0$, but drops off rapidly as $|\theta|$ increases. We can't choose a better maximum value than we did, but we can choose a bounding function instead of a single maximum value. Fix some angle $\delta > 0$ (to be determined later) and evaluate separately the integral over $-\delta < \theta < \delta$ and over $\delta < \theta < 2\pi - \delta$. We will show the second integral is negligibly small, and the integrand is dominated by the value $e^r/r^n$ over the first interval. Accordingly, we will fix $r = n$ as before. First integrate from $-\delta$ to $\delta$:
$$I_1 = \frac{1}{2\pi}\,\frac{1}{n^n}\int_{-\delta}^{\delta} \exp(ne^{i\theta} - ni\theta)\, d\theta$$


Expand $ne^{i\theta} - ni\theta$ in powers of $i\theta$ to get $n + n(i\theta)^2/2 + n(i\theta)^3/6 + \cdots$. If we choose $\delta$ so that $n\delta^3 \to 0$, then we have
$$\exp(ne^{i\theta} - ni\theta) \sim \exp(n - n\theta^2/2) \qquad \text{for } |\theta| < \delta \qquad (1)$$
Now we have
$$I_1 \sim \frac{1}{2\pi}\,\frac{e^n}{n^n}\int_{-\delta}^{\delta} e^{-n\theta^2/2}\, d\theta = \frac{1}{2\pi}\,\frac{e^n}{n^n}\,\frac{1}{\sqrt{n}}\int_{-\delta\sqrt{n}}^{\delta\sqrt{n}} e^{-t^2/2}\, dt$$
Were we to integrate over $-\infty < t < \infty$ then the integral would give $\sqrt{2\pi}$. So if we choose $\delta$ such that $\delta\sqrt{n} \to \infty$ then we have
$$I_1 \sim \frac{1}{\sqrt{2\pi n}}\,\frac{e^n}{n^n}$$
Now integrate over $\delta$ to $2\pi - \delta$. We still have $r = n$.
$$|I_2| = \left|\frac{1}{2\pi}\,\frac{1}{n^n}\int_{\delta}^{2\pi-\delta} \exp(ne^{i\theta} - ni\theta)\, d\theta\right| \leq \frac{1}{2\pi}\,\frac{1}{n^n}\int_{\delta}^{2\pi-\delta} e^{n\cos\theta}\, d\theta \leq \frac{e^{n\cos\delta}}{n^n}$$

Now we show that $I_2$ is negligible compared to $I_1$:
$$\begin{aligned}
\lim_{n\to\infty}\frac{|I_2|}{|I_1|} &\leq \lim_{n\to\infty} \sqrt{2\pi n}\,\exp\left(n\cos\delta - n\right) = \lim_{n\to\infty}\sqrt{2\pi n}\,\exp\left(n\left(1 - \frac{\delta^2}{2!} + \frac{\delta^4}{4!} - \cdots\right) - n\right) \\
&= \lim_{n\to\infty}\sqrt{2\pi n}\,\exp\left(-n\frac{\delta^2}{2!} + n\frac{\delta^4}{4!} - \cdots\right)
\end{aligned}$$
Since we already chose $\delta$ so that $n\delta^2 \to \infty$, the limit tends to zero. Putting this all together, we see that
$$\frac{1}{n!} \sim \frac{1}{\sqrt{2\pi n}}\,\frac{e^n}{n^n}, \qquad n! \sim \sqrt{2\pi n}\,\frac{n^n}{e^n},$$
which is Stirling's approximation for $n!$.

There is one detail left. We want $n\delta^3 \to 0$ but also $\delta\sqrt{n} \to \infty$. Can this be done? Yes; it suffices to take, e.g., $\delta = n^{-\beta}$ for $1/3 < \beta < 1/2$. This is not actually a lucky break. It amounts to saying that $n\delta^2 \to \infty$ and $n\delta^3 \to 0$. In other words, in the expansion of equation (1), the terms we dropped were negligible and the terms we didn't drop were not negligible.
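Stirling's approximation is easy to confirm numerically. This is a quick sketch of mine (not from the notes), working with logarithms to avoid overflow; `math.lgamma(n + 1)` is $\log n!$.

```python
from math import exp, lgamma, log, pi

ratios = {}
for n in (10, 100, 1000):
    log_stirling = 0.5 * log(2 * pi * n) + n * (log(n) - 1)   # log of sqrt(2 pi n) (n/e)^n
    ratios[n] = exp(lgamma(n + 1) - log_stirling)             # n! divided by the Stirling estimate
    print(n, ratios[n])
```

The ratio tends to $1$ from above; in fact the full Stirling series gives $n! = \sqrt{2\pi n}\,(n/e)^n\left(1 + \frac{1}{12n} + \cdots\right)$, and the printed values match $1 + 1/(12n)$ closely.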

saddle-point analysis

What we did for the function $e^z$ in order to get an estimate for $n!$ can be applied to other functions as well. The generalization of Stirling's formula is due to Hayman.

Consider some function $f(z)$ with radius of convergence $R$; we wish to know $f_n = [z^n]\, f(z)$. We write $z = re^{i\theta}$ and change variables. So we want to know
$$[z^n]\, f(z) = \frac{1}{2\pi i}\oint_C \frac{f(z)}{z^{n+1}}\, dz = \frac{1}{2\pi}\,\frac{1}{r^n}\int_{-\pi}^{\pi} \exp\left(\log(f(re^{i\theta})) - ni\theta\right) d\theta$$
Our premise is that the integrand is largest along the positive real axis, and decays rapidly as $|\theta|$ grows. We want $f(z)$ to be "large" on the positive real axis and comparatively "small" on the rest of the complex plane. If $f(z)$ is any power series with $[z^n]\, f(z)$ positive real for all $n$, then the maximum will occur along the real axis. For us, $f(z)$ is typically a generating function that counts something, so the coefficients will typically be positive, and our premise seems very reasonable. First we consider the integral over $-\delta \leq \theta \leq \delta$.

We start by expanding:

$$\log(f(re^{i\theta})) \sim \log(f(r)) + \frac{d}{d(i\theta)}\log(f(re^{i\theta}))\Big|_{\theta=0}\,(i\theta) + \frac{1}{2}\,\frac{d^2}{d(i\theta)^2}\log(f(re^{i\theta}))\Big|_{\theta=0}\,(i\theta)^2 = \log(f(r)) + a(r)(i\theta) + b(r)(i\theta)^2/2 \qquad (2)$$

f(reiθ) = exp log f(reiθ) ∼ f(r) exp a(r)(iθ) + b(r)(iθ)2/2 (3)

This expansion defines the functions $a(r)$ and $b(r)$. We are assuming that equation (2) is asymptotically valid, that is, that the remaining terms in equation (2) tend to zero for $|\theta| \leq \delta$. Notice that $a(r)$ and $b(r)$ can be more easily computed as:
$$a(r) = r\,\frac{d}{dr}\log(f(r)) = r\,\frac{f'(r)}{f(r)} \qquad\qquad b(r) = r\,\frac{d}{dr}\left(r\,\frac{d}{dr}\log(f(r))\right) = r\,a'(r)$$
Our integral for small $\theta$ becomes
$$I_1 \sim \frac{1}{2\pi}\,\frac{1}{r^n}\int_{-\delta}^{\delta} \exp\left(\log(f(r)) + a(r)i\theta - b(r)\theta^2/2 - ni\theta\right) d\theta$$
We choose $r$ so as to make the linear term disappear; that is, we let $r = \rho$ where $a(\rho) = n$. Note that in order for there to be a solution with $0 < r < R$, we need $a(r)$ to become arbitrarily large within $0 < r < R$, so we are assuming this also. This gives
$$\begin{aligned}
I_1 &\sim \frac{1}{2\pi}\,\frac{1}{\rho^n}\int_{-\delta}^{\delta} \exp\left(\log(f(\rho)) - b(\rho)\theta^2/2\right) d\theta \\
&= \frac{1}{2\pi}\,\frac{f(\rho)}{\rho^n}\int_{-\delta}^{\delta} \exp\left(-b(\rho)\theta^2/2\right) d\theta \\
&= \frac{1}{2\pi}\,\frac{1}{\sqrt{b(\rho)}}\,\frac{f(\rho)}{\rho^n}\int_{-\delta\sqrt{b(\rho)}}^{\delta\sqrt{b(\rho)}} \exp\left(-t^2/2\right) dt \\
&\sim \frac{1}{2\pi}\,\frac{1}{\sqrt{b(\rho)}}\,\frac{f(\rho)}{\rho^n}\int_{-\infty}^{\infty} \exp\left(-t^2/2\right) dt = \frac{1}{\sqrt{2\pi b(\rho)}}\,\frac{f(\rho)}{\rho^n}
\end{aligned}$$
The final integral has the value $\sqrt{2\pi}$. But in order to get it, we needed to assume that $\delta\sqrt{b(\rho)} \to \infty$. So we certainly should have $b(\rho) \to \infty$. Given this, it is possible to choose $\delta$ small enough to satisfy equation (2) yet large enough to make $\delta\sqrt{b(\rho)} \to \infty$.

Now for the rest of the integral.
$$|I_2| = \left|\frac{1}{2\pi}\,\frac{1}{\rho^n}\int_{\delta}^{2\pi-\delta} \exp\left(\log(f(\rho e^{i\theta})) - ni\theta\right) d\theta\right| \leq \frac{1}{2\pi}\,\frac{1}{\rho^n}\int_{\delta}^{2\pi-\delta} \left|f(\rho e^{i\theta})\right| d\theta \leq \frac{2(\pi-\delta)}{2\pi}\,\frac{\max\left\{\left|f(\rho e^{i\theta})\right| : \delta \leq \theta \leq 2\pi-\delta\right\}}{\rho^n}$$

In order that $I_2$ be negligible, it would suffice to have
$$f(re^{i\theta}) = o\!\left(\frac{f(r)}{\sqrt{b(r)}}\right) \qquad \text{for } \delta \leq \theta \leq 2\pi - \delta$$

So if we (further!) assume this, then our estimate for $I_1$ becomes an estimate for $[z^n]\, f(z)$.

Hayman's theorem

We first collect together the various assumptions we made about our function.

Definition 20.2. Let $f(z)$ have radius of convergence $R$, and suppose there is some $R_0 < R$ with $f(r) > 0$ for $R_0 < r < R$. Let
$$a(r) = r\,\frac{d}{dr}\log f(r) \qquad\text{and}\qquad b(r) = r\,\frac{d}{dr}\, a(r)$$
The function $f(z)$ is Hayman-admissible if
H1. $\lim_{r\to R} a(r) = +\infty$ and $\lim_{r\to R} b(r) = +\infty$;
H2. for some $\delta = \delta(r)$ we have $f(re^{i\theta}) \sim f(r)\exp\left(a(r)(i\theta) - \frac{1}{2}\,b(r)\theta^2\right)$ for $|\theta| \leq \delta$ as $r \to R$;
H3. for $\delta \leq |\theta| \leq 2\pi - \delta$ we have $f(re^{i\theta}) = o\!\left(\dfrac{f(r)}{\sqrt{b(r)}}\right)$.

It might be helpful to compare this to what we did for $e^z$. Various classes of functions are known to be Hayman-admissible. We will not prove the following result.

Proposition 20.3. Let $p(z)$ be a polynomial, and $f(z)$, $g(z)$ Hayman-admissible functions.
• $f(z)g(z)$ and $\exp(f(z))$ are Hayman-admissible;
• $p(f(z))$ is Hayman-admissible, if the leading coefficient of $p(z)$ is positive;
• $\exp(p(z))$ is Hayman-admissible, if $[z^n]\exp(p(z))$ is eventually positive.

The astute reader will have noticed that $e^z$ is Hayman-admissible according to this. But notice that this does not imply that $\exp(z^2)$ is Hayman-admissible, as it is an even function (all the odd coefficients are zero, so the coefficients are not eventually positive). In fact, it is not Hayman-admissible: it takes on large values when $z$ is a negative real number. The result we get is a strong estimate for the coefficients of a Hayman-admissible function.

Theorem 20.4. Let $f(z)$ be a Hayman-admissible function, with $R$, $R_0$, $a(r)$ and $b(r)$ defined as in the previous definition. Let $\rho$ be the solution of $a(\rho) = n$ with $R_0 < \rho < R$. Then
$$[z^n]\, f(z) \sim \frac{1}{\sqrt{2\pi b(\rho)}}\,\frac{f(\rho)}{\rho^n}$$
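The theorem translates directly into a numeric recipe. Here is a small generic sketch (my own code, not from the notes): given $a(r)$ and $b(r)$, solve $a(\rho) = n$ by bisection (valid here since $a(r)$ is increasing) and evaluate the estimate in log form to avoid overflow. We test it on $f(z) = e^z$, where $a(r) = b(r) = r$ and $[z^n]\, e^z = 1/n!$.

```python
from math import exp, lgamma, log, pi

def hayman_log_estimate(log_f, a, b, n, lo=1e-9, hi=1e9):
    """Log of the Hayman estimate f(rho) / (sqrt(2 pi b(rho)) rho^n), where a(rho) = n."""
    for _ in range(200):                 # bisection for the saddle point rho
        mid = (lo + hi) / 2
        if a(mid) < n:
            lo = mid
        else:
            hi = mid
    rho = (lo + hi) / 2
    return log_f(rho) - 0.5 * log(2 * pi * b(rho)) - n * log(rho)

n = 50
est = hayman_log_estimate(lambda r: r, lambda r: r, lambda r: r, n)  # f(z) = e^z
print(exp(est + lgamma(n + 1)))   # estimate times n!, i.e. estimate / (1/n!); close to 1
```

For $f(z) = e^z$ the saddle point is $\rho = n$ and the estimate reduces to the Stirling bound, so the printed ratio is within a fraction of a percent of $1$.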

permutations with only small cycles

Let $f_n$ be the number of permutations of an $n$-set with all cycles of length at most two. Recall that the EGF is given by:
$$F(z) = \sum_n f_n\,\frac{z^n}{n!} = \exp\left(z + z^2/2\right)$$
This is Hayman-admissible by the final criterion given. We compute
$$a(r) = r\,\frac{d}{dr}\log F(r) = r\,\frac{d}{dr}\left(r + r^2/2\right) = r + r^2 \qquad\qquad b(r) = r\,\frac{d}{dr}\,a(r) = r + 2r^2$$

We then solve $a(\rho) = n$ for $\rho$, with $0 < \rho < R = \infty$. We will want an expansion.

$$\begin{aligned}
\rho = \sqrt{n + \frac{1}{4}} - \frac{1}{2} &= \sqrt{n}\left(1 + \frac{1}{4n}\right)^{1/2} - \frac{1}{2} \\
&= \sqrt{n}\left(1 + \frac{1}{8n} + O(n^{-2})\right) - \frac{1}{2} \\
&= \sqrt{n}\left(1 - \frac{1}{2\sqrt{n}} + \frac{1}{8n} + O(n^{-2})\right) = \sqrt{n} - \frac{1}{2} + \frac{1}{8\sqrt{n}} + O(n^{-3/2})
\end{aligned}$$
Now we obtain asymptotic estimates for $F(\rho)$, $b(\rho)$, and $\rho^n$.
$$\begin{aligned}
F(\rho) = \exp\left(\rho + \rho^2/2\right) &= \exp\left(n/2 + \rho/2\right) \\
&= e^{n/2}\exp\left(\frac{\sqrt{n}}{2} - \frac{1}{4} + \frac{1}{16\sqrt{n}} + O(n^{-3/2})\right) \\
&\sim \exp\left(\frac{n}{2} + \frac{\sqrt{n}}{2} - \frac{1}{4}\right)
\end{aligned}$$

$$b(\rho) = \rho + 2\rho^2 = 2n - \rho = 2n - \left(\sqrt{n} - \frac{1}{2} + \frac{1}{8\sqrt{n}} + O(n^{-3/2})\right) \sim 2n$$

$$\begin{aligned}
\rho^n = n^{n/2}\left(1 - \frac{1}{2\sqrt{n}} + \frac{1}{8n} + O(n^{-2})\right)^n &= n^{n/2}\exp\left(n\log\left(1 - \frac{1}{2\sqrt{n}} + \frac{1}{8n} + O(n^{-2})\right)\right) \\
&= n^{n/2}\exp\left(n\left(\left(-\frac{1}{2\sqrt{n}} + \frac{1}{8n}\right) - \frac{1}{2}\left(-\frac{1}{2\sqrt{n}} + \frac{1}{8n}\right)^2 + O(n^{-3/2})\right)\right) \\
&\sim n^{n/2}\exp\left(-\sqrt{n}/2\right)
\end{aligned}$$

We can drop any terms inside the exponential that go to zero, i.e., that involve negative powers of $n$. (Note that the $1/(8n)$ term and the squared term contribute $1/8 - 1/8 = 0$ after multiplying by $n$, which is why only $-\sqrt{n}/2$ survives.) Now we obtain the estimate from Hayman's theorem.
$$[z^n]\exp\left(z + z^2/2\right) \sim \frac{1}{\sqrt{2\pi(2n)}}\,\frac{\exp\left(n/2 + \sqrt{n}/2 - 1/4\right)}{n^{n/2}\exp\left(-\sqrt{n}/2\right)} = \frac{1}{2\sqrt{\pi n}}\,\frac{1}{n^{n/2}}\exp\left(n/2 + \sqrt{n} - 1/4\right)$$

What we really want is $f_n$, which is $n!$ times this. We can use Hayman's theorem in its simplest form (Stirling's approximation) for $n!$ too.
$$\begin{aligned}
f_n &\sim n! \cdot \frac{1}{2\sqrt{\pi n}}\,\frac{1}{n^{n/2}}\exp\left(n/2 + \sqrt{n} - 1/4\right) \\
&\sim \sqrt{2\pi n}\,\frac{n^n}{e^n} \cdot \frac{1}{2\sqrt{\pi n}}\,\frac{1}{n^{n/2}}\exp\left(n/2 + \sqrt{n} - 1/4\right) \\
&= \frac{1}{\sqrt{2}}\, n^{n/2}\exp\left(-n/2 + \sqrt{n} - 1/4\right)
\end{aligned}$$
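We can check this estimate against exact values (a sketch of mine, not from the notes). The numbers $f_n$ satisfy the recurrence $f_n = f_{n-1} + (n-1)f_{n-2}$: the element $n$ is either a fixed point, or swapped with one of the other $n-1$ elements.

```python
from math import log, sqrt

N = 400
f = [1, 1]                                   # f_0 = f_1 = 1
for n in range(2, N + 1):
    f.append(f[n - 1] + (n - 1) * f[n - 2])  # element n is fixed, or in a 2-cycle

def log_estimate(n):
    """Log of (1/sqrt(2)) n^{n/2} exp(-n/2 + sqrt(n) - 1/4)."""
    return -0.5 * log(2) + (n / 2) * log(n) - n / 2 + sqrt(n) - 0.25

gaps = {n: log(f[n]) - log_estimate(n) for n in (50, 200, 400)}
print(gaps)    # log of the ratio exact/estimate; small and shrinking as n grows
```

Working with logarithms sidesteps overflow (the exact values are huge integers, which `math.log` handles directly), and the printed gaps shrink roughly like $n^{-1/2}$, consistent with the next-order correction we discarded.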

general “small”

Our initial determination of $\rho$ depended on being able to solve a quadratic. If we want to enumerate permutations with all cycles of length at most $s$, then we would need to solve
$$n = r\,\frac{d}{dr}\left(r + \frac{r^2}{2} + \frac{r^3}{3} + \cdots + \frac{r^s}{s}\right) = r + r^2 + r^3 + \cdots + r^s$$

Our previous approach won't work. But we can use a "Lagrangean trick"; we'll illustrate with $s = 2$.
$$r + r^2 = n \iff r^2\left(\frac{1}{r} + 1\right) = n \iff r\left(\frac{1}{r} + 1\right)^{1/2} = n^{1/2} \iff \frac{1}{r} = \frac{1}{n^{1/2}}\left(\frac{1}{r} + 1\right)^{1/2}$$
Now let $w = 1/r$ and $y = 1/n^{1/2}$, to get $w = y(w+1)^{1/2}$. We have $\phi(w) = (w+1)^{1/2}$, and we can LIFT.
$$[y^k]\, w = \frac{1}{k}\,[w^{k-1}]\,(w+1)^{k/2}$$
$$w = \sum_{k\geq 1} \left([y^k]\, w\right) y^k = \sum_{k\geq 1} \frac{1}{k}\,[w^{k-1}]\,(w+1)^{k/2}\, y^k = \sum_{k\geq 1} \frac{1}{k}\binom{k/2}{k-1} y^k$$

$$\frac{1}{r} = \sum_{k\geq 1} \frac{1}{k}\binom{k/2}{k-1}\frac{1}{n^{k/2}} = \frac{1}{\sqrt{n}} + \frac{1}{2n} + \frac{1}{8n^{3/2}} + O(n^{-2})$$
$$r = \sqrt{n} - \frac{1}{2} + \frac{1}{8\sqrt{n}} + O(n^{-3/2})$$
In the last line, we found the inverse of the FPS for $r^{-1}$ to get the FPS for $r$, one term at a time. This is our solution, i.e., the value of $\rho$, exactly as before. Now we estimate $F(\rho)$, $b(\rho)$ and $\rho^n$ as above.
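Since for $s = 2$ the quadratic can also be solved exactly, we can sanity-check the LIFT expansion numerically (my own check, not from the notes):

```python
from math import sqrt

diffs = []
for n in (10 ** 2, 10 ** 4, 10 ** 6):
    exact = (sqrt(1 + 4 * n) - 1) / 2            # positive root of r + r^2 = n
    series = sqrt(n) - 0.5 + 1 / (8 * sqrt(n))   # first terms of the LIFT expansion
    diffs.append(exact - series)
    print(n, diffs[-1])                          # O(n^{-3/2}), so it shrinks quickly
```

The differences decay like $n^{-3/2}$, matching the error term in the expansion of $r$.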

exercises

1. Check that $e^z$ is Hayman-admissible, using the definition.
2. a) Check that $\exp\left(z + z^2/2\right)$ is Hayman-admissible, using the claimed classes of Hayman-admissible functions.
   b) Check that $\exp\left(z + z^2/2\right)$ is Hayman-admissible, using the definition. This can be a little technical.
   c) Show that $\exp\left(z^m + z^{m+1}\right)$ is Hayman-admissible if $m \in \mathbb{N}$.
   d) Show that $\exp\left(a_1 z^{m_1} + a_2 z^{m_2}\right)$ is Hayman-admissible if $m_1, m_2 \in \mathbb{N}$ with $\gcd(m_1, m_2) = 1$ and $a_1, a_2 > 0$.
   e) Show that $\exp\left(a_1 z^{m_1} + a_2 z^{m_2} + p(z)\right)$ is Hayman-admissible if $\gcd(m_1, m_2) = 1$ and $a_1, a_2 > 0$ and $p(z)$ is some polynomial with positive coefficients.
3. Determine the asymptotic number of permutations all of whose cycles have length either two or three. Solve for $\rho$ using the "Lagrangean trick".
