THE FLORIDA STATE UNIVERSITY

COLLEGE OF ARTS AND SCIENCES

THE HIGGS BOSON:

THE SEARCH FOR THE STANDARD MODEL HIGGS BOSON AND

INVESTIGATION OF ITS PROPERTIES

By

JOSEPH P. BOCHENEK

A Dissertation submitted to the Department of Physics in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Degree Awarded: Summer Semester, 2013

Joseph P. Bochenek defended this dissertation on May 23, 2013.

The members of the supervisory committee were:

Harrison B. Prosper

Professor Directing Dissertation

Michael Ruse

University Representative

Andrew Askew

Committee Member

Takemichi Okui

Committee Member

Nicholas Bonesteel

Committee Member

The Graduate School has verified and approved the above-named committee members, and certifies that the dissertation has been approved in accordance with the university requirements.

ACKNOWLEDGMENTS

I want to thank all of the people at the FSU High Energy Physics group who helped me along the way. To my advisor Harrison Prosper, whose love for ideas and enthusiasm for science always served as a reminder for why we do physics in the first place. Thanks to

Nicola De Filippis and Kurtis Johnson for many interesting discussions, late nights at work, and for sharing many delicious pizzas with me in Florida, Italy, and Switzerland. I am grateful to Pushpalatha Bhat for her ideas and support while I was at CERN. Thanks also to my mother, who instilled a fascination with science from an early age, and my father, for giving me a healthy sense of skepticism, which is indispensable in conducting good research. Also, thanks to my friends on the fifth floor for making things fun and interesting over coffee in the day and beers at night: Nobuo, Karoline, Thomas, Brendan, and everyone else. Thanks especially to Dianna, without whose support, encouragement, and love I would never have made it this far.

TABLE OF CONTENTS

List of Tables ...... vii
List of Figures ...... viii
List of Symbols ...... xiii
Abstract ...... xiv

1 Introduction 1
1.1 Theory ...... 2
1.1.1 Introduction to the Standard Model ...... 2
1.1.2 Anatomy of the Standard Model ...... 4
1.2 Symmetries ...... 6
1.3 Gauge Invariance ...... 8
1.3.1 Example: Scalar Electrodynamics ...... 8
1.4 The Standard Model Symmetry Group ...... 11
1.5 The Higgs Field ...... 14
1.5.1 The Higgs Mechanism ...... 14
1.5.2 Spontaneous Symmetry Breaking ...... 17
1.5.3 Standard Model Higgs Field ...... 19
1.5.4 Hierarchy Problem and Other Issues ...... 22
1.6 Higgs Properties ...... 23
1.6.1 Decay Width ...... 23
1.6.2 Higgs Boson Self Coupling ...... 24
1.6.3 Couplings ...... 24
1.6.4 Spin and Parity ...... 28

2 Experimental Apparatus 32
2.1 The Large Hadron Collider ...... 32
2.2 The Compact Muon Solenoid ...... 33
2.2.1 Tracking System ...... 38
2.2.2 Electromagnetic Calorimeter ...... 39
2.2.3 Hadronic Calorimeter ...... 40
2.2.4 Muon System ...... 41
2.2.5 Trigger ...... 43

3 Objects 46
3.1 Tracks ...... 46
3.2 Particle Flow ...... 48
3.3 Vertices ...... 49
3.4 Electrons ...... 50
3.4.1 Electron Reconstruction ...... 50
3.4.2 Electron Energy Measurement ...... 51
3.4.3 Electron Identification ...... 55
3.5 Muons ...... 57
3.5.1 Muon Reconstruction ...... 57
3.5.2 Muon Momentum Measurement ...... 57
3.5.3 Muon Identification ...... 58
3.6 Lepton Isolation ...... 58
3.6.1 Pileup Correction ...... 59
3.7 Jets ...... 60
3.7.1 Jet Reconstruction ...... 60
3.7.2 Jet Identification ...... 61
3.7.3 Jet Energy Correction ...... 61
3.8 Photons ...... 62
3.8.1 Reconstruction, Identification, and Isolation ...... 62

4 Analysis 63
4.1 Analysis Overview ...... 63
4.2 Datasets ...... 65
4.3 Event Selection ...... 69
4.3.1 Analysis Triggers ...... 69
4.3.2 Primary Vertices ...... 69
4.3.3 Leptons ...... 69
4.3.4 Event Criteria ...... 71
4.3.5 FSR Recovery ...... 73
4.4 Signal and Background Models ...... 74
4.4.1 Signal Models ...... 74
4.4.2 Irreducible Background ...... 75
4.4.3 Reducible Background ...... 75
4.5 Lepton Efficiency ...... 82
4.6 Event-by-Event Uncertainty ...... 85
4.7 Multivariate Discriminant ...... 89
4.7.1 Introduction ...... 89
4.7.2 Classifiers ...... 90
4.7.3 Training Events ...... 96
4.7.4 BNN Training ...... 98
4.7.5 Discriminant Performance ...... 101
4.8 Systematic Uncertainties ...... 105
4.8.1 Systematic Uncertainty Priors ...... 105
4.8.2 Summary of Systematic Uncertainties ...... 107

5 Results and Interpretation 113
5.1 Results ...... 113
5.2 Statistical Analysis ...... 113
5.3 Likelihood Function ...... 117
5.4 Nuisance Parameters ...... 119
5.5 Discovery, Parameter Estimation, and Hypothesis Testing ...... 120
5.5.1 Discovery Significance ...... 121
5.5.2 Parameter Estimation ...... 122
5.5.3 Spin/Parity Hypothesis Testing ...... 123
5.6 Measurements ...... 124
5.6.1 Discovery Significance ...... 124
5.6.2 Higgs Mass and Couplings ...... 124
5.6.3 Higgs Spin and Parity ...... 126
5.7 Conclusion ...... 126

Appendices 128

A Marginalization of Statistical Uncertainties 128

B Priors 133

References ...... 139
Biographical Sketch ...... 145

LIST OF TABLES

1.1 Fields of the standard model and their group symmetries [1] ...... 13

1.2 Coupling modifiers and signal strength modifiers for decay channels which are observable with 2012 data at the LHC. The µ are measured separately for each channel and can then be used to constrain the global κ parameters...... 28

3.1 The five track parameters that define the helical pattern of a charged particle traversing the tracker. To fully define the trajectory we need a position on the path of the particle, the charge-to-momentum ratio (which determines the radius of the helix) of the particle, and the angle of the momentum of the particle with respect to the magnetic field. ...... 47

4.1 Data sets and triggers used in the analysis...... 67

4.2 Triggers in 2012 data analysis...... 68

4.3 Monte Carlo simulation data sets used for the signal and background processes. Z stands for Z, Z∗, γ∗; ℓ means e, µ or τ; V stands for W and Z...... 68

4.4 Parameters used to define neural network functions and hyperparameters in the BNNs...... 98

4.5 A summary of the systematic uncertainties affecting the overall scale in 7 TeV data listed in terms of κ value...... 110

4.6 Systematic uncertainties affecting the overall scale in 8 TeV data listed in terms of κ value...... 110

LIST OF FIGURES

1.1 Two dimensional example of Higgs potential. The red circle signifies the degenerate space of minimum energy. ...... 16

1.2 Predicted production cross section (right) and branching ratio (left) for a standard model Higgs boson produced in proton-proton collisions at the LHC [2]. ...... 24

1.3 Higgs production mechanisms at the LHC: gluon-gluon fusion (right), vector boson fusion (left). ...... 26

1.4 Angles that characterize the decay of a resonance to two Z bosons [3]. ...... 30

2.1 Cross section of the LHC beam pipe (left) and a map of the magnetic field produced by the superconducting dipole magnets (right)...... 33

2.2 Aerial view of the land above the LHC with an illustration of the accelerator ring and the location of each detector. (© CERN) ...... 34

2.3 Outline of the LHC accelerator complex, which consists of a chain of accelerators yielding protons with progressively higher energies. The acceleration chain starts with the LINAC, where protons are first accelerated to an energy of 50 MeV. The Proton Synchrotron Booster (PSB) increases the energy to 1.4 GeV, followed by the Proton Synchrotron (PS), which yields protons with energy 26 GeV, and then the Super Proton Synchrotron (SPS), yielding protons of 450 GeV. After the SPS, protons are injected into the main LHC ring. ...... 35

2.4 A cartoon diagram highlighting the main components of the CMS detector. . 36

2.5 Diagram of the silicon tracking system in the r-z plane of one quarter of the CMS detector. The dark blue lines in the lower right corner, closest to the interaction point, represent the pixel tracker. The purple and light blue lines represent the single and double sided strips, respectively. Horizontal lines correspond to layers in the barrel, while vertical lines represent strips in the endcap...... 38

2.6 The layout of the components of the electromagnetic calorimeter (ECAL) showing the barrel, the two endcaps, and the preshower. Crystals are grouped

viii together into modules and supermodules in the barrel, and into supercrystals and dees in the endcaps...... 40

2.7 A cross sectional view of the CMS detector highlighting the trajectory of a muon as it passes through the muon system. It leaves readings in four layers (or “stations”) of the muon chamber. ...... 42

2.8 A diagram showing the cross section of a drift tube (DT) in the muon system. A map of the electric field and electric potential created by the anode wire and cathode strips is shown...... 43

2.9 The layout of the CMS muon system, showing drift tubes (DTs), cathode strip chambers (CSCs) and resistive plate chambers (RPCs). The view is in the longitudinal plane, with the center of the detector in the bottom left. . . 44

2.10 Luminosity recorded by the CMS experiment for 2010, 2011 and 2012 proton physics runs...... 45

3.1 View of primary vertex (red) and secondary vertices (blue) in the transverse plane at the interaction point. Shaded areas indicate jets. ...... 50

3.2 Expected resolution for reconstructed energy in the ECAL barrel for the estimate made from either the ECAL supercluster energy, the tracker momentum measurement, or the combined measurement. The blue triangles are the absolute interval containing 68% of the probability, whereas the red circles assume a Gaussian shape. ...... 52

3.3 Number of interactions per bunch crossing in the 2012 run at the LHC. ...... 60

4.1 Events surviving for signal and background processes following each stage in the event selection. Events are for 8 TeV events in the 4e channel. ...... 73

4.2 Distribution of the 3D Impact parameter and the particle flow isolation before the event selection. Observed data are shown in black and simulated events are shown as colored histograms, which are stacked...... 74

4.3 The effect of final state interference which is not modeled in signal simulation. Top: Z1 and Z2 mass before and after reweighting events to include these effects. Bottom: Magnitude of interference event weights...... 76

4.4 Distribution of the four lepton mass, m4l, for the 4e, 4µ, and 2e2µ channels combined...... 77

4.5 The invariant mass of the selected Z boson and the loose lepton selected for fake rate calculation. The plot shows events with zero layers with missing hits in the inner tracker, and with one layer with missing hits. For events with n_missing hits = 1, the three-lepton mass has a prominent peak at the Z mass, demonstrating the presence of leptons from converted photons. ...... 79

4.6 The probability for loose electrons (left) or muons (right) to pass the selection for tight leptons, as a function of pT. ...... 80

4.7 Events in the reducible background control region for 4e, 4µ, and 2e2µ final states combined...... 80

4.8 Estimate for reducible background in the signal region plotted separately for the final states 4e, 4µ, and 2e2µ...... 81

4.9 Tag and probe fits used to measure electron identification efficiency. On the left is the di-lepton mass for events where the probe electron passes the selection criterion, and on the right is the di-lepton mass for events with probes failing the selection. The signal (green) and background (red) fits are shown for the failing-probe plot. ...... 82

4.10 Electron reconstruction efficiencies in the Barrel in 7 TeV data...... 84

4.11 Electron identification efficiencies for the BDT electron identification in the Barrel in 7 TeV data...... 84

4.12 Muon efficiencies for Impact Parameter Significance cut (top left), Particle Flow Isolation vs. vertex (top right)...... 86

4.13 Muon efficiencies for Particle Flow Isolation vs. pT in the barrel (top left), and Particle Flow Isolation vs. pT in the endcap (bottom right)...... 86

4.14 Model of a single layer feedforward neural network (graphic from [4]). . . . . 93

4.15 Angular decay variables used as input variables for the BNNs. Differences between signal and background can be seen for the decay angles cos(θ*), cos(θ1), and cos(θ2), whereas differences between signal production mechanisms can be seen for the production angle, φ*. Higher order correlations can be seen in the correlation matrix. ...... 97

4.16 Additional input variables for the signal/background BNN: Z1 and Z2 masses normalized to show the difference in shape of the distributions for the signal and background processes. ...... 98

4.17 BNN discriminant and signal/background efficiency for a BNN trained with five production and decay angles as well as m_Z, m_Z*, and m_4l. BNNs are trained separately for 4e (right), 4µ (center), and 2e2µ (left). Training is done with fully reconstructed Monte Carlo events at center of mass energy 7 TeV (top) and 8 TeV (bottom). ...... 100

4.18 Input variables for VBF/ggH BNN: mjj and ∆ηjj, normalized to show the difference in shape of the distributions for the VBF, ggH, and ZZ processes. 101

4.19 Discriminating variables used to separate Higgs production mechanisms: pT,4l and BNN_VBF, normalized to show the difference in shape of the distributions for the VBF, ggH, and ZZ processes. ...... 102

4.20 Receiver operator characteristic plots comparing the performance of multivariate discriminants. On the left is the signal/background BNN compared to a matrix element likelihood method (for the 2e2µ channel, √s = 8 TeV). On the right is the ggH/VBF BNN compared to a linear discriminant. ...... 102

4.21 Overtraining test: comparison of the performance of the BNN on signal and background event samples for the channels 2e2µ (top left), 4µ (top right), and 4e (bottom). The Kolmogorov-Smirnov test (displayed as “KS test”) indicates good compatibility between testing and training sets. The plotted signal sample is the 125 GeV Higgs signal Monte Carlo. Both signal and background are for √s = 8 TeV MC. ...... 103

4.22 Correlation matrix for BNN input variables for gg → H signal (with mH = 125 GeV) and qq → ZZ background. ...... 104

4.23 The effect of all systematic uncertainties on the signal (mH = 125 GeV) m4l distribution and BNN distribution...... 112

5.1 Comparison of observed and simulated data for the Z1 and Z2 distributions after event selection...... 114

5.2 Comparison of observed and simulated data for the m4l and pT,4l distributions after event selection. ...... 114

5.3 Two dimensional histograms in m4l and D(x) (signal/background BNN dis- criminant) for Higgs events (mH = 125 GeV) on the left, and ZZ background events on the right. Data are shown in black triangles and all final state channels are combined...... 115

5.4 Comparison of observed and simulated data for the VBF discriminant input variables, mjj and ∆ηjj, after event selection. ...... 115

5.5 Comparison of observed and simulated data for the BNN distributions. The signal/background BNN is shown on the left, and the ggH/VBF discriminant is shown on the right...... 116

5.6 Schematic of the marginalization of systematic uncertainties in the hierarchical model...... 121

5.7 The pull of the µ and mH distributions measured by Eq. 5.13. A normal distribution is fit to the histogram. The plot shows the integral of the fit (p0), the mean of the normal distribution (p1), and the standard deviation of the fit (p2). ...... 123

5.8 The observed significance as calculated from the Bayes factor for an observation at a given Higgs mass and signal strength modifier, with √s = 8 TeV data. ...... 125

5.9 Likelihood scan of mH versus a global signal strength modifier, µ, which scales all signal production mechanisms (left). The likelihood scan for separate modifiers for the VBF production mechanism and gluon fusion mechanism, ggH, is shown on the right. ...... 125

A.1 A single bin likelihood consisting of either a Poisson distribution or a Poisson-gamma distribution. For the Poisson-gamma likelihood, the effect of marginalizing with respect to the gamma prior is to “smear” the distribution, accounting for the uncertainty introduced by measuring the Poisson mean with a side-band experiment. ...... 130

LIST OF SYMBOLS

The following is a short list of symbols which are used throughout the document.

π: 3.1415926...
L: Integrated luminosity collected at the LHC
E/T: Missing transverse energy
GeV: Giga electron volt
TeV: Tera electron volt
fb−1: Inverse femtobarns, a measure of time-integrated luminosity
pT: Transverse momentum in the lab frame
η: Pseudorapidity

ABSTRACT

This dissertation presents the search for the standard model Higgs boson and initial measurement of its properties using the Compact Muon Solenoid experiment at the Large

Hadron Collider. The Higgs boson is the last predicted particle of the standard model of particle physics. Its existence was predicted in the 1960s and it has been the subject of several large physics experiments. In July of 2012, experiments at the Large Hadron Collider

(LHC) announced the discovery of a new particle with mass around 125 GeV. Using data from the 2011 and 2012 runs of the LHC we show the evidence for this new particle. While the discovery of this new particle is encouraging, its exact identity can only be confirmed by precision measurements of its fundamental properties. This dissertation concentrates on the decay of the Higgs boson to two Z bosons, each of which in turn decays to two observable electrons or two muons. We conduct a search for the Higgs boson in this channel, and then perform an initial measurement of the mass, couplings, and tensor structure of the newly discovered particle. The properties are shown to be consistent with a standard model Higgs boson, and the precision of the measurements of its parameters as of 2013 are presented.

CHAPTER 1

INTRODUCTION

The Large Hadron Collider (LHC) is the largest scientific experiment ever constructed

[5]. It is a particle accelerator located at the European Organization for Nuclear Research

(CERN) in Geneva, Switzerland, built with the purpose of stretching the bounds of human understanding regarding the smallest scales that are experimentally accessible. The LHC does this by colliding protons together at the highest energies ever achieved. The aftermath of these collisions is studied by four experiments in order to test fundamental theories of matter, energy, and force.

The Compact Muon Solenoid (CMS) detector is one of two general purpose experiments at the LHC and is the one considered in this dissertation [6]. The CMS experiment was designed to test a large number of theoretical models, a primary one being the standard model of particle physics (SM). The last particle predicted by the SM is the Higgs boson.

This particle, proposed over fifty years ago, provides a mechanism for electroweak symmetry breaking, resolving a theoretical problem with the standard model, and describing how fundamental particles, such as electrons, obtain mass. The existence of this boson can be tested at the CMS experiment in multiple ways, most excitingly by observing the products of its decay to two photons (H → γγ) or to four leptons (H → ZZ → 4l). While the former has slightly higher sensitivity for discovery, the latter offers greater potential for probing the properties of the particle. On July 4th, 2012, CERN announced the discovery of a new boson at the LHC whose properties are consistent with that of a Higgs boson [7] [8].

However, determining the exact nature of the particle will require precision measurements.

Such measurements will be a major subject of research in the particle physics community for years to come.

The goal of this dissertation is to present the measurement of the properties of the newly observed particle using the data available at CMS as of 2013. The results are used to test the compatibility of the particle with that predicted by the standard model. Any deviation from the predicted standard model properties could be evidence of physics beyond the standard model and clues for building a more fundamental theory of nature.

1.1 Theory

In this section, I outline the standard model, interjecting historical context where possi- ble. I will end by connecting the theoretical introduction to the search for the Higgs boson and the measurement of the properties of the newly discovered particle.

1.1.1 Introduction to the Standard Model

The standard model of particle physics is our best attempt to explain the composition of the universe on a fundamental level. It does so by positing the existence of fields, which are mathematical structures that occupy every point in space and time. Classical fields, such as gravity or magnetism, are familiar to most people from basic physics. For example, gravity is represented as a field by assigning a tensor to points in space, whereas magnetism is represented by assigning a vector at each point in space. A quantum field theory defines

fields as well as the rules by which fields interact at different points in space and time.

Quantum fields are so named because they must also obey the known rules of quantum mechanics, in addition to classical symmetries. A consequence of imposing the rules of quantum mechanics on a field theory is that fields manifest themselves not as abstract mathematical structures spread out continuously throughout space, but as particles such as electrons and photons. Thus, as an experimental science, particle physics strives

to observe particles, and as a theoretical science, particle physics strives to explain these particles in terms of mathematical structure, which, for the most part, means finding a quantum field theory that describes the particles that are observed. Since fields and particles are phenomena of the same underlying theory, the terms “particle” and “field” will often be interchanged in this introduction. However, strictly speaking, fields are presumed to be the fundamental quantities, of which particles are the lowest-energy quanta.

To be consistent with known physics, a field must obey certain mathematical and physical symmetries. These symmetries come in two general forms, which I will describe as external and internal. External symmetries are those which are known to be true from our observation of the universe. An example of this is translational invariance: the requirement that fields obey the same equations if we move everything uniformly in space, or, equivalently, if we move the coordinate system. Other symmetries are rotational invariance and invariance under boosts, which connect different inertial reference frames. All of these symmetries are contained in a more general symmetry group called the Poincaré group, with the latter two comprising the Lorentz group. Since we know these symmetries to be true from classical physics, any quantum field theory must be consistent with these symmetries, and any model we posit for the universe must obey these symmetry constraints.

Despite the restrictions of external symmetries, there are still a large number of possible theories, and we must rely on experiment to guide us to the correct theory. This leads to the second kind of symmetry, which I have referred to as internal symmetries. These are symmetries inside a given theory that are apparently true from experimental particle physics and allow us to connect particles and forces to a unified underlying theory. An internal symmetry manifests itself when one or more fields or particles can be exchanged without any cost of energy. The existence of such symmetries has allowed theorists to make sense of the zoo of particles that appeared in accelerator experiments throughout the second half of the twentieth century.

The culmination of this effort, and the triumph of particle physics, is the standard model of particle physics, which accurately predicts all of the particles observed thus far.

The standard model is a list of fields along with their interactions, which comprise (almost) all known particles and forces. In particular, it describes the electromagnetic, weak, and strong interactions, which are described by the fields of quarks, leptons, gauge bosons, and the Higgs field. The SM falls short as a fundamental theory of nature in that it does not describe the gravitational interaction. It also fails to explain the cosmological phenomena of dark matter and dark energy. Because of the outstanding success of the standard model on the scale of contemporary physics experiments, but the apparent failure at gravitational and cosmological scales, it is widely believed that the standard model is embedded in a more general theory, or that it is an effective field theory at low energies. In this case small deviations would begin to appear as we experimentally reach higher energy scales.

1.1.2 Anatomy of the Standard Model

The number of particles observed grew throughout the first part of the 20th century.

By studying the properties of these particles, physicists were able to classify particles and interactions and were able to formulate a relatively simple mathematical model to describe them. This model has successfully predicted the existence of several particles which were later verified experimentally, including the charm, bottom, and top quarks, and possibly the

Higgs boson. Additionally, the model has been tested to incredible precision at SLC, LEP,

HERA, the Tevatron, and currently at the LHC. All elementary particles are described by a set of numbers that determine their physical properties and their interactions. I begin my description of the standard model by arbitrarily picking one of these properties - spin. Spin, or intrinsic angular momentum, is a quantum mechanical property. All particles that have been observed in nature (with the exception of the newly discovered particle at the LHC)

have spin 1 or 1/2. Particles with integer spin obey Bose-Einstein statistics and are called bosons, while particles with half-integer spin obey Fermi-Dirac statistics, which leads to the Pauli exclusion principle, and are called fermions.

Fermions can be classified by their interactions into two categories: leptons or quarks.

The first category, leptons, interact via the electroweak interaction only, while quarks interact both via the electroweak interaction and via the strong interaction. Leptons can be further classified by their interactions, with some interacting electromagnetically and some only by the weak interaction. There are six known leptons. But these fermions interact differently depending on their helicity, that is, their handedness, which is the direction of their spin relative to their motion. Fermions are said to have right-handed helicity if the direction of their spin and momentum is the same, and left-handed helicity otherwise. It turns out that fermions interact via the weak force only if they are left-handed. Therefore, left-handed fermions form doublets of fields (they can transform between each other at no cost except for the difference in their mass), whereas right-handed fermions form singlets that do not transform via the weak force. We will see the physical significance of such groupings of fields as we introduce the symmetry groups of the standard model.

There are six leptons, which we list as three doublets and six singlets:

$$\begin{pmatrix} \nu_e \\ e \end{pmatrix}_L,\; e_R,\; \nu_{eR},\qquad \begin{pmatrix} \nu_\mu \\ \mu \end{pmatrix}_L,\; \mu_R,\; \nu_{\mu R},\qquad \begin{pmatrix} \nu_\tau \\ \tau \end{pmatrix}_L,\; \tau_R,\; \nu_{\tau R}. \qquad (1.1)$$

In addition to these particles, each particle also has an antiparticle. An antiparticle is a particle with the same mass but opposite electric charge to its counterpart. More specifically, a particle is identical to its antiparticle when transformed via charge conjugation, parity reversal, and time reversal. The antiparticle of the electron is denoted ē. The lower entries in each doublet are electrically charged while the upper entries, the neutrinos, are electrically neutral. The doublets introduced above are structurally identical and can be described by group theory. We can say that they transform the same as the elements of an SU(2) Lie group, which I will describe in more detail later.

5 There are also six quarks (called up, down, charm, strange, top and bottom), which can be similarly grouped:

$$\begin{pmatrix} u \\ d \end{pmatrix}_L,\; u_R,\; d_R,\qquad \begin{pmatrix} c \\ s \end{pmatrix}_L,\; c_R,\; s_R,\qquad \begin{pmatrix} t \\ b \end{pmatrix}_L,\; t_R,\; b_R, \qquad (1.2)$$

and their antiparticles.

Bosons are particles that mediate interactions between leptons and quarks. The exchange of bosons communicates force from one particle to another. The spin-1 boson fields are known as “gauge fields” for reasons that will become clear later. The lepton and quark doublets, listed above, can transform by emitting a gauge boson. These bosons are as follows: there is the B_µ field, which is the generator of the U(1) symmetry group, and three fields, W^1, W^2, and W^3, which correspond to the generators of the SU(2) symmetry group, which govern electroweak interactions. The strong force is mediated by an octet of bosons, G^{1,...,8}, which correspond to the eight generators of the SU(3) group.

The final prediction of the standard model is the Higgs boson, a spin zero particle whose mass is a free parameter (although subject to some constraints). The subject of this dissertation is the search for, and identification of, the Higgs boson using the CMS detector and the four-lepton decay channel. We will explore the properties of the Higgs at the end of this chapter.

1.2 Symmetries

A quantum field is a mathematical structure which assigns numbers or tensors to all points in four-dimensional space-time. Thus, a field can be written as a four-dimensional function, Φ(x⃗, t). All information about a quantum field theory, including its dynamics in space and its interactions with other particles, is represented by its Lagrangian, which is a function of Φ and its first derivatives. A symmetry is a property of a system which stays the same even when the system is changed. For example, if you draw a perfect circle on a piece

of paper, you can rotate the piece of paper and the circle is still the same. Thus, a circle is invariant under rotation about the axis through its center, perpendicular to the paper. We observe certain symmetries in nature. As the Lagrangian describes the physics of fundamental fields, it must be constrained to obey the same symmetries that are observed in nature, such as translational and rotational invariance, and Lorentz invariance. For example, if we move an entire field in space by some constant, x → x + δx, which is equivalent to changing the origin of our coordinate system, then the Lagrangian should be the same, δL = 0. Similarly, the Lagrangian is a Lorentz scalar, meaning that it is invariant under all rotations in space-time. Our fundamental theories must obey the symmetries that are observed in natural phenomena. I call such symmetries external symmetries.
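To make the translation example concrete (a standard one-line check, included here as an illustrative sketch): under an infinitesimal constant shift,

$$x^\mu \to x^\mu + a^\mu, \qquad \delta\phi = a^\mu\partial_\mu\phi, \qquad \delta\mathcal{L} = a^\mu\partial_\mu\mathcal{L} = \partial_\mu\left(a^\mu\mathcal{L}\right),$$

the change in the Lagrangian is a total derivative, so the action S = ∫d⁴x L, and hence the equations of motion, are unchanged; by Noether's theorem the conserved quantities associated with this symmetry are energy and momentum.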

The field is a fundamental physical concept, but it is not directly observable. The observable quantities such as the charge or momentum of a field are represented by its

Lagrangian. A field can change as long as its Lagrangian, and thus its observable quantities, stay the same. We also require that adding a global phase to our field does not change the

form of the Lagrangian. For example, if we add an arbitrary phase to a field, φ → φ′ = e^{iθ}φ, the Lagrangian,

$$\mathcal{L} = \frac{\partial\phi^\dagger}{\partial x_\mu}\frac{\partial\phi}{\partial x^\mu} - m\phi^\dagger\phi = \partial_\mu\phi^\dagger\,\partial^\mu\phi - m\phi^\dagger\phi, \qquad (1.3)$$

transforms as,

$$\mathcal{L}' = \partial_\mu(e^{i\theta}\phi)^\dagger\,\partial^\mu(e^{i\theta}\phi) - m(e^{i\theta}\phi)^\dagger(e^{i\theta}\phi) \qquad (1.4)$$
$$\;\; = \partial_\mu\phi'^\dagger\,\partial^\mu\phi' - m\phi'^\dagger\phi', \qquad (1.5)$$

and is therefore said to be invariant with respect to a complex phase change.

We can identify further symmetries of a Lagrangian by positing the existence of multiple copies of the same field. If the Lagrangian of a theory is invariant when fields are exchanged according to group theoretic operations, then the fields form a symmetry group. For example,

7 quarks come in three copies called “colours,” and the standard model Lagrangian is invariant when exchanging the quarks according to the SU(3) symmetry group. A consequence is that the Lagrangian is not only invariant under the exchange of particles, but also under the mixing of fields, or replacing a single field with a linear combination of fields according to the appropriate Lie group. Exchanging fields in such a way is known as a group transformation.

Such symmetries are essential in defining interactions in the standard model, as I will outline in the next section. These symmetries arise when one tries to describe the results of experiments with quantum field theories. I call these internal symmetries.

1.3 Gauge Invariance

A less obvious symmetry of the Lagrangian is known as local gauge invariance. In this case, we extend the global phase transformation to act differently at each point in space-time. That is, we require that the Lagrangian be invariant under a group transformation at a particular point in space, which may be a different transformation at another point in space. This transformation has the form,

$$\phi \to e^{i\theta_a(x)T_a}\,\phi, \qquad (1.6)$$

where T_a denotes matrices which are representations of the relevant symmetry group, and θ_a(x) are scalars which depend on location in space-time. This transformation, in which the symmetry transformations depend on the space-time coordinate x, is called a local gauge transformation. Note that if θ_a(x) → θ_a, then this becomes a global phase transformation as discussed above.

1.3.1 Example: Scalar Electrodynamics

To demonstrate this idea, consider the case of scalar electrodynamics [9]. If we start with the complex scalar field, φ(x), given in Eq. 1.5, and require a local U(1) gauge symmetry

(only one complex field which is invariant to phase rotation), the corresponding local gauge

8 transformation is,

$$\phi(x) \to e^{-ie\Gamma(x)}\phi(x), \qquad (1.7)$$
$$\phi^\dagger(x) \to e^{ie\Gamma(x)}\phi^\dagger(x).$$

However, notice that the Lagrangian is not invariant under this transformation. Now we do something crazy, whose utility and meaning will become apparent later. If we replace the derivative in the kinetic term of Eq. 1.5 with a different operator, D_µ,

$$\partial_\mu \to D_\mu \equiv \partial_\mu + ieA_\mu, \qquad (1.8)$$

where A_µ is any vector field that transforms as,

$$A_\mu(x) \to A_\mu(x) - \partial_\mu\Gamma(x), \qquad (1.9)$$

then our Lagrangian becomes,

$$\mathcal{L} = D_\mu\phi^\dagger\, D^\mu\phi - m\phi^\dagger\phi \qquad (1.10)$$
$$\;\; = \partial_\mu\phi^\dagger\,\partial^\mu\phi - e^2 A_\mu A^\mu\,\phi^\dagger\phi + ieA^\mu\left[(\partial_\mu\phi^\dagger)\phi + \phi^\dagger\partial_\mu\phi\right] - m\phi^\dagger\phi.$$

The Lagrangian is now invariant under the local gauge transformation [9]. The derivative D_µ is called the covariant derivative.
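It is worth verifying this claim explicitly (a minimal sketch; here I take the field phase as φ → e^{ieΓ(x)}φ so that it compensates the shift of A_µ in Eq. 1.9, the overall sign of Γ being a convention). The covariant derivative of the field then transforms like the field itself:

$$D'_\mu\phi' = \partial_\mu\left(e^{ie\Gamma}\phi\right) + ie\left(A_\mu - \partial_\mu\Gamma\right)e^{ie\Gamma}\phi = e^{ie\Gamma}\left(\partial_\mu + ieA_\mu\right)\phi = e^{ie\Gamma}D_\mu\phi,$$

so the phases cancel between D_µφ and its conjugate, and the kinetic term D_µφ†D^µφ is manifestly invariant.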

However, notice that in redefining the derivative to satisfy local gauge invariance, we have introduced a new field, Aµ, into the Lagrangian. In the case of a group of fields, symmetric under some symmetry group SU(N), we would have to introduce a new vector

field for each generator of the symmetry group. The new field is massless and interacts with the scalar field via a term in the expansion of the covariant derivative. Fields that are introduced by requiring local gauge invariance are called gauge fields. The general case of a gauge theory based on a symmetry group is called a Yang-Mills theory.

Notice that the gauge field doesn't yet have a kinetic term describing its free-field dynamics. If we try to add a term similar to the kinetic term of the scalar field, ∂^µA_µ, we find that it is not invariant under the gauge transformation, Eq. 1.9. However, we can define a “field strength” tensor, F_{µν} = ∂_µA_ν − ∂_νA_µ, which is invariant under this transformation. Since F_{µν} is invariant, the term F_{µν}F^{µν} is also invariant. Our Lagrangian then becomes,

$$\mathcal{L} = D_\mu\phi^\dagger\, D^\mu\phi - m\phi^\dagger\phi - F_{\mu\nu}F^{\mu\nu}. \qquad (1.11)$$
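The gauge invariance of F_µν asserted above follows in one line, since partial derivatives commute:

$$F'_{\mu\nu} = \partial_\mu\left(A_\nu - \partial_\nu\Gamma\right) - \partial_\nu\left(A_\mu - \partial_\mu\Gamma\right) = F_{\mu\nu} - \left(\partial_\mu\partial_\nu - \partial_\nu\partial_\mu\right)\Gamma = F_{\mu\nu}.$$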

Building on this with a more realistic example, consider the case of electrodynamics. The

Dirac equation describes a free charged field such as an electron field, Ψ:

$$\mathcal{L} = \bar\Psi\left(i\gamma^\mu\partial_\mu - m\right)\Psi. \qquad (1.12)$$

If we require local gauge invariance, we need the Lagrangian to be invariant under the transformation,

$$\Psi \to e^{-i\theta(x)}\Psi. \qquad (1.13)$$

Again this necessitates the introduction of a covariant derivative,

$$D_\mu \equiv \partial_\mu + ieA_\mu, \qquad (1.14)$$

which yields the Lagrangian of quantum electrodynamics,

$$\mathcal{L}_{QED} = \bar\psi\left(i\gamma^\mu D_\mu - m\right)\psi - \frac{1}{4}F_{\mu\nu}F^{\mu\nu}. \qquad (1.15)$$

We have moved from the free theory to an interacting theory by including an interaction term for A_µ containing the covariant field strength tensor, F_{µν} = ∂_µA_ν − ∂_νA_µ. We should reflect on the meaning of the covariant derivative. In mathematics, covariant derivatives are defined in order to specify the change in some quantity tangent to a manifold.

So, for example, if you have a vector which is tangent to a sphere, and you move it along

the sphere, the vector would have to change in order to maintain its tangency. Similarly, since we have restricted our fields to be invariant under gauge transformations, but gauge transformations vary from point to point in space, our derivative, which defines the change in a field in space, must be modified to account for the change in gauge transformation. So by starting with the Lagrangian for a charged particle, and requiring local gauge invariance, geometry necessitates the term A_µ, which after quantizing becomes a massless spin-1 field, the excitations of which correspond to photons.

1.4 The Standard Model Symmetry Group

The theory which describes the unified interaction of electromagnetism, the weak interaction, and the strong interaction is a Yang-Mills theory based on the direct product group U(1) × SU(2)_L × SU(3)_C. The SU(2) group has three generators, whereas, as we have seen, the U(1) group has one generator. Therefore, there exists one field, B_µ,

corresponding to the generator of U(1), Y_W, and three fields, W^1, W^2, and W^3, which correspond to the SU(2) generators τ_L, which in the matrix representation of the group can be written as Pauli spin matrices.

These bosons, which we denote collectively as V_µ, are coupled to leptons and quarks, collectively denoted Ψ, if we enforce local gauge invariance. Upon doing so we must introduce a covariant derivative, as before:

$$D_\mu\Psi = \left(\partial_\mu - ig_s\,T_a G^a_\mu - ig_2\,\tau_a W^a_\mu - ig_1\,\frac{Y}{2}\,B_\mu\right)\Psi, \qquad (1.16)$$

where we have also introduced the gluonic fields G^a_µ. We can also construct gauge-invariant field strength tensors,

$$G^a_{\mu\nu} = \partial_\mu G^a_\nu - \partial_\nu G^a_\mu + g_s\, f^{abc}\, G^b_\mu G^c_\nu, \qquad (1.17)$$
$$W^a_{\mu\nu} = \partial_\mu W^a_\nu - \partial_\nu W^a_\mu + g_2\, \epsilon^{abc}\, W^b_\mu W^c_\nu,$$
$$B_{\mu\nu} = \partial_\mu B_\nu - \partial_\nu B_\mu,$$

where the terms ε^{abc} and f^{abc} arise because the groups SU(2) and SU(3) are non-Abelian.

Everything so far is very elegant. In fact, we have followed the historical trajectory of quantum field theory [10]. We introduce the idea of local gauge symmetries, which necessitates the existence of gauge bosons. We apply this theory to electromagnetism and see that we have predicted the photon. We then try to expand this to include the other known forces, the strong force and the weak force and say that they also behave according to non-Abelian gauge theories. At this point, however, nature introduces a complication.

It was observed in 1957 that parity is violated in weak interactions. This is because only left-handed fermions (or right-handed antifermions) couple via the weak interaction. So left-handed fermions form SU(2) doublets, whereas right-handed fermions form SU(2) singlets.

We will see later how this creates problems in forming a complete standard model. The electroweak part of the SM Lagrangian is thus written,

$$\mathcal{L}_{EW} = \sum_\psi \bar\psi\,\gamma^\mu\left(i\partial_\mu - \frac{1}{2}g'\,Y_W\,B_\mu - \frac{1}{2}g\,\vec{\tau}_L\cdot\vec{W}_\mu\right)\psi, \qquad (1.18)$$

where τ_L is non-zero only for left-handed particles, whereas Y_W acts on both left- and right-handed fields. Table 1.1 lists the fields of the standard model introduced up until this point along with their group symmetries and quantum numbers.

12 Table 1.1: Fields of the standard model and their group symmetries [1]

Field            SU(3)   SU(2)_L   U(1)_Y
Q_L = (u, d)_L     3        2        1/3
u_R                3        1        4/3
d_R                3        1       −2/3
L_L = (ν, e)_L     1        2        −1
e_R                1        1        −2
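As a consistency check on the table (a sketch assuming the hypercharge normalization Q = T³ + Y/2; some texts absorb the factor of 2 into Y), the electric charges of the fermions are recovered from their quantum numbers:

$$Q = T^3 + \frac{Y}{2}: \qquad u_L:\ \tfrac{1}{2} + \tfrac{1}{6} = +\tfrac{2}{3}, \qquad d_L:\ -\tfrac{1}{2} + \tfrac{1}{6} = -\tfrac{1}{3}, \qquad e_R:\ 0 + (-1) = -1.$$

The photon will emerge in Section 1.5.3 as the combination of W³ and B that couples to this unbroken charge.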

It is now possible to write down the SM Lagrangian. I will restrict it to terms involving the electroweak sector only:

$$\mathcal{L}_{SM} = -\frac{1}{4}G^a_{\mu\nu}G^{\mu\nu}_a - \frac{1}{4}W^a_{\mu\nu}W^{\mu\nu}_a - \frac{1}{4}B_{\mu\nu}B^{\mu\nu} \qquad (1.19)$$
$$\quad + \bar{L}_i\, iD_\mu\gamma^\mu L_i + \bar{e}_{R_i}\, iD_\mu\gamma^\mu e_{R_i} + \bar{Q}_i\, iD_\mu\gamma^\mu Q_i + \bar{u}_{R_i}\, iD_\mu\gamma^\mu u_{R_i} + \bar{d}_{R_i}\, iD_\mu\gamma^\mu d_{R_i}.$$

This theory involves a massless W triplet and B singlet. However, it is known from experiment that electromagnetism is a long-range force, while the weak force is a very short-range force. So how could they both be described by the same theory in terms of massless gauge bosons? In fact, the Ws must be massive. However, adding a mass term for the W bosons but not for the B would break the U(1) × SU(2) symmetry present in the Lagrangian. A similar problem arises for fermions. As we discussed earlier, fermions are represented mathematically as Dirac spinors. The chirality of a particle determines whether it transforms under left- or right-handed representations of the Poincaré group. Dirac spinors have both left-handed and right-handed components, so we can write a spinor, Ψ, in terms of its left- and right-handed projections, Ψ = Ψ_L + Ψ_R. The left-handed term, Ψ_L, and the right-handed term, Ψ_R, can be written in terms of the left-handed and right-handed projection operators as $\Psi_L = \frac{1-\gamma^5}{2}\Psi$ and $\Psi_R = \frac{1+\gamma^5}{2}\Psi$, respectively. Thus a mass term for

13 a spinor has the form,

$$\mathcal{L}_{f.mass} = m\bar\Psi\Psi = m\left(\bar\Psi_L\Psi_R + \bar\Psi_R\Psi_L\right). \qquad (1.20)$$

But adding a term of this form is forbidden since it violates SU(2)_L invariance. Since the charged leptons are not massless, in order for a U(1) × SU(2) electroweak theory similar to the one presented here to be a useful theory of nature, it must be a broken symmetry.

In the next two sections we will discuss the mechanism for breaking electroweak symmetry.

First we will introduce the concept of a Higgs field.

1.5 The Higgs Field

In the previous section, we saw that there is a problem with the standard model: the bosons predicted by a Yang-Mills gauge theory must be massless, otherwise the SU(2) gauge invariance would be broken. But the masses of the W and Z bosons must be non-zero to explain the short range of the weak force [11] [12] [13]. So while the gauge theory that works for electromagnetism seems like a natural solution for the weak force, all experimental evidence pointed to a force mediated by a much heavier particle. This means that the

SU(2) × U(1) symmetry of the electroweak sector must be a broken symmetry. The explanation of how this symmetry could be broken came in a flurry of papers in the early sixties, yielding the concept that is known as the Higgs-Brout-Englert-Guralnik-

Hagen-Kibble mechanism, more popularly known as the Higgs mechanism [14] [15] [16]

[17].

1.5.1 The Higgs Mechanism

To illustrate this theory, consider again the simplified model consisting of a single complex scalar field, φ = φ_1 + iφ_2, which we call (with apologies to Brout, et al.) the Higgs

14 field, which has Lagrangian,

$$\mathcal{L} = \frac{1}{2}(\partial^\mu\phi^\dagger)(\partial_\mu\phi) - V(\phi), \qquad (1.21)$$
$$V(\phi) = \frac{1}{2}\mu^2\,|\phi^\dagger\phi| + \frac{1}{4}\lambda\left(|\phi^\dagger\phi|\right)^2. \qquad (1.22)$$

Looking at the potential term, notice that if the mass term, µ2, takes a negative value, then the minimum of this potential occurs at a non-zero value of φ. Figure 1.1 shows this potential in complex space for the arbitrary values µ = 1, λ = 0.5. We can see that the minimum energy is not at φ = 0, but actually occurs for a continuous set of values on a

circle of radius $\nu = \sqrt{-\mu^2/\lambda}$, which is shown by the red line. The Lagrangian is invariant with respect to the position on this circle.
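The radius of this circle follows directly from minimizing the potential in Eq. 1.22 (a one-line check):

$$\frac{\partial V}{\partial|\phi|} = \mu^2|\phi| + \lambda|\phi|^3 = 0 \quad\Rightarrow\quad |\phi|^2 = -\frac{\mu^2}{\lambda} \equiv \nu^2,$$

which is real and non-zero precisely when µ² < 0, as assumed above.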

If we expand our field theory around the minimum of the potential, ν,

$$\phi = \frac{1}{\sqrt{2}}\left(\nu + \phi_1 + i\phi_2\right), \qquad (1.23)$$

we obtain the effective Lagrangian for the Higgs sector,

$$V(\Phi) = \frac{\mu^2}{2}\phi^2 + \frac{\lambda}{4}|\phi|^4 \qquad (1.24)$$
$$= \frac{1}{2}\mu^2\left(\phi_1^2 + \phi_2^2\right) + \frac{1}{4}\lambda\left(3\nu^2\phi_1^2 + 2\nu^2\phi_2^2 + \phi_1^4 + \phi_2^4 + 2\phi_1^2\phi_2^2 + 4\nu\phi_1^3 + 4\nu\phi_1\phi_2^2\right)$$
$$= -\mu^2\phi_1^2 + \frac{\lambda}{4}\phi_1^4 + \frac{\lambda}{4}\phi_2^4 + \frac{\lambda}{2}\phi_1^2\phi_2^2 + \lambda\nu\phi_1^3 + \lambda\nu\phi_1\phi_2^2.$$

Notice that the φ_2 field is massless. If we repeat the exercise for an n-tuplet of n fields,

Φ = (φ_1, φ_2, ..., φ_n), with a similar potential term, Φ†Φ = φ_1² + φ_2² + ... + φ_n², we would find similar results. The minimum of the potential would be on a hypersphere of radius ν. After choosing an arbitrary direction on this sphere and expanding around that minimum, we find that we have one massive field and n−1 massless fields. This is a fundamental example of the

Goldstone theorem, which states that for every spontaneously broken continuous symmetry, the theory will obtain a massless spin-0 boson [18]. These fields are called Goldstone bosons

15 2.0

1.5

)

1.0 Φ

V(

0.5

0.0

0.5 − 1.0 0.5 0.0 1.0 0.5 0.5 2 − 0.0 φ − 0.5 1.0− φ1 1.0 −

Figure 1.1: Two dimensional example of Higgs potential. The red circle signifies the degen- erate space of minimum energy.

16 and will become relevant to the Higgs search in the next section.
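As a concrete instance of this counting, applied in advance to the case of interest here (the details appear in Section 1.5.3): electroweak symmetry breaking takes SU(2)_L × U(1)_Y, with four generators, down to the U(1) of electromagnetism, with one, so

$$n_{\mathrm{Goldstone}} = \dim\left[SU(2)_L \times U(1)_Y\right] - \dim\left[U(1)_{EM}\right] = 4 - 1 = 3.$$

These three would-be Goldstone bosons become the longitudinal polarizations of the W± and Z, which is why no massless scalars are observed.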

1.5.2 Spontaneous Symmetry Breaking

Moving slowly towards a real-world model, let us see what happens when we impose a local U(1) gauge symmetry on the scalar field considered above. Consider again the case of scalar electrodynamics given in Eq. 1.11, but add a quartic coupling term,

$$\mathcal{L} = D_\mu\phi^\dagger\, D^\mu\phi - \frac{1}{2}\mu^2\,|\phi^\dagger\phi| + \frac{1}{4}\lambda\left(|\phi^\dagger\phi|\right)^2 + F_{\mu\nu}F^{\mu\nu}. \qquad (1.25)$$

This Lagrangian is, by construction, invariant under the local U(1) transformation, φ → e^{ieα(x)}φ. If we expand the kinetic term in the Lagrangian, we get,

$$D_\mu\phi^\dagger\, D^\mu\phi = \frac{1}{2}(\partial^\mu\phi_1)(\partial_\mu\phi_1) + \frac{1}{2}(\partial^\mu\phi_2)(\partial_\mu\phi_2) + \nu^2\lambda\phi_1^2 \qquad (1.26)$$
$$\quad + \frac{1}{2}e^2\nu^2 A_\mu A^\mu + ie\phi\,\partial_\mu A^\mu + ie^2(\partial_\mu\phi_1)A^\mu + \dots$$

Our previously massless gauge boson, Aµ now appears with a mass term. Notice also that φ1 has gained a mass term but φ2 did not. Indeed, φ2 now represents the massless Goldstone boson predicted by Goldstone’s theorem. We now have a problem. In our simplified model we have demonstrated that it is indeed possible to generate mass for a gauge boson. The penalty seems to be that we now have a massless particle. This is a penalty since such a particle would be very easy to detect, and no such particles have ever been observed, so any theory attempting to use spontaneous symmetry breaking to generate masses must explain this.

We have another problem, however. That is, the above Lagrangian has more degrees of freedom than we started with. Initially, we had one complex scalar with two degrees of freedom (2 d.o.f.) and one massless vector boson (with 2 d.o.f.), and in the end we now have our two scalar fields, plus a massive gauge boson (totaling 3 d.o.f.). Also, we have unphysical bilinear terms in the Lagrangian. We can make this Lagrangian more comprehensible by

17 noting that the complex field, φ = ν + φ1 + iφ2, can be equivalently written as,

$$\phi = e^{i\Gamma(x)/\nu}\left(\nu + \phi_1(x)\right). \qquad (1.27)$$

But our original field is invariant under gauge transformations of the form e^{iα(x)}. So we are free to add a gauge term to the vector boson, A_µ → A_µ + (1/(eν)) ∂_µΓ(x), and also rotate the complex φ field by e^{−iΓ(x)}. This rotation eliminates the term e^{iΓ(x)}, but now of course our effective Lagrangian is not invariant under the gauge transformation since we have set the gauge. Our scalar field becomes,

$$\phi = \nu + \phi_1(x). \qquad (1.28)$$

And the new Lagrangian is,

$$D_\mu\phi^\dagger\, D^\mu\phi = \frac{1}{2}(\partial^\mu\phi_1)(\partial_\mu\phi_1) + \frac{1}{2}e^2\nu^2 A_\mu A^\mu \qquad (1.29)$$
$$\quad + \nu^2\lambda\phi_1^2 + e^2\nu A_\mu A^\mu\,\phi_1 + \dots$$

The massless Goldstone boson disappears from the Lagrangian, as do the bilinear terms, and only physical fields remain [19]. This gauge is known as the unitary gauge. We are left with a massive gauge boson, and one massive scalar field. But the Lagrangian no longer appears invariant under the U(1) gauge symmetry. Note that the actual theory is still

U(1) invariant, but the theory evaluated around the vacuum state (i.e. the lowest energy state of the theory) is not. We say that the U(1) gauge symmetry is spontaneously broken

(or spontaneously hidden).

As was noted earlier, we know that the electroweak symmetry must be broken because of the non-zero masses of the W and Z bosons. This example shows that this is possible for the simplified case considered. But we end up with an artifact, which is a massive scalar boson. In the next section I will build on this simple example to adumbrate the case for the standard model as a whole, introducing a Higgs mechanism to the SU(2) × U(1) symmetry group, and expanding around the minimum of the Higgs potential. We will see that indeed,

18 our gauge symmetry is no longer manifest in the theory, and the W and Z bosons gain mass. The same field can also be used to provide mass terms for fermions.

1.5.3 Standard Model Higgs Field

In the standard model, we require a more sophisticated mechanism, since we must describe the masses of the three electroweak gauge bosons. We posit the existence of a complex scalar doublet [19], represented by Φ,

$$\Phi = \frac{1}{\sqrt{2}}\begin{pmatrix}\phi^+ \\ \phi^0\end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix}\phi_1 + i\phi_2 \\ \phi_3 + i\phi_4\end{pmatrix}, \qquad (1.30)$$

which couples to the electroweak sector if we add the following gauge-invariant Lagrangian terms to the SM Lagrangian (Eq. 1.19),

$$\mathcal{L}_H = (D^\mu\Phi^\dagger)(D_\mu\Phi) - V_H(\Phi), \qquad (1.31)$$
$$V_H(\Phi) = \frac{1}{2}\mu^2\,|\Phi^\dagger\Phi| + \frac{1}{4}\lambda\left(|\Phi^\dagger\Phi|\right)^2. \qquad (1.32)$$

The field Φ interacts with the W and B fields via the covariant derivative,

$$D_\mu = \partial_\mu + i\frac{g}{2}\,\vec{\tau}\cdot\vec{W}_\mu - i\frac{g'}{2}\,B_\mu Y. \qquad (1.33)$$

As was done in the last section, we can expand the potential around some arbitrary minimum of the Higgs potential. Since the proposed Higgs field has four degrees of freedom, this

minimum now lies on a four-dimensional hypersphere of radius $\nu = \sqrt{-\mu^2/\lambda}$. Similar to Eq. 1.27, we can use our gauge invariance freedom to rotate to the unitary gauge,

$$\Phi = e^{-i\theta(x)\cdot\tau}\begin{pmatrix}0 \\ h(x)+\nu\end{pmatrix} \to \begin{pmatrix}0 \\ h(x)+\nu\end{pmatrix}. \qquad (1.34)$$

19 If we redefine our fields as,

$$W^\pm_\mu = \frac{1}{\sqrt{2}}\left(W^1_\mu \mp iW^2_\mu\right), \qquad (1.35)$$
$$Z_\mu = \frac{g_2 W^3_\mu - g_1 B_\mu}{\sqrt{g_2^2 + g_1^2}}, \qquad (1.36)$$
$$A_\mu = \frac{g_1 W^3_\mu + g_2 B_\mu}{\sqrt{g_2^2 + g_1^2}}, \qquad (1.37)$$

$$\left(\frac{\nu g_2}{2}\right)^2 W^+_\mu W^{-\mu} + \frac{1}{2}\left(\frac{\nu\sqrt{g_1^2+g_2^2}}{2}\right)^2 Z_\mu Z^\mu, \qquad (1.38)$$

while the A_µ term remains massless. The masses of the fermions can also be generated using the same Higgs field. The quarks and leptons can be coupled to the Higgs field by adding the following gauge-invariant Yukawa interactions to the SM Lagrangian¹,

$$\mathcal{L}_F = \lambda_e\,\bar{L}\,\Phi\, e_R + \lambda_d\,\bar{Q}_L\,\Phi\, d_R + \lambda_u\,\bar{Q}_L\,\Phi^c\, u_R + \text{h.c.} \qquad (1.39)$$

In the unitary gauge, this becomes:

$$\mathcal{L}_F = \lambda_e\,\bar{e}_L\left(\nu + h\right)e_R + \dots \qquad (1.40)$$

This corresponds to a fermion mass term, me,

$$m_e = \frac{\lambda_e\,\nu}{\sqrt{2}}, \qquad (1.41)$$

and a Higgs coupling term,

$$g_{hff} = \frac{\lambda_e}{\sqrt{2}} = \frac{m_e}{\nu}, \qquad (1.42)$$

¹ h.c. indicates that we add the Hermitian conjugate of the preceding terms.

20 which defines the Higgs couplings in terms of the mass and the

vacuum expectation value (VEV) of the Higgs field, ν. The VEV is given in terms of the W mass and g_2,

$$\nu = \frac{2m_W}{g_2} = 246\ \mathrm{GeV}, \qquad (1.43)$$

where mW is measured, and g2 can be measured also using the W coupling to fermions (via muon decay, which is mediated by the W ). The remaining terms in the Lagrangian of interest involve the Higgs itself,

$$V_H = (D^\mu\Phi^\dagger)(D_\mu\Phi) - \frac{\mu^2}{2}\,\Phi^\dagger\Phi - \frac{\lambda}{4}\left(\Phi^\dagger\Phi\right)^2. \qquad (1.44)$$

In the unitary gauge this becomes:

$$V_H = (\partial^\mu\phi)(\partial_\mu\phi) - \frac{\mu^2}{2}\left(\nu + h\right)^2 - \frac{\lambda}{4}\left(\nu + h\right)^4 = (\partial^\mu\phi)(\partial_\mu\phi) - \lambda\nu^2 h^2 - \lambda\nu h^3 - \frac{\lambda}{4}h^4. \qquad (1.45)$$

The Higgs mass is then expressed as,

$$m_h^2 = 2\lambda\nu^2. \qquad (1.46)$$

There are also tri-linear and quartic self-coupling terms. With this, we see that the Higgs couplings to fermions and gauge bosons are determined exactly by the theory, with the Higgs mass as the only parameter (given that the masses of the charged fermions are known). This is very fortunate because it reduces the search for the Higgs boson to a one-parameter search, which significantly simplifies the analysis, as we will see later.
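To make the one-parameter nature of the problem concrete, here is a back-of-the-envelope evaluation (an illustrative sketch using m_h ≈ 125 GeV, m_t ≈ 173 GeV, and ν = 246 GeV from Eq. 1.43):

$$\lambda = \frac{m_h^2}{2\nu^2} = \frac{(125\ \mathrm{GeV})^2}{2\,(246\ \mathrm{GeV})^2} \approx 0.13, \qquad g_{htt} = \frac{m_t}{\nu} \approx \frac{173}{246} \approx 0.70.$$

Both the self-coupling and the top Yukawa coupling are of order one, while the electron coupling, m_e/ν ≈ 2 × 10⁻⁶, is tiny: the Higgs couples overwhelmingly to the heaviest particles available.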

We note that the proposed field must be a scalar. Since the Higgs field has a non-zero value in the vacuum state (a non-zero VEV), a non-scalar Higgs would break the Lorentz invariance of the vacuum. Another way to see this is by looking at lepton mass terms which arise after spontaneous symmetry breaking. Since these terms contain a Higgs field, the Higgs field must be a scalar. Therefore, the Higgs boson is predicted by the standard

model to be a CP-even, spin-zero particle (quantum numbers J^{PC} = 0^{++}). Moreover, the couplings of the Higgs boson to standard model particles are determined by the theory. So to verify that the particle observed at 125 GeV is indeed the Higgs boson requires accurate measurement of the couplings as well as the tensor structure. The tri-linear and quartic self-couplings of the Higgs should also be measured in order to permit reconstruction of the Higgs potential. But this is not possible with the LHC data currently available.

1.5.4 Hierarchy Problem and Other Issues

The discovery of a standard model Higgs boson would be very satisfying in that it wraps up all of the theoretical issues presented so far. However, if one considers a broader perspective, the SM is problematic even with a Higgs field. The main issue is the following.

Since the Higgs couples to almost all SM particles, the predicted mass of the Higgs boson is affected by quantum corrections which increase the effective mass of the Higgs by astronomical amounts. Since the Higgs mass is required to be less than about 1 TeV, the bare mass of the Higgs would need to cancel corrections to the Higgs mass to one part in 10^17.

This poses a philosophical problem: the SM has been “fine tuned” to a high degree, since quantum corrections and the bare mass cancel almost exactly to result in a Higgs mass at the electroweak scale. Since the bare mass doesn't arise from some natural symmetry, it seems improbable that nature would “choose” a finely tuned mass. This problem is known as the Hierarchy Problem. Some take this to be an indication that there is some other symmetry which requires the Higgs boson to have such a low mass (supersymmetry is the most popular theory for describing this). In any case, it seems clear that the standard model is at best the correct effective theory for physics occurring below the TeV scale. Regardless of the ultimate theory of nature, which mankind may never know, the standard model at least provides a correct, predictive theory up to that energy scale.
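The size of the required tuning can be estimated with the standard one-loop argument (a sketch only; the numerical factors depend on the regularization scheme). A fermion f with Yukawa coupling λ_f and N_c colors contributes a quadratically divergent correction,

$$\delta m_h^2 \sim -\frac{N_c\,\lambda_f^2}{8\pi^2}\,\Lambda^2,$$

where Λ is the cutoff of the effective theory. With λ_t ≈ 1 for the top quark and Λ near the grand-unification or Planck scale, δm_h² exceeds (125 GeV)² by tens of orders of magnitude, which is the cancellation described above.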

22 1.6 Higgs Properties

As we have seen, the standard model predicts all the properties of the Higgs field except for its mass: the Higgs couples to standard model particles with strengths proportional to their masses, and, for the Higgs to be the mechanism for electroweak symmetry breaking it must also have spin zero and even parity. To confirm the identity of the particle we need to measure its properties as accurately as possible.

Even if the couplings are seen to be roughly consistent with the SM, it is still interesting to measure them as precisely as possible because it is always possible to construct a model that modifies the couplings of the Higgs, or, admittedly more unlikely, which mimics the

Higgs altogether. In the absence of other discoveries at the LHC, the Higgs boson can be used as a probe for new physics beyond the standard model.

In this section, I summarize the necessary theory and strategy for an initial measurement of the Higgs boson properties. I start by outlining the strategy for measuring deviations from the SM couplings of a spin zero particle. I will then describe the measurement of the spin and parity of the newly discovered particle.

1.6.1 Decay Width

The SM Higgs boson at 125 GeV is predicted to have a total width of Γ_h^{SM} = 4 MeV, resulting from the small coupling to b quarks and the phase-space suppression of the WW decay channel [19]. If the Higgs boson couples to a non-SM particle with a sufficiently large coupling (of order 10^{−2}), it could increase the total width of the Higgs boson and encompass a large branching fraction [20]. However, the experimental mass resolution for the most sensitive channel at the LHC is a few GeV, making it difficult to measure the total width of the resonance. Therefore, we leave the measurement of the total width to a future analysis.

Given the experimental mass resolution, at low mass we can take the Higgs boson width to be zero, which simplifies calculations pertaining to the Higgs couplings.

23 1.6.2 Higgs Boson Self Coupling

In principle the Higgs tri-linear and quartic coupling should also be measured, but this is well beyond the capability of the current methods and current data sets.


Figure 1.2: Predicted production cross section (right) and branching ratio (left) for a stan- dard model Higgs boson produced in proton-proton collisions at the LHC [2].

1.6.3 Couplings

There exist many models which slightly modify the couplings of the Higgs boson. As a simple example, consider a model which consists of two complex Higgs doublets instead of one. In that case we are free to write a Lagrangian in which one doublet couples to vector bosons, while the other couples to fermions. Then it would be possible to modify the couplings for fermions while leaving the couplings to the W and Z fixed, or vice versa [21].

Another simple modification to the SM is to include new particles that appear in the top triangle for gluon-gluon fusion (see Fig. 1.3). These particles can alter the Higgs production cross sections. So the study of the Higgs boson properties provides not only a test of the standard model, but also a way to probe for new physics.

If we neglect the width of the Higgs boson, the cross section for a production process, xx → H, and decay, H → yy, can be written as

\sigma(xx \to H \to yy) = \frac{\sigma_{xx}\,\Gamma_{yy}}{\Gamma_H},  (1.47)

where σ_xx is the production process cross section, Γ_yy is the partial width for the final state, and Γ_H is the total width of the Higgs boson, which can differ from the sum of standard model decay widths if there are new particles that couple to the Higgs boson.
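As a quick numerical illustration of Eq. 1.47, here is a minimal Python sketch; the cross section and widths below are illustrative placeholder values, not official numbers.

```python
# Minimal numerical sketch of the narrow-width relation of Eq. 1.47.
sigma_xx = 19.5      # production cross section sigma(gg -> H) in pb (assumed)
gamma_yy = 0.107e-3  # partial width Gamma(H -> ZZ) in GeV (assumed)
gamma_H  = 4.07e-3   # total width Gamma_H in GeV (assumed)

sigma_total = sigma_xx * gamma_yy / gamma_H   # sigma(gg -> H -> ZZ) in pb
print(f"sigma x BR = {sigma_total:.3f} pb")   # ~0.5 pb with these inputs
```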

Testing deviations from the standard model: Continuing with the presentation in section 1.5.3, we examine the predicted couplings for standard model particles. For vector bosons, this involves a Lagrangian term of the form,

\frac{g}{M_W}\, h \left( M_W^2\, W_\mu^+ W^{-\mu} + \frac{M_Z^2}{2}\, Z_\mu Z^\mu \right),  (1.48)

whereas heavy fermions couple with the term,

\frac{m_t}{v}\, h\bar{t}t + \frac{m_b}{v}\, h\bar{b}b + \frac{m_\tau}{v}\, h\bar{\tau}\tau,  (1.49)

and gluons couple via an effective vertex, dominated by a top quark loop (see Fig. 1.3). In the large top quark mass limit this term has the form [?],

\frac{\alpha_S}{12\sqrt{2}\,\pi v}\, h\, G_{\mu\nu} G^{\mu\nu}.  (1.50)

To test the validity of the standard model prediction for the Higgs boson, each of these couplings should be measured. We parameterize deviations from the standard model by introducing an additional degree of freedom for each of the above terms.

We denote these degrees of freedom by κ [22, 23]; that is, we introduce a κ_i for each Higgs coupling, where i = Z, W, t, b, g, and H, where we use g to denote gluons, t and b for quark Yukawa couplings, and H for a scale factor on the total Higgs boson width. Production and decay rates are proportional to κ².

A particular observation channel comprises several production processes, and the final state cross section is affected by both production cross section and decay partial width.

The predicted production cross sections for the various production modes, as well as the predicted branching fractions for decays to standard model particles, are shown in Fig. 1.2.

Higgs boson production at the LHC is dominated by gluon-gluon fusion (ggH), wherein two gluons produce a Higgs via a quark triangle (dominated by the top quark), and by vector boson fusion (VBF), wherein two quarks radiate W or Z bosons which “fuse” via a VVH vertex. These mechanisms can be seen in Fig. 1.3. There are also small contributions from associated Higgs production (WH and ZH, collectively VH) and from production in association with a top quark pair (ttH).

Figure 1.3: Higgs production mechanisms at the LHC: gluon-gluon fusion (right), vector boson fusion (left).

For a Higgs boson with a mass of 125 GeV, the ggH production process accounts for roughly 88% of the Higgs boson production at the LHC. The VBF and VH processes account for another 9%, leaving only a small fraction for ttH. With the data collected as of 2012, only a few events are expected in the H → ZZ channel for VBF and VH together. It is therefore impossible to measure all the couplings of the standard model Higgs boson with data from Run 1 of the LHC. It is thus necessary to find a strategy for measuring a maximal reduced set of parameters given the observed data. This is done by introducing an additional set of scaling factors, called signal strength modifiers, which are proportional to experimental observables.

We denote these parameters by µ. We can then use measurements of µ to place constraints on κ. The simplest example of µ is an overall parameter that scales the observed cross section for a particular observation channel for all production mechanisms. For example, for H → ZZ → 4ℓ we introduce the parameter

\mu = \frac{\sigma_{ggH+VBF+VH+ttH} \times \mathrm{BR}(H \to ZZ)}{\left( \sigma_{ggH+VBF+VH+ttH} \times \mathrm{BR}(H \to ZZ) \right)_{SM}},  (1.51)

and measure µ. A value of µ = 1 corresponds to the standard model prediction.
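A minimal sketch of how a signal strength like Eq. 1.51 is formed from event yields; the yields below are invented placeholders used only for illustration.

```python
# Hedged sketch: a signal strength mu as the ratio of an observed signal rate
# to the SM expectation (Eq. 1.51). All numbers are invented placeholders.
def signal_strength(n_observed, n_background, n_sm_signal):
    """mu = (observed - background) / expected SM signal yield."""
    return (n_observed - n_background) / n_sm_signal

# Hypothetical yields in a H -> ZZ -> 4l mass window:
mu = signal_strength(n_observed=25.0, n_background=10.0, n_sm_signal=16.0)
print(f"mu = {mu:.2f}")  # ~0.94; mu = 1 is the SM prediction
```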

It turns out that we can do better than one global µ factor with the existing data set.

Table 1.2 lists the κ parameters that characterize the deviations from the standard model, and the µ for each decay channel which are observable with the data available in 2012. The µ parameters are listed by decay channel and further separated by production mechanism.

Clearly, in order to make inferences about κ from a measured set of µ, we must make some further assumptions or otherwise reduce the number of target parameters. We therefore define a next-to-minimal set of two parameters, κ_f ≡ κ_b = κ_t = … and κ_V ≡ κ_W = κ_Z, effectively varying all the SU(2) vector boson and fermion couplings together.

As we shall describe, it is possible to separate production mechanisms experimentally by identifying the quarks produced in the VBF production process, or by constructing variables sensitive to the production mechanism. The H → ZZ → 4ℓ channel is sensitive to the Higgs couplings to quarks as well as to vector bosons, and thus to the κ_f and κ_V parameters. One must note that LHC Higgs searches rely heavily on simulated data constructed under the assumption of standard model parameters. Therefore, values of κ far from unity do not actually represent physically motivated Lagrangians. These values nevertheless constitute a test of the standard model, but their physical meaning requires further investigation beyond the context of a SM particle search.

It is possible to construct a global fit for all the κ simultaneously using all search channels. Such studies are conducted by the CMS Higgs combination group (see e.g. [24] and [2]). This requires the definition of a statistical model of enormous complexity, and requires one to make assumptions about variable correlations and systematic uncertainties.

Table 1.2: Coupling modifiers and signal strength modifiers for the decay channels observable with 2012 data at the LHC. The µ are measured separately for each channel and can then be used to constrain the global κ parameters.

Decay       Production mechanism   Coupling parameter                          Measured µ
ZZ∗ → 4ℓ    ggH                    κ_Z²κ_g² = κ_V²κ_f²                         µ_ggH+ttH
            VBF                    [(κ_W² + rκ_Z²)/(1 + r)]κ_Z² = κ_V²κ_V²     µ_VBF+VH
            Z∗ → ZH                κ_Z² = κ_V²                                 µ_VBF+VH
WW∗ → 4ℓ    ggH                    κ_g²κ_W² = κ_V²κ_f²                         µ_ggH+ttH
            VBF                    [(κ_W² + rκ_Z²)/(1 + r)]κ_W² = κ_V²κ_V²     µ_VBF+VH
            W∗ → WH                κ_W² = κ_V²                                 µ_VBF+VH
γγ          ggH                    κ_g²κ_γ² = κ_f²κ_γ²                         µ_ggH+ttH
            VBF                    [(κ_W² + rκ_Z²)/(1 + r)]κ_γ² = κ_V²κ_γ²     µ_VBF+VH
τ⁺τ⁻        ggH                    κ_g²κ_τ² = κ_f²κ_f²                         µ_ggH+ttH
            VBF                    [(κ_W² + rκ_Z²)/(1 + r)]κ_τ² = κ_V²κ_f²     µ_VBF+VH
            W∗ → WH                κ_W²κ_τ² = κ_V²κ_f²                         µ_VBF+VH
b̄b          ggH                    κ_g²κ_b² = κ_f²κ_f²                         µ_ggH+ttH
            ttH                    κ_t²κ_b² = κ_f²κ_f²                         µ_ggH+ttH
            W∗ → WH                κ_W²κ_b² = κ_V²κ_f²                         µ_VBF+VH
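To make the mapping in Table 1.2 concrete, here is a hedged sketch of predicted H → ZZ → 4ℓ signal strengths in the two-parameter (κ_V, κ_f) model. The κ_H weighting below is an assumed, SM-like split of the total width between fermionic and bosonic decays, used only for illustration.

```python
# Hedged sketch of how kappa modifiers map to signal strengths for
# H -> ZZ -> 4l. Assumption: kappa_H^2 ~ 0.75*kappa_f^2 + 0.25*kappa_V^2,
# an illustrative SM-like weighting of fermionic vs. bosonic partial widths.
def kH2(kV, kf):
    return 0.75 * kf**2 + 0.25 * kV**2

def mu_ggH(kV, kf):
    """gg -> H (top loop, kf^2) times H -> ZZ (kV^2), over the width scale."""
    return (kf**2 * kV**2) / kH2(kV, kf)

def mu_VBF(kV, kf):
    """qq -> qqH (kV^2) times H -> ZZ (kV^2), over the width scale."""
    return (kV**2 * kV**2) / kH2(kV, kf)

print(mu_ggH(1.0, 1.0), mu_VBF(1.0, 1.0))  # both 1.0 at the SM point
print(mu_ggH(1.0, 1.5), mu_VBF(1.0, 1.5))  # an example non-SM point
```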

It also presents issues of scientific communication if one wishes eventually to share LHC data publicly [25]. We therefore limit ourselves to those parameters to which we are sensitive using the H → ZZ → 4ℓ search channel and the data available in 2013, but measure these parameters using a minimal set of assumptions. By constructing an independent analysis, including the statistical analysis, we strengthen the measurements made using the ZZ decay channel, thereby increasing confidence in the result by providing a necessary validation. In section 5.2, we present the details of such an analysis and show results for κ_V vs. κ_f.

1.6.4 Spin and Parity

The standard model predicts a scalar, even-parity Higgs boson. However, there are models that predict particles decaying to two Z bosons that could have a mass similar to that of the Higgs (the Z′, for example [26]). More plausibly, general models based on Supersymmetry predict multiple Higgs bosons, or Higgs bosons with slightly different properties, such as a CP-violating Higgs boson with mixed parity states, which could help explain the matter-antimatter asymmetry in the universe [27]. It is thus scientifically interesting to determine the exact tensor structure of the resonance at 125 GeV in each decay channel. In this section, we discuss the implications of a non-standard-model tensor structure in the H → ZZ → 4ℓ decay channel.

It is possible to treat this problem in a general way by constructing all allowed spin/parity terms for a generic resonance decaying to two Z bosons. We do this by adding to the standard model Lagrangian all terms with couplings to W and Z bosons that are SU(2) × U(1) invariant, but which do not have a vacuum expectation value, and thus do not account for electroweak symmetry breaking. Angular momentum conservation introduces correlations in the angular distributions of the decay products. The angular distributions of the four observed leptons can therefore be used to infer the spin and parity of the parent resonance [28].

A generic particle X, which decays to two Z bosons that in turn decay to two leptons each, has 12 degrees of freedom. Four of these are determined by the mass, pT, η, and φ of X. Two of them are set by the Z boson masses. One angle defines the relative position of the two Z bosons in the rest frame of the Higgs boson. This leaves five angles which characterize the decay. Fig. 1.4 shows one such parameterization of the angular variables which define the decay of the Higgs to two Z bosons. One angle is arbitrary, symmetric in φ, and provides no information. The angle θ∗ is sensitive to the production mechanism. The remaining angles θ1, θ2, Φ1 and Φ2 are sensitive to the decay couplings [3].
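As an illustration of how such decay angles are computed from the lepton four-vectors, here is a minimal sketch for cos θ1. The four-vector format (E, px, py, pz) and the sign conventions are assumptions made for illustration and may differ from those of [3].

```python
import numpy as np

def boost_to_rest(p, frame):
    """Boost four-vector p = (E, px, py, pz) into the rest frame of 'frame'
    using a general Lorentz boost (no external HEP library needed)."""
    E, pvec = frame[0], frame[1:]
    beta = -pvec / E                      # velocity of the boost to rest frame
    b2 = beta @ beta
    if b2 == 0:
        return p.copy()
    gamma = 1.0 / np.sqrt(1.0 - b2)
    bp = beta @ p[1:]
    E_new = gamma * (p[0] + bp)
    p_new = p[1:] + ((gamma - 1.0) * bp / b2 + gamma * p[0]) * beta
    return np.concatenate(([E_new], p_new))

def cos_theta1(l1_minus, l1_plus, l2_minus, l2_plus):
    """cos(theta1): angle between the negative lepton from Z1 and the
    direction opposite to Z2, both evaluated in the Z1 rest frame
    (one common convention, assumed here)."""
    z1 = l1_minus + l1_plus
    z2 = l2_minus + l2_plus
    lm = boost_to_rest(l1_minus, z1)[1:]
    z2r = boost_to_rest(z2, z1)[1:]
    return -(lm @ z2r) / (np.linalg.norm(lm) * np.linalg.norm(z2r))

# Invented, kinematically valid four-vectors purely for demonstration:
l1m = np.array([45.0, 20.0, 10.0, 35.0]); l1p = np.array([40.0, -15.0, 5.0, 30.0])
l2m = np.array([30.0, 5.0, -20.0, -15.0]); l2p = np.array([35.0, -10.0, 15.0, -25.0])
print(cos_theta1(l1m, l1p, l2m, l2p))
```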

Given the early data from the LHC, it is impossible to infer the spin and parity of the new boson directly from the line shapes of the angular distributions. Therefore, the strategy adopted for uncovering the spin structure is hypothesis testing: one constructs a discrete set of models for the resonance for a set of plausible spin and parity values. A frequentist figure of merit can then be used to assess the compatibility of each model with the data, using the standard model as a baseline.

Figure 1.4: Angles that characterize the decay of a resonance to two Z bosons [3].

If the probability for all alternate models is sufficiently low, one can declare that the SM is the preferred model. Alternatively, one can rank models using Bayes factors [29].
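A toy sketch of this hypothesis-testing strategy, using a likelihood-ratio test statistic on invented one-dimensional angular distributions; these stand-in densities are not the real 0⁺ and 0⁻ models, only placeholders that show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)

def pdf_sm(c):                      # invented stand-in for the 0+ distribution
    return 0.375 * (1.0 + c**2)     # normalized on [-1, 1]

def pdf_alt(c):                     # invented stand-in for the 0- distribution
    return 0.75 * (1.0 - c**2)      # normalized on [-1, 1]

def sample(pdf, n):                 # accept-reject sampling; 0.75 bounds both
    out = []
    while len(out) < n:
        c = rng.uniform(-1.0, 1.0)
        if rng.uniform(0.0, 0.75) < pdf(c):
            out.append(c)
    return np.array(out)

def q_stat(data):                   # q = -2 ln[ L(alt) / L(SM) ]
    return -2.0 * np.sum(np.log(np.clip(pdf_alt(data), 1e-12, None)
                                 / pdf_sm(data)))

qs_sm  = [q_stat(sample(pdf_sm, 20))  for _ in range(500)]
qs_alt = [q_stat(sample(pdf_alt, 20)) for _ in range(500)]
# The separation of the two q distributions quantifies the expected power
# to discriminate the hypotheses with 20-event pseudo-experiments.
print(np.median(qs_sm), np.median(qs_alt))
```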

To construct models for comparison with the standard model, we write down all possible Lagrangian terms for a boson decaying to two Z bosons, keeping those terms that are consistent with Bose statistics, gauge invariance, and Lorentz invariance. For example, a spin-zero particle decaying to two Z bosons has the following generic amplitude:

A(X \to ZZ) = \epsilon_1^{\mu} \epsilon_2^{\nu} \left( a\, g_{\mu\nu} M_X^2 + b\, q_\mu q_\nu + c\, \epsilon_{\mu\nu\alpha\beta}\, q_1^{\alpha} q_2^{\beta} \right).  (1.52)

The terms ǫ_1^µ and ǫ_2^ν correspond to the spin polarizations of the two decay products. The terms with coefficients a and b correspond to an even parity resonance, while c corresponds to an odd parity resonance. Similar but progressively more complicated terms can be constructed for the spin-1 and spin-2 cases (see [3]). A complete measurement of the tensor structure would be a measurement of each term in the effective Lagrangian, which is complicated by possible interference terms. Barring this, a first step using the initial data sets is to construct hypothesis tests of various pure spin and parity states. Initial results for the measurement of spin and parity were reported in [30]. This dissertation reports an independent measurement.

The Landau-Yang theorem [26] describes a selection rule for particles decaying to two photons. Since the spin-1, massless photon has only two polarization states, the spin angular momentum of a parent particle decaying to two photons can only be zero or two. This restriction does not apply to a decay to two Z bosons, as the massive Z boson has three polarization states, allowing the possibility of a spin-1 parent particle. It is therefore possible to have a scenario in which two particles are present with nearly degenerate masses, one of which is a scalar that couples to photons and the other a spin-1 vector that couples to Z bosons. This scenario, while unlikely, must nevertheless be tested, so we include a spin-1 boson hypothesis in our model comparison.

CHAPTER 2

EXPERIMENTAL APPARATUS

2.1 The Large Hadron Collider

The Large Hadron Collider is a particle accelerator consisting of a 27 kilometer ring that resides in a tunnel approximately 100 meters below the earth’s surface on the Franco-Swiss border near Geneva, Switzerland. The main ring consists of superconducting dipole magnets that generate a magnetic field perpendicular to the radius of the ring. Due to the Lorentz force, charged particles traveling through a magnetic field undergo a change in momentum perpendicular to the direction of the field, B, and of their motion, v,

\frac{d\vec{p}}{dt} = q\,\vec{v} \times \vec{B}, \qquad (2.1)

where q is the charge of the particle. One can see from Fig. 2.1 that this induces an effective force pointing towards the center of the circle. The large magnetic field made possible by superconducting magnets makes it possible to bend the trajectory of charged particles in a circle, keeping them inside the vacuum chamber in the beam line. The LHC ring is built to accelerate protons and lead nuclei in two counter-rotating beams that trace out an approximate circle, with collision points at four experiments located along the circumference. The LHC ring contains two vacuum pipes, each containing a chain of counter-rotating bunches of protons. The four LHC experiments are ATLAS, CMS, LHCb and ALICE. The outline of the LHC ring and the position of each experiment is illustrated in Fig. 2.2, and a more detailed map of the accelerator complex is shown in Fig. 2.3.
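For circular motion, Eq. 2.1 implies the familiar relation pT ≈ 0.3 q B r, with pT in GeV, B in tesla, r in meters, and q in units of the electron charge; a quick numerical check:

```python
# Bending radius implied by Eq. 2.1 for circular motion:
# r = pT / (0.3 * q * B), with pT in GeV, B in tesla, q in electron charges.
def bending_radius_m(pT_GeV, B_tesla, q=1.0):
    return pT_GeV / (0.3 * q * B_tesla)

# A 7 TeV proton in the 8.3 T LHC dipole field:
print(bending_radius_m(7000.0, 8.3))  # ~2.8 km, consistent with the ring size
```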

Around the two beam cavities, the magnet coils are arranged in a complicated pattern so as to maintain the required magnetic field perpendicular to the trajectory of the particles. This magnet scheme and the resulting dipole magnetic field are shown in Fig. 2.1.

Figure 2.1: Cross section of the LHC beam pipe (left) and a map of the magnetic field produced by the superconducting dipole magnets (right).

At design energy, the LHC accelerates protons to an energy of 7 tera-electron volts (TeV), or about 1.1 × 10⁻⁶ joules per proton. The LHC magnets deliver a magnetic field of up to 8.3 Tesla. One electron volt is the energy that an electron gains when it moves through an electric potential difference of 1 V. The LHC operated at an energy of 3.5 TeV per beam during the 2011 run and 4 TeV per beam during the 2012 run. This corresponds to a center-of-mass collision energy of 7 TeV and 8 TeV, respectively.
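The quoted conversion can be checked directly:

```python
# Arithmetic check of the energy conversion quoted above.
E_EV_PER_PROTON = 7.0e12           # 7 TeV in electron volts
J_PER_EV = 1.602176634e-19         # electron charge in coulombs (exact)
print(E_EV_PER_PROTON * J_PER_EV)  # ~1.12e-06 J per proton
```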

2.2 The Compact Muon Solenoid

The Compact Muon Solenoid (CMS) detector is one of two general purpose experiments at the LHC. Its design allows for the detection of electrons, muons, photons, taus, and jets.

Figure 2.2: Aerial view of the land above the LHC with an illustration of the accelerator ring and the location of each detector. (© CERN)

Figure 2.3: Outline of the LHC accelerator complex, which consists of a chain of accelerators yielding protons with progressively higher energies. The acceleration chain starts with the LINAC, where protons are first accelerated to an energy of 50 MeV. The Proton Synchrotron Booster (PSB) increases the energy to 1.4 GeV, followed by the Proton Synchrotron (PS), which yields protons with energy 26 GeV, and then the Super Proton Synchrotron (SPS), yielding protons of 450 GeV. After the SPS, protons are injected into the main LHC ring.

In addition, the CMS detector covers almost all the solid angle surrounding the interaction point (a property known as hermeticity). Since particles traveling along the beam have no initial momentum in the transverse direction, by energy and momentum conservation we know that the total transverse momentum of the final state particles should be zero. Thus, it is possible to indirectly detect particles that escape direct detection. We do this by observing a quantity called missing transverse energy,

\vec{E}_T^{\,\mathrm{miss}} = -\sum_i \vec{p}_T(i),

where the vector sum runs over all observed final state particles. This allows for the indirect detection of neutrinos, which leave no signature in the detector, as well as any unknown particles that interact weakly with normal matter.
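A minimal sketch of this definition, with invented particle momenta:

```python
import numpy as np

# Missing transverse energy as the negative vector sum of the transverse
# momenta of all observed particles. Each row is a hypothetical (px, py) in GeV.
particles = np.array([[30.0, 10.0], [-25.0, 5.0], [-20.0, -40.0]])

met_vec = -particles.sum(axis=0)   # (MET_x, MET_y)
met = np.linalg.norm(met_vec)      # scalar MET magnitude
print(met_vec, met)
```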

The major components of CMS are shown in Fig. 2.4.

Figure 2.4: A cartoon diagram highlighting the main components of the CMS detector.

We use a right-handed coordinate system to describe the CMS detector, with origin at the interaction point. The z-axis extends along the beam axis, with the positive z direction pointing along the counterclockwise direction of the LHC beam. The x-axis points toward the center of the LHC ring and the y-axis points upward [6]. We define an alternate coordinate system based on the spherical coordinate system, where φ is the azimuthal angle in the x-y plane, θ is the polar angle defined with respect to the +z axis, and the pseudorapidity is a transformation of θ defined by η = −ln(tan(θ/2)).

The namesake feature of CMS is the 3.8 Tesla solenoid magnet that encompasses three of the four subsystems (all except the muon system). The considerable magnetic field provided by the magnet is enough to bend the trajectories of charged particles by a measurable amount.
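A small sketch of the pseudorapidity transformation:

```python
import numpy as np

# eta = -ln(tan(theta/2)), with theta the polar angle w.r.t. the +z axis.
def eta_from_theta(theta):
    return -np.log(np.tan(theta / 2.0))

print(eta_from_theta(np.pi / 2))         # 0.0: perpendicular to the beam
print(eta_from_theta(np.radians(10.0)))  # ~2.44: close to the beam line
```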

Using the tracking system, it is possible to fit the trajectory of a charged particle and accurately measure its charge and momentum. The detector comprises several sub-systems: the Tracker, the Electromagnetic Calorimeter (ECAL), the Hadronic Calorimeter (HCAL), and the Muon System. In addition, the ECAL is complemented by a pre-shower detector that is present only on the endcaps. The ECAL, HCAL, and Tracker lie within the magnet, and thus must be “compact,” relative, for example, to the ATLAS experiment. The magnetic field lines are approximately parallel to the beam line in the volume inside the solenoid and return along iron return yokes outside the magnet, which are instrumented as part of the muon system. A detailed mapping of the magnetic field in CMS is performed in order to make accurate momentum measurements and to produce accurate detector simulations [31].

The CMS detector weighs about 14,500 tonnes, is about 15 meters in diameter (equivalent to about five stories of a building), and is 21 meters long. The geometrical center of the CMS detector is approximately at the interaction point, which is the point at which protons from the LHC beams are brought to collision. The detector can be broken up into geometrical regions known as the Barrel and Endcap, with the barrel spanning a pseudorapidity range of approximately 0 < |η| < 1.5, depending on the particular subsystem, and the endcap approximately spanning 1.5 < |η| < 3.0. In addition, the very high η range of 3.0 < |η| < 5.0 is covered by a separate hadronic calorimeter, the hadronic forward calorimeter (HF), which is also used for luminosity studies.

2.2.1 Tracking System

The subsystem closest to the beam line and the center of the detector is the silicon-based tracking system. It is used to reconstruct the trajectory of charged particles that pass through its volume. These trajectories allow for the reconstruction of charge and momentum using the relativistic Lorentz force law (Eq. 2.1). Because the tracker is the closest subsystem to the collision point, it must be the most accurate, as it must distinguish the paths of particles that are only millimeters apart. The tracker has 13 layers in the Barrel and 14 layers in the Endcap. The innermost three layers of the tracker consist of silicon pixels of dimension 0.15 mm by 0.10 mm, amounting to 66 million channels. The outer layers consist of single- and double-sided silicon strips (0.18 mm × 10 cm or 0.18 mm × 25 cm), totaling 9.6 million channels. The layout of the tracking system is shown in Fig. 2.5.

Figure 2.5: Diagram of the silicon tracking system in the r-z plane of one quarter of the CMS detector. The dark blue lines in the lower right corner, closest to the interaction point, represent the pixel tracker. The purple and light blue lines represent the single and double sided strips, respectively. Horizontal lines correspond to layers in the barrel, while vertical lines represent strips in the endcap.

2.2.2 Electromagnetic Calorimeter

The goal of the ECAL is to convert the energy of incident electrons and photons into measurable scintillation light. By measuring the resulting light, one can infer the energy of the incident particle. An electromagnetic shower occurs when an incoming photon or electron interacts with the material of the detector, converting to lower energy electrons and photons via pair production and bremsstrahlung radiation. These conversions continue in a cascading fashion until the pair-production threshold for photons is reached. The CMS ECAL uses lead tungstate (PbWO4) crystals to induce electromagnetic particles to “shower.” The PbWO4 crystals are doped with oxygen impurities, which results in a highly optically transparent material allowing the measurement of the shower light by avalanche photodiodes (APDs) and vacuum phototriodes (VPTs), which are glued to the back of the crystals.

The ECAL barrel is composed of 61,200 PbWO4 crystals, giving a granularity of 360 in the φ direction by 170 in the η direction. The layout of the crystals in the ECAL is shown in Fig. 2.6. Each crystal has dimensions of approximately 23 cm × 24 mm × 24 mm, although the rear face of the crystal is larger than the front face, forming a narrow truncated pyramid.

The crystals face away from the collision point in a quasi-projective geometry, where the crystal walls are slightly offset from the projective direction such that it is not possible for particles from the interaction point to pass through the ECAL between adjacent crystals. The crystals are set in a matrix of carbon fiber which ensures that the scintillation light from each crystal is isolated.

The high density crystals have very desirable qualities for an electromagnetic calorimeter, including a short radiation length (0.89 cm), a small Molière radius (a measure of the transverse width of the resulting showers), and a short decay time (80% of the light is emitted within 25 ns). The crystals emit a blue-green scintillation light. The light output varies with temperature (about 1.9% per °C), so a finely controlled cooling system is required to keep the ECAL crystals within 0.005 °C of the operating temperature of 18 °C.

The performance advantages of lead tungstate crystals come with complications. In the heavy radiation environment present in CMS, the crystals change transparency to light

39 Figure 2.6: The layout of the components of the electromagnetic calorimeter (ECAL) show- ing the barrel, the two endcaps, and the preshower. Crystals are grouped together into modules and supermodules in the barrel, and into supercrystals and dees in the endcaps.

in both the red and blue parts of the spectrum. The change depends on the radiation dose rate, and partial recovery of transparency occurs under low radiation conditions at room temperature. This means that the light sensitivity of the ECAL is highly dependent on the LHC running conditions, and in order to make an accurate measurement of energy and momentum, the crystal response has to be constantly monitored. This is done by injecting laser light into the crystals and recording the response of each crystal. Since non-negligible transparency changes happen on the order of minutes, laser calibration data must be taken continually during CMS data taking. The lasers are therefore fired during the “abort gap,” a set of empty bunch slots in the chain of proton bunches circulating in the LHC that allow time for the beam-dump mechanisms to turn on and extract the beam in case of a problem.

2.2.3 Hadronic Calorimeter

The HCAL is designed to measure hadrons, particles composed of quarks and gluons, which result from the hadronization of quarks and gluons emerging from a collision. Hadronic particles penetrate deeper into a material before showering, so the HCAL is placed further from the interaction point than the ECAL. The HCAL must be composed of a material with high atomic number to increase the probability for a strong interaction to initiate a hadronic shower. The CMS HCAL is composed of brass and steel to induce hadronic showers, along with plastic scintillators that convert the charged products of a shower to light, followed by hybrid photodiodes (HPDs) that convert the light to an electric current.

The HCAL barrel is split into two half-barrels (cut along the transverse direction), each of which is further split into 18 angular sections in φ. Each angular wedge is composed of 17 layered plates of brass and steel backed by scintillator tiles. The tiles are connected to optical fibers, which sample the scintillator light and carry it out of the barrel. The light from each of the 17 layers corresponding to the same region in η and φ is optically summed and read out by the HPDs, which are located on the outside of the detector. It has been proposed that, upon upgrade of the CMS detector, the layers of the HCAL be read out separately instead of being summed, to increase the resolution of the HCAL and possibly provide a more accurate measurement of dE/dx.

The HCAL spans a radius of 1.8 m to 2.88 m. The brass plates are made from brass recovered from retired Soviet artillery shells. The HF is located 11.2 m from the interaction point and increases the hermeticity of the detector, allowing for a better measurement of E_T^miss.

2.2.4 Muon System

The mass of the muon is about 200 times larger than the mass of the electron, and since the rate of bremsstrahlung radiation falls steeply with the mass of the particle, muons emit almost no bremsstrahlung radiation relative to electrons. In addition, muons are minimum ionizing particles: as they pass atoms, they ionize them, liberating electrons from the valence shell, but the energy loss per distance traveled due to ionization is small (about 0.01 GeV/cm), even within the dense CMS detector. Matter is therefore highly transparent to muons, and muons deposit very little energy in the calorimeters relative to other observable particles. The muon system is therefore located outside the magnet, furthest from the interaction point. The advantage of this is that nearly all other SM particles are stopped before reaching the muon system, yielding a very pure sample of muons in the muon system.

Figure 2.7: A cross sectional view of the CMS detector highlighting the trajectory of a muon as it passes through the muon system. It leaves readings in four layers (or “stations”) of the muon system.

Muons are detected via their electric charge. The main devices for detecting muons are drift tubes (DTs), 4 cm wide gas-filled tubes containing an anode wire [32], [33]. As muons pass through the drift tube, they ionize the gas, creating free electrons. These electrons are then accelerated by a large potential difference and drift toward the anode wire. When they reach the wire after a small but definite time delay, they create a measurable current. This current provides a signature of the presence of the muon. A diagram of a drift tube is shown in Fig. 2.8. Muon measurements are also made with cathode strip chambers (CSCs) [34] and resistive plate chambers (RPCs) [35], which use similar principles to detect muons by collecting ionization electrons [36]. The layout of the muon system is shown in Fig. 2.9. Each current measurement in a DT, CSC, or RPC is referred to as a “hit” in a muon chamber. By observing multiple hits as muons travel through the layers of the muon system, one can reconstruct the trajectories of muons and thus measure their momenta (see Fig. 2.7). If, for a particular event, in addition to the observation of hits in the muon system, we observe a track in the “inner tracker” (i.e., the silicon tracking system rather than the muon system), and this track is compatible with the same muon trajectory, the muon can be associated with a decay vertex (see sec. 3.3).

Figure 2.8: A diagram showing the cross section of a drift tube (DT) in the muon system. A map of the electric field and electric potential created by the anode wire and cathode strips is shown.

2.2.5 Trigger

The LHC proton beams are split into bunches of protons spaced 25 ns or 50 ns apart. At design luminosity, the LHC would deliver 2808 bunches, each with 1.15 × 10¹¹ protons. The number of collisions delivered to CMS is given in terms of luminosity. The instantaneous luminosity quantifies the flux of particles through a given cross-sectional area per second; it depends on the number of bunches in each beam, the revolution frequency, the beam size at the interaction point, and the angle at which the beams cross each other. During the 2012 proton-proton run the peak luminosity was 7.67 × 10³³ cm⁻² s⁻¹ with 1380 bunches. The maximum number of proton-proton interactions per bunch crossing was about 35.
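These numbers are mutually consistent, as a rough check shows; the inelastic cross section used below is an assumed, approximate value for 8 TeV.

```python
# Consistency check relating peak luminosity to pileup.
L = 7.67e33            # peak instantaneous luminosity, cm^-2 s^-1
sigma_inel = 69e-27    # assumed inelastic pp cross section, cm^2 (~69 mb)
n_bunches = 1380
f_rev = 11245.0        # LHC revolution frequency, Hz

mu = L * sigma_inel / (n_bunches * f_rev)
print(mu)  # ~34 interactions per crossing, matching the quoted maximum
```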

A bunch crossing rate of 40 MHz means that the CMS detector receives collisions every 25 ns. However, computational limits require that the number of events actually recorded be limited to O(100) per second.

Figure 2.9: The layout of the CMS muon system, showing drift tubes (DTs), cathode strip chambers (CSCs) and resistive plate chambers (RPCs). The view is in the longitudinal plane, with the center of the detector in the bottom left.

In order to achieve this reduction in rate, we must filter the events that we record. The criteria for doing so should be motivated by reasonable expectations about the physics that we hope to probe with the LHC. That is, we wish to keep only interesting events that have the possibility of revealing new physical phenomena, keeping the efficiency for such events as high as possible [37]. The selection is called triggering and is done in several steps. The first step, the Level-1 (L1) trigger, places criteria on basic measurable quantities such as the transverse energy, E_T = E sin(θ). The L1 trigger decisions are performed with custom-made, on-detector electronics, and reduce the event rate by a factor of O(1000), to the kHz range. The second step is the High Level Trigger (HLT), which is performed in a computing farm of about 1000 units in a cavern adjacent to the CMS experimental cavern. The HLT performs selections based on reconstruction algorithms that calculate more sophisticated physics quantities (such as the number of jets). The HLT reduces the rate by another factor of O(1000), bringing the recorded event rate to the target of about 100 events per second.
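The chain of rate reductions can be summarized numerically; the factors below are the order-of-magnitude values quoted above.

```python
# Back-of-envelope check of the trigger rate reduction described above.
bunch_crossing_rate = 40e6   # Hz
l1_reduction  = 1000.0       # O(1000) reduction by the Level-1 trigger
hlt_reduction = 1000.0       # O(1000) reduction by the High Level Trigger

l1_rate  = bunch_crossing_rate / l1_reduction    # ~40 kHz
hlt_rate = l1_rate / hlt_reduction               # ~40 Hz, i.e. O(100) events/s
print(l1_rate, hlt_rate)
```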


Figure 2.10: Total integrated luminosity recorded by the CMS experiment during the 2010 (7 TeV, 44.2 pb⁻¹), 2011 (7 TeV, 6.1 fb⁻¹), and 2012 (8 TeV, 23.3 fb⁻¹) proton physics runs.

CHAPTER 3

OBJECTS

Physics objects that are observable in the detector include photons, electrons, muons, taus, charged hadrons, neutral hadrons, and missing transverse energy (E_T^miss). We use physics objects to test the theoretical models that describe the creation of those objects. Objects are measured using the data gathered by the detector subsystems described in chapter 2. As we have seen, these subsystems are complicated: they involve hundreds of millions of channels of electronic readout, and suffer from many sources of detector noise, systematic uncertainty, and unexpected behavior. To make sense of this abundance of information, we must employ sophisticated algorithms, machine learning techniques, and detector simulation. We verify our understanding of these objects using a variety of methods wherein we compare observed and simulated data for well understood reactions (called “standard candles”). The goal of this chapter is to describe the methods used to measure the physics objects that will be used to construct the H → ZZ → 4ℓ analysis. I will first describe the procedure for measuring tracks, then I will give a summary of the particle flow algorithm and the procedure for reconstructing vertices. I will then go on to describe the identification and calibration of electrons, muons, photons and jets.

3.1 Tracks

The tracker is essential for reconstructing charged particles and photons at CMS. By observing the trajectory that charged particles follow when traveling through the CMS magnetic field, the CMS tracker allows one to measure the momentum and charge of many objects. In order to reconstruct individual tracks from the data provided by the silicon tracker, we must employ sophisticated algorithms. This is both because of the large number of channels in the tracker, and because of the large number of particles that traverse the tracker simultaneously¹. Since tracks are used in the identification and measurement of all of the physics objects used in this analysis, I will first briefly describe the details of the tracking system which are necessary to understand the algorithms used to reconstruct physics objects. I will then describe the identification, isolation, and measurement of the physics objects themselves.

A charged particle moving through a magnetic field is governed by the Lorentz force, F = q v × B. A particle moving through a constant magnetic field follows a helical trajectory through the detector. This helix can be described by five parameters, listed in Table 3.1.

Table 3.1: The five track parameters that define the helical trajectory of a charged particle traversing the tracker. To fully define the trajectory we need a position on the path of the particle, the charge-to-momentum ratio of the particle (which determines the radius of the helix), and the angle of the momentum of the particle with respect to the magnetic field.

q/p   track charge divided by momentum
λ     track angle w.r.t. the magnetic field
φ     angular position in the transverse plane
r     radial coordinate of the position
z0    z-coordinate of the position
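As an illustration, here is a hedged sketch of converting such helix parameters into a momentum vector; the exact sign and angle conventions are assumptions for illustration and may differ from the CMS software definitions.

```python
import numpy as np

# Hedged sketch: momentum three-vector from the track parameters of Table 3.1,
# assuming lambda is the dip angle w.r.t. the transverse plane.
def momentum_from_track(q_over_p, lam, phi):
    p = 1.0 / abs(q_over_p)             # total momentum
    pT = p * np.cos(lam)                # transverse component
    pz = p * np.sin(lam)                # longitudinal component
    return np.array([pT * np.cos(phi), pT * np.sin(phi), pz])

print(momentum_from_track(q_over_p=0.1, lam=0.5, phi=1.2))  # a 10 GeV track
```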

The large number of channels in the tracker allows for an accurate measurement of this set of parameters, which are then used to measure the momentum and charge of particles.

The readouts from pixel or strip channels (referred to as hits), along with the readouts from their adjacent pixels or strips, are grouped together to form tracking clusters [39]. Tracking algorithms use hits and clusters to estimate the helix parameters. The most basic track fitter uses the combinatorial track finder (CTF), which is based on the Kalman filter [40].
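To illustrate the predict-and-update logic of a Kalman filter in the simplest possible setting, here is a one-dimensional toy (state = position and slope, one measurement per layer). It is a pedagogical analogy, not the five-parameter helix fit used in the CTF.

```python
import numpy as np

# Toy Kalman filter: predict the state at the next layer, then update it
# with the measured hit. All numbers are invented for illustration.
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # propagate one layer: x += slope
H = np.array([[1.0, 0.0]])               # we measure position only
R = np.array([[0.04]])                   # hit resolution (variance)

x = np.array([0.0, 0.1])                 # initial state estimate from a seed
P = np.eye(2)                            # its covariance

for hit in [0.12, 0.21, 0.33, 0.41]:     # hypothetical hits, one per layer
    x = F @ x                            # predict
    P = F @ P @ F.T
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
    x = x + (K @ (hit - H @ x)).ravel()  # update with the measurement
    P = (np.eye(2) - K @ H) @ P

print(x)  # fitted [position, slope] after the last layer
```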

1At design luminosity there will be as many as 1000 tracks present in the tracker for each bunch crossing [38].

In this algorithm, tracks are initiated by a seed, which is a group of three collinear hits, or two hits which point back to the collision point. The seed hits are fitted to make an initial rough estimate of the five track parameters. With the trajectory provided by the initial parameter estimates, the filter then proceeds to the next layer in the tracker, considering hits which are close to the extrapolated trajectory. This is repeated layer by layer, looking for additional hits, until the edge of the detector is reached. At each layer a fit is performed and the set of parameters is recalculated. At the end of this procedure we have a set of tracks. After obtaining this first collection of tracks, one can calculate the probability that each hit belongs to each track. Hits with a high probability of belonging to a particular track are removed from all other tracks. This process proceeds iteratively, removing hits with low probability from each track and adding hits with high probability to others, until stability is reached. The result of the Kalman fitter is a set of parameters, along with their full covariance matrix, which is used to estimate the momentum uncertainty of physics objects [41].

3.2 Particle Flow

The general purpose design of the CMS detector makes it possible to accurately measure most fundamental particles, or to infer their existence through missing transverse energy. The particle flow reconstruction uses all the subsystems of the CMS detector to compile a list of all stable particles: photons, electrons, muons, and charged and neutral hadrons [42][43][44]. It then uses this list to measure the momenta of jets and taus and to quantify E_T^miss. The final state of interest in our analysis includes four isolated leptons, i.e., electrons or muons that are not inside jets. The particle flow reconstruction allows us to quantify the isolation of leptons in a natural way: we can simply require that the lepton be sufficiently far from other particle flow objects.

In the particle flow method we begin with a complete set of fundamental elements from all sub-detectors, including charged particle tracks and clusters in the ECAL and HCAL. We then link these elements into “blocks”. Given a set of blocks for an event, the substantial task is to identify these blocks by particle type. For example, a block which contains a large amount of energy in the HCAL, a small deposit in the ECAL, and no track would be identified as a neutral hadron, whereas a block containing a track consistent with a bremsstrahlung radiation pattern and a large ECAL deposit could be an electron. I will leave the details of this procedure to the references, but we will make use of particle flow algorithms repeatedly in the object identification and event selection.

3.3 Vertices

Several times in this section we will refer to “primary” and “secondary” vertices. A vertex is a position in the detector, generally close to the center (within 20 cm in the z direction, and 5 cm in the transverse direction), from which particles appear to originate.

Vertices are reconstructed by considering all possible groupings of tracks and fitting their trajectories under the hypothesis that they share a common vertex. This procedure, the Adaptive Vertex Fit (AVF), uses an iterative Kalman filter with simulated annealing. For a full description of the reconstruction of vertices, see [38]. Due to the high resolution of the CMS tracking system, vertices are reconstructed with good resolution (better than 500 µm [45]). We impose criteria on the position of the vertex and the quality of the track fits to define a set of primary vertex candidates. The primary vertex in this analysis is taken to be the candidate whose tracks have the largest scalar sum of pT. Figure 3.1 shows an example of primary and secondary vertices in a “pileup” collision (a collision in which there are multiple proton-proton interactions). For a given track, we can calculate the three-dimensional impact parameter, IP3D, as the shortest distance between the extrapolated track and the reconstructed primary vertex. For the event selection we use the significance of the three-dimensional impact parameter, SIP3D = IP3D/σ(IP3D), where σ(IP3D) is computed using the error matrix of the track fit. A primary lepton is defined as one with SIP3D < 4.
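The corresponding selection is straightforward; a minimal sketch with invented inputs:

```python
# Sketch of the impact-parameter significance cut described above.
def passes_sip3d(ip3d, sigma_ip3d, max_sip=4.0):
    """Keep leptons whose 3D impact parameter significance is below max_sip."""
    return abs(ip3d) / sigma_ip3d < max_sip

print(passes_sip3d(ip3d=0.012, sigma_ip3d=0.004))  # 3 sigma -> True
```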

Figure 3.1: View of the primary vertex (red) and secondary vertices (blue) in the transverse plane at the interaction point. Shaded areas indicate jets.

3.4 Electrons

3.4.1 Electron Reconstruction

Electrons in CMS are measured using the ECAL and the Tracker. Because of their charge and small mass, electrons passing through the CMS detector interact heavily with the detector material, constantly losing momentum and radiating bremsstrahlung photons.

The result is that an electron has a distinctive signature characterized by the emission of bremsstrahlung radiation in a narrow span of the φ direction, tangential to the curved trajectory of the electron through the magnetic field. The energy loss for electrons moving through a material is approximated by the Bethe-Heitler formula [46].

The Kalman filter described in section 3.1 assumes Gaussian deviations of the hit positions from a given track. However, electrons emitting bremsstrahlung radiation exhibit highly non-Gaussian deviations. Therefore, a dedicated tracking algorithm called the Gaussian Sum Filter (GSF) is employed, which allows for changes in the curvature of the track due to electron energy loss [47]. Clusters of energy deposits in the ECAL are grouped together

to form superclusters, whose selection uses the characteristic narrow η shower shape of electrons [48]. Using superclusters and tracks, two types of electrons are then reconstructed: tracker-driven and ECAL-driven electrons. ECAL-driven electrons are seeded by superclusters, which are then matched to tracker seeds, with tracks subsequently built using the GSF algorithm. Tracker-driven electrons are more suitable for lower energy electrons, such as those inside jets, whereas ECAL-driven electrons are more relevant for high energy electrons, e.g., those from the ZZ resonance decays considered in this analysis [49].

Electron momentum four-vectors are taken from their GSF track fits. To correct for mismeasurement of the energy in the ECAL and tracker, as well as for discrepancies between the performance of the measurement in Monte Carlo simulated data and observed data, we apply a series of corrections to both the data and the simulated data, which hereafter for brevity we shall refer to as “MC”. First, a multivariate regression is applied to the ECAL supercluster energy measurements using MC. We then combine the ECAL information with the tracker information to get a combined momentum measurement. Since Z → ee events occur in relative abundance in LHC collisions, and since the Z boson has a mass peak whose shape is well understood, we can correct the energy measurements using these events. This is done by adjusting two variables which define the shape of the measured Z → ee mass peak, called the scale and the resolution. The scale defines the position of the peak, and the resolution defines the width of the peak. After adjusting the energy scale in data (see the next section), the electron momenta in the MC are shifted and smeared in order to obtain a line shape which matches the observed data.
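A hedged sketch of this scale-and-smear logic; the numerical values are invented placeholders, not the corrections actually derived from data.

```python
import numpy as np

# Sketch: data energies are rescaled, and MC energies are smeared, so the
# Z -> ee peaks in data and simulation agree. Numbers are illustrative only.
rng = np.random.default_rng(42)

def correct_data_energy(e, scale=1.002):
    return e * scale                              # shift the data energy scale

def smear_mc_energy(e, extra_resolution=0.01):
    return e * rng.normal(1.0, extra_resolution)  # widen the MC resolution

print(correct_data_energy(45.0), smear_mc_energy(45.0))
```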

3.4.2 Electron Energy Measurement

The calibration of the energy response of individual channels in the ECAL is performed first, to ensure a uniform response across the detector. That is, using standard candle processes such as W → eν events, we apply calibration factors to ensure that the measured supercluster energy is, on average, uniform across the detector. For details on this procedure, see [50]. The measurement of the supercluster energy is then calibrated using the multivariate regression described in the next section. The corrected ECAL energy information is combined with tracker momentum information to provide an overall measurement of the electron energy. This measurement is then calibrated with Z → ee events in data and MC for further resolution improvement.

Figure 3.2: Expected resolution of the reconstructed energy in the ECAL barrel for the estimate made from the ECAL supercluster energy, the tracker momentum measurement, or the combined measurement. The blue triangles show the interval containing 68% of the probability, whereas the red circles assume a Gaussian shape.

Multivariate Energy Regression. Simulated events are used to improve the performance of the ECAL energy measurement of electrons. Using the simulated energy of particles, we use a boosted decision tree (BDT) in regression mode [51] to model the ratio of the generated energy to the ECAL supercluster “raw” energy. The estimated ratio depends on all the observable quantities coming from the ECAL, and is able to take into account correlations between the quantities. We use this factor to correct the energy measurement of the supercluster before combining it with tracker information. The variables used in the regression are a combination of cluster shape variables and individual crystal measurements:

• SCRawEnergy: the uncorrected energy of the supercluster
• scEta: η coordinate of the supercluster
• scPhi: φ coordinate of the supercluster
• R9: ratio of the energy in the 3-by-3 grid of crystals around the seed crystal to the uncorrected energy of the supercluster
• etawidth: the angular spread of the supercluster in the η direction
• phiwidth: the angular spread of the supercluster in the φ direction
• NClusters: the number of clusters forming the supercluster
• HoE: ratio of HCAL energy deposition to ECAL energy deposition
• rho: an estimate of the energy density due to pileup interactions
• vertices: number of reconstructed primary vertices
• EtaSeed: η coordinate of the seed cluster
• PhiSeed: φ coordinate of the seed cluster
• ESeed: energy of the seed cluster
• E3x3Seed: energy in the 3-by-3 grid of crystals around the seed crystal
• E5x5Seed: energy in the 5-by-5 grid of crystals around the seed crystal
• σiηiη: parameter describing the supercluster shape (defined below)
• σiφiφ: parameter describing the supercluster shape (defined below)
• σiηiφ: parameter describing the supercluster shape (defined below)
• EMaxSeed: energy of the highest energy crystal
• E2ndSeed: energy of the second highest energy crystal
• ETopSeed: energy of the adjacent crystal above the highest energy crystal
• EBottomSeed: energy of the adjacent crystal below the highest energy crystal
• ELeftSeed: energy of the adjacent crystal to the left of the highest energy crystal
• ERightSeed: energy of the adjacent crystal to the right of the highest energy crystal
• E2x5MaxSeed, E2x5TopSeed, E2x5BottomSeed, E2x5LeftSeed, E2x5RightSeed
• pT: transverse momentum of the electron computed using the combination of the ECAL energy measurement and the track momentum measurement

In addition, the following variables which indicate the proximity of the electron to gaps between modules and supermodules are used for electrons in the barrel:

• IEtaSeed: the index of the seed crystal in the η coordinate
• IPhiSeed: the index of the seed crystal in the φ coordinate
• IEtaSeed mod 5
• IPhiSeed mod 2
• (|IEtaSeed| ≤ 25) × (IEtaSeed mod 25) + (|IEtaSeed| > 25) × ((IEtaSeed − 25 × |IEtaSeed|/IEtaSeed) mod 20)
• IPhiSeed mod 20
• EtaCrySeed: the η of the seed crystal in local coordinates
• PhiCrySeed: the φ of the seed crystal in local coordinates

The variables σ_{iηiφ} parameterize the shape of the supercluster,

\sigma_{i\eta i\phi} = \frac{\sum_{i \in 5\times5} \omega_i\, (\eta_i - \langle\eta\rangle)(\phi_i - \langle\phi\rangle)}{\sum_{i \in 5\times5} \omega_i},  (3.1)

where η_i and φ_i are the angular positions of the i-th crystal in the 5×5 cluster, and ⟨η⟩ and ⟨φ⟩ are the centroid positions of the cluster. The variables σ_{iφiφ} and σ_{iηiη} are defined similarly. The training is done separately for the barrel and endcap. The resolution for Z → ee events is improved by about 10% using this multivariate regression technique.

Electron Momentum Measurement. Using the corrected ECAL measurements, we then correct the scale and resolution of the momentum measurement using real and simulated Z → ee events. This is done by fitting the Z → ee peak with a Breit-Wigner line shape convolved with a Crystal Ball (CB) function, using a maximum likelihood fit.

This yields two parameters: ∆m, the distance of the fitted peak from the true Z boson mass peak, and σCB, the width parameter of the CB function. Using this fit, we first correct the supercluster energy scale in data by varying the energy scale so that Z → ee events match those in Monte Carlo events. The correction to the energy scale is time dependent, and depends on the η region. We then smear the Monte Carlo energies so that the energy resolution, represented by σCB, in simulated data matches that of the observed data.

3.4.3 Electron Identification

Electron identification is used to separate electrons produced in the initial hard scattering (such as those from Z decays) from those coming from converted photons (from neutral pion decays, for instance) or from within jets. A classical strategy for distinguishing electrons which are relevant to the Higgs analysis involves cutting on ECAL and track variables, and on variables which are sensitive to the electron vertex. In this analysis we make use of an optimized identification algorithm using a Boosted Decision Tree classifier. The input variables include track variables sensitive to bremsstrahlung emission of the electron in the tracker, variables modeling the geometric matching between the track trajectory and the supercluster position, and electromagnetic shower shape variables:

• σiηiη: see Eq. 3.1
• σiφiφ
• σiηiφ
• R9: ratio of the energy in the 3-by-3 grid of crystals around the seed crystal to the uncorrected energy of the supercluster
• etawidth: width of the supercluster in the η direction
• phiwidth: width of the supercluster in the φ direction
• E_preshower/E_supercluster: characterizes the shape of the shower in the longitudinal direction (endcap only)
• d0: transverse impact parameter (IP)
• d3D: 3D impact parameter
• SIP: significance of the 3D IP (d3D/σ_d3D)
• E_clusterseed/P_in: cluster seed energy divided by the momentum of the track extrapolated to the primary vertex
• E_clusterseed/P_out: cluster seed energy divided by the momentum of the track extrapolated to the ECAL
• Iso_PF,neutral: particle flow isolation from neutral hadrons
• Iso_PF,charged: particle flow isolation from all charged particles
• Iso_PF,photons: particle flow isolation from photons

For the same signal efficiency, it is possible to achieve a reduction in background of a factor of two as compared to cut-based electron identification.

3.5 Muons

3.5.1 Muon Reconstruction

Muons are reconstructed using the tracker and the muon system. Two complementary approaches are used in this analysis: one which starts with hits in the muon system (CSC and DT hits) and looks for matching tracker tracks, and one which starts with tracker tracks and finds matching muon hits. They are referred to as “global muons” and “tracker muons,” respectively [52], [53]. Tracker muon reconstruction is more efficient at low pT, since the tracker is better suited to measuring the curvature of low-pT muon tracks, whereas hits in the muon system are more likely at high pT, making global muon reconstruction more efficient there. About 99% of muons produced above 5 GeV are reconstructed with either the global muon or tracker muon algorithm.

3.5.2 Muon Momentum Measurement

Misalignments between detectors in the offline reconstruction are known to exist for muons, creating a bias in the muon momentum reconstruction as well as a degradation of the dimuon mass resolution. To correct for this, and to provide additional calibration of the muon momentum, we use a method developed at the University of Rochester which uses a lookup table to correct the muon momentum as a function of η, φ, and charge. The corrections are based on the variables 1/p_T and M_µµ, using Z → µµ events [54].

3.5.3 Muon Identification

We employ the particle flow muon identification as described in [43]. Undecayed charged hadrons are often misreconstructed as muons by the “global” and “tracker” muon reconstruction algorithms. The particle flow muon identification makes use of muon hits, inner tracker hits, and energy deposits in the calorimeters to significantly reduce the rate at which charged hadrons are misidentified as muons, while simultaneously maintaining a high muon identification efficiency.

3.6 Lepton Isolation

Leptons emerge from hadron interactions both as isolated final state particles and as members of jets resulting from the hadronization of colored particles. Since this analysis is primarily concerned with electrons and muons coming from Z decays, we are interested not in leptons inside hadron showers but in those that are geometrically isolated from other particles. In order to help identify the leptons of interest, we define an isolation variable. The most efficient way of measuring isolation involves particle flow objects. In the particle flow method, deposits in the calorimeters and the muon system are matched to tracks in the silicon tracker. Tracks can then be identified as corresponding to charged particles such as electrons, muons, or charged hadrons. Likewise, deposits in the calorimeters that are not associated with tracks can be identified as neutral hadrons or photons. We refer to identified particles as “PF Candidates”.

To define an isolation variable for an electron or muon, we consider all PF candidates within a cone ∆R = √(∆φ² + ∆η²) < 0.4 around the lepton under consideration. The isolation is then defined as the scalar sum of the pT of all particles in this cone. The relative isolation is this variable divided by the pT of the lepton itself:

\mathrm{Iso}_{PF,rel} = \frac{\sum_{\mathrm{PF\,cand.}} p_T}{p_{T,\ell}}.  (3.2)
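A minimal sketch of Eq. 3.2, with invented inputs:

```python
import numpy as np

# Relative isolation: scalar pT sum of PF candidates in a cone around the
# lepton, divided by the lepton pT. Candidates are hypothetical (pT, eta, phi).
def delta_r(eta1, phi1, eta2, phi2):
    dphi = (phi1 - phi2 + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi]
    return np.hypot(eta1 - eta2, dphi)

def relative_isolation(lepton, candidates, cone=0.4):
    pt_l, eta_l, phi_l = lepton
    pt_sum = sum(pt for pt, eta, phi in candidates
                 if 0.0 < delta_r(eta_l, phi_l, eta, phi) < cone)  # skip self
    return pt_sum / pt_l

lepton = (40.0, 0.5, 1.0)
cands = [(2.0, 0.6, 1.1), (5.0, 2.0, -1.0)]  # one inside, one outside the cone
print(relative_isolation(lepton, cands))      # 2/40 = 0.05
```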

3.6.1 Pileup Correction

At sufficiently high instantaneous luminosity, there are multiple interactions per bunch crossing at the LHC experiments, which result in multiple overlapping events in the detectors. This phenomenon is called pileup. As the beam optics and bunch proton density are adjusted, the overall pileup changes. Figure 3.3 shows the average number of pileup interactions during the 2012 data collection at CMS. This effect is simulated using Monte Carlo generated events assuming a particular pileup profile. After a data taking period has ended, the MC events are reweighted to match the pileup profile in the actual data, so that the number of interactions per bunch crossing is correctly represented in our physics models. However, in order to minimize the effect of pileup on the sensitivity of the Higgs analysis, we take steps to reduce the sensitivity of our variables to this effect. The isolation variables used in the event selection of this analysis are especially sensitive to pileup, since pileup events deposit energy into the isolation cone of particles (see section 4.3.4). We correct these variables by removing the expected amount of energy due to pileup. In the case of pileup energy from charged particles, we remove those particles which have tracks originating from vertices other than the primary interaction vertex. In the case of pileup from photons or neutral hadrons, we use a different method. We first compute the average energy density in the detector which is expected to arise from neutral pileup particles. This energy can be accounted for by multiplying this density by the area of the isolation cone of the lepton. However, this energy depends in a nontrivial way on the number of pileup interactions, which changes with the intensity of the collisions. We must therefore adjust this energy according to the amount of pileup. We estimate the energy density due to pileup events using the FASTJET energy density, ρ [55]. Due to detector effects, this energy density scales in a nonlinear way with the number of pileup interactions. Therefore, we cannot simply multiply ρ by the isolation cone area to get the total pileup energy. We instead define an “effective area”, A_eff, as the change in the average isolation divided by ρ, as a function of the number of pileup vertices and the pseudorapidity [56]. The corrected isolation, with its neutral component adjusted, is then defined as:

\mathrm{Iso}_{REL} = \frac{\sum_i p_{T,i}^{\mathrm{charged}} + \max\left(0,\; \sum E_T^{\gamma} + \sum E_T^{\mathrm{neutral}} - \rho \times A_{\mathrm{eff}}\right)}{p_{T\ell}}  (3.3)
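A sketch of Eq. 3.3; in practice A_eff would be read from an η-binned lookup table, and all numbers below are illustrative.

```python
# Pileup-corrected relative isolation (Eq. 3.3), with invented inputs.
def corrected_isolation(pt_lepton, pt_charged_sum, et_photon_sum,
                        et_neutral_sum, rho, a_eff):
    neutral = max(0.0, et_photon_sum + et_neutral_sum - rho * a_eff)
    return (pt_charged_sum + neutral) / pt_lepton

print(corrected_isolation(40.0, 1.5, 2.0, 1.0, rho=8.0, a_eff=0.2))
# (1.5 + max(0, 3.0 - 1.6)) / 40 = 0.0725
```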

Figure 3.3: Mean number of interactions per bunch crossing in the 2012 run at the LHC (the average was ⟨µ⟩ = 21).

3.7 Jets

After a hard scattering in a proton-proton collision, final state quarks and gluons undergo hadronization. The resulting hadrons subsequently decay into showers of stable particles which are collimated in a geometrical region of the detector. The signature left by such a shower is called a “jet”. Jets are reconstructed by clustering particles within a region of solid angle. In this analysis we use the anti-kT clustering algorithm [57] to reconstruct jets (see below), which takes as input all particles provided by the particle flow algorithm (see sec. 3.2), using all of the subsystems of the detector. Jets are used in this analysis to categorize the Higgs production mode; that is, they are used to separate VBF events from gluon fusion events. The VBF jets are expected to occur at high pseudorapidity, with relatively high pT.

3.7.1 Jet Reconstruction

A brief explanation of the anti-kT algorithm follows. The idea is to cluster detector objects (in our case, PF candidates) into jets so as to maximize the efficiency and resolution of the resulting jets. The historical approach is to define some criteria for a seed object, and then gather the surrounding particles into a "cone" around that object according to some algorithm. These approaches typically led to jets that were "collinear unsafe" or "infrared unsafe": due to the nature of infrared and collinear gluon radiation, the resulting jets depend on the threshold at which clusters can initiate jets. Consequently, if we change the initial conditions of the algorithm slightly, we may get very different jet objects. The anti-kT algorithm clusters particles (indexed i and j) using variables proportional to their distance from each other, dij, and their distance from the beam, diB. These quantities are defined in terms of the momenta of the particles as follows:

$$d_{ij} = \min\left(k_{ti}^{2p},\, k_{tj}^{2p}\right)\frac{\Delta_{ij}^{2}}{R^{2}}, \qquad d_{iB} = k_{ti}^{2p}, \qquad (3.4)$$

with $\Delta_{ij}^{2} = (y_i - y_j)^2 + (\phi_i - \phi_j)^2$. Changing the exponent, p, changes the behavior of the clustering algorithm. It can be shown that the value p = −1 leads to jets that are insensitive to soft gluon radiation [57]. This choice of p defines the anti-kT algorithm. The angular distance parameter, R, is a free parameter defining, essentially, the size of the resulting jets. For this analysis we use R = 0.5.
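The following minimal Python sketch computes the distance measures of Eq. 3.4, assuming each particle is given as a (kt, y, φ) tuple; the analysis itself uses the FASTJET implementation, so this is only an illustration:

```python
import numpy as np

def antikt_distances(particles, R=0.5, p=-1):
    """Pairwise distances d_ij and beam distances d_iB of Eq. 3.4.
    `particles` is a list of (kt, y, phi) tuples; p = -1 gives anti-kT."""
    n = len(particles)
    d_ij = np.full((n, n), np.inf)
    for i in range(n):
        kti, yi, phii = particles[i]
        for j in range(i + 1, n):
            ktj, yj, phij = particles[j]
            # Wrap the azimuthal difference into [-pi, pi]
            dphi = np.arctan2(np.sin(phii - phij), np.cos(phii - phij))
            delta2 = (yi - yj) ** 2 + dphi ** 2
            d_ij[i, j] = min(kti ** (2 * p), ktj ** (2 * p)) * delta2 / R ** 2
    d_iB = np.array([kt ** (2 * p) for kt, _, _ in particles])
    return d_ij, d_iB
```

At each iteration of the full algorithm, the smallest of all d_ij and d_iB is found; if it is a d_ij the two objects are merged, while if it is a d_iB the object is declared a jet and removed from the list.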

3.7.2 Jet Identification

Jets are identified using a multivariate jet classifier (see sec. 4.7 for a review of multivariate analysis). The main source of jet background comes from jets that originate from pileup events. The information used to discriminate jets of interest from pileup jets comes from several places. First, the trajectories of the tracks associated with a jet help determine whether it came from the primary vertex or a pileup vertex. Next, the topology of the jet is used to identify overlapping jets (where, for example, a good jet and a pileup jet overlap in the detector). Lastly, other variables which characterize the jet, such as the fractions of neutral and charged hadrons and the charged multiplicity, have some additional discriminatory power. Variables based on these jet characteristics are used to train a BDT, which is used as a pileup jet identification variable.

3.7.3 Jet Energy Correction

Since the calorimeter energy response is nonlinear, the jet energy measured by the calorimeters does not directly map to the energy of the particle which initiated the jet. We therefore apply a series of corrections to the energies, called "Jet Energy Corrections". The approach taken by CMS is to factorize the corrections into a series of steps, each of which modifies the four-momenta of jets given some parameter(s), such as the momentum or flavor, to correct for a particular source of inaccuracy in the jet energy [58][59]. The standard set of corrections used in CMS, and also in this analysis, is as follows (a short sketch of the sequential application follows the list):

• L1 Pileup Correction: Remove the energy deposited in a jet due to pileup events.

• L2 Relative Jet Correction: Remove the η dependence of the jet response, correcting the response in each region of pseudorapidity relative to the central region of the detector.

• L3 Absolute Jet Correction: Remove the pT dependence of the jet response, so that the measured jet pT is equal, on average, to the parton or particle pT. This is done using Monte Carlo events or with standard-candle, data-driven techniques.
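The factorized structure means that each level is a multiplicative scale factor applied in sequence. A minimal sketch, with hypothetical correction functions standing in for the CMS-provided ones:

```python
def apply_jet_energy_corrections(jet_p4, corrections):
    """Apply factorized corrections (e.g. L1, L2, L3) in sequence.
    `jet_p4` is an (E, px, py, pz) tuple; each entry of `corrections`
    is a function returning a multiplicative scale factor given the
    partially corrected four-momentum. Names are illustrative."""
    for correction in corrections:    # e.g. [l1_pileup, l2_relative, l3_absolute]
        scale = correction(jet_p4)
        jet_p4 = tuple(scale * component for component in jet_p4)
    return jet_p4
```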

3.8 Photons

In a H → ZZ → 4ℓ decay it is possible for one of the leptons emanating from a Z to radiate a photon, resulting in a three-body decay Z → ℓℓγ. Such a photon is called a final state radiation (FSR) photon. Since the photon can carry away a significant portion of the energy, this can significantly degrade the four-lepton mass resolution, and thus the resolution of the Higgs boson mass. If we are able to recover the FSR photons, we can reconstruct the Higgs boson decay more accurately. I will briefly describe the photon reconstruction and identification algorithms used in this analysis and then describe the method for including FSR photons in the Higgs analysis.

3.8.1 Photon Reconstruction, Identification, and Isolation

The reconstruction and identification of photons in this analysis uses the particle flow photon method described in section 3.2. The photon isolation variable is derived from the PF candidates by defining a cone of size ∆R = 0.3. The isolation is taken to be the sum of the pT of all charged hadrons, photons, and neutral hadrons identified by the particle flow reconstruction within this cone and compatible with originating from the primary vertex. In order to be included in the isolation calculation, charged hadrons are required to have pT > 200 MeV, while neutral hadrons are required to have pT > 500 MeV. To account for the effect of pileup, another term, labeled IPU, defined as the sum of the pT contributions from charged particles that originate from secondary vertices, is added to the isolation sum. A relative isolation is defined by dividing the absolute isolation by the photon pT:

$$\mathrm{Iso}_{\gamma} = \frac{\sum_{\mathrm{PF\ cand.}} p_{T} + I_{\mathrm{PU}}}{p_{T,\gamma}} \qquad (3.5)$$

CHAPTER 4

ANALYSIS

4.1 Analysis Overview

The goal of this analysis is to discover the Higgs boson and measure its properties using the H → ZZ → 4ℓ decay channel. The search strategy is to select events that contain two pairs of oppositely charged electrons or muons that are consistent with coming from a ZZ decay. Since we are able to fully reconstruct the four-vectors of these four leptons, labeled p1, p2, p3, and p4, we can fully reconstruct the Higgs decay, and can thus reconstruct the four-vector of the hypothesized

Higgs boson, pH, including the Higgs boson mass, mH. In addition, since we reconstruct the four-vectors of the two Z bosons (and thus the H → ZZ decay), we can infer the spin of the observed particle by utilizing angular correlations between the two spin-1 Z bosons and their parent particle. By using additional information from jets, we can make measurements of the coupling structure of the Higgs boson, in particular the couplings of the observed particle to vector bosons or fermions. The kinematic properties of this decay are such that there is a high probability for one on-shell Z boson, that is, a Z boson with mass near the nominal Z boson mass, and one off-shell Z boson, whose mass is far from the nominal Z boson mass of 91.18 GeV. In reconstructing the Higgs boson from the lepton four-momenta, we work backwards through the decay chain. We first reconstruct and identify the four leptons. We then reconstruct the on-shell Z boson, then the second Z boson, then the Higgs boson candidate: p1, p2, p3, p4 → pZ1, pZ2 → pH. Following this reconstruction, we eliminate most of the standard model background (see Fig. 4.1). The background that remains after this reconstruction is classified into two categories: reducible and irreducible backgrounds (the latter also referred to as ZZ events). Reducible backgrounds are those that contain two leptons that do not come from Z bosons, such as misidentified jets or electrons coming

from photon conversions. Irreducible backgrounds are processes containing two Z bosons, but not containing a Higgs boson. With a perfect detector we would be able to distinguish between Higgs boson events and reducible background events by measuring the properties of the leptons. However, even with a perfect detector we would not be able to completely distinguish Higgs boson events from irreducible background events, though we achieve some separation by using the kinematic differences between the two processes. The challenge of the event selection is to identify leptons (electrons and muons) in a way that yields the highest possible signal efficiency while minimizing the reducible background. The procedures for identifying leptons and computing their isolation and impact parameter were discussed in chapter 3. We use the values that result from these procedures to define the event selection, which is described in section 4.3.4. After we have defined the event selection, we turn to the problems of distinguishing Higgs boson events from irreducible background events and of measuring the properties of the Higgs boson. We accomplish both tasks with the help of multivariate discriminants, which we discuss in section 4.7 of this chapter. To summarize the event selection, we list the steps of the analysis:

1. Trigger: Dilepton or tri-electron triggers (ee, eµ, µµ, eee)

2. Objects: muons: pT > 5 GeV, |η| < 2.4, isolated, coming from the primary vertex; electrons: pT > 7 GeV, |η| < 2.5, isolated, coming from the primary vertex

3. Lepton Combinations: At least one lepton with pT > 20 GeV and one other with pT > 10 GeV

4. First Z Candidate (Z1): Di-lepton pair with mass closest to the Z boson mass (must have mass 40 < mll < 120 GeV)

5. Second Z Candidate (Z2): Dilepton combination with highest pT (must have mass 12 < mll < 120 GeV)

6. Higgs Candidate (m4l): Four-lepton mass (m4l > 100 GeV)

7. Multivariate discriminant: A multivariate function that provides optimal discrimination be- tween signal and background or between different signal hypotheses (see sec. 4.7)

8. Statistical analysis: Significance of discovery and parameter measurement

4.2 Datasets

The data sample used in this analysis was recorded by the CMS experiment during 2011 for the run range 160431 to 180252 and during 2012 for the run range 190645 to 207883. This sample corresponds to an integrated luminosity of L = 5.1 fb−1 in 2011 at 7 TeV and L = 19.6 fb−1 in 2012 at 8 TeV. The CMS collaboration defines a selection of runs and luminosity sections that requires high-quality data from all sub-detectors. The absolute proton-proton (pp) luminosity is known with a precision of 2.2% in 2011 and 4.4% in 2012. The analysis relies on primary data sets (PDs), produced centrally by the CMS collaboration, which combine collections of High Level Triggers (HLT) [60]. The detailed content of the PDs evolves in phase with the evolution of the triggers to cope with increasing instantaneous luminosity. For the 2011 data, the analysis relies on the so-called "DoubleElectron" and "DoubleMuon" data sets. These PDs are formed by an "OR" of various triggers with symmetric or asymmetric trigger thresholds for the two leptons, with additional identification and isolation requirements. These PDs also include triggers requiring three leptons above a low pT threshold. For 2012, we use an additional PD composed of triggers which require one electron and one muon, in order to recover a few percent of inefficiency in the 2e2µ channel at low Higgs boson masses. These triggers form the "MuEG" PD. In addition, we use tri-electron triggers for both 2011 and 2012 data to recover a small number of low-pT events. The PDs and trigger paths¹ used for this analysis are summarized in Table 4.1. In Table 4.2 we list all the triggers used with 2012 data. SM Higgs boson signal samples, as well as samples for a large variety of electroweak and QCD-induced SM background processes, have been obtained using detailed Monte Carlo (MC) simulation. All data sets were subject to full reconstruction and skimming². The signal and background samples have been used for the optimization of the event selection strategy. These data sets are further used in the training of multivariate discriminants (see sec. 4.7), and an independent sample is used for comparison with the observed data in the statistical analysis that underlies the Higgs boson search and measurements. The samples are also used for the evaluation of acceptance corrections and for

¹A trigger path is the sequence of steps that leads to an HLT trigger decision. These steps are performed by software modules, each of which performs a well-defined task such as digitization or reconstruction of physics objects. At each step intermediate decisions are made which collectively result in a final trigger decision for the path.
²A "skim" is a set of criteria applied to a collection of events in order to reduce the overall volume of data while negligibly affecting signal efficiency.

modeling systematic effects, as well as for the background evaluation procedure, where measurements in the "background control" region are extrapolated to the "signal" region (see sec. 4.4.3). The backgrounds include 4ℓ contributions from di-boson production, via qq̄ → ZZ(∗) and gg → ZZ(∗), as well as instrumental backgrounds in which hadronic jets or secondary leptons from heavy meson decays are misidentified as isolated leptons. Here and henceforward, Z stands for Z, Z∗, and γ∗ (where possible). For the event generation, ℓ is understood as being any charged lepton, e, µ or τ. The analysis focuses on reconstructed final states with electrons or muons. The main sources of instrumental background are Z + jets production with Z → ℓ+ℓ− decays, Zbb̄ (and Zcc̄) associated production with Z → ℓ+ℓ− decays, and the production of top quark pairs in the decay mode tt̄ → WbWb̄ → ℓ+ℓ−νν̄bb̄. Multiple jet production from light-quark QCD hard interactions can also contribute in early stages of the event selection, as can other di-boson (WW, WZ, Zγ) and single top backgrounds. Table 4.3 summarizes the data sets used for this analysis. All the signal and background cross sections are set to next-to-leading-order (NLO) predictions. See section 4.4 for more information on the signal and ZZ Monte Carlo event generation. The general multi-purpose Monte Carlo event generator PYTHIA [61] is used to simulate several processes including QCD light jet production. It is used either to generate a given hard process at leading order (LO), or to add hadronization, showering, and decays to parton-level events that are generated by other programs at higher order. We also make use of the MadGraph program [62] to calculate multi-parton tree-level amplitudes and MadEvent to generate events for some important background processes. This is also the case for the POWHEG NLO generator [63], which is used for the Higgs boson signal and for the ZZ and tt̄ backgrounds. For the latter, the tt̄ decays are handled, exceptionally, within POWHEG. Finally, we use a dedicated tool, GG2ZZ [64], to generate the gg → ZZ contribution to the ZZ cross section. For the underlying event, the so-called "PYTHIA tune Z2" in 2011 and "PYTHIA tune Z2 star" in 2012, which rely on pT-ordered showers, are used. For the parton density functions of the colliding protons, the CTEQ6M set is used, except for the POWHEG samples, which make use of CT10 [65].

Table 4.1: Data sets and triggers used in the analysis.

2011:
  Data sets: /DoubleElectron/Run2011A-16Jan2012-v1, /DoubleMu/Run2011A-16Jan2012-v1, /DoubleElectron/Run2011B-16Jan2012-v1, /DoubleMu/Run2011B-16Jan2012-v1
  Muon triggers: HLT_DoubleMu7 OR HLT_Mu13_Mu8 OR HLT_Mu17_Mu8
  Electron triggers: HLT_Ele17_CaloTrk_Ele8_CaloTrk OR HLT_TripleEle10_CaloIdL_TrkIdVL
  Integrated luminosity: 5.1 fb−1

2012:
  Data sets: /DoubleElectron/Run2012A-13Jul2012-v1, /DoubleMu/Run2012A-13Jul2012-v1, /MuEG/Run2012A-13Jul2012-v1, /DoubleElectron/Run2012A-06Aug2012, /DoubleMu/Run2012A-rec-06Aug2012, /MuEG/Run2012A-rec-06Aug2012-v1, /DoubleElectron/Run2012B-13Jul2012-v1, /DoubleMu/Run2012B-13Jul2012-v4, /MuEG/Run2012B-13Jul2012-v1, /DoubleElectron/Run2012C-24Aug2012-v1, /DoubleElectron/Run2012C-PromptReco, /DoubleMu/Run2012C-24Aug2012-v1, /DoubleMu/Run2012C-PromptReco-v2, /MuEG/Run2012C-24Aug2012-v1, /MuEG/Run2012C-PromptReco-v2
  Muon triggers: HLT_Mu17_Mu8 OR HLT_Mu17_TkMu8
  Electron triggers: HLT_Ele17_CaloTrk_Ele8_CaloTrk OR HLT_Ele15_Ele8_Ele5_CaloIdL_TrkIdVL
  Cross triggers: HLT_Mu8_Ele17_CaloTrk OR HLT_Mu17_Ele8_CaloTrk
  Integrated luminosity: 19.6 fb−1

Table 4.2: Triggers in 2012 data analysis.

4e (main): HLT_Ele17_CaloTrk_Ele8_CaloTrk [L1_DoubleEG_13_7, prescale 1] OR HLT_Ele15_Ele8_Ele5_CaloIdL_TrkIdVL [L1_TripleEG_12_7_5, prescale 1]
4µ (main): HLT_Mu17_Mu8 [L1_Mu10_MuOpen, prescale 1] OR HLT_Mu17_TkMu8 [L1_Mu10_MuOpen, prescale 1]
2e2µ (main): HLT_Ele17_CaloTrk_Ele8_CaloTrk [L1_DoubleEG_13_7, prescale 1] OR HLT_Mu17_Mu8 [L1_Mu10_MuOpen, prescale 1] OR HLT_Mu17_TkMu8 [L1_Mu10_MuOpen, prescale 1] OR HLT_Mu8_Ele17_CaloTrk [L1_MuOpen_EG12, prescale 1] OR HLT_Mu17_Ele8_CaloTrk [L1_Mu12_EG6, prescale 1]
4µ (backup): HLT_TripleMu5 [L1_TripleMu0, prescale 1]
4e and 2e2µ (Z T&P): HLT_Ele17_CaloTrkVT_Ele8_Mass50 [L1_DoubleEG_13_7, prescale 5]
4e and 2e2µ (low pT): HLT_Ele20_CaloTrkVT_SC4_Mass50_v1 [L1_SingleIsoEG18er, prescale 10]
4µ and 2e2µ (Z T&P): HLT_IsoMu24_eta2p1 [L1_SingleMu16er]
4µ and 2e2µ (J/psi T&P): HLT_Mu7_Track7_Jpsi, HLT_Mu5_Track3p5_Jpsi, HLT_Mu5_Track2_Jpsi

Table 4.3: Monte Carlo simulation data sets used for the signal and background processes. Z stands for Z, Z∗, γ∗; ℓ means e, µ or τ; V stands for W and Z.

Each entry lists: process; MC generator; σ(N)NLO at 7 TeV, at 8 TeV; comments or sample name.

Higgs boson H → ZZ → 4ℓ:
  gg → H: POWHEG; [1-20] fb, [1.2-25] fb; mH = 110-1000 GeV/c²
  VV → H: POWHEG; [0.2-2] fb, [0.3-25] fb; mH = 110-1000 GeV/c²
ZZ continuum:
  qq̄ → ZZ → 4e(4µ, 4τ): POWHEG; 66.09 fb, 76.91 fb; ZZTo4e(4mu,4tau)
  qq̄ → ZZ → 2e2µ: POWHEG; 152 fb, 176.7 fb; ZZTo2e2mu
  qq̄ → ZZ → 2e(2µ)2τ: POWHEG; 152 fb, 176.7 fb; ZZTo2e(2mu)2tau
  gg → ZZ → 2ℓ2ℓ′: gg2ZZ; 3.48 fb, 12.03 fb; GluGluToZZTo2L2L
  gg → ZZ → 4ℓ: gg2ZZ; 1.74 fb, 4.8 fb; GluGluToZZTo4L
Other di-bosons:
  WW → 2ℓ2ν: MadGraph; 4.88 pb, 5.995 pb; WWJetsTo2L2Nu
  WZ → 3ℓν: MadGraph; 0.868 pb, 1.057 pb; WZJetsTo3LNu
tt̄ and single t:
  tt̄ → ℓ+ℓ−νν̄bb̄: POWHEG; 17.32 pb, 23.64 pb; TTTo2L2Nu2B
  t (s-channel): POWHEG; 3.19 pb, 3.89 pb; T TuneXX s-channel
  t̄ (s-channel): POWHEG; 1.44 pb, 1.76 pb; Tbar TuneXX s-channel
  t (t-channel): POWHEG; 41.92 pb, 55.53 pb; T TuneXX t-channel
  t̄ (t-channel): POWHEG; 22.65 pb, 30.00 pb; Tbar TuneXX t-channel
  t (tW-channel): POWHEG; 7.87 pb, 11.77 pb; T TuneXX tW-channel-DR
  t̄ (tW-channel): POWHEG; 7.87 pb, 11.77 pb; Tbar TuneXX tW-channel-DR
Z/W + jets (q = d, u, s, c, b):
  W + jets: MadGraph; 31314 pb, 36257.2 pb; WJetsToLNu
  Z + jets, mℓℓ > 50: MadGraph; 3048 pb, 3503.7 pb; DYJetsToLL*M-50
  Z + jets, 10 < mℓℓ < 50: MadGraph; 12782.63 pb, 915 pb; DYJetsToLL*M-10To50

4.3 Event Selection

In this section we outline the strategy for selecting H → ZZ → 4ℓ events of high purity. We begin by discussing the triggers used to record events, then the definition of primary vertices and the selection of leptons, jets, and photons. We then describe the cuts that are used to select Higgs boson events. Lastly, we measure the efficiency of each step of the event selection.

4.3.1 Analysis Triggers

We use di-lepton triggers with pT thresholds on each of the leptons, as well as other selection criteria. The pT thresholds are asymmetric for the two leptons in order to accept events which have one lepton with very low pT while maintaining high trigger efficiency. Tri-electron triggers, eee, are also used to recover additional efficiency for events with low-pT leptons. The complete list of triggers and their full definitions is given in sec. 4.2.

4.3.2 Primary Vertices

We first require the presence of at least one primary vertex (see sec. 3.3) with the following criteria:

1. The track χ² fit has a sufficient number of degrees of freedom: NDOF > 4

2. Longitudinal position close to the center of the detector: |zPV| < 24 cm

3. Transverse position (radius) close to the center of the detector: rPV < 2 cm

If more than one vertex satisfies these criteria, we use the one with the highest scalar sum of track pT. This vertex is used to calculate the impact parameter of leptons.

4.3.3 Leptons

For electrons and muons, we define two lepton classes, known as "loose" and "tight" leptons. Tight leptons are primary leptons originating from Z or W decays, which are used to define the region of 4ℓ parameter space where we conduct the final analysis (called the "Higgs phase space"). Loose leptons are used to define a control region used to measure the contribution of reducible background in the Higgs phase space region (see sec. 4.4.3).

Tight Leptons. The following criteria are used to define the electrons and muons used in the Higgs boson phase space:

• Electrons:
  – Geometrical acceptance: |η| < 2.5 and pT > 7 GeV
  – Reconstruction and identification using the criteria given in sec. 3.4
  – BDT electron identification cuts (defined below)
  – GSF track missing inner tracker hits: nHitsmissing < 2

• Muons:
  – Geometrical acceptance: |η| < 2.4 and pT > 5 GeV
  – Global or tracker muon reconstruction
  – Particle flow muon identification defined in sec. 3.5

• Impact parameter: dxy < 0.5 cm and dz < 1.0 cm, computed with respect to the primary vertex

• At each step, leptons of a particular flavor are required to be separated by ∆R > 0.02 from leptons occurring earlier in the event selection (sec. 4.3.4)

• Relative particle flow isolation (see sec. 3.6): IsoPF < 0.4

• Impact parameter significance (sec. 3.3): SIP3D < 4

• Electron/muon cross cleaning: electrons are discarded if ∆R(e, µ) < 0.05

Electron identification is based on a BDT classifier (see sec. 3.4). Electrons used for this analysis are required to pass the following BDT electron cuts, which depend on pT and η:

• 5 < pT < 10 GeV:
  – |η| < 0.8: BDT > 0.47
  – 0.8 < |η| < 1.479: BDT > 0.004
  – |η| > 1.479: BDT > 0.295

• pT > 10 GeV:
  – |η| < 0.8: BDT > 0.5
  – 0.8 < |η| < 1.479: BDT > 0.12
  – |η| > 1.479: BDT > 0.6

Loose Leptons. To measure the reducible background (see sec. 4.4.3), we define a set of electrons and muons with relaxed identification and isolation criteria. These are defined the same way as tight leptons, but without requiring muons to pass the particle flow muon identification and without requiring electrons to pass the BDT electron identification criteria. We also drop the cut on particle flow isolation, IsoPF < 0.4, for both electrons and muons. Jets. Jets are reconstructed using the particle flow algorithm and must pass the pileup jet identification (see sec. 3.7). They are required to have a transverse momentum of pT > 30 GeV and |η| < 4.7. In addition, they are required to be separated from leptons and from final state photons radiated from leptons by ∆R > 0.5.

Photons. Particle flow photons (see sec. 3.8) with pT > 2 GeV and within the region of tracker acceptance, |η| < 2.4, are used for FSR recovery as described in section 4.3.5.

4.3.4 Event Criteria

Using the leptons, photons and jets selected by the above criteria, we apply the following event selection. The selection was designed to preserve the signal efficiency and provide sufficient phase space to study the systematic effects related to the background estimate, while also reducing the most important backgrounds to the same order of magnitude as the signal. The selection is based on the construction of a four-lepton combination built from a pair of di-leptons, each of which, ideally, reconstructs a Z boson. The selection is as follows:

1. First Z: Build an on-mass-shell Z boson candidate defined by a same-flavour, oppositely charged di-lepton pair (e+e−, µ+µ−). Out of all possible pairs of tight leptons in a given event, we choose the one with mass closest to the nominal Z boson mass (a sketch of this pairing logic follows the list below). The di-lepton mass

is required to be in the range: 40 < mZ1 < 120 GeV. Denote this di-lepton combination as

Z1;

2. Three leptons: Build a trilepton candidate by adding a third lepton of any flavour or charge

to the Z1 candidate;

3. Four leptons: Build a four-lepton candidate by adding another lepton of the same flavour and opposite charge as the third lepton;

71 4. Second Z: Denote the combination of third and fourth leptons as Z2. If more than one

Z2 candidate satisfies the previous criteria, choose the one with the highest pT . Impose the

requirement that the invariant mass of the best Z2 candidate be in the range: 4 < mZ2 < 120 GeV. To further reduce background without significantly affecting signal efficiency, we

require that, of the four leptons selected for the Z1 and Z2, at least one have pT > 20 GeV,

and at least one other lepton pT > 10 GeV.

5. QCD Suppression: The invariant mass of all opposite-sign, same-flavor lepton pairs must satisfy,

mℓℓ > 4 GeV.
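To make the Z1/Z2 pairing logic of steps 1 and 4 concrete, here is a minimal Python sketch, assuming leptons are given as (flavor, charge, p4) tuples with p4 a numpy (E, px, py, pz) array; this is an illustration, not the analysis code:

```python
import itertools
import numpy as np

Z_MASS = 91.1876  # nominal Z boson mass in GeV

def mass(p4):
    """Invariant mass of an (E, px, py, pz) four-vector."""
    e, px, py, pz = p4
    return np.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))

def pt(p4):
    return np.hypot(p4[1], p4[2])

def build_z_candidates(leptons):
    """Select Z1 (closest to the Z mass) and Z2 (highest pT) from
    same-flavor, opposite-charge pairs of tight leptons."""
    idx_pairs = [(i, j) for i, j in itertools.combinations(range(len(leptons)), 2)
                 if leptons[i][0] == leptons[j][0]       # same flavor
                 and leptons[i][1] != leptons[j][1]]     # opposite charge
    if not idx_pairs:
        return None
    def pair_p4(p):
        return leptons[p[0]][2] + leptons[p[1]][2]
    # Z1: the pair with mass closest to the nominal Z mass, within 40-120 GeV
    z1 = min(idx_pairs, key=lambda p: abs(mass(pair_p4(p)) - Z_MASS))
    if not 40.0 < mass(pair_p4(z1)) < 120.0:
        return None
    # Z2: of the remaining, non-overlapping pairs, the one with the highest pT
    rest = [p for p in idx_pairs if not set(p) & set(z1)]
    if not rest:
        return None
    z2 = max(rest, key=lambda p: pt(pair_p4(p)))
    return z1, z2
```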

Additional selection cuts are then imposed to define the m4l phase space for the Higgs boson signal. The best choice for this cut depends on the Higgs mass hypothesis being studied. Before the discovery, the Higgs boson mass was restricted to be above about 115 GeV, by precision electroweak measurements and by the LEP experiments, and below about 1000 GeV, so that WW scattering satisfies unitarity. The decay kinematics evolve considerably in the range from 100 GeV to 1000 GeV for both signal and background; this can be seen, for example, in the large peak in the m4l distribution of the ZZ background at twice the Z boson mass. Since we are interested in studying the properties of a low-mass Higgs boson, where the ZZ → 4ℓ decay channel is the most sensitive, we impose the following selection cut:

• Require mZ2 > 12 GeV, mZ1 > 40 GeV, and m4l > 100 GeV.

At the end of the selection, most of the remaining background comes from irreducible background events. A small contribution from the reducible background category also survives, composed mainly of events such as Zbb, tt+jets, Z+light jets, and WZ+jets. The statistical power of the Monte Carlo samples generated for CMS is not adequate to obtain a reliable estimate of the background from these events; therefore, we measure this contribution using a data-driven method (see sec. 4.4.3). The baseline analysis just described is performed with a "cut-based" approach, by imposing greater-than/less-than requirements on lepton variables or event variables. This method is appropriate when the variables are uncorrelated and individually strongly discriminating (for example the SIP variable in Fig. 4.2), in which case it makes sense to apply a sequence of cuts to a list of variables. By contrast, the events that survive the above event selection are characterized by a large number of highly correlated, weakly discriminating variables (where the desired discrimination is

Figure 4.1: Events surviving for signal and background processes following each stage of the event selection, for 8 TeV data in the 4e channel.

between signal and background events or between signal hypotheses). In section 4.7, we describe the use of a different analysis strategy, based on a Bayesian machine learning approach, that allows us to achieve optimal sample separation using all the kinematic degrees of freedom of the Higgs boson decay.

4.3.5 FSR Recovery

In the previous section, we described the process for building Z boson candidates from selected leptons. In this section we describe the method for including final state radiation (FSR) to correct the mass of the reconstructed Z. Final state radiation occurs when one of the decay leptons emits a photon in the final state. The following criteria are required to consider a photon as a candidate FSR photon:

1. ∆R < 0.5 between one of the Z leptons and the photon,

2. Photons must have pT > 2 GeV if ∆R < 0.07 and pT > 4 GeV if 0.07 < ∆R < 0.5, to minimize the effect of pileup,

3. The relative isolation of the photon must satisfy IsoPF,rel < 1.

Figure 4.2: Distribution of the 3D impact parameter and the particle flow isolation before the event selection. Observed data are shown in black and simulated events are shown as colored histograms, which are stacked.

Each photon that passes this selection is added to the closest lepton, and the Z mass is recalculated. Only those photons that move the Z mass closer to the nominal Z mass are kept. If there is more than one such photon for a given Z boson, the photon with the highest pT is used. Each Z boson can have one FSR photon (additional photons are ignored), so a given event can have zero, one or two FSR photons. After the inclusion of FSR photon(s), the isolation for each lepton is recalculated with the associated photon removed. The inclusion of FSR photons improves the Higgs boson mass resolution and yields additional Higgs boson candidates, since, without the correction for FSR photons, the particle flow isolation cut on leptons may incorrectly veto a lepton that has an FSR photon within its isolation cone.
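A minimal sketch of this FSR attachment logic, assuming leptons and photons are dictionaries with illustrative field names (not the analysis code):

```python
import numpy as np

Z_MASS = 91.1876

def mass(p4):
    e, px, py, pz = p4
    return np.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))

def delta_r(a, b):
    dphi = np.arctan2(np.sin(a['phi'] - b['phi']), np.cos(a['phi'] - b['phi']))
    return np.hypot(a['eta'] - b['eta'], dphi)

def recover_fsr(z_leptons, photons):
    """Attach at most one FSR photon to a Z candidate (criteria 1-3 above).
    Objects carry 'p4' (numpy (E, px, py, pz)), 'pt', 'eta', 'phi';
    photons also carry 'rel_iso'."""
    candidates = []
    for gamma in photons:
        dr = min(delta_r(gamma, lep) for lep in z_leptons)
        pt_cut = 2.0 if dr < 0.07 else 4.0          # criterion 2
        if dr < 0.5 and gamma['pt'] > pt_cut and gamma['rel_iso'] < 1.0:
            candidates.append(gamma)
    if not candidates:
        return None
    gamma = max(candidates, key=lambda g: g['pt'])  # highest-pT photon wins
    z_p4 = z_leptons[0]['p4'] + z_leptons[1]['p4']
    # Keep the photon only if it moves the mass toward the nominal Z mass
    if abs(mass(z_p4 + gamma['p4']) - Z_MASS) < abs(mass(z_p4) - Z_MASS):
        return gamma
    return None
```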

4.4 Signal and Background Models

4.4.1 Signal Models

The signal models are created using the POWHEG event generator [66] interfaced to PYTHIA to simulate parton showering [61]. Events are calculated at next-to-leading order (NLO), and the Higgs pT spectrum is reweighted to match the next-to-next-to-leading-order (NNLO) calculation with resummation at next-to-next-to-leading-log (NNLL), as described in [67]. Events are produced separately for the ggH and VBF production mechanisms [68], as well as for the spin-parity hypotheses generated with the JHUGen event generator [69]. For the mass measurement (see sec. 5.5.2), the WH and ZH models are assumed to be similar to VBF, and the ttH model is assumed to be similar to ggH, with any differences having a negligible effect due to the very small contribution of the latter models to the overall event yield. Due to an oversight, the simulated Higgs samples that were available at the time of this analysis do not account for the interference between identical final state leptons. This has an effect on the final state kinematics (see Fig. 4.3). To correct this, we reweight each event depending on seven kinematic variables which characterize the decay. The weight, which is based on the matrix element calculated with the JHUGen program, is the ratio of matrix elements between final states with interference (4e and 4µ) and those without interference (2e2µ). The result of the reweighting is that the discrimination between signal and background is improved, mainly because of a change in the Z1 and Z2 mass distributions.

4.4.2 Irreducible Background

The largest source of background after the event selection is the irreducible background (ZZ and Zγ). It is modeled using simulated events created with the POWHEG program. The simulation is found to be in good agreement with data in sideband regions as well as for events in the Z → 4ℓ peak (see Fig. 4.4), wherein one of the decay leptons from a Z boson radiates a photon which pair-produces two leptons. Irreducible background is produced separately for qq → ZZ and gg → ZZ. See Table 4.3 for details on the simulated event samples.

4.4.3 Reducible Background

After applying the selection cuts outlined above, the simulated samples contain too few events to provide an accurate estimate of the reducible background in the Higgs phase space region. In addition, it is necessary to validate the simulation for such background sources, since they involve events where one or more jets are misidentified as a lepton due to instrumental effects that may not be modeled precisely in the highly restricted region of interest. Therefore, we apply a method to estimate the reducible background from the observed data.

Figure 4.3: The effect of final state interference which is not modeled in signal simulation. Top: Z1 and Z2 mass before and after reweighting events to include these effects. Bottom: Magnitude of interference event weights.

To do this, we apply an alternate selection on the observed data in order to attain a sample with an enhanced contribution from events with one Z boson and two fake leptons:

1. Z1 candidate: Build an on-mass-shell Z boson candidate defined by a same-flavour, oppositely charged di-lepton pair (e+e−, µ+µ−); label these leptons ℓ1 and ℓ2. Out of all possible pairs of tight leptons in a given event, we choose the one with mass closest to the nominal Z boson mass. The di-lepton mass is required to be in the range 40 < mZ1 < 120 GeV. Denote this di-lepton combination as Z1.

2. Find all pairs of same-sign, same-flavour loose leptons that are distinct from the two leptons selected in the previous step; label these ℓ3 and ℓ4. Denote such di-lepton systems as Z2. If more than one Z2 candidate satisfies the previous criteria, choose the one with the highest pT . Impose the requirement that the invariant mass of the best Z2 candidate be in the range 12 < mZ2 < 120 GeV.

Figure 4.4: Distribution of the four lepton mass, m4l, for the 4e, 4µ, and 2e2µ channels combined.

3. The resulting four-lepton system must pass the remaining selection outlined in sec. 4.3.4.

By selecting same-sign candidates, we eliminate signal contamination and irreducible background from our control region. We are then left with a sample consisting mainly of backgrounds containing one Z boson and two fake leptons. The signal/background discriminant (see sec. 4.7) is not trained using the IsoPF variable; because of this, the discriminant distribution is found not to depend on this variable. In order to predict the contribution of reducible background in the signal region given the events in the control region, we calculate the probability for a loose lepton to pass the tight lepton selection. This probability is called the fake rate, denoted fe for electrons and fµ for muons. To calculate the fake rate, we define a set of events with one Z boson and one fake lepton. We then use this fake rate to calculate the probability for an event with one Z boson and two fake leptons to pass the full selection. The sample used to calculate the fake rate is defined as follows:

1. Z1 candidate: To restrict the selection to events that are kinematically similar to reducible background events, first build an on-mass-shell Z boson candidate defined by a same-flavour, oppositely charged di-lepton pair (e+e−, µ+µ−). Out of all possible pairs of tight leptons in a given event, we choose the one with mass closest to the nominal Z boson mass. The di-lepton mass is required to be in the range 40 < mZ1 < 120 GeV. Denote this di-lepton combination as Z1.

2. Require exactly one additional loose lepton which is distinct from the leptons of the Z1 selected in the previous step.

3. Calculate the probability for the lepton to pass the tight lepton selection as a function of pT , η, and nHitsmissing (for electrons). Denote this probability as fe(pT , η, nHitsmissing) for electrons or fµ(pT , η) for muons.

We then extrapolate the reducible background contribution in the signal region by applying the

MC fake rates and the ratio of events where ℓ3 and ℓ4 have opposite charge to those where they have the same charge, ROS/SS, which is measured using Monte Carlo events. We select only one loose lepton in the fake-rate region, but we select two fake leptons in the control region, and make additional constraints on the invariant mass of these leptons. Doing this creates the following problem. Consider the case where a photon is radiated from one of the lepton legs of the Z1 (final state radiation). This photon can then pair-produce two electrons with asymmetric pT . Due to the looser requirements on ℓ3 in the fake-rate region, it is probable to select an event where one of the legs does not satisfy the lepton pT threshold of 7 GeV and is lost. The probability for such a photon conversion is correlated with the number of inner tracker layers with missing hits in the track corresponding to that electron, nHitsmissing. This is illustrated by Fig. 4.5, where the three-lepton mass shows a clear peak at the Z boson mass only for those leptons with one missing hit. We therefore calculate the fake rate for electrons as a function of the number of missing hits, nHitsmissing, as well as the pT and η of a given lepton. The quantity nHitsmissing is restricted to be less than two in the loose lepton selection.


Figure 4.5: The invariant mass of the selected Z boson and the loose lepton used in the fake rate calculation, shown for events with zero inner-tracker layers with missing hits and for events with one such layer. For events with nHitsmissing = 1, the three-lepton mass has a prominent peak at the Z mass, demonstrating the presence of leptons from converted photons.

Given the control region events together with the fake rates for electrons and muons, we calculate the contribution of reducible background in the signal region by assigning a weight to each event in the control region:

$$w_{Z+X} = R_{\mathrm{OS/SS}}^{\mathrm{MC}} \times f_{\ell_3}(p_T, \eta, n\mathrm{Hits}_{\mathrm{missing}}) \times f_{\ell_4}(p_T, \eta, n\mathrm{Hits}_{\mathrm{missing}}). \qquad (4.1)$$
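A minimal sketch of the resulting estimate, summing the Eq. 4.1 weights over control-region events; the object and field names are illustrative assumptions, and `f_e` and `f_mu` stand for the measured fake-rate functions:

```python
def zx_estimate(control_events, f_e, f_mu, r_os_ss):
    """Estimate the reducible (Z+X) yield in the signal region by
    weighting each same-sign control-region event per Eq. 4.1."""
    total = 0.0
    for event in control_events:
        w = r_os_ss
        for lep in (event['l3'], event['l4']):   # the two loose leptons
            if lep['flavor'] == 'e':
                w *= f_e(lep['pt'], lep['eta'], lep['n_missing_hits'])
            else:
                w *= f_mu(lep['pt'], lep['eta'])
        total += w
    return total
```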

79 0.35 0.35

CMS Prelim. 2012, 19.6 fb-1 CMS Prelim. 2012, 19.6 fb-1 0.3 Electrons - Z(ll)+e barrel (|η| < 1.449) 0.3 Muons - Z(ll)+µ barrel (|η| < 1.2) endcap endcap Fake Ratio 0.25 Fake Ratio 0.25

0.2 0.2

0.15 0.15

0.1 0.1

0.05 0.05

0 0 0 10 20 30 40 50 60 70 80 0 10 20 30 40 50 60 70 80 p [GeV] p [GeV] T T

Figure 4.6: The probability for a loose electron (left) or muon (right) to pass the selection for tight leptons, as a function of pT .

Figure 4.7: Events in the reducible background control region (the "2P+2F" region) for the 4e, 4µ, and 2e2µ final states combined. The observed m4l distribution is shown together with the expected ZZ/Zγ*, WZ, tt̄, and Z+jets contributions, and is fitted with a Landau(x) × P4(x) function.

80 0.22 0.5 CMS Prelim. 2012 - 19.6 fb-1 CMS Prelim. 2012 - 19.6 fb-1 0.2 Signal Region, 4µ Signal Region, 4e 0.18 0.4 0.16

Events / 10 GeV 0.14 Events / 10 GeV 0.3 0.12 0.1 0.08 0.2 0.06 0.04 0.1 0.02 0 0 100 150 200 250 300 350 400 450 500 550 600 100 150 200 250 300 350 400 450 500 550 600

m4l [GeV] m4l [GeV]

0.8 CMS Prelim. 2012 - 19.6 fb-1 0.7 Signal Region, 2e2µ 0.6

Events / 10 GeV 0.5 0.4 0.3 0.2 0.1 0 100 150 200 250 300 350 400 450 500 550 600

m4l [GeV]

Figure 4.8: Estimate for reducible background in the signal region plotted separately for the final states 4e, 4µ, and 2e2µ.

4.5 Lepton Efficiency

The efficiencies for reconstructing and identifying leptons, as well as the efficiencies of the cuts on lepton isolation and impact parameter (see sec. 4.3.4), are measured using the tag-and-probe technique [52]. In this technique, we define tag leptons, which have a high probability of being leptons coming from a Z decay, and probe leptons, which have looser selection criteria whose efficiency is close to unity. In an event with both a tag and a probe lepton, we calculate the di-lepton invariant mass for two cases: those for which the probe lepton passes a given selection criterion (for which we wish to measure the efficiency), and those for which it fails the criterion. Figure 4.9 shows the signal and background line shapes for the tag plus passing probe and tag plus failing probe samples.

[Figure: di-lepton invariant mass m(tag, probe) for the tag + passing probe sample (left) and the tag + failing probe sample (right), for probes with 20 < pT < 30 GeV and 1.4442 < |η| < 2.5 in 2012 data (2.968 fb−1); signal, background, and signal + background fits are overlaid.]

Figure 4.9: Tag and probe fits used to measure electron identification efficiency. On the left is the di-lepton mass for events where the probe electron passes the selection criterion, and on the right is the di-lepton mass for events with probes failing the selection. The signal (green) and background (red) fits are shown for the failing probe plot.

Each histogram is fitted using an extended maximum likelihood fit given functional forms for the signal and background (with the same functional form for both the passing and failing histograms). From the fit we extract the normalization of the signal shape for the plots with passing probes and failing probes. This normalization constant is an estimator of the number of Z → ee events in the plot. The efficiency of the selection criterion under consideration is then estimated from the signal normalizations of the passing and failing probe plots.

The efficiencies measured in this way can be factorized: to measure the total efficiency for several selection criteria, we use the events passing the first criterion to conduct the tag-and-probe analysis for the subsequent selection criterion. In this way we measure the trigger, identification, isolation, and impact parameter cut efficiencies according to the formula:

$$\epsilon = \epsilon_{SIP|ISO} \times \epsilon_{ISO|ID} \times \epsilon_{ID|RECO} \times \epsilon_{RECO|\mathrm{clustering}}. \qquad (4.2)$$

These efficiencies depend on the detector geometry and the momentum scale, since the calorimetry response depends on the momentum of the measured lepton as well as on the region of the detector where the lepton is measured (the barrel region has finer granularity and lower background). Therefore the tag-and-probe analysis is done separately in bins of the two-dimensional pT and η space. The efficiencies measured in real data and in simulated data are used to correct the Monte Carlo events so that the two match. This is done by defining a scale factor which depends on lepton pT and η; the scale factors for the four leptons are then used to compute a weight for each Monte Carlo event. The uncertainty on the efficiency measurements is calculated by using Clopper-Pearson confidence intervals [70] to propagate the uncertainties on each efficiency term in Eq. 4.2 to the uncertainty on the total Monte Carlo scale factor. The uncertainties are included in the final analysis as nuisance parameters (see sec. 4.8).
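A minimal sketch of how such per-lepton scale factors would be combined into an event weight; the lookup function is hypothetical:

```python
def event_weight(leptons, scale_factor):
    """Per-event MC correction: the product of per-lepton data/MC
    efficiency scale factors, looked up in bins of (pT, eta)."""
    weight = 1.0
    for lep in leptons:
        weight *= scale_factor(lep['pt'], lep['eta'])
    return weight
```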

Electron Efficiency. Electron efficiencies are measured in the following steps:

1. Electron Reconstruction and Identification: Measurements of the reconstruction and BDT identification efficiencies use the tag-and-probe method with Z → ee decays. For electrons below 15 GeV, a complementary approach is used to improve the measurement, in which Z → eeγ events are selected with a tight selection on the photon and on one electron. This enhances the population of low-pT electrons while maintaining a manageable purity of Z events. Reconstruction and identification efficiencies for electrons are shown in Figures 4.10 and 4.11.

2. Electron Isolation and SIP3D: For events passing the previous selection, we measure the isolation and impact parameter efficiencies in a similar way using Z → ee decays.

3. Trigger: The efficiency of the double electron triggers is measured using events passing single electron triggers.

Muon Efficiency. Muon efficiencies are factorized into the following steps:

Figure 4.10: Electron reconstruction efficiencies in the barrel in 7 TeV data.

Figure 4.11: Electron identification efficiencies for the BDT electron identification in the barrel in 7 TeV data.

1. Tracking efficiency: the efficiency to reconstruct a muon track. This is measured by tagging on tight muons, where the probe is any muon with hits in the muon system.

2. Reconstruction and Identification: The efficiencies for reconstruction in the muon system and for particle flow identification are measured for muons with a successfully reconstructed track. The tag-and-probe method is used with Z → µµ events for muons with pT > 15 GeV. For muons with pT lower than 15 GeV, J/Ψ decays are used.

3. Impact Parameter Significance: For reconstructed, identified leptons, the SIP cut efficiency is measured using the tag-and-probe method with Z → µµ events. The efficiency is over 99% in all areas of the detector (see Fig. 4.12).

4. Isolation: For events passing the SIP cut, the isolation cut efficiency is measured using the tag-and-probe method with Z → µµ events (see Figures 4.12 and 4.13).

5. Trigger: For events passing the reconstruction, identification, SIP, and isolation requirements, the tag-and-probe method is used to calculate the muon efficiency for the double muon trigger. This is done using single muon triggers. A muon selected by a single muon trigger always satisfies tighter trigger criteria than either muon in a double muon trigger; therefore, if the second muon passes the double muon requirements, both the tag and the probe will pass.

Figures 4.12 and 4.13 show a selection of efficiency measurements for muons. Figures 4.9, 4.10, and 4.11 show efficiencies for electrons.

4.6 Event-by-Event Uncertainty

The uncertainty in the four-lepton mass can be estimated from the uncertainties in the momenta of the individual leptons and photons in each event. This information is used in the statistical analysis and is especially relevant to the Higgs boson mass measurement. The uncertainty assigned to electrons depends on the electron reconstruction method used. For ECAL-driven electrons (see Sec. 3.4), the ECAL energy measurement is combined with the track momentum measurement by taking a weighted average of the two, where each weight is the inverse of the variance of the corresponding measurement [48]. For tracker-driven electrons, which are relevant at low energies, the ECAL energy error is parameterized as a function of the ECAL energy. For muons, one uses the full error matrix for the

Figure 4.12: Muon efficiencies for the impact parameter significance cut (top left) and for the particle flow isolation cut as a function of the number of vertices (top right).

Figure 4.13: Muon efficiencies for Particle Flow Isolation vs. pT in the barrel (top left), and Particle Flow Isolation vs. pT in the endcap (bottom right).

five track parameters from the tracker fit (see Sec. 3.1). To estimate the uncertainty on the four-lepton mass, we propagate the errors on the four-momenta of the constituent particles, including FSR photons, using standard error propagation (which is valid when the relative errors are small). The four-lepton mass can be written as a function of the four-vectors of the leptons:

$$m_{4l} = F(\vec{p}_{\ell 1}, \vec{p}_{\ell 2}, \vec{p}_{\ell 3}, \vec{p}_{\ell 4}) \qquad (4.3)$$
$$= F(p_{T,1}, \eta_1, \phi_1, \ldots, p_{T,n}, \eta_n, \phi_n) \qquad (4.4)$$
$$\approx f_0 + \sum_i \frac{\partial F}{\partial x_i}\, x_i \qquad (4.5)$$
$$= f_0 + \vec{a} \cdot \vec{x}, \qquad (4.6)$$

where the four-momentum of the ith particle is denoted pℓi = pℓi(pT,ℓi, ηℓi, φℓi). In this equation we have approximated the function for m4l with its first-order expansion, where ai = ∂F/∂xi. For this analysis, the input variables comprise 12 or 15 degrees of freedom, depending on whether there is an FSR photon. The set of variables representing these degrees of freedom is denoted ~x. The error on m4l is then written as,

$$\sigma_{m_{4l}}^{2} = \vec{a}\,\Sigma\,\vec{a}^{\top}, \qquad (4.7)$$

where the covariance matrix of the input variables, ~x, is denoted Σ. The covariance matrix used is the error matrix coming from the lepton and photon momentum measurements.
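As an illustration of Eq. 4.7, here is a generic numerical sketch (not the analysis code) that propagates the momentum covariance to the mass uncertainty using a finite-difference gradient:

```python
import numpy as np

def m4l_uncertainty(x, cov, mass_func, eps=1e-6):
    """Propagate input uncertainties to the four-lepton mass via
    sigma^2 = a Sigma a^T (Eq. 4.7). `x` holds the 12 (or 15, with FSR)
    input parameters, `cov` their covariance matrix, and `mass_func`
    evaluates m4l from x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(len(x)):
        dx = np.zeros_like(x)
        dx[i] = eps * max(1.0, abs(x[i]))      # scale-aware step size
        grad[i] = (mass_func(x + dx) - mass_func(x - dx)) / (2 * dx[i])
    return float(np.sqrt(grad @ cov @ grad))
```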

Alternatively, one can use an approximation for the m4l error in which the contributions to the error from the individual lepton errors (which we denote δmi) are added in quadrature. If we neglect the uncertainty in the angular variables, we may write,

$$\delta m_i = F(\ldots, p_{T,i} + \delta p_{T,i}, \eta_1, \phi_1, \ldots) - m_{4l}, \qquad (4.8)$$
$$\delta m = \sqrt{\sum_i \delta m_i^2}. \qquad (4.9)$$

Both approximations to the mass uncertainty agree to within 1%. The mass resolution is shown to agree well between simulated events and real events for multiple independent samples: for Z → 4ℓ events (where one lepton in the Z → ℓℓ decay radiates a photon which then pair-produces two leptons); for the ZZ → 4ℓ control region; and for events in the Z + X control region (see sec. 4.4.3)

87 [24]. This shows that the mass error is well modeled by the simulated data.

4.7 Multivariate Discriminant

4.7.1 Introduction

The advancement of experimental particle physics is characterized by the search for increasingly rare signals hiding behind huge backgrounds arising from known physical processes. To find these signals, it is necessary to construct experiments of increasing scale and complexity: larger colliders and more complex detectors, accompanied by larger data sets. In addition, the physical models used to describe the signal and background processes are increasingly complex, involving many channels and many observables, and resulting in intricate detector signatures that are shaped in a complicated way by the properties of the models. So rather than looking for bumps over constant backgrounds in one dimension, we must attend to multiple observable variables, all of which contain physically relevant information. In order to fully utilize the information present in the data of such grand experiments, physicists have developed so-called "advanced analysis techniques" [71] which allow for the analysis of many variables at once. These methods were widely used in physics analyses at the Tevatron, LEP, HERA, and b-factory experiments, and notably in the discoveries of the top quark and of single top quark production [72][73][74]. The case of the Higgs boson decaying to four leptons, and its detection with the CMS detector, is such a complex analysis requiring advanced analysis techniques. Given a theoretical model of a quantum mechanical process, we can calculate the probability to obtain a given final state particle of a given four-momentum by computing the matrix element. This probability can be used to randomly generate events for new models as well as for known standard model processes. By comparing predicted distributions of signal and background to data (using the tools described in section 5.3), we can test the validity of a particular model and make discoveries or falsify hypotheses. Advanced analysis techniques used in particle physics include machine learning methods such as boosted decision trees (BDTs), multilayer perceptrons (MLPs) and Bayesian neural networks (BNNs), as well as kernel density estimators, matrix element methods (briefly described in section 4.7.2), and other techniques. For a review of such methods and their application to high energy physics, see [71]. In this analysis we make use of BDTs (see section 3.4) for electron energy correction and identification. To discriminate between various quantum mechanical processes on an event-by-event level, whether between SM background and hypothesized signal or between alternate signal hypotheses, we use Bayesian neural networks. Both of these techniques provide a robust tool for using the information from the many variables that characterize an event, in order to classify events

based on their probability of originating from a given process. In section 4.7.2 I will describe the principles of event classification and summarize a few methods for classification. In section 4.7.3 I will discuss the particular implementations used in this analysis to search for the Higgs boson and to measure its properties.

4.7.2 Classifiers

To classify events according to their production process, we want to make use of all the information at our disposal: the set of kinematic variables outlined in section 1.6.4 as well as the detector variables used in chapter 4. Denote the set of these variables used for discrimination as ~x. From these variables, we are concerned with calculating one number: the probability that an event is a signal event, given the observation. This is a problem of binary classification. We want to compute:

$$P(s|\vec{x}) = \frac{P(\vec{x}|s)\,P(s)}{P(\vec{x}|s)\,P(s) + P(\vec{x}|b)\,P(b)}, \qquad (4.10)$$

given the prior signal-to-background ratio P(s)/P(b). In binary classification, P(s|~x) + P(b|~x) = 1.

Matrix Element Likelihoods. The functions P(~x|s) and P(~x|b) in Eq. 4.10 are simply the likelihoods for the signal and background processes. We calculate these likelihoods from physical principles. Given the likelihoods, which represent the signal and background models, we can calculate the probability in Eq. 4.10. Methods which employ this strategy are known as matrix element likelihood methods. In general, they calculate the desired probability using the matrix elements available in software packages that generate events. The use of matrix elements is often computationally intensive, since one must integrate over all unmeasured variables for every observed and simulated event. In addition, matrix element likelihoods are typically calculated at the parton level. Therefore, they do not account for detector effects unless an explicit smearing of the parton-level variables is performed, which further increases the computational burden. Since, in such methods, one uses the matrix element representing the interaction and decay of particles at the parton level, the resulting discriminant depends only on kinematic variables; additional discrimination from any other available variables must be included in the analysis separately. In order to simplify the analysis, we seek to approximate Eq. 4.10 using machine learning techniques, which are algorithms for automated learning from data. In this way we take advantage of fully simulated events to obtain a function which approximates Eq. 4.10 at the detector level and is easily computed at run time. We would like to derive a function f(~x, ~ω) which approximates P(s|~x), using a set of training data {ti, ~xi},

$$f(\vec{x}, \vec{\omega}) \propto P(s|\vec{x}), \qquad (4.11)$$

where ~ω is a set of free parameters that define our training model.

Boosted Decision Trees. A simple way to approximate the desired function is to split the multidimensional input space of ~x into subspaces, each of which is assigned a number that approximates its relative composition of signal and background. One of the most popular such methods is the decision tree, which is similar to a binary search tree in that it splits the input into two pieces in one dimension in successive binary decisions. This is done by optimizing each binary decision over a set of input variables. Boosting is a technique for averaging over many trees by reweighting the training events that are misclassified and reapplying the decision tree algorithm. Boosted decision trees are used in this analysis in electron and muon identification.
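As an illustration, the following sketch trains a boosted decision tree on toy data using scikit-learn; it is not the implementation used in the analysis, and the settings are arbitrary:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Toy training sample: two Gaussian "processes" in three variables
rng = np.random.default_rng(0)
X_sig = rng.normal(loc=1.0, size=(1000, 3))
X_bkg = rng.normal(loc=0.0, size=(1000, 3))
X = np.vstack([X_sig, X_bkg])
y = np.concatenate([np.ones(1000), np.zeros(1000)])  # t = 1 signal, 0 background

bdt = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                 learning_rate=0.1)
bdt.fit(X, y)

# Per-event approximation of P(s|x), cf. Eq. 4.10 (equal priors here)
signal_prob = bdt.predict_proba(X)[:, 1]
```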

Universal Approximators. We can use a generic family of functions defined by a set of parameters, ~ω, to model the unknown, multidimensional function, by adjusting the parameters according to how well the functional output matches the training data. For example, one can model any periodic function to arbitrary accuracy using a series of sine and cosine functions (the Fourier series). Similarly, the Weierstrass approximation theorem tells us that any continuous function, f, can be modeled to any accuracy ǫ on an interval I by a polynomial of degree n, where n depends on the desired accuracy of the approximation,

$$f(x) \approx \sum_i \omega_i x^i = F(x, \vec{\omega}), \qquad (4.12)$$

such that

$$|f(x) - F(x, \vec{\omega})| < \epsilon, \qquad (4.13)$$

for all x ∈ I. This idea, that an arbitrary continuous function can be approximated by a series of basis functions (in this case, increasing powers of x), can be extended to more than one dimension. For any continuous function f in the region defined by an m-dimensional hypercube, Im ∈ [0, 1]^m, there exists an integer n and a real number ǫ such that

$$F(\vec{x}, \vec{\omega}) = \sum_{i=1}^{n} \alpha_i\, \phi(\beta_i^{T}\vec{x} + \gamma_i), \qquad (4.14)$$

satisfies

$$|f(\vec{x}) - F(\vec{x}, \vec{\omega})| < \epsilon, \qquad (4.15)$$

where ~ω = {α, β, γ}. The function φ(·), known as the activation function, must be a continuous, monotonically increasing and bounded function, such as the sigmoid function,

$$\phi(\xi) = \frac{1}{1 + \exp(-\xi)}, \qquad (4.16)$$

or the hyperbolic tangent, tanh(x). The above theorem was proved in 1989 by George Cybenko [75], and later simplified by Kurt Hornik [76] and Funahashi [77]. These authors show that one can approximate any continuous function on a compact subset of n-dimensional real space with a sum of such multidimensional functions. We refer to such functions as universal approximating functions.

Multi Layer Perceptrons. One universal approximator, and the one used in this analysis, is the multi-layer perceptron (MLP), a type of neural network (NN). It has its origins in the pursuit of artificial intelligence (see, for example, [78]). It approximates one or more output values, yk, given input values xi, with the functional form,

$$y_k(\vec{x}, \vec{\omega}) = f_k\left(\sum_j \omega_{kj}\,\phi_j(\vec{x})\right), \qquad (4.17)$$

for an activation function f(·) and nonlinear basis functions φ(·). This can be put more generally by defining hidden-node pre-activations aj and first-layer weights ω(1)ji, such that

$$a_j = \sum_{i=1}^{N} \omega^{(1)}_{ji}\, x_i + \omega^{(1)}_{j0}, \qquad (4.18)$$

and output sums zk with second-layer weights ω(2)kj, such that

$$z_k = \sum_{j=1}^{M} \omega^{(2)}_{kj}\, \phi(a_j) + \omega^{(2)}_{k0}. \qquad (4.19)$$

The output values are then given by yk = σ(zk). The parameters ~ω = {ω(1)ji, ω(2)kj} are referred to as weights, whereas the zeroth-index elements are referred to as biases. We use the sigmoid function, Eq. 4.16, as the activation function, and tanh(x) as the basis function. The M intermediate quantities aj correspond to hidden nodes, which add extra degrees of freedom independent of the number of input and output nodes. Figure 4.14 shows a diagram of this network. In principle we can add multiple layers of hidden nodes in a similar manner. For a single-layer NN, the universal approximator theorem states that we can achieve any level of accuracy using this functional form, given that we include enough hidden nodes. It is possible to stack multiple layers of Eqs. 4.18 and 4.19 to form multiple layers of hidden nodes, to attach input nodes directly to output nodes, or to make many other modifications. However, for more than one layer of hidden nodes the situation is less clear, and there does not yet exist a proof of universality. In this analysis, we only use neural networks with a single hidden layer and a single output of the form outlined above.

Figure 4.14: Model of a single layer feedforward neural network (graphic from [4]).

The functional form in Eq. 4.17 defines a class of functions, F = {f(x, ω)}, which depends on the full set of weights and biases. We know from the universal approximation theorem that, given enough hidden nodes, there exists at least one set of parameters ω which can approximate y to our desired accuracy. The problem, then, is to use this class of functions to create an approximation of Eq. 4.10. To do so, we need a measure of the performance of our neural network with respect to the true value of

the modeled pdf, y_k. We thus define the loss function, L(y, f), which measures the loss incurred by using an approximation. The choice of loss function depends on the problem at hand. Examples include the squared error, Σ_i (y_i − f_i)², and the Kullback-Leibler divergence, ∫ f(x, θ̂) ln[f(x, θ̂)/f(x, θ)] dx (see sec. B). The risk function is the expectation value of the loss function over all points in input space:

R(ω) = E(L(y, f)) = ∫_X L(y, f) f(x, ω) dx.  (4.20)

If we have a sample of points from the real distribution we are modeling, such as a selection of signal and background Monte Carlo events, we can approximate this function with the empirical risk function (ERF),

R(ω) ∝ Σ_{i}^{N} L(y, f(x_i, ω)),  (4.21)

for N data points. If we assume Gaussian noise, the empirical risk is identical to the negative log-likelihood for the data set consisting of the N data points. For the binary classification problem, the negative log-likelihood is given by the empirical risk function which uses the KL divergence loss function. There are several ways to derive such an approximation. The frequentist way is to use the method of maximum likelihood, where we minimize the ERF and find some representative set of weights and biases, ω̂. Alternatively, we can perform a Bayesian marginalization. This is the method we use in this analysis, and we discuss it in the next section. To derive ω̂ efficiently we use an iterative method of training, where we begin with an arbitrary set of parameters, compare them with our training data using a likelihood, and iteratively update the parameters in successive training epochs. For binary classification, the appropriate likelihood is the probability for obtaining our M training events given the signal and background models:

L = Π_{i}^{M} P(x_i|s)  [for signal events]  or  P(x_i|b)  [for background events]  (4.22)
  = Π_{i}^{M} P(x_i|s)^t P(x_i|b)^{1−t},  (4.23)

where t = 1 for signal events and t = 0 for background events.

Then note that P(x|s) ∝ P(s|x)P(x) and P(x|b) ∝ P(b|x)P(x). Using these identities, the likelihood can be written,

L ∝ Π_{i}^{M} P(s|x_i)^t P(b|x_i)^{1−t}  (4.24)
  = Π_{i}^{M} P(s|x_i)^t (1 − P(s|x_i))^{1−t}  (4.25)
  = Π_{i}^{M} f(x_i, ω)^t (1 − f(x_i, ω))^{1−t},  (4.26)

where we have used the fact that, for binary classification, P(b|x) = 1 − P(s|x). The risk function is the negative log of this likelihood. By minimizing the risk function we approximate the signal probability, P(s|x). A standard method for minimization is gradient descent; another is Markov chain Monte Carlo (MCMC) [79]. The idea of minimizing the risk function to obtain a set of network parameters ω̂, as noted above, is similar to the method of maximum likelihood, wherein a parameter is estimated by maximizing the likelihood with respect to the parameters. Since the space of parameters is high dimensional (NM + 2N + 1 dimensions) and complex (2^H H! equivalent points), representing the neural network with a single point poses the risk of finding a local minimum in parameter space, or of fitting a statistical fluctuation in the training sample, which yields poor results on any other sample. It is usually necessary to add a penalty term to the risk function, which prohibits large weights, or to use a technique such as cross validation, which splits the training sample into multiple parts to test for overtraining and to stop the training procedure. In general, any such method which mitigates overtraining or poor training requires additional training events. An alternative to such methods is to view training as a problem of inference, using a Bayesian approach to model the network parameters.
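The following sketch illustrates the training idea, assuming NumPy and SciPy: it minimizes the empirical risk of Eq. 4.26 (the negative log-likelihood for binary targets t ∈ {0, 1}) with a generic optimizer standing in for the gradient descent or MCMC used in practice. The toy data set and network size are, of course, illustrative.

```python
# A minimal sketch: minimize the empirical risk (Eq. 4.26) of a small MLP.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))                    # toy "events", 2 input variables
t = (X[:, 0] + X[:, 1] > 0).astype(float)        # toy signal/background labels

n_in, n_h = 2, 5

def unpack(w):
    i = 0
    w1 = w[i:i + n_h * n_in].reshape(n_h, n_in); i += n_h * n_in
    b1 = w[i:i + n_h]; i += n_h
    w2 = w[i:i + n_h]; i += n_h
    return w1, b1, w2, w[i]

def net(w, X):
    w1, b1, w2, b2 = unpack(w)
    z = np.tanh(X @ w1.T + b1)                   # hidden layer
    return 1.0 / (1.0 + np.exp(-(z @ w2 + b2)))  # P(s | x)

def risk(w):
    f = np.clip(net(w, X), 1e-10, 1 - 1e-10)
    return -np.sum(t * np.log(f) + (1 - t) * np.log(1 - f))  # -log L, Eq. 4.26

w0 = rng.normal(0, 0.1, n_h * n_in + n_h + n_h + 1)
w_hat = minimize(risk, w0, method="BFGS").x      # one representative omega-hat
print("trained P(s|x) for first event:", net(w_hat, X[:1]))
```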

Bayesian Neural Networks. Instead of minimizing the ERF with respect to the neural network parameters, ω, we seek a more robust utilization of the parameter space F = {f(x, ω)} by marginalizing over the entire space of parameters,

f(x) = ∫ f(x, ω) P(ω|T) dω,  (4.27)

where P(ω|T) is a posterior density defined over the network parameter space and T denotes the training data. Since the parameters of the neural network have no straightforward physical interpretation,

an evidence-based prior is difficult to derive (see section B). Instead, we choose a plausible class of priors that constrain the weights to be small. Since we do not have specific information to set the relative values of the various parameters, we can reduce the number of degrees of freedom by assigning hyper-parameters to groups of similar parameters. For example, we place all parameters in a hidden layer into one group, to which we assign the same variance, and treat the variance, 1/τ, as a hyper-parameter governed by a hyper-prior. We take the priors for the parameters of a given group, u = {u_1, u_2, ..., u_k}, to be,

P(u|τ_u) = (τ_u / 2π)^{k/2} exp( −(τ_u / 2) Σ_i u_i² ).  (4.28)

The priors for the hyper-parameters, τ, can be chosen to be vague or restrictive in order to minimize the effect of noise in the training sample, and therefore the possibility of overtraining. We use gamma distributions for the hyper-priors for reasons of computational convenience. For full details of the implementation see the dissertation of Radford Neal [80]. This approach yields a discriminant that is resistant to overtraining and is nearly optimal, usually requiring fewer training events to achieve optimum performance when compared with other machine learning techniques. The discriminant has the straightforward interpretation as the probability, P(s|x), that a particular event, described by variables x, is a signal event.
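A sketch of the marginalization in Eq. 4.27 follows, reusing the net() function and toy data of the previous sketch: the BNN discriminant is the average of the network output over points drawn from the posterior P(ω|T). Here the "posterior sample" is faked with Gaussian jitter around the fitted ω̂, purely for illustration; in this analysis the points are produced by Neal's MCMC sampler [80].

```python
# A minimal sketch of Eq. 4.27: average the network output over parameter points.
import numpy as np

rng = np.random.default_rng(3)
K = 100                                          # number of networks (cf. Table 4.4)
# illustrative stand-in for a real MCMC posterior sample:
posterior_sample = [w_hat + rng.normal(0, 0.05, w_hat.size) for _ in range(K)]

def bnn(X):
    """Marginalized discriminant: mean of net(x, omega_k) over the sample."""
    return np.mean([net(w, X) for w in posterior_sample], axis=0)

print("marginalized P(s|x) for first event:", bnn(X[:1]))
```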

4.7.3 Training Events

In this section, I will describe the training details of the BNNs used for the H→ZZ→4ℓ analysis and evaluate their performance. Training is done with fully reconstructed events. Therefore, all the information from physics events (event generation, parton showering, and detector simulation) is included. In addition, with machine learning algorithms, one can include in the training any well-modeled variable which provides discriminatory information. For the BNNs we use the Bayesian neural network implementation written by Radford Neal [81]. The BNNs are trained using the same simulated data sets that are used for other parts of the analysis. The available data sets are split into two independent subsets: a training set and a testing set. To avoid any bias, the training sets are discarded for all subsequent analysis, including the evaluation of the performance of the BNNs and the final statistical analysis presented in sec. 5.2. The events used for training are those that survive the selection for the Higgs phase space defined in section 4.3.4.


Figure 4.15: Angular decay variables used as input variables for the BNNs. Differences between signal and background can be seen for the decay angles cos(θ*), cos(θ_1), and cos(θ_2), whereas differences between signal production mechanisms can be seen for the production angle, φ*. Higher order correlations can be seen in the correlation matrix.


Figure 4.16: Additional input variables for signal/background BNN: Z1 and Z2 masses normalized to show the difference in shape of the distributions for the signal and background processes.

4.7.4 BNN Training

The choice of input variables depends on the desired function of the discriminant. We use two BNNs in this analysis. The first BNN is trained to distinguish events in simulated Higgs samples from the simulated qq̄→ZZ, or irreducible, background. The second is trained to distinguish signal production processes.

Table 4.4: Parameters used to define neural network functions and hyperparameters in the BNNs.

Parameter             Symbol       Value
Input nodes           N_input      12
Hidden nodes          N_hidden     N_input + 5
Output nodes          N_output     1 (binary classification)
Training events       n_training   5000 signal + 5000 background
Activation function   -            tanh
Hyperparameter shape  α_h          0.5
Hyperparameter width  σ_h          0.05
MCMC steps            N_steps      100
Neural networks       N_bnn        100

Signal/Background Discrimination. Background processes which contain two Z bosons can be distinguished from H→ZZ events by using the kinematic constraints on the four-vectors of the four leptons. Therefore the input variables to a BNN can be any twelve variables representing these degrees of freedom. Due partially to convention, we choose the following set of input variables for the BNN: the five angular variables described in section 1.6.4, the Z_1 and Z_2 masses, and the mass m_4l of the four-lepton system. By using these variables we can eliminate redundant degrees of freedom (such as the total φ angle of the Higgs boson system). We also leave out the p_T,4l (or Higgs p_T) variable, since we will use it in the second BNN to distinguish production mechanisms. The distributions for these variables are shown in figures 4.15 and 4.16. We could also include other variables outlined in section 4.3.4. However, the isolation and identification variables cannot be used because they are used to calculate the reducible background from fake leptons; if we trained a BNN using either of these variables, the BNN discrimination performance for the reducible background would be biased. The SIP variable could, in principle, be added as an input variable, but we instead require a threshold on it in the event selection. The set of Higgs boson Monte Carlo samples is generated for a set of Higgs boson masses. Since the mass of the Higgs boson is a free parameter of the SM (before the observation), we would like to conduct our analysis over a continuous range of masses. To do this, we train the BNN using an ensemble of Higgs boson samples for different masses. Since this analysis concentrates on a low-mass Higgs boson search, we use an admixture of masses between 110 and 180 GeV, weighted according to their relative expected event yields. The advantage of this is that one can then use a single BNN to construct a likelihood over the entire range of masses under consideration (see sec. 5.3). BNNs are trained separately for events originating from proton-proton collisions at center of mass energies √s = 8 TeV and √s = 7 TeV, and for the different final states, 4e, 4µ, and 2e2µ.

Signal Production Process Discrimination. To measure the Higgs boson couplings, it is necessary to discriminate between Higgs boson production processes, most importantly the ggH and VBF production processes (for details see sec. 5.5.2). We therefore train a second BNN to discriminate between these processes. The VBF process is characterized by two final state jets.

As is stated in sec. 4.3.3, we require a pT threshold of 30 GeV on jets. Those jets that pass this threshold as well as the other selection criteria are labeled “tagged jets”. Both jets coming from a VBF event are reconstructed in roughly 50% of the events in simulated data. We therefore define two orthogonal categories of events:


Figure 4.17: BNN discriminant and signal/background efficiency for a BNN trained with the five production and decay angles as well as m_Z, m_Z*, and m_4l. BNNs are trained separately for 4e (right), 4µ (center), and 2e2µ (left). Training is done with fully reconstructed Monte Carlo events at center of mass energies 7 TeV (top) and 8 TeV (bottom).

1. Category 1: Events with 0 or 1 tagged jet, n_jets < 2

2. Category 2: Events with two or more tagged jets, n_jets > 1

For category 2, we define the following two variables: the invariant mass of the two-jet system, m_jj, and the difference between the pseudorapidities of the two jets, ∆η_jj. These distributions are shown in Fig. 4.18. Using these variables, we train a second BNN, labeled BNN_VBF, which is used to distinguish production mechanisms for events in category 2. For category 1, we use the Higgs p_T distribution, p_T,4l, as a discriminating variable, without training a BNN. Since the Higgs boson in VBF events recoils from the two jets, it differs in p_T from ggH events. The VBF/ggH BNN and the p_T,4l/m_4l distributions are shown in Fig. 4.19.


Figure 4.18: Input variables for VBF/ggH BNN: mjj and ∆ηjj, normalized to show the difference in shape of the distributions for the VBF, ggH, and ZZ processes.

4.7.5 Discriminant Performance

The distribution of the BNN discriminant for signal and irreducible background is shown in Fig. 4.17. To compare the performance of the BNN with other discriminants, we use a method borrowed from signal detection theory called the receiver operating characteristic (ROC). The ROC curve is the plot of the signal efficiency, ε_s, versus the background efficiency, ε_b, for all possible thresholds on the discriminant. If one makes a two-dimensional plot of ε_s and ε_b, the performance of two discriminants can be compared. Discriminants with values closer to (0,1), that is, zero background efficiency and


Figure 4.19: Discriminating variables used to separate Higgs production mechanisms: p_T,4l and BNN_VBF, normalized to show the difference in shape of the distributions for the VBF, ggH, and ZZ processes.

100% signal efficiency, have better discriminatory performance. Figure 4.20 shows the ROC curves for the BNNs compared to other discriminants commonly used in H→ZZ→4ℓ analyses in high energy physics.
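A minimal sketch of the ROC construction, assuming only NumPy and toy discriminant outputs: the curve is traced by scanning a threshold on the discriminant and recording the signal and background efficiencies.

```python
# A minimal sketch: ROC curve of a discriminant from toy output distributions.
import numpy as np

def roc_curve(d_sig, d_bkg, n_points=101):
    thresholds = np.linspace(0.0, 1.0, n_points)
    eff_s = np.array([(d_sig > c).mean() for c in thresholds])  # signal efficiency
    eff_b = np.array([(d_bkg > c).mean() for c in thresholds])  # background efficiency
    return eff_b, eff_s

rng = np.random.default_rng(4)
d_sig = rng.beta(5, 2, 10_000)   # toy discriminant outputs for signal
d_bkg = rng.beta(2, 5, 10_000)   # ... and for background
eff_b, eff_s = roc_curve(d_sig, d_bkg)
print("area under ROC curve:", np.trapz(eff_s[::-1], eff_b[::-1]))
```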


Figure 4.20: Receiver operating characteristic plots comparing the performance of multivariate discriminants. On the left is the signal/background BNN compared to a matrix element likelihood method (for the 2e2µ channel, √s = 8 TeV). On the right is the ggH/VBF BNN compared to a linear discriminant.

We use independent samples to train the BNNs and to perform all subsequent analysis in order to prevent training bias in the analysis. However, overtraining can indicate suboptimal performance of a multivariate discriminant. We therefore perform an overtraining test by comparing the BNN discriminant distributions for the training sample and the testing sample. Figure 4.21 shows such a test carried out for the signal/background BNN.
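The overtraining test can be sketched as follows, assuming SciPy and illustrative stand-in distributions: the training and testing discriminant distributions are compared with a two-sample Kolmogorov-Smirnov test, separately for signal and background.

```python
# A minimal sketch of the KS overtraining test on toy BNN output samples.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(5)
bnn_train_sig = rng.beta(5, 2, 5000)   # stand-in for BNN outputs on training signal
bnn_test_sig = rng.beta(5, 2, 5000)    # ... and on testing signal

stat, p_value = ks_2samp(bnn_train_sig, bnn_test_sig)
print(f"KS test sig: {p_value:.2g}")   # a very small p-value would flag overtraining
```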

[Figure 4.21 panels: the KS-test values shown are, for 2e2µ, signal 0.00047 and background 0.78; for 4µ, signal 0.39 and background 0.92; for 4e, signal 0.079 and background 0.99.]

Figure 4.21: Overtraining test: comparison of the performance of the BNN on signal and background event samples for the channels 2e2µ (top left), 4µ (top right), and 4e (bottom). The Kolmogorov-Smirnov test (displayed as "KS test") indicates good compatibility between the testing and training sets. The plotted signal sample is the 125 GeV Higgs signal Monte Carlo. Both signal and background are for √s = 8 TeV MC.

Nonlinear multivariate methods such as BNNs are especially useful for samples with highly correlated input variables. It is therefore interesting to examine the correlations of the training samples to evaluate the possible advantage of using a particular multivariate analysis technique. Figure 4.22 shows the correlations for the input variables to the signal/background BNN.
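The linear correlation coefficients of Fig. 4.22 are simply the entries of the sample correlation matrix of the input variables; a minimal sketch, assuming NumPy and a toy two-variable sample:

```python
# A minimal sketch: sample correlation matrix of input variables, in percent.
import numpy as np

rng = np.random.default_rng(6)
events = rng.multivariate_normal([0, 0], [[1.0, 0.22], [0.22, 1.0]], 10_000)

corr = np.corrcoef(events, rowvar=False)   # variables in columns
print(np.round(100 * corr).astype(int))    # linear correlation coefficients in %
```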


Figure 4.22: Correlation matrix for BNN input variables for gg→H signal (with m_H = 125 GeV) and qq̄→ZZ background.

Spin/Parity Hypothesis Discrimination. Using the same variables used for the signal/background discriminant, we train a BNN to distinguish between the SM Higgs boson and Higgs bosons with non-SM spin and parity. The performance is found to be almost identical to that of the matrix element likelihood methods reported in [24].

4.8 Systematic Uncertainties

Our final measurements of the properties of the Higgs boson are only as good as our knowledge of the detector and of the theory used to predict the signal and background. We quantify our imperfect knowledge of the detector and theory by including experimental and theoretical uncertainties in our analysis. Statistical inference about the properties of the Higgs boson requires the construction of a probabilistic model, which will be presented in full in section 5.2. We must include a full description of the systematic uncertainties in such a model. In this section I will outline the systematic uncertainties that affect this analysis, how they are estimated, and how they are modeled. The signal and background predictions enter the model via counts in our binned likelihood, Eq. 5.5. This likelihood depends on the observed data as well as the signal and background models, s and b, respectively. Each systematic uncertainty introduces an additional parameter, θ, to the likelihood via the signal and background counts:

L = P(D|s, b) → P(D|s(θ), b(θ)).  (4.29)

Each systematic uncertainty is estimated with a dedicated study. For experimental uncertainties this usually involves an independent experiment to study the performance of a particular piece of the detector. For theoretical uncertainties it usually involves recalculating predicted numbers while varying theoretical assumptions. For instance, by varying the resummation scale used to hadronize events in Monte Carlo simulation, we can estimate the change in shape of our final distributions. We encode our knowledge of the systematic uncertainties by constructing a prior density for all the systematic uncertainty parameters, π(θ) = Π_i π(θ_i), which is the product of priors for the individual parameters, assumed to be independent of each other. In this section I will describe how this prior is constructed. I will outline the functional forms of the priors for individual systematic uncertainties, then I will discuss the specific systematic uncertainties that affect the H→ZZ→4ℓ analysis.

4.8.1 Systematic Uncertainty Priors

We assign a density with a particular functional form to the nuisance parameters that define the effect of the uncertainties. The functional form of the systematic uncertainty depends on the nature

of the systematic uncertainty under consideration. The following is a summary of the functional forms used in this analysis. For more details see Ref. [82].

• Normal/Gaussian Distribution: For an observable x, with mean value µ and standard deviation σ, the density is,

π(x) = 1/(σ√(2π)) exp( −(x − µ)² / (2σ²) ).  (4.30)

Since an observable with a Gaussian distribution is not constrained to be greater than zero, a Gaussian is only used for values that can be negative.

• Log-Normal Distribution: The log-normal distribution is the p.d.f. of a random variable whose logarithm is normally distributed. Therefore, a log-normal random variable, X, has the form X = e^{N(µ,σ)}, where N represents the normal distribution. We can model a log-normal variable with a random variable

Z which is normal with mean 0 and unit variance: X = e^{µ+σZ} = X_0 κ^Z, with X_0 = e^µ and κ = e^σ. The p.d.f. for a log-normal distribution is:

π(θ) = 1/(θ ln(κ)√(2π)) exp( −[ln(θ/θ̃)]² / (2 ln²(κ)) ),  (4.31)

where θ̃ is the estimate for the observable, and κ characterizes the uncertainty estimates. The uncertainties are expressed in tables 4.5 and 4.6 in terms of κ.

The advantage of a log-normal distribution is that its p.d.f. is truncated at zero, so that, unlike the normal distribution, an observable with a log-normal systematic uncertainty cannot fluctuate below zero. This is the case for most uncertainties in this analysis, whose associated nuisance parameters appear as modifications to the overall event yield for the signal or background hypotheses. Therefore, most systematic uncertainties in the H→ZZ→4ℓ analysis use this distribution.

• Gamma Distribution: The gamma distribution arises as a natural prior for parameters that are measured using counting experiments (see, e.g., Appendix A). In this analysis all the signal and backgrounds are estimated by counting experiments, simulated data, or dedicated control

experiments. The gamma distribution, in its most generic form, is

π(x) = 1/(θ^k Γ(k)) x^{k−1} exp(−x/θ),  (4.32)

where k is the shape parameter and θ is the scale parameter. As with the log-normal distribution, the gamma distribution is also truncated at zero and can thus be used to model uncertainties on event yields. Conveniently, the gamma prior is also the conjugate prior for the Poisson distribution, so that it can be marginalized with a Poisson likelihood exactly.

Given a prior distribution, it is straightforward to draw a random sample from it. For a normal distribution, the scale uncertainty priors can be sampled by drawing a random number ε from a normal distribution, Eq. 4.30, with µ = 0 and σ = 1, and then applying a factor of (1 + εκ). For the log-normal distribution with parameter κ, the scale uncertainty priors can be sampled by drawing a random number ε from a normal distribution (Eq. 4.30), with µ = 0 and σ = 1, and then applying a factor of κ^ε. The total scale factor for multiple systematic uncertainties is then the product of the scale factors.
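A minimal sketch of this sampling, assuming only NumPy; the κ values in the demonstration are taken from Table 4.6 and the helper name is illustrative:

```python
# A minimal sketch: sample a total yield scale factor from the uncertainty priors.
# Gaussian uncertainties contribute (1 + eps*kappa); log-normal ones kappa**eps,
# with eps drawn from a unit normal; independent uncertainties multiply.
import numpy as np

rng = np.random.default_rng(7)

def scale_factor(kappas_normal, kappas_lognormal):
    sf = 1.0
    for kappa in kappas_normal:
        sf *= 1.0 + rng.standard_normal() * kappa
    for kappa in kappas_lognormal:
        sf *= kappa ** rng.standard_normal()
    return sf

# e.g. the 8 TeV luminosity (kappa = 1.044) and trigger (kappa = 1.015) uncertainties
print(scale_factor([], [1.044, 1.015]))
```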

4.8.2 Summary of Systematic Uncertainties

Experimental Uncertainties. The following is a summary of the experimental uncertainties affecting this analysis:

• Luminosity: The LHC luminosity affects the total scale of the signal and background. Luminosity measurements are conducted by measuring the energy deposition in the HF detector. However, measurements from HF have been shown to be non-linear in luminosity, and suffer from several additional idiosyncrasies, as outlined in [83]. Therefore, measurements of luminosity are also carried out using van der Meer scans, where the inner pixel detector is used as a counting device, proportional to the total number of interactions [84].

• Trigger: The trigger efficiencies and their uncertainties are measured using the tag-and-probe method, which was described in section 4.5. Triggers generally depend on some scale which is proportional to the total energy of the object being measured, such as the total E_T or E_T^miss. The triggers used in this analysis are, by definition, a function of lepton p_T. They are also dependent on the pseudorapidity of the leptons being measured. The effect of the trigger efficiency uncertainty is accounted for by reweighting each lepton by its p_T- and η-dependent efficiency, and generating a total event weight.

• Lepton Momentum Scale Uncertainty: The measurement of lepton energy and momentum is corrected in data and MC as described in section 3. In principle, a change in the energy and momentum scale of these measurements can cause a change in the shape of other kinematic observables. The electron E_T scale uncertainty is taken to be 0.4%, while the muon p_T scale uncertainty is 0.5%. The effect of these uncertainties is expected to be small. We nevertheless assign a nuisance parameter to both uncertainties, vary the electron energy and muon momentum scales, and recalculate all relevant observables.

• Data-to-MC Scale Factor: Lepton reconstruction, isolation, and identification efficiencies are measured using tag-and-probe techniques. Scale factors are applied to Monte Carlo events to match the efficiencies in data (see sec. 4.5). These factors depend on the position in the detector and the energy of the lepton, and are thus parameterized by p_T and η. We therefore assign a nuisance parameter for the scale factor uncertainties, and the MC is rescaled upon varying this parameter, where the rescaling depends on the p_T and η values of the leptons in each event.

• Jet Energy Scale: Jets are complicated objects, since they are composed of multiple particles and their energy is estimated using the tracker, ECAL, and HCAL together. One can estimate the energy of jets using standard candles such as Z→bb̄ decays, but this is a difficult measurement due to the high background from QCD sources. Still, it is possible to estimate the energy of jets, so long as we consider the uncertainty on the energy scale as well as the resolution. This uncertainty is relevant to this analysis since we use jet variables to discriminate between Higgs production mechanisms. We account for the uncertainty on jet measurements by varying the jet energy scale (JES) and recalculating all variables which depend on the energy and momentum of jets.

Theoretical Uncertainties. The following is a summary of the theoretical uncertainties that we consider in this analysis:

• Cross Section: The theoretical uncertainties on the cross section of each production mode are treated in detail by the Higgs Cross Section Working Group [2]. They are derived by varying the QCD and factorization scales.

• Higgs Branching Fraction Uncertainty: Branching ratio uncertainties are taken from the calculations in [85] and are assumed to be constant over the Higgs boson mass range.

• Parton Distribution Functions: The composition of the hadrons cannot be calculated using perturbative QCD. Instead, one constructs models of the composition of hadrons, which are described by the probability of observing a quark or gluon with hadron momentum fraction x for some external, measurable momentum transfer scale, Q². The Q² scale is defined by the experiment, and can be any observable quantity which is proportional to the energy of the collision, such as the direct photon energy. The models for the behavior of partons are called parton distribution functions (PDFs) and enter the calculation of the cross section for all particles produced at the LHC. The PDFs are constrained by experimental data from previous experiments, and are extrapolated to LHC energies in Q², and to the analysis-dependent x scales, using the DGLAP equations [86]. The PDF uncertainties translate to uncertainties in the predicted cross sections for our signal and background models. This uncertainty generally increases at low x values. To take account of this uncertainty, we follow the recommendations of the PDF4LHC group [87]. These recommendations use multiple estimates of the PDFs, namely from the collaborations CTEQ10 [65], MSTW08 [88], and NNPDF [89]. We determine that there is a mass-independent uncertainty of 2% in the overall event yield arising from the PDF uncertainties.

• Higgs p_T uncertainty: The Higgs signal Monte Carlo was generated with POWHEG at next-to-leading order (NLO), which is known to be suboptimal, especially for soft gluon radiation, resulting in an inaccurate description of the low Higgs p_T region (due to recoil against soft jets) with respect to generators which include NNLO terms and resummation of next-to-leading-log terms [90]. We therefore find it necessary to reweight the Monte Carlo events in p_T to the spectrum obtained with the HRes [91] generator, and include an associated uncertainty on p_T, which is obtained by varying the resummation scale. An additional uncertainty is introduced because HRes assumes an infinite top mass in the quark triangle of gg→Higgs production. By varying the mass of the top we obtain an additional uncertainty in p_T after reweighting.

Scale Systematics. The systematic uncertainties listed in the previous section differ in how they affect the final distributions of the analysis. Most of the systematic uncertainties relevant to the H→ZZ→4ℓ analysis modify the overall scale of the signal or background. That is, they multiply the overall event yield of a process by some factor, leaving the shapes of the distributions the same. Several systematic uncertainties, however, change the shapes of the distributions. In this case, the analysis variables must be recomputed to include the effect of the systematic uncertainty. Systematic uncertainties that affect the overall event yield modify the scale of the final event counts, s and b, and are referred to as scale uncertainties. Those that affect the relative counts of each bin, or the parameters of fits to distributions, are referred to here as shape uncertainties. Both types of uncertainty can be modeled with priors whose functional forms were given in section 4.8.1. A list of uncertainties in terms of their κ values is given in Table 4.5 for the 2011 data and Table 4.6 for the 2012 data.

Table 4.5: A summary of the systematic uncertainties affecting the overall scale in 7 TeV data listed in terms of κ value.

                  gg→H     VBF      WH       ZH       ttH      qq→ZZ    gg→ZZ    Z+jets
Higgs BR          1.02     1.02     1.02     1.02     1.02     -        -        -
e eff.            5.5-11   5.5-11   5.5-11   5.5-11   5.5-11   5.5-11   5.5-11   5.5-11
µ eff.            2.9-4.3  2.9-4.3  2.9-4.3  2.9-4.3  2.9-4.3  2.9-4.3  2.9-4.3  2.9-4.3
e trig. eff.      1.015    1.015    1.015    1.015    1.015    1.015    1.015    1.015
µ trig. eff.      1.015    1.015    1.015    1.015    1.015    1.015    1.015    1.015
luminosity        1.022    1.022    1.022    1.022    1.022    1.022    1.022    1.022
pdf gg            fit      -        -        -        -        -        fit      -
pdf 4l accept.    1.02     1.02     1.02     1.02     1.02     -        -        -
pdf qq̄            -        1.022    -        -        -        fit      -        -
QCD scale H       8.7-10   -        -        -        -        -        -        -
QCD scale qq→VV   -        -        -        -        -        2.6-6.7  -        -

Table 4.6: Systematic uncertainties affecting the overall scale in 8 TeV data listed in terms of κ value.

                  gg→H     VBF      WH       ZH       ttH      qq→ZZ    gg→ZZ    Z+jets
Higgs BR          1.02     1.02     1.02     1.02     1.02     -        -        -
e eff.            5.5-11   5.5-11   5.5-11   5.5-11   5.5-11   5.5-11   5.5-11   5.5-11
µ eff.            2.9-4.3  2.9-4.3  2.9-4.3  2.9-4.3  2.9-4.3  2.9-4.3  2.9-4.3  2.9-4.3
e trig. eff.      1.015    1.015    1.015    1.015    1.015    1.015    1.015    1.015
µ trig. eff.      1.015    1.015    1.015    1.015    1.015    1.015    1.015    1.015
luminosity        1.044    1.044    1.044    1.044    1.044    1.044    1.044    1.044
pdf gg            fit      -        -        -        -        -        fit      -
pdf 4l accept.    1.02     1.02     1.02     1.02     1.02     -        -        -
pdf qq̄            -        1.022    -        -        -        fit      -        -
QCD scale H       5.5-7.9  -        -        -        -        -        -        -
QCD scale qq→VV   -        -        -        -        -        2.6-6.7  -        -

Shape Systematics. For uncertainties affecting the shape of distributions, we use the following procedure (a schematic code sketch follows the list):

1. Sample from the given systematic uncertainty prior

2. For each sample, recalculate each input variable on an event-by-event basis

3. Re-fill all histograms used in the statistical analysis with new variables

4. Average all results using the set of histograms from steps 1-3
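The procedure can be sketched schematically as follows, assuming NumPy; sample_prior and recompute_variables are placeholders for the analysis-specific code, not actual functions from it.

```python
# A schematic sketch of steps 1-4 above, with illustrative placeholder callables.
import numpy as np

def shape_systematics(events, weights, sample_prior, recompute_variables,
                      bins, n_samples=100):
    """Sample the prior, recompute variables, refill histograms, and average."""
    histograms = []
    for _ in range(n_samples):
        theta = sample_prior()                                   # step 1
        values = recompute_variables(events, theta)              # step 2
        h, _ = np.histogram(values, bins=bins, weights=weights)  # step 3
        histograms.append(h)
    return np.mean(histograms, axis=0)                           # step 4

# toy demonstration with trivial placeholders (a 0.4% scale-like variation)
rng = np.random.default_rng(10)
h = shape_systematics(events=rng.normal(91, 3, 1000),
                      weights=np.ones(1000),
                      sample_prior=lambda: 1 + 0.004 * rng.standard_normal(),
                      recompute_variables=lambda ev, th: ev * th,
                      bins=np.linspace(80, 100, 21))
print(h[:5])
```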

For this analysis, the relevant histograms we must recalculate are those of the input variables of the BNN and of m_4l. The shape uncertainties, and how each is adjusted in order to perform the Bayesian integration over systematic uncertainties, are listed below.

• electron E_T scale: Adjust the overall scale of the electron E_T and then recalculate the Z_1, Z_2, and Higgs masses.

• muon p_T scale: Adjust the overall scale of the muon p_T and then recalculate the Z_1, Z_2, and Higgs masses.

• muon reconstruction efficiency uncertainty: Adjust the event weight according to the p_T and η of each muon, and recalculate the total event weight.

• electron reconstruction efficiency: Adjust the event weight according to the p_T and η of each electron, and recalculate the total event weight.

• Higgs p_T uncertainty: Reweight the Higgs POWHEG signal events as a function of p_T,4l to account for the finite top mass assumption and variations in the resummation scale.

The effect of varying all systematic uncertainties – scale and shape – is shown in Fig. 4.23.

Figure 4.23: The effect of all systematic uncertainties on the signal (mH = 125 GeV) m4l distribution and BNN distribution.

CHAPTER 5

RESULTS AND INTERPRETATION

In this chapter, I will show the results of the analysis, including the comparison of data to the theoretical predictions for the variables defined in the event selection (sec. 4.3.4), as well as the distributions of the multivariate discriminants (sec. 4.7). I will then describe the statistical methods used to make measurements based on these results. Finally, I will present the results of the low-mass Higgs boson search, the measurement of the Higgs boson mass, and the measurements of its spin and couplings.

5.1 Results

This section shows comparisons between simulated data and observed data for the 2011 and 2012 LHC runs. Figures 5.1, 5.2, and 5.4 show key distributions following the event selection. Figure 5.3 shows two-dimensional histograms of the m_4l variable and the BNN discriminant, with data points overlaid. Figure 5.5 shows the comparison of the BNN distributions with data. The data and Monte Carlo events that enter these plots are the same as those used in the rest of the chapter to conduct the statistical analysis and perform measurements.

5.2 Statistical Analysis

In section 1.6, we outlined the theoretical motivation for identifying and measuring the properties of the Higgs boson, and stated which measurements are possible with the data from Run 1 of the LHC. We must first quantify the statistical significance of the observation of the new particle.

Figure 5.1: Comparison of observed and simulated data for the Z_1 and Z_2 mass distributions after event selection.


Figure 5.2: Comparison of observed and simulated data for the m_4l and p_T,4l distributions after event selection.


Figure 5.3: Two dimensional histograms in m4l and D(x) (signal/background BNN dis- criminant) for Higgs events (mH = 125 GeV) on the left, and ZZ background events on the right. Data are shown in black triangles and all final state channels are combined.


Figure 5.4: Comparison of observed and simulated data for the VBF discriminant input variables, m_jj and ∆η_jj, after event selection.


Figure 5.5: Comparison of observed and simulated data for the BNN distributions. The signal/background BNN is shown on the left, and the ggH/VBF discriminant is shown on the right.

In this section, we present the statistical analysis used to quantify the significance of the discovery of the new particle and the measurement of its properties. Ideally, measuring the properties of the Higgs boson involves measuring all the parameters of the Higgs sector with the greatest possible accuracy. That is, we would like to compute the joint probability of the set of parameters we wish to measure, given the observed data,

P(σ|data),  (5.1)

where σ is the set of all parameters of the signal model, in this case the mass and couplings, σ = {m_H, κ_V, κ_F}, and the data is the observed data set. Given this probability density, we can calculate simpler distributions, such as the joint probability of the mass and total final state cross section, P(m_H, µ|data), or the probability of only the mass variable, P(m_H|data), by integrating over the other free parameters with respect to a prior density. It is not possible to compute the probability density in Eq. 5.1 directly. However, from our knowledge of physical processes and our experimental devices, we can construct statistical models for the signal and background processes. These models predict potential observations, either counts in counting experiments or measurements of parameters, as a function of the model parameters. The

models used in this analysis were described in sec. 4.4. From these models, we construct likelihoods, that is, the probability model evaluated at the observed data given the signal and background models: P(data|s(σ), b(σ)). This likelihood can be used to make inferences about the parameters of the model by invoking Bayes' theorem:

P(σ|data) = P(data|σ) π(σ) / P(data).  (5.2)

This calculation requires the likelihood function as well as the function π(σ), known as the prior probability density. We discuss the meaning and derivation of prior probability distributions further in Appendix B. The probability density function for σ, P(σ|data), is called the posterior probability density.¹ In the rest of this chapter we will discuss the likelihood, the priors (for both the systematic parameters and the parameters of interest), the treatment of systematic uncertainties, and, finally, the resulting posterior probabilities.

5.3 Likelihood Function

All the information about an experiment, which consists of the experimental design and the observed data, is contained in a likelihood function. The likelihood is computed from a statistical model based on the design of the experiment. The simplest example of such a statistical model is an experiment that makes a single observation yielding a count, n, given a predicted average count λ. In this case the experimental design determines our statistical model exactly, and it is dependent on one free parameter, λ. This model is described by the Poisson distribution:

P(n|λ) = e^{−λ} λ^n / n!.  (5.3)

This distribution describes the probability to observe a count n given the mean count λ. The likelihood in this case is simply P(n|λ) with n replaced by the observed count D. More complex experiments require more complex likelihoods. In this analysis we want to consider likelihoods that depend on multiple parameters. For example, we want to measure the mass of the Higgs boson and its couplings to fermions and vector bosons. Statistical models describing

¹We may occasionally use the term probability when, strictly speaking, we mean probability density. The correct term should be clear from the context.

such systems can be calculated in multiple ways. If we have a set of functions that describe the probability distribution of an observable x for signal and background, then we can construct an unbinned likelihood:

P(data|µ, S, B) = exp[−(S + B)] Π_{i=1}^{N} [µ S f_s(x_i) + B f_b(x_i)],  (5.4)

which represents the probability for observing N events, characterized by x_i, which are assumed to be described by the p.d.f.s f_s(x_i) and f_b(x_i), derived from our signal and background models. S and B represent the total expected signal and background counts. Note that f_s(x_i) and f_b(x_i) can depend on multiple observables. We can also consider a binned likelihood, where the information from the signal and background models is contained in binned p.d.f.s, that is, histograms. Each bin is an independent counting experiment described by a Poisson distribution, whose likelihood is given in Eq. 5.3. The likelihood for the entire histogram is then the product of the likelihoods of the bins. For example, if our signal and background models predict signal counts, s, and background counts, b, then the likelihood is:

P(data|µ, s, b) = Π_{i=1}^{N} Poisson(D_i, µ s_i(m_H) + b_i),  (5.5)

for N bins, where D_i, s_i, and b_i are, respectively, the observed count and the expected signal and background counts in the ith bin. Included in this equation is a signal strength modifier, µ, inserted for later use (see sec. 1.6.3 for the definition).
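A minimal numerical sketch of Eq. 5.5 follows, assuming NumPy and SciPy, with illustrative toy counts.

```python
# A minimal sketch of Eq. 5.5: the binned log-likelihood is a sum of Poisson
# log-probabilities, one per bin, with the signal scaled by the modifier mu.
import numpy as np
from scipy.stats import poisson

def log_likelihood(D, s, b, mu):
    """D: observed counts; s, b: expected signal and background per bin."""
    return np.sum(poisson.logpmf(D, mu * s + b))

D = np.array([3, 8, 15, 9, 4])           # toy observed counts
s = np.array([0.5, 2.0, 5.0, 2.0, 0.5])  # toy expected signal
b = np.array([3.0, 6.0, 9.0, 6.0, 3.0])  # toy expected background
print(log_likelihood(D, s, b, mu=1.0))
```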

The advantage of a binned likelihood is that it does not require analytical functions, f_s(x_i) and f_b(x_i), to be introduced. Finding these functions generally involves fitting a functional form to simulated data. Instead, one can use the simulated data directly and thereby avoid the need to add additional parameters. This helps keep the analysis as simple as possible, which is exceedingly important in an analysis that is already very complex. The likelihoods used in this analysis are constructed from 2D and 3D histograms of the BNN discriminants (sec. 4.7), the reconstructed four-lepton mass, m_4ℓ, and the four-lepton p_T,4ℓ (sec. 4.3.4). The likelihood, Eq. (5.5), is then built from the N bins of the multidimensional histograms.

118 5.4 Nuisance Parameters

Any parameter in the likelihood which is not of current scientific interest is referred to as a nuisance parameter. The nuisance parameters in this analysis are those which correspond to the systematic uncertainties discussed in section 4.8. To recap, theoretical and experimental uncertainties introduce nuisance parameters into the likelihood, and each of their densities is assigned a prior, π(θ) (see sec. 4.8.1). In this section we show how the systematic uncertainties are accounted for in the statistical analysis.

Systematic Uncertainties. The systematic uncertainties affect the likelihood by modifying the signal and background models. We account for the effect of the systematic uncertainties by introducing one additional parameter into the likelihood for each uncertainty. For scale systematic uncertainties, which affect the overall event yield of the signal and background, the effect on the likelihood is straightforward: the affected count in each bin is scaled by a multiplicative factor which depends directly on the nuisance parameter. For shape systematic uncertainties, which affect the relative counts of the bins, we must recalculate the bin counts following the procedure outlined in section 4.8.2. As noted earlier, in order to perform inferences using Bayes' theorem, we must multiply the likelihood by the prior probability density for all parameters. For the systematic uncertainties in this analysis, the priors were given in sec. 4.8. The prior encodes our knowledge of the systematic uncertainties of the experiment. Since we are not interested in the actual values of the systematic uncertainty parameters, we must eliminate the nuisance parameters from the likelihood. There are two ways in which this is generally done: profiling and marginalization. In the method of profiling, the nuisance parameters, θ, are replaced by their conditional maximum likelihood values, θ̂, that is, by the values that maximize the likelihood for fixed values of the parameters of interest. In marginalization, we integrate over a parameter with respect to its prior probability, or probability density if it is a continuous parameter.

Statistical Uncertainties. Each predicted signal and background count is based on some auxiliary experiment, whether a sideband measurement or simulated data. In either case, there exists some uncertainty in the expected signal or background count, since we have samples of finite size. These uncertainties, which are referred to as statistical uncertainties, are modeled by the Poisson distribution, so we can derive the exact form of their prior. For a binned likelihood, we can marginalize over these priors analytically (see Appendix A for details).

Since statistical uncertainties are treated differently from experimental and theoretical uncertainties, we treat the model as a hierarchical model, first marginalizing over the statistical uncertainties, integrating them exactly, and then integrating the resulting marginalized likelihood over all remaining nuisance parameters. This hierarchical model is illustrated in Fig. 5.6.

P(data|m_H, µ) = ∫ [ ∫ P(data|s, b, µ, m_H) π(s, b|θ) ds db ] π(θ) dθ.  (5.6)

We can perform the outer integral using Monte Carlo integration. We draw random numbers from the π(θ) distribution and, for each randomly selected point in the space of θ, the likelihood,

∫ P(data|s, b, µ, m_H) π(s, b|θ) db ds,  (5.7)

is calculated. The likelihood is then marginalized over all nuisance parameters,

L = (1/K) Σ_{k=1}^{K} ∫ P(data|s, b, µ, m_H) π(s, b|θ_k) db ds,  (5.8)

where K is the number of points sampled from the nuisance prior. The systematic uncertainties are therefore taken into account by integrating over the nuisance prior. We can visualize the effect of this procedure in one dimension by performing this integration over a few key distributions. Two distributions that enter the likelihood are the m_4l and BNN distributions. The one and two standard deviation bands produced by the sampling over nuisance parameters, corresponding to the set of systematic uncertainties listed in section 4.8, are shown in Fig. 4.23.
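A minimal sketch of Eq. 5.8 follows, reusing the log_likelihood and toy counts of the previous sketch; the single log-normal background-scale nuisance parameter here is purely illustrative.

```python
# A minimal sketch of Eq. 5.8: average the likelihood over K prior samples.
import numpy as np

rng = np.random.default_rng(8)

def marginalized_likelihood(D, s, b, mu, kappa=1.1, K=500):
    likelihoods = []
    for _ in range(K):
        theta = kappa ** rng.standard_normal()   # sample the log-normal prior
        likelihoods.append(np.exp(log_likelihood(D, s, theta * b, mu)))
    return np.mean(likelihoods)                  # Monte Carlo marginalization

print(marginalized_likelihood(D, s, b, mu=1.0))
```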

5.5 Discovery, Parameter Estimation, and Hypothesis Testing

Having constructed a likelihood with all systematic uncertainties included, we can make many measurements. The first is of the one free parameter of the standard model, the Higgs mass, m_H. In the event that we observe an excess of events above background for some value of m_H, we want to quantify its significance. After a discovery, we wish to measure not only m_H but also the couplings of the Higgs boson to vector bosons, κ_V, and fermions, κ_F. In addition, we want to measure the

Figure 5.6: Schematic of the marginalization of systematic uncertainties in the hierarchical model.

spin and parity of the Higgs boson by comparing observations to the SM prediction and to alternate models.

5.5.1 Discovery Significance

Computing the significance of a discovery can be done in several ways. We follow a Bayesian approach by computing the Bayes factor for two alternative hypotheses [92]. In the case of a search for a standard model Higgs boson, the hypotheses are the background-only hypothesis (SM with no Higgs), which we label H_0, and the signal hypothesis (SM with Higgs), labeled H_1. Consider,

K = P(H_1|data) / P(H_0|data) = [P(H_1) / P(H_0)] [P(data|H_1) / P(data|H_0)]  (5.9)
  = P(data|m_H, µ) / P(data|µ = 0),  (5.10)

where P(H_1) and P(H_0) are the prior probabilities of the signal-plus-background model and the background-only model, respectively. The ratio P(data|H_1)/P(data|H_0) is called the Bayes factor. If we set P(H_1) = P(H_0), then we get Eq. 5.10. This equality is equivalent to the assertion that we

do not have a prior preference for either model. Note that P(data|µ = 0) is the likelihood for the background-only hypothesis. Equation 5.10 represents the "local Bayes factor," since it is computed for a particular Higgs boson mass and signal strength modifier. We can also define a "global Bayes factor":

K_global = P(data|H_1) / P(data|H_0) = ∫ p(data|m_H, µ) π(m_H, µ) dm_H dµ / p(data|µ = 0),  (5.11)

where π(m_H, µ) is the prior probability of the parameters m_H and µ. A full discussion of priors is given in Appendix B. For this analysis we define the significance as:

√(2 ln(K)),  (5.12)

which is roughly equivalent to the "n-sigma" quantity Z often used to quantify the background-only p-value in the asymptotic limit [82].
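Continuing the same toy example, a sketch of Eqs. 5.10 and 5.12: the local Bayes factor compares the signal-plus-background and background-only hypotheses, and the significance is √(2 ln K).

```python
# A minimal sketch of Eqs. 5.10 and 5.12, reusing marginalized_likelihood above.
import numpy as np

K_local = (marginalized_likelihood(D, s, b, mu=1.0)
           / marginalized_likelihood(D, s, b, mu=0.0))
Z = np.sqrt(2 * np.log(K_local)) if K_local > 1 else 0.0
print(f"Bayes factor {K_local:.2f}, significance {Z:.2f} sigma")
```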

5.5.2 Parameter Estimation

With the data available in 2013, we can measure a limited set of parameters in the Higgs sector of the standard model Lagrangian. As discussed in chapter 1, these parameters are the Higgs mass, m_H, and the couplings of the Higgs boson to vector bosons, κ_V, and fermions, κ_F, which are measured from the signal strength modifiers, µ_V and µ_F, for processes involving these couplings. The Higgs mass measurement is highly correlated with the measurement of the couplings, as can be seen from Fig. 1.2; therefore, we construct a likelihood which depends on all the parameters of interest in order to capture the correlations between them. We write the likelihood as,

P(data|m_H, µ_V, µ_F) = Π_{i=1}^{N} Poisson(D_i, µ_V s_i^V(m_H) + µ_F s_i^F(m_H) + b_i),  (5.13)

where s_i^V and s_i^F are the predicted bin counts, according to the standard model, for production processes involving the Higgs boson coupling to vector bosons and fermions, respectively. That is, the histograms represented by s_i^F are filled using Monte Carlo events from the ggH and ttH production processes, whereas those represented by s_i^V are filled using Monte Carlo events from the VBF and VH processes. To capture the model dependence on the three parameters of interest, we must build the likelihood with histograms of variables that are sensitive to all three parameters. To discriminate signal from background, we use the BNN discriminant. For the Higgs mass, m_H, we use the observable m_4l. To distinguish production mechanisms, we use the production mechanism discriminant introduced in section 4.7.4. The likelihood is thus built from a 3D histogram (m_4l, BNN, BNN_VBF). Two-dimensional projections of this histogram with observed data overlaid are shown in Fig. 5.3. To verify that the likelihood above yields accurate measurements of the parameters of interest, we plot the pulls of the measured distributions. The pulls for the m_H and µ parameters are shown in Fig. 5.7. The measured values and widths are obtained from the two-dimensional (m_H, µ) distribution (see Fig. 5.3) by fitting a bivariate Gaussian. A pull with a mean of zero indicates that the measurements are unbiased [93]. The width of the pull tells us how accurately the uncertainties have been assessed.
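The pull test can be sketched as follows, assuming NumPy; generate_toy and fit are placeholders for the toy generation and the bivariate-Gaussian fit described above.

```python
# A minimal sketch of the pull test: an unbiased measurement gives pulls with
# mean 0 and width 1 over an ensemble of toy experiments.
import numpy as np

def pull_distribution(generate_toy, fit, true_value, n_toys=1000):
    pulls = []
    for _ in range(n_toys):
        data = generate_toy(true_value)
        estimate, sigma = fit(data)          # fitted value and its uncertainty
        pulls.append((estimate - true_value) / sigma)
    pulls = np.asarray(pulls)
    return pulls.mean(), pulls.std()         # expect about (0, 1) if unbiased

# toy demonstration: an unbiased Gaussian-mean measurement
rng = np.random.default_rng(9)
mean, width = pull_distribution(
    generate_toy=lambda mu: rng.normal(mu, 1.0, 100),
    fit=lambda d: (d.mean(), d.std(ddof=1) / np.sqrt(d.size)),
    true_value=125.0)
print(f"pull mean {mean:.2f}, width {width:.2f}")
```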

[Figure 5.7 panels: the Gaussian fits give, for the m_H pull (left), χ²/ndf = 95.94/33, p1 = 0.003 ± 0.031, p2 = 0.925 ± 0.027; for the µ pull (right), χ²/ndf = 62.74/31, p1 = 0.129 ± 0.034, p2 = 0.992 ± 0.024.]

Figure 5.7: The pulls of the µ and m_H distributions measured with Eq. 5.13. A normal distribution is fit to each histogram. The plots show the integral of the fit (p0), the mean of the normal distribution (p1), and the standard deviation of the fit (p2).

5.5.3 Spin/Parity Hypothesis Testing

To determine the spin and parity of the Higgs boson, we construct a figure of merit similar to the Bayes factor presented in the previous section. Given the SM signal hypothesis, H_1, and some alternate hypothesis (the SM with non-scalar spin and parity), H_2, we compute the following Bayes factor:

q = P(data|H_1) / P(data|H_2).  (5.14)

In principle, q, or some function of q such as Z = √(2 ln(q)), would be sufficient to perform the hypothesis test. However, in order to compare my results with those based on p-values, we use q to calculate p-values. The p-value is the probability that a hypothesis could fluctuate to produce the observed data counts (or higher). If the probability that the alternate (non-SM) hypothesis could produce the observed data is sufficiently small, we deem the model excluded. By performing such a comparison on a reasonable set of well-motivated hypotheses, we increase our confidence that the observed particle is indeed a SM Higgs boson. To measure the spin and parity fully, we would need to measure the contribution from all gauge-invariant terms describing a particle with non-SM spin and parity, but with the same mass as the observed particle. Such a measurement is beyond the scope of the available data.

5.6 Measurements

In this section we present the final measurements.

5.6.1 Discovery Significance

The significance of the observed data is calculated using the procedure outlined in sec. 5.5.1. The significance for the low-mass Higgs boson search is shown in Fig. 5.8.

5.6.2 Higgs Mass and Couplings

The Higgs mass and coupling measurements are calculated together using the three-dimensional likelihood defined in sec. 5.5.2. The results are shown as 2D projections. The first result is for the mass, m_H, versus the global signal strength modifier (which scales all production processes), µ. The second result concerns the signal strength modifiers for Higgs production mechanisms with couplings to vector bosons and fermions, µ_V versus µ_F. Both results are shown in Fig. 5.9.

Figure 5.8: The observed significance as calculated from the Bayes factor for an observation at a given Higgs mass and signal strength modifier, with √s = 8 TeV data.

Figure 5.9: Likelihood scan of m_H versus a global signal strength modifier, µ, which scales all signal production mechanisms (left). The likelihood scan for separate modifiers for the VBF production mechanism and the gluon fusion mechanism, ggH, is shown on the right.

5.6.3 Higgs Spin and Parity

We refer to the recently released document on Higgs properties (see [24]) for the measurement of spin and parity. For all hypotheses considered, the observed data are found to be consistent with a spin-0, scalar particle. For reference, the p-value computed with the CLs method [94] for the J^P = 0^− hypothesis is 0.16%, for the J^P = 2^+ hypothesis it is 1.5%, and for the J^P = 1^+ and J^P = 1^− hypotheses it is < 0.1%.

5.7 Conclusion

Using the $H \to ZZ \to 4\ell$ channel, we make the following measurements:

1. Mass: $m_H = 125.7 \pm 0.5$ GeV
2. Couplings: $\kappa_F = 1.3 \pm 0.5$
3. $\kappa_V = 1.08 \pm 2.3$
4. Spin-parity: a scalar, even-parity particle is consistent with observation when compared to all other hypotheses.

We have performed a search for the Higgs boson in the mass range $100 < m_H < 180$ GeV using the 2011 and 2012 data sets of the CMS detector at the LHC. We performed measurements of the Higgs boson mass and of the Higgs couplings to fermions and vector bosons. All values are consistent with the standard model predictions within the accuracy of the measurement. No other significant excess is observed in this mass range. More data are required to verify the identity of the observed particle and to fully evaluate its consistency with a standard model Higgs boson. We note that the likelihood we have constructed, which depends on $m_H$, $\mu_V$, and $\mu_F$, is available in a RooStats workspace, which can be used to share results outside the experimental collaboration.

This dissertation presents some of the most complex and beautiful experimental results in the history of science. These results are only possible due to the hard work and dedication of many people, and I am very grateful to have played a part in such a monumental discovery. We conducted all steps of the analysis independently of other groups performing the same measurements. In doing so, we contributed to the development of the analysis and also helped validate each step independently, thereby increasing the confidence of the collaboration in the measurements.

Some novel elements of our analysis are the fully Bayesian treatment of statistical and systematic uncertainties, as well as the technique for measuring the reducible background. Moreover, our Bayesian method handles sparse data without difficulty. The first feature allows for a greatly simplified statistical analysis with respect to that performed in [24]. Because of its simplicity, we expect that this method will be used for more complex analyses conducted during Run 2 of the LHC, including precision measurements of the Higgs boson properties.

APPENDIX A

MARGINALIZATION OF STATISTICAL UNCERTAINTIES

Signal and background predictions in high energy physics are often made with models that follow Poisson statistics. If we use simulated data, for example, we often bin the observables in histograms, and compare the simulated counts with observed counts using the Poisson distribution. Alternatively, we could use a sideband experiment or some other background region to measure the background, and then scale that value to the region of interest. In either case, we are using one Poisson experiment to estimate the mean counts in our signal region. Those means are then compared with observed data. Each time we conduct a measurement in this way we introduce uncertainty, since there is uncertainty in each measurement. These uncertainties are often treated as negligible, or are estimated approximately. In this section, I describe a method for marginalizing over the statistical uncertainty exactly.

The binned likelihood used in this dissertation has the form:

$$ P(D\,|\,s(m_H,\mu),b) = \prod_{i=0}^{M} \mathrm{Poisson}(D_i\,|\,s_i + b_i). \qquad (A.1) $$

Or, with multiple backgrounds:

$$ P(D\,|\,s(m_H,\mu),b) = \prod_{i=0}^{M} \mathrm{Poisson}\Big(D_i\,\Big|\,s_i + \sum_j^{N} b_{ij}\Big). \qquad (A.2) $$
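For illustration, a direct transcription of Eq. A.2 as a sketch; the templates and counts below are invented:

```python
# Direct transcription of Eq. A.2: a binned Poisson likelihood with one signal
# template and N background templates. The templates below are invented.
import numpy as np
from scipy.stats import poisson

def binned_log_likelihood(D, s, b):
    """
    D : observed counts per bin, shape (M,)
    s : signal expectation per bin, shape (M,)
    b : background expectations, shape (M, N) for N backgrounds
    """
    mean = s + b.sum(axis=1)
    return poisson.logpmf(D, mean).sum()

D = np.array([5, 9, 4])
s = np.array([1.2, 3.4, 0.8])
b = np.array([[3.0, 1.1], [4.2, 0.9], [2.5, 0.7]])   # two backgrounds
print(binned_log_likelihood(D, s, b))
```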

The signal counts are taken directly from Monte Carlo event histograms in the space (BNN, $m_{4\ell}$). Similarly, each background comes either from MC or from a background control region. Each auxiliary sideband or Monte Carlo experiment yields some count, $X_i$, which is drawn randomly from a Poisson distribution with mean count $\lambda$:

$$ P(X_i\,|\,\lambda) = \mathrm{Poisson}(X_i\,|\,\lambda). \qquad (A.3) $$

What we are really interested in is not the observed count, $X_i$, but the unknown expected count, $\lambda$. This $\lambda$ is proportional to the expected count in our region of interest, $s_i$, by some constant of proportionality, which we call $\alpha$: $\lambda = \alpha s_i$. These constants are determined from the design of the experiment. In order to find the parameter of interest, $s_i = \lambda/\alpha$, we then need to perform a Bayesian inversion:

$$ P(\lambda\,|\,X_i) \propto P(X_i\,|\,\lambda)\,\pi(\lambda) = \mathrm{Poisson}(X_i\,|\,\lambda)\,\frac{1}{\sqrt{\lambda}} = \frac{e^{-\lambda}\,\lambda^{X_i-\frac{1}{2}}}{X_i!} \;\propto\; \frac{(\alpha s_i)^{X_i-\frac{1}{2}}\, e^{-\alpha s_i}}{\Gamma(X_i+\frac{1}{2})}. \qquad (A.4) $$

Here we have used the Jeffreys prior for $\lambda$, and normalized the posterior density. We see that the posterior density for the parameter $\lambda$ has the form of a gamma distribution. We conclude that gamma distributions are an acceptable model for the priors of each of our sideband or Monte Carlo expected counts:

$$ \pi(s_i) = \frac{\alpha\,(\alpha s_i)^{X_i-\frac{1}{2}}\, e^{-\alpha s_i}}{\Gamma(X_i+\frac{1}{2})} = \mathrm{Gamma}\big(s_i\,\big|\,X_i+\tfrac{1}{2},\,1/\alpha\big), \qquad (A.5) $$
$$ \pi(b_i) = \frac{\beta\,(\beta b_i)^{Y_i-\frac{1}{2}}\, e^{-\beta b_i}}{\Gamma(Y_i+\frac{1}{2})} = \mathrm{Gamma}\big(b_i\,\big|\,Y_i+\tfrac{1}{2},\,1/\beta\big), $$

where $X_i = \alpha s_i$. These priors then encode the uncertainty in our measurement of signal and background counts introduced when we have a finite sample in our simulated or sideband experiment.

In the case of simulated data, it is possible to generate a larger sample and multiply each event by a weight, $\omega_i$, to obtain the correct overall yield, $N_{tot} = \sum_i \omega_i$. In this case the error introduced by our finite sample size is proportional to $\delta N_{tot} = \sqrt{\sum_i \omega_i^2}$, instead of the unweighted Poisson error $\delta N = \sqrt{N}$, corresponding to a sample size of $N_{Poisson}$. The change in uncertainty can be accounted for by scaling $X_i$ so that the Poisson relative error equals the relative error from the sum of the weights squared, as follows:

$$ \frac{\delta N_{Poisson}}{N_{Poisson}} = \frac{\delta N_{weighted}}{N_{weighted}}, \qquad \frac{1}{\sqrt{N_{Poisson}}} = \frac{\sqrt{\sum_i \omega_i^2}}{\sum_i \omega_i}, \qquad N_{Poisson} = \frac{N_{weighted}^2}{(\delta N_{weighted})^2} = \frac{\left(\sum_i \omega_i\right)^2}{\sum_i \omega_i^2}. \qquad (A.6) $$

Figure A.1: A single-bin likelihood consisting of either a Poisson distribution or a Poisson-gamma distribution. For the Poisson-gamma likelihood, the effect of marginalizing with respect to the gamma prior is to "smear" the distribution, accounting for the uncertainty introduced by measuring the Poisson mean with a sideband experiment.
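The smearing in Figure A.1 can be reproduced numerically. The sketch below compares a single-bin Poisson likelihood, evaluated at the sideband point estimate, with the Poisson-gamma marginal obtained by integrating over the prior of Eq. A.5; the sideband count and ratio are invented:

```python
# Numerical illustration of Figure A.1: marginalizing a single-bin Poisson
# likelihood over the gamma prior of Eq. A.5 broadens ("smears") it.
# The sideband count X and ratio alpha are invented.
import numpy as np
from scipy.stats import poisson, gamma
from scipy.integrate import quad

X, alpha = 8, 2.0                 # sideband count, sideband-to-signal ratio
s_hat = X / alpha                 # point estimate of the Poisson mean

def poisson_gamma(D):
    # P(D) = int Poisson(D | s) * Gamma(s | X + 1/2, 1/alpha) ds
    integrand = lambda s: poisson.pmf(D, s) * gamma.pdf(s, X + 0.5, scale=1.0 / alpha)
    return quad(integrand, 0.0, 60.0)[0]

for D in range(10):
    print(D, round(poisson.pmf(D, s_hat), 4), round(poisson_gamma(D), 4))
```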

Therefore, to encode the correct uncertainty in our gamma prior, we scale the counts $X_i$ (and, to maintain the correct count in our region of interest, the scale factors $\alpha$) by a factor $S$:

$$ S = \frac{N_{Poisson}}{N_{weighted}} = \frac{N_{weighted}}{(\delta N_{weighted})^2} = \frac{\sum_i \omega_i}{\sum_i \omega_i^2}. \qquad (A.7) $$

In the case that we wish to scale one of our counts by some factor $\mu$, $X_i \to \mu X_i$ (for example, by the signal strength modifier introduced in Section 5.2), we would need to scale $\alpha$ accordingly:

$$ \delta N_{tot} = \sqrt{\sum_i \omega_i^2} \;\to\; \sqrt{\sum_i (\mu\,\omega_i)^2} = \mu\sqrt{\sum_i \omega_i^2} = \mu\,\delta N_{tot}, \qquad N_{tot} \to \mu\, N_{tot}. \qquad (A.8) $$

Then,
$$ S \to \frac{1}{\mu}\, S. \qquad (A.9) $$
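A small sketch of the bookkeeping in Eqs. A.6 to A.9, with invented weights:

```python
# Sketch of Eqs. A.6-A.9: effective (unweighted-equivalent) sample size of a
# weighted MC sample, the scale factor S, and its behaviour under a signal
# strength rescaling mu. Weights are invented.
import numpy as np

rng = np.random.default_rng(0)
w = rng.uniform(0.2, 2.0, size=1000)       # per-event MC weights

N_weighted = w.sum()                       # N_tot = sum_i w_i
N_poisson = N_weighted**2 / (w**2).sum()   # effective counts, Eq. A.6
S = N_weighted / (w**2).sum()              # scale factor, Eq. A.7

mu = 1.5                                   # signal strength modifier
S_scaled = S / mu                          # Eq. A.9
print(N_weighted, N_poisson, S, S_scaled)
```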

Now that we have an appropriate prior for the signal and background counts, $s_i$ and $b_i$, we can marginalize over these counts by integrating with respect to the priors:

$$ P(D\,|\,m_H,\mu) = \prod_{i=0}^{M} \int ds_i \prod_j db_{ij}\; \mathrm{Poisson}\Big(D_i\,\Big|\,s_i + \sum_j^{N} b_{ij}\Big)\,\pi(s_i)\,\prod_j^{N} \pi_j(b_{ij}), \qquad (A.10) $$

for $N$ backgrounds labeled by the index $j$. This can be rewritten in a more computable form:

$$ P(D\,|\,m_H,\mu) = \prod_{i=1}^{M} \int \prod_j (db_{ij})\; \frac{\big(s_i + \sum_j b_{ij}\big)^{D_i}\, e^{-(s_i + \sum_j b_{ij})}}{D_i!}\; \prod_{j=1}^{N} \frac{(\beta_j b_{ij})^{Y_{ij}-\frac{1}{2}}\, e^{-\beta_j b_{ij}}}{\Gamma(Y_{ij}+\frac{1}{2})}, \qquad (A.11) $$

where the $j = 0$ term represents the signal. We factorize this using the multinomial theorem:

$$ (x_1 + x_2 + \cdots + x_m)^n = \sum_{k_1,k_2,\ldots,k_m} \binom{n}{k_1,k_2,\ldots,k_m}\, x_1^{k_1} x_2^{k_2} \cdots x_m^{k_m}, \qquad (A.12) $$
$$ \sum_m k_m = D_i, \qquad (A.13) $$

where we have employed the multinomial coefficients. We can write the marginalized likelihood as

$$ P(D\,|\,m_H,\mu) = \prod_{i=1}^{M}\; \sum_{k_0+\cdots+k_N=D_i}\; \prod_{j=0}^{N} \int db_{ij}\; \frac{\beta_j^{Y_{ij}-\frac{1}{2}}\, b_{ij}^{\,k_j+Y_{ij}-\frac{1}{2}}\, e^{-(1+\beta_j)\, b_{ij}}}{k_j!\;\Gamma(Y_{ij}+\frac{1}{2})}. \qquad (A.14) $$

Performing the integral over each count:

$$ \int db_{ij}\; \frac{\beta_j^{Y_{ij}-\frac{1}{2}}\, b_{ij}^{\,k_j+Y_{ij}-\frac{1}{2}}\, e^{-(1+\beta_j) b_{ij}}}{k_j!\;\Gamma(Y_{ij}+\frac{1}{2})} = \frac{\beta_j^{Y_{ij}-\frac{1}{2}}}{k_j!\;\Gamma(Y_{ij}+\frac{1}{2})} \int db_{ij}\; e^{-(1+\beta_j) b_{ij}}\, b_{ij}^{\,k_j+Y_{ij}-\frac{1}{2}} = \frac{\beta_j^{Y_{ij}-\frac{1}{2}}\,(1+\beta_j)^{-(k_j+Y_{ij}+\frac{1}{2})}\,\Gamma(k_j+Y_{ij}+\frac{1}{2})}{k_j!\;\Gamma(Y_{ij}+\frac{1}{2})}, \qquad (A.15) $$

where we use the gamma function, $\Gamma(z) = \int_0^\infty e^{-t}\, t^{z-1}\, dt$.

Finally, we have the marginalized likelihood:

$$ P(D\,|\,m_H,\mu) = \prod_{i=1}^{M}\; \sum_{k_0+\cdots+k_N=D_i}\; \prod_{j=0}^{N} \frac{\beta_j^{Y_{ij}-\frac{1}{2}}}{(1+\beta_j)^{\,k_j+Y_{ij}+\frac{1}{2}}}\; \frac{\Gamma(k_j+Y_{ij}+\frac{1}{2})}{k_j!\;\Gamma(Y_{ij}+\frac{1}{2})}. \qquad (A.16) $$

This likelihood is properly normalized over the data counts, so that likelihoods can be compared to one another. Also notice that in the event that there are zero observed counts in the sideband experiment (i.e., there is zero background count in a bin), the prior probability in that bin becomes $\mathrm{Gamma}(s_i\,|\,\tfrac{1}{2}, 1/\alpha)$, where $\alpha$ is the ratio of observed sideband counts to expected counts over all bins. Thus empty bins have nonzero probability, proportional to the overall histogram counts in both the sideband and the expected region.
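As a concrete check of Eq. A.16, the sketch below evaluates the marginalized likelihood for a single bin with a signal source (j = 0) and two backgrounds. All counts and scale factors are invented, and the brute-force enumeration of the $k_j$ is for illustration only, since it grows exponentially with the number of sources:

```python
# Sketch evaluating Eq. A.16 for a single bin: one signal source (j = 0) and
# two backgrounds. Counts and scale factors are invented; the brute-force
# enumeration of k_0, ..., k_N with sum k_j = D is for illustration only.
import numpy as np
from itertools import product
from scipy.special import gammaln

def marginalized_bin(D, Y, beta):
    """
    D    : observed count in the bin
    Y    : sideband/MC counts per source; Y[0] refers to the signal
    beta : scale factors per source
    """
    total = 0.0
    for k in product(range(D + 1), repeat=len(Y)):
        if sum(k) != D:
            continue
        log_term = 0.0
        for kj, Yj, bj in zip(k, Y, beta):
            log_term += ((Yj - 0.5) * np.log(bj)
                         - (kj + Yj + 0.5) * np.log1p(bj)
                         + gammaln(kj + Yj + 0.5)
                         - gammaln(kj + 1) - gammaln(Yj + 0.5))
        total += np.exp(log_term)
    return total

print(marginalized_bin(D=7, Y=[4.0, 12.0, 6.0], beta=[2.0, 1.0, 0.5]))
```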

APPENDIX B

PRIORS

In Section 5 we discussed the need for prior probabilities for both our parameters of interest and nuisance parameters. Nuisance parameter priors were discussed in Section 5.4. In this section we discuss possible priors for parameters of interest.

Informative Priors
In Bayesian analysis the choice of prior should be guided by the prior knowledge of the experimenter. For example, if there is a region of parameter space that is known to be impossible, one can assign a very low probability to that region. More empirically, if one has knowledge of a previous experiment which already constrains the parameters of interest, one can encode that knowledge in a natural way using the Bayesian framework, by simply using the posterior probability distribution of the previous experiment as the prior for the new experiment. Such priors are known as informative or subjective priors.

Non-informative Priors
In many cases, however, we can do none of these things: we may not have any empirical prior knowledge of our parameter of interest, we may wish not to use that knowledge, or it may be difficult to obtain and use reliably. So in order to carry out the Bayesian analysis, we need to construct an appropriate prior. In the case that we are ignorant of the parameter of interest, we should have a prior that reflects our lack of prior knowledge. There are several options that are commonly used in such situations. The classical choice is a flat prior, $\pi(\theta) = C$ for a constant $C$, possibly with $C$ normalizing the flat prior over some range $\theta_r$, $C = 1/\int_{\theta_r} d\theta$. This prior seems reasonable. However, it may not be a suitable non-informative prior. For one thing, it is not invariant under one-to-one re-parameterization: if we make the transformation $\rho = \log(\theta)$, our flat prior becomes $\pi(\rho) = \pi(\theta)\,|d\theta/d\rho| = C e^{\rho}$, which is not uniform in $\rho$ at all. Also, unless we restrict our parameter space to some finite range, a flat prior can lead to an improper posterior density, that is, one whose integral over the parameter space is infinite.

Jeffreys Prior
If we possess no information about a particular parameter, our prior should have minimal effect on the posterior density. There are several ways of deriving such a prior. A common method is the Jeffreys prior, whose goal is to provide a generalized non-informative prior that avoids the pathologies of naive priors such as the flat prior. The Jeffreys prior makes use of the Fisher information, which quantifies the new information that an observation of data, $X$, carries about a parameter $\theta$,

$$ \mathcal{I}(\theta) = -E\!\left[\frac{\partial^2}{\partial\theta^2}\log f(X;\theta)\,\Big|\,\theta\right], \qquad (B.1) $$

where the expected value, $E$, is taken with respect to the likelihood, $f(X;\theta)$; that is, the information is averaged over all possible data sets of a given statistical model $\mathcal{M}$. The Fisher information quantifies the amount of information that a random variable $X$ can provide about a parameter $\theta$. Thus it tells us how much information we can expect from our experiment, whose outcome comes in the form of a specific value of the random variable $X$. The Jeffreys prior is defined in terms of $\mathcal{I}$ as

$$ \pi_J(\theta) \propto \sqrt{\mathcal{I}(\theta)}. \qquad (B.2) $$

The second derivative measures the curvature of the likelihood with respect to the parameter of interest. Near the maximum likelihood estimate (MLE) for $\theta$, the first derivative is zero and the log-likelihood is approximately symmetric. A blunt log-likelihood therefore carries low "information" about the parameter of interest. The result is a prior that emphasizes the regions of parameter space where the data have the most influence on the posterior density. The Jeffreys prior satisfies local uniformity: it does not change over the area of interest and does not have large fluctuations outside it.

The Jeffreys prior is also invariant under re-parameterization of $\theta$. This is especially useful for scientific communication: if one wants to share the results of an experiment by sharing the posterior distribution with, say, a theoretical physicist, one does not need the exact parameterization of the problem in order to make use of the information. The posteriors are related by a change of variable.

Calculating the Jeffreys Prior
Jeffreys priors can be numerically challenging due to the calculation of the Fisher information, which involves integrating over all possible data sets. If the data are multidimensional, as in the case of a binned likelihood where the number of bins $N$ is large, this integral becomes intractable. One can use Monte Carlo methods to approximate it or, if the likelihood is of a particular form (namely, the data are linear in the second derivative of the parameter of interest), one can use the Asimov data set [95].

Reference Priors
The Jeffreys prior nicely satisfies our requirement for a non-informative prior. However, it can have poor convergence properties, and it can be computationally intractable for a large number of parameters. Because of these limitations, Bernardo [96] and Bernardo and Berger [97] developed a prior that generalizes to any number of parameters. To do this, one introduces the Kullback-Leibler (KL) divergence, which measures the difference between two distributions $p(x)$ and $q(x)$:

$$ D_{KL}(P,Q) = \int_{-\infty}^{\infty} \ln\!\left(\frac{p(x)}{q(x)}\right) p(x)\,dx. \qquad (B.3) $$

In the language of information theory, it measures the amount of information (such as the Fisher information) lost when $p(x)$ is used to approximate $q(x)$. Conversely, it measures the amount of information gained if our posterior density, $p(x)$, is obtained using a prior $q(x)$. This concept allows one to determine how much information is gained when following the Bayesian formalism with a given prior, $\pi(\theta)$, and the resulting posterior distribution, $p(\theta\,|\,x) = p(x\,|\,\theta)\,\pi(\theta)/p(x)$, given observed data $x$. A truly non-informative prior would be the function $\pi(\theta)$ that maximizes the information gained when updating the prior with data to obtain a posterior density.

Following [96], consider a statistical model, $\mathcal{M} \equiv \{p(x\,|\,\theta),\, x \in \mathcal{X},\, \theta \in \Theta\}$, defined by some likelihood that depends on a potential observation $x$ and model parameters $\theta$, where we make $k$ observations of $x$ to obtain the set $x^{(k)} = \{x_1,\ldots,x_k\} \in \mathcal{X}^k$. Now, further assume we have some test statistic, $t = t(x^{(k)})$, such as a maximum likelihood estimate, which estimates our parameter $\theta$ based on the observations $x^{(k)}$. The test statistic $t$ is said to be asymptotically sufficient if, given an infinitely large data set $x^{(k)}$,

$$ \lim_{k\to\infty} \frac{p(\theta\,|\,x^{(k)})}{p(\theta\,|\,t)} = 1. $$

In calculating the KL divergence we must choose one observation in $\mathcal{X}$. Let us instead average the KL divergence over all possible observations via the test statistic $t$, obtaining a measure of the total expected divergence:

$$ \mathcal{I}\{p(\theta), \mathcal{M}\} = \int_{\mathcal{X}} dx \int_{\Theta} p(\theta\,|\,x)\,\log\frac{p(\theta\,|\,x)}{p(\theta)}\;d\theta. \qquad (B.4) $$

This is known as the Expected Intrinsic Information of a distribution. When the distribution satisfies certain regularity conditions, and under the assumption of large $k$, the expected intrinsic information reduces to the Shannon entropy of $p(\theta)$ [98]. The prior $p(\theta)$ which maximizes the missing information, given an observation $x$, is then $\arg\max\,\{\mathcal{I}(p(\theta))\}$. The calculation of such a prior is a problem in the calculus of variations. I will not go through the derivation but simply write down the explicit form of the reference prior that is useful for this analysis:

For some test statistic $t_k$, which is an asymptotically sufficient estimator of $\theta$, define

$$ f_k(\theta) = \exp\left\{\int_{\mathcal{T}} p(t_k\,|\,\theta)\,\log\!\left[\frac{p(t_k\,|\,\theta)\,h(\theta)}{p(t_k)}\right] dt_k\right\}. \qquad (B.5) $$

Then the reference prior is given by

$$ f(\theta) = \lim_{k\to\infty} \frac{f_k(\theta)}{f_k(\theta_0)}, \qquad (B.6) $$

for some point $\theta_0$ in $\Theta$. If we further require that the posterior distribution in the limit of an infinitely large data set converges to a normal distribution with standard deviation $s(\hat\theta)/\sqrt{k}$, where $\hat\theta$ is a consistent estimator of $\theta$, then the reference prior is given by

$$ \pi(\theta\,|\,\mathcal{M}) = s(\theta)^{-1}. \qquad (B.7) $$

In addition, if the asymptotic posterior satisfies regularity conditions, the posterior distribution is normal with standard deviation $s(\hat\theta)/\sqrt{k}$, where $\hat\theta$ is the maximum likelihood estimator of $\theta$, and

$$ s^{-2}(\theta) = -\int_{\mathcal{X}} p(x\,|\,\theta)\, \frac{\partial^2}{\partial\theta^2} \log\big(p(x\,|\,\theta)\big)\,dx, \qquad (B.8) $$

which is the Fisher information, $\mathcal{I}(\theta)$. Then, from Eq. B.7 we see that $s^{-1}(\theta) = \mathcal{I}^{1/2}(\theta)$ is the reference prior for $\mathcal{M}$, which is equivalent to the Jeffreys prior.

The Likelihood Principle and Reference Priors
The likelihood principle asserts that all the information in a sample is contained in the likelihood function, so for a particular experiment with observed data $D$, all information is contained in the likelihood $L(D\,|\,\theta)$. Reference priors, as well as the Jeffreys prior, do not obey the likelihood principle, since they employ the universe of all possible data given a particular model. However, many methods in high energy physics violate the likelihood principle, including the significance tests that are used as the basis for discovery (such as the CLs method).

Reference Priors for Multiple Parameters
In the context of high energy physics we are often concerned with multiple-parameter problems. This is especially true for Higgs physics, where we must measure the Higgs boson couplings to all standard model particles, as well as the mass of the boson. Thus we require a method for constructing reference priors for multiple parameters. Again, we sketch the method developed by Bernardo and Berger [97]. I will consider the case of a model with two free parameters, and discuss a method for calculating the prior in the context of the binned likelihood that we use.

Consider a likelihood with an asymptotically normal posterior distribution. If we have two parameters of interest, $\theta \in \Theta$ and $\lambda \in \Lambda$, with covariance matrix $V(\hat\theta,\hat\lambda)$, where $\hat\theta$ and $\hat\lambda$ are consistent estimators of $\theta$ and $\lambda$, with

$$ V(\theta,\lambda) = \begin{pmatrix} v_{\theta\theta} & v_{\theta\lambda} \\ v_{\lambda\theta} & v_{\lambda\lambda} \end{pmatrix}, \qquad (B.9) $$

then the conditional prior for $\lambda$ follows from Eq. B.7 [96]:

$$ \pi(\lambda\,|\,\theta) = h_{\lambda\lambda}^{1/2}(\theta,\lambda). \qquad (B.10) $$

If $\pi(\lambda\,|\,\theta)$ is proper, we can use Eq. B.6 to obtain a 2D reference prior. Assuming that the test statistics for $\lambda$ and $\theta$ are asymptotically normal, which is generally true since we are using the maximum likelihood estimate (MLE) of a Poisson distribution, the 2D prior is:

$$ \pi(\theta,\lambda) = \pi(\lambda\,|\,\theta)\,\pi(\theta) = \pi(\lambda\,|\,\theta)\,\exp\left\{\int_{\Lambda} \pi(\lambda\,|\,\theta)\,\log\big(v_{\theta\theta}^{-1/2}(\lambda,\theta)\big)\,d\lambda\right\}. \qquad (B.11) $$

For the Poisson distribution it can be shown that the joint posterior distribution of $\{\theta,\lambda\}$ is

derived from the Fisher information matrix, $\mathcal{I}(\hat\theta,\hat\lambda)$ [96]. In addition, for a likelihood that is a product of Poissons, the Fisher information matrix can be numerically approximated using the Asimov data set, as shown in [95].

2D Reference Prior Approximation
For a binned likelihood, the reference prior can be approximated by the Jeffreys prior:

$$ \pi(\mu) \propto \sqrt{-\frac{\partial^2}{\partial\mu^2} \ln\mathcal{L}}. \qquad (B.12) $$
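For the common case of Poisson means $\mu s_i + b_i$, the expectation implicit in Eq. B.12 has a closed form, $\mathcal{I}(\mu) = \sum_i s_i^2/(\mu s_i + b_i)$, which is what evaluating on the Asimov data set produces [95]. A sketch with invented templates:

```python
# Sketch of Eq. B.12 for Poisson means mu*s_i + b_i: the expectation over data
# sets is analytic (equivalently, evaluated on the Asimov data set [95]),
# I(mu) = sum_i s_i^2 / (mu*s_i + b_i). Templates are invented.
import numpy as np

s = np.array([2.0, 5.0, 3.0])   # signal template
b = np.array([8.0, 6.0, 4.0])   # background template

def jeffreys_prior(mu):
    fisher = np.sum(s**2 / (mu * s + b))   # Fisher information I(mu)
    return np.sqrt(fisher)                 # unnormalized pi_J(mu)

for mu in (0.0, 0.5, 1.0, 2.0):
    print(mu, jeffreys_prior(mu))
```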

If we have a likelihood which depends on two parameters, $\theta_1$ and $\theta_2$, the reference prior can then be obtained by iterating,

$$ P(D\,|\,\theta_1,\theta_2), \qquad (B.13) $$
$$ \pi(\theta_1,\theta_2) = \pi(\theta_1\,|\,\theta_2)\,\pi(\theta_2), \qquad (B.14) $$
$$ P(D\,|\,\theta_2) = \int P(D\,|\,\theta_1,\theta_2)\,\pi(\theta_1\,|\,\theta_2)\,d\theta_1, \qquad (B.15) $$
$$ \pi(\theta_1\,|\,\theta_2) = \sqrt{-\frac{\partial^2}{\partial\theta_1^2} \ln P(D\,|\,\theta_1,\theta_2)}, \qquad (B.16) $$
$$ \pi(\theta_2) = \sqrt{-\frac{\partial^2}{\partial\theta_2^2} \ln P(D\,|\,\theta_2)} = \sqrt{-\frac{\partial^2}{\partial\theta_2^2} \ln\left(\int P(D\,|\,\theta_1,\theta_2)\,\pi(\theta_1\,|\,\theta_2)\,d\theta_1\right)}, \qquad (B.17) $$

where the average is taken over data sets distributed according to the probability model $P(D\,|\,\theta_1,\theta_2)$.
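The iteration above can be carried out numerically. The sketch below does so for a toy two-bin Poisson model with means $\lambda_i = \theta_1 s_i + \theta_2 b_i$: the inner prior of Eq. B.16 is computed analytically, while the outer curvature of Eq. B.17 is averaged over pseudo-data sets by Monte Carlo. All templates, grids, and toy settings are invented for illustration.

```python
# Toy numerical version of Eqs. B.13-B.17 for a two-bin Poisson model with
# means lambda_i = theta1*s_i + theta2*b_i. The inner prior (Eq. B.16) is
# computed analytically; the outer curvature (Eq. B.17) is averaged over
# pseudo-data by Monte Carlo. All settings are invented.
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)
s = np.array([4.0, 1.0])
b = np.array([2.0, 6.0])
t1_grid = np.linspace(0.05, 3.0, 60)       # grid over theta1
dt = t1_grid[1] - t1_grid[0]

def cond_prior(theta2):
    # Eq. B.16 with the Poisson expectation done analytically:
    # pi(theta1 | theta2) ~ sqrt(sum_i s_i^2 / lambda_i), normalized on the grid.
    lam = np.outer(t1_grid, s) + theta2 * b
    p = np.sqrt((s**2 / lam).sum(axis=1))
    return p / (p.sum() * dt)

def log_marginal(D, theta2):
    # Eq. B.15: P(D | theta2) = int P(D | theta1, theta2) pi(theta1 | theta2) dtheta1
    lam = np.outer(t1_grid, s) + theta2 * b
    like = np.exp(poisson.logpmf(D, lam).sum(axis=1))
    return np.log((like * cond_prior(theta2)).sum() * dt)

def outer_prior(theta2, theta1_true=1.0, n_toys=200, eps=0.05):
    # Eq. B.17: average the curvature of ln P(D | theta2) over pseudo-data.
    lam_true = theta1_true * s + theta2 * b
    curv = 0.0
    for _ in range(n_toys):
        D = rng.poisson(lam_true)
        c = (log_marginal(D, theta2 + eps) - 2.0 * log_marginal(D, theta2)
             + log_marginal(D, theta2 - eps)) / eps**2
        curv += -c
    return np.sqrt(max(curv / n_toys, 0.0))

for t2 in (0.5, 1.0, 2.0):
    print(t2, outer_prior(t2))
```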

REFERENCES

[1] S. Dawson. Introduction to electroweak symmetry breaking. arXiv preprint hep-ph/9901280, 1999.
[2] LHC Higgs Cross Section Working Group, S. Dittmaier, C. Mariotti, G. Passarino, and R. Tanaka (Eds.). Handbook of LHC Higgs Cross Sections: 1. Inclusive Observables. CERN-2011-002, CERN, Geneva, 2011.
[3] Yanyan Gao, Andrei V. Gritsan, Zijin Guo, Kirill Melnikov, Markus Schulze, and Nhan V. Tran. Spin determination of single-produced resonances at hadron colliders. Physical Review D, 81(7):075022, 2010.
[4] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2007.
[5] Amos Breskin and Rüdiger Voss. The CERN Large Hadron Collider: Accelerator and Experiments. CERN, Geneva, 2009.
[6] CMS Collaboration, R. Adolphi, et al. The CMS experiment at the CERN LHC. JINST, 3:S08004, 2008.
[7] Georges Aad et al. Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC. Physics Letters B, 2012.
[8] CMS Collaboration. Combined results of searches for the standard model Higgs boson in pp collisions at √s = 7 TeV. Physics Letters B, 710(1):26-48, 2012.
[9] Mark Srednicki. Quantum Field Theory. Cambridge University Press, 2007.
[10] Vernon Barger and Roger Phillips. Collider Physics. Addison-Wesley, 1997.
[11] Julian Schwinger. Non-Abelian gauge fields. Relativistic invariance. Physical Review, 127:324-330, 1962.
[12] Steven Weinberg. A model of leptons. Phys. Rev. Lett., 19:1264-1266, 1967.
[13] Abdus Salam and John Clive Ward. Weak and electromagnetic interactions. Il Nuovo Cimento Series 10, 11(4):568-577, 1959.
[14] Peter W. Higgs. Broken symmetries and the masses of gauge bosons. Physical Review Letters, 13(16):508, 1964.
[15] F. Englert and R. Brout. Broken symmetry and the mass of gauge vector mesons. Phys. Rev. Lett., 13:321-322, 1964.
[16] G. S. Guralnik, C. R. Hagen, and T. W. B. Kibble. Global conservation laws and massless particles. Phys. Rev. Lett., 13:585-587, 1964.
[17] T. W. B. Kibble. Symmetry breaking in non-Abelian gauge theories. Phys. Rev., 155:1554-1561, 1967.

[18] J. Goldstone. Field theories with superconductor solutions. Il Nuovo Cimento, 19:154-164, 1961.
[19] Abdelhak Djouadi. The anatomy of electroweak symmetry breaking: Tome I: The Higgs boson in the Standard Model. Physics Reports, 457(1):1-216, 2008.
[20] Bogdan A. Dobrescu and Joseph D. Lykken. Coupling spans of the Higgs-like boson. Journal of High Energy Physics, 2013(2):1-16, 2013.
[21] Michael E. Peskin. Theoretical summary lecture for Higgs Hunting 2012. arXiv preprint arXiv:1208.5152, 2012.
[22] A. David, A. Denner, M. Duehrssen, M. Grazzini, C. Grojean, G. Passarino, M. Schumacher, M. Spira, G. Weiglein, M. Zanetti, et al. LHC HXSWG interim recommendations to explore the coupling structure of a Higgs-like particle. arXiv preprint arXiv:1209.0040, 2012.
[23] CMS Collaboration. Combination of standard model Higgs boson searches and measurements of the properties of the new boson with a mass near 125 GeV. Technical Report CMS-PAS-HIG-13-005, CERN, Geneva, 2013.
[24] CMS Collaboration. Properties of the Higgs-like boson in the decay H → ZZ → 4ℓ in pp collisions at √s = 7 and 8 TeV. Technical Report CMS-PAS-HIG-13-002, CERN, Geneva, 2013.
[25] S. Kraml, B. C. Allanach, M. Mangano, H. B. Prosper, S. Sekmen, C. Balazs, A. Barr, P. Bechtle, G. Belanger, A. Belyaev, et al. Searches for new physics: Les Houches recommendations for the presentation of LHC results. The European Physical Journal C, 72(4):1-9, 2012.
[26] Wai-Yee Keung, Ian Low, and Jing Shu. Landau-Yang theorem and decays of a Z′ boson into two Z bosons. Physical Review Letters, 101(9):091802, 2008.
[27] Rohini M. Godbole, David J. Miller, and M. Margarete Mühlleitner. Aspects of CP violation in the HZZ coupling at the LHC. Journal of High Energy Physics, 2007(12):031, 2007.
[28] C. P. Buszello, I. Fleck, P. Marquard, and J. J. van der Bij. Prospective analysis of spin- and CP-sensitive variables in H → ZZ → ℓ1+ℓ1−ℓ2+ℓ2− at the LHC. The European Physical Journal C, 32(2):209-219, 2004.
[29] James O. Berger and Luis Raúl Pericchi. Accurate and stable Bayesian model selection: the median intrinsic Bayes factor. Sankhyā: The Indian Journal of Statistics, Series B, pages 1-18, 1998.
[30] CMS Collaboration. On the mass and spin-parity of the Higgs boson candidate via its decays to Z boson pairs. arXiv preprint arXiv:1212.6639, 2012.
[31] CMS Collaboration. Precise mapping of the magnetic field in the CMS barrel yoke using cosmic rays. Journal of Instrumentation, 5(03):T03021, 2010.
[32] C. Albajar, N. Amapane, P. Arce, et al. Test beam analysis of the first CMS drift tube muon chamber. Nuclear Instruments and Methods in Physics Research Section A, 525(3):465-484, 2004.
[33] Gianluca Cerminara. Commissioning, operation and performance of the CMS drift tube chambers. Nuclear Instruments and Methods in Physics Research Section A, 617(1):144-145, 2010.
[34] CMS Collaboration. Performance of the CMS cathode strip chambers with cosmic rays. Journal of Instrumentation, 5(03):T03018, 2010.

[35] CMS Collaboration. Performance study of the CMS barrel resistive plate chambers with cosmic rays. Journal of Instrumentation, 5(03):T03017, 2010.
[36] CMS Collaboration. Performance of CMS muon reconstruction in pp collision events at √s = 7 TeV. arXiv preprint arXiv:1206.4071, 2012.
[37] CMS Collaboration. The CMS high level trigger. European Physical Journal C, 46:605-667, 2006.
[38] CMS Collaboration. Tracking and vertexing results from first collisions. Technical Report CMS-PAS-TRK-10-001, CERN, Geneva, 2010.
[39] CMS Collaboration. CMS Physics: Technical Design Report Volume 1: Detector Performance and Software. Technical Design Report CMS, CERN, Geneva, 2006.
[40] R. Frühwirth. Application of Kalman filtering to track and vertex fitting. Nuclear Instruments and Methods in Physics Research Section A, 262(2):444-450, 1987.
[41] Wolfgang Adam, Boris Mangano, Thomas Speer, and Teddy Todorov. Track reconstruction in the CMS tracker. Technical Report CMS-NOTE-2006-041, CERN, Geneva, 2006.
[42] CMS Collaboration. Particle-flow event reconstruction in CMS and performance for jets, taus, and MET. Technical Report CMS-PAS-PFT-09-001, CERN, Geneva, 2009.
[43] CMS Collaboration. Particle-flow commissioning with muons and electrons from J/ψ and W events at 7 TeV. Technical Report CMS-PAS-PFT-10-003, CERN, Geneva, 2010.
[44] CMS Collaboration. Commissioning of the particle-flow reconstruction in minimum-bias and jet events from pp collisions at 7 TeV. Technical Report CMS-PAS-PFT-10-002, CERN, Geneva, 2010.
[45] CMS Collaboration. Tracking and vertexing results from first collisions. Technical Report CMS-PAS-TRK-10-001, CERN, Geneva, 2010.
[46] H. Bethe and W. Heitler. On the stopping of fast particles and on the creation of positive electrons. Proceedings of the Royal Society of London. Series A, 146(856):83-112, 1934.
[47] Wolfgang Adam, R. Frühwirth, Are Strandlie, and T. Todorov. Reconstruction of electrons with the Gaussian-sum filter in the CMS tracker at the LHC. Journal of Physics G: Nuclear and Particle Physics, 31(9):N9, 2005.
[48] Stéphanie Baffioni, Claude Charlot, Federico Ferri, David Futyan, Paolo Meridiani, Ivica Puljak, Chiara Rovelli, Roberto Salerno, and Yves Sirois. Electron reconstruction in CMS. Eur. Phys. J. C, 44S1(10), 2006.
[49] CMS Collaboration. Electron reconstruction and identification at √s = 7 TeV. 2010.
[50] CMS Collaboration. Electron and photon energy calibration and resolution with the CMS ECAL at √s = 7 TeV. 2013.
[51] Andreas Hoecker, Peter Speckmayer, Joerg Stelzer, Jan Therhaag, Eckhard von Toerne, Helge Voss, et al. TMVA: Toolkit for multivariate data analysis. arXiv preprint physics/0703039, 2007.
[52] Serguei Chatrchyan et al. Performance of CMS muon reconstruction in pp collision events at √s = 7 TeV. JINST, 7:P10002, 2012.

[53] CMS Collaboration. Performance of muon identification in pp collisions at √s = 7 TeV. 2010.
[54] A. Bodek and J. Han. Improved Rochester misalignment and muon scale corrections extracted for 2011A, 2011B CMS data. Technical Report CMS AN 2012/298, CMS, 2012.
[55] Matteo Cacciari, Gavin P. Salam, and Gregory Soyez. FastJet user manual. The European Physical Journal C, 72(3):1-54, 2012.
[56] CMS Collaboration. Updated results on the new boson discovered in the search for the Standard Model Higgs boson in the H → ZZ(*) → 4ℓ channel in pp collisions at √s = 7 and 8 TeV. CMS Physics Analysis Summary HIG-12-041, 2012.
[57] Matteo Cacciari, Gavin P. Salam, and Gregory Soyez. The anti-kt jet clustering algorithm. JHEP, 0804:063, 2008.
[58] A. Gurtu et al. Determination of jet energy calibration and transverse momentum resolution in CMS. Journal of Instrumentation, 6(11), 2011.
[59] CMS Collaboration. Jet energy corrections determination at 7 TeV. CMS Physics Analysis Summary JME-10-010, 2010.
[60] CMS Collaboration. Commissioning of the CMS High-Level Trigger with cosmic rays. Journal of Instrumentation, 5(03):T03005, 2010.
[61] Torbjörn Sjöstrand, Stephen Mrenna, and Peter Skands. PYTHIA 6.4 physics and manual. JHEP, 05:026, 2006.
[62] Johan Alwall, Pavel Demin, Simon de Visscher, Rikkert Frederix, Michel Herquet, Fabio Maltoni, Tilman Plehn, David L. Rainwater, and Tim Stelzer. MadGraph/MadEvent v4: the new web generation. Journal of High Energy Physics, 2007(09):028, 2007.
[63] Stefano Frixione, Paolo Nason, and Carlo Oleari. Matching NLO QCD computations with parton shower simulations: the POWHEG method. JHEP, 11:070, 2007.
[64] T. Binoth, G. Ossola, C. G. Papadopoulos, and R. Pittau. NLO QCD corrections to tri-boson production. Journal of High Energy Physics, 2008(06):082, 2008.
[65] Hung-Liang Lai, Marco Guzzi, Joey Huston, Zhao Li, Pavel M. Nadolsky, Jon Pumplin, and C.-P. Yuan. New parton distributions for collider physics. Physical Review D, 82(7):074024, 2010.
[66] Simone Alioli, Paolo Nason, Carlo Oleari, and Emanuele Re. NLO Higgs boson production via gluon fusion matched with shower in POWHEG. Journal of High Energy Physics, 2009(04):002, 2009.
[67] Daniel de Florian, Giancarlo Ferrera, Massimiliano Grazzini, and Damiano Tommasini. Transverse-momentum resummation: Higgs boson production at the Tevatron and the LHC. Journal of High Energy Physics, 2011(11):1-22, 2011.
[68] Paolo Nason and Carlo Oleari. NLO Higgs boson production via vector-boson fusion matched with shower in POWHEG. Journal of High Energy Physics, 2010(2):1-18, 2010.
[69] Markus Schulze and Nhan Tran. JHUGen MC generator: http://www.pha.jhu.edu/spin/, 2012.
[70] C. J. Clopper and Egon S. Pearson. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26(4):404-413, 1934.
[71] Pushpalatha C. Bhat. Advanced analysis methods in particle physics. Annual Review of Nuclear and Particle Science, 61:281-309, 2011.

[72] CDF Collaboration. Observation of top quark production in p̄p collisions with the Collider Detector at Fermilab. Phys. Rev. Lett., 74:2626-2631, 1995.
[73] D0 Collaboration. Observation of the top quark. Phys. Rev. Lett., 74:2632-2637, 1995.
[74] Pushpalatha C. Bhat, Harrison B. Prosper, and Scott S. Snyder. Top quark physics at the Tevatron. International Journal of Modern Physics A, 13(30):5113-5218, 1998.
[75] George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303-314, 1989.
[76] Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251-257, 1991.
[77] Ken-Ichi Funahashi. On the approximate realization of continuous mappings by neural networks. Neural Networks, 2(3):183-192, 1989.
[78] Frank Rosenblatt. The perceptron. Psychological Review, 65(6):386-408, 1958.
[79] Bernd A. Berg. Introduction to Markov chain Monte Carlo simulations and their statistical analysis. Lect. Notes Ser. Inst. Math. Sci. Natl. Univ. Singap., 7:1-52, 2005.
[80] Radford M. Neal. Bayesian Learning for Neural Networks. PhD thesis, University of Toronto, 1995.
[81] Radford Neal. Software for flexible Bayesian modeling and Markov chain sampling. 2004.
[82] CMS Collaboration. Procedure for the LHC Higgs boson search combination in summer 2011. CMS Note 2011/005b, 2011.
[83] CMS Collaboration. Absolute calibration of the luminosity measurement at CMS: Winter 2012 update. Technical Report CMS-PAS-SMP-12-008, CERN, Geneva, 2012.
[84] S. van der Meer. Calibration of the effective beam height in the ISR. Technical Report CERN-ISR-PO-68-31, CERN, Geneva, 1968.
[85] A. Denner, Sven Heinemeyer, Ivica Puljak, D. Rebuzzi, and M. Spira. Standard model Higgs-boson branching ratios with uncertainties. The European Physical Journal C, 71(9):1-29, 2011.
[86] Guido Altarelli and G. Parisi. Asymptotic freedom in parton language. Nucl. Phys., B126:298, 1977.
[87] Sergey Alekhin, Simone Alioli, Richard D. Ball, Valerio Bertone, Johannes Blumlein, Michiel Botje, Jon Butterworth, Francesco Cerutti, Amanda Cooper-Sarkar, Albert de Roeck, et al. The PDF4LHC working group interim report. arXiv preprint arXiv:1101.0536, 2011.
[88] A. D. Martin, W. James Stirling, Robert S. Thorne, and G. Watt. Parton distributions for the LHC. The European Physical Journal C, 63(2):189-285, 2009.
[89] Stefano Forte and Graeme Watt. Progress in the determination of the partonic structure of the proton. arXiv preprint arXiv:1301.6754, 2013.
[90] Giuseppe Bozzi, Stefano Catani, Daniel de Florian, and Massimiliano Grazzini. Higgs boson production at the LHC: transverse-momentum resummation and rapidity dependence. Nuclear Physics B, 791(1):1-19, 2008.
[91] Daniel de Florian, Giancarlo Ferrera, Massimiliano Grazzini, and Damiano Tommasini. Higgs boson production at the LHC: transverse momentum resummation effects in the H → γγ, H → WW → ℓνℓν and H → ZZ → 4ℓ decay modes. Journal of High Energy Physics, 2012(6):1-26, 2012.

[92] David J. C. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2003.
[93] Luc Demortier and Louis Lyons. Everything you always wanted to know about pulls. CDF Note 43, 2002.
[94] Alexander L. Read. Presentation of search results: the CLs technique. Journal of Physics G: Nuclear and Particle Physics, 28(10):2693, 2002.
[95] Glen Cowan, Kyle Cranmer, Eilam Gross, and Ofer Vitells. Asymptotic formulae for likelihood-based tests of new physics. The European Physical Journal C, 71(2):1-19, 2011.
[96] José M. Bernardo. Reference analysis. Handbook of Statistics, 25:17-90, 2005.
[97] James O. Berger and José M. Bernardo. Estimating a product of means: Bayesian analysis with reference priors. Journal of the American Statistical Association, 84(405):200-207, 1989.
[98] Luc Demortier, Supriya Jain, and Harrison B. Prosper. Reference priors for high energy physics. Physical Review D, 82(3):034002, 2010.

BIOGRAPHICAL SKETCH

The author was born in Eagle, Idaho in 1982. He attended high school in rural Idaho, worked on his family farm, participated in math and programming competitions, and learned math and science through correspondence classes. After graduating high school he attended the University of Idaho where he conducted undergraduate research in nano-physics and attained a degree in Physics and Mathematics in 2005. He spent several years working in Portland, OR as well as traveling in Europe, Africa and Mexico before returning to graduate study at Florida State University in 2008 to pursue a Ph.D. in Physics. He joined the high energy physics group at FSU in 2009 where he conducted his Ph.D. research at the CMS detector at CERN in Geneva, Switzerland.
