<<

Jets + Missing Energy Signatures at the Large Hadron Collider

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By Khalida S. Hendricks, M.S.

Graduate Program in Physics

The Ohio State University 2019

Dissertation Committee: Linda Carpenter, Advisor

Amy Connolly

Annika Peter

Antonio Boveia

© Copyright by

Khalida S. Hendricks

2019

Abstract

In this work we consider new ways to use jets plus missing energy signatures in searches at the Large Hadron Collider. We study the Higgs boson (h) decay to two light jets at the 14 TeV High-Luminosity LHC (HL-LHC), where a light jet (j) represents any non-flavor-tagged jet from the observational point of view. We estimate the achievable bounds on the decay product branching fractions through the associated production V h (V = W±, Z). As a reasonable estimation, we only focus on the boosted region of high p_T(h) and the three leptonic decay channels of the vector boson. We find that with 3000 fb⁻¹ of data at the HL-LHC, we should expect approximately 1σ statistical significance on the SM V h(gg) signal in this channel. This corresponds to a reachable upper bound BR(h → jj) ≤ 4 BR_SM(h → gg) at 95% confidence level. A consistency fit also leads to an upper bound of BR(h → cc̄) < 15 BR_SM(h → cc̄) at 95% confidence level. The estimated bound may be further strengthened by adopting multiple-variable analyses, or adding other production channels. We then consider some simple machine learning techniques applied to the same channels. We use both a Fully Connected Neural Network (FCN) and a Convolutional Neural Network (CNN) on a statistically identical dataset as the one used for the cuts-based analysis of the Higgs decay to light jets. We find that both networks improve upon the cuts-based results in two of the three signal channels, and roughly match the cuts-based analysis on the third. This improves the significance of the analysis from 0.59 for the cuts-based analysis to 0.61 using the FCN and 0.62 using the CNN. Finally we consider the HL-LHC discovery potential in the 3 ab⁻¹ data set for gluinos in the gluino-weakino associated production channel. We propose a search in the jets plus missing energy channel which exploits kinematic edge features in the reconstructed transverse mass of the gluino. We find that for squark masses in the 2 TeV range we have 5σ discovery potential for gluino masses in the range of 2.4 to 3 TeV, competitive with the projections for discovery potential in the gluino pair production channel.

Acknowledgments

There are many people to whom I owe sincere gratitude for their contributions to my work and education. I would first like to thank my advisor, Linda Carpenter, for her patience, mentorship, and guidance throughout my graduate career. During the course of my research, I had the privilege of collaborating with Tao Han, Zhuoni Qian, and Ning Zhou; I would like to thank them all for their insight and patience. I would like to thank Richard Furnstahl for his enthusiastic and generous support in so many areas, from homework help to obscure coding issues to career advice. I would also like to thank Jesi Goodman for the advice, encouragement, and guidance which she continued to give generously even after moving on from her postdoctoral position at OSU to pursue her own career. I have been lucky to have Sushant More, Russell Colburn, and Humberto Gilmer as my office mates over the years. They have provided much assistance, from discussing physics to helping find bugs in code to working together on homework, and have helped maintain sanity. Finally I would like to thank my family and many friends who have provided support and encouragement over the course of my entire educational career. I would especially like to thank my father, John Hendricks, for his continuous and unconditional support throughout my life, even as I took many unexpected directions and detours.

Vita

October 13, 1978: Born, Los Alamos, NM

May 2013: B.S., North Carolina State University, Raleigh, North Carolina

Publications

Increasing Discovery Threshold in Rare SUSY Scenarios Part I: Gluinos. Linda M. Carpenter, Khalida Hendricks, arXiv:1812.08406 (2018).

Higgs Boson Decay to Light Jets at the LHC. Linda M. Carpenter, Tao Han, Khalida Hendricks, Zhuoni Qian, Ning Zhou, Phys. Rev. D 95, 053003 (2017).

Pion Momentum Distributions in the Nucleon in Chiral Effective Theory. M. Burkardt, K. S. Hendricks, Chueng-Ryong Ji, W. Melnitchouk, A. W. Thomas, Phys. Rev. D 87, 056009 (2013).

Fields of Study

Major Field: Physics

Studies in: Collider Phenomenology, Higgs Physics

Table of Contents

Abstract
Acknowledgments
Vita
List of Figures
List of Tables

Chapters

1 Introduction
  1.1 The Standard Model
  1.2 Problems with the Standard Model
  1.3 Higgs Physics at the Large Hadron Collider
    1.3.1 Overview of the Large Hadron Collider
    1.3.2 SM Higgs Couplings: Measurements and Searches
  1.4 Supersymmetry
    1.4.1 The Minimal Supersymmetric Standard Model
    1.4.2 Breaking SUSY
    1.4.3 Additional Problems solved by SUSY
    1.4.4 Current LHC SUSY searches
  1.5 Machine Learning
    1.5.1 How Artificial Neural Networks Work
  1.6 Summary

2 Higgs Decay to Light Jets at the Large Hadron Collider
  2.1 Introduction
  2.2 Signal and Background Processes
  2.3 Signal Selection
    2.3.1 ℓ⁺ℓ⁻ + jj channel
    2.3.2 ℓ± + E_T + jj channel
    2.3.3 E_T + jj channel
    2.3.4 Background control
  2.4 Alternative Discriminants with Missing Energies
  2.5 Results and Discussion
    2.5.1 Signal significance
    2.5.2 Bounds on the branching fractions and correlations with h → bb̄, cc̄
    2.5.3 Bounds on light-quark Yukawa couplings
  2.6 Summary and Conclusions

3 Applying Basic Machine Learning Techniques to Collider Phenomenology
  3.1 Introduction
  3.2 Data Preparation
  3.3 The Network
  3.4 Analysis and Results
    3.4.1 Results: 2-lepton channel
    3.4.2 Results: 1-lepton channel
    3.4.3 Results: 0-lepton channel
    3.4.4 Combined Results
  3.5 Outlook and future work

4 Increasing the Discovery Potential Using Rare SUSY Scenarios: Gluinos
  4.1 Introduction
  4.2 Production Modes
  4.3 Event kinematics and SUSY parameter space
  4.4 Cuts-based analysis
  4.5 Results
  4.6 Conclusions

5 Conclusion

Bibliography

Appendices

A Machine Learning Data
  A.1 Feature Key
  A.2 Correlation Tables
    A.2.1 2-lepton Correlations
    A.2.2 1-lepton Correlations
    A.2.3 0-lepton Correlations

List of Figures


1.1 The primary Higgs production channels at the LHC. (a) The primary production channel for the Higgs boson at the LHC, gluon fusion. (b) The second-largest production channel is vector boson fusion. (c) Associated production with a vector boson. (d) Associated production with a tt̄ pair.
1.2 Higgs to massless gauge bosons via heavy intermediate particles.
1.3 Higgs pair production at the LHC.
1.4 1-loop corrections to the Higgs mass. (a) The fermion correction to the Higgs mass given by Eq. 1.53. (b) & (c) The scalar corrections to the Higgs mass given by Eq. 1.54.
1.5 Gauge interaction “near miss” in the SM, left, and SUSY unification, right. The kink in the right graph shows where SUSY appears, altering the coupling strengths to bring them together. Image from LEP.
1.6 Gluino mass limits in various channels from ATLAS.
1.7 Gluino mass limits for a particular SUSY model with particular choices for sparticle masses and other parameters.
1.8 Basic function diagram of an Artificial Neural Network.
1.9 The feedback loop of a neural network. Image credit [39].
1.10 An illustration of how the lower-level nodes in a CNN look for broad patterns of lines and curves in order to classify objects in an image. To recreate what the CNN “sees”, the algorithm output was interrupted early in the training cycle [40].

1.11 An illustration of how a CNN trained to “see” animals and other commonplace objects “saw” Van Gogh’s famous 1889 painting, “Starry Night”. ABOVE: Vincent Van Gogh’s 1889 painting, “Starry Night”, from the Google Cultural Institute [41]. BELOW: This image was created by training a CNN to identify commonplace objects. The algorithm looks for line and curve patterns in any picture and the final output should then correctly identify objects in the image. The network does this by looking for key identifying features that it isolates and characterizes throughout training, then attempting to find and extract those features from the input images it receives. However, in this case, the algorithm output was interrupted before being fully trained, and the incomplete interpretation of the image was iteratively fed back through the network as the new signal. From the Google Deep Dream Gallery [42].

2.1 Higgs boson transverse momentum distribution for the signal processes qq̄ → Zh (upper solid curve) and gg → Zh (lower dashed curve) at the 14 TeV LHC.
2.2 Kinematical distributions of the signal process pp → Zh, h → gg (solid curves, scaled up by a factor of 5000) and the leading background pp → Zjj (dashed curves) for (a) p_T(Z), (b) R_jj, (c) m_jj, and (d) event scatter plot in the R_jj–p_T(Z) plane, with the (red) dense band with crosses as the signal events and (blue) dots as the background. Generator-level cuts of Eqs. (2.6) and (2.7) have been applied.
2.3 Invariant mass distributions m_jj of the signal process pp → Zh, h → gg, Z → ℓℓ (solid curves, scaled up by a factor of 5000) and the leading background pp → Zjj (dashed curves) for (a) with 2 jets only, (b) with 2 leading jets to reconstruct m_jj, (c) with 2 leading jets plus other jets together to reconstruct m_jets. All selection cuts as in Sec. 2.3.1 except for the m_h cut are applied.
2.4 Invariant mass distributions constructed from (a) two-jet events and (b) three-jet events with different pile-up values ⟨µ⟩ = 0, 15, 50, 140, respectively.
2.5 Invariant mass distribution m_jj for Z(ℓ⁺ℓ⁻)+jets at the 14 TeV LHC for (a) MC simulated events normalized to 10 fb⁻¹, and (b) fitted spectrum from the three-parameter ansatz function in Eq. (2.10) in the range from 60 GeV to 300 GeV (solid curve).
2.6 Generated distribution from the three-parameter ansatz function in Eq. (2.10) for m_jj with (a) 300 fb⁻¹ and (b) 3000 fb⁻¹.
2.7 Fitted results for 300 fb⁻¹ (left) and 3000 fb⁻¹ (right).
2.8 Scatter plot of 10000 events for the signal (blue crosses) and background (red dots) in the visible p_TT–vQ plane.
2.9 Signal strengths in correlated regions for (a) 1σ contour in 3 dimensions (µ_b, µ_c, µ_j), (b) and (c) contours in the µ_c–µ_j plane, for statistics only and including systematic uncertainties, respectively. The shadowed contour regions are the projection of the 3D contour (µ_b, µ_c, µ_j) onto the µ_c–µ_j plane at 1σ and 2σ, and the solid ovals are for fixing µ_b = 1. The grey triangle area at the upper-right corner is unphysical: BR(h → bb) + BR(h → cc) + BR(h → jj) > 1.

3.1 LEFT (a): 41 × 12 “images” of 0-lepton particle events after rescaling using SciKit-Learn’s Min-Max Scaler. The top row are signal events; the bottom row is background events. RIGHT (b): 41 × 12 “images” of 1-lepton particle events after rescaling using SciKit-Learn’s Standard Scaler. The top row are signal events; the bottom row is background events.

4.1 Production modes for a single colored MSSM particle in association with a chargino or neutralino.
4.2 Relative production cross sections for colored SUSY particles produced in association with a bino-like neutralino for various benchmark masses. The production cross sections are plotted as a function of squark masses given in TeV on the x axis.
4.3 Histograms giving the distribution of various kinematic discriminants in signal and background events of sample size 10000. (a) The E_T distribution of events. (b) The di-jet invariant mass distribution of events. (c) The di-jet m_T0 distribution. (d) The di-jet m_Ti distribution.
4.4 Scatter plots of missing transverse energy vs. various invariant masses in signal and background events. The black dots are background; the red, blue, and green dots show events with gluinos produced in association with χ⁰, χ⁺, and χ⁻ respectively. (a) The distribution of m_T0 using the exclusive di-jet cut method. (b) The distribution of m_Ti using the exclusive di-jet cut method. (c) The distribution of m_T0 using the inclusive all-jet method. (d) The distribution of m_Ti using the inclusive all-jet method.
4.5 Significance for gluino-weakino production vs. gluino mass for various squark masses. The upper plot gives significances for the search which uses the di-jet m_T0 transverse mass discriminant. The lower plot gives the significance for the search which uses the di-jet m_Ti transverse mass discriminant.
4.6 Significance for gluino-weakino production vs. gluino mass for various squark masses. Significances are given for the search which uses the all-jet m_T0 transverse mass discriminant.

List of Tables


1.1 Particle content of the Standard Model with gauge group quantum numbers and electric charge.
1.2 Experimental values of 18 independent SM parameters [6].
1.3 Higgs production cross sections at the LHC [6].
1.4 Higgs branching ratios at m_h ≈ 125 GeV.
1.5 The MSSM particle content. As in the SM, there are three generations of both quarks and leptons. We again use the convention Q = T³_2 + Y_1/2.
1.6 Typical search signatures at the LHC for direct gluino and first- and second-generation squark production assuming different mass hierarchies [6].

2.1 Cross sections in units of fb for signal and dominant background processes, with the parton-level cuts of Eq. (2.6), and boosted regions p_T(V) > 150, 200 GeV.
2.2 The consecutive cut efficiencies for signal ℓ⁺ℓ⁻jj and dominant background processes at the LHC.
2.3 The consecutive cut efficiencies for signal ℓ±E_T jj and dominant background processes at the LHC.
2.4 The consecutive cut efficiencies for signal E_T jj and dominant background processes at the LHC.
2.5 Fitted results for the background rates from various fitting functions as in Eqs. (2.10) and (2.11).
2.6 Fitted results for the background rate from various fitting ranges by the fitting function in Eq. (2.10).

2.7 Signal significance achieved from each channel and combined results for both statistics and systematics dominance.
2.8 Flavor tagging efficiency.
2.9 Fraction of SM decay channels.
2.10 Extrapolated upper bounds at 95% CL on the light-quark Yukawa couplings κ_q = y_q/y_b^SM (q = u, d, s).

3.1 Example confusion matrices. The bottom has numeric values for the 0-lepton channel CNN. This particular matrix resulted in a 76.1% signal pass rate and a 40.5% background pass rate for an isolated channel significance of 0.414.
3.2 15 features with the highest correlation to the event label for the 2-lepton channel.
3.3 2-lepton channel results.
3.4 1-lepton channel results.
3.5 0-lepton channel results.
3.6 Combined significance achieved using a FCN, CNN, and traditional cuts, with the 1-loop signal process omitted from the 2-lepton and 0-lepton channels.

4.1 Cut flow for signal and the main Z + jj background for 2 benchmark points with 1.6 TeV squarks. The transverse mass m_T0 is reconstructed using the exclusive di-jet method. To demonstrate the change in efficiencies as the transverse mass window shifts with gluino mass, the top benchmark point gives the cut flow for a 2.2 TeV gluino while the bottom benchmark point gives the cut flow for a 1 TeV gluino.

A.1 Key to the first 28 rows of the 41 × 12 particle collision event images.
A.2 Feature map for the first 28 rows of the 41 × 12 particle collision event image. Features with the highest correlations to labels are in bold with list placement as a superscript (blue = 0-lepton channel, brown = 1-lepton channel, red = 2-lepton channel). Subscripts denote negatively correlated features with the same color scheme.
A.3 Key to the last 13 rows of the 41 × 12 particle collision event images.
A.4 Feature map for the last 13 rows of the 41 × 12 particle collision event image. Features with the highest correlations to labels are in bold with list placement as a superscript (blue = 0-lepton channel, brown = 1-lepton channel, red = 2-lepton channel). Subscripts denote negatively correlated features with the same color scheme.
A.5 15 features with the highest negative correlation to the event label for the 2-lepton channel.
A.6 15 features with the highest correlation to the event label for the 1-lepton channel.
A.7 15 features with the highest negative correlation to the event label for the 1-lepton channel.
A.8 15 features with the highest correlation to the event label for the 0-lepton channel. Features related to the TVQ observable first mentioned in Chapter 2 have been highlighted.
A.9 15 features with the highest negative correlation to the event label for the 0-lepton channel.

Chapter 1

Introduction

Physics is considered by many to be the foundational field of study upon which all other sciences explaining the world around us are ultimately based. Throughout history, humankind’s quest to understand “how things work” has inevitably led to deeper and more fundamental questions. From telescopes to particle colliders, humans have sought to answer these questions that go beyond what the human eye can see and the human hand can measure. Modern physics uses the most cutting-edge technology to probe phenomena on the vast scales of the universe as well as on the tiniest scales of the most fundamental particles that make up everything else around us. Research in particle physics consists of formulating models that can explain phenomena and then testing those models, either directly or indirectly, through carefully designed experiments. Experimental feedback can then inspire and guide further theoretical development in the pursuit of fundamental truths. The Standard Model (SM) of particle physics is the culmination of decades of theoretical and experimental work. With the experimental observation in 2012 of a particle that is consistent with the SM Higgs boson, the SM is “complete” in the sense that the complete particle content proposed by the SM has been experimentally observed. However, as we continue to refine measurements and expand upon the successes of the SM, phenomena have arisen that cannot be fully explained. Explaining these phenomena will require us to look beyond the SM. Below I will review our current understanding of particle physics as described by the SM. I will then review some of the problems with the SM, before looking at ways to further refine our knowledge of the SM and possibly find clues to what lies beyond. In particular I will review the Higgs boson as a central piece of the SM as well as a promising portal to Beyond Standard Model (BSM) physics, including Higgs physics at the Large Hadron Collider (LHC). I will also briefly review the status of LHC searches for signs of Supersymmetry (SUSY), one of the leading contenders for a successor to the SM.

Category       Field         Notation       (SU(3)_C, SU(2)_L, U(1)_Y)   Electric charge   Corresponding particle(s)*
Fermions       quarks        q = (u, d)     (3, 2, 1/3)                  (2/3, −1/3)       ψ_u, ψ̄_u, ψ_d, ψ̄_d
                             ū              (3̄, 1, −4/3)                 −2/3
                             d̄              (3̄, 1, 2/3)                  1/3
               leptons       ℓ = (ν, e)     (1, 2, −1)                   (0, −1)           ψ_e, ψ̄_e, ψ_ν, ψ̄_ν
                             ē              (1, 1, 2)                    +1
Gauge bosons   electroweak   B_µ            (1, 1, 0)                    0                 Z, γ, W±
                             W^i_µ          (1, 3, 0)                    (−1, 0, +1)
               color         G^א_µ          (8, 1, 0)                    0                 g
Scalar         Higgs         H              (1, 2, 1)                    0                 h

Table 1.1: Particle content of the Standard Model with gauge group quantum numbers and electric charge. Only the first generation or copy of the fermions is shown; all information presented here is identical for the other two generations. *Sometimes the notation u, ū is also used to denote four-component spinors that correspond to physical particles and anti-particles. For clarity, in this work I will use u, ū to represent two-component Weyl spinors and ψ_u, ψ̄_u to denote the four-component spinor physical particles.

1.1 The Standard Model

The Standard Model of particle physics contains 12 gauge bosons and three copies each of four types of fermions, with each copy identical except for mass. Adding the Higgs boson gives a model containing 25 particles¹. These physical particles arise from interactions of various quantum fields. The full field content of the SM is depicted in Table 1.1. It cannot be ignored that right from the outset, the SM does not include gravity. This is one of the problems with the SM; I will briefly discuss this along with other problems in a later section. The field content of the SM can be organized in several different ways. One way to organize the fields is by interaction type: strong, weak and electromagnetic interactions via

the gauge groups SU(3)_C, SU(2)_L, and U(1)_Y, respectively, where C, L, and Y denote the appropriate “charge” for each group. The gauge group quantum numbers of each particle are also depicted in Table 1.1. Building the SM as a gauge theory requires that the SM Lagrangian remain invariant under each gauge group. To achieve this, covariant terms are added to the derivatives that supply us with kinetic terms for each particle field. The SM uses a

¹Each fermion also has an anti-fermion, bringing the total particle count to 37 if you want to count those.

generic covariant derivative

    D_µ = ∂_µ − i g_3 G^א_µ T^א_3 − i g_2 W^i_µ T^i_2 − i g_1 B_µ (Y_1/2)    (1.1)

where the vector indices are given by the Greek index µ = {0, 1, 2, 3}; the SU(3) generators are given by the Hebrew index א = {1, ..., 8}; the SU(2) generators are given by the Latin index i = {1, 2, 3}; and T_3, T_2, and Y_1 correspond to the generators of the three gauge groups SU(3)_C, SU(2)_L, and U(1)_Y, respectively². Any field that is a singlet under a particular gauge group will not include that gauge group in its covariant derivative; i.e., the covariant derivative for quarks will include the SU(3)_C covariant term −i g_3 G^א_µ T^א_3 because quarks are triplets under SU(3)_C, but leptons are singlets under SU(3)_C, so the covariant derivative for leptons will not include this term.

Another way to organize the SM is by field spin type: scalar bosons (the Higgs), vector bosons (Z, W±, photons, gluons), and spinors (quarks and leptons). Quantum field theory gives a basic form to the kinetic terms for each of these types:

    −(1/4) V_{µν} V^{µν}      (vector V_µ)
    i ψ̄ γ^µ D_µ ψ            (spinor ψ)            (1.2)
    (D_µ Φ)† D^µ Φ           (scalar particle Φ)

where V_{µν} is the field strength tensor that represents the commutator of the appropriate covariant derivative for the desired gauge group:

    V^a_{µν} = [D_µ, D_ν] = ∂_µ V^a_ν − ∂_ν V^a_µ − g_i f^{abc} V^b_µ V^c_ν    (1.3)

where a, b, and c are the appropriate gauge indices, g_i is the appropriate gauge coupling constant, and f^{abc} are the structure constants for the gauge group algebra. Structure constants must obey the relation [T^a, T^b] = i f^{abc} T^c. For SU(2) the structure constants are just the Levi-Civita symbol ε^{ijk}. For U(1) the structure constants are zero since U(1) is abelian. With these two organizational schemes in mind, we can now construct the SM Lagrangian by writing the SM Lagrangian as the sum of kinetic parts, which include the gauge interactions through the covariant derivative, plus the Yukawa interactions and the Higgs potential:

    L = L^vector_kinetic + L^scalar_kinetic + L^spinor_kinetic + L_Yukawa − V(H)    (1.4)

Each term of the SM Lagrangian is discussed in more detail below.

²Dividing the Y_1 hypercharge by a factor of 2 is just one convention; other sources may not include the factor of 1/2, which requires a rebalancing of the quantum numbers in Table 1.1. Variations in conventions may also change the sign of the U(1)_Y hypercharge. In any case, the electric charge Q, the U(1)_Y hypercharge, and the third component of SU(2) isospin T³_2 should be simply related. Using the convention presented here, this relation is given by Q = T³_2 + Y_1/2.
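For example, using the quantum numbers of Table 1.1, Eq. 1.1 specializes to the lepton doublet ℓ (an SU(3)_C singlet with Y_1 = −1) and to the lepton singlet ē (a singlet under both SU(3)_C and SU(2)_L, with Y_1 = 2) as

    D_µ ℓ = ( ∂_µ − i g_2 W^i_µ T^i_2 + (i/2) g_1 B_µ ) ℓ ,    D_µ ē = ( ∂_µ − i g_1 B_µ ) ē .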

The Gauge Sector (L^vector_kinetic)

The SM gauge sector consists of an SU(3)_C octet, an SU(2)_L triplet, and a U(1)_Y vector boson. The Lagrangian is built from the kinetic terms given for a vector in Eqs. 1.2 and 1.3, applied to each of the three SM gauge groups:

    L^vector_kinetic = −(1/4) G^א_{µν} G^{א µν} − (1/4) W^i_{µν} W^{i µν} − (1/4) B_{µν} B^{µν}    (1.5)

where

    G^א_{µν} = ∂_µ G^א_ν − ∂_ν G^א_µ − g_3 f^{אבצ} G^ב_µ G^צ_ν
    W^i_{µν} = ∂_µ W^i_ν − ∂_ν W^i_µ − g_2 ε^{ijk} W^j_µ W^k_ν    (1.6)
    B_{µν} = ∂_µ B_ν − ∂_ν B_µ

as photons. These four physical particles arise from a mixing of the µ and µ fields via W B electroweak symmetry breaking (EWSB), which will be covered in the Higgs sector below.

The SU(3)C and the SU(2)L gauge groups are nonabelian and thus have a third term in their kinetic Lagrangian as seen in Eq. 1.6. This third term in both cases allows for

self-interactions: when fully expanded, Eq. 1.5 will give terms with three and four µ’s G and three and four µ’s. This will result in gluons that can have three- and four-point self W ± interactions. Because the µ’s eventually mix to create Z’s, W ’s, and photons, we will W get three- and four-point interactions between the electroweak gauge bosons with a W +, a W −, and either one or two Z’s and/or photons.
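For example, expanding the gluon term of Eq. 1.5 using Eq. 1.6 makes these self-interactions explicit (writing the extra adjoint indices with further Hebrew letters):

    −(1/4) G^א_{µν} G^{א µν} ⊃ (g_3/2) f^{אבצ} (∂_µ G^א_ν − ∂_ν G^א_µ) G^{ב µ} G^{צ ν} − (g_3²/4) f^{אבצ} f^{אדה} G^ב_µ G^צ_ν G^{ד µ} G^{ה ν} ,

where the first term is the three-gluon vertex and the second is the four-gluon vertex; the analogous expansion of the W^i_{µν} term produces the electroweak self-interactions described above.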

The gauge group of electromagnetism, U(1)_EM, also emerges as a subgroup of SU(2)_L ⊗ U(1)_Y via EWSB. This creates a direct relationship between the electromagnetic charge, the third component of SU(2)_L isospin, and the U(1)_Y hypercharge. Using the conventions adopted here, this relationship is given by

    Q = T³_2 + Y_1/2.    (1.7)
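As a quick check of Eq. 1.7 against Table 1.1: the up quark has T³_2 = +1/2 and Y_1 = 1/3, so Q = 1/2 + 1/6 = +2/3, while the electron has T³_2 = −1/2 and Y_1 = −1, so Q = −1/2 − 1/2 = −1, reproducing the electric charges listed in the table.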

The Higgs Sector (L^scalar_kinetic and V(H))

The Higgs sector consists of the kinetic term for the Higgs and also the Higgs potential. The Higgs field itself is a complex scalar doublet containing four degrees of freedom, which can be written

    H = (φ⁺, φ⁰)ᵀ = (1/√2) (φ_1 + iφ_2 , φ_3 + iφ_4)ᵀ.    (1.8)

We now look at the Higgs potential,

    V(H) = (λ/2) (H H† − v²/2)²    (1.9)

where v is the global minimum for the Higgs field. As long as λ > 0, the minimum for this potential is at |φ| = v/√2. Using the right-hand side of Eq. 1.8 and expanding the potential gives us

    V(H) = (λ/2) [ (1/2)(φ_1² + φ_2² + φ_3² + φ_4²) − v²/2 ]².    (1.10)

The vacuum expectation value (VEV), v, could be arbitrarily distributed over the four degrees of freedom φ_i; however, using global and local SU(2)_L ⊗ U(1)_Y transformations, we can assign the entire VEV to just one of the four degrees of freedom, so that Eq. 1.8 now looks like:

    H′ = (1/√2) (φ_1 + iφ_2 , (φ_3 + v) + iφ_4)ᵀ.    (1.11)

We now take the VEV of this rotated Higgs field:

    ⟨0| H′ |0⟩ = (1/√2) (0, v)ᵀ    (1.12)

Finally, we expand H′ around the vacuum to recover the physical Higgs field, h:

    H′(x) = (1/√2) (0, v + h(x))ᵀ    (1.13)

where h(x) is a real scalar field. Plugging this back into Eq. 1.9, the Higgs potential becomes

    V(H′) = (λ/2) [ (1/2)(h² + 2hv + v²) − v²/2 ]²
          = (λ/8) (h² + 2hv)².    (1.14)

Thus from the Higgs potential we get a Higgs mass term, (λ/2)v²h², plus three- and four-point Higgs self-interactions proportional to λ.

We now look at the kinetic terms for the physical form of the Higgs field. The general kinetic term for the Higgs is

    (D_µ H′)† D^µ H′    (1.15)

as per Eq. 1.2, with the covariant derivative

    D_µ = ∂_µ − i g_2 W^i_µ T^i_2 − i g_1 B_µ (Y_1/2).

The SU(3)_C covariant terms are deleted since the Higgs is a singlet under SU(3)_C. Expanding Eq. 1.15 we get:

    (D_µ H′)† D^µ H′ = (1/2) ( 0  (h + v) ) | ∂_µ − i g_2 W^i_µ T^i_2 − i g_1 B_µ (Y_1/2) |² ( 0 , (h + v) )ᵀ.    (1.16)

Recalling that W³_µ T³_2 and B_µ Y_1/2 are diagonal 2 × 2 matrices while W^i_µ T^i_2 for i = 1, 2 are off-diagonal 2 × 2 matrices, we get the following expansion of the Lagrangian:

    L^scalar_kinetic = (1/2) ∂_µh ∂^µh + (1/8)(h² + 2hv + v²) [ g_2²(W¹_µ)² + g_2²(W²_µ)² + g_2²(W³_µ)² − 2 g_1 g_2 B^µ W³_µ + g_1²(B_µ)² ]    (1.17)

We can see that in addition to a simple kinetic term for the real scalar Higgs field, we also get various three- and four-point interactions between the Higgs and the W's and B, where v contributes to the coupling strength. We also get terms for the W and B of the form v² g_i² V² for V = {B, W}, and a term v² g_1 g_2 B W³. These terms would not have emerged if the Higgs VEV had been zero. The W^{i=1,2} terms are clearly mass terms, but the W³ and B must be diagonalized before we can extract a physical mass from them, giving us physical vector bosons of

    Z_µ = cos(θ_W) W³_µ − sin(θ_W) B_µ ,    M_Z² = (1/4) v² (g_1² + g_2²)
    A_µ = sin(θ_W) W³_µ + cos(θ_W) B_µ ,    M_A = 0    (1.18)

where θ_W is the weak mixing angle given by

    sin(θ_W) = g_1 / √(g_1² + g_2²).    (1.19)

Because A_µ remains massless, there remains one last U(1) symmetry which is unbroken and which preserves the charge T³ + Y/2. This is the U(1)_EM gauge group for electromagnetism, with A_µ identified as the photon. Finally, we note that the W^{i=1,2} do not have definite U(1)_EM quantum numbers, so we take linear combinations to form the physical W± bosons:

    W^±_µ = (1/√2) (W¹_µ ∓ i W²_µ) ,    M_W² = (1/4) v² g_2².    (1.20)

Thus, Electroweak Symmetry Breaking (EWSB) occurs when the Higgs VEV breaks the SU(2)_L ⊗ U(1)_Y symmetry of the SM, giving rise to three massive electroweak gauge bosons and one massless electroweak boson.
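As a rough numeric check of this sector, inserting the approximate measured values g_1 ≈ 0.36, g_2 ≈ 0.65, and v ≈ 246 GeV (illustrative inputs, quoted to two digits) into Eqs. 1.18–1.20 gives

    M_W = (1/2) v g_2 ≈ 80 GeV ,    M_Z = (1/2) v √(g_1² + g_2²) ≈ 91 GeV ,    sin²θ_W = g_1²/(g_1² + g_2²) ≈ 0.23 ,

in agreement with the measured masses and the value of sin²θ_W in Table 1.2. Similarly, matching the mass term (λ/2)v²h² from Eq. 1.14 to the canonical form (1/2)m_h²h² gives m_h = √λ v, so m_h ≈ 125 GeV corresponds to λ ≈ 0.26.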

The Fermion Sector (L^spinor_kinetic)

The kinetic term for each fermion in the SM follows the form for spinors given in Eq. 1.2:

    i ψ̄ γ^µ D_µ ψ.    (1.21)

This form is for a Dirac fermion. The Dirac basis for four-component spinors was developed from kinematics in an attempt to find an equation of motion for fermions similar to the nonrelativistic Schrödinger equation, and it is useful for calculations in Quantum Electrodynamics (QED) and Quantum Chromodynamics (QCD) [1]. However, both the Standard Model and theories beyond the SM are more naturally represented with Weyl spinors. A Dirac fermion in the Weyl basis can be written

    Ψ = (χ_L, η_R)ᵀ    (1.22)

where χ_L is a left-handed Weyl spinor and η_R is a right-handed Weyl spinor. Weyl spinors can be transformed from right-handed to left-handed and vice versa using the operation ξ → iσ²ξ*, so we can re-write Ψ as

    Ψ = (χ_L, iσ²η*_L)ᵀ,    (1.23)

where the entire term (iσ²η*_L) is a right-handed Weyl spinor but η_L by itself is a left-handed Weyl spinor. In this form, we recognize that under charge conjugation, χ_L → η_L, finally giving

    Ψ = (χ_L, iσ²χ̄*_L)ᵀ,    (1.24)

where χ̄_L = η^{CP}_R is a left-handed Weyl spinor. So the entire fermion content of the SM can be written in terms of purely left-handed Weyl spinors, and the Standard Model is entirely contained in the SU(3)_C ⊗ SU(2)_L ⊗ U(1)_Y gauge group. Table 1.1 gives the SM particle content and quantum numbers in terms of left-handed Weyl spinors. The fermions in the Standard Model are comprised of left-handed SU(2) singlets and doublets organized into quarks and leptons, each with three generations. The first-generation doublets are:

    ℓ = (ν_e, e)ᵀ  and  q = (u, d)ᵀ    (1.25)

The first-generation singlets are³:

    ū, d̄, and ē.    (1.26)

The second- and third-generation fermions follow the identical pattern as the first, but with greater mass. The kinetic term for a Weyl spinor then becomes

    i χ̄ σ̄^µ D_µ χ    (1.27)

where σ̄^µ is the four-vector composed of the 2 × 2 identity matrix and the three Pauli matrices. The covariant derivative is again given by Eq. 1.1, giving rise to interactions between the fermions and the gauge bosons. The term D_µ σ̄^µ is a 2 × 2 matrix with the physical Z and photon on the diagonal and the W± bosons on the off-diagonal. When contracted with the doublet χ, we end up getting interaction terms that look like:

    f̄ Z f
    f̄ A f        (1.28)
    f̄ W± f′

where f′ is the partner of f in the doublet, i.e., d̄W⁺u or ēW⁻ν, obtained by multiplying the doublet by the off-diagonal W± terms. These are called “Flavor Changing Charged Currents”. We note that the neutral bosons are diagonal, and thus there are no “Flavor Changing Neutral Currents” (FCNCs) at tree level in the SM, a theoretical prediction that so far has held true in experiments. This is not quite the end of the story for fermion-gauge boson interactions, however. The Weyl fermions in the SM Lagrangian are actually in weak eigenstates, and it turns out that at least for the neutrinos and quarks, these weak eigenstates are not simultaneously diagonalizable with the physical mass eigenstates. A unitary transformation takes weak eigenstates to mass eigenstates and vice versa:

    f^W_i = V^f_{ij} f^M_j ,    f^M_j = (V^f†)_{ji} f^W_i = V^{f*}_{ij} f^W_i ,    (1.29)

where f = {u, d, e, ν}, i, j = {1, 2, 3} designates the generation, and M and W denote mass and weak eigenstates, respectively. Note that V^f_{ij} = V^f̄_{ij}, since f and f̄ combine to form the physical, massive fermion and thus must have the same mass eigenstate basis.

³Notational differences abound. Some sources use ū ≡ u^c to denote the left-handed SU(2)_L singlets. Sometimes, the SU(2)_L singlet ν̄ is included in this list, as including it is allowed by the mathematical symmetry of the SM; however, the nature and origins of neutrino mass are still under hot debate and thus it is often left out as something that has not yet been experimentally confirmed.

The gauge interaction terms are then written:

    Z f̄^W_i f^W_i
    W f̄^W_i f′^W_i    (1.30)

where for the purposes of this discussion the Z boson and photon act the same, so the photon term has been dropped. Converting to mass eigenstates we get:

    Z f̄^M_j V^f_{ji} V^f_{ik} f^M_k = V^f_{ji} V^f_{ik} Z f̄^M_j f^M_k
    W f̄^M_j V^f_{ji} V^{f′}_{ik} f′^M_k = V^f_{ji} V^{f′}_{ik} W f̄^M_j f′^M_k    (1.31)

Now V^f_{ji} V^f_{ik} = I_{jk} = δ_{jk}. So the Z f̄ f interaction collapses down to Z f̄^M_j δ_{jk} f^M_k = Z f̄^M_j f^M_j, and we can see that the neutral bosons can only interact with a pair of fermions of the same flavor and generation.

However, V^f_{ji} V^{f′}_{ik} ≠ I_{jk}, since they are transformation matrices for different quarks with different masses. Therefore the W f̄ f′ interaction remains as it is, mixing both flavor and generation. The two matrices are combined and rolled into the coupling constant for each vertex: V^f_{ji} V^{f′}_{ik} = V_{ff′}. We note that for quarks there are nine possible combinations and arrange them into a matrix:

    V_CKM = [ V_ud  V_us  V_ub ;
              V_cd  V_cs  V_cb ;
              V_td  V_ts  V_tb ]    (1.32)

This is called the Cabibbo-Kobayashi-Maskawa (CKM) matrix. It is a complex unitary

matrix. Some constraints are put on the values of the V_ff′ to ensure the matrix is unitary, allowing us to parameterize the matrix using three mixing angles and a CP-violating phase:

{θ_12, θ_13, θ_23, δ}. Alternately, these four degrees of freedom can be redefined in terms of the “Wolfenstein parameters”, {λ, A, ρ̄, η̄}:

    V_CKM = [ 1 − λ²/2           λ           Aλ³(ρ̄ − iη̄) ;
              −λ                 1 − λ²/2    Aλ² ;
              Aλ³(1 − ρ̄ − iη̄)    −Aλ²        1 ]  +  O(λ⁴)    (1.33)

(a numeric evaluation of this matrix is sketched at the end of this section). The construction of a CKM-like matrix could be repeated for the lepton gauge interaction term; however, it turns out that this is unnecessary in the SM. The charged leptons are assumed to be in simultaneous mass and weak eigenstates, thus V^{ℓ={e,µ,τ}}_{ij} = δ_{ij}, leaving us with the W boson interaction:

    W ℓ̄^W_i ν^W_i = W ℓ̄^M_i ν^W_i    (1.34)

We interpret this to mean that in a Wℓν interaction, the produced neutrino is always in

a definite flavor state that is equal to the flavor state of its charged partner and is not in a mass eigenstate. The physics of neutrino mass and weak eigenstates that leads to the phenomenon of neutrino oscillations is beyond the scope of this thesis.

In summary, the fermion sector produces kinetic terms for the fermions as well as gauge boson-fermion couplings via the covariant derivative. The gauge boson-fermion couplings in the SM follow three general rules: (1) the neutral gauge couplings get no mixing between fermion generations or flavors, (2) the charged gauge couplings mix both flavor and generation in the quark sector, and (3) the charged gauge couplings mix flavor but not generation in the lepton sector.
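As a small numeric illustration of Eq. 1.33 (a sketch for this discussion only, using the central Wolfenstein parameter values from Table 1.2), the CKM matrix can be evaluated and its unitarity checked to the stated order:

```python
import numpy as np

# Central values of the Wolfenstein parameters, taken from Table 1.2
lam, A, rho, eta = 0.22453, 0.811, 0.124, 0.356

# Eq. (1.33): the Wolfenstein parametrization, valid up to O(lambda^4)
V_CKM = np.array([
    [1 - lam**2 / 2,                    lam,             A * lam**3 * (rho - 1j * eta)],
    [-lam,                              1 - lam**2 / 2,  A * lam**2],
    [A * lam**3 * (1 - rho - 1j * eta), -A * lam**2,     1],
])

print(np.round(np.abs(V_CKM), 4))  # |V_ud| ~ 0.975, |V_us| ~ 0.225, |V_ub| ~ 0.003, ...
# Unitarity holds only up to the neglected O(lambda^4) terms:
print(np.abs(V_CKM @ V_CKM.conj().T - np.eye(3)).max())  # ~ 1e-3
```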

The Yukawa Sector

We have seen how the scalar Higgs and the fermions interact with the gauge bosons through the covariant derivative, but there is no reason why the fermions cannot also interact with the Higgs. Thus we must add mathematically allowed terms, called Yukawa interactions, to the SM Lagrangian describing such interactions. As it turns out, the Yukawa interactions between fermions and the Higgs provide a crucial mechanism for fermions to gain mass.

The Higgs is a doublet under SU(2)_L with U(1)_Y hypercharge. Therefore any interaction term between the Higgs and fermions must preserve SU(2)_L ⊗ U(1)_Y gauge invariance. The Higgs doublet itself must contract with another doublet, so we must use a fermion doublet in each term, and one or the other (the Higgs or the fermion) must be the Hermitian

conjugate. We then add the SU(2)_L singlets to ensure that spinor indices contract and total hypercharge is zero. We thus have a pair of Yukawa terms corresponding to each SU(2)_L singlet⁴:

    y_d H† q d̄ + y*_d d̄† q† H
    y_u H̄† q ū + y*_u ū† q† H̄    (1.35)
    y_e H† ℓ ē + y*_e ē† ℓ† H

where we have used the Higgs “conjugate” H̄ = iσ²H*, in order to contract the Higgs with the u part of the q doublet. The y's are dimensionless constants whose complex phase can be absorbed into the rephasing of the singlets, so they are usually treated as real numbers. It can be verified using Table 1.1 that the hypercharge for each individual term does in fact add up to zero. To illustrate [2, 3] how this translates to interaction terms between the Higgs and the physical fermions and how fermion mass terms arise, consider the Yukawa interaction for leptons. Recalling that h is a real scalar field so h† = h, we can expand the lepton terms

⁴A fourth term could be added if one chooses to include the “right-handed neutrino” singlet ν̄, which does not violate the mathematical rules governing the SM; however, as the origin of neutrino mass is still undetermined, this is not considered to be part of the SM.

from Eq. 1.35:

    L_Yuk,lep = (y_e/√2) (0  (h + v)) (ν, e)ᵀ ē + (y_e/√2) ē† (ν†  e†) (0, (h + v))ᵀ
              = (y_e/√2) (h + v) e ē + (y_e/√2) ē† e† (h + v)    (1.36)
              = (y_e/√2) (h + v) (e ē + ē† e†)

We now attack from the other end, starting with a Dirac electron in the Weyl basis:

    ψ_e = (e_L, e_R)ᵀ ,    ψ̄_e = ψ†_e γ⁰    (1.37)

so that

    ψ̄_e = ψ†_e γ⁰ = (e†_L  e†_R) ( 0 1 ; 1 0 ) = (e†_R  e†_L)    (1.38)

Then

    ψ̄_e ψ_e = (e†_R  e†_L) (e_L, e_R)ᵀ = e†_R e_L + e†_L e_R    (1.39)

Comparing this to the last line in Eq. 1.36, we make the identification:

    e ē + ē† e† ≡ e†_R e_L + e†_L e_R ,
    ⇒  e_L → e ,  e_R → ē† ,  e†_L → e† ,  e†_R → ē    (1.40)

to give us

    (y_e/√2) (h + v) (e ē + ē† e†)  →  (y_e/√2) (h + v) ψ̄_e ψ_e
                                     = (y_e/√2) h ψ̄_e ψ_e + (y_e/√2) v ψ̄_e ψ_e    (1.41)

where we recognize the first term as a three-point interaction between the Higgs, the electron, and the positron, and the second term as a fermion mass term. Gathering together the constants that go into the mass term, we see that

    m_f = y_f v / √2.    (1.42)
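For example, inverting Eq. 1.42 with v ≈ 246 GeV and the masses in Table 1.2 gives y_e = √2 m_e/v ≈ 2.9 × 10⁻⁶ for the electron and y_t = √2 m_t/v ≈ 0.99 for the top quark: the Yukawa couplings span nearly six orders of magnitude across the fermion spectrum.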

11 Up to this point we have discussed only the first generation of fermions under the assumption that the second and third generations are identical except for mass. From Eq. 1.35, we can write a more general Yukawa term for the up-type quarks as

    L^u_Yukawa = y^u_{ik} H̄† q_i ū_k + h.c.    (1.43)

where i, k = {1, 2, 3} for the three generations. Again gathering together all the constants that contribute to mass terms, we can arrange the y_{ik}'s into a 3 × 3 matrix for each type of massive fermion (up, down, and charged lepton):

    M^{u,d,e}_{ik} = y^{u,d,e}_{ik} v / √2    (1.44)

Taking the up-type quark as an example, after EWSB, the up-type Yukawa terms become

    L^u_Yukawa = M^u_{ik} ū_i u_k + (y^u_{ik}/√2) h ū_i u_k + h.c.    (1.45)

However, as before, the fermions in the Lagrangian are in weak eigenstates, and before we can obtain a physical Higgs-fermion coupling or a physical mass, we need to translate them to mass eigenstates. Returning to Eq. 1.45, we have terms of the form

    L^u_Yukawa = M^u_{ik} V^f_{ji} V^f_{kj} ū^M_j u^M_j + (y^u_{ik}/√2) V^f_{ji} V^f_{kj} h ū^M_j u^M_j + h.c.    (1.46)

where we have inserted unity as in Section 1.1 in order to transform the weak eigenstates. Rearranging,

    L^u_Yukawa = (V^f_{ji} M^u_{ik} V^f_{kj}) ū^M_j u^M_j + h (V^f_{ji} (y^u_{ik}/√2) V^f_{kj}) ū^M_j u^M_j + h.c.    (1.47)

We can see that the remaining unitary matrices diagonalize the quark mass matrix as well as the Higgs-quark coupling, giving real, diagonal mass and coupling values [2]. A similar set of equations gives us the down-type quark masses and couplings. We could also construct a set of equations for the electron mass and couplings, but the SM assumes that the electron mass and weak eigenstates are simultaneous, so there is no need to go through more than the single-generation process described in Eqs. 1.36–1.41 for each generation. We also see that, like the neutral gauge bosons, the Higgs-fermion interactions do not mix flavor or generation.

1.2 Problems with the Standard Model

Despite its many successes, there are many questions remaining that indicate a need for a model that goes beyond the Standard Model. It might be argued that some of the issues with the SM are more aesthetic or philosophical problems: they are “issues” only inasmuch as

12 they violate our human sense of beauty, naturalness, balance, or reason. Others, such as the existence of dark matter, have been firmly experimentally demonstrated to be inconsistent with or outside of the SM.

1. Gravity

Gravity is the central concern in most introductory physics courses, and for good reason: it is the primary force that constrains and orders daily life on the human scale. It seems odd, then, that one of the crowning achievements of modern physics, the Standard Model, does not mention it at all. A very reasonable excuse for the omission of gravity in the SM is that gravity is so weak compared to the other forces at the scale of particle interactions that it is not “needed” (alternately, gravity does not become important to particle interactions until very high energies). However, nobody doubts that it does in fact exist and that it is important in the overall picture, so any truly complete theory must find a way to include it. We know that gravity will become important in particle interactions at the Planck scale,

    M_pl = (8π G_Newton)^{−1/2} ≈ 2.4 × 10^18 GeV.    (1.48)

Thus the SM will no longer describe physics at that scale; some new theory must be found.

2. The Hierarchy problem

One of the primary motivating factors for BSM theories for the past several decades has been the Higgs hierarchy problem. While the fermion masses are protected by chiral symmetry and the vector boson masses are protected by gauge symmetry, the Higgs is the only elementary scalar in the SM and as such, it is susceptible to quadratic divergences that create a runaway mass scenario in which corrections to the mass are significantly larger than the Higgs mass itself. We already know that we need new physics at the Planck scale to accommodate gravity, but by the time we reach that scale, the one-loop corrections to the Higgs mass are already 30 orders of magnitude larger than m_h² (with m_h ≈ 125 GeV) [4], requiring higher-order loop corrections to also be unreasonably large in order to provide the cancellations required to keep the Higgs mass manageable. This strongly motivates the expectation that some new physics comes into play prior to the Planck scale to cut off the Higgs mass corrections at a more reasonable size. New physics at the TeV scale, to be specific, would keep the Higgs mass corrections proportional to this scale, requiring much more reasonable cancellations to stabilize the Higgs mass.
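To get a rough sense of the numbers involved: the dominant top-loop contribution to the Higgs mass-squared is of order δm_h² ∼ (|y_t|²/8π²) Λ², where Λ is the scale at which new physics enters. With y_t ≈ 1 and Λ ∼ M_pl ≈ 2.4 × 10^18 GeV, this gives δm_h² ∼ 10^35 GeV², compared to m_h² ≈ 1.6 × 10^4 GeV², roughly the 30-order-of-magnitude mismatch quoted above. Cutting the corrections off at Λ ∼ 1 TeV instead gives δm_h² ∼ 10^4 GeV², comparable to m_h² itself.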

3. Dark Matter

The existence of dark matter (DM) has been empirically demonstrated in several phenomena, such as gravitational lensing showing center-of-mass offsets (most notably in the Bullet Cluster), rotational curves of spiral galaxies that require more mass than what is visible, and CMB measurements. Matter explained by the SM only accounts for 5% of the energy density in the universe; DM contributes 25%, and dark energy fills in the remaining 70%. So even if the SM were a perfect model of visible matter, we would still have no understanding of 95% of the energy density of the universe. A truly comprehensive model of the fundamental workings of the universe must include DM. A full understanding of the evolution of the universe also requires a theoretical explanation for both DM and dark energy.

4. Neutrino mass

For decades it was assumed that neutrinos are massless, but the 1998 discovery of neutrino oscillations [5] proved that neutrinos do have a small but nonzero mass⁵.

A relatively simple extension to the SM, the addition of an SU(2)_L singlet ν̄, could give neutrinos mass in the same way that other fermions gain mass through the Higgs mechanism, but there are still many questions about the nature of neutrinos and their masses.

5. Arbitrary parameters

The SM requires a total of 18 arbitrary parameters⁶ that are not predicted by theory: they must be extracted empirically from experiments. The origins of these parameters remain a mystery. In particular, the mass hierarchy of the fermions seems to be begging for some as-yet undiscovered underlying order. It could be argued that this is just the “way it is” and that nature is not obligated to satisfy our human desire for order and reason, or that these constants must have these values simply because if they did not, physics would be changed in a way such that humans might not have evolved to observe it (the “anthropic principle”): we are here, so the parameters must have these values and it is not arbitrary at all. However, a constant and time-honored theme of the physical sciences has always been to find simpler, more fundamental mechanisms behind any complicated structure, so having a messy zoo of arbitrary parameters is, to many, an obvious clue that there must be something even more fundamental (and hopefully simpler) behind the SM parameter values.

⁵It is possible that the lightest neutrino is still massless. ⁶There are 19 parameters if θ_QCD is included.

Sector       Parameter            Experimental Value
Gauge        α_s                  0.1181(11)
             sin²θ_W(M_Z) (1)     0.23122(4)
             α_EM^{−1}            137.035999139(31)
Higgs (2)    G_F                  1.1663787(6) × 10⁻⁵ GeV⁻²
             m_h                  125.1 ± 0.14 GeV
Quarks (3)   m_u (2 GeV)          2.16 +0.49/−0.26 MeV
             m_d (2 GeV)          4.67 +0.48/−0.17 MeV
             m_s (2 GeV)          93 +11/−5 MeV
             m_c                  1.27 ± 0.02 GeV
             m_b                  4.18 +0.4/−0.3 GeV
             m_t                  172.9 ± 0.4 GeV
Lepton       m_e                  0.5109989461(31) MeV
             m_µ                  105.6583745(24) MeV
             m_τ                  1776.86 ± 0.12 MeV
CKM (4)      λ                    0.22453 ± 0.00044
             A                    0.811 ± 0.026
             ρ̄                    0.124 +0.019/−0.018
             η̄                    0.356 ± 0.011

Table 1.2: Experimental values of 18 independent SM parameters [6]. 1. sin²θ_W is used to obtain the weak coupling; value given in the MS̄ scheme. 2. The Higgs VEV, v = (√2 G_F)^{−1/2} ≈ 246.22 GeV. 3. The u, d, and s masses are given in the MS̄ scheme. 4. The CKM parameters can alternately be given in terms of three mixing angles and a phase.

+ many, many more....

The above issues are not by any means the only issues with the SM, and their relevance may be reordered depending on an individual’s particular areas of interest and threshold for concern. However, attempting a more comprehensive list of issues and questions that are not explained by the SM is well outside the scope of this thesis.

1.3 Higgs Physics at the Large Hadron Collider

Studying and measuring the properties of the resonance found by the LHC in 2012 at ∼125 GeV to ensure that it is, in fact, the Higgs boson predicted by the SM is extremely important. The Higgs boson is the final piece of the SM puzzle, and new results for coupling measurements and other Higgs properties will be crucial to our understanding of fundamental physics going forward. Thus any program to measure the Higgs properties will result in relevant, critically important new knowledge, even if that new knowledge is not “new physics”. This makes the Higgs a very appealing candidate for research for the practically minded physicist even if no deviations from the SM are found.

Of course, the excitement that most theorists crave is contingent upon such Higgs measurements finding something inconsistent with the SM Higgs. The Higgs delivers significant potential to this “new physics”-centered audience as well; it is a very promising candidate for BSM physics. The Higgs couples to nearly every particle in the Standard Model⁷, making it quite reasonable to expect that if any SM particle couples to BSM particles, there is a good chance it will be the Higgs. BSM Higgs physics also offers a broad range of options to model builders, and can stand alone or be embedded into other BSM theories (most notably, perhaps, Supersymmetry). The Higgs can be produced and studied in a number of different particle collider scenarios, making it a more flexible and accessible mediator than many other BSM portal candidates as far as phenomenological testing of BSM Higgs theories. Finally, in the absence of any evidence that conclusively contradicts the SM, it is difficult to determine which, if any, current BSM models are truly well motivated. Any deviation of the Higgs properties from the SM Higgs will give us a well-defined trajectory to follow in the search for BSM physics, providing much needed focus for our efforts.

It is also interesting to note that 15 of the 18 “arbitrary” parameters in Table 1.2 are related to the Higgs boson: the nine fermion masses, the Higgs mass (alternately, the Higgs constant λ), G_F (directly related to the Higgs VEV), and the four CKM matrix inputs. The fact that so many of the parameters of unknown origin in the SM that seem to be begging for a “new physics” explanation are directly related to the Higgs suggests that “new physics”, whenever it is discovered, has a very good chance of also being related to the Higgs. Thus systematically measuring and cataloguing the properties of the Higgs boson is a task of great importance not just for our current understanding contained within the SM, but also as an important step in either finding or ruling out any potential Higgs interaction with BSM particles. At the present time, this task largely depends on searches conducted at the Large Hadron Collider.

⁷The Higgs does not couple to massless particles, most notably photons and gluons. Although we now know that at least two neutrino generations must have mass, the SM does not include the neutrino SU(2)_L singlet ν̄, and therefore in the SM the Higgs also does not couple to neutrinos.

Below I will briefly review the LHC and Higgs production and decay at the LHC. I will also present a selection of relevant SM and BSM Higgs topics that are guiding these search efforts.

1.3.1 Overview of the Large Hadron Collider

The Large Hadron Collider was built with the search for the Higgs boson as one of the primary objectives. After the discovery [7, 8] of the Higgs in 2012, the LHC remains the primary tool to directly produce and study Higgs bosons⁸. The LHC is a pp collider designed to operate at a center-of-mass energy of 14 TeV. The two main experiments are ATLAS and CMS. The specific features of each determine what types of signatures can be detected and at what resolution; however, the general layout is the same for most particle detectors.

Because the LHC collides two proton beams, there is a significant QCD background to almost every process. Initial states from each collision can only be estimated using parton distribution functions (PDFs), so the accuracy of any signal detection is dependent upon the uncertainties of the PDF used. Most of the heavier particles produced at the LHC decay to light quarks (u, d, s, and sometimes c), so searching for a specific light quark signal is quite a challenge. The challenge of sorting out various quark processes is only made more difficult by hadronization. Quarks and gluons quickly hadronize⁹, forming jets that appear in the detector as a shower of composite hadrons rather than as distinct particles. In high energy jets, the separation between the individual particles or hadrons is often too small to be resolved by the calorimeters, thus particle showers are simply designated as “jets” in the data for a particle collision event. Various jet reconstruction techniques of varying degrees of accuracy are used to try to determine the likely source of a given jet or group of jets.

Hadrons containing b quarks are relatively long-lived, and their relatively high masses mean the decay products come off at large angles from the original hadron. The experimental signature for a b-quark is therefore the original jet from the primary vertex, followed by a secondary vertex from the b-decay that is displaced by several millimeters. This allows us to “b-tag” a jet that is likely the result of a bottom quark decay with about 70% accuracy [9].

Electrons, photons, and muons have significantly less participation in the QCD soup described above; they generally give clear signals and are easy to detect. Taus usually decay prior to reaching the detector and can decay into leptons or hadrons; identification precision depends on the decay mode. Electroweak decays to lighter leptons similarly provide

⁸Higgs bosons have also been produced and studied to a limited extent at the Tevatron [10]. ⁹The top quark generally decays before hadronizing; it is often treated differently than other quarks for this reason.

reasonable resolution for the Z and W± bosons. Any particle from an LHC event that is seen by the detector can be categorized as a photon, lepton (electrons, muons, and taus can be individually identified), or jet, with some jets additionally being b-tagged with reasonable accuracy. Measurable characteristics for each of these particles include η¹⁰, φ, and transverse momentum, among others. In addition, missing energy is calculated as an event product. A standard format, called “LHCO format”, exists [11] for listing the properties of particles and missing energy emitted from a given event as seen by the detector. This format facilitates data sharing and compatibility across various simulation and analysis platforms.
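To make this concrete, a minimal sketch of reading detector objects from an LHCO-format file is shown below. The column layout follows the standard LHCO convention (type codes 0 = photon, 1 = electron, 2 = muon, 3 = tau, 4 = jet, 6 = missing energy); the file name and the particular fields extracted here are illustrative choices.

```python
# Minimal LHCO reader sketch; the file name "events.lhco" is hypothetical.
TYPE_NAMES = {0: "photon", 1: "electron", 2: "muon", 3: "tau", 4: "jet", 6: "MET"}

def read_lhco(path):
    """Yield one event at a time as a list of object dictionaries."""
    event = []
    with open(path) as f:
        for line in f:
            cols = line.split()
            if not cols or cols[0] == "#":   # skip blank lines and the header comment
                continue
            if cols[0] == "0":               # object counter 0 marks a new event
                if event:
                    yield event
                event = []
                continue
            event.append({
                "type": TYPE_NAMES.get(int(cols[1]), "unknown"),
                "eta": float(cols[2]),       # pseudorapidity
                "phi": float(cols[3]),       # azimuthal angle
                "pt": float(cols[4]),        # transverse momentum [GeV]
                "btag": float(cols[7]) > 0,  # nonzero b-tag flag (jets only)
            })
    if event:
        yield event

# Example: count b-tagged jets in each event.
for ev in read_lhco("events.lhco"):
    n_btag = sum(1 for obj in ev if obj["type"] == "jet" and obj["btag"])
```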

Higgs Production

The dominant Higgs production channel at the LHC is gg → h via top quark loops, as shown in Fig. 1.1a. Gluon-gluon fusion accounts for ∼85% of Higgs production at the LHC [2]. An effective Lagrangian can be constructed for the hgg coupling that is especially useful given that gluon fusion is the dominant production mode at the LHC:

    L^eff_hgg = (g α_s N_g / 24π M_W) h G^a_{µν} G^{µν,a}    (1.49)

eff gαsNg 0 a µν,a hgg = h GµνG (1.49) L 24πMW where Ng is the number of heavy quarks with mQ > mh running in the loop. ≈ The secondary Higgs production channel, called Vector Boson Fusion (VBF), is via fusion of Z or W ± bosons, and is shown in Fig. 1.1b. The cross section for this channel is about one tenth the size of the cross section for the gluon fusion channel and has a distinctive signature: the two incoming quarks are scattered at a small angle, producing two energetic jets at high pseudorapidity moving with opposite trajectories. Such jets are referred to as forward jets. Two other smaller, but useful channels associated production of a Higgs along with a vector boson and associated production of a Higgs along with a tt¯ pair (Fig. 1.1c and Fig. 1.1d). These two processes provide an opportunity to study the Higgs coupling to the vector bosons or the top quark, respectively, directly from tree level processes. Various Higgs production rates are shown in Table 1.3. The total production of Higgs bosons at the LHC is incredibly small compared to the large QCD background of quarks and gluons, making measurement of its properties difficult.

¹⁰Pseudorapidity is defined η = −ln[tan(θ/2)]. Pseudorapidity depends only on the polar angle of the particle's trajectory relative to the beam, not its energy.


Figure 1.1: The primary Higgs production channels at the LHC. (a) The largest production channel for the Higgs boson at the LHC, gluon fusion. (b) The second largest production channel, vector boson fusion. (c) Associated production with a vector boson. (d) Associated production with a tt̄ pair.

Production cross section [pb]

process              √s = 13 TeV       √s = 14 TeV
gluon fusion         48.6 +5%/−5%      54.7 +5%/−5%
VBF                  3.78 +2%/−2%      4.28 +2%/−2%
WH assoc. prod.      1.37 +2%/−2%      1.51 +2%/−2%
ZH assoc. prod.      0.88 +5%/−5%      0.99 +5%/−5%
tt̄ assoc. prod.      0.51 +9%/−13%     0.60 +9%/−13%
Total                55.1              62.1

Table 1.3: Higgs production cross sections at the LHC [6]

Higgs Decay

In the SM, the Higgs couples at tree level to all massive fermions and gauge fields. Therefore, at the LHC, the Higgs can decay to f f̄ for all kinematically allowed decay modes (m_h > 2m_product). Since the Higgs mass, ≈125 GeV, is less than 2m_V for both the W± and the Z, the only way for the Higgs to decay to the massive vector bosons is with at least one of the products off its mass shell. The partial width to any fermion channel is

Γ(h → f f̄) = (N_c g² m_f² / 32π m_W²) β³ m_h,   (1.50)

where N_c is 1 for leptons and 3 for quarks, and β = √(1 − 4m_f²/m_h²). The partial width is therefore proportional to the square of the fermion mass, and branching ratios for the various kinematically possible decays favor the heavier particles. At m_h ≈ 125 GeV, bb̄ has the highest branching ratio, as shown in Table 1.4. The bb̄ branching ratio has a significant effect on the other branching ratios because it makes up such a large proportion of the Higgs total width. Thus uncertainties in the mass of the bottom quark and the value of α_s affect not only the bb̄ branching ratio but all other Higgs branching ratios as well.

The massless γ and g do not couple directly to the Higgs, but due to their couplings to other massive particles, they have 1-loop decays as shown in Fig. 1.2. The top quark loop is the dominant quark loop in these decays, although the same loop with a bottom quark contributes a small amount. The h → γγ contributions from the top and W loops interfere destructively, reducing the partial width by about a third.

The total width of the 125 GeV SM Higgs boson is Γ_H ≈ 4.07 × 10⁻³ GeV [6].
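The N_c m_f² β³ scaling of Eq. 1.50 can be checked numerically. In the sketch below the running quark masses evaluated near m_h (2.8 GeV and 0.62 GeV) are illustrative assumptions, and the QCD corrections that enter a full calculation are neglected:

    import math

    m_h = 125.0  # Higgs mass [GeV]

    def width_scaling(m_f, N_c):
        """Relative h -> f fbar width from Eq. 1.50: proportional to N_c * m_f^2 * beta^3."""
        beta = math.sqrt(1.0 - 4.0 * m_f**2 / m_h**2)
        return N_c * m_f**2 * beta**3

    # Illustrative running masses near m_h (assumed values, not fits):
    gamma_bb = width_scaling(2.8, 3)        # bottom quark
    gamma_cc = width_scaling(0.62, 3)       # charm quark
    gamma_tautau = width_scaling(1.78, 1)   # tau lepton

    print(gamma_bb / gamma_cc)      # ~20, close to the BR(bb)/BR(cc) ratio in Table 1.4
    print(gamma_bb / gamma_tautau)  # ~7, a leading-order estimate of BR(bb)/BR(tautau)

The remaining differences from the ratios in Table 1.4 come from the higher-order corrections that this leading-order estimate omits.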

process        BR        process                   BR
H → bb̄         58.4%     H → ZZ∗                   2.62%
H → WW∗        21.4%     H → γγ♦                   0.227%
H → gg♦        8.6%      H → Zγ                    0.153%
H → τ⁺τ⁻       6.27%     H → qq̄, q ∈ {u, d, s}     <0.03%
H → cc̄         2.9%

Table 1.4: Higgs branching ratios at m_h ≈ 125 GeV. A (∗) indicates that one of the bosons is produced off mass shell with q² < 4m_V². A (♦) indicates a loop diagram: the massless γ and g couple to the Higgs via the 1-loop processes shown in Fig. 1.2.

Branching ratios for the Higgs decays can be found in Table 1.4.

As mentioned, the LHC cross section for pp → qq is extremely large compared to the Higgs production cross section, so the QCD background to any Higgs process at the LHC is very high. Thus, the most sensitive channels for Higgs searches at the LHC are Higgs decays to leptons (electrons, muons, and their neutrinos), such as H → γγ, H → ZZ∗ → 4ℓ, or H → WW∗ → ℓνℓν. These processes have signatures that can be distinguished from the QCD background with relative ease, despite having much lower cross sections. In 2012, the Higgs was in fact discovered in the H → γγ and H → ZZ∗ → 4ℓ channels by ATLAS [7] and CMS [8].


Figure 1.2: Higgs to massless gauge bosons via heavy intermediate particles.

SM Higgs Couplings: Measurements and Searches

At this time we are fairly certain that the particle found at ≈125 GeV is in fact a scalar boson [12]. Other measurements underway indicate that it behaves like the Higgs boson more or less as predicted by the SM. However, many of the Higgs couplings have not been measured to much precision, or have not been measured at all. In the SM, the Higgs couplings are firmly fixed once the masses of the vector bosons and fermions are known. Thus measuring the Higgs couplings will either validate the SM, if they are found to be in agreement, or, if deviations are found, indicate the presence of BSM physics. However, large QCD backgrounds severely limit the sensitivity of the LHC to Higgs couplings.

Higgs couplings can be broadly categorized into Higgs self-couplings, Higgs couplings to

vector bosons, and Higgs couplings to fermions. The latter two are by far the best measured.

Higgs Couplings to Vector Bosons and Fermions

A useful way to parameterize the Higgs couplings to vector bosons and fermions is the κ-framework [13, 58]. This framework uses two scaling factors, one for vector-Higgs couplings and the other for fermion-Higgs couplings. One factor of κ is multiplied in for each type of Higgs coupling in both the production and decay modes, and this product is then divided by κ_H, the Higgs total width modifier, which is assumed to have no contributing processes outside of the SM. As an example, if a Higgs is produced via top quarks and decays to two W bosons, the κ scaling would be:

σ·B(tt̄H → WW) = (κ_f² κ_V² / κ_H²) σ_SM·B_SM(tt̄H → WW)   (1.51)

This parameterization is simple enough to be widely used; however, it assumes that all fermions share one scaling factor and all vector bosons another. This may be subject to scrutiny as more Higgs data becomes available, and other schemes using modifiers for subgroups of fermions and bosons, or for each individual particle, have been proposed and used as well.
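To make the bookkeeping concrete, the sketch below evaluates the signal-strength rescaling of Eq. 1.51 in Python. Approximating κ_H² as a branching-ratio-weighted sum of the two coupling modifiers, with an illustrative 75%/25% fermionic/bosonic split, is a simplifying assumption; a full treatment sums over all decay channels with their SM branching ratios.

    # Hedged sketch of the kappa-framework rescaling in Eq. 1.51 for
    # ttH production followed by h -> WW.
    def mu_ttH_WW(kappa_f, kappa_V, br_fermionic=0.75, br_bosonic=0.25):
        # Total-width modifier: each partial width scales as its coupling squared.
        kappa_H_sq = br_fermionic * kappa_f**2 + br_bosonic * kappa_V**2
        # Production scales as kappa_f^2 (top coupling), decay as kappa_V^2.
        return kappa_f**2 * kappa_V**2 / kappa_H_sq

    print(mu_ttH_WW(1.0, 1.0))  # SM point: signal strength mu = 1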

Current best fits [14] for κ_f and κ_V are:

κ_f = 1.52 +0.48/−0.41 ,   κ_V = 1.10 +0.08/−0.08 .   (1.52)

Direct Higgs couplings to the tau, top quark, and bottom quark have now been observed [15, 16, 17, 18, 19], with all three couplings so far in agreement with the SM prediction. The recent measurement of the Higgs decay to a W± pair [14] is also consistent with the SM within uncertainties.

Couplings to the lighter fermions remain a challenge. Because the coupling is proportional to the fermion mass, the cross sections for these decays are quite small. With the additional challenge posed by the QCD background at the LHC, no direct observation of their couplings has yet been possible.

Higgs Self-Couplings

Finally, the Higgs self-couplings are extremely important in the characterization of the Higgs potential. They also represent the most promising chance of finding deviations from the SM. Measuring the Higgs self-couplings at the LHC continues to pose a challenge. The trilinear self-coupling could be measured via Higgs boson pair production. The background for this process, Higgs pair production via heavy quarks, interferes destructively with Higgs pair production via the trilinear coupling (Fig. 1.3).


Figure 1.3: Higgs pair production at the LHC.

The inclusive cross section for Higgs pair production at the √s = 13 TeV LHC is only about 40 fb. Projections of HL-LHC sensitivity to overall Higgs pair production via various decay modes have been made by both ATLAS and CMS, with the highest sensitivity through HH → bb̄γγ. ATLAS and CMS project sensitivities in this channel of 1.3σ and 1.6σ, respectively [6]. Measurement of the quartic Higgs coupling at the LHC has essentially been deemed impossible, even with the high luminosity upgrade.

Supersymmetry

The instability of the Higgs boson discussed in Section 1.2 is problematic not only for the SM, but for any BSM theory, as any additional fermions added to the model that couple even indirectly to the Higgs will also add mass corrections. One method of solving the hierarchy problem, Supersymmetry (SUSY), was first proposed¹¹ in 1977 [20] and has since become a primary focus of BSM theoretical work due to its mathematical beauty and utility.

Recall that the hierarchy problem is caused by fermion loops that contribute runaway mass corrections. Specifically, these mass corrections are given by:

Δm_h² = −(λ_f²/8π²) [Λ_UV² + 3m_f² ln((Λ_UV² + m_f²)/m_f²) + ...]   (1.53)

where λ_f is the Yukawa coupling from the SM Lagrangian, y_f = √2 m_f/v.

Now, compare this to the mass corrections the Higgs would get from another elementary

¹¹It was originally proposed as an interesting mathematical theory, not as a solution to the hierarchy problem. However, the hierarchy problem has become the primary motivator, making Supersymmetry the leading BSM theory.


Figure 1.4: 1-loop corrections to the Higgs mass. (a) The fermion correction to the Higgs mass given by Eq. 1.53. (b) & (c) The scalar corrections to the Higgs mass given by Eq. 1.54.

scalar[21]:

Δm_h² = (λ_S²/16π²) [2Λ_UV² − 2m_S² ln((Λ_UV² + m_S²)/m_S²) + ...]
Δm_h² = −(λ_S²/16π²) [2m_S² ln((Λ_UV² + m_S²)/m_S²) + ...]   (1.54)

The Feynman diagrams for these corrections, as well as the fermion corrections, are depicted in Fig. 1.4.

It is apparent that if the mass of the scalar is equal to the mass of the fermion, and the Yukawa term for the scalar is constructed the same way (y_S = √2 m_S/v), the fermion correction and the scalar corrections will cancel exactly, thus stabilizing the Higgs mass¹². The basic tenet of SUSY, then, is that some operator, Q, generates a supersymmetric transformation that turns a fermionic state into a bosonic state, and vice versa:

Q|Boson⟩ = |Fermion⟩ ,   Q|Fermion⟩ = |Boson⟩   (1.55)

The fermionic and bosonic states are contained within irreducible representations of the

¹²Breaking this symmetry so that the SM fermions and SUSY partners do not have equal masses is covered in Sec. 1.4.2.

SUSY algebra called supermultiplets. Each supermultiplet has equal fermionic and bosonic degrees of freedom, leading to systematic cancellations that solve the hierarchy problem. The fields within these supermultiplets are called "superfields". Bosons commute and fermions anti-commute, so the superalgebra used for superfields must accommodate this. This is accomplished using Grassman numbers as "bookkeeping" tools. Since Grassman numbers have the property θ_i · θ_i = 0, a superfield expanded in two-component Grassman spinors necessarily terminates after a finite number of terms. The most general superfield [4] with equal bosonic and fermionic degrees of freedom is

S(x, θ, θ†) = a + θξ + θ†χ† + θθ b + θ†θ† c + θ†θ†θη + θθθ†ζ† + θθθ†θ† d + θ†σ̄^μθ v_μ   (1.56)

where a, b, c, d, and v_μ are bosonic fields and ξ, χ, η, and ζ are two-component fermionic fields. All of the fields are functions of the spacetime coordinates x^μ. Each Grassman spinor θ is a two-component spinor with Grassman numbers as components. The use of Grassman numbers leads us to expand our normal spacetime dimensions to an eight-dimensional "superspace" where, instead of Lorentz invariance over spacetime, we now have invariance under supersymmetry¹³:

δA = 0 ,  for A = ∫ d⁴x ∫ d²θ d²θ† S   (1.57)

The Lagrangian density is obtained by integrating just over the Grassman numbers, leaving only the expression under the normal ∫ d⁴x spacetime integral. Note that the integral over a single θ or θ† is zero, and the integral over any two Grassman spinors is just 1. Any term that includes two θ's or two θ†'s is essentially the coefficient of the θ's under the superspace integral, and is therefore "extracted" by the integration over the Grassman spinors. We also have a set of superspace coordinates, (y^μ, θ, θ†), with

y^μ = x^μ + iθ†σ^μθ.   (1.58)
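For reference, the Grassman integration rules invoked above take the following standard form (normalization conventions vary between authors; these follow the common choice):

∫ dθ 1 = 0 ,   ∫ dθ θ = 1 ,   ∫ d²θ θθ = 1 ,   ∫ d²θ† θ†θ† = 1 ,

so that ∫ d²θ d²θ† (θθ)(θ†θ†) = 1, while any term lacking a full complement of θ's integrates to zero.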

Supersymmetric theories contain two types of superfield: chiral superfields and vector superfields. Each field contains a SM fermion or boson and its supersymmetric partner. Thus in a supersymmetric theory we replace every SM chiral (fermion) field with a chiral superfield and each SM vector field with a vector superfield. Constructing chiral and vector superfields imposes constraints on the field content, reducing the number of terms in each SUSY superfield.

¹³δA = 0 up to total derivative terms; the Lagrangian can change by surface terms and still leave the Euler-Lagrange equations unchanged.

Chiral Superfields

Most supersymmetric theories are concerned with left-chiral superfields. A left-chiral superfield is a superfield S such that D†_α̇ S = 0, where D†_α̇ is the left-chiral supersymmetric covariant derivative¹⁴ given by

D†_α̇ = −∂/∂θ†^α̇ + 2i (θσ^μ)_α̇ ∂/∂y^μ∗   (1.59)

The chiral covariant derivative is itself an anti-commuting two-component object, like θ. Once we impose D†_α̇ S = 0 on the general superfield S, we get the left-chiral superfield:

S → Φ = φ + √2 θψ + θθF,   (1.60)

where ψ is a spin-1/2 Weyl fermion, φ is a spin-0 scalar spartner, and F is an auxiliary scalar field. All fields are now given as functions of the superspace coordinates y^μ. We note that any combination of left-chiral superfields returns a left-chiral superfield.

The auxiliary field balances the number of bosonic and fermionic degrees of freedom. The Weyl fermions in the chiral superfield are complex two-component spinors; they each have four degrees of freedom when off-shell and two degrees of freedom when on-shell¹⁵. The bosons in the chiral superfields are complex and have only two degrees of freedom on- or off-shell. Thus the chiral auxiliary field F must have two degrees of freedom when off-shell and zero degrees of freedom when on-shell. The gauge invariant term in the Lagrangian for the F field is just FF†. The fields in the chiral superfields have the following supersymmetric transformations:

δφ = εψ
δψ = −i(σ^μ ε†) ∂_μφ + εF   (1.61)
δF = −i ε† σ̄^μ ∂_μψ

Finally, we can obtain the kinetic terms for the chiral superfields by Taylor expanding in the Grassman variables in terms of x^μ:

Φ(y) = φ(x) + iθ†σ̄^μθ ∂_μφ(x) + (1/4) θθθ†θ† ∂_μ∂^μφ(x)
     + √2 θψ(x) − (i/√2) θθ θ†σ̄^μ ∂_μψ(x) + θθ F(x)   (1.62)

Taking the expansion of Φ∗Φ will give the derivatives we need for kinetic terms for the

¹⁴There are four chiral covariant derivatives: D_α, D^α, D†_α̇, and D†^α̇. Imposing D_α S = 0 gives right-chiral superfields, Φ∗.
¹⁵The reduced number of degrees of freedom on-shell is due to constraints imposed by the equations of motion.

spinors. The eventual kinetic terms for the chiral superfields are

L_kin = ∂_μφ∗ ∂^μφ + iψ† σ̄^μ ∂_μψ + F†F   (1.63)

The terms from Eq. 1.62 required to build these kinetic terms all include a factor of θθ, and when multiplied by Φ∗, they gain another factor of θ†θ†, taking a similar form as the auxiliary field D in the vector superfield. Therefore they are called "D-terms", and taking the integral over the Grassman spinors "extracts" the coefficients of the D-terms:

∫ d²θ d²θ† Φ∗Φ = [Φ∗Φ]_D = ∂_μφ∗ ∂^μφ + iψ† σ̄^μ ∂_μψ + F†F   (1.64)

where [ ]_D is common notation to denote the extraction of the D-terms.

Vector Superfields

A vector, or real, superfield is defined by S = S†. Since the vector field contains the gauge bosons, a choice of gauge can further constrain the included terms. In the Wess-Zumino gauge, the vector superfield contains a spin-1 boson A_μ, a spin-1/2 two-component fermion λ, and an auxiliary scalar field D:

V = θσ^μθ̄ A_μ + iθθθ̄λ̄ − iθ̄θ̄θλ + (1/2) θθθ̄θ̄ D,   (1.65)

where again all fields are functions of the superspace coordinates y^μ.

The auxiliary field again balances the number of bosonic and fermionic degrees of freedom. The bosons in the vector superfield have three degrees of freedom when off-shell and two degrees of freedom on-shell. The Weyl fermions λ still have four degrees of freedom when off-shell and two degrees of freedom when on-shell. Thus the vector auxiliary field D must have one degree of freedom when off-shell and zero degrees of freedom when on-shell. This is accomplished by not giving it a kinetic term in the Lagrangian; it instead has an invariant term (1/2)D² in the Lagrangian. The auxiliary field D is real: D∗ = D.

In keeping with the basic premise of supersymmetry given in Eq. 1.55, the vector superfield transforms under the following supersymmetry transformations (again given in the Wess-Zumino gauge):

δA_μ = −(1/√2) (ε†σ̄_μλ + λ†σ̄_με)
δλ = (i/2√2) (σ^μσ̄^ν ε) F_μν + (1/√2) ε D   (1.66)
δD = (i/√2) (ε†σ̄^μ ∇_μλ + ∇_μλ† σ̄^με)

where ∇ is the gauge covariant derivative:

∇_μλ^a = ∂_μλ^a + g f^abc A^b_μ λ^c   (1.67)

where a, b, and c are gauge group indices that run over the adjoint representation of the gauge groups (SU(3)_C × SU(2)_L × U(1)_Y as in the SM). The vector superfield also has gauge transformations:

A^a_μ → A^a_μ + ∂_μΛ^a + g f^abc A^b_μ Λ^c
λ^a → λ^a + g f^abc λ^b Λ^c   (1.68)
D^a → D^a + g f^abc D^b Λ^c

Supersymmetric transformations and gauge transformations commute. As in the SM, the vector superfields mediate gauge interactions via the gauge covariant derivatives, which replace the regular derivatives in the kinetic terms of the chiral fields (Eq. 1.63) as well as in the supersymmetric transformations (Eq. 1.61).

∇_μφ = ∂_μφ − ig A^a_μ (T^a φ)
∇_μφ∗ = ∂_μφ∗ + ig A^a_μ (φ∗ T^a)   (1.69)
∇_μψ = ∂_μψ − ig A^a_μ (T^a ψ)

In addition to using the gauge covariant derivatives, the supersymmetric transformation for the F field gains a term √2 g (T^a φ)† λ†^a. The Lagrangian for the vector supermultiplets is

L_gauge = −(1/4) F_μν F^μν + iλ† σ̄^μ ∇_μλ + (1/2) D²   (1.70)

F_μν is the component field strength for A_μ and has the same form as the field strength tensors in the SM:

F^a_μν = ∂_μA^a_ν − ∂_νA^a_μ + g f^abc A^b_μ A^c_ν   (1.71)

Supersymmetric Gauge Interactions

The component field strength for the vector field A_μ is given in Eq. 1.71. A field strength for the entire vector superfield can also be constructed using the same chiral covariant derivative from Eq. 1.59:

W_α = D† D† D_α V
   = λ_α + θ_α D + (i/2) (σ^μ σ̄^ν θ)_α F_μν + iθθ (σ^μ ∂_μλ†)_α   (1.72)

where we can see that W_α is a left-chiral superfield.

We can then take the integral of the object W^α W_α in superspace:

∫ d²θ W^α W_α = [W^α W_α]_F = D² + 2iλσ^μ ∂_μλ† − (1/2) F^μν F_μν + (i/4) ε^μνρσ F_μν F_ρσ   (1.73)

where all the fields, after taking the integral over d²θ, are functions of x^μ. We can define an adjoint representation of W:

W_α = 2 g_a T^a W^a_α   (1.74)

Then all the gauge kinetic and self-interaction terms can be obtained from

[W^aα W^a_α]_F = (1 / 4k_a g_a²) Tr[W^α W_α]_F   (1.75)

where k_a is a normalization constant.

The Superpotential

To create non-gauge interaction terms, we first look for renormalizable functions of φ and ψ (the scalar and fermionic terms in the chiral superfields) that are invariant under the SUSY transformations given in Eq. 1.61. There are only two candidates:

L_int = a W(φ, φ∗)^ij ψ_iψ_j + b W(φ, φ∗)^i F_i + h.c.   (1.76)

where W(φ, φ∗)^ij is a function of degree 1 in {φ, φ∗}, W(φ, φ∗)^i is a function of degree 2 in {φ, φ∗}, and a and b are just constants.

These terms must also be invariant under the action given in Eq. 1.57. Taking the variation using Eq. 1.61, we get

0 = δL_int = a δW^ij ψ_iψ_j + b δW^i F_i
  = a [ (δW^ij/δφ_k)(εψ_k) ψ_iψ_j + (δW^ij/δφ∗_k)(δφ∗^k) ψ_iψ_j + W^ij δ(ψ_iψ_j) ]   (1.77)
  + b [ (δW^i/δφ_k)(εψ_k) F_i + (δW^i/δφ∗_k)(δφ∗^k) F_i − i W^i ε†σ̄^μ ∂_μψ_i ] + h.c.

Each term must vanish or cancel with one of the other terms. The first term will vanish under the Fierz identity¹⁶, but the second will not. Thus, W(φ, φ∗)^ij is not SUSY invariant unless it does not contain any φ∗'s; it must be holomorphic (complex analytic). In general, any left-chiral superfield must be holomorphic in order for it to participate in non-gauge interactions. This means the fifth term in Eq. 1.77 also must be zero.

¹⁶(εψ_i)(ψ_jψ_k) + (εψ_j)(ψ_kψ_i) + (εψ_k)(ψ_iψ_j) = 0

Expanding and simplifying the third term in Eq. 1.77 gives

W^ij δ(ψ_iψ_j) = 2i W^ij ε†σ̄^μ ψ_j ∂_μφ_i + 2 W^ij ε F_i ψ_j   (1.78)

Notice that the second term here would cancel the fourth term in Eq. 1.77 if we impose

b ∂W^i/∂φ_j = 2a W^ij   (1.79)

This leaves us with only two terms remaining to cancel:

0 =? W^ij ε†σ̄^μ ψ_i ∂_μφ_j − W^i ε†σ̄^μ ∂_μψ_i   (1.80)

Plugging in Eq. 1.79, we see that this does indeed cancel. Finally, we look for the most general holomorphic form of W^ij:

W(φ)^ij ψ_iψ_j = M^ij ψ_iψ_j + y^ijk φ_k ψ_iψ_j   (1.81)

where M^ij and y^ijk are scalar matrices that correspond to the mass and Yukawa matrices for the ψ_iψ_j and φ_kψ_iψ_j interactions, respectively. The Fierz identity used above requires ψ_iψ_j to be symmetric under interchange of i and j, so M^ij and y^ijk must also be symmetric. We note that W^ij is automatically symmetric if we define it as the second derivative of another function of scalar fields:

W^ij ≡ ∂²W / ∂φ_i ∂φ_j   (1.82)

Again using Eq. 1.79, this gives

W^i = ∂W / ∂φ_i   (1.83)

W is then given by

W = L^i φ_i + (1/2) M^ij φ_i φ_j + (1/6) y^ijk φ_i φ_j φ_k   (1.84)

where W is called the "superpotential": a holomorphic function of scalar fields that contains only left-chiral superfields. The interaction Lagrangian Eq. 1.76 is then given by

L_int = (∂W/∂φ) F − (1/2)(∂²W/∂φ²) ψψ + h.c.   (1.85)

Notice that this Lagrangian contains an F in one term and a ψψ in the other. The superfields that these terms came from had Grassman spinors as coefficients: one θ for each ψ and two θ's for the F. Thus the interaction Lagrangian reflects the extraction of the F-terms in the superpotential.

The F term in Eq. 1.85 cancels with the F†F term in L_kin (Eq. 1.63), giving an equation of motion F_i = −W∗_i. Since W∗_i is a function of scalar fields, this means the F term can also be written in terms of scalar fields.

While the superpotential contains terms involving non-gauge interactions between chiral supermultiplets, we can also add terms that allow the gauge fermionic (gaugino) fields λ and gauge auxiliary fields D to interact with the chiral fields:

L_gauge−chiral = −√2 g (φ∗ T^a ψ) λ^a − √2 g λ†^a (ψ† T^a φ) + g (φ∗ T^a φ) D^a   (1.86)

The final term, g(φ∗T^aφ)D^a, cancels with the (1/2)D² term in L_gauge, giving an equation of motion D^a = −g(φ∗T^aφ), so that D can also be expressed in terms of the scalar fields. Since both the vector and chiral auxiliary fields can be expressed in terms of the scalar fields, the scalar potential V(φ, φ∗) in SUSY is set by the gauge and chiral interactions and is given by

V(φ, φ∗) = F∗^i F_i + (1/2) Σ_a D^a D^a = W∗^i W_i + (1/2) Σ_a g_a² (φ∗ T^a φ)².   (1.87)

The Supersymmetric Lagrangian

The most general renormalizable Lagrangian for a supersymmetric gauge theory in superspace is

L = (1/4 − i g_a²Θ_a/32π²) ([W^aα W^a_α]_F + c.c.) + [Φ∗^i (e^{2g_a T^a V})_i^j Φ_j]_D + ([W(Φ_i)]_F + c.c.)   (1.88)

The first term includes a CP-violating parameter Θ. The second term is the chiral superfield kinetic term from Eq. 1.64; the addition of the exponential makes the kinetic term

supergauge invariant. The final term, [W(Φ_i)]_F, contains the superpotential from Eq. 1.84. Writing the Lagrangian this way in superspace is a compact way to write out all the component terms given in Eqs. 1.63, 1.70, 1.75, 1.85, and 1.86.

The Minimal Supersymmetric Standard Model

The simplest version of a supersymmetric model that contains the Standard Model is known as the Minimal Supersymmetric Standard Model (MSSM), where "Minimal" means that the MSSM contains the minimum particle content necessary to make it a renormalizable, anomaly-free theory that is at least as successful as the SM. Although experiment has essentially ruled out this simplest version of SUSY at the weak scale, it is still used as a launching point for other SUSY extensions of the SM, as well as a common starting reference point in any SUSY discussion. The sparticle content of the MSSM is given in Table 1.5.

Category             superfield   particle (field type)                      sparticle (spartner)                      quantum #s
Vector Superfields   V_G          gluons g (spin-1)                          gluinos G̃ (spin-1/2)                      (8, 1, 0)
                     V_W          W bosons W                                 winos W̃                                   (1, 3, 0)
                     V_B          B boson B                                  binos B̃                                   (1, 1, 0)
Higgs Sector         H_d          down-type Higgs H_d = (φ⁰, φ⁻) (spin-0)    higgsinos H̃_d = (φ̃⁰, φ̃⁻) (spin-1/2)       (1, 2, −1)
                     H_u          up-type Higgs H_u = (φ⁺, φ⁰)               H̃_u = (φ̃⁺, φ̃⁰)                            (1, 2, 1)
Chiral Superfields   Q            quarks q = (u, d) (spin-1/2)               squarks q̃ = (ũ, d̃) (spin-0)               (3, 2, 1/3)
                     Ū            ū                                          ū̃                                         (3̄, 1, −4/3)
                     D̄            d̄                                          d̄̃                                         (3̄, 1, 2/3)
                     L            leptons ℓ = (ν, e)                         sleptons ℓ̃ = (ν̃, ẽ)                       (1, 2, −1)
                     Ē            ē                                          ē̃                                         (1, 1, 2)

Table 1.5: The MSSM particle content. As in the SM, there are three generations of both quarks and leptons. We again use the convention Q = T₃ + Y/2.

The supersymmetric partners of SM fermions are called sfermions, and the supersymmetric partners of the SM boson fields are called gauginos and higgsinos.

As in the SM, the eigenstates of the symmetry groups are not necessarily mass eigenstates. This allows mixing between the particles involved in EWSB. In the case of the MSSM, this means that the spin-1/2 electroweak states (winos, binos, and higgsinos) can all mix, as can the scalar Higgs and any sfermion partners that have the same charge after EWSB. The mass eigenstates of the electroweak gauginos are generically called weakinos, with the exact mixing depending on the parameters chosen for the model.

We also note that in addition to adding a supersymmetric partner to each SM fermion and boson field, we have added a second Higgs doublet. This is because the supersymmetric partners of the Higgs, the higgsinos, are spin-1/2 fermions and thus contribute to gauge anomalies¹⁷ and the SU(2)_L Witten anomaly.

In order for these anomalies to automatically cancel, we must add an even number of spin-1/2 fermions with net zero hypercharge to the theory; two higgsinos with opposite hypercharge fit this requirement minimally.

We also note that in supersymmetry, the Higgs is simply the spin-0 counterpart to a spin-1/2 fermion in a chiral supermultiplet, mathematically constructed just like the SM fermion left-chiral supermultiplets. Therefore, unlike the Higgs doublet in the SM, the Higgs doublet in SUSY must be holomorphic, which prevents us from using the Higgs conjugate to give mass to the up-type quarks. The second Higgs doublet also provides a solution to this problem by allowing one doublet to couple to the down-type quarks and the other doublet to couple to the up-type quarks. The two Higgs doublets in SUSY act as a Type II Two Higgs Doublet Model (Type II 2HDM) and still function in electroweak symmetry breaking. A detailed description of how this is done, or of the particular qualities of a Type II 2HDM, is beyond the scope of this thesis.

Now that we have the field content of the MSSM, we can construct the MSSM Lagrangian. Each vector superfield from Table 1.5 is inserted into the first (kinetic) term of Eq. 1.88. Each chiral superfield from Table 1.5 is inserted into the Kähler potential. Finally, the Yukawa terms of the SM become the MSSM superpotential:

W = y_u^ij Ū_i Q_j H_u + y_d^ij D̄_i Q_j H_d + y_ℓ^ij Ē_i L_j H_d + μ H_u H_d   (1.89)

Finally, the MSSM contains a key feature called R-parity, which is included in many supersymmetric models. R-parity prevents proton decay, an extremely undesirable feature of generic supersymmetric models, as proton decay has been essentially ruled out experimentally¹⁸. R-parity is defined as

R_p = (−1)^{3(B−L)+2s}   (1.90)

where B is the baryon number, L is the lepton number, and s is the spin of the component field. In addition to preventing proton decay, applying R-parity means that all SM particles have R_p = +1 while all sparticles have R_p = −1. This gives the additional benefit of creating a stable Lightest Supersymmetric Partner (LSP) that cannot decay solely to SM particles. It is this LSP that makes a natural dark matter candidate.

¹⁷U(1)_Y³ and U(1)_Y SU(2)_L²: in the SM these cancel via (2Y_L³ − Y_e³ − Y_ν³) + 3(2Y_Q³ − Y_u³ − Y_d³) = 0 and (Y_L + 3Y_Q) = 0.
¹⁸The MSSM without R-parity would see a proton half-life of 10³⁴ years; recent experiments put the proton half-life at > 10³⁴ years; other SUSY extensions predict a much shorter proton lifetime. Both of these are much greater than the age of the universe, ∼10¹⁰ years.
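As a quick illustration of Eq. 1.90, the toy function below (hypothetical, for illustration only) reproduces the R_p assignments quoted above:

    def r_parity(B, L, s):
        # R_p = (-1)^(3(B-L)+2s) from Eq. 1.90; the exponent is an
        # integer for any physical state, so we reduce it mod 2.
        exponent = round(3 * (B - L) + 2 * s)
        return 1 if exponent % 2 == 0 else -1

    # (B, L, s) for a few representative states:
    print(r_parity(1/3, 0, 1/2))  # quark              -> +1
    print(r_parity(1/3, 0, 0))    # squark             -> -1
    print(r_parity(0, 1, 1/2))    # electron           -> +1
    print(r_parity(0, 0, 1/2))    # neutralino/gluino  -> -1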

Breaking SUSY

The strongest motivation for SUSY, as a solution to the hierarchy problem, requires that the fermionic and scalar Higgs mass correction terms systematically cancel each other. In a perfect, unbroken symmetry, this cancellation depends on the particles and their superpartners having the same mass. This is problematic because experiment overwhelmingly rules out the possibility of most SUSY particles existing at the same mass as their SM counterparts. For example, we can be fairly certain that there is no negatively charged boson with the same mass as the electron.

Looking back at Eqs. 1.53 and 1.54, we notice that so long as λ_f = λ_S, the quadratic contributions to the Higgs mass still cancel exactly even if the scalar partners and the fermions do not have equal masses. Therefore, we can imagine a different structure for the scalar Yukawa coupling to the Higgs where λ_S = √2 m_f/v ≠ √2 m_S/v, maintaining the quadratic cancellations but allowing the scalars to have a different mass than the fermions¹⁹. The logarithmic corrections would not exactly cancel under this scheme, but those corrections are much smaller, leaving the total Higgs mass corrections proportional to the scale of the scalar masses. However, the scalars would then need a new way to obtain their additional mass that does not come from the Higgs mechanism.

Even within the SM there are symmetries that predict a set of particles to have equal mass until some mechanism breaks the symmetry. For example, the SM gauge bosons (Z, W±, and the photon) are all initially massless under SU(3)_C ⊗ SU(2)_L ⊗ U(1)_Y until the Higgs mechanism causes EWSB and the Z and W± gain masses that, within the overall mass range of the SM, are quite large (i.e., the mass difference between the Z and the photon is quite large on the scale of the symmetry breaking mechanism). It is therefore reasonable to look for a way to break SUSY so that the SM particles retain their experimentally measured masses, but their spartners all gain a larger mass that has made them, to date, undetectable.

There are two ways to break a symmetry: spontaneously and explicitly. Spontaneous symmetry breaking (SSB) is by far the preferred method, as it does not require the addition of terms that are not invariant under SUSY. There are two ways to spontaneously break SUSY, called "F-type" and "D-type". Theories featuring these types of SSB result in very light sfermions, such as a light selectron, which has been ruled out experimentally, and/or they require substantial modifications to the particle content and gauge symmetries of the MSSM. Therefore, the MSSM uses explicit SUSY breaking. The terms chosen to break SUSY must leave the cancellation of the quadratic divergences intact in order to retain SUSY as a solution to the hierarchy problem; that is, radiative corrections due to the supersymmetry breaking terms must be proportional to those terms. This is called "soft" SUSY breaking.

¹⁹The mechanism behind such a structure, and why it would be like that, is called the "little hierarchy problem".

The soft SUSY breaking terms of the MSSM are:

L_soft = −(1/2)(M₃ G̃G̃ + M₂ W̃W̃ + M₁ B̃B̃ + h.c.)
       − ([a_u]_ij q̃_i H_u ū̃_j + [a_d]_ij q̃_i H_d d̄̃_j + [a_e]_ij ℓ̃_i H_d ē̃_j + h.c.)   (1.91)
       − [m²_q̃]_ij q̃†_i q̃_j − [m²_ū̃]_ij ū̃†_i ū̃_j − [m²_d̄̃]_ij d̄̃†_i d̄̃_j − [m²_ℓ̃]_ij ℓ̃†_i ℓ̃_j − [m²_ē̃]_ij ē̃†_i ē̃_j
       − m²_Hu H_u† H_u − m²_Hd H_d† H_d − (Bμ H_u H_d + h.c.)

where the M_j for j = 1, 2, 3 are mass terms for the gauginos, the a_i for i = {u, d, e} are trilinear couplings, the m_k for k = {q̃, ū̃, d̄̃, ℓ̃, ē̃} are scalar mass terms, and the m_Hu,d are Higgs mass terms.

Soft SUSY breaking adds a plethora of new, arbitrary constants. This does not help contend with the argument that the high number of arbitrary constants already in the SM is unnatural, and that the ultimate BSM theory should have fewer constants. It is therefore worth noting that all of the dimensionless couplings and all but one mass term in the MSSM Lagrangian correspond to parameters in the SM that have already been measured by experiment [4]. Thus one of the primary criticisms of SUSY (other than the lack of experimental evidence to support it), that it adds too many arbitrary parameters, could be nothing more than a failure on our part to devise the correct SUSY breaking strategy.

Additional Problems solved by SUSY

Although most strongly motivated as a means of solving the hierarchy problem, SUSY also offers elegant solutions to some other problems with the SM. In particular, SUSY provides potential dark matter candidates and fulfills the dreams of many physicists by unifying the strong and electroweak forces.

Gauge Unification

Since the unification of electricity and magnetism into a single electromagnetic force was proposed by James Clerk Maxwell over a century ago, physicists have sought to unify all the forces of nature. The discovery of neutrinos offered a third force, the weak force, and electroweak theory unifying the weak and electromagnetic forces was validated first by the discovery of the Z and W bosons and completed with the discovery of the Higgs boson.

However, the strong interaction governing quark and gluon interactions in the SM operates independently of the electroweak interactions. Although it could be argued that this is "just the way nature is", physicists crave deeper symmetry and simplicity in whatever the final theory is, and they generally expect the strong interaction to unify with the electroweak interaction at higher energies. Extrapolations of the SM interactions to higher energies show that the electromagnetic and weak couplings do intersect, but there is a tantalizing "near miss" for the strong coupling to also intersect at that same point.

35 Figure 1.5: Gauge interaction “near miss” in the SM, left, and SUSY unification, right. The kink in the right graph shows where SUSY appears, altering the coupling strengths to bring them together. Image from LEP.

The temptation to find a way to tweak these couplings to get them all to unify at one energy is compelling, and this is indeed achievable in the MSSM, as shown in Fig. 1.5.

A Natural Dark Matter Candidate

One of the common features of supersymmetric theories is R-parity, a symmetry introduced to prevent proton decay. R-parity ensures that there exists an LSP that cannot decay to SM particles. This massive particle interacts with the SM only indirectly through gravity and sometimes weak interactions, making it a perfect DM candidate. The LSP can be any of a number of SUSY particles, depending on the specific model and parameters chosen. Such an LSP must be electrically neutral, or it would leave an obvious electromagnetic signal (and would not really be "dark" any more).

...Even Gravity?

The central tenet of Supersymmetry is the transformation that turns a fermion into a boson and vice versa (Eq. 1.55). Thus it should be obvious that one could do two successive transformations on a field, returning to the same field you started with. In SUSY we find that there is one twist to this seemingly simple proposition: the field returned is evaluated at a different coordinate in spacetime. Thus SUSY is profoundly connected with spacetime transformations. Imposing invariance under local supersymmetric transformations in fact gives rise to fields that reproduce general relativity. Thus SUSY suggests a natural way to unify gravity with the other fundamental forces of nature, one that has been extensively utilized in Grand Unified Theories (GUTs). SUSY GUTs are beyond the scope of this thesis, but this appealing feature of SUSY is worth taking note of, even if only in passing, when considering low-energy SUSY searches.

Mass Hierarchy   Main Production   Dominant Decay             Typical Signature
m_q̃ ≪ m_g̃       q̃q̃, q̃q̄̃           q̃ → qχ̃₁⁰                   ≥2 jets + E̸_T + X
m_q̃ ≈ m_g̃       q̃g̃, q̄̃g̃           q̃ → qχ̃₁⁰, g̃ → qq̄χ̃₁⁰       ≥3 jets + E̸_T + X
m_q̃ ≫ m_g̃       g̃g̃                g̃ → qq̄χ̃₁⁰                 ≥4 jets + E̸_T + X

Table 1.6: Typical search signatures at the LHC for direct gluino and first- and second-generation squark production assuming different mass hierarchies [6].

Current LHC SUSY searches

The phenomenology of SUSY is highly model dependent and requires choices to be made as to the nature of SUSY breaking, the SUSY breaking scale, and values for the resulting arbitrary parameters. It also depends on whether or not R-parity is included in the model. However, empirical data from astrophysical observations and experiments, including heavy meson decays and precision electroweak measurements, greatly constrain the parameter space for supersymmetric theories. Thus searches for SUSY at the LHC largely focus on direct detection of generic BSM particles. Because the cross sections of color-charged particles at the LHC are high, there tends to be a higher sensitivity to squarks and gluinos; however, large SM backgrounds pose a challenge to triggering and analysis. The primary color-charged sparticle processes at the LHC are pair production of squarks and gluinos, or squark-gluino pairs. Since these sparticles are expected to have a much higher mass than any SM particles, high transverse momentum

(p_T) jets are a typical signature. If R-parity is conserved, any sparticle interaction will also ultimately result in decay to at least two LSPs, which should be neutral and thus undetectable at the LHC, so high missing energy is also typical. CMS and ATLAS use simplified SUSY models [22] as a framework to aid in the interpretation of experimental results. These models limit the production and decay modes of SUSY particles in order to leave masses and other model parameters more flexible. Although such

Figure 1.6: Gluino mass limits in various channels from ATLAS.

measures are necessary to focus and optimize LHC searches, results must still be carefully analyzed to determine how well they represent a more specific model. Fig. 1.6 shows some of the limits on the gluino mass by decay channel for some simplified SUSY models.

Lower bounds on gluino masses rely heavily on the model used and the masses of both squarks and gluinos. For example, in one simplified model [23] with a massless neutralino, squarks and gluinos are both entirely excluded below 2 TeV, with the lower bound for m_g̃ ≈ m_q̃ at about 2.7 TeV (see Fig. 1.7). In another model [24], gluinos with mass below m_g̃ ≈ 2 TeV are excluded for massless neutralinos, but no lower bound could be set for neutralinos of m_χ̃ ≥ 1 TeV.

Another question that might be asked, then, is how heavy could a supersymmetric particle be before the LHC simply could not detect it with a discovery-level significance? The high end of the five sigma discovery potential for gluinos is currently estimated to be around 2.4-2.8 TeV in the jets plus E̸_T channel [25, 26, 27]. With the next high energy hadron collider projected not to be online until the 2040s, high energy physicists could remain in "limbo" for another 20 years or more, only to find that SUSY was just barely around the corner that whole time.

With the absence of any evidence of TeV-scale supersymmetry at the LHC, some have proposed that we abandon the hierarchy problem and embrace SUSY instead for its gauge unification and dark matter candidates [28]. It is also possible that TeV-scale SUSY exists and simply cannot be detected at the LHC for some reason.

Figure 1.7: Gluino mass limits for a particular SUSY model with particular choices for sparticle masses and other parameters.


Machine Learning

Particle phenomenology requires sorting, managing, and analyzing large and complex datasets in an attempt to extract elusive signals. It is therefore natural for phenomenologists to seek out cutting-edge techniques from data and computer science. One of the most promising avenues is machine learning, in particular Artificial Neural Networks (ANNs).

As early as 1995, attempts were made to use a neural network in the search for the Higgs boson at LEP [29], despite the relative immaturity of data-driven machine learning. However, setbacks in the field produced a general pessimism towards neural networks, even amongst computer scientists. In 2006, Geoffrey Hinton et al. published a breakthrough paper [30] that demonstrated a machine's ability to learn to recognize handwritten numerals with astonishing (>98%) precision. Since then neural networks have become ubiquitous in today's technological society, driving personalization of features not just by tech giants like Facebook and Google, but also in everyday life, from coupon mailers customized based on your shopping history to automatically sorting emails into your spam folder. Artificial Neural Networks are gaining popularity in collider physics as well, primarily for use in triggering and filtering systems as well as for jet reconstruction and tagging [31].

Despite the successful use of ANNs for phenomenological purposes that achieved comparable or improved results over traditional analyses [32, 33, 34, 35, 36, 37], some phenomenologists remain skeptical of the "black box" of machine learning: the fact that it is often difficult to determine just what features the machine is learning that allow it to come to its conclusions. After all, scientists who study the most fundamental particles and interactions generally do so because they want to be able to understand and trace every intricate detail of the process. They don't just want an answer, they want to know why and how that answer was obtained, and unfortunately those details can prove difficult to extract from an ANN. One approach is to use Machine Learning not just to perform analysis and produce results, but to identify features of interest. Again, jet reconstruction seems to be leading the way in this regard [31], using ANNs to identify novel jet observables that can then be applied to traditional analyses in a way that allows physicists to see exactly which events are getting cut from the signal and background, and why.

Another hurdle in the application of machine learning to particle phenomenology is the considerable learning curve. Effective use of ANNs requires a degree of expertise in computer science that can be intimidating and time-consuming to learn. Data preparation and pre-processing can be extensive, and high performance computing resources may be required. However, as neural networks continue to surge in practical, everyday applications and remain at the forefront of computer science technology, software packages that lower the bar for entry by automating many of the more technical aspects of neural network design are becoming more and more widespread.

Figure 1.8: Basic function diagram of an Artificial Neural Network.

The basic building blocks of Artificial Neural Networks are described below.

How Artificial Neural Networks Work

ANNs take datasets, often with a large number of complex variables, and attempt to classify them. An unsupervised neural network will attempt to find patterns and commonalities in the dataset, and its output will be a set of categories that it has discovered within the data. A supervised neural network will attempt to learn how to classify the data based on a given label or "answer key". Thus an unsupervised ANN might be useful for finding new observables that can then be applied to a cuts-based analysis, while a supervised ANN is generally used when we know the categories (signal or background) and we want the network to sort the data into those categories. Neural networks can sort data into multiple categories (a multiclass dataset) or into two binary categories. In this work we focus on a supervised neural network with binary output.

The individual chunks of data that make up a dataset are called "instances"; variables in each dataset that may be present in all or just a few instances are called "features". Features are quantifiable, single-valued pieces of information. For example, a pixel given in

RGB coordinates contains three features: the red, blue, and green components. Features can be non-numeric, such as words, but this requires additional processing and will not be covered here. A pixel given in hex coordinates is a single feature that combines the three features from the RGB coordinates into a single value. Similarly, energy and momentum could be quantified by component as four distinct features, or all together as the four-momentum squared. The method of defining these variables may impact how the neural network learns. Often, there are connections and dependencies between features that allow a higher level feature to be constructed that is more useful to the network than the most basic features. Networks can be constructed to find linear or even some nonlinear functions of the individual features, but there is no guarantee that a network will construct the same higher level features that a human would, as the human constructs them based on experience that the network does not have access to. Thus, in general, if it is known that a higher level feature is relevant to the problem, it is manually added into the dataset. Choosing the most effective and efficient set of features, and using human knowledge of the data and the problem to create higher level features, is called "feature engineering".

In a dataset of particle collision events, each event is an instance. The simulated or actual observed quantities coming out of that event, for example the output from an LHCO file, are the features. Higher-level features, such as reconstructed masses, may be added. Finally, each instance has a classification or label. The classification is essentially the "answer key": it is the category that the instance belongs to. The goal is to get the output of the neural network to match a given instance to the correct classification. In the case of particle phenomenology, we want the network to take a particle collision event and use the observable quantities from the various particles involved in the collision to correctly match the event to either the "signal" or "background" label. Depending on the type of neural network used, the features can be arranged in a single vector, or in a square block like an image, depending on the nature of the relationships between features, if any exist.

A basic diagram showing the structure of a neural network is shown in Fig. 1.8. Each feature corresponds to an input in the first layer of the neural network. The input layer sends information to one or more hidden layers, which ultimately pass processed information to the output layer. In a Fully Connected Network (FCN), each node between successive layers is connected, and each connection has a weight that is used to process the input from the previous layer. Since each input is combined at each hidden layer node, inputs with different scales can severely skew a feature's relative importance. Features are therefore scaled to have a value between 0.0 and 1.0 (or, in some cases, from -1.0 to +1.0) prior to being fed into the network.
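A minimal sketch of this feature scaling step, assuming the features are stored as a NumPy array with one row per event (the array contents below are placeholders):

    import numpy as np

    # Min-max scaling of each feature column to [0, 1], as described above.
    def minmax_scale(X):
        lo, hi = X.min(axis=0), X.max(axis=0)
        span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant features
        return (X - lo) / span

    X = np.array([[250.0, -1.2], [480.0, 0.3], [310.0, 2.1]])  # e.g. [pT, eta]
    print(minmax_scale(X))  # every column now lies in [0, 1]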

When you train the network, the network starts with randomized weights on each connection. It takes a subset of data (the "training" data) and sends it through the network. The corresponding output is compared to the correct output: the label. Of course, for the first iteration, the output is essentially a random guess and may or may not come close to the correct classification. An error is then calculated for each output node, then backpropagated through the network using a "cost function" that re-weights each connection so that the network will get closer to the correct answer the next time. The network then begins again, sending the input through the network with the newly adjusted weights. This is repeated many times over until the network begins to give correct output at an acceptable level. The specific set of weights between each node is called the network model.

It is important to note that in the most basic neural network, each event or image must have comparable information at each node. If in one event feature number 17 is the φ for the fourth jet, then this value of φ is what is passed through the hidden layers, multiplied by weights, and eventually sent as part of the output. The error is adjusted and backpropagated, and the weights are adjusted, based on this feature representing the φ angle of the fourth jet. If in the next event feature number 17 is instead the number of tracks in the 2nd jet, the series of weights calculated for that node from the previous event's φ of the fourth jet will be completely wrong, and the network will make no progress in learning. In order for the network to learn how to recognize types of features regardless of location, a more complicated architecture is required.

It is quite possible to "overtrain" a neural network. This happens when the network essentially memorizes the correct answers for each instance, using features specific to the training dataset that may not generally correspond to the correct label in an expanded dataset. Thus the goal is to train a network enough that it can recognize relevant features and use those to correctly classify the instance, but not train it so much that it cannot "generalize" these features to input it has not yet been exposed to. Overtraining the network is also referred to as "over-fitting" the data. Finding the best balance of accurate classification with broad generalization is called "optimizing the model", where the model is the specific combination of weights at each node throughout the network.

Once the network is trained, it is presented with new data that was not in the training set as input. This new data tests the accuracy and generalization of the trained network. This subset of the entire dataset is called the "test" or "validation" data. Once the network performs as desired on the test data, it can be applied to additional data where the classification is not known. In applications to collider phenomenology, the process is:

1. Simulate as many particle collision events as possible.

2. (Optional) Add higher level features, or subtract basic features, depending on their hypothesized relevance to the network or previous experience training this type of network on this type of data.

3. Process the data so that you can present each event to the network in an optimal format for machine learning.

4. Select a subset of the data to serve as training data, and a subset to serve as test data. Training data should include roughly equal numbers of events from background and signal to avoid overtraining on one classification. Both sets should be shuffled so that the order in which the network is presented with events is randomized.

5. Train and optimize the network using the training dataset.

6. Test the network on the test data.

7. Evaluate the performance of the network and either proclaim it successful, or re-evaluate the network architecture and optimization.

Once a reliable network model has been trained on simulated events that have an available "answer key", the network can be applied to real collider events (which unfortunately do not come with an "answer key").
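As a concrete, deliberately simplified illustration of steps 4-6, the sketch below uses the Keras API with randomly generated placeholder data standing in for the simulated events; every name, layer size, and training setting here is an illustrative assumption, not the configuration used later in this work.

    import numpy as np
    from tensorflow import keras

    # Placeholder data: X is (n_events, n_features), y holds binary labels (1 = signal).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 12)).astype("float32")
    y = (rng.random(1000) > 0.5).astype("float32")

    # Step 4: shuffle and split into training and test sets.
    idx = rng.permutation(len(X))
    split = int(0.8 * len(X))
    X_train, X_test = X[idx[:split]], X[idx[split:]]
    y_train, y_test = y[idx[:split]], y[idx[split:]]

    # Step 5: a small fully connected network with a sigmoid output for binary labels.
    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(X.shape[1],)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X_train, y_train, epochs=20, batch_size=128)

    # Step 6: evaluate generalization on the held-out test set.
    loss, acc = model.evaluate(X_test, y_test)

In practice the architecture and training settings would then be revisited as described in step 7.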

The Cost Function

A primary ingredient of a successful neural network is the implementation of the cost function. This is also the part that dictates what goes on in the "black box" during backpropagation, so we review how it works below.

Since the cost function comes into play during backpropagation, the discussion starts with the output. While the output from the final layer of the network is a number between 0.0 and 1.0 that roughly translates to a probability of being 1.0, the actual label is a binary integer: 0 or 1. Thus we must convert the continuous prediction variable p̂ to the discrete, binary prediction variable ŷ that is a direct prediction of the label value, y. The simplest way is to round the answer up or down as appropriate:

ŷ = { 1, if p̂ ≥ 0.5
    { 0, if p̂ < 0.5   (1.92)

However, in order to compute an error that can then be used to adjust weights, we need a differentiable function. One way to do this is with a logistic cost function. The logistic cost function for a single instance is computed as

c(p̂) = { −log(p̂),      if y = 1
       { −log(1 − p̂),  if y = 0   (1.93)

Since −log(t) gets large as t goes to zero, the "cost" will be high if the model estimates a probability close to 1 when the true label is 0, and vice versa; the "cost" will be lower if the model predicts a value closer to the label value.

The logistic cost function over the whole training set is the average cost across all instances:

J(p̂) = −(1/m) Σ_{j=1}^{m} [ y_j log(p̂_j) + (1 − y_j) log(1 − p̂_j) ]   (1.94)

where m is the number of instances in the training dataset. Unfortunately, there is no analytical equation to minimize the cost function [38]; it must be done iteratively through a method called gradient descent. At each successive layer, the output of the neural network can be seen as a linear combination of features with their weights at that layer:

p̂ = θ₀ + θ₁x₁ + θ₂x₂ + ... + θ_n x_n   (1.95)

where the θ_i are the weights, the x_i are the features or nodes for the current layer, and p̂ is the output from the current layer that will become input for the next layer. A node called the

“bias node” is initialized at a value of 1 at each layer and serves mathematically as x0. It

is given the bias weight θ₀, which is randomized on the first pass like all other weights. At the final layer, the output p̂ is converted to a discrete value ŷ using Eq. 1.92 and compared to the true label y using some performance measure. The most common performance measure is Mean Square Error:

MSE = (1/m) Σ_{j=1}^{m} (p̂_j − y_j)²   (1.96)

where m is the number of instances in the training dataset. The gradient of the MSE with respect to each contributing node is taken to find the minimum of the MSE, and each weight corresponding to each x_i is adjusted by one step towards this minimum. This is called "gradient descent".

There are different ways to implement gradient descent. In "batch gradient descent", the MSE is computed using every instance in the training set. In "stochastic gradient descent" (SGD), a random instance is selected from the training data, ∇MSE is calculated for just that one instance, and the result is applied to correct the weights. Stochastic gradient descent is much faster and less memory intensive, but it will bounce around and never reach the true minimum for the dataset as a whole; batch gradient descent leads to a smoother descent towards the minimum but requires significantly more memory. Stochastic gradient descent can also help the function escape from a local minimum, if the cost function is not a smoothly decreasing function. Hybrid approaches, such as "mini-batch" gradient descent, attempt to balance these considerations.

Once the gradient is calculated, all the weights are adjusted in one step towards this minimum. The size of the step is determined by the "learning rate", η, and the choice of learning rate is another critical decision for the network.

Figure 1.9: The feedback loop of a neural network. Image credit [39].

If the learning rate is too small, it takes a long time for the cost function to converge, using significantly more computing power along the way. But if the learning rate is too large, the cost function may never converge.

The step for a given weight θi is given by

δθ_i = η ∇_θ MSE(θ)_i   (1.97)

and the new weight value is

θ′_i = θ_i − δθ_i.   (1.98)

Fig. 1.9 shows the feedback loop of the input, forward pass, predictions, loss function, optimizer, and backpropagation through the weights.
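A minimal numerical sketch of this feedback loop for a single sigmoid output node is given below, using the logistic cost of Eq. 1.94 (whose gradient takes the simple closed form Xᵀ(p̂ − y)/m) rather than the MSE; the data here are random placeholders for illustration:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def logistic_cost(p_hat, y):
        # Eq. 1.94: average cross-entropy over the m training instances.
        eps = 1e-12  # guard against log(0)
        return -np.mean(y * np.log(p_hat + eps) + (1 - y) * np.log(1 - p_hat + eps))

    def gradient_step(theta, X, y, eta):
        # One batch gradient-descent step (Eqs. 1.97-1.98); X includes a
        # leading column of 1s playing the role of the bias node.
        p_hat = sigmoid(X @ theta)
        grad = X.T @ (p_hat - y) / len(y)  # gradient of the logistic cost
        return theta - eta * grad

    # Toy usage with placeholder data:
    rng = np.random.default_rng(0)
    X = np.hstack([np.ones((200, 1)), rng.normal(size=(200, 3))])
    y = (rng.random(200) > 0.5).astype(float)
    theta = rng.normal(size=4)  # randomized initial weights
    for _ in range(100):
        theta = gradient_step(theta, X, y, eta=0.1)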

Other considerations

There are a number of other technical decisions that must be made in constructing a neural network, and the mathematical reasoning and details behind these functions are beyond the scope of this thesis. However, when giving results from a neural network, some of the following choices may be specified (a sketch after the list shows where each choice enters a typical network definition):

• The activation function, which determines the exact form of the cost function. A logistic activation function was used in Eqs. 1.93 and 1.94. Other common types of activation function include the tanh hyperbolic tangent function and the ReLU function.

Figure 1.10: An illustration of how the lower-level nodes in a CNN look for broad patterns of lines and curves in order to classify objects in an image. To recreate what the CNN "sees", the algorithm output was interrupted early in the training cycle [40].

• The type of regression model. Logistic regression is often used for binary classifications. "Softmax" regression can be used for multiple classes, but can also give good results in binary classification problems. Softmax regression can deal with possible negative values (i.e., when features are scaled from -1 to +1).

• The number of hidden layers: one layer can train almost anything, but more layers can be faster and help with the network’s ability to generalize.

• The number of nodes per hidden layer.
• The batch size, if batch gradient descent is used.
• The number of times the entire dataset is run through the network (called epochs).
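As an illustration of how these choices appear in practice, the short Keras sketch below makes each of them explicit. The layer sizes, activations, learning rate, batch size, and epoch count here are hypothetical placeholders, not the settings used in the analyses of Chapter 3.

```python
import numpy as np
from tensorflow import keras

# Hypothetical dataset: 10 features per instance, binary labels
X = np.random.normal(size=(1000, 10)).astype("float32")
y = (X[:, 0] > 0).astype("float32")

model = keras.Sequential([
    keras.layers.Input(shape=(10,)),
    keras.layers.Dense(64, activation="relu"),    # hidden layer 1: 64 nodes, ReLU
    keras.layers.Dense(64, activation="relu"),    # hidden layer 2: 64 nodes
    keras.layers.Dense(1, activation="sigmoid"),  # logistic output for binary classes
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),  # learning rate eta
              loss="mse")                         # Mean Squared Error, Eq. (1.96)
model.fit(X, y, batch_size=32, epochs=5)          # mini-batch size and epoch count
```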

Convolutional Neural Networks

Convolutional Neural Networks (CNNs) were developed with the goal of being able to train a neural network to recognize characteristic features in images, even when those features appear in different locations or in slightly different orientations within the image itself. Their design was thus inspired by studies of the biological visual cortex, which showed that many neurons in the visual cortex only react to stimuli in a local receptive field. In addition, some neurons react only to lines in a certain orientation (horizontal, vertical, slanted, etc.). Finally, some neurons have larger local receptive fields, and react to complex patterns of lines. This suggested that image data reaching the visual cortex is first processed by low-level neurons with smaller fields, then higher-level neurons process information from the low-level neurons, and finally the entire image is reconstructed from many neurons with overlapping receptive fields.

In CNNs, therefore, the nodes in the first convolutional layer do not connect with every single pixel in the input image; they are instead assigned a receptive field of pixels to connect with. The nodes in the second convolutional layer are then connected with a set of nodes in the first layer, covering a larger area or feature set, and so on until the entire image has been covered. This hierarchical approach allows the network to identify characteristic features in a dataset, even if those features appear in different locations. Thus a convolutional neural network can still recognize a hand-written digit if it is rotated or flipped, or it can identify an animal even if the animal is positioned differently, so long as the key characteristics of the animal used by the network are still visible. Figs. 1.10 and 1.11 give some insight into how a CNN goes about learning specific characteristics, and how it attempts to recognize these characteristics in an input image. In addition to image recognition, CNNs are also useful in other complex classification applications such as voice recognition and natural language processing.
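A minimal Keras sketch of this hierarchical structure is shown below; the filter counts, kernel sizes, and input shape are illustrative assumptions rather than the architecture used later in this work.

```python
from tensorflow import keras

cnn = keras.Sequential([
    keras.layers.Input(shape=(28, 28, 1)),
    # Each node connects only to a small receptive field of pixels (3x3 here)
    keras.layers.Conv2D(16, kernel_size=3, activation="relu"),
    keras.layers.MaxPooling2D(pool_size=2),
    # Second convolutional layer: nodes see a larger effective area of the image
    keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
    keras.layers.MaxPooling2D(pool_size=2),
    # Finally the whole image is combined and classified
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])
cnn.summary()
```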


Figure 1.11: An illustration of how a CNN trained to “see” animals and other commonplace objects “saw” Van Gogh’s famous 1889 painting, “Starry Night”. ABOVE: Vincent Van Gogh’s 1889 painting, “Starry Night”, from the Google Cultural Institute [41]. BELOW: This image was created by training a CNN to identify commonplace objects. The algorithm looks for line and curve patterns in any picture, and the final output should then correctly identify objects in the image. The network does this by looking for key identifying features that it isolates and characterizes throughout training, then attempting to find and extract those features from the input images it receives. However, in this case, the algorithm output was interrupted before being fully trained, and the incomplete interpretation of the image was iteratively fed back through the network as the new signal. From the Google Deep Dream Gallery [42].

Summary

In this chapter, we have reviewed the Standard Model, which to date provides the most accurate available description of fundamental particles and their interactions. Despite its impressive success, there are still many things we do not know. We discussed the importance of measuring the Higgs properties and how they may provide a more accurate trajectory in our search for BSM physics. We reviewed SUSY, which remains the leading BSM theory despite the increasingly disturbing lack of evidence for low-energy supersymmetric particles. We briefly discussed both Higgs and SUSY searches at the LHC. Finally, we touched on a few simple Machine Learning techniques that could aid us in sorting through the vast quantities of data coming out of the LHC. This thesis is organized as follows: Chapter 2 covers Higgs decay to light jets. Chapter 3 describes a preliminary investigation using Machine Learning techniques in a side-by-side comparison with the results obtained in the cuts-based analysis described in Chapter 2. Chapter 4 describes a different cuts-based analysis, this time in search of the supersymmetric gluino.

Chapter 2

Higgs Decay to Light Jets at the Large Hadron Collider

This chapter is based on the work published in Phys. Rev. D 95, 053003 (2017)

Introduction

For the Higgs detection at the LHC, γγ and ZZ were the discovery channels for the Standard Model-like Higgs boson (h) [7, 8]. Next came the WW decay channel; all three have been measured with more than 5σ significance at Run I by both the ATLAS [12] and CMS [43] experiments. While the ZZ, WW channels are tree-level processes, most directly related to electroweak symmetry breaking (EWSB) with coupling strength proportional to MW,Z ∼ gv, the Higgs coupling to the top quark is best inferred from its contribution to the production gg → h and the decay h → γγ, with a fitted accuracy of around 30% [44]. A direct measurement from Higgs and top associated production is yet to be established [45, 46]. On the lepton side, the challenging decay channel h → τ+τ− has also reached 5σ observation with a combined analysis of the two experiments [44]. With the upgrade of the LHC to its higher center of mass energy at Run II and more accumulated data, the difficult mode h → bb̄ is expected to reach 5σ soon, after several hundred fb−1 at 14 TeV [47]. Thus, the Higgs couplings to the heaviest generation of fermions will soon be settled to the values expected from the Standard Model (SM) prediction at an accuracy of about 20% [48], verifying the pattern of non-universal Yukawa couplings. We next consider the LHC upgrade to a total integrated luminosity of 3000 fb−1 at 14 TeV (HL-LHC). While the precision measurements of those couplings will continue in the LHC experiments, it is imperative to seek other “rare decay” channels, in the hope of uncovering any deviations from the SM. Among the rare channels, it is perhaps most promising to observe the clean mode gg → h → µ+µ− [49], despite the small decay branching fraction BR(h → µ+µ−) ∼ 2 × 10−4. A 5σ observation may be conceivable at the end of the run for the HL-LHC with 3000 fb−1 [48], which would be of significant importance in establishing the pattern of the Yukawa couplings by including a second generation fermion. For the other hadronic channels, it would be extremely challenging to make any measurements at the LHC due to the overwhelmingly large QCD backgrounds.20 The most promising production mechanism for the hadronic decay signal of the Higgs boson is

pp → V h, where V = W±, Z.  (2.1)

With W/Z decaying leptonically to serve as effective triggers, the Higgs signal may be detected from the construction of the invariant mass of the hadronic products. To sufficiently suppress the large QCD backgrounds, it was proposed [53] to look for highly-boosted events for h → bb̄ recoiling against the leptonic W/Z. Studies of these processes at the HL-LHC show a ≈ 20σ (9σ) significance for the signal V h, h → bb̄, with statistical (systematics added) uncertainty estimated [47]. Marching to the channel involving the second generation quarks, the sensitivity to V h, h → cc̄ is significantly worse. Bounds are extrapolated in a recast study in Ref. [54] to be ∼6.5 times the SM value (statistical errors assumed only). This is expected, given that BR(h → bb̄) is ∼20 times larger than BR(h → cc̄), that b-tagging is expected to be twice as efficient as c-tagging, and that the dominant backgrounds V bb (cc) in the relevant kinematic region are of about the same order. An interesting proposal to search for h → J/ψ + γ [55] does not seem to increase the observability of the hcc coupling, due to too low an event rate [56, 57]. It is natural to ask to what extent one would be able to search for other hadronic decays of the Higgs boson. We here quote the updated calculations of the branching fractions for the 125 GeV Higgs boson decaying hadronically in the SM [58]:

BR(h → bb̄) = 58.2%,  BR(h → cc̄) = 2.89%,  (2.2)
BR(h → gg) = 8.18%,  BR(h → uū, dd̄, ss̄) < 0.03%.  (2.3)

While the decay rates to light quarks predicted in the SM would be too small to be observable, the decay to a pair of gluons, mediated via the heavy top quark, is nearly three times larger than the cc̄ channel. The experimental signature for those channels is a pair of un-tagged light jets jj which form a mass peak near the Higgs boson mass mh. Obviously, the lack of a heavy-flavor tag makes background suppression difficult. However, we point out that the event sample so defined naturally exists and falls into a class of mis-tagged events for the h → bb̄, cc̄ searches as well, which must be properly quantified with respect to the mis-tag rates as the “contamination” to the genuine decays of the Higgs boson to light jets.

20Due to the much cleaner experimental environment, a lepton collider such as the International Linear Collider (ILC) [50] or a circular e+e− collider [51, 52], running at the Zh threshold or higher energies, will give us much better sensitivity to the hadronic decays of the Higgs. The expected accuracies on h → gg and h → cc will be 7% (2.3%) and 8.3% (3.1%) respectively, for the 250 GeV (1 TeV) machine [50].

In this work we set out to study Higgs decay to a pair of light un-tagged jets, h → jj, in the associated production channel as in Eq. (2.1). We will exploit the leptonic final state decays of the electroweak gauge bosons, and employ a hadronic tag for the Higgs boson while optimizing the mass reconstruction. Evaluating the major sources of statistical (or systematic) uncertainties, we argue that a 1σ sensitivity of 1 (or 4) times the SM value can be achieved for the case where the Higgs decays to un-tagged jets. This is achieved with a judicious choice of kinematic discriminants and a combination of the final state channels. Together with the h → bb̄ and h → cc̄ studies, the un-tagged channel puts an independent dimension of bound in the space of branching ratios of Higgs decays to quarks and gluons. Assuming a well measured ggh coupling at the end of the HL-LHC [48], the result further puts comparable but independent constraints on the light-quark Yukawa couplings. We also estimate that this channel may offer a better probe of the strange-quark Yukawa coupling. This chapter proceeds as follows: Section 2.2 specifies the signal and dominant background processes. Section 2.3 describes and presents the detailed analyses and gives the main results in terms of the cut-efficiency tables and figures. In the same section, we also study how to control the systematic errors for the large backgrounds. Section 2.4 describes an alternate search strategy based on momentum balance discriminants. Section 2.5 calculates the signal sensitivity and presents the obtained constraints on Higgs couplings to quarks and gluons in a correlated manner, while Section 2.6 summarizes and concludes.

Signal and Background Processes

As discussed above, the promising channel in which to study the Higgs decay to light jets is the associated production with an electroweak gauge boson W or Z, which subsequently de- cays to leptons. Depending on the production mechanisms and the final states, we consider the following subprocesses

qq̄ → W±h → ℓ±ν + jj,  (2.4)
qq̄, gg → Zh → ℓ+ℓ− + jj  or  νν̄ + jj,  (2.5)

where ℓ = e, µ and j = g or u, d, s. Practically, j is a gluon as expected in the SM. We thus generically denote the SM signal by V h(gg), whenever convenient. In our calculations, events are generated with MadGraph at leading order, with “NN23NLO” as the PDF set. For the gg → Zh process via the quark loops, we use MadGraph NLO [59] and MadSpin [60]. This channel contributes about 10% to 20% of the total Zh production rate. We apply an overall rescaling of QCD K-factors to the signal processes, to match the total


Figure 2.1: Higgs boson transverse momentum distribution for the signal processes qq̄ → Zh (upper solid curve) and gg → Zh (lower dashed curve) at the 14 TeV LHC.

NNLO QCD and NLO EW cross section results taken from the summary of the Higgs cross section working group [58]. The K-factors are about 2 and 1.2 for the gg and qq̄ processes, respectively. We have included the finite masses of the fermions running in the loop in the gg-initiated process. Some care is needed regarding the gg process because of its different transverse momentum (pT) dependence and its sensitivity to new physics contributions in the loop, as discussed in Ref. [61]. In Fig. 2.1, we compare the Higgs boson transverse momentum distributions for the signal processes qq̄ → Zh and gg → Zh. The qq̄-initiated channel peaks at pT(h) ≈ 50 GeV, a typical mass scale associated with the final state particles of Zh. The gg-initiated channel peaks at around pT(h) ≈ 150 GeV, due to the top mass threshold enhancement. The differential cross section of gg drops faster than that of qq̄ with increasing pT(h), due to the destructive interference between the triangle and box diagrams. The Higgs is further decayed according to the branching ratios listed in Ref. [58]. Events are then showered and hadronized using PYTHIA6 [62], and run through DELPHES [63] for detector simulation and jet reconstruction. For the SM backgrounds, we mainly consider the dominant irreducible background process V + jj at LO, where the V decays and contributes accordingly to the three signal channels. At the generator level, we apply some basic cuts on the jets to remove infrared and collinear divergences for the QCD background processes:

pT(j) > 20 GeV,  |ηj| < 3,  Rjj > 0.4.  (2.6)

The hadronic jets are reconstructed with the anti-kt jet algorithm with a cone size R = 0.4. In the analyses that follow, we will be considering a relatively boosted Higgs recoiling off of the vector boson. Therefore, to improve the simulation statistics, we also add a generator-level cut on the vector boson:

pT(V) > 150 GeV.  (2.7)

In Table 2.1 we give the cross sections used for our signal and background processes, including the basic cuts in Eq. (2.6) and with various pT thresholds for the vector boson. The first column is the total cross section with no pT(V) cut, while the second and third demand pT(V) cuts of 150 and 200 GeV, respectively. No cuts on the final state leptons are applied for the table.

σ (fb)                      cuts Eq. (2.6)   + Eq. (2.7)   + pT(V) > 200 GeV
qq̄ → Zh → ℓ+ℓ− gg          3.5              0.39          0.17
gg → Zh → ℓ+ℓ− gg          0.71             0.20          6.2×10−2
qq̄ → Zjj → ℓ+ℓ− jj         2.5×105          1.2×104       4.8×103
qq̄ → Wh → ℓν gg            20               2.3           0.99
qq̄ → Wjj → ℓν jj           2.5×106          1.0×105       3.9×104
pp → tt̄ → ℓνjjbb̄           1.1×105          1.5×104       5.7×103
qq̄ → Zh → νν gg            11               1.2           0.50
gg → Zh → νν gg            2.1              0.60          0.18
qq̄ → Zjj → νν jj           7.4×105          3.6×104       1.4×104

Table 2.1: Cross sections in units of fb for signal and dominant background processes, with the parton-level cuts of Eq. (2.6), and boosted regions pT(V) > 150, 200 GeV.

Signal Selection

In further studying the signal characteristics in Eqs. (2.4) and (2.5), we categorize the channels according to the zero, one, or two charged leptons from the vector boson decays. In addition, the signal has two leading jets from the Higgs decay, with invariant mass of the

Higgs boson. At high pT(h), the distance between the two hadronic jets can be estimated as

Rjj ≈ (1/√(z(1−z))) mh/pT(h),  (2.8)

where z and 1−z are the momentum fractions of the two jets. The LO parton-level distributions of three kinematic discriminants for the Zh channel, the transverse momentum pT(Z), the jet separation Rjj, and the di-jet invariant mass mjj, are shown in Fig. 2.2, comparing the signal (solid) and dominant background (dashed), after the generator-level cuts as in

Eqs. (2.6) and (2.7). Obviously, pT(Z) is singular for the QCD background, as seen in Fig. 2.2(a). The two-jet separation Rjj in Fig. 2.2(b) shows either the collinear feature from parton splitting in the final state radiation (FSR) or the back-to-back feature near π due to


Figure 2.2: Kinematical distributions of the signal process pp → Zh, h → gg (solid curves, scaled up by a factor of 5000) and the leading background pp → Zjj (dashed curves) for (a) pT(Z), (b) Rjj, (c) mjj, and (d) an event scatter plot in the Rjj vs. pT(Z) plane, with the (red) dense band with crosses as the signal events and the (blue) dots as the background. Generator level cuts of Eqs. (2.6) and (2.7) have been applied.

the initial state radiation (ISR) for the background process, and is narrowly populated near

2mh/pT(h) for the signal. The resonance bump near mh is evident in Fig. 2.2(c). Because of the small rate, the signal curves have been scaled up by a factor of 5000. We also show an event scatter plot in Fig. 2.2(d), where the (red) dense band with crosses presents the signal events and the (blue) dots show the background events. We see the strong correlation between the boosted pT(Z) and collimated jets with smaller Rjj. To suppress the huge QCD di-jet backgrounds, we must optimize the reconstruction of the Higgs mass. There are two common methods to reconstruct hadronic decays of the Higgs boson, depending on the kinematical configuration. One is the sub-structure (fat-jet) approach; an early example for the Higgs search in the bb̄ channel was introduced in Ref. [53]. Because of the highly boosted nature of the Higgs boson, a fat-jet identified with the hadronic decay products of the Higgs boson is first selected. Various jet substructure observables and techniques, such as mass-drop and filtering [53], pruning [64], trimming [65], N-subjettiness [66], etc., can be applied to the fat-jet to further improve the reconstruction of the invariant mass. The other approach is to simply resolve the leading jets. This is the common practice when the Higgs is produced not far from threshold, and the Higgs is identified as the sum of the two leading jets. Experimentally, the anti-kt jet algorithm, given its regular jet shape, gives good reconstruction of hadronic jets, and is the default hadronic jet reconstruction algorithm used at ATLAS/CMS. The V h(bb̄) search at the LHC is currently carried out with the two resolved anti-kt R = 0.4 jets. In a recent analysis [67] the two methods are compared for the Wh, h → bb̄ process at LHC14 in the kinematic region 200 GeV < pT(h) < 600 GeV. The resolved approach is better in the 200 GeV < pT < 300 GeV range, while the jet-substructure approach is significantly better for pT > 600 GeV. The results are qualitatively expected, since high pT corresponds to a smaller cone-size of the fat-jet, as argued in Eq. (2.8). Since the signal events tend to populate near the kinematic threshold, we will exploit the resolved method with two hard jets. However, additional QCD radiation from the highly energetic jets is not negligible. Kinematically, it smears the reconstructed di-jet mass peak towards lower values. Some related effects, including the NLO correction, are studied in Ref. [68]. We thus propose a modification of the two-jet-resolved method by including possible additional jets in the decay neighborhood – a “resolved Higgs-vicinity” method.

After clustering the jets with anti-kt ∆R = 0.4, the two leading pT jets are combined as the “Higgs candidate”. Then additional jets j′ are also clustered to the Higgs candidate in sequence of angular vicinity, whenever RHj′ ≤ Rmax. For the rest of the analyses, we choose

Rmax = 1.4. (2.9)
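A schematic implementation of this clustering step might look as follows. This is an illustrative sketch under simple assumptions (jets supplied as pT-ordered four-momenta), not the analysis code; the four-vector helpers are written out only for self-containedness.

```python
import math

def four_vec(pt, eta, phi, m=0.0):
    """Convert (pt, eta, phi, m) to a Cartesian four-momentum [E, px, py, pz]."""
    px, py = pt * math.cos(phi), pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    e = math.sqrt(px**2 + py**2 + pz**2 + m**2)
    return [e, px, py, pz]

def add(p, q):
    return [a + b for a, b in zip(p, q)]

def delta_r(p, q):
    """Angular distance in the (eta, phi) plane between two four-momenta."""
    (eta1, phi1), (eta2, phi2) = (_eta_phi(p), _eta_phi(q))
    dphi = abs(phi1 - phi2)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(eta1 - eta2, dphi)

def _eta_phi(p):
    e, px, py, pz = p
    return math.asinh(pz / math.hypot(px, py)), math.atan2(py, px)

def mass(p):
    e, px, py, pz = p
    return math.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))

def higgs_vicinity_mass(jets, r_max=1.4):
    """jets: pT-ordered four-momenta. Combine the two leading jets, then absorb
    any sub-leading jet within r_max of the candidate, closest first."""
    if len(jets) < 2 or delta_r(jets[0], jets[1]) > r_max:
        return None                      # event fails the R_jj <= R_max selection
    cand = add(jets[0], jets[1])         # the di-jet "Higgs candidate"
    for j in sorted(jets[2:], key=lambda j: delta_r(cand, j)):
        if delta_r(cand, j) <= r_max:
            cand = add(cand, j)          # absorb nearby hard radiation
    return mass(cand)

# Example: two hard jets plus one nearby softer jet at (pt, eta, phi)
jets = [four_vec(120, 0.1, 0.0), four_vec(90, 0.4, 1.0), four_vec(35, 0.2, 0.5)]
print(higgs_vicinity_mass(jets))
```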

The optimal method is to select events with two leading pT jets that satisfy Rjj ≤ Rmax, and add to the di-jet system any sub-leading jets within the distance Rmax. In practice, we find that including one additional hard radiation in the decay is sufficient. In Fig. 2.3 we compare several resolved-jet methods in their reconstruction of the Higgs mass, against the V jj background. The central and hard jet requirements are pT(j) > 30 GeV and |ηj| < 2.5. In Fig. 2.3(a), we reconstruct the Higgs with the two leading pT jets and veto events with more than two central hard jets. As shown in the plot, while the veto method removes the background most efficiently, the cut also reduces the signal significantly. Fig. 2.3(b) shows the 2-jet-inclusive case, which is the same as (a) but does not veto additional jets. It improves the signal rate, but the signal mass peak is still smeared towards lower values. Fig. 2.3(c) is the “resolved Higgs-vicinity” method, which adds the additional hard jet and sharpens the mass peak, helping to increase the overall S/√B sensitivity.

Figure 2.3: Invariant mass distributions mjj of the signal process pp → Zh, h → gg, Z → ℓℓ (solid curves, scaled up by a factor of 5000) and the leading background pp → Zjj (dashed curves) for (a) 2 jets only, with a veto on additional jets, (b) the 2 leading jets used to reconstruct mjj, with no veto, and (c) the 2 leading jets plus other nearby jets used together to reconstruct mjets. All selection cuts as in Sec. 2.3.1 except for the mh cut are applied.


Figure 2.4: Invariant mass distributions constructed from (a) two-jet events and (b) three-jet events, with different pile-up values ⟨µ⟩ = 0, 15, 50, 140, respectively.

We study the sensitivity of this reconstruction method to pile-up contamination. In Fig. 2.4, we compare it with the two-jet resolved method after adding pile-up samples in DELPHES. As expected, the additional-jet method is more sensitive to the pile-up jets, yet it still retains a slight advantage even at a pile-up value of ⟨µ⟩ = 140 [69]. In the following, we describe the searches with the detailed signal and background analyses, for the channels with two, one and zero charged leptons, respectively. For simplicity, we use the two-jet reconstruction of the mass peak from now on.

cut eff (%)              qq̄ → Zh → ℓ+ℓ−gg   gg → Zh → ℓ+ℓ−gg   qq̄ → Zjj → ℓ+ℓ−jj
σ (fb)                   3.9×10−1            2.0×10−1            1.2×104
2 leptons                59%                 52%                 40%
≥ 2 jets                 51%                 49%                 32%
70 < mll < 110           50%                 49%                 31%
pT(ℓℓ) > 200 GeV         26%                 23%                 16%
Rj1j2 < 1.4              21%                 12%                 5.3%
95 < mh < 150 GeV        14%                 7.6%                1.9%
final (fb)               5.4×10−2            1.5×10−2            2.4×102

Table 2.2: The consecutive cut efficiencies for the signal ℓ+ℓ− jj and dominant background processes at the LHC.

ℓ+ℓ− + jj channel

For the two-lepton channel, we simulate the signal processes as in Eq. (2.5) with Z → ℓ+ℓ−, h → gg. We require exactly one pair of charged leptons ℓ± = e± or µ±, same flavor, opposite charge, along with at least two energetic jets. The dominant background is by far from Z + jj. The two leading pT jets are required to be close, with a separation less than Rmax = 1.4, and to have an invariant mass between 95 and 150 GeV. The events satisfy the following acceptance cuts:

• 2 leptons with pT(ℓ) > 30 GeV and |ηℓ| < 2.5
• pT(ℓℓ) > 200 GeV
• at least 2 jets with pT(j) > 30 GeV and |ηj| < 2.5
• Rj1j2 < 1.4
• 95 GeV < mh < 150 GeV

The di-jet mass window around mh is chosen to optimize S/√B at the HL-LHC. Table 2.2 shows the efficiency of applying the sequence of cuts. The overall efficiencies are about 14% and 7.6% for the qq̄ and gg initiated signal processes, respectively, and about 1.9% for the background process. We would like to point out that, in a purely statistical sense, the signal sensitivity S/√B is not notably increased from the generator-level results to those with the final cuts. However, the fact that the background is reduced by around two orders of magnitude helps to control the systematic uncertainties, as we will discuss later.
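As a quick numerical cross-check, combining the final cross sections of Table 2.2 with the statistical significance definition of Eq. (2.15) below reproduces the two-lepton entry of Table 2.7:

```python
import math

lumi = 3000.0                       # fb^-1, the HL-LHC dataset
sig = (5.4e-2 + 1.5e-2) * lumi      # qq and gg initiated Zh(gg) events, Table 2.2
bkg = 2.4e2 * lumi                  # Z+jj background events, Table 2.2
print(sig / math.sqrt(bkg))         # ~0.24, cf. 0.25 for this channel in Table 2.7
```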

ℓ± + ET + jj channel

For the one-lepton channel, we look at the signal process in Eq. (2.4) with W → νℓ, h → gg. The dominant backgrounds are W + jj and tt̄.

cut eff (%)              qq̄ → Wh → ℓνgg   qq̄ → Wjj → ℓνjj   tt̄ → ℓνjjbb̄
σ (fb)                   2.3               1.0×105            1.5×104
ET > 30 GeV              94%               87%                93%
1 lepton                 72%               52%                62%
pT(ℓν) > 200 GeV         39%               24%                26%
≥ 2 jets                 35%               20%                22%
Rj1j2 < 1.4              27%               6.8%               11%
95 < mh < 150 GeV        18%               2.5%               2.5%
final (fb)               4.1×10−1          2.5×103            3.7×102

Table 2.3: The consecutive cut efficiencies for the signal ℓ± ET jj and dominant background processes at the LHC.

Similar to the last section, the acceptance

cuts are

• one lepton with pT(ℓ) > 30 GeV and |ηℓ| < 2.5
• pT(νℓ) > 200 GeV, ET > 30 GeV
• at least 2 jets with pT(j) > 30 GeV and |ηj| < 2.5
• Rj1j2 < 1.4
• 95 GeV < mh < 150 GeV.

The W transverse momentum pT(νℓ) can be reconstructed from the charged lepton plus the missing transverse momentum ET. Table 2.3 shows the cut-flow at various stages of the cuts applied. The overall efficiency is about 18% for the qq̄ initiated signal process, and about 2.5% for each of the Wjj and tt̄ background processes.

ET + jj channel

The zero-lepton channel is studied with the signal processes as in Eq. (2.5) with Z → νν, h → gg. The dominant background is again mainly Z + jj. Similar to the above, the acceptance cuts are

• lepton veto, with pT(ℓ) > 30 GeV, |ηℓ| < 2.5
• ET > 200 GeV
• at least 2 jets with pT(j) > 30 GeV, |ηj| < 2.5
• Rj1j2 < 1.4

• 95 GeV < mh < 150 GeV.

cut eff (%)              qq̄ → Zh → ννgg   gg → Zh → ννgg   qq̄ → Zjj → ννjj
σ (fb)                   1.2               6.0×10−1          3.6×104
ET > 200 GeV             49%               44%               42%
≥ 2 jets                 45%               43%               35%
Rj1j2 < 1.4              36%               25%               12%
95 < mh < 150 GeV        23%               15%               4.5%
final (fb)               2.7×10−1          8.9×10−2          1.6×103

Table 2.4: The consecutive cut efficiencies for the signal ET jj and dominant background processes at the LHC.

The ET is essentially from pT(Z). Table 2.4 shows the cut-flow at various stages of the cuts applied. The overall efficiencies are about 23% and 15% for the qq̄ and gg initiated signal processes, respectively, and about 4.5% for the background process. The results presented in the above three sections have been double-checked by other approaches.
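The final cross sections in Tables 2.2-2.4 are simply the starting cross sections multiplied by the overall efficiencies; as a quick worked check for this channel:

```python
sigma0 = {"qq_Zh": 1.2, "gg_Zh": 0.60, "qq_Zjj": 3.6e4}   # fb, Table 2.4
eff    = {"qq_Zh": 0.23, "gg_Zh": 0.15, "qq_Zjj": 0.045}  # overall efficiencies
final  = {k: sigma0[k] * eff[k] for k in sigma0}
print(final)  # ~0.28, ~0.09, ~1620 fb, consistent with Table 2.4 up to rounding
```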

Background control

As calculated earlier and presented in the previous tables, the signals for h → gg in the SM, produced in association with W/Z decaying to leptons, lead to sizable event rates at the 3000 fb−1 HL-LHC: about 200 events for the ℓ+ℓ− channel, 1300 events for the ℓ±ν channel, and 1200 events for the νν channel, respectively. However, the difficulty is the overwhelmingly large SM background, with a signal-to-background ratio of the order of 10−4. As such, one must be able to control the systematic errors to the sub-percent level in order to reach a statistically meaningful result. This is extremely challenging, and one cannot draw firm conclusions without real data to establish the detector performance. On the other hand, there are methods one can pursue towards this goal. Here we adopt one of the commonly considered methods and demonstrate our expectations.

For the two-lepton and ET channels, the dominant background is SM Z + jj production. With the current selection, the two-jet invariant mass spectrum decreases smoothly within the range [60, 300] GeV, and our signal region lies between 95 GeV and 150 GeV. Making use of the well-measured side-bands, an estimate of the background contribution in the signal region can be obtained directly from a fit to the mjj distribution. We generated Z+jets samples with the MadGraph generator corresponding to 10 fb−1 and passed the events through PYTHIA and DELPHES to simulate the parton shower and the ATLAS detector effects.

We adopt a parameterization ansatz to fit the distribution in the mjj range from 60 GeV to 300 GeV:

f(z) = p1 (1 − z)^p2 z^p3,  (2.10)

where the pi are free parameters and z = mjj/√s.

Figure 2.5: Invariant mass distribution mjj for Z(ℓ+ℓ−)+jets at the 14 TeV LHC for (a) MC simulated events normalized to 10 fb−1, and (b) the fitted spectrum from the three-parameter ansatz function in Eq. (2.10) in the range from 60 GeV to 300 GeV (solid curve).


Figure 2.6: Generated distribution from the three-parameter ansatz function in Eq. (2.10) for mjj with (a) 300 fb−1 and (b) 3000 fb−1.

This ansatz is found to provide a satisfactory fit to the generated Z+jets MC simulation at 14 TeV, as shown in Fig. 2.5. In order to estimate the uncertainty of the background determination for 3000 fb−1 integrated luminosity, we take the three-parameter function in Eq. (2.10) as the baseline to generate data-like spectra following Poisson fluctuations. Figure 2.6 shows the generated spectra for 300 fb−1 and 3000 fb−1. We fit these spectra with three-parameter, four-parameter and five-parameter functions within the range [60, 300] GeV but excluding the signal region [95, 150] GeV. The fitting results and uncertainties are summarized in Fig. 2.7 and Table 2.5. Besides the three-parameter function, the four-parameter and five-parameter functions tested are

f(z) = p1 (1 − z)^p2 z^(p3 + p4 log z),  f(z) = p1 (1 − z)^p2 z^(p3 + p4 log z + p5 log² z).  (2.11)
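For illustration, a side-band fit of this type can be sketched with SciPy as below. The binning and the toy parameter values are stand-ins for the MC spectra described above, not the values used in our fits:

```python
import numpy as np
from scipy.optimize import curve_fit

SQRT_S = 14000.0  # GeV

def ansatz3(mjj, p1, p2, p3):
    """Three-parameter ansatz of Eq. (2.10), with z = mjj / sqrt(s)."""
    z = mjj / SQRT_S
    return p1 * (1.0 - z) ** p2 * z ** p3

# Toy spectrum: 5 GeV bins in [60, 300] GeV, excluding the signal window [95, 150]
edges = np.arange(60.0, 305.0, 5.0)
centers = 0.5 * (edges[:-1] + edges[1:])
sideband = (centers < 95.0) | (centers > 150.0)

true_p = (1.0e-2, 40.0, -2.5)                      # illustrative parameter values
data = np.random.default_rng(1).poisson(ansatz3(centers, *true_p))

popt, pcov = curve_fit(ansatz3, centers[sideband], data[sideband],
                       p0=true_p, sigma=np.sqrt(data[sideband]) + 1.0)

# Background estimate (from the side-band fit) interpolated into the signal region
print(ansatz3(centers[~sideband], *popt).sum())
```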

Events / 5 GeV 8000 Events / 5 GeV 80

6000 60

4000 40

2000 20

2 2 2 10 2 10 mjj (GeV) mjj (GeV) 0 0 −2 −2 Significance Significance 2 × 2 2 × 2 70 80 90 10 2 10 mjj (GeV) 70 80 90 10 2 10 mjj (GeV)

Figure 2.7: Fitted results for 300 fb−1 (left) and 3000 fb−1 (right).

Background      300 fb−1               3000 fb−1
Expectation     8.29×104               8.26×105
3-parameter     (8.39 ± 0.05)×104      (8.28 ± 0.01)×105
4-parameter     (8.38 ± 0.05)×104      (8.27 ± 0.01)×105
5-parameter     (8.39 ± 0.04)×104      (8.29 ± 0.01)×105
Uncertainty     1.32%                  0.21%

Table 2.5: Fitted results for the background rates from various fitting functions as in Eqs. (2.10) and (2.11).

We also vary the fitting range from [60, 300] GeV to [70, 250] GeV and [80, 200] GeV to test the stability; the results are summarized in Table 2.6. If we consider the variation due to the fitting range as another source of systematics, the uncertainty of the background estimation of Z(ℓℓ)+jets for 3000 fb−1 is 0.33%. The uncertainty considered here includes the fitting uncertainty, the fitting function variation and the fitting range variation, and depends largely on the statistics of the side-band region. The background uncertainty from the fit is dominated by the statistics of the side-band regions, which is proportional to the background yield. As a first-order estimate, the uncertainties of Z(νν)+jets and W(νℓ)+jets are comparable, at the order of 0.1%. We thus summarize the systematic percentage uncertainties for the three leptonic channels as

Z(ℓ+ℓ−) + jj : 0.33%;  W(ℓ±ν) + jj : 0.10%;  Z(νν) + jj : 0.13%.  (2.12)

As seen for example in Table 2.3 for the one-lepton channel, the tt̄ background is subdominant yet not negligible. There are other smaller, non-negligible processes, such as semi-leptonic decays of di-boson production, which are not included in our current studies since they would not change our conclusions. Full simulation and control shall be required on all

3000 fb−1    True        [60, 300] GeV        [70, 250] GeV        [80, 200] GeV
3-param.     8.26×105    (8.28 ± 0.01)×105    (8.26 ± 0.03)×105    (8.27 ± 0.05)×105

Table 2.6: Fitted results for the background rate from various fitting ranges with the fitting function in Eq. (2.10).

the relevant processes once the data is available. For our purpose of estimating the signal sensitivity, it suffices to say that the di-jet invariant mass distribution for the backgrounds is smooth in the signal region and can be fitted with simple functions as done above. Since the subdominant backgrounds are statistically much smaller than the V jj process, they would not affect our final results and conclusions.

Alternative Discriminants with Missing Energies

We note that a momentum balance discriminant has been proposed in Ref. [70] as a useful kinematic variable in processes where a new resonant particle is produced in association with a SM vector boson radiated from an initial state, pp → R + V. The transverse momenta of these states should balance:

p_T^R − p_T^V = 0.  (2.13)

Due to detector effects and radiation, the measured momentum balance is not perfect, and the imbalance is particularly severe for the background since the QCD processes tend to have larger radiation. This makes it a useful kinematic discriminant between the signal and background [70]. However, it is not applicable whenever there is missing energy in the event. In fact, the missing transverse energy in an event is defined as the negative of the vector sum of the visible pT; in the above example it offers only a tautology for the momentum balance discriminant. We therefore offer, for events with significant missing energy, a new discriminant to capture the kinematic features of the event. We define this discriminant by calculating the scalar sum of the transverse momenta of the visible particles in the event, and then subtracting the missing transverse energy:

TvQ ≡ Σi |pT,i| − |ET|.  (2.14)

This is a version of a momentum balance discriminant, referred to as TvQ (Transverse event Quality). Since the missing momentum in an event is defined by the negative of the vector sum Σi p⃗T,i, the quantity TvQ is the difference between the scalar and vector sums of the visible pT in the event. TvQ tends to be small when the observable particles form a highly collimated, collinear bunch, while it takes a large value when the observable particles spread out and when R + V production is near the kinematical threshold. It is more intuitive to look at the signal and background in a two-dimensional space of discriminants.
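A minimal sketch of this discriminant, together with the transverse angle φZh discussed below, is given here; the helper names and the example event are purely illustrative:

```python
import math

def tvq(visible_pt, met):
    """Transverse event Quality, Eq. (2.14): scalar sum of visible pT minus |MET|.
    visible_pt: list of (px, py) for the visible particles; met: (mex, mey)."""
    scalar_sum = sum(math.hypot(px, py) for px, py in visible_pt)
    return scalar_sum - math.hypot(*met)

def phi_zh(visible_pt, met):
    """Angle between the MET vector and the vector sum of the visible pT."""
    sx = sum(px for px, _ in visible_pt)
    sy = sum(py for _, py in visible_pt)
    dot = sx * met[0] + sy * met[1]
    norm = math.hypot(sx, sy) * math.hypot(*met)
    return math.acos(max(-1.0, min(1.0, dot / norm)))

# A boosted Zh(-> nunu gg)-like event: two collimated jets recoiling against MET
jets = [(100.0, 10.0), (110.0, -5.0)]
met = (-205.0, -8.0)
print(tvq(jets, met), phi_zh(jets, met))  # TvQ near 0, phi_Zh near pi
```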


Figure 2.8: Scatter plot of 10000 events for the signal (blue crosses) and background (red dots) in the visible pT vs. TvQ plane.

Consider the ET signal from pp → Zh → νν̄ gg. We plot the event population in the pT(jj) vs. TvQ plane as shown in Fig. 2.8. We see that in the signal sample (blue crosses), regions of large visible pT correlate with vanishing TvQ. Events with high boost, and therefore collimated Higgs decay products, correlate with lower values of TvQ, as predicted. The QCD background sample Z+jets (red dots), on the other hand, tends to spread out further. Another simple discriminant, somewhat correlated with TvQ for the Zh final state, is a transverse angular variable φZh, defined as the angle between the missing transverse energy vector and the vector sum of the visible pT. This is clearly motivated since we expect the Z and h states to be nearly back-to-back in the event, in contrast to the QCD multiple jet events. We examined the selective cuts (−30 GeV < TvQ < 10 GeV) or (π − 0.5 < φZh < π + 0.5) and found them effective in separating the signal from the backgrounds. When exploiting more kinematical variables in a treatment like the Boosted Decision Tree (BDT) technique or Neural Networks (NN), these discriminating variables may be taken into consideration.

σ (fb)              ℓ+ℓ− + jj    ℓ± + ET + jj    ET + jj     combined
V h signal          7.0×10−2     4.1×10−1        3.6×10−1
V jj background     2.4×102      2.5×103         1.6×103
S                   0.25         0.61            0.49        0.82
S_sys               0.09         0.17            0.17        0.26

Table 2.7: Signal significance achieved from each channel and combined results for both statistics and systematics dominance.

Results and Discussion

Signal significance

As we see from the cut-flow Tables 2.2-2.4, the V jj backgrounds are dominant. We calculate the signal statistical significance as

S = Nsig / √Nbkg,  (2.15)

with the statistical uncertainty of the dominant background as the only uncertainty. The combined significance of the V h(gg) signal is shown in Table 2.7. The three leptonic channels from the V decays give comparable contributions. The two-charged-lepton channel has the smallest signal strength, but is cleaner in signal identification. The one and zero-charged-lepton channels show good reconstruction and contribute better sensitivities. Adding the 0, 1, 2 charged-lepton channels, the purely statistical estimation gives a 0.82σ significance, which indicates how challenging an observation of the SM V h(gg) signal will be. When the signal rate and S/B are small, one must worry about the systematic uncertainties of the measurements. As discussed at length in Sec. 2.3.4, we rely on the precision side-band fit to control the systematics in the signal region near mjj ∼ mh. If B is the fitted background percentage uncertainty, we then assume the systematic error to be B × Nbkg. We thus present a different significance, dominated by the systematics, defined as

S_sys = Nsig / (B × Nbkg).  (2.16)

As shown in Sec. 2.3.4, with 3000 fb−1 of data and the mjj signal mass window taken as

95-150 GeV, we have B = 0.33%, 0.10%, 0.13% for the two, one and zero lepton channels, respectively. The results with this significance estimation are also shown in Table 2.7. The outcome is worse than with the statistical-error-only treatment. We hope for a further reduction of the non-statistical uncertainties with more dedicated background fitting schemes once real data is available from the experiments.
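As a numerical check, the systematics-dominated significances of Table 2.7 follow directly from Eq. (2.16) and the channel rates, assuming here that the three channels combine in quadrature; small differences with respect to the table come from the rounding of the published inputs:

```python
import math

lumi = 3000.0                                   # fb^-1
channels = {                                    # (signal fb, background fb, B)
    "ll":   (7.0e-2, 2.4e2, 0.0033),
    "lnu":  (4.1e-1, 2.5e3, 0.0010),
    "nunu": (3.6e-1, 1.6e3, 0.0013),
}
s_sys = {name: (sig * lumi) / (b * bkg * lumi)  # Eq. (2.16)
         for name, (sig, bkg, b) in channels.items()}
combined = math.sqrt(sum(s ** 2 for s in s_sys.values()))
print(s_sys, combined)  # ~0.09, 0.16, 0.17; combined ~0.25, cf. Table 2.7
```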

Bounds on the branching fractions and correlations with h → bb̄, cc̄

The interpretation of these results as bounds on individual Higgs decay channels needs further discussion. Thus far, we have only simulated h → gg as the Higgs decay channel, since it dominates the SM branching fraction of the Higgs decay to light jets. Practically, however, contributions from mis-tagged h → bb̄, h → cc̄, and possible light-quark pairs all accumulate in the events and should be taken into account in a correlated way. Thus, the signal we have been searching for in this study is really h → j′j′, where j′ is an “un-tagged jet” including possible b, c and j (g, u, d, s) contributions. Listed in Table 2.8 are the working points for the tagging/mis-tagging efficiencies, assuming that the different observable event categories listed as different rows are un-correlated.

For instance, a b quark will be tagged as a b with probability εbb = 70%, and mis-tagged as a c or an un-tagged j′ with εcb = 13% and εj′b = 17%, and so on. Here the first subscript a denotes the jet-tagged flavor category, and the second subscript i denotes the source parton. The numbers are the same as in Category “c-tagging I” of Table 1 in Ref. [54], as reasonable estimates for the experimental performance at the 14 TeV LHC, and for consistency of later comparison. We extend to the double-tagged event categories, with the corresponding Higgs branching fraction channels weighted as

e_ai = ε²_ai × BR_i / Σj (ε²_aj × BR_j).  (2.17)

We show in Table 2.9 the percentage contributions of these decay channels h → ii in each experimentally tagged category a. For instance, a pair of un-tagged jets in category j′j′ will have a probability of 74% to come from the SM Higgs decay to a pair of gluons, and 16% or 10% from bb̄ or cc̄, respectively. With the current tagging efficiency, we translate the significance 0.82σ on BR(h → jj) to the un-tagged signal category BR(h → j′j′) by rescaling as

S_j′ = S_j / e_j′j = 0.82σ / 74% = 1.1σ,  (2.18)

which accounts for the mis-tagged bb̄, cc̄ contributions as well. In other words, if an observation of h → j′j′ were made in a future LHC run, the interpretation for individual channels would be based on Table 2.9, with updated tagging efficiencies. As is customary, we define the signal strength for a decay channel h → ii as

µi = BR(h → ii) / BR_SM(h → ii),  (2.19)

where we consider ii = bb̄, cc̄, and jj. Assuming each category is statistically independent and follows Gaussian statistics, we combine the three categories to get the three-

ε_ai         b-quark    c-quark    j = g, u, d, s
b-tag        70%        20%        1.25%
c-tag        13%        19%        0.50%
un-tag j′    17%        61%        98.25%

Table 2.8: Flavor tagging efficiencies ε_ai.

e_ai           h → bb̄    h → cc̄    h → jj
bb-tag         99.6%      0.4%       0%
cc-tag         90.4%      9.6%       0%
un-tag j′j′    16%        10%        74%

Table 2.9: Fractions of the SM decay channels in each double-tagged category.

dimensional contour constraint on {µb, µc, µj} correlatively, based on the relation

χ² = Σa (xa − x̄a)² / σa²
   = Σa (Σi ε²_ai BR_i N^prod_sig − Σi ε²_ai BR^SM_i N^prod_sig)² / (√Nbkg)²  (2.20)
   = Σa (Σi e_ai µi − 1)² / (1/S_a)²,

where S_a is the significance from each category identified by experiments, and the e_ai are the double efficiencies for each decay channel i in category a, given in Table 2.9.21 We take

S_a = (11, 1.35, 1.1 (0.35)) for the three categories, assuming only statistical errors with 3000 fb−1 of data. The first number is from Table 12 of the ATLAS MC study [47], making use of the “One+Two-lepton” combined sensitivity. The second number comes from Fig. 2(a) of Ref. [54], the extrapolated study on the same MC dataset assuming the same tagging efficiency. Assuming most of the sensitivity on µc comes from the double c-tagged category, we likewise rescale the number with e_cc and a factor of √2^(−1), since they consider 2 × 3000 fb−1 of data from two experiments. The third number is from our current “Zero+One+Two-lepton” un-tagged jets study, with the number in parentheses including the systematic error. The fully correlated signal strengths are plotted in Fig. 2.9, for (a) a 3-dimensional contour in (µb, µc,

µj) at 1σ, (b) the projected contour in the µj-µc plane with statistical errors only, and (c) with systematic error dominance. The shadowed contour regions are the projection of the

3D contour (µb, µc, µj) onto the µc-µj plane at 1σ and 2σ, and the solid ovals are for the fixed value µb = 1. Allowing µb to float, the contour regions are slightly larger than the ovals. We note that certain values of the parameter space are excluded where BR(h → bb) + BR(h → cc) + BR(h → jj) > 1, and there our SM production assumption breaks down. This is represented in the plots by the gray shaded region. The 95% Confidence Level (CL) global upper bounds (approximately 2σ) on the branching fractions with statistical errors

21Note the different efficiencies defined in Tables 2.8 and 2.9, with the normalizations Σa ε_ai = 1 in categories, and Σi e_ai = 1 in channels.

Figure 2.9: Signal strengths in correlated regions for (a) the 1σ contour in the 3 dimensions (µb, µc, µj), and (b), (c) contours in the µc-µj plane, for statistics only and including systematic uncertainties, respectively. The shadowed contour regions are the projection of the 3D contour (µb, µc, µj) onto the µc-µj plane at 1σ and 2σ, and the solid ovals are for fixing µb = 1. The grey triangle area at the upper right corner is the unphysical region BR(h → bb) + BR(h → cc) + BR(h → jj) > 1.

(systematic errors) for 3000 fb−1 with respect to the SM predictions can be obtained as

BR(h → jj) ≤ 4 (9) × BR_SM(h → gg),  (2.21)
BR(h → cc̄) < 15 × BR_SM(h → cc̄).  (2.22)

Although this bound on the h → gg channel is not nearly as strong as that from the production fit gg → h assuming the SM value, our study and results lay out the attempt to search for the direct decay of the Higgs boson to gluons and the light quarks. The result for cc̄ is comparable with the best existing extrapolations [71, 54], although adding the un-tagged category slightly improves the constraints on the c-quark Yukawa coupling, as expected. Further improvements can be made by including production via vector boson fusion (VBF) [72] and tt̄h [73]. These are sub-leading contributions to the h → jj study at Run I and become more important production channels at Run II [74]. For simplicity, our study includes only double-tagged categories; single b or c tagged categories can be further included, as done in the recast of Ref. [75]. Statistics can be further improved by analyses with likelihood fitting, BDT, etc., once data is available.

Bounds on light-quark Yukawa couplings

So far, possible contributions from light quarks (u, d, s) have been ignored in accordance with the SM expectation. The bound on h → jj in Eq. (2.21) can be translated into bounds on the light-quark Yukawa couplings.

L (fb⁻¹)                 | κu   | κd   | κs
300 (un-tagged j0j0)     | 1.3  | 1.3  | 1.3
3000 (un-tagged j0j0)    | 0.6  | 0.6  | 0.6
Current Global Fits [79] | 0.98 | 0.97 | 0.70
300 [77]                 | 0.36 | 0.41 | –
3000 [71]                | –    | –    | 1

Table 2.10: Extrapolated upper bounds at 95% CL on the light-quark Yukawa couplings κq = yq/yb^SM (q = u, d, s).

Assuming the SM ggh coupling and varying one light-quark Yukawa yq at a time, we translate our bound on µj to the Yukawa couplings of the light quarks u, d, s by scaling the branching fraction with µq ∝ yq². Our resulting bounds on the Yukawa couplings normalized to yb are shown in Table 2.10.
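As a quick illustration of this rescaling, the translation from the bound on µj to a bound on κq can be carried out in a few lines of Python. This is a minimal sketch under stated assumptions: the SM branching fractions below are the commonly quoted values for mh = 125 GeV, and the estimate neglects both the modification of the total width and the difference in selection efficiency between quark and gluon jets, so it reproduces the Table 2.10 numbers only approximately.

```python
import numpy as np

# Assumed SM branching fractions for m_h = 125 GeV (approximate standard values)
BR_SM_GG = 0.082  # BR_SM(h -> gg)
BR_SM_BB = 0.582  # BR_SM(h -> bb)

def kappa_q_bound(mu_j_max):
    """Translate an upper bound mu_j <= mu_j_max into a bound on
    kappa_q = y_q / y_b^SM, using mu_q ~ y_q^2 and, in the narrow-width
    limit, BR(h -> qq) ~ kappa_q^2 * BR_SM(h -> bb)."""
    # Require kappa_q^2 * BR_SM(h -> bb) <= mu_j_max * BR_SM(h -> gg)
    return np.sqrt(mu_j_max * BR_SM_GG / BR_SM_BB)

# Statistics-only bound of Eq. (2.21) at 3000 fb^-1:
print(kappa_q_bound(4.0))  # ~0.75, in rough accord with the 0.6 of Table 2.10
```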

There have been attempts to probe the light quark Yukawa couplings in the literature [76, 71, 77, 78]. Recent studies on inclusive Higgs production and its spectra in pT(h) and yh claim various improved constraints on the couplings [71, 77], compared to constraints from a global fit [79]. The upper bounds from our study of the Higgs decay to light jets are comparable to those derived from the Higgs production kinematics, as also shown in Table 2.10, and thus provide complementary information to the existing approaches. We also see from the table that our result may offer a better probe of the strange-quark Yukawa coupling.

Summary and Conclusions

We have carried out a detailed study of the Higgs boson decay to light un-tagged jets in the vector boson associated channel pp → V h, with h → gg and V = W±, Z decaying to leptons, at the 14 TeV HL-LHC with 3000 fb⁻¹. To differentiate the di-jet signal from the huge SM QCD backgrounds, we have maximized the signal sensitivity by combining searches in the 0-, 1- and 2-leptonic decay channels of the vector bosons. We used MadGraph, PYTHIA, and DELPHES for the signal and background simulations. Our findings can be summarized as follows.

• In Sections 2.3.1–2.3.3, we optimized the kinematical cuts for the individual signal channels to enhance S/√B as well as S/B. The boosted kinematics for the di-jet signal has the advantage of improving S/B while keeping S/√B roughly the same. We proposed the “di-jet-vicinity” Higgs mass reconstruction method as seen in Fig. 2.3, and tested its effectiveness against pile-up effects as in Fig. 2.4.

• In Sec. 2.3.4, we studied in great detail how to control the systematic errors by making use of the side-bands with a few fitting functions. We found that with 3000 fb⁻¹,

it is conceivable to achieve sub-percent-level systematic uncertainties, as given in Eq. (2.12). It would be crucially important to take advantage of the large statistics and to keep the systematics under control.

• We may reach about 1σ combined significance for the un-tagged di-jet channel, as shown in Table 2.7 and in Eq. (2.18). We also considered the correlation with mis-tagged events from the h → bb̄, cc̄ channels, as discussed in Sec. 2.5.2.

• Assuming the SM V h production, our results can be translated to upper bounds on the branching fractions of 4 and 15 times the SM values for BR(h → gg) and BR(h → cc̄), respectively, at 95% CL, as seen in Eqs. (2.21) and (2.22).

• Exploiting our results, indirect upper bounds on the light-quark Yukawa couplings can be extracted, as summarized in Table 2.10, and compared with the currently existing literature.

• We pointed out that there are other variables to explore; kinematic discriminants like TvQ and φZh, as discussed in Sec. 2.4, may be among them. To improve upon the simple cut-based analyses, multivariate methods like BDTs and NNs would be promising. Adding other production channels such as VBF and tt̄h will also help to strengthen the bounds.

After the Higgs boson discovery and the initial measurements of its SM-like properties at the LHC Run I and Run II, it is imperative at the HL-LHC to tackle the more challenging channels with rare Higgs decays. We hope our studies on the Higgs decay to light un-tagged jets will serve as an initial proposal among these future efforts.

Chapter 3 Applying Basic Machine Learning Techniques to Collider Phenomenology

This chapter is based on preliminary results from ongoing work.

Introduction

Measuring the Higgs coupling to light quarks is an important step toward a complete understanding of the Standard Model (SM), as well as a potential launchpad for physics Beyond the Standard Model should these couplings fail to match the SM predictions. In Chapter 2, we proposed a method to measure this coupling at the Large Hadron Collider (LHC) using associated production to help sort through the large QCD background. Using a traditional cuts-based analysis, we found that a 1.1σ signal significance could be obtained, which allowed us to set bounds on the branching fractions BR(h → gg) and BR(h → cc̄). In this chapter we apply machine learning tools to the same dataset in place of a cuts-based analysis. Our goal is to match or exceed the results obtained in Chapter 2 using simple, widely available machine learning tools as a proof of concept, laying the groundwork for future, more sophisticated efforts.

Data Preparation

We simulated signal and background events using exactly the same specifications as the events generated for the cuts-based analysis in Chapter 2, including the same generator-level cuts. As in Chapter 2, we used MadGraph to simulate particle collisions, Pythia to shower, and PGS to serve as the simulated detector. This resulted in LHCO-formatted data files [11], a format frequently used in particle physics phenomenology.

Because the format of input data can be critically important for neural networks, it is worth discussing the format of these files in some detail. Raw LHCO event files contain 11 columns, representing standard information and parameters such as the type of particle, kinematic quantities such as eta and phi (the trajectory of the particle away from the collision), and other quantities measurable at the LHC. Each row represents a particle, a jet, or E̸T. There are six types of rows in the LHCO format, numbered 0 to 6: photons, electrons, muons, taus, jets, and E̸T, where particle type 5 is reserved for future use. The number of rows in each collision event can vary depending on the number of particles detected; there can be multiple particles of any given type except for E̸T, which is calculated to represent the amount of pT needed to make the net pT coming out of the event equal zero, and thus there is only one E̸T row in any event. For brevity, in this chapter we will use the word “particle” or “particle type” to refer to a row of a given type, even though missing energy and jets are not particles in the proper sense of the word. Because LHCO files are simple text, we will also use “eta” and “phi” as the labels for data in those columns rather than the Greek symbols η and φ. In addition to the standard LHCO particle types, we added rows containing higher-level parameters that were calculated for use in the cuts in Chapter 2, such as reconstructed masses of the vector bosons in the event. The columns for these additional rows followed the format of the 11 standard parameters where possible, but in some cases additional columns were added; to compensate for this, the original LHCO particle types were all given two additional columns with a value of 0. The first column of the LHCO file, the particle number, was dropped, as it is simply a bookkeeping tool for counting the particles in each event and has no relevance to their actual physical properties. This resulted in 13 possible particle types and 12 parameter columns per particle. To prepare the data for the neural network, we wanted a uniform number of rows per event, so that each pixel in our n × n image would represent the same parameter for the same particle type. We therefore re-processed the data by counting the maximum number of rows per particle type in any event in the entire signal and background dataset and assigning that number of rows to each particle type for every event. We ordered particles within each type by pT, then filled in zeroes where a specific event did not actually contain data for an ith particle of that type. This resulted in events with 41 rows and 12 columns, populated largely with zeroes. A full key to these particle types and the 41 × 12 feature event image is included in App. A. Although the channels were trained and tested separately, we created uniform images for each channel to make human interpretation easier and to allow for the possibility of eventually training and testing the entire dataset together. Finally, we scaled the data. In our preliminary runs we tried both the Min-Max scaler and the standardization scaler included in the Python Scikit-Learn (sklearn) package. The 41 × 12 “images” of events from the 0-lepton channel are shown in Fig. 3.1.
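A sketch of this re-processing step is given below. The per-type row budgets in MAX_ROWS are illustrative stand-ins (in practice each budget is the maximum multiplicity of that type found anywhere in the combined signal and background dataset, and the budgets sum to 41); the essential operations are grouping rows by type, ordering by pT within each type, and zero-padding.

```python
import numpy as np

N_COLS = 12  # 10 surviving LHCO columns plus the 2 added padding columns
PT_COL = 3   # column index of pT in this layout (illustrative)

# Hypothetical row budget per particle type (13 types, summing to 41 rows);
# the real budgets are the per-type maxima over the whole dataset.
MAX_ROWS = {0: 1, 1: 2, 2: 2, 3: 1, 4: 15, 6: 1,
            7: 2, 8: 4, 9: 4, 10: 3, 11: 2, 12: 2, 13: 2}

def event_to_image(event_rows):
    """event_rows: list of (particle_type, row) pairs for one event, where
    each row is a length-12 list of floats. Returns a (41, 12) array with
    rows grouped by type, ordered by descending pT, and zero-padded."""
    blocks = []
    for ptype, budget in sorted(MAX_ROWS.items()):
        rows = sorted((np.asarray(r, dtype=float)
                       for t, r in event_rows if t == ptype),
                      key=lambda r: -r[PT_COL])[:budget]
        block = np.zeros((budget, N_COLS))
        if rows:
            block[:len(rows)] = np.vstack(rows)
        blocks.append(block)
    return np.vstack(blocks)  # shape (41, 12)
```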


Figure 3.1: LEFT (a): 41 × 12 “images” of 0-lepton particle events after rescaling using SciKit Learn’s Min-Max Scaler. The top row shows signal events; the bottom row shows background events. RIGHT (b): 41 × 12 “images” of 1-lepton particle events after rescaling using SciKit Learn’s Standard Scaler. The top row shows signal events; the bottom row shows background events.

The Min-Max scaled images shown in Fig. 3.1a rescale the values of each feature so that the maximum value of that specific feature across the entire dataset is equal to 1, the minimum value of that feature is set to 0, and the remaining values are uniformly scaled to range

from 0.0 to 1.0. These images clearly show the field of added zeroes as a uniform dark blue background, but the yellow peak quantities can be deceptive, as a single outlying feature will compress the values of that feature in all other events into a very small range. The images in Fig. 3.1b were scaled using standardization scaling22. This type of scaling finds the mean of each feature across the entire dataset and redistributes the individual feature values in a normal distribution around that mean. Thus, the “background” of each image is a different shade in the heatmap. Standardization scaling is less sensitive to outliers in the data. The Min-Max scaled images may be easier for the human eye to interpret and give a good illustration of what each “image” means for a given particle event. The first 13 rows are leptons (particle types 0, 1, 2 and 3). Although these images represent the “0-lepton” signal and background processes, in practice some leptons are still generated during showering. In columns that allow for negative values (eta for all leptons and ntrk for charged leptons, corresponding to columns 2 and 6), this means that a value of zero is no longer scaled to 0.0. Thus, even though most of these events did not contain a lepton, they have what appear to be hot spots in columns 2 and 6, indicating that some event in the dataset had a stray lepton in those rows. The cluster of hotter pixels denotes the beginning of the jet rows, which fade out at the higher jet numbers. Again, the apparent hotspots in the second column reflect that some event in the dataset had a jet in that row with a negative eta value, thus skewing the pixel color for that feature in all other events. As the number of events with a particle in those lower rows decreases, a single event with a relatively high eta becomes more and more of an outlier, making the resulting skew appear more drastic.
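Both rescalings come directly from scikit-learn; in this sketch X is the (n_events, 492) array of flattened event images, with each of the 492 features scaled across the entire dataset as described above.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Stand-in for the real dataset: a stack of flattened 41x12 event images
X = np.random.rand(20000, 41 * 12)

# Min-Max scaling: per feature, dataset-wide min -> 0.0 and max -> 1.0
X_minmax = MinMaxScaler().fit_transform(X)

# Standardization: per feature, shift to zero mean and scale to unit variance
X_std = StandardScaler().fit_transform(X)

# Reshape back into images for heatmap plots like Fig. 3.1
images = X_std.reshape(-1, 41, 12)
```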

After the jet rows run out we have one row for particle type 6 (E̸T), and then the higher-level parameters begin. The first few higher-level parameters require two leptons, so there is little activity in these rows for the 0-lepton channel. We then have one row for jet variables and a few rows for higher-level parameters calculated with only one lepton. The last row has higher-level parameters calculated for an E̸T plus jets signature, and we can see that activity picks up again for the last row. The standard scaling seems more muted at a casual glance, but in reality it gives a more accurate picture of what is going on in the event. There is almost no activity in the leptonic rows, as would be expected for the 0-lepton channel. The key activity occurs in the E̸T and jet rows, where any values darker than the background indicate lower-than-average values. Thus, for example, the 6th image in the bottom row has jets that cluster well below the average values. Meanwhile, the jets in the row of signal events cluster at higher than average values, oftentimes with the two leading jets at higher than average values and subsequent jets at lower than average values. This is precisely the

22 Note that the generated images were chosen randomly from the signal and background datasets; the images in Fig. 3.1a do not correspond to the same events as those depicted in the images in Fig. 3.1b.

kinematic behavior we are looking for from the signal and background events in this study. Our preliminary runs achieved better results with standardization scaling, so standardization is what we used for the remainder of the training process. As a final note on data preparation, we chose to use the same code to calculate and organize the additional particle types because (1) we wanted to present the neural network with the exact same data used in the cuts-based analysis as much as possible, and (2) data preparation is by far the most difficult and time-consuming task in using neural networks, so for a preliminary study where we did not know if we would get worthwhile results it was easiest to use existing code to prepare the data. However, one major drawback in preparing the data using code that was originally written for a cuts-based approach is that some of the higher-level parameter rows required the event to pass some preliminary cut in order for the row to be created. Because the intent of this study was to compare the performance of a neural network without using traditional cuts, all cutoff values were set to zero or infinity as appropriate. Unfortunately, this inadvertently resulted in an entire higher-level particle type, type 9, being omitted from the dataset, one that included very relevant features such as the reconstructed mass of pairs of jets. We corrected the code to allow for a more generic creation of this particle type, but this in turn led to the creation of up to 45 jet-pairs per event, resulting in 86 × 12 event images. We did attempt to run these images through the neural networks; however, we were unable to fully optimize and process those results prior to the completion of this thesis. We will comment on preliminary observations from the 86 × 12 dataset in each channel separately.

The Network

In order to make a close comparison to each of the three signal channels explored in Chapter 2, we train and fit a neural network to each channel in isolation. We input the processed data as Python-based Pandas DataFrames in Jupyter notebooks23. For each channel, we ran the data through both a Fully Connected Network (FCN) and a Convolutional Neural Network (CNN) using Keras, backended with Tensorflow24. Although each network for each channel was optimized individually, in the end the datasets proved similar enough that the optimization was comparable for all three channels, allowing us to use the same optimization values for each channel. Each FCN was constructed with 400 hidden layers. The hidden layers used the tanh activation; the output layer used the “softmax” activation. Each layer had the same number

23Jupyter[80] is a popular web-based application broadly supported in the Computer Science community, including Google which hosts a free Jupyter notebook platform, Google Colab[81], that can handle very small machine learning tasks. Jupyter supports over 40 programming languages. A number of useful data analysis and machine learning packages, many of which are also open-source, are written using Python, including the powerful yet simple and popular open-source neural network library Keras[82]. 24Tensorflow[83] is a powerful open source software library designed for large-scale Machine Learning.

of nodes as the input layer (41 × 12 = 492). Optimized convergence occurred at 15 epochs with a batch size of 128. For each CNN we used 70 filters in our first convolutional layer and 30 in our second. 5 × 3 receptive fields and max pooling were used for all convolutional layers. The convolutional layers used the “ReLU” activation. The final layers consisted of a 64-node fully connected layer that also used the “ReLU” activation, and the final output layer used the “softmax” activation. The CNNs converged at 25 epochs using a batch size of 256. For each channel we had a number of signal events and a number of background events. Each channel was scaled separately, but the total dataset per channel of background and signal events was combined for scaling. After scaling, the signal and background were separated again for independent shuffling and selection into the reduced datasets that would be used in the neural network as a test/train sample. For the 0-lepton and 2-lepton channels, we had one signal process and one background process consisting of ∼10,000 events each. The 1-lepton channel had two different backgrounds, and two processes for each signal and background, resulting in a larger dataset and a slightly different sample-selection process. For the 0-lepton and 2-lepton channels, we independently shuffled the signal and background events, then randomly selected 8,000 events each of background and signal to be included in the test/train sample. The background and signal events in the test/train sample were then shuffled together. Finally, we randomly selected 80% of the combined signal and background sample set to be the training dataset and 20% to be the testing dataset, for a final test/train split of 3,200 test events and 12,800 training events. The neural networks were then trained on the training sample. Once training was optimal, the test events were passed through the trained model and various performance measures were calculated. Finally, we repeated this process ten times for each channel, selecting a different random collection of 8,000 events for the test/train sample and a different selection of events for the training and testing sets, respectively. This helped ensure the performance measures were consistent and allowed us to establish a mean and standard deviation for each performance measure. The 1-lepton channel required a slightly different treatment. Because the positive-lepton and negative-lepton processes for each signal and background were simulated separately, we had significantly more events to use. We initially had 19,997 signal events, 19,996 primary background events, and 19,990 secondary background events. From this we randomly selected 15,000 of each background into a combined background set of 30,000 events. We shuffled this combined background, then selected 15,000 background events that included events from both background processes. This was then shuffled with a randomized selection of 15,000 signal events to create a test/train sample of 30,000 events.
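A minimal Keras sketch of the two architectures follows, with the layer counts, activations, and batch settings taken from the description above; the optimizer and loss are assumptions, as they are not specified in the text.

```python
from tensorflow import keras
from tensorflow.keras import layers

WIDTH = 41 * 12  # 492 input features

def build_fcn(n_hidden=400):
    """FCN: hidden tanh layers, each with the same 492 nodes as the input
    layer, and a softmax output over (signal, background)."""
    model = keras.Sequential([layers.Input(shape=(WIDTH,))])
    for _ in range(n_hidden):
        model.add(layers.Dense(WIDTH, activation="tanh"))
    model.add(layers.Dense(2, activation="softmax"))
    model.compile(optimizer="adam",  # assumed optimizer
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

def build_cnn():
    """CNN: 70 then 30 filters with 5x3 receptive fields and max pooling,
    a 64-node ReLU dense layer, and a softmax output."""
    model = keras.Sequential([
        layers.Input(shape=(41, 12, 1)),
        layers.Conv2D(70, (5, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(30, (5, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# Training settings from the text:
#   build_fcn().fit(X_train, y_train, epochs=15, batch_size=128)
#   build_cnn().fit(X_img_train, y_train, epochs=25, batch_size=256)
```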

            | Pred = Sig | Pred = Bgnd
True = Sig  | TP         | FN
True = Bgnd | FP         | TN

            | Pred = Sig | Pred = Bgnd
True = Sig  | 1165       | 365
True = Bgnd | 678        | 995

Table 3.1: Example confusion matrices. The bottom matrix has numeric values for the 0-lepton channel CNN. This particular matrix resulted in a 76.1% signal pass rate and a 40.5% background pass rate for an isolated channel significance of 0.414.

From this we further selected a training sample of 80% and a test set of 20%, resulting in an overall training dataset of 24,000 events and a testing dataset of 6,000 events. Like the other two channels, this process was repeated independently ten times for the FCN and ten times for the CNN.
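The shuffling, selection, and repeated splitting can be sketched as follows (array names are illustrative); each of the ten repetitions uses a fresh random seed so that the test/train membership differs from run to run.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def make_split(X_sig, X_bkg, n_each=8000, seed=0):
    """Randomly pick n_each signal and n_each background events, label them
    (1 = signal, 0 = background), and return an 80/20 train/test split:
    12,800 training and 3,200 test events for the 0- and 2-lepton channels."""
    rng = np.random.default_rng(seed)
    sig = X_sig[rng.permutation(len(X_sig))[:n_each]]
    bkg = X_bkg[rng.permutation(len(X_bkg))[:n_each]]
    X = np.vstack([sig, bkg])
    y = np.concatenate([np.ones(n_each), np.zeros(n_each)])
    return train_test_split(X, y, test_size=0.2, shuffle=True,
                            random_state=seed)

# Ten independent repetitions give the mean and standard deviation of each
# performance measure (labels one-hot encoded for the softmax output):
# for seed in range(10):
#     X_tr, X_te, y_tr, y_te = make_split(X_sig, X_bkg, seed=seed)
#     model = build_fcn()  # builder from the sketch above
#     model.fit(X_tr, keras.utils.to_categorical(y_tr),
#               epochs=15, batch_size=128)
```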

Analysis and Results

In order to directly compare results to those of the cuts-based analysis described in Chapter 2, we used the neural networks to obtain the percentage of “passed” signal and background events in each channel, to compare with the final line in each of Tables 2.2, 2.3 and 2.4. The neural network actually gave us more information than just a pass rate, in the form of a confusion matrix, as shown in Table 3.1. The confusion matrix shows how many signal events were classified correctly as signal (true positive or TP), how many signal events were incorrectly classified as background (false negative or FN), how many background events were incorrectly classified as signal (false positive or FP), and how many background events were correctly classified as background (true negative or TN). We used the true positive (TP) to correspond to NS, the number of signal events that passed cuts,

and the false positive (FP) to represent NB, the number of background events that passed cuts. An example confusion matrix for a specific run of the 0-lepton channel through the CNN is also given in Table 3.1. We calculated an isolated significance for each channel using Eq. 2.15:

S = Nsig / √Nbkg ,    (3.1)

where Ni is the pass rate εi multiplied by the generated cross section σi (the top line in each of Tables 2.2, 2.3 and 2.4). For multiple signal and/or background processes contributing to a given channel, N = Σj εj σj. This allowed us to compare the neural network performance to the cuts-based approach channel by channel.
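In code, the pass rates and the isolated significance follow directly from the confusion matrix. The luminosity factor and the cross sections below are our own assumptions (σs ≈ 1.2 fb and σb ≈ 3.6 × 10⁴ fb are inferred from the 0-lepton entries of Table 3.6); with those inputs this sketch reproduces the 0.414 significance quoted in Table 3.1.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def isolated_significance(y_true, y_pred, sigma_s, sigma_b, lumi=3000.0):
    """Eq. (3.1): S = N_sig / sqrt(N_bkg), with N the pass rate times the
    generated cross section (fb) times an assumed luminosity (fb^-1)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    eps_s = tp / (tp + fn)  # signal pass rate
    eps_b = fp / (fp + tn)  # background pass rate
    return eps_s * sigma_s * lumi / np.sqrt(eps_b * sigma_b * lumi)

# The Table 3.1 confusion matrix for the 0-lepton CNN run:
y_true = np.concatenate([np.ones(1530), np.zeros(1673)])
y_pred = np.concatenate([np.ones(1165), np.zeros(365),   # signal rows
                         np.ones(678), np.zeros(995)])   # background rows

print(isolated_significance(y_true, y_pred, 1.2, 3.6e4))  # ~0.414
```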

Although not a part of the neural network itself, the Python-based Scikit-Learn package25 includes a buffet of data analysis and visualization options, among them the capability to create a correlation matrix. We passed our 492 features plus the label into the correlation function to find each feature’s correlation to the label. The data has many features, many of which are interdependent, so all individual feature correlations were low. It is also important to note that relationships between the features are likely more important to the network than any single feature, and features that have low individual correlation may be considerably more important in the larger image and in combination with other features. The 15 features with the highest positive correlations to the label for the 2-lepton channel are shown in Table 3.2. The negative correlations for the 2-lepton channel, as well as the positive and negative correlations for the other two channels, are included in App. A, Tables A.5 through A.9, along with a more detailed discussion accompanying each table. It is interesting that for the 2-lepton channel, the numbers of tracks contained in the jets with the highest, second highest, and third highest pT all appear on the list. Other jet features that appear are the jmas of the first and second jets, and the pT of the leading jet. This indicates that the cuts-based analysis may have underutilized jet features and available jet reconstruction techniques, as noted in the conclusion to Chapter 2. On the gauge boson side, the reconstructed Z mass is strongly correlated for both electron pairs and muon pairs. This validates the premise of the study outlined in Chapter 2, where associated production was used to provide a leptonic “handle” by which a Higgs production event could be more readily identified than through more dominant processes that have mostly hadronic decay products. Additionally, the pT of the leading and second muons are on the list, but the pT of the electron pairs is not.

The appearance of the muon pT as a relatively highly correlated feature explains an item on the list that at first glance may be puzzling: the reconstructed W boson mass, in a channel that does not include W bosons. This reconstructed mass is a function of the lepton pT and the missing energy in the event. Since we have seen that the muon pT has significant correlation, it is not surprising that a variable that is a function of the muon pT is also correlated, even if the physical object that the function is supposed to represent does not actually exist in this channel.
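With the scaled events in a Pandas DataFrame, the ranking behind Table 3.2 is a few lines; df is assumed to hold the 492 feature columns plus a 0/1 “label” column.

```python
import pandas as pd

def label_correlations(df, n=15):
    """Rank features by their (Pearson) correlation with the event label.
    Returns the n most positively and n most negatively correlated features."""
    corr = df.corr()["label"].drop("label")
    return corr.nlargest(n), corr.nsmallest(n)

# top_pos, top_neg = label_correlations(df)  # cf. Table 3.2 and App. A
```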

25 Scikit-learn is another free machine learning library.

feature # | correlation | description
167 | 0.273000 | ntrk for leading jet
179 | 0.170194 | ntrk for 2nd leading jet
369 | 0.093555 | reconstructed Z mass (muon pair)
345 | 0.089172 | reconstructed Z mass (electron pair)
165 | 0.083783 | pT of leading jet
166 | 0.083000 | jmas of leading jet
441 | 0.078577 | leading muon pT
81 | 0.078577 | leading muon pT
445 | 0.072701 | reconstructed W boson mass (leading muon)
178 | 0.072644 | jmas of 2nd jet
484 | 0.071065 | E̸T > 30 GeV (P/F)
93 | 0.070699 | pT of 2nd leading muon
453 | 0.070699 | pT of 2nd leading muon
191 | 0.066121 | ntrk of 3rd jet
457 | 0.062831 | reconstructed W boson mass (2nd leading muon)

Table 3.2: 15 features with the highest correlation to the event label for the 2-lepton channel.

Results: 2-lepton channel

The processes used for the 2-lepton channel were:

qq̄ → Zh → ℓ⁺ℓ⁻ gg   (signal)
qq̄ → Zjj → ℓ⁺ℓ⁻ jj   (background)      (3.2)

For the 2-lepton channel we used only the tree-level signal process. We opted not to use the loop signal process gg → Zh → ℓ⁺ℓ⁻ gg used in Chapter 2, due to abnormalities in the data suggesting that we may not have produced a statistical match to the dataset used in Chapter 2. The dataset included 9,998 signal events and 9,999 background events. In Chapter 2, we found that the 2-lepton channel cuts reduced the tree-level signal to 14% and the background to 1.9%, resulting in an isolated significance of 0.190 for this channel. As can be seen in Table 3.3, both the FCN and the CNN obtained better results, with isolated channel significances of 0.224 and 0.230, respectively. Although it would require significant network engineering to determine exactly why the neural networks obtained better results, we can offer a few suggestions. Statistically identical sample sets of LHCO-formatted event data were used, as well as identically calculated higher-level variables in the additional LHCO particle types. However, in the cut-based

2-lepton channel | FCN mean | FCN std. dev. | CNN mean | CNN std. dev. | traditional cuts
signal pass rate | 0.657 | 0.029 | 0.617 | 0.088 | 0.14
bgnd pass rate | 0.328 | 0.022 | 0.277 | 0.082 | 0.019
single channel significance | 0.224 | 0.006 | 0.230 | 0.008 | 0.190

Table 3.3: 2-lepton channel results.

analysis all information was passed through cuts with human-defined cut limits. It could be that the human-defined limits could be massaged into more accurate cuts, such as by using binned cut limits, which would compare more closely to the networks’ minimization of a cost function using gradient descent. Second, if we assume that the more highly correlated features contributed more to the network’s classification, then Table 3.2 suggests that the jet analysis of the cuts-based approach could have been significantly improved upon, and that more attention could be paid to quantities like the number of tracks in a jet and the pT of the leptonic decay products of the Z boson, even prior to reconstruction of the boson itself. The full 86 × 12 dataset including particle type 9 did not appear to improve upon the results using only the 41 × 12 dataset. This was surprising, as particle type 9 included many reconstructed jet variables, including the reconstructed Higgs mass. However, the variation of the performance measures was significant, indicating that proper optimization of the full 86 × 12 dataset might still eventually yield better results.

Results: 1-lepton channel

The processes used for the 1-lepton channel were:

qq̄ → Wh → ℓν gg   (signal)
qq̄ → Wjj → ℓν jj   (primary background)      (3.3)
tt̄ → ℓν jj bb̄   (secondary background)

where ℓ can be either positive or negative, and where the bottom quarks in the secondary background decay further. The 1-lepton channel presented an additional challenge based on the way the events were generated. This channel was the only channel with a second significant background process. Thus, in order to get a number of events that “passed” as signal that we could then multiply by the appropriate cross section and compare process by process to Table 2.3, we

1-lepton channel | FCN mean | FCN std. dev. | CNN mean | CNN std. dev. | traditional cuts
signal pass rate | 0.768 | 0.024 | 0.807 | 0.043 | 0.18
bgnd (Wjj) pass rate | 0.469 | 0.041 | 0.494 | 0.069 | 0.025
bgnd (tt̄) pass rate | 0.124 | 0.013 | 0.165 | 0.052 | 0.025
single channel significance | 0.418 | 0.006 | 0.423 | 0.012 | 0.419

Table 3.4: 1-lepton channel results.

first trained the model with the combined background set as described in Section 3.3, but we tested the model individually on test sets containing only events from each of the signal and two background processes to get individual pass rates. This was the only channel in which one of the networks (the FCN) failed to improve on the results of the traditional cuts-based analysis, and although the CNN did slightly better, the traditional cuts were still within one standard deviation of the CNN’s results. It is difficult to comment on why the neural networks did as well as, but no better than, the traditional cuts without more sophisticated techniques. The most basic tool we used, the correlation matrix, suggested that none of the most highly correlated features, whether positive or negative, coincided with any of the low-level or higher-level features used in the traditional cuts-based analysis. This is the only channel where the negative correlations were stronger than the positive correlations, so it could be that the networks were able to match the cuts-based results primarily by identifying events as background rather than by identifying events as signal. The negatively correlated features suggested that the presence of a third leading and subsequent jets with high values of pT, jmas, phi, and number of jet tracks disfavored signal. We also note that the 1-lepton channel had more events to work with than the other channels. While the number of events simulated probably would not have greatly affected the pass rate in the traditional cuts-based analysis, training a model with more events generally leads to a more accurate result, so the higher number of events may have made the training of the 1-lepton channel more successful despite its lack of strongly positively correlated features. We again conclude that more sophisticated jet reconstruction techniques would significantly improve the results in this channel. The preliminary runs of the 1-lepton channel using the full 86 × 12 dataset supported this. Isolated significances of around 0.45 were routinely achieved in various runs; however, the variance of the performance measures in these runs was again very large, indicating the need to properly optimize the network for

0-lepton channel | FCN mean | FCN std. dev. | CNN mean | CNN std. dev. | traditional cuts
signal pass rate | 0.656 | 0.041 | 0.718 | 0.042 | 0.23
bgnd pass rate | 0.307 | 0.028 | 0.334 | 0.044 | 0.045
single channel significance | 0.411 | 0.013 | 0.432 | 0.014 | 0.370

Table 3.5: 0-lepton channel results.

this dataset before drawing any conclusions.

Results: 0-lepton channel

The processes used for the 0-lepton channel were:

qq̄ → Zh → νν̄ gg   (signal)
qq̄ → Zjj → νν̄ jj   (background)      (3.4)

use of ET as a critical feature in the cuts-based approach. Jet features again played a large role, with the features of the first three jets being positively correlated and the features of the fourth and subsequent jets being negatively correlated. This channel also showed that the additional particle type created for the TvQ variable proposed in Chapter 2 may indeed give increased sensitivity to channels with a jets plus large ET signature. This channel did not show much improvement using the 86 12 images, but again, × optimization of those images proved difficult and there was significant variance in the per- formance.

σ (fb) | FCN signal | FCN background | CNN signal | CNN background | trad. cuts signal | trad. cuts background
2-lepton | 2.6 × 10⁻¹ | 3.9 × 10³ | 2.4 × 10⁻¹ | 3.3 × 10³ | 5.4 × 10⁻² | 2.4 × 10²
1-lepton | 1.8 | 4.9 × 10⁴ | 1.9 | 5.2 × 10⁴ | 4.1 × 10⁻¹ | 2.9 × 10³
0-lepton | 7.9 × 10⁻¹ | 1.1 × 10⁴ | 8.6 × 10⁻¹ | 1.2 × 10⁴ | 2.7 × 10⁻¹ | 1.6 × 10³
total | 2.8 | 6.4 × 10⁴ | 2.96 | 6.7 × 10⁴ | 7.3 × 10⁻¹ | 4.6 × 10³
combined significance | 0.61 (FCN) | 0.62 (CNN) | 0.59 (traditional cuts)

Table 3.6: Combined significance achieved using an FCN, a CNN, and traditional cuts, with the 1-loop signal process omitted from the 2-lepton and 0-lepton channels.

Combined Results

The combined results from all three channels given in Chapter 2 use both the tree and loop signal processes in the 2-lepton and 0-lepton channels. In this study we omitted the loop processes due to irregularities in the results that made us question the statistical equivalence of our dataset with the dataset used in Chapter 2. It is impossible to know how the neural networks would be affected by the addition of another signal process, so we cannot estimate a combined significance for the neural networks with those processes included. Instead, we calculate a new combined significance for the traditional cuts using Eq. 2.15 without the loop processes included, in order to more accurately compare our results. The combined significance using each method (FCN, CNN, traditional cuts) is shown in Table 3.6. In all three channels, the CNN performed better than the FCN. The correlation matrix shows that due to the structure of the data, the more correlated features tended to cluster (see App. A). This may indicate that the receptive fields of the CNN were able to do a better job of identifying patterns across several features than the FCN did. Features such as the jet pT and jmas of leading and subsequent jets all fall into the same receptive field in the CNN, amplifying their importance in discriminating between signal and background. This indicates that although better data grooming may improve our results, the basic tenet of keeping similar data types in the same columns, with rows arranged by particle type in order of leading pT (or perhaps ordered by some other highly correlated variable), should be preserved to harness the power of the overlapping receptive fields in the CNN. It is interesting to see that in all three channels, the feature with the highest correlation to the instance label was the number of jet tracks associated with the leading jet. This is a variable that was not considered at all in the cuts-based analysis of Chapter 2. It is difficult to say whether this variable is worthy of more consideration in traditional cuts-

based analyses, or if it is simply the most consistently correlated feature in a dataset with 492 features. We hope that in future work we can use additional machine learning tools to try to determine the true relevance of the number of tracks associated with the leading jet.
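As a quick consistency check, the combined significances on the last line of Table 3.6 follow from the total effective cross sections via S = NS/√NB with N = σ_eff × L, assuming L = 3000 fb⁻¹:

```python
import numpy as np

LUMI = 3000.0  # fb^-1, assumed HL-LHC dataset

def combined_significance(sigma_sig_fb, sigma_bkg_fb, lumi=LUMI):
    """S = N_S / sqrt(N_B), with N the effective cross section times lumi."""
    return sigma_sig_fb * lumi / np.sqrt(sigma_bkg_fb * lumi)

# Totals from Table 3.6 (fb):
print(combined_significance(2.8, 6.4e4))      # FCN:  ~0.61
print(combined_significance(2.96, 6.7e4))     # CNN:  ~0.63 (0.62 from unrounded inputs)
print(combined_significance(7.3e-1, 4.6e3))   # cuts: ~0.59
```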

Outlook and future work

The CNN improved on the traditional cuts-based results in all three channels, while the FCN improved on the traditional cuts-based results in the 0- and 2-lepton channels and was comparable to the cuts-based results in the 1-lepton channel. This was accomplished using very simple, entry-level, open-access machine learning tools and limited dataset preparation. We expect that these results can be improved upon further by including a number of enhancements to this analysis in the future:

• We would like to use the loop signal processes in the 2- and 0-lepton channels to fully match the dataset used in Chapter 2.

• We would like to optimize the networks to include the jet variables from particle type 9 that were omitted from this analysis.

• We would like to clean and optimize the features used in training rather than including every feature available plus a large number of redundant and possibly irrelevant features.

• We would like to simulate larger datasets (more events per channel) to improve learning ability.

• We would like to try more sophisticated techniques such as k-fold validation to make more efficient use of the data.

• We would like to pre-exclude regions of high background to facilitate the training of the network, as described in [32].

• We would like to employ an unsupervised autoencoder to extract filters from the machine learning algorithm that will help us deconstruct the features being used by the algorithm to sort signal from background. These filters may give insight into new parameters or observables that can then be used to enhance traditional cuts-based analyses.

As we continue this avenue of research in the future, we would also like to systematically record our efforts in a pedagogical document, highlighting specific, accessible, easy-to-use tools and routines of interest to particle phenomenologists. We hope this will enable other researchers to dip their toes into the machine learning tsunami that seems inevitable given the incredible technological progress this field has seen in the past two decades.

Chapter 4 Increasing the Discovery Potential Using Rare SUSY Scenarios: Gluinos

This chapter is based on work submitted to Phys. Rev. D.

Introduction

The Large Hadron Collider is well into its search for physics beyond the weak scale. The discovery of a light, seemingly fundamental scalar boson reinforces the urgency of the hierarchy problem. Though Supersymmetry is a leading paradigm to explain the naturalness of this new particle, SUSY partners have not yet been discovered. It is assumed that the first smoking-gun signal for Supersymmetry will come in the form of jets plus missing energy signals from the production of strongly coupled superpartners, the squarks and gluinos. However, existing searches already greatly constrain the masses of light-flavored squarks and gluinos, except in highly mass-degenerate scenarios. An estimation of the gluino/squark lower mass bounds in the jets plus E̸T channel from ATLAS puts gluino masses just over 2 TeV, with bounds as high as 1.6 TeV on the light generations of squarks [23].

CMS excludes gluinos decaying to flavorless jets plus E̸T with masses just above 1.6 TeV, while light-flavored squarks are bounded at masses just over 1.3 TeV [84]. The five sigma discovery potential for the colored sparticles is being approached by already existing limits. For example, upper bounds on the discovery potential of gluinos in the jets plus E̸T channel are estimated to be roughly 2.4 TeV at CMS [25], and around the same at ATLAS [26]. In a few scenarios, where decay chains are engineered to vastly prefer decay through heavy-flavored squarks, the 5 sigma discovery limit on gluino masses is 2.8 TeV [27]. This presents an uncomfortable shadow scenario for the weak scale physicist, where SUSY partners may have masses in the intermediate TeV range, yet be undiscoverable

with the Large Hadron Collider. In its high luminosity run the LHC will take 3 inverse attobarns of data. However, this represents a sensitivity gain growing only with the square root of luminosity; therefore some SUSY production modes will simply remain invisible to searches even with the full HL-LHC dataset. It is then incumbent on the phenomenologist to switch focus to rarer but more spectacular SUSY production modes which may become visible in the high luminosity dataset of the LHC. In particular, rare events with extremely boosted states offer an excellent signal to background ratio, and thus hope to offer discovery scenarios to the HL-LHC. In this work we propose to study a rare SUSY production process which increases the discovery potential for the gluino. Standard searches focus on the process of gluino pair production, pp → g̃g̃. However, the kinematic threshold in the pair production process severely cuts off the production mode for heavy gluinos. In addition there are many jets in the events, and large backgrounds for the process. Instead we propose to consider the production of a single gluino and weakino (either neutralino or chargino), pp → g̃χ̃⁰,±, first proposed in reference [85]. The process yields events with jets plus missing energy; however, as weakinos are expected to be much lighter than gluinos, the kinematic wall for the gluino mass is significantly relaxed. The events contain a substantial amount of missing energy, and in addition the relative cleanness of the events allows us to reconstruct the jets from the gluino decay into kinematic discriminants involving the transverse mass. The exact production cross section for the process will depend on the admixture of the weakino. Depending on the weakino admixture and weakino mass splittings, there may be a hard lepton in the events resulting from the decay of charginos or next-to-lightest neutralinos. In this work we will thus choose to work in a wino-like weakino scenario. Wino-like lightest supersymmetric particles (LSPs) present themselves over much of SUSY parameter space, as can be seen in classes of models like the PMSSM [86], General Gauge Mediated Models [87, 88, 89, 90, 91], and extensions of anomaly mediation [92]. In these scenarios gluino associated production with wino-like charginos is appreciable, and the charginos decay to very mass-degenerate neutralino LSPs, hence minimizing the number of hard particles in the event. This work is meant to be a proof of principle of the viability of gluino-weakino production as a discovery process; we thus conservatively choose to study a simple inclusive channel with a specialized jets and missing energy analysis. Extending the analysis to include other cuts or other weakino scenarios may improve these results even more. By exploiting large missing energies and a kinematic feature in the transverse mass of jets resulting from gluino decay, we will demonstrate that we can provide a good discovery potential for gluino masses in the 2.4 to 3 TeV range. This chapter is organized as follows: Section 2 discusses production modes and cross sections for gluino-weakino production, Section 3 lays out the SUSY parameter space and event kinematics, Section 4 describes our cuts-based analysis, and Section 5 gives results for

the gluino discovery potential at the HL-LHC. Section 6 concludes.

Production Modes

There are various processes by which a colored SUSY particle may be produced in association with a weakino. We present diagrams of these production mechanisms in Fig. 4.1. Gluino-weakino production proceeds via a t-channel process through exchange of a virtual squark; the tree-level exchange is dominated by the light flavors of squark. The resulting weakino may be charged or neutral. Production of a squark in association with a neutralino or chargino may arise through a quark-gluon fusion process with virtual squark exchange; again the resulting weakino may be charged or neutral. A loop-level process is also possible, in which one gluino and one neutralino are produced through gluon fusion. Here all flavors of virtual quarks and squarks run in the loops. We may now explore the relative production cross sections of these processes for some typical masses of the MSSM particles using simple benchmark points. In Fig. 4.2 we show the computed cross sections for the production of squarks or gluinos in association with the bino-like lightest neutralino. In order to demonstrate the relative production cross sections of tree-level vs loop-level processes, the cross sections have been computed using Madgraph5@NLO [59] and the one-loop SUSYQCD model [93]. Since virtual squarks mediate all processes, we have plotted all production cross sections as a function of squark mass, where we have assumed equal masses for all flavors of light-flavor squark. Here we have chosen 4 benchmark points: heavy (mχ = 1 TeV) and light (mχ = 100

Figure 4.1: Production modes for a single colored MSSM particle in association with a chargino or neutralino.

Figure 4.2: Relative production cross sections for colored SUSY particles produced in association with a bino-like neutralino for various benchmark masses. The production cross sections are plotted as a function of squark mass, given in TeV on the x axis.

GeV) neutralino mass points, as well as a heavy (mg=7.5 TeV shown as x’s) and light

(mg = 1.5 TeV, shown as dots) gluino mass points. We see that in the squark-neutralino production mode, the production cross section falls off very sharply as the squark mass approaches the kinematic threshold of 14 TeV, exactly as we expect. In examining the neutralino-gluino production we see that increasing the neutralino mass from effectively zero to 1 TeV produces an order of magnitude difference in cross section. We also note that for large values of the squark mass, above the average center of mass energy of LHC collisions, the gluino-neutralino production process, mediated by the heavy virtual squark, is well approximated by a dimension-6 effective operator proportional to 1/mq̃². Tree-level gluino-neutralino production is the dominant process over a large range of squark masses, and remains above a femtobarn for squark masses of a few TeV. The loop-level gluino-neutralino production from gluon fusion is much smaller than the production from quark fusion except in the case of very light squarks, and falls rapidly with the squark mass. We therefore neglect loop-level production in this work.

Event kinematics and SUSY parameter space

Our study of gluino-weakino production relies on four parameters: the weakino and gluino masses, the weakino mixing content, and the squark masses. In this analysis, we will fix

the weakino content by demanding that the lightest neutralino be purely wino-like. In the wino-like case, the LSP neutralino will be accompanied by an almost mass-degenerate pair of charginos, split from the LSP mass by an amount which must be more than a pion mass [94]. In our simplified model only the squarks, gluinos, and wino-like weakinos will be light, with all other SUSY masses, including the sleptons, the bino, and the Higgsinos, very large; the choice of heavy Higgsinos and bino is theoretically consistent with a wino-like LSP. The gluinos are produced in association with both light charginos and neutralinos. In our studies we have fixed the neutralino mass to be 100 GeV. This is phenomenologically viable, as in searches with very small mass splittings between the neutralino and chargino, light neutralino masses are mostly unconstrained by current disappearing track or mono-boson searches [95, 96, 97]. Disappearing track searches will have sensitivity to the order-100 GeV wino scenario for a narrow range of mass splittings in the full HL-LHC run [98], and mono-boson searches may constrain some parameter space. Having fixed the mass of the neutralinos and charginos, we may now consider the kinematics of our signal events as a function of the gluino and squark masses.

In our simplified SUSY spectrum the decay modes of the gluinos will be g̃ → q q̃ → q q χ⁰ and g̃ → q q̃ → q q′ χ±. Note that the decay of the gluino may proceed through either on- or off-shell intermediate squarks. The chargino decay will proceed through an off-shell W to the neutralino and soft products. The signal events will thus be amenable to an inclusive jets plus missing energy search. One main characteristic of these events is a substantial amount of missing energy. Recall that the missing transverse energy is defined as the negative vector sum of the visible transverse momenta, E̸T = −Σ p⃗T. We may examine the characteristic distributions of missing energy using collider simulation techniques. We generated samples of signal events using Madgraph [59]; events were decayed with Madspin [60], showered using Pythia [99], and run through the PGS detector simulator [100]. Events were generated applying a

300 GeV generator-level E̸T cut.

Fig. 4.3a shows the E̸T distribution of 30000 signal events compared to the Zjj background for a benchmark point with mg̃ = 2.6 TeV and mq̃ = 1.6 TeV. This benchmark point involves gluino decay through an on-shell squark. In the rest frame of the gluino we may compute the squark's momentum, pq̃ ≈ (mg̃² − mq̃²)/(2 mg̃). The squark subsequently decays to a nearly massless quark and a much lighter neutralino/chargino, leaving the visible momentum in the event substantially unbalanced. The maximum E̸T occurs when the squark is ejected in the fully transverse direction, in which case we would expect the characteristic missing energy to be around 0.8 TeV in the gluino's rest frame for our benchmark point. The characteristic missing energy of an event increases with increasing gluino mass, as expected, and also increases with increasing weakino mass. In Fig. 4.3b we have also plotted the distribution of the dijet invariant mass of our events.



Figure 4.3: Histograms giving the distributions of various kinematic discriminants in signal and background events of sample size 10000. (a) The E̸T distribution of events. (b) The di-jet invariant mass distribution of events. (c) The di-jet mT0 distribution. (d) The di-jet mTi distribution.

In our figure, the invariant mass is computed from the two leading hard jets, which are overwhelmingly likely to come from the decay of the gluino. For our benchmark point with mq̃ ≫ mχ⁰ the maximum invariant mass should be approximately √(2 pq (√(pq² + mq̃²) + pq)), about 2 TeV for our benchmark point. We see this estimate agrees with our plot. In order to characterize the kinematics of the event we may construct the transverse mass of the leading di-jets. The standard transverse mass is given by mT0² = (Σ ET)² − (Σ p⃗T)². For our benchmark point, the minimum transverse mass occurs when the gluino decay ejects the initial squark in the longitudinal direction. The transverse mass of the dijet system will

depend on the azimuthal angle of the initial squark, so we expect the transverse mass to be distributed uniformly up to the maximum, which occurs when the gluino decay ejects the squark in the purely transverse direction. For our benchmark point we may compute this to be a bit under 2 TeV. In Fig. 4.3c we show the distribution of mT0, consistent with our expectations.

We see that in our mT0 distribution, a significant number of events populate the low transverse mass region. This is less efficient for distinguishing signal from background, as there is significant overlap with the background in this region. Therefore, in analyzing our events, we also use a generalization, the “inclusive” transverse mass, which we define by mTi² = m_inv² + (Σ p⃗T)², where m_inv is the invariant mass of the system. With this definition, the inclusive transverse mass of our leading dijet system is guaranteed to be larger than its invariant mass. We thus expect that, compared to the di-jet invariant mass distribution, the mTi distribution should have more events at high values. We expect the maximum of the inclusive transverse mass again when the gluino decay ejects the squark in the transverse direction; our rough calculation for on-shell squarks yields a maximum value √(2 pq (√(pq² + mq̃²) + pq) + pq²/4). With the characteristic value of this distribution being a bit above that of the invariant mass distribution, we also expect the mTi distribution to favor higher values (and have fewer events at low values) compared to the mT0 distribution. The leading di-jet mTi distribution is plotted in Fig. 4.3d. The distribution shape conforms to our expectations from simple kinematics.
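These rough estimates are easy to check numerically for the benchmark point, using the expressions reconstructed above (massless quarks, weakino mass neglected):

```python
import numpy as np

m_gluino, m_squark = 2.6, 1.6  # TeV, the benchmark point of Fig. 4.3

# Squark momentum in the gluino rest frame (massless quark limit)
p_q = (m_gluino**2 - m_squark**2) / (2 * m_gluino)       # ~0.81 TeV
E_sq = np.sqrt(p_q**2 + m_squark**2)                     # squark energy

# Maximum di-jet invariant mass and inclusive transverse mass
m_jj_max = np.sqrt(2 * p_q * (E_sq + p_q))               # ~2.0 TeV
m_Ti_max = np.sqrt(2 * p_q * (E_sq + p_q) + p_q**2 / 4)  # slightly larger

print(p_q, m_jj_max, m_Ti_max)  # roughly 0.81, 2.05, 2.09 TeV
```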

Cuts-based analysis

Having discussed the signal kinematics, we now describe our cuts-based jets plus missing energy analysis, which has been tailored to this signal. We note that in the production of signal events we have scaled the tree-level cross section predictions from Madgraph with a modest k factor of 1.3, which is consistent with next-to-leading-order computations for this process [101, 102]. The main background for this process consists of Z + jets production in which the Z boson decays to neutrinos: qq̄ → Zjj → E̸T + jj. In testing the possible tt̄ background as a source of jets plus missing energy events, we found the number of events passing cuts to be negligible in our analysis compared to the main Z + jets process. Background events were also generated with the consistent 300 GeV generator-level missing energy cut. Background events were generated using Madgraph, showered with Pythia, and passed through the PGS detector simulator. The kinematic distributions of missing energy, invariant di-jet mass, mT0 and mTi are given along with the signal in Fig. 4.3.

We see in the distribution plots that the resulting E̸T distribution for the background is peaked at small values and falls swiftly with increasing missing energy. The transverse and invariant mass distributions are also peaked at low values and fall off very quickly at high values. In order to separate signal from background in our analysis we therefore consider

the following cuts:

• Events must contain at least 2 jets in the central region of the detector, |η| < 2.5

• Jets must have pT > 20 GeV

• Events must have E̸T ≥ 500 GeV

From this point we now test four possible cut flows. We choose to construct and cut on one of two transverse masses, either mT0 or mTi. We choose to construct these transverse masses in one of two possible ways, using either an exclusive di-jet or an inclusive all-jet method. First we describe the exclusive method:

• A di-jet transverse mass discriminant mT = mT0 or mTi is constructed using only the two leading jets in the event

• The chosen mT must fall in a kinematic window which varies with the hypothesized gluino mass: Mg − ∆ < mT < Mg + ∆, where ∆ = 0.5 Mg

Next we describe the inclusive all-jet method:

mg̃ = 2.2 TeV
cut           | g̃χ̃⁰  | g̃χ̃⁻  | g̃χ̃⁺  | Zjj
none          | 10000 | 10000 | 10000 | 10000
inclusive mT0 | 1511  | 2417  | 2509  | 94
500 GeV E̸T   | 1048  | 1592  | 1645  | 11

mg̃ = 1.0 TeV
cut           | g̃χ̃⁰  | g̃χ̃⁻  | g̃χ̃⁺  | Zjj
none          | 10000 | 10000 | 10000 | 10000
inclusive mT0 | 2269  | 4091  | 4259  | 1091
500 GeV E̸T   | 796   | 583   | 532   | 124

Table 4.1: Cut flow for signal and the main Zjj background for two benchmark points with 1.6 TeV squarks. The transverse mass mT0 is reconstructed using the exclusive di-jet method. To demonstrate the change in efficiencies as the transverse mass window shifts with gluino mass, the top benchmark point gives the cut flow for a 2.2 TeV gluino while the bottom benchmark point gives the cut flow for a 1 TeV gluino.



Figure 4.4: Scatter plots of missing transverse energy vs the transverse mass discriminants in signal and background events. The black dots are background; the red, blue, and green dots show events with gluinos produced in association with χ̃⁰, χ̃⁺, and χ̃⁻, respectively. (a) The distribution of mT0 using the exclusive di-jet method. (b) The distribution of mTi using the exclusive di-jet method. (c) The distribution of mT0 using the inclusive all-jet method. (d) The distribution of mTi using the inclusive all-jet method.

• A transverse mass discriminant mT = mT0 or mTi is constructed using all viable jets

• The chosen mT must fall in a kinematic window which varies with the hypothesized gluino mass: Mg − ∆ < mT < Mg + ∆, where ∆ = 0.5 Mg

Fig. 4.4 shows the difference in signal and background distributions using our four cut-flow techniques. For this figure, we have chosen a benchmark point with a gluino mass of 2.6 TeV and a squark mass of 1.6 TeV. We show scatter plots for 10000 signal events and 10000 background events in the missing energy-transverse mass plane. Fig. 4.4a shows E̸T vs mT0 using the exclusive di-jet method; Fig. 4.4b shows E̸T vs mTi using the exclusive di-jet method. Fig. 4.4c shows E̸T vs mT0 using the inclusive all-jet method; Fig. 4.4d shows E̸T vs mTi using the inclusive all-jet method. We see that the inclusive all-jet methods have the predictable effect of smearing out the events. We also note the events take a characteristic distribution in the missing energy-transverse mass plane, which would be an interesting subject for efforts to further optimize this analysis. In Table 4.1 we present a sample cut flow for signal and background events for two possible benchmark points. We have used the exclusive di-jet method to construct the mT0 transverse mass for the cut flow in this table. To show how the search efficiencies depend on the gluino mass, we show cut flows for a benchmark point with squark mass 1.6 TeV and gluino mass 2.2 TeV, and another benchmark point with squark mass 1.6 TeV and gluino mass 1 TeV. We see from the cut flow that as we raise the gluino mass, we also raise the threshold to make it into the transverse mass window. The high transverse mass threshold ensures that the background efficiency decreases drastically relative to the signal efficiency for appreciable gluino masses.
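A minimal sketch of the selection is given below; the event structure (a jets array of (E, px, py, pz) rows, central jets with pT > 20 GeV already selected and leading-pT ordered, plus a met value in GeV) is an illustrative assumption.

```python
import numpy as np

def transverse_mass(jets, inclusive_mass=False):
    """jets: (n, 4) array of (E, px, py, pz) rows. Returns mT0 by default,
    or mTi = sqrt(m_inv^2 + (sum pT)^2) when inclusive_mass is True."""
    E, px, py, pz = jets.sum(axis=0)
    sum_pt2 = px**2 + py**2
    if inclusive_mass:
        m_inv2 = max(E**2 - px**2 - py**2 - pz**2, 0.0)
        return np.sqrt(m_inv2 + sum_pt2)
    # mT0^2 = (sum E_T)^2 - (sum vec pT)^2, with E_T ~ pT for light jets
    sum_Et = np.sqrt(jets[:, 1]**2 + jets[:, 2]**2).sum()
    return np.sqrt(max(sum_Et**2 - sum_pt2, 0.0))

def passes(event, M_g, all_jets=False, inclusive_mass=False):
    """One of the four cut flows: >= 2 central jets, MET >= 500 GeV, and the
    gluino-mass window M_g - 0.5*M_g < mT < M_g + 0.5*M_g."""
    jets = np.asarray(event["jets"])
    if len(jets) < 2 or event["met"] < 500.0:
        return False
    selected = jets if all_jets else jets[:2]  # inclusive vs exclusive di-jet
    mT = transverse_mass(selected, inclusive_mass)
    return 0.5 * M_g < mT < 1.5 * M_g
```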

Results

We will now construct the discovery potential for gluinos for the 3 ab−1 run of the HL-LHC. Using conventions for low background statistics we may define the signal significance S = Sg/√B, where Sg is the number of signal events and B the number of background events. The conventional thresholds are S = 5 for discovery and S = 3 for sensitivity [103]. In Fig. 4.5 we have plotted significance vs. gluino mass for various squark masses in the gluino-weakino associated production channel. We have added horizontal lines to indicate the significance thresholds of 5 for discovery and 3 for sensitivity. In addition, we have added a vertical line at a gluino mass of 2.4 TeV for comparison to the current stated discovery potential. In the upper plot we show results using our exclusive di-jet search, constructing the transverse mass mT0. In the lower plot we show results using our exclusive di-jet search, constructing the transverse mass mTi. For comparison, we give a plot using the inclusive method in Fig. 4.6. We see from Fig. 4.5 that the gluino discovery potentials in our scenario depend heavily on squark masses.
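As a concrete illustration of the significance convention defined above, the following minimal sketch evaluates S for hypothetical post-cut yields; these numbers are placeholders, not results of this analysis.

    import math

    def significance(n_signal, n_background):
        """Low-background significance S = S_g / sqrt(B)."""
        return n_signal / math.sqrt(n_background)

    # Hypothetical yields at 3 ab^-1: S = 30/sqrt(25) = 6.0, which would
    # sit above the S = 5 discovery threshold.
    print(significance(30.0, 25.0))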

ATLAS sets the current limits for squark masses, for squarks decaying 100 percent of the time via q̃ → qχ̃0 to jets plus missing energy, at 1550 GeV for 100 GeV neutralino masses [23]. CMS places limits in this scenario of 1325 GeV [84]. ATLAS searches in the 1-lepton final state bound squark masses to be at least 1200 GeV [104], but this result involves a squark decaying through on-shell W bosons to hard leptons, q̃ → qχ̃± → qW±χ̃0. In our scenario the squark has a significant branching fraction into charginos that are highly mass degenerate with the neutralino LSP, so it is an open question what the squark lower mass limits are in this scenario. The 0-lepton search limits present us with a conservative choice of lower squark mass bounds.

We can see that in our search constructing mT0 from exclusive di-jets, we find a 5 sigma discovery potential for gluinos with masses of 3.1 TeV for 1.6 TeV squarks.

Figure 4.5: Significance for gluino-weakino production vs. gluino mass for various squark masses. The upper plot gives significances for the search which uses the di-jet mT0 transverse mass discriminant. The lower plot gives the significance for the search which uses the di-jet mTi transverse mass discriminant.

For 1.6 TeV squark masses we find a 3 sigma sensitivity potential for gluinos of mass 3.4 TeV. One will notice a feature in this sensitivity plot that appears once decay through an on-shell squark becomes kinematically possible for the gluino, improving the search efficiency. In the lower plot of Fig. 4.5, using the mTi discriminant constructed from exclusive di-jets, we can see that for squark masses of 2.2 TeV we have a 5 sigma discovery potential for gluinos of 2.2 TeV, with a 3 sigma sensitivity potential for gluinos of about 2.5 TeV. Our results are competitive with current projections and may raise the discovery potential for gluino masses above 2.4 TeV in the case of lighter squark masses.

Conclusions

We have demonstrated that the 5 sigma discovery potential for the 3 ab−1 HL-LHC may extend to gluino masses in the 2.4 to 3 TeV range by studying the gluino-weakino production channel. The discovery potential in this case is competitive with that of the standard gluino pair production channel. The resultant discovery potential comes despite a smaller production cross section than the gluino pair production process. However, the stand-out kinematics of the gluino recoiling off of a light weakino allows our analysis a large missing energy cut and a very substantial di-jet transverse mass cut. The resulting search has a low background rate and ensures the gluinos are discoverable. This work offers an existence proof that the gluino discovery potential may be substantial in the gluino-weakino channel by employing a very basic cuts-based analysis.

Figure 4.6: Significance for gluino-weakino production vs. gluino mass for various squark masses. Significances are given for the search which uses the all-jet mT0 transverse mass discriminant.

We thus note that with optimization and improvements of the gluino-weakino analysis, it is possible that the discoverable gluino mass threshold may be raised even further. We now discuss some opportunities for expanded searches. As mentioned before, a more sophisticated kinematic cut may be engineered to take into account the relation of missing energy to the transverse mass in the signal events. In addition, more sophisticated cuts may be made to take into account the shape of the distributions of the kinematic variables. These edge effects have been discussed in proposed searches for supersymmetric particles [105] and exotics such as Dark Matter [106]. Further, alternate regions of SUSY parameter space may be studied. One example is regions of SUSY parameter space where squark decay channels may be altered to include intermediate states like second-to-lightest neutralinos or on-shell vector bosons. This may add hard leptons to the events. In addition, the lower mass limits on squarks in these scenarios are looser, in which case the gluino production cross section may be increased. Another possibility is to consider models which split the masses of squark flavors. We have operated under the assumption that 4 flavors of squark are mass degenerate. This gives the toughest lower mass bounds on squarks which decay to quark plus neutralino. Splitting squark flavors may relax the lower mass bound on squarks and have interesting effects on the production cross section of the gluino-weakino pair due to differing quark PDFs. Finally, in the wino- or Higgsino-like weakino scenarios, the mass splitting of the chargino and LSP might be adjusted to ensure that the chargino resulting from the gluino decay lives an intermediate amount of time and appears as a disappearing track. An additional disappearing track and two hard jets may produce an excellent discovery scenario for this process.

Chapter 5

Conclusion

Particle physics is at a critical juncture. On the one hand, we have discovered the final piece of the Standard Model of Particle Physics, the culmination of decades of theoretical and experimental work and an extremely successful theory for describing the fundamental particles and interactions that make up our universe. On the other hand, we also know that there must be more to the story, as evidenced by such phenomena as dark matter and neutrino oscillations. As we fine-tune our knowledge of the SM, we continue to uncover more questions. In this work we reviewed two primary targets of particle physics, focusing on collider searches. Specifically, we focused on techniques that use “jets plus missing energy” signatures to probe both SM Higgs couplings and possible rare SUSY decays. In Chapter 2 we took advantage of the relatively large h → gg branching fraction and used leptonic decays in associated production of the Higgs with a vector boson to increase sensitivity to Higgs decay to light jets, where j = {g, u, d, s}. We then put bounds on the Higgs branching fraction to these jets as compared to BRSM(h → gg). The bounds we obtained were not as strong as bounds obtained through fits of the gg → h production mode, but the process does lay the groundwork for a search for the direct decay of the Higgs to a light quark. We were also able to put bounds on the light quark Yukawa couplings that are comparable to bounds obtained by other approaches. In Chapter 3 we extended the work of Chapter 2, applying basic machine learning techniques in an attempt to tease out a higher signal discrimination than that achieved in Chapter 2. Both an FCN and a CNN improved upon the cuts-based results in the 2-lepton and 0-lepton channels and roughly matched the cuts-based analysis in the 1-lepton channel. With the FCN we achieved a combined channel significance of 0.61 and with the CNN we achieved a combined channel significance of 0.62, as compared to the combined channel significance of 0.59 achieved in the cuts-based analysis, excluding loop processes. We also found the correlations of event features to the classification label of signal or background, and discussed what these correlations might indicate for future work using machine learning

tools in this area. In particular, they highlighted the importance of jet features that were not considered at all in the cuts-based analysis. As this was a preliminary inquiry, we conclude that there is sufficient potential for improvement over the cuts-based analysis to justify continued efforts in this area. Finally, in Chapter 4 we turned to physics Beyond the Standard Model, analyzing a rare supersymmetric production mode using the reconstructed transverse mass and expected high missing energy to test the viability of extending the 5σ discovery potential of the gluino at the LHC to a gluino mass of as much as 3 TeV. Such a search would extend the ability of the LHC to find supersymmetric particles at the TeV scale, even as significant swaths of parameter space are being ruled out by existing searches assuming more dominant production modes. The global collider community is in the process of determining what the next particle collider will look like. Whether it is a lepton collider that is capable of serving as a precision “Higgs factory” or an even larger, more powerful hadron collider, we are still years if not decades away from seeing it in operation and getting the first results. Until that time, as searches at the LHC exclude more and more parameter space, it is becoming a challenge to tease out new physics or more precise measurements for certain processes. This thesis has offered a few new ideas for addressing some of these challenges.

Bibliography

[1] H. K. Dreiner, H. E. Haber and S. P. Martin, Phys. Rept. 494, 1 (2010) doi:10.1016/j.physrep.2010.05.002 [arXiv:0812.1594 [hep-ph]].

[2] H. E. Logan, arXiv:1406.1786 [hep-ph].

[3] M. Robinson, doi:10.1007/978-1-4419-8267-4

[4] S. P. Martin, Adv. Ser. Direct. High Energy Phys. 21, 1 (2010) [Adv. Ser. Direct. High Energy Phys. 18, 1 (1998)] doi:10.1142/9789812839657 0001, 10.1142/9789814307505 0001 [hep-ph/9709356].

[5] Y. Fukuda et al. [Super-Kamiokande Collaboration], Phys. Rev. Lett. 81, 1562 (1998) doi:10.1103/PhysRevLett.81.1562 [hep-ex/9807003].

[6] M. Tanabashi, et al. (Particle Data Group) Phys. Rev. D 98, 030001 (2018).

[7] ATLAS Collaboration, G. Aad et al., Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC, Phys. Lett. B716 (2012) 1–29, [arXiv:1207.7214].

[8] CMS Collaboration, S. Chatrchyan et al., Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC, Phys. Lett. B716 (2012) 30–61, [arXiv:1207.7235].

[9] M. Aaboud et al. [ATLAS Collaboration], JHEP 1808, 089 (2018) doi:10.1007/JHEP08(2018)089 [arXiv:1805.01845 [hep-ex]].

[10] B. Tuchming [CDF and D0 Collaborations], arXiv:1405.5058 [hep-ex].

[11] J. Thaler, “How to Read LHC Olympics Data Files” The MadGraph5 aMC@NLO Homepage, 19 Dec 2006 http://madgraph.phys.ucl.ac.be/Manual/lhco.html

[12] ATLAS Collaboration, G. Aad et al., Observation and measurement of Higgs boson decays to WW∗ with the ATLAS detector, Phys. Rev. D92 (2015), no. 1 012006, [arXiv:1412.2641].

[13] A. David et al. [LHC Higgs Cross Section Working Group], arXiv:1209.0040 [hep-ph].

[14] CMS Collaboration, CMS-PAS-HIG-16-042.

[15] “Beyond any doubt: Higgs boson couples to the heaviest lepton.” ATLAS Updates, ATLAS Collaboration, 8 June 2018. https://atlas.cern/updates/physics-briefing/higgs-couples-heaviest-lepton

[16] M. Aaboud et al. [ATLAS Collaboration], Phys. Rev. D 99, 072001 (2019) doi:10.1103/PhysRevD.99.072001.

[17] “New ATLAS result establishes production of Higgs boson in association with top quarks.” ATLAS Updates, ATLAS Collaboration, 4 June 2018 http://atlas.cern/updates/physics-briefing/observation-tth-production

[18] M. Aaboud et al. [ATLAS Collaboration], Phys. Lett. B 784, 173 (2018) doi:10.1016/j.physletb.2018.07.035 [arXiv:1806.00425 [hep-ex]].

[19] “Higgs boson observed decaying to b quarks at last!” ATLAS Updates, ATLAS Collaboration, 9 July 2018 http://atlas.cern/updates/physics-briefing/higgs-observed-decaying-b-quarks

[20] P. Fayet, S. Ferrara, Phys. Rept. 32, (1977) doi:10.1016/0370-1573(77)90066-7.

[21] J. Terning, (International series of monographs on physics. 132) doi:10.1093/acprof:oso/9780198567639.001.0001

[22] J. Alwall, P. Schuster and N. Toro, Phys. Rev. D 79, 075020 (2009) doi:10.1103/PhysRevD.79.075020 [arXiv:0810.3921 [hep-ph]].

[23] M. Aaboud et al. [ATLAS Collaboration], Phys. Rev. D 97, no. 11, 112001 (2018) doi:10.1103/PhysRevD.97.112001 [arXiv:1712.02332 [hep-ex]].

[24] A. M. Sirunyan et al. [CMS Collaboration], Eur. Phys. J. C 77, no. 10, 710 (2017) doi:10.1140/epjc/s10052-017-5267-x [arXiv:1705.04650 [hep-ex]].

[25] CMS-PAS-SUS-14-012 ”Supersymmetry discovery potential in future LHC and HL-LHC running with the CMS detector” http://cds.cern.ch/record/1981344

[26] ATL-PHYS-PUB-2014-010 ” Search for Supersymmetry at the high luminosity LHC with the ATLAS experiment” https://cds.cern.ch/record/1735031

[27] K. A. Ulmer [CMS Collaboration], arXiv:1310.0781 [hep-ex].

[28] J. D. Wells, hep-ph/0306127.

[29] K. Hultqvist, R. Jacobsson, and K. E. Johansson, Nucl. Instrum. Meth. Phys. Res. A 364, 193–200 (1995).

[30] G. E. Hinton, Osindero, S., and Teh, Y. Neural Computation, 18, pp 1527-1554 (2006)

[31] K. Datta and A. J. Larkoski, JHEP 1803, 086 (2018) doi:10.1007/JHEP03(2018)086 [arXiv:1710.01305 [hep-ph]].

[32] S. Chatrchyan et al. [CMS Collaboration], Phys. Rev. D 87, no. 7, 072001 (2013) doi:10.1103/PhysRevD.87.072001 [arXiv:1301.0916 [hep-ex]].

[33] T. Aaltonen et al. [CDF Collaboration], Phys. Rev. Lett. 104, 141801 (2010) doi:10.1103/PhysRevLett.104.141801 [arXiv:0911.3935 [hep-ex]].

[34] T. Aaltonen et al. [CDF Collaboration], Phys. Rev. D 80, 012002 (2009) doi:10.1103/PhysRevD.80.012002 [arXiv:0905.3155 [hep-ex]].

[35] J. Lin, M. Freytsis, I. Moult and B. Nachman, JHEP 1810, 101 (2018) doi:10.1007/JHEP10(2018)101 [arXiv:1807.10768 [hep-ph]].

[36] A. Dey, J. Lahiri and B. Mukhopadhyaya, arXiv:1905.02242 [hep-ph].

[37] P. Bärtschi, C. Galloni, C. Lange and B. Kilminster, Nucl. Instrum. Meth. A 929, 29 (2019) doi:10.1016/j.nima.2019.03.029 [arXiv:1904.04924 [hep-ex]].

[38] A. Geron, O’Reilly Media. (2017) ISBN: 978-1491962299

[39] F. Chollet, Manning Publications Co. (2017) ISBN: 978-1617294433

[40] A. Mordvintsev, C. Olah, and M. Tyka, “Inceptionism: Going Deeper into Neural Networks.” Google AI Blog, June 17, 2015, updated July 7, 2015. https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html

[41] Google Cultural Institute artsandculture.google.com/.

[42] Google Deep Dream Generator Gallery deepdreamgenerator.com/.

[43] CMS Collaboration, S. Chatrchyan et al., Measurement of Higgs boson production and properties in the WW decay channel with leptonic final states, JHEP 01 (2014) 096, [arXiv:1312.1129].

[44] ATLAS, CMS Collaboration, G. Aad et al., Measurements of the Higgs boson production and decay rates and constraints on its couplings from a combined ATLAS and CMS analysis of the LHC pp collision data at √s = 7 and 8 TeV, JHEP 08 (2016) 045, [arXiv:1606.02266].

[45] CMS Collaboration, V. Khachatryan et al., Search for a Standard Model Higgs Boson Produced in Association with a Top-Quark Pair and Decaying to Bottom Quarks Using a Matrix Element Method, Eur. Phys. J. C75 (2015), no. 6 251, [arXiv:1502.02485].

[46] ATLAS Collaboration, G. Aad et al., Search for the Standard Model Higgs boson decaying into bb produced in association with top quarks decaying hadronically in pp collisions at √s = 8 TeV with the ATLAS detector, JHEP 05 (2016) 160, [arXiv:1604.03812].

[47] ATLAS Collaboration, A study of Standard Model Higgs boson production in the decay mode h → bb̄ in association with a W or Z boson for high luminosity LHC running, Tech. Rep. ATL-PHYS-PUB-2014-011, July, 2014.

[48] ATLAS Collaboration, Projections for measurements of Higgs boson signal strengths and coupling parameters with the ATLAS detector at a HL-LHC, Tech. Rep. ATL-PHYS-PUB-2014-016, CERN, Geneva, Oct, 2014.

[49] T. Han and B. McElrath, h → µ+µ− via gluon fusion at the LHC, Phys. Lett. B528 (2002) 81–85, [hep-ph/0201023].

[50] D. Asner, T. Barklow, C. Calancha, K. Fujii, N. Graf, H. E. Haber, A. Ishikawa, S. Kanemura, S. Kawada, M. Kurata, A. Miyamoto, H. Neal, H. Ono, C. Potter, J. Strube, T. Suehara, T. Tanabe, J. Tian, K. Tsumura, S. Watanuki, G. Weiglein, K. Yagyu, and H. Yokoya, Ilc higgs white paper, arXiv:1310.0763.

[51] FCC-ee study Collaboration, M. Koratzinos, FCC-ee accelerator parameters, performance and limitations, Nucl. Part. Phys. Proc. 273-275 (2016) 2326–2328, [arXiv:1411.2819].

[52] CEPC-SPPC Study Group Collaboration, CEPC-SPPC Preliminary Conceptual Design Report. 1. Physics and Detector, http://inspirehep.net/record/1395734.

[53] J. M. Butterworth, A. R. Davison, M. Rubin, and G. P. Salam, Jet substructure as a new Higgs search channel at the LHC, AIP Conf. Proc. 1078 (2009) 189–191, [arXiv:0809.2530].

[54] G. Perez, Y. Soreq, E. Stamou, and K. Tobioka, Prospects for measuring the Higgs coupling to light quarks, arXiv:1505.06689.

[55] G. T. Bodwin, F. Petriello, S. Stoynev, and M. Velasco, Higgs boson decays to quarkonia and the Hcc̄ coupling, Phys. Rev. D88 (2013), no. 5 053003, [arXiv:1306.5770].

[56] ATLAS Collaboration, G. Aad et al., Search for Higgs and Z Boson Decays to J/ψ γ and Υ(nS) γ with the ATLAS Detector, Phys. Rev. Lett. 114 (2015), no. 12 121801, [arXiv:1501.03276].

[57] ATLAS Collaboration, Search for the Standard Model Higgs and Z Boson decays to J/ψ γ: HL-LHC projections, Tech. Rep. ATL-PHYS-PUB-2015-043, CERN, Geneva, Sep, 2015.

[58] LHC Higgs Cross Section Working Group Collaboration, D. de Florian et al., Handbook of LHC Higgs Cross Sections: 4. Deciphering the Nature of the Higgs Sector, arXiv:1610.07922.

[59] J. Alwall, R. Frederix, S. Frixione, V. Hirschi, F. Maltoni, O. Mattelaer, H. S. Shao, T. Stelzer, P. Torrielli, and M. Zaro, The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations, JHEP 07 (2014) 079, [arXiv:1405.0301].

[60] P. Artoisenet, R. Frederix, O. Mattelaer and R. Rietkerk, JHEP 1303, 015 (2013) doi:10.1007/JHEP03(2013)015 [arXiv:1212.3460 [hep-ph]].

[61] C. Englert, M. McCullough, and M. Spannowsky, Gluon-initiated associated production boosts higgs physics, arXiv:1310.4828.

[62] T. Sjöstrand, S. Mrenna, and P. Z. Skands, PYTHIA 6.4 Physics and Manual, JHEP 05 (2006) 026, [hep-ph/0603175].

[63] DELPHES 3 Collaboration, J. de Favereau, C. Delaere, P. Demin, A. Giammanco, V. Lemaître, A. Mertens, and M. Selvaggi, DELPHES 3, A modular framework for fast simulation of a generic collider experiment, JHEP 02 (2014) 057, [arXiv:1307.6346].

[64] S. D. Ellis, C. K. Vermilion, and J. R. Walsh, Techniques for improved heavy particle searches with jet substructure, Phys. Rev. D80 (2009) 051501, [arXiv:0903.5081].

[65] D. Krohn, J. Thaler, and L.-T. Wang, Jet Trimming, JHEP 02 (2010) 084, [arXiv:0912.1342].

[66] J. Thaler and K. Van Tilburg, Identifying Boosted Objects with N-subjettiness, JHEP 03 (2011) 015, [arXiv:1011.2268].

[67] J. M. Butterworth, I. Ochoa, and T. Scanlon, Boosted Higgs → bb̄ in vector-boson associated production at 14 TeV, Eur. Phys. J. C75 (2015), no. 8 366, [arXiv:1506.04973].

[68] A. Banfi and J. Cancino, Implications of qcd radiative corrections on high-pt higgs searches, arXiv:1207.0674.

[69] ATLAS Collaboration, Expected pileup values at the HL-LHC, Tech. Rep. ATL-UPGRADE-PUB-2013-014, CERN, Geneva, Sep, 2013.

[70] C. Shimmin and D. Whiteson, Boosting low-mass hadronic resonances, Phys. Rev. D94 (2016), no. 5 055001, [arXiv:1602.07727].

[71] F. Bishara, U. Haisch, P. F. Monni, and E. Re, Constraining light-quark yukawa couplings from higgs distributions, arXiv:1606.09253.

[72] CMS Collaboration, Search for the standard model Higgs boson produced through vector boson fusion and decaying to bb̄, arXiv:1506.01010.

[73] ATLAS Collaboration, Search for the Standard Model Higgs boson decaying into bb̄ produced in association with top quarks decaying hadronically in pp collisions at √s = 8 TeV with the ATLAS detector, arXiv:1604.03812.

[74] CMS Collaboration, VBF H to bb using the 2015 data sample.

[75] G. Perez, Y. Soreq, E. Stamou, and K. Tobioka, Constraining the charm yukawa and higgs-quark coupling universality, arXiv:1503.00290.

[76] Y. Zhou, Constraining the Higgs boson coupling to light quarks in the HZZ final states, Phys. Rev. D93 (2016), no. 1 013019, [arXiv:1505.06369].

[77] Y. Soreq, H. X. Zhu, and J. Zupan, Light quark yukawa couplings from higgs kinematics, arXiv:1606.09621.

[78] G. Bonner and H. E. Logan, Constraining the Higgs couplings to up and down quarks using production kinematics at the CERN Large Hadron Collider, arXiv:1608.04376.

[79] A. L. Kagan, G. Perez, F. Petriello, Y. Soreq, S. Stoynev, and J. Zupan, Exclusive Window onto Higgs Yukawa Couplings, Phys. Rev. Lett. 114 (2015), no. 10 101802, [arXiv:1406.1722].

[80] Project Jupyter jupyter.org/.

[81] Google Colab colab.research.google.com/.

[82] Keras keras.io/.

[83] Tensorflow www.tensorflow.org/.

[84] A. M. Sirunyan et al. [CMS Collaboration], JHEP 1805, 025 (2018) doi:10.1007/JHEP05(2018)025 [arXiv:1802.02110 [hep-ex]].

[85] H. Baer, D. D. Karatas and X. Tata, Phys. Rev. D 42, 2259 (1990). doi:10.1103/PhysRevD.42.2259

[86] C. F. Berger, J. S. Gainer, J. L. Hewett and T. G. Rizzo, JHEP 0902, 023 (2009) doi:10.1088/1126-6708/2009/02/023 [arXiv:0812.0980 [hep-ph]].

[87] P. Meade, N. Seiberg and D. Shih, Prog. Theor. Phys. Suppl. 177, 143 (2009) doi:10.1143/PTPS.177.143 [arXiv:0801.3278 [hep-ph]].

[88] L. M. Carpenter, M. Dine, G. Festuccia and J. D. Mason, Phys. Rev. D 79, 035002 (2009) doi:10.1103/PhysRevD.79.035002 [arXiv:0805.2944 [hep-ph]].

[89] L. M. Carpenter, arXiv:0812.2051 [hep-ph].

[90] A. Rajaraman, Y. Shirman, J. Smidt and F. Yu, Phys. Lett. B 678, 367 (2009) doi:10.1016/j.physletb.2009.06.047 [arXiv:0903.0668 [hep-ph]].

[91] L. M. Carpenter, arXiv:1712.10269 [hep-ph].

[92] L. M. Carpenter, P. J. Fox and D. E. Kaplan, hep-ph/0503093.

[93] C. Degrande, B. Fuks, V. Hirschi, J. Proudom and H. S. Shao, Phys. Lett. B 755, 82 (2016) doi:10.1016/j.physletb.2016.01.067 [arXiv:1510.00391 [hep-ph]].

[94] C. H. Chen, M. Drees and J. F. Gunion, hep-ph/9902309.

[95] M. Aaboud et al. [ATLAS Collaboration], JHEP 1806, 022 (2018) doi:10.1007/JHEP06(2018)022 [arXiv:1712.02118 [hep-ex]].

[96] A. Anandakrishnan, L. M. Carpenter and S. Raby, Phys. Rev. D 90, no. 5, 055004 (2014) doi:10.1103/PhysRevD.90.055004 [arXiv:1407.1833 [hep-ph]].

[97] H. Baer, A. Mustafayev and X. Tata, Phys. Rev. D 89, no. 5, 055007 (2014) doi:10.1103/PhysRevD.89.055007 [arXiv:1401.1162 [hep-ph]].

[98] ATL-PHYS-PUB-2018-031 ”ATLAS sensitivity to winos and higgsinos with a highly compressed mass spectrum at the HL-LHC” https://cds.cern.ch/record/2647294

[99] T. Sjöstrand et al., Comput. Phys. Commun. 191, 159 (2015) doi:10.1016/j.cpc.2015.01.024 [arXiv:1410.3012 [hep-ph]].

[100] J. Conway, “Pretty Good Simulation of High Energy Collisions,” http://www.physics.ucdavis.edu/~conway/research/software/pgs/pgs4-general.htm.

[101] E. L. Berger, M. Klasen and T. M. P. Tait, Phys. Lett. B 459, 165 (1999) doi:10.1016/S0370-2693(99)00617-6 [hep-ph/9902350].

[102] B. Fuks, M. Klasen and M. Rothering, JHEP 1607, 053 (2016) doi:10.1007/JHEP07(2016)053 [arXiv:1604.01023 [hep-ph]].

[103] H. Baer, V. Barger, A. Lessa and X. Tata, JHEP 0909, 063 (2009) doi:10.1088/1126-6708/2009/09/063 [arXiv:0907.1922 [hep-ph]].

[104] M. Aaboud et al. [ATLAS Collaboration], Phys. Rev. D 96, no. 11, 112010 (2017) doi:10.1103/PhysRevD.96.112010 [arXiv:1708.08232 [hep-ex]].

[105] B. C. Allanach, S. Grab and H. E. Haber, JHEP 1101, 138 (2011) Erratum: [JHEP 1107, 087 (2011)] Erratum: [JHEP 1109, 027 (2011)] doi:10.1007/JHEP07(2011)087, 10.1007/JHEP09(2011)027, 10.1007/JHEP01(2011)138 [arXiv:1010.4261 [hep-ph]].

[106] D. Abercrombie et al., arXiv:1507.00966 [hep-ex].

[107] G. Aad et al. [ATLAS and CMS Collaborations], Phys. Rev. Lett. 114, 191803 (2015) doi:10.1103/PhysRevLett.114.191803 [arXiv:1503.07589 [hep-ex]].

[108] H. Haber, “The Theory of Higgs Bosons: The Standard Model and Beyond” Idpasc Higgs School Lectures, 2011 http://www.idpasc.lip.pt/file.php/1/higgs2011/lectures/Haber.pdf

[109] J. Haller, A. Hoecker, et al. (The Gfitter Group) Eur. Phys. J. C 78, 675 (2018). https://doi.org/10.1140/epjc/s10052-018-6131-3

[110] S. Dittmaier et al. (LHC Higgs Cross Section Working Group) (2011), arXiv:1101.0593.

[111] N. Craig, C. Englert and M. McCullough, Phys. Rev. Lett. 111, no. 12, 121803 (2013) doi:10.1103/PhysRevLett.111.121803 [arXiv:1305.5251 [hep-ph]].

[112] C. Englert and M. McCullough, JHEP 1307, 168 (2013) doi:10.1007/JHEP07(2013)168 [arXiv:1303.1526 [hep-ph]].

Appendix A

Machine Learning Data

This appendix contains a “key” for the 41 × 12 images used in Chapter 3 as well as a more detailed look at the feature correlation values.

Feature Key

The images were assembled using existing code that had been written for the traditional cuts-based analysis described in Chapter 2. Rather than significantly altering the code, or writing new code from scratch for the project described in Chapter 3, we used the same code to extract the features used to assemble the images. These features include the original output from the source LHCO files, plus a number of higher-level, constructed features that were relevant to the various cuts used in Chapter 2. To facilitate the cuts described in Chapter 2, some additional particle types contained redundant information or had open slots for pass/fail cues. These cues were usually set to 0 or 1. Thus this key to the 41 × 12 images used in Chapter 3 is somewhat messy, with an organization designed for a different project and information that is redundant. Despite this, we have made an effort to describe the nature and source of each feature below. For a preliminary investigation into the viability of using a neural network and comparing results to an earlier cuts-based analysis of the same data, the messiness of the data was acceptable. Now that the preliminary investigation has returned promising results, significant data cleanup will be performed to assemble more compact and efficient images before final results are submitted for publication. The actual first line of each event in the augmented LHCO file contains header information such as the originating process, the process cross section, and the label. These constituted “administrative” features in a ragged, 5-column first line to the event which was excised prior to assembling each image. Thus the features of the input data in each image do not begin at 1; their numbering is offset by 5. This makes it more difficult to immediately identify which row and column a given feature number belongs to. Therefore we

have also mapped the feature numbers to the appropriate row and column in the augmented LHCO file. For the final data cleanup we will append the administrative information as a tail rather than a header, to make the image feature numbering more intuitive for the final results. Key to the letter superscripts:

a These features are taken directly from the source LHCO file.

• typ: LHCO particle type number [11]: 0 = photons, 1 = electrons, 2 = muons, 3 = hadronically-decaying taus, 4 = jets, 6 = missing transverse energy
• eta, phi, jmas, ntrk, btag
• h/e ≡ had/em
• dum1, dum2: dummy variables that can be filled in as needed by users of the LHCO files

b These features are filled in as 0.0 to create a uniform 41 × 12 image.

c LHCO particle type 6: typ^a, 0.0, phi^a, pt^a, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0^b, 0.0^b

d Additional particle types: lepton pairs (7, 8); isojet features (10); 1-lepton electron (11), muon (12), and met (13) features; met + jets variables (14)

e Reconstructed features from two or more basic LHCO file features.

f Some slots were left open to be filled with pass/fail cues for the cuts-based analysis. These cues were usually set to 0 or 1.

g Parent particle used for the features on this line.

h Additional particle type 10 (see footnote 26): isojet info (see footnote 27): typ^d, jet eta^a, jet phi^a, jet pt^a, jet jmas^a, jet ntrk^a, jet btag^a, open^f, jet par, open^f, 1.0, open^f

i Additional particle type 13: 1-lepton channel met variables: typ^d, met eta^a, met phi^a, met pt^a, reconstructed met px^e, reconstructed met py^e, sign of lepton in event, open^f, open^f, open^f, lep par, open^f, open^f

j Additional particle type 14: met + jets variables: typ^d, met eta (0.0), met phi^a, TVQ^k, jmas sum^k, ntrk sum^k, btag sum^k, met pt^a, open^f, open^f, open^f, open^f

26. It was discovered after all results had been obtained that the creation of additional particle type 9 was dependent on an earlier jet cut that had been deleted; because cuts were not performed for this analysis, particle type 9 was never triggered. We hope to include particle type 9 in the analysis prior to submitting this work for publication.

27. The quantities in this line are associated with a very rare jet type that was triggered accidentally. We do not believe correlations to this line are relevant. We hope to correct this problem prior to submitting this work for publication.

The numeral superscripts and subscripts on the map of each feature number correspond to the 15 most positively and negatively correlated features per channel. Features that are in the top 15 positively correlated list are in bold with list placement as a superscript (blue = 0-lepton channel, brown = 1-lepton channel, red = 2-lepton channel). Subscripts denote negatively correlated features with the same color and ranking scheme.
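As an illustration of how each event was padded into a uniform image (the 0.0 filler marked ^b above), the following minimal Python sketch shows the idea; the function name and input format are hypothetical, not the actual analysis code.

    import numpy as np

    def event_to_image(records, n_rows=41, n_cols=12):
        """Zero-pad one event's list of per-particle records (each a list
        of up to 12 numbers) into a fixed 41 x 12 array, mirroring the
        uniform images described in this appendix."""
        image = np.zeros((n_rows, n_cols))
        for i, rec in enumerate(records[:n_rows]):
            image[i, :min(len(rec), n_cols)] = rec[:n_cols]
        return image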

Rows 1–28 of the image (columns 1–12):

row 1: LHCO particle type 0: photons
row 2: typ^a, eta^a, phi^a, pt^a, jmas^a, ntrk^a, btag^a, h/e^a, dum1^a, dum2^a, dum3^b, dum4^b
row 3: (empty)
row 4: LHCO particle type 1: electrons
row 5: typ^a, eta^a, phi^a, pt^a, jmas^a, ntrk^a, btag^a, h/e^a, dum1^a, dum2^a, dum3^b, dum4^b
row 6: (empty)
row 7: LHCO particle type 2: µs
row 8: typ^a, eta^a, phi^a, pt^a, jmas^a, ntrk^a, btag^a, h/e^a, dum1^a, dum2^a, dum3^b, dum4^b
row 9: (empty)
row 10: LHCO particle type 3: τs
row 11: typ^a, eta^a, phi^a, pt^a, jmas^a, ntrk^a, btag^a, h/e^a, dum1^a, dum2^a, dum3^b, dum4^b
rows 12–13: (empty)
row 14: LHCO particle type 4: jets
row 15: typ^a, eta^a, phi^a, pt^a, jmas^a, ntrk^a, btag^a, h/e^a, dum1^a, dum2^a, dum3^b, dum4^b
rows 16–27: (additional jet slots, same format)
row 28: LHCO particle type 6: missing energy^c

Table A.1: Key to the first 28 rows of the 41 × 12 particle collision event images.
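Because the numbering is offset by 5 and each image row holds 12 features, the (row, column) position of any feature number in the tables below can be recovered arithmetically. A minimal sketch, assuming only that convention:

    def feature_to_row_col(n, offset=5, n_cols=12):
        """Invert the feature numbering n = offset + n_cols*(row - 1) + col
        used in this appendix (rows and columns counted from 1)."""
        idx = n - offset - 1
        return idx // n_cols + 1, idx % n_cols + 1

    # e.g. feature 167 ("ntrk for leading jet") maps to (row 14, column 6)
    assert feature_to_row_col(167) == (14, 6)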

[Table A.2 grid: the feature numbers for the first 28 rows of the image, filled row by row, so that the entry in row r, column c is 5 + 12(r - 1) + c; the numbers run from 6 (row 1, column 1) to 341 (row 28, column 12). Bold and subscript annotations on individual entries follow the scheme in the caption.]

Table A.2: Feature map for the first 28 rows of the 41 × 12 particle collision event images. Features with the highest correlations to labels are in bold with list placement as a superscript (blue = 0-lepton channel, brown = 1-lepton channel, red = 2-lepton channel). Subscripts denote negatively correlated features with the same color and ranking scheme.

Table A.3: Key to the last 13 rows of the 41 × 12 particle collision event images.

rows columns 1 2 3 4 5 6 7 8 9 10 11 12 29 342 343 344 3454 346 347 348 349 350 351 352 353 30 354 355 356 357 358 359 360 361 362 363 364 365 31 366 367 368 3693 370 371 372 737 374 375 376 377 32 378 379 380 381 382 383 384 385 386 387 388 389 33 390 391 3925 3934 394 39511 396 397 3983 399 4002 401 34 402 403 404 4058 406 407 408 40912 410 411 412 413 35 414 415 416 417 418 419 420 421 422 423 424 425 36 426 427 428 429 430 431 432 433 434 435 436 437 37 438 439 440 4419,7 442 443 444 4456,9 446 447 448 449 38 450 451 452 45313 454 455 456 45715 458 459 460 461 39 462 463 464 465 466 467 468 469 470 471 472 473 40 474 475 47614 477 478 479 480 481 482 483 48413,11 485 11 7 6 3 41 486 487 4882 489 8 490 491 492 493 14 494 495 4961 497

Table A.4: Feature map for last 13 rows of the 41 × 12 particle collision event image. Features with the highest correlations to labels are in bold with list placement as a superscript (blue=0-lepton channel, brown = 1-lepton channel, red = 2-lepton channel). Subscripts deonte negatively correlated features with the same color scheme.

113 Correlation Tables

We used the same code to produce the higher level parameter values in each event as was used in the study in Chapter 2. These rows were designed to make it easier to manipulate the cuts that were described in Chapter 2, channel by channel. Thus, some values appear redundantly in multiple particle types because that quantity was relevant to more than one cut. Therefore some of the correlations are redundant. However, some of these arrays were not created unless the event met some minimum criteria such as number of leptons present. Therefore, features that appear to represent identical parameters may also have different correlations to the label. Other features in the various rows are probably irrelevant to the network and may actually hinder its learning efficiency. Because the preliminary study proved promising, future efforts on this project will include better grooming of the dataset for the neural network, to include a more efficient presentation of the higher level parameters. However, inclusion of every feature, regardless of our (human) expectations of their importance, did reveal some interesting correlations that could be relevant in designing future cuts-based approaches. We mentioned in Section 3.2 that setting the cutoff values to 0 or infinity as appropriate inadvertently resulted in the omission of particle type 9 (jet-pair features) from being cre- ated. Another mistake in resetting the cutoff values occurred in particle type 10, leading to some misleading information in the correlation matrix for the 1-lepton channel which will be addressed in Section 3.4.2.

2-lepton Correlations

The positively correlated features in the 2-lepton channel are discussed in Section 3.4. In general, the negative correlations were much weaker than the positively correlated features in the 2-lepton channel. The strongest negatively correlated feature was part of a baseline cut for an additional particle type intended to be used with the 0-lepton channel.

We do not expect a high ET for the 2-lepton channel so this inverse correlation makes sense. The τ variables also make sense as we do not expect the signal events to contain τ’s, therefore τ parameters further from zero become suspect. Othe negatively correlated features involve characteristics of 4th and subsequent jets. Again, this makes sense because we do not expect a large number of higher-pT jets in the signal channel. In Chapter 2, Sec. 2.4, we briefly mentioned a new observable, the TVQ, defined in Eq. 2.14:

TVQ ≡ Σi |pT,i| − |ET| .    (A.1)

Here we see that TVQ is negatively correlated in the 2-lepton channel. Like feature 496, this makes sense in that we do not expect significant missing energy in this channel to balance the scalar sum of the jet pT.

feature #    correlation    description

496    -0.073496    ET > 200 GeV (P/F)
488    -0.061004    ET phi
117    -0.051072    leading τ pT
213    -0.043753    pT of 5th jet
225    -0.043217    pT of 6th jet
224    -0.040404    phi of 6th jet
201    -0.040165    pT of 4th jet
489    -0.040089    TVQ
214    -0.038151    jmas of 5th jet
116    -0.037861    leading τ phi
226    -0.033595    jmas of 6th jet
212    -0.030851    phi of 5th jet
200    -0.029495    phi of 4th jet
493    -0.029358    met pt
237    -0.028148    pT of 7th jet

Table A.5: 15 features with the highest negative correlation to the event label for the 2-lepton channel.

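For concreteness, the TVQ observable of Eq. (A.1) is straightforward to compute per event; a minimal sketch with hypothetical arrays follows.

    import numpy as np

    def tvq(jet_pt, met):
        """TVQ = scalar sum of jet pT minus the magnitude of the missing
        transverse energy, as in Eq. (A.1)."""
        return np.sum(np.abs(jet_pt)) - abs(met)

    # Hypothetical event: three jets and 40 GeV of missing energy
    print(tvq(np.array([220.0, 110.0, 45.0]), met=40.0))  # -> 335.0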

1-lepton Correlations

feature #    correlation    description
167    0.133106    ntrk for leading jet
400    0.047287    isojet minimum reconstructed mass
398    0.043797    isojet ID
393    0.041879    isojet pt
392    0.039077    isojet phi
445    0.035911    reconstructed W boson mass (muon)
45     0.033642    leading electron pT
405    0.033642    leading electron pT
441    0.033335    leading muon pT
81     0.033335    leading muon pT
395    0.033231    isojet ntrk
409    0.031208    reconstructed W boson mass (muon)
484    0.027969    ET > 30 GeV (P/F)
476    0.021377    ET phi
83     0.018029    charge of lepton (electron/positron)

Table A.6: 15 features with the highest correlation to the event label for the 1-lepton channel.

The 1-lepton channel in general had very low values for the positive correlations. Aside from the number of tracks in the leading jet, which was the most highly correlated feature for all three channels, all the correlations for the 1-lepton channel are actually lower than the 15th item on the 2-lepton list. This makes it a bit more dangerous to draw conclusions from Table A.6. The 15th feature in the list makes this clear: it is the sign of the electrons in the event. Intuition tells us that this should not be correlated to the label of signal or background, as equal numbers of positive and negative W bosons were simulated. However, the correlation is extremely low (0.018) and is therefore likely to be a statistical fluke with this particular dataset. The unreliable nature of such low correlations is further emphasized by the repeated appearance of the additional particle type 10, “isojet”, in the 2nd-5th highest correlations, because the isojet row in the neural network image is actually the result of a mistake.

feature #    correlation    description

189    -0.308795    pT of 3rd jet
201    -0.308396    pT of 4th jet
213    -0.263636    pT of 5th jet
202    -0.262844    jmas of 4th jet
177    -0.240593    pT of 2nd jet
214    -0.236062    jmas of 5th jet
190    -0.227026    jmas of 3rd jet
212    -0.225768    phi of 5th jet
200    -0.224410    phi of 4th jet
215    -0.204565    ntrk of 5th jet
225    -0.201718    pT of 6th jet
203    -0.201161    ntrk of 4th jet
224    -0.186669    phi of 6th jet
226    -0.183289    jmas of 6th jet
168    -0.168107    btag of 1st jet

Table A.7: 15 features with the highest negative correlation to the event label for the 1-lepton channel.

This particle type was originally created to assess the possibility that the two jets resulting from the Higgs decay might be sufficiently collimated to be seen by the detector as a single jet, by looking for a single jet with a jet pT > 30 GeV and a jmas value between 95 and 150 GeV. However, as previously mentioned, for the neural network the code was modified to eliminate as many traditional cuts on the data as possible. The mistake was made when both the upper and lower jmas limits were set to 0, resulting in the selection of extremely rare specimens with a jet pT > 30 GeV but a jmas value of exactly 0. Qualified jets were extremely rare; upon further investigation, only 31 of these events were found in all ∼20,000 signal events and only 23 were found in all ∼40,000 background events, all coming from the tt̄ secondary background. Thus jets qualifying for this particle type were about three times as likely to occur in the signal channel, but with such a low occurrence rate in either channel, interpretation of the significance of these odd jets is difficult. Unlike the 2-lepton channel, the top 15 negatively correlated features were much more strongly (negatively) correlated, implying that for this channel, it is just as much what the event is not as what the event is. In fact, the top two negatively correlated features have the highest absolute value of any feature correlation in the dataset, for any channel, and the strong negative correlations continue through at least the first 15 features.

The top 14 negatively correlated features all correspond to jet variables for the 3rd or subsequent jet, strongly suggesting that events with a large number of jets generally indicate background in this channel. The correlations of each of the features are close enough that the specific order in which they appear on the list may not be as important as the general categories.

A 3rd jet with higher pT is disfavored as signal, and in general 4th and subsequent jets with higher pT, jmas, phi, and number of tracks are disfavored. The 15th feature on the list suggests that b-tags in the leading jet disfavor the event as signal, which is to be expected since the signal events were explicitly generated without bottom quarks in the decay products while the tt̄ background process was explicitly generated to include bottom quarks. Although the one meaningful positive correlation and all of the negative correlations involved jet parameters, it should be remembered that both the signal and background processes were generated with the explicit presence of one lepton and one neutrino in the decay products. Thus even if the neural network did not make much use of leptonic features for this channel, there is an underlying assumption that both the signal and the background events were selected from all possible LHC events based on a leptonic signal. Also, the neural network may still have identified and utilized patterns between the leptonic features even if the individual leptonic features were not highly correlated to a label of signal or background. Unlike the analysis in Chapter 2, this analysis did not have a veto for more than one lepton in this channel, and if there was a second lepton in the event the network would have been non-judgmentally presented with it as a second leading lepton. Any additional leptons, if present, were not significantly correlated to the label, as no lepton features other than those corresponding to the leading lepton appeared in either the top 15 positively or the top 15 negatively correlated features.

0-lepton Correlations

The numbers of jet tracks in the three jets with the highest pT again appear, as well as other jet characteristics of the top three jets like phi and jmas. ET makes a strong appearance, which again validates using a boosted Z produced in association with a Higgs as a leptonic handle. Although feature 493 and feature 333 have nearly the same value, the additional particle type 14 was not created unless the event passed a lepton rejection. Thus the ET is more highly correlated to signal in conjunction with the absence of any leptons in the event. Again cautioning that with 492 features it is dangerous to give too much weight to the correlation between a single feature and the label, we nonetheless note that four of the top 11 features in Table A.8 are related to the TVQ observable mentioned in Chapter 2. This could simply be due to the preliminary lepton rejection, although the difference this lepton rejection made between feature 493 and feature 333 was slight.

feature #    correlation    description
167    0.288625    ntrk for leading jet
166    0.201417    jmas of leading jet
493    0.191779    ET (met pt)
333    0.191353    ET (met pt)
179    0.174201    ntrk for 2nd leading jet
491    0.155868    ntrk sum of all jets
490    0.135409    jmas sum of all jets
165    0.091821    pt of leading jet
178    0.079682    jmas of 2nd jet
191    0.078298    ntrk for 3rd leading jet
489    0.043452    TVQ (scalar sum of all jet pt)
203    0.026191    ntrk for 4th leading jet
176    0.015063    phi of 2nd jet
177    0.013437    pt of 2nd jet
188    0.013202    phi of 3rd jet

Table A.8: 15 features with the highest correlation to the event label for the 0-lepton channel. Features related to the TVQ observable first mentioned in Chapter 2 have been highlighted.

Right now the TVQ variables are just a sum of other features that were also found to be more highly correlated. The sum is less correlated than the leading jet entry for the same observable, but more correlated than subsequent jet entries, with the exception of the 2nd leading jet ntrk, which has shown itself to be surprisingly important in all three channels. It will be interesting to see in future work how these correlations shift when the feature set has been better engineered for a neural network. For example, the code used for the preliminary study unfortunately did not include the sum of all jet tracks, the sum of all phis, or the sum of all jmas in a particle type that did not go through the preliminary lepton rejection first, so there is no way to tell whether it is the summing of other jet features or the lepton rejection that gives them the relatively high correlation. Most of the negatively correlated features are not surprising and have a similar interpretation as those for the other channels. τ’s disfavor signal fairly strongly; the remaining features have a relatively low correlation, indicating that this channel may have a higher tolerance for events with many jets than the other channels. One interesting feature to make an appearance is the leading jet had/em. This feature gives the ratio of energy deposited in the hadronic versus the electromagnetic calorimeters. This quantity was ignored in the cuts-based analysis in Chapter 2.

feature #    correlation    description

117    -0.066957    leading τ pT
116    -0.054793    leading τ phi
169    -0.040121    leading jet had/em
213    -0.038970    pT of 5th jet
237    -0.036997    pT of 7th jet
225    -0.036747    pT of 6th jet
238    -0.035580    jmas of 7th jet
201    -0.032725    pT of 4th jet
226    -0.031868    jmas of 6th jet
214    -0.030953    jmas of 5th jet
239    -0.030693    ntrk of 7th jet
212    -0.027689    phi of 5th jet
236    -0.027545    phi of 7th jet
227    -0.025849    ntrk of 6th jet
129    -0.025132    2nd τ pT

Table A.9: 15 features with the highest negative correlation to the event label for the 0-lepton channel.

Although it is third on the list, its correlation is still fairly low, so it is unclear if it is a good candidate for future study or just an interesting coincidence.
