
Measurement of Jet Substructure in Boosted Top Decays at CMS

submitted by Jan Skottke, born on 3 January 1995

Master's thesis in the Physics degree programme, Universität Hamburg

November 2019

First reviewer: Prof. Dr. Johannes Haller
Second reviewer: Dr. Roman Kogler

Abstract

A measurement is presented of the differential $t\bar{t}$ production cross section as a function of the N-subjettiness ratio $\tau_{32}$ in fully merged hadronic decays. These decays of the type $t \to bW \to bq\bar{q}'$ are reconstructed using anti-$k_T$ jets with a distance parameter of 0.8 and transverse momentum greater than 400 GeV. The measurement is performed with data collected with the CMS detector in 2016 at a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 35.9 fb$^{-1}$. The data are unfolded to the particle level in order to correct for detector effects. This makes it possible to compare the data to predictions from event generators using different tunes and varied model parameters. The measurement is found to be sensitive to the simulation of final-state radiation, which makes it possible to constrain the associated model parameters. This is important for achieving higher precision in the identification of jets originating from fully merged hadronic top quark decays.


Summary

This thesis presents a measurement of the differential $t\bar{t}$ production cross section as a function of the N-subjettiness ratio $\tau_{32}$ in fully merged top quark decays. Decays of the type $t \to bW \to bq\bar{q}'$ are reconstructed with anti-$k_T$ jets with a distance parameter of 0.8 and a transverse momentum of more than 400 GeV. The measurement uses data recorded by the CMS detector at a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 35.9 fb$^{-1}$. The measured data are corrected for detector effects and compared with Monte Carlo simulated distributions. This unfolding makes it possible to compare the data with predictions from event generators using different tunes and varied model parameters. The measurement turns out to be sensitive to the modeling of final-state radiation, which makes it possible to constrain the corresponding model parameters. This is particularly important for achieving higher precision in the identification of jets from fully merged top quark decays.


Contents

1 Introduction

2 Theory
  2.1 The Standard Model of Particle Physics
    2.1.1 Quantum Electro Dynamics
    2.1.2 Quantum Chromo Dynamics
    2.1.3 Charged-Current Weak Interaction
    2.1.4 Electroweak Unification
    2.1.5 Higgs Mechanism & Spontaneous Symmetry Breaking
    2.1.6 Shortcomings of the Standard Model
  2.2 The Top Quark
    2.2.1 Production and Decay at the LHC
  2.3 Physics of Proton-Proton Collisions
  2.4 Jet Substructure
  2.5 Simulation of Proton-Proton Collisions

3 Measurement of Jet Substructure
  3.1 Jet Mass
  3.2 N-subjettiness

4 Experiment
  4.1 The Large Hadron Collider
  4.2 The Compact Muon Solenoid
    4.2.1 Coordinate System
    4.2.2 Magnet
    4.2.3 Tracking System
    4.2.4 Calorimeters
    4.2.5 Muon System
    4.2.6 Trigger

5 Reconstruction and Identification of Objects
  5.1 Signature of Particles in the CMS Detector
  5.2 Particle Flow Algorithm
  5.3 Muon Identification
  5.4 Identification
  5.5 Reconstruction of Jets
    5.5.1 The anti-kT Jet Clustering Algorithm
    5.5.2 Pileup Mitigation Techniques
    5.5.3 Jet Energy Corrections
    5.5.4 b tagging
  5.6 Missing Transverse Momentum

6 Analysis
  6.1 Data Sets and Event Simulation
    6.1.1 Data Sets
    6.1.2 Monte Carlo Samples
  6.2 Analysis Strategy
  6.3 Studies on Particle Level
  6.4 Studies on Reconstruction Level
  6.5 Unfolding
    6.5.1 Regularized Unfolding
    6.5.2 Determination of Bin Sizes
    6.5.3 Migration Matrix
    6.5.4 Validation Tests
  6.6 Uncertainties
    6.6.1 Statistical Uncertainties
    6.6.2 Experimental Uncertainties
    6.6.3 Model Uncertainties
    6.6.4 Total Uncertainty and Correlation
  6.7 Unfolding of Data

7 Summary and Outlook

1 Introduction

The standard model of particle physics (SM) is a theory that describes three of the four known fundamental forces. Despite its successful description of experimental data, there are still unexplained phenomena which the SM cannot describe. For this reason, particle colliders like the Large Hadron Collider (LHC) are built to further test the standard model and to search for new physics.

The top quark is the heaviest particle in the standard model. Due to its high mass, it has a large Yukawa coupling to the Higgs boson and therefore plays an important role in the electroweak sector. Many new physics models predict new heavy particles which also have large couplings to the top quark. Since those hypothetical new particles are often excluded at low masses, searches for new physics often aim at heavy particles. Because of the large mass of those new particles, their decay products, e.g. top quarks, are highly Lorentz boosted. As a consequence, the decay products of such a top quark can be reconstructed within a single jet.

To distinguish jets originating from top quarks from jets originating from other particles, information on their substructure is used in the reconstruction of collision events at the LHC. A good understanding of jet substructure is therefore important. This analysis presents a first measurement of the differential $t\bar{t}$ production cross section as a function of the N-subjettiness ratio $\tau_{32}$ in the boosted regime with the Compact Muon Solenoid (CMS) detector. The ratio $\tau_{32}$ characterizes the substructure of hadronic jets and plays an important role in the identification of jets containing a top quark decay, since it discriminates three-prong decays of the top quark from two-prong and one-prong topologies. The analysis uses data collected with the CMS detector in proton-proton collisions at the LHC in 2016 at a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 35.9 fb$^{-1}$.

For this analysis, the data are unfolded to particle level using regularized unfolding within the TUnfold framework. The result can then be used to constrain different tunes of the simulation and thereby achieve higher precision in the identification of jets originating from fully merged top quark decays.

The thesis is structured in the following way. The theoretical foundations of the standard model, a more detailed look at the top quark, and an introduction to jet substructure are provided in Chapter 2. An overview of previous studies of jet substructure is given in Chapter 3. The experimental setup, consisting of the LHC and the CMS experiment, is introduced in Chapter 4. This is followed in Chapter 5 by a description of the algorithms that are used to reconstruct and identify physical objects measured inside the detector. In Chapter 6 the analysis is presented, discussing the data and simulation samples used, the phase space definition, further selection requirements, and the treatment of the uncertainties, and presenting the measurement of the differential $t\bar{t}$ production cross section as a function of $\tau_{32}$. The thesis closes with a conclusion and outlook in Chapter 7.

2 Theory

The following chapter gives an introduction to the theoretical background of the presented analysis. It starts with an overview of the standard model of particle physics in Section 2.1, followed by a more detailed look at the top quark and its properties in Section 2.2. The physics of proton-proton collisions is described in Section 2.3 and jet substructure is discussed in Section 2.4. The chapter closes with a description of simulations in high-energy physics in Section 2.5.

2.1 The Standard Model of Particle Physics

The Standard Model of particle physics is a quantum field theory that describes all known elementary particles and their interactions through three of the four known fundamental forces: the strong, weak and electromagnetic interactions. The fundamental particles can be grouped into fermions and bosons with spin $\frac{1}{2}$ and integer spin, respectively, as shown in Fig. 2.1. Fermions are further divided into quarks and leptons. Quarks come in two types, up-type and down-type. Up-type quarks carry a charge of $+\frac{2}{3}e$ and down-type quarks a charge of $-\frac{1}{3}e$, where $e$ denotes the charge of the positron. Both types also carry a color charge with three different states. Leptons can either be charged or neutral. Charged leptons have an electric charge of $-e$, while neutral leptons, called neutrinos, carry no electric charge. In addition, fermions carry a weak isospin, whose third component is $T_3 = +\frac{1}{2}$ for up-type quarks and neutrinos and $T_3 = -\frac{1}{2}$ for down-type quarks and charged leptons. Fermions come in three generations. Each fermion has an anti-fermion with inverted electric and color charge.

The gauge bosons are the mediators of the fundamental forces. The electromagnetic and strong forces are mediated by the massless photon and gluon, respectively. The massive W$^\pm$ and Z$^0$ bosons mediate the weak interaction. These bosons carry a spin of one, while the Higgs boson is a scalar. The Higgs boson is a result of the spontaneous symmetry breaking in the electroweak sector and provides an explanation of how elementary particles acquire their masses. A more detailed look at the fundamental forces, their theoretical description and the shortcomings of the SM is given in the following sections. This section is based on information from [2] and [3].

Figure 2.1: Particle content of the Standard Model, divided into bosons and fermions with their three generations. Taken from [1].

2.1.1 Quantum Electro Dynamics

The electromagnetic interaction was the first force to be described by a quantum field theory, Quantum Electro Dynamics (QED). Its charge is the electric charge, usually given in units of $e$. To construct such a quantum field theory, the Lagrange density for fermions, which are described by Dirac spinors, is used and local gauge invariance is required. The Lagrange density of the Dirac equation is

$$\mathcal{L}_{\mathrm{QED}} = i\bar{\psi}\gamma^{\mu}\partial_{\mu}\psi - m\bar{\psi}\psi \tag{2.1}$$

with $\psi$ denoting the Dirac spinor, $m$ the corresponding mass, and $\gamma^{\mu}$ the four gamma matrices needed to write the Lagrangian in a four-dimensional space-time representation. As mentioned before, this Lagrange density is required to be invariant under a local transformation, in this case the U(1) transformation

$$\psi \to \psi' = e^{iq\alpha(x)}\psi \tag{2.2}$$

with $q$ a coupling constant and $\alpha(x)$ a phase that depends on the space-time coordinates. The mass term $m\bar{\psi}\psi$ is gauge invariant, while the derivative term is not. In order to achieve gauge invariance, a vector field $A_{\mu}$ is introduced and a new derivative, the covariant derivative, is defined as

$$\partial_{\mu} \to D_{\mu} = \partial_{\mu} + iqA_{\mu}. \tag{2.3}$$

With this replacement the gauge invariant Lagrangian becomes

$$\mathcal{L}_{\mathrm{QED}} = \underbrace{i\bar{\psi}\gamma^{\mu}\partial_{\mu}\psi}_{\text{kin. term}} - \underbrace{m\bar{\psi}\psi}_{\text{mass term}} - \underbrace{q\bar{\psi}\gamma^{\mu}\psi A_{\mu}}_{\text{interaction term}}. \tag{2.4}$$

The vector field $A_{\mu}$ can be identified with the photon, and the new term couples the photon field $A_{\mu}$ to a fermion $\psi$ with strength $q$. Because there is no associated mass term, the photon is expected to be massless. Photons are electrically neutral and do not interact with themselves.
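The step from Eq. (2.3) to a gauge invariant Lagrangian can be made explicit. The following short derivation is a sketch under one assumption that is not written out above, namely that the vector field transforms as $A_\mu \to A_\mu - \partial_\mu\alpha(x)$ together with the local U(1) transformation of Eq. (2.2):

```latex
\begin{align*}
\partial_\mu \psi' &= e^{iq\alpha(x)}\left[\partial_\mu + iq\,(\partial_\mu\alpha)\right]\psi
  && \text{(the extra term spoils invariance)} \\
A_\mu &\to A'_\mu = A_\mu - \partial_\mu \alpha(x)
  && \text{(assumed transformation of the vector field)} \\
D'_\mu \psi' &= \left(\partial_\mu + iqA'_\mu\right) e^{iq\alpha(x)}\psi
  = e^{iq\alpha(x)}\left(\partial_\mu + iqA_\mu\right)\psi
  = e^{iq\alpha(x)} D_\mu\psi \\
\bar{\psi}'\gamma^\mu D'_\mu\psi' &= \bar{\psi}\gamma^\mu D_\mu\psi
\end{align*}
```

The covariant derivative of the spinor thus picks up the same phase as the spinor itself, so the combination $\bar{\psi}\gamma^{\mu}D_{\mu}\psi$ is invariant.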

2.1.2 Quantum Chromo Dynamics

The strong force is described by Quantum Chromo Dynamics (QCD). The derivation of QCD follows the same procedure as in QED. The charge is the color charge, which has three different states referred to as red, blue and green. Analogous to the electric charge, each color has an anti-color, which is carried by anti-particles. Because of the three color states, the relevant transformation is a rotation of the SU(3) symmetry group,

$$\psi \to \psi' = e^{ig_s\alpha^{a}(x)T^{a}}\psi. \tag{2.5}$$

Again, $\alpha^{a}(x)$ denotes phases that depend on the space-time coordinates, $g_s$ is the coupling constant of the strong force, and the rotations are described by the generators $T^{a}$, where $a$ ranges from 1 to 8. With this, the covariant derivative can be defined as

$$\partial_{\mu} \to D_{\mu} = \partial_{\mu} + ig_sT^{a}G^{a}_{\mu}. \tag{2.6}$$

Here, eight new vector fields $G^{a}_{\mu}$ are introduced which correspond to the gluons, the mediators of the strong force. QCD describes the interaction between gluons and quarks and additionally between gluons and gluons, because gluons carry color charge themselves. This is represented in the Lagrangian of QCD:

$$\mathcal{L}_{\mathrm{QCD}} = \underbrace{\bar{\psi}(i\gamma^{\mu}\partial_{\mu} - m)\psi}_{\text{mass and kin. term of quark}} - \underbrace{g_s\bar{\psi}(\gamma^{\mu}T^{a}G^{a}_{\mu})\psi}_{\text{quark-gluon coupling}} - \underbrace{\tfrac{1}{4}G^{a}_{\mu\nu}G^{a\mu\nu}}_{\text{gluon kin. energy and gluon self-interaction}}. \tag{2.7}$$

Because of the gluon self-interaction, the energy stored in the color field between two quarks rises with their distance. This results in the so-called color confinement, meaning that colored particles cannot exist as free particles. They exist in bound, colorless states called hadrons. The possible bound states consist either of three quarks, which are called baryons, or of a quark anti-quark pair, called mesons. Additionally, bound states of four and five quarks, called tetraquarks and pentaquarks, are possible [4, 5]. The process of forming hadrons from individual quarks and gluons is called hadronization. The only exception is the top quark, which decays before it can hadronize due to its high mass.

2.1.3 Charged-Current Weak Interaction

The charged-current weak interaction is mediated by the massive W$^\pm$ boson. The rather large W boson mass of $m_W = 80.379 \pm 0.012$ GeV [6] leads to a short-range interaction. The corresponding symmetry group is SU(2)$_L$, where $L$ denotes the coupling to left-handed particles and right-handed anti-particles. Because the W boson couples only to left-handed particles and has no coupling to right-handed particles, parity is maximally violated. The charge of the weak interaction is the weak isospin. The handedness refers to the chirality of a particle, which is closely connected to the helicity of a particle at relativistic velocities. The helicity is defined as the sign of the projection of the spin vector onto the momentum vector. For massless particles a left-handed chirality corresponds to a negative helicity. For massive particles this is no longer true, and positive helicities can be found as well. To project out the components of definite handedness of a particle, the projection operators

$$P_L = \frac{1}{2}(1 - \gamma^{5}) \tag{2.8}$$

$$P_R = \frac{1}{2}(1 + \gamma^{5}) \tag{2.9}$$

with $\gamma^{5} = i\gamma^{0}\gamma^{1}\gamma^{2}\gamma^{3}$ are introduced. These operators are applied to particles to obtain their left- and right-handed components, respectively. Due to the operator $P_L$, the coupling of the W boson gets an additional term $\gamma^{\mu}\gamma^{5}$ and reads

$$\bar{\psi}\gamma^{\mu}\frac{1}{2}(1 - \gamma^{5})\psi = \frac{1}{2}\left(\bar{\psi}\gamma^{\mu}\psi - \bar{\psi}\gamma^{\mu}\gamma^{5}\psi\right). \tag{2.10}$$

The first term transforms like a vector under a parity transformation and has a negative parity eigenvalue, while the second term transforms like an axial vector and has a positive parity eigenvalue. This combination of a vector minus an axial-vector coupling is referred to as V-A theory and violates parity conservation.

Because all fermions, except the neutrinos, are massive, all of them can be found in left-handed isospin doublets and right-handed isospin singlets. The third component of the weak isospin is $T_3 = +\frac{1}{2}$ in each upper and $T_3 = -\frac{1}{2}$ in each lower component of a doublet. For the singlets $T_3 = 0$. For leptons this looks as follows

$$\begin{pmatrix}\nu_e \\ e\end{pmatrix}_L,\ \begin{pmatrix}\nu_\mu \\ \mu\end{pmatrix}_L,\ \begin{pmatrix}\nu_\tau \\ \tau\end{pmatrix}_L,\ e_R,\ \mu_R,\ \tau_R \tag{2.11}$$

and for quarks

$$\begin{pmatrix}u \\ d'\end{pmatrix}_L,\ \begin{pmatrix}c \\ s'\end{pmatrix}_L,\ \begin{pmatrix}t \\ b'\end{pmatrix}_L,\ u_R,\ d_R,\ c_R,\ s_R,\ t_R,\ b_R. \tag{2.12}$$

It is important to note that neutrinos are massless¹ in the SM and therefore no right-handed neutrinos exist. For leptons, flavor changes are restricted to the same generation. Up-type quarks, however, can convert into any down-type quark independent of its generation, because the weak eigenstates and mass eigenstates of quarks are not identical. This is called flavor mixing. A unitary transformation has to be applied to obtain the weak eigenstates from the mass eigenstates. This is described by the Cabibbo-Kobayashi-Maskawa matrix (CKM matrix) $V_{\mathrm{CKM}}$:

$$\begin{pmatrix}d' \\ s' \\ b'\end{pmatrix} = \begin{pmatrix}V_{ud} & V_{us} & V_{ub} \\ V_{cd} & V_{cs} & V_{cb} \\ V_{td} & V_{ts} & V_{tb}\end{pmatrix}\begin{pmatrix}d \\ s \\ b\end{pmatrix}. \tag{2.13}$$

The absolute values of the matrix elements can be obtained through a global fit [6]:

$$|V_{\mathrm{CKM}}| = \begin{pmatrix}0.97446 \pm 0.00010 & 0.22452 \pm 0.00044 & 0.00365 \pm 0.00012 \\ 0.22438 \pm 0.00044 & 0.97359^{+0.00010}_{-0.00011} & 0.04214 \pm 0.00076 \\ 0.00896^{+0.00024}_{-0.00023} & 0.04133 \pm 0.00074 & 0.999105 \pm 0.000032\end{pmatrix}. \tag{2.14}$$

The probability for a certain transition is given by the squared absolute value of the corresponding element of $V_{\mathrm{CKM}}$. A complex phase in the matrix elements of the CKM matrix allows CP violation to be explained.
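As a small numerical illustration of the statement about transition probabilities, the central values of Eq. (2.14) can be squared and checked for unitarity. This is only a sketch: uncertainties are ignored, and the names `V_ABS` and `transition_weights` are chosen here for illustration.

```python
import numpy as np

# Central values of |V_ij| from the global fit quoted in Eq. (2.14);
# uncertainties are ignored in this illustration.
V_ABS = np.array([
    [0.97446, 0.22452, 0.00365],   # u -> d, s, b
    [0.22438, 0.97359, 0.04214],   # c -> d, s, b
    [0.00896, 0.04133, 0.999105],  # t -> d, s, b
])


def transition_weights(v_abs):
    """Return |V_ij|^2, the relative weight of each up-type to down-type transition."""
    return v_abs ** 2


weights = transition_weights(V_ABS)
row_sums = weights.sum(axis=1)  # unitarity: each row should sum to about 1
top_to_bottom = weights[2, 2]   # |V_tb|^2

print("row sums of |V|^2:", np.round(row_sums, 4))
print("|V_tb|^2 = {:.4f}".format(top_to_bottom))
```

Each row sums to one within the quoted precision, and the last entry shows that a top quark decays almost exclusively into a bottom quark.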

¹ Neutrino oscillations imply that neutrinos do have a mass. Neutrino oscillations between two generations have been measured [7].

2.1.4 Electroweak Unification

The electromagnetic interaction and the weak interaction can be unified into the electroweak interaction. This is described by the symmetry group SU(2)$_L$ ⊗ U(1)$_Y$, where $Y$ denotes the hypercharge, which is the charge of the U(1)$_Y$ symmetry group. It is defined as

$$Y = 2(Q - T_3), \tag{2.15}$$

where $Q$ denotes the electric charge. The Lagrange density of the unification is the sum of both Lagrange densities

$$\mathcal{L}_{\mathrm{EW}} = -\frac{1}{4}\left(W^{a\mu\nu}W^{a}_{\mu\nu} + B^{\mu\nu}B_{\mu\nu}\right) + \sum \bar{\psi}i\gamma^{\mu}D_{\mu}\psi \tag{2.16}$$

and is required to be invariant under local gauge transformations. The covariant derivative reads

$$D_{\mu} = \partial_{\mu} + igT^{a}W^{a}_{\mu} + ig'\frac{Y}{2}B_{\mu}, \tag{2.17}$$

with $g$ and $g'$ being coupling constants and $a$ ranging from 1 to 3. This leads to four new vector fields that can be identified with gauge bosons. Those gauge bosons are $W_1^{\mu}$, $W_2^{\mu}$ and $W_3^{\mu}$, originating from the SU(2)$_L$ group, and $B^{\mu}$, which originates from the U(1)$_Y$ group. The W$^\pm$ bosons are defined as linear combinations of $W_1$ and $W_2$,

$$W^{\pm\mu} = \frac{1}{\sqrt{2}}\left(W_1^{\mu} \mp iW_2^{\mu}\right). \tag{2.18}$$

The photon field $A^{\mu}$ and the field $Z^{\mu}$ are mixtures of $W_3^{\mu}$ and $B^{\mu}$. The electroweak mixing is described by the Weinberg angle $\theta_W$,

$$\begin{pmatrix}A^{\mu} \\ Z^{\mu}\end{pmatrix} = \begin{pmatrix}\cos\theta_W & \sin\theta_W \\ -\sin\theta_W & \cos\theta_W\end{pmatrix}\begin{pmatrix}B^{\mu} \\ W_3^{\mu}\end{pmatrix}. \tag{2.19}$$

The field $Z^{\mu}$ can be identified with the Z$^0$ boson, the third boson of the weak interaction, and the photon field $A^{\mu}$ with the photon $\gamma$. The Z$^0$ boson couples differently via the weak interaction compared to the two W bosons. While the W bosons couple with a pure V-A current, as described in the previous section, the coupling of the Z boson contains two additional factors and reads

$$\bar{\psi}\gamma^{\mu}\frac{1}{2}\left(c_V^{f} - c_A^{f}\gamma^{5}\right)\psi = \frac{1}{2}\left(c_V^{f}\bar{\psi}\gamma^{\mu}\psi - c_A^{f}\bar{\psi}\gamma^{\mu}\gamma^{5}\psi\right). \tag{2.20}$$

The factors $c_V^{f}$ and $c_A^{f}$ are not equal to one as for the W bosons, but have different values depending on the fermion. This results in a coupling to right-handed particles and in different coupling strengths to the left- and right-handed components of a particle.
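The hypercharge relation of Eq. (2.15) can be evaluated for the first-generation fermions using the charges and weak isospin values given in Sections 2.1 and 2.1.3. The following is a minimal sketch; the dictionary keys such as `u_L` are labels chosen here for illustration.

```python
from fractions import Fraction as F


def hypercharge(charge, t3):
    """Weak hypercharge from Eq. (2.15): Y = 2 (Q - T3)."""
    return 2 * (charge - t3)


# (electric charge Q in units of e, third weak-isospin component T3)
FERMIONS = {
    "nu_L": (F(0), F(1, 2)),
    "e_L": (F(-1), F(-1, 2)),
    "u_L": (F(2, 3), F(1, 2)),
    "d_L": (F(-1, 3), F(-1, 2)),
    "e_R": (F(-1), F(0)),
    "u_R": (F(2, 3), F(0)),
    "d_R": (F(-1, 3), F(0)),
}

HYPERCHARGES = {name: hypercharge(q, t3) for name, (q, t3) in FERMIONS.items()}

for name, y in HYPERCHARGES.items():
    print(f"{name:5s} Y = {y}")
```

Both members of a left-handed doublet come out with the same hypercharge, as expected for a quantum number of the U(1)$_Y$ group that commutes with SU(2)$_L$.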

2.1.5 Higgs Mechanism & Spontaneous Symmetry Breaking

Gauge invariance forbids explicit mass terms for the W$^\pm$ and Z$^0$ bosons, so they would be expected to be massless. Experiments, however, have shown that the masses of these bosons are large. Their measured masses are $m_W = 80.379 \pm 0.012$ GeV and $m_Z = 91.1876 \pm 0.0021$ GeV [6]. To provide an explanation for these masses a new mechanism is introduced, called the Higgs mechanism [8]. It introduces a symmetric scalar potential

$$V(\Phi) = \mu^{2}\Phi^{\dagger}\Phi + \lambda(\Phi^{\dagger}\Phi)^{2} \tag{2.21}$$

with the Higgs field

$$\Phi = \begin{pmatrix}\phi^{+} \\ \phi^{0}\end{pmatrix}. \tag{2.22}$$

With the new potential, the Lagrange density reads

$$\mathcal{L}_{\Phi} = (D^{\mu}\Phi)^{\dagger}(D_{\mu}\Phi) - V(\Phi), \tag{2.23}$$

where $D_{\mu}$ is the covariant derivative of the electroweak unification. In contrast to the Lagrange densities before, the additional term $V(\Phi)$ is not derived from a gauge theory, but postulated instead. This potential has to be invariant under gauge transformations, i.e. under rotations of $\phi^{+}$ and $\phi^{0}$ and under changes of the complex phase. The potential chosen reads

$$V(\Phi) = \mu^{2}|\Phi|^{2} + \lambda|\Phi|^{4} \tag{2.24}$$

with $\mu^{2}$ and $\lambda$ two free parameters of the theory. If $\mu^{2}$ is positive ($\mu^{2} > 0$), the ground state remains at $|\Phi| = 0$, but for negative values of $\mu^{2}$ a spontaneously broken symmetry is obtained. The minimum is then a continuum of degenerate ground states at $|\Phi| = v/\sqrt{2}$ with the vacuum expectation value

$$v = \sqrt{\frac{-\mu^{2}}{\lambda}} \approx 246\ \text{GeV}. \tag{2.25}$$

In the unitary gauge the field expanded around the ground state is

$$\Phi = \frac{1}{\sqrt{2}}\begin{pmatrix}0 \\ v + H(x)\end{pmatrix}, \tag{2.26}$$

with $H(x)$ the excitation around the minimum $v$, which can be interpreted as the Higgs boson. The full Lagrange density becomes

$$\mathcal{L}_{\Phi} = \frac{1}{2}(\partial^{\mu}H)(\partial_{\mu}H) + \frac{1}{4}g^{2}(v + H)^{2}W^{+,\mu}W^{-}_{\mu} + \frac{1}{8}(g^{2} + g'^{2})(v + H)^{2}Z^{\mu}Z_{\mu} - V(\Phi). \tag{2.27}$$

With this, interactions and mass terms for the vector bosons and the Higgs boson arise. The masses of the vector bosons W$^\pm$ and Z$^0$ are

$$m_W = \frac{1}{2}gv, \qquad m_Z = \frac{1}{2}v\sqrt{g'^{2} + g^{2}}. \tag{2.28}$$

The last term $V(\Phi)$ is responsible for the mass and self-interaction of the Higgs boson and reads, up to a constant,

$$V(\Phi) = -\mu^{2}H^{2} + \lambda vH^{3} + \frac{1}{4}\lambda H^{4}. \tag{2.29}$$

The first term can be interpreted as the mass term of the Higgs boson with $m_H = \sqrt{-2\mu^{2}}$. The terms with $H^{3}$ and $H^{4}$ can be interpreted as the self-interaction of the Higgs boson with three and four Higgs vertices, respectively. In 2012 the CMS [9] and ATLAS [10] collaborations reported the observation of a Higgs-like boson. The experimentally measured mass of the Higgs boson is $125.10 \pm 0.14$ GeV [6]. By introducing a Yukawa coupling between the Higgs field and the fermions, the fermions obtain their masses. The mass of a fermion $f_i$ is proportional to its Yukawa coupling $y_i$,

$$m_{f_i} = \frac{y_i v}{\sqrt{2}}. \tag{2.30}$$

Because the Yukawa coupling is a free parameter, the Higgs mechanism cannot predict the values of the fermion masses. Yukawa couplings have been measured by the ATLAS and CMS experiments in the production of a Higgs boson in association with a $t\bar{t}$ pair and in Higgs boson decays such as H → bb and H → ττ [11–16].
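The relations of this section can be inverted to obtain rough tree-level values of the couplings from the measured masses. The sketch below makes three assumptions that go beyond the text: radiative corrections are ignored, the Higgs self-coupling is taken from $m_H^2 = 2\lambda v^2$ (which follows from $m_H^2 = -2\mu^2$ if $v^2 = -\mu^2/\lambda$), and the weak mixing angle is taken from the standard tree-level relation $\sin^2\theta_W = g'^2/(g^2 + g'^2)$. The top quark mass is the value quoted in Section 2.2.

```python
import math

# Inputs quoted in the text (all in GeV)
V_EV = 246.0     # vacuum expectation value, Eq. (2.25)
M_W = 80.379
M_Z = 91.1876
M_H = 125.10
M_TOP = 173.34   # top quark mass quoted in Section 2.2

# Invert Eq. (2.28): m_W = g v / 2 and m_Z = v sqrt(g^2 + g'^2) / 2
g = 2.0 * M_W / V_EV
g_prime = math.sqrt((2.0 * M_Z / V_EV) ** 2 - g ** 2)

# Assumed tree-level relation for the weak mixing angle
sin2_theta_w = g_prime ** 2 / (g ** 2 + g_prime ** 2)

# Assumed relation m_H^2 = 2 lambda v^2
lam = M_H ** 2 / (2.0 * V_EV ** 2)

# Invert Eq. (2.30): y = sqrt(2) m_f / v
y_top = math.sqrt(2.0) * M_TOP / V_EV

print(f"g = {g:.4f}, g' = {g_prime:.4f}, sin^2(theta_W) = {sin2_theta_w:.4f}")
print(f"lambda = {lam:.4f}, y_top = {y_top:.4f}")
```

The top quark Yukawa coupling comes out very close to one, which is the sense in which it is said to be of order unity.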

2.1.6 Shortcomings of the Standard Model

The standard model provides a description of three fundamental interactions, which has been verified to high precision with experimental data. However, there are still some phenomena in nature that it does not describe and that point to physics beyond the standard model.

For example, gravity cannot currently be described by a quantum field theory and therefore cannot be unified with the SM. In addition, it is not clear why gravity is so much weaker than the other interactions.

Measurements [17, 18] show that only approximately 5% of the total energy density in the universe originates from SM particles. Roughly 26% originates from Dark Matter, which has no particle candidate in the SM. Additionally, there is no explanation for the remaining ∼69%, called Dark Energy.

Furthermore, there is an apparent asymmetry in the universe between matter and anti-matter. This asymmetry can be a result of CP violation in the weak interaction, but the effect is too small to describe the observed asymmetry.

The hierarchy problem refers to the large difference between the energy scales in particle physics. These scales range from the electroweak scale ($\Lambda \sim 10^{2}$ GeV) to the Planck scale ($\Lambda \sim 10^{19}$ GeV), at which gravity becomes important. The problem can be seen in the Higgs boson mass, which receives quantum corrections that scale with the considered energy scale. At the Planck scale these corrections are orders of magnitude larger than the measured mass $m_H$. If the theory is to be valid at high scales, the corrections have to cancel to a high degree.

Many theories describing physics beyond the standard model introduce new heavy particles, some of which decay dominantly into top quarks.

2.2 The Top Quark

The top quark is an up-type quark from the third generation. In 1973, the third generation of quarks was predicted in order to explain the observed CP violation in kaon decays. In 1995, the top quark was observed by the CDF [19] and D0 [20] collaborations at the Tevatron. The top quark is the heaviest SM particle with a mass of $173.34 \pm 0.27\,\text{(stat)} \pm 0.71\,\text{(syst)}$ GeV [21]. Because the top quark has such a high mass, it has a short lifetime of $0.5 \times 10^{-24}$ s [6] and decays before it can hadronize. Therefore one can study the bare quark, which provides direct access to parameters of the standard model. Because of its high mass, it plays an important role in many theories describing physics beyond the standard model. Furthermore, it has a Yukawa coupling to the Higgs boson of order unity. All these properties can provide information on fundamental interactions at the electroweak scale and beyond.

2.2.1 Production and Decay at the LHC

In hadron colliders, such as the LHC, top quarks are mainly produced in pairs via quark-antiquark annihilation or gluon-gluon fusion. The Feynman diagrams for these processes are shown in Fig. 2.2. At the LHC, which operates at a center-of-mass energy of 13 TeV, gluon-gluon fusion is the dominant process. Top quarks can also be produced singly via the weak interaction, but these processes have much smaller cross sections. Therefore, single top quark production is treated as background in this analysis, which focuses on pair-produced top quarks.

Figure 2.2: Feynman diagrams of the production channels of top quarks. Displayed in (a) is the production via quark-antiquark annihilation and in (b) the production via gluon-gluon fusion in the s-, t-, and u-channels, respectively. Created with [22].

The top quark decays via the weak interaction almost exclusively into a bottom quark and a W$^+$ boson. The reason for this behavior can be seen in the CKM matrix element $V_{tb}$: its square, $|V_{tb}|^{2} \approx 99.8\%$, is the probability for the top quark to decay into a bottom quark. The bottom quark is seen as a jet in the detector, while the W$^+$ boson decays further into a quark and an anti-quark or into a charged lepton and the corresponding neutrino. For the $t\bar{t}$ final state this means that there are three possible decay channels. Firstly, in the fully hadronic decay, both W bosons decay into a quark and an anti-quark. The second possibility is the lepton+jets decay. Here, one W boson decays into a quark and an anti-quark, while the second W boson decays into a charged lepton and the corresponding neutrino. Lastly, there is the dilepton decay, where both W bosons decay into a lepton and a neutrino. The most likely decays are the fully hadronic decay with a branching fraction of 45.7% and the lepton+jets decay with a branching fraction of 43.8%. The dilepton decay has a branching fraction of 10.5% [6]. This analysis focuses on the lepton+jets final state as shown in Fig. 2.3, because the lepton can be used to separate $t\bar{t}$ events from background, while the hadronic decay can be used to study the top quark, which is the target of the presented measurement.
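The three quoted branching fractions are not independent. Assuming that the two W bosons decay independently and that every top quark decays to bW, the lepton+jets and dilepton fractions follow from the fully hadronic one. The following sketch checks this using only the 45.7% quoted above; the variable names are illustrative.

```python
import math

BR_FULLY_HADRONIC = 0.457  # quoted in the text

# Both W bosons decay independently, so BR(fully hadronic) = BR(W -> qq')^2.
br_w_had = math.sqrt(BR_FULLY_HADRONIC)
br_w_lep = 1.0 - br_w_had  # W -> l nu, summed over e, mu and tau

br_lepton_jets = 2.0 * br_w_had * br_w_lep  # either W can be the leptonic one
br_dilepton = br_w_lep ** 2

print(f"BR(W -> hadrons)  = {br_w_had:.3f}")
print(f"BR(lepton+jets)   = {br_lepton_jets:.3f}")
print(f"BR(dilepton)      = {br_dilepton:.3f}")
```

The results reproduce the quoted 43.8% and 10.5% to the given precision.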

Figure 2.3: Decay chain of $t\bar{t}$ in the lepton+jets channel. The circle in the middle indicates the production of the top and anti-top quark. Charge conjugate states are described by a similar diagram. Created with [22].

2.3 Physics of Proton-Proton Collisions

In collision experiments the center-of-mass energy $\sqrt{s}$ determines the energy that is available to produce particles. At the LHC the center-of-mass energy is

$$\sqrt{s} = 2E_{\text{proton}} = 13\ \text{TeV}. \tag{2.31}$$

Protons are not elementary particles, but consist of partons, which are further distinguished into valence quarks, sea quarks and gluons. While the valence quarks are two up quarks and one down quark, sea quarks are short-lived particles of any flavor which are produced by gluons in higher-order processes. Because of this structure, each parton carries only a fraction of the proton's momentum. Since the scattering process takes place between the partons, the center-of-mass energy becomes

$$\sqrt{\hat{s}} = \sqrt{x_1 x_2 s}, \tag{2.32}$$

with $x_1$ and $x_2$ being the momentum fractions of the involved partons. Since the initial partons and their momenta are unknown, the simulation and prediction of these processes rely on the precise measurement of parton density functions (PDFs), which describe the probability of finding a particular parton inside a given momentum interval in a proton. They are measured in deep inelastic scattering experiments, e.g. $ep \to e + X$. Such measurements were performed by the ZEUS and H1 collaborations at the HERA collider. Current PDF sets like NNPDF3.0 [23] use additional constraints from measurements at the LHC.
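To give a feeling for the momentum fractions involved, Eq. (2.32) can be evaluated at the threshold for producing a top quark pair at rest, $\sqrt{\hat{s}} = 2m_t$. This is a minimal sketch using the top quark mass quoted in Section 2.2; the function name `partonic_sqrt_s` is chosen here for illustration.

```python
import math

SQRT_S = 13000.0  # GeV, proton-proton center-of-mass energy
M_TOP = 173.34    # GeV, top quark mass quoted in Section 2.2


def partonic_sqrt_s(x1, x2, sqrt_s=SQRT_S):
    """Partonic center-of-mass energy from Eq. (2.32)."""
    return math.sqrt(x1 * x2) * sqrt_s


# Threshold for producing a top quark pair at rest: sqrt(s_hat) >= 2 m_t
x_product_min = (2.0 * M_TOP / SQRT_S) ** 2
x_symmetric_min = math.sqrt(x_product_min)  # special case x1 = x2

print(f"minimum x1*x2       = {x_product_min:.2e}")
print(f"minimum x (x1 = x2) = {x_symmetric_min:.4f}")
```

Partons carrying only a few percent of the proton momentum are already sufficient at threshold, whereas the boosted regime studied in this thesis requires larger momentum fractions.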


Underlying Event

The underlying event is the interaction of partons not involved in the hard scattering. The vertex of this interaction is the same as that of the hard scattering, and thus particles produced by the underlying event cannot be separated from particles produced in the hard interaction. The underlying event needs to be modeled and accounted for in the simulation of events.

Pile-up

To observe rare processes and collect a large amount of data, the LHC is required to have high collision rates. The disadvantage is that, besides the collision of interest, multiple other collisions occur at the same time. These are mostly soft QCD processes. Their effects need to be corrected for, because otherwise energy measurements would include particles that originate not from the hard scattering process of interest but from these other collisions. For this, one has to distinguish the vertex of the hard scattering from the vertices of other interactions and account for additional energy deposits.

2.4 Jet Substructure

Jet substructure plays an important role in data analyses at the LHC, because it can be used to identify decays of boosted heavy particles, such as the W boson or the top quark, into a single jet. Substructure information can be used to increase the sensitivity in searches for heavy resonances, because their decay products, e.g. top quarks, have such a high Lorentz boost that they are each reconstructed in a single jet. This has led to an increased interest in the theoretical aspects of jet substructure and in the development of new jet substructure variables. These variables should be sensitive to the processes of interest and calculable from first principles in QCD. To ensure the latter condition, a variable has to be infrared and collinear safe. Infrared safe means that the variable should not change if soft radiation is included in or excluded from the jet. Collinear safe means that the variable does not change when two collinear particles are replaced by one particle with the summed momentum.

Those jet substructure variables are used in top taggers. Top taggers at the CMS experiment use information from two jet substructure variables. One is the jet mass $m_{\text{jet}} = \sqrt{\left(\sum_i p_i\right)^{2}}$, where $i$ runs over all particles inside a given jet and $p_i$ is the four-vector of the $i$-th particle. The second variable, which is studied in the presented analysis, is the N-subjettiness. This variable is discussed in the following. A more detailed review of jet substructure can be found in reference [24].

Figure 2.4: N-subjettiness distributions for jets originating from top quarks and jets originating from light quarks and gluons. $\tau_1$ is displayed in the upper left, $\tau_2$ in the upper right, and $\tau_3$ at the bottom. Taken from [25].
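The jet mass defined above is the invariant mass of the summed four-momenta of the jet constituents. The following is a minimal sketch of that calculation for constituents given by transverse momentum, pseudorapidity and azimuthal angle; the helper names and the toy constituents are illustrative and not taken from the analysis code.

```python
import math


def four_vector(pt, eta, phi, mass=0.0):
    """Return (E, px, py, pz) for a particle given in collider coordinates."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px ** 2 + py ** 2 + pz ** 2 + mass ** 2)
    return (energy, px, py, pz)


def jet_mass(constituents):
    """Invariant mass of the summed four-momentum of all jet constituents."""
    e, px, py, pz = (sum(c[i] for c in constituents) for i in range(4))
    m2 = e ** 2 - px ** 2 - py ** 2 - pz ** 2
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding errors


# Toy jet: two massless particles with pT = 100 GeV each, separated by 1 rad in phi
toy_jet = [four_vector(100.0, 0.0, 0.5), four_vector(100.0, 0.0, -0.5)]
toy_mass = jet_mass(toy_jet)
print(f"toy jet mass = {toy_mass:.2f} GeV")
```

Although both constituents are massless, the jet acquires a mass from their opening angle, which is why the jet mass is sensitive to the decay of a heavy particle inside the jet.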

N-subjettiness

One important jet substructure variable is the N-subjettiness τN [25]. This variable is a measure of how likely it is that a jet has N subjets. Its values are in the interval [0, 1], where smaller values correspond to a higher compatibility with an N subjet hypothesis. It is defined as

1 X τN = pT,kmin(∆R1,k, ∆R2,k, ..., ∆RN,k), (2.33) d0 k

15 2.4. Jet Substructure

Figure 2.5: N-subjettiness ratios: τ2/τ1 distribution for jets originating from W bosons and jets originating from light quarks and gluons (left) and τ3/τ2 distribution for jets originating from top quarks and jets originating from light quarks and gluons (right). Taken from [25].

where k runs over all constituents inside a selected jet, pT,k is the transverse momentum of the respective particle, ∆Rj,k is the distance in the η-φ-plane between a subjet j and a particle k, and d0 denotes a normalization factor which is defined as X d0 = pT,kR0 (2.34) k with R0 as the distance parameter of the jet clustering algorithm. The subjets are found by an iterative procedure starting with subjets found by an exclusive jet clustering algorithm. The idea is that the energy for jets originating from hadronic top quark decays is distributed differently than for jets from hadronization of light quarks and gluons, in the following referred to as QCD jets. For a jet including a hadronic top quark decay it is expected that the energy is distributed along three axes originating from the three partons of the decay. Thus, the value of τ3 for this jet tends to lower values. For QCD jets the value is expected to be higher, because it is more uniformly distributed around one axis. Due to radiation, QCD jets can also have similar values of τ3 as shown in Fig. 2.4. Instead of using τN as discriminating variable, the ratio τN /τN−1 is found to be more helpful. It shows high separation power for three and two prong jets from QCD jets, as can be seen in Fig. 2.5. Three-prong jets, which originate from top quarks, have lower values of τ3/τ2, while QCD jets peak at higher values. The same behavior can be seen in the ratio τ2/τ1 for two prong jets, which i.e. originate from W bosons.

While the N-subjettiness τN is IRC safe, the ratios are not. However, the ratios are Sudakov safe [26], which makes them still calculable.
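The definition in Eqs. (2.33) and (2.34) can be illustrated with a minimal sketch. It assumes that the subjet axes are already known and passed in by hand, whereas the text obtains them from an iterative procedure seeded by exclusive clustering. The function names, the toy jet and the wrapping of the azimuthal difference are assumptions made for this illustration only.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the eta-phi plane; the phi difference is wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def n_subjettiness(constituents, axes, r0):
    """tau_N of Eq. (2.33) for a fixed set of N subjet axes.

    constituents: list of (pt, eta, phi) tuples for the particles in the jet
    axes:         list of (eta, phi) tuples, one per candidate subjet (assumed given)
    r0:           distance parameter of the jet clustering algorithm
    """
    d0 = sum(pt for pt, _, _ in constituents) * r0  # normalization, Eq. (2.34)
    weighted = sum(
        pt * min(delta_r(eta, phi, a_eta, a_phi) for a_eta, a_phi in axes)
        for pt, eta, phi in constituents
    )
    return weighted / d0

# Hypothetical three-prong toy jet: one hard and one soft particle per prong.
jet = [
    (100.0, 0.00, 0.0), (10.0, 0.05, 0.0),
    (100.0, 0.40, 0.0), (10.0, 0.45, 0.0),
    (100.0, 0.00, 0.4), (10.0, 0.05, 0.4),
]
axes3 = [(0.0, 0.0), (0.4, 0.0), (0.0, 0.4)]
axes2 = axes3[:2]

tau3 = n_subjettiness(jet, axes3, r0=0.8)
tau2 = n_subjettiness(jet, axes2, r0=0.8)
tau32 = tau3 / tau2
```

For this toy jet the three-axis hypothesis fits much better than the two-axis one, so tau32 comes out small, which is the behavior expected for three-prong jets.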

2.5 Simulation of Proton-Proton Collisions

For the interpretation of the data taken at the LHC, simulations are needed. These are produced with the Monte Carlo (MC) method in several steps. In the first step the hard matrix element of an interaction is calculated and the resulting cross sections are convoluted with the PDFs of the proton. The MC generators used in this step are POWHEG [27], MADGRAPH [28], and MADGRAPH5_aMC@NLO [29]. In the next simulation step the parton shower, the hadronization and the decay of unstable hadrons are calculated. These are handled by PYTHIA8 [30]. This step involves various parameters that have to be tuned to achieve a better description of the data by the simulation. These parameters influence, e.g., the initial-state radiation (ISR) and final-state radiation (FSR), or they scale the emission cross section by a damping function (hdamp) which controls the merging between the matrix element and the parton shower and further regulates the radiation in the high-pT regime [31]. In this analysis those parameters are later varied to calculate systematic uncertainties. In the last step the detector effects are simulated with GEANT4 [32]. Here additional pileup is added and interactions of stable particles with the CMS detector are included.


3 Measurement of Jet Substructure

At the LHC top quarks are abundantly produced with large Lorentz boosts and can be reconstructed within a single jet. Therefore, top tagging is an important method to identify jets that originate from such a decay. Two substructure variables are particularly important for this method: the jet mass and the N-subjettiness. This chapter gives an overview of existing measurements of these variables.

3.1 Jet Mass

The jet mass was measured at the CMS experiment in highly boosted t¯t events with an 8 TeV dataset [33] corresponding to an integrated luminosity of 19.7 fb−1 and with a 13 TeV dataset [34] corresponding to an integrated luminosity of 35.9 fb−1. The analyses are performed in the lepton+jets final state, because selecting the lepton reduces background contributions and the hadronically decaying top quark can be reconstructed well when all decay products are inside a single jet. The selection criteria for these analyses follow the same scheme and aim at a measurement phase space consisting mostly of fully merged top quark decays. An event is called fully merged if all decay products of the hadronically decaying top quark can be found within a single jet. The resulting mjet distribution is then unfolded to particle level to account for detector effects. The results for the differential t¯t production cross section as a function of the jet mass are shown in Fig. 3.1.

3.2 N-subjettiness

In ATLAS the N-subjettiness and other substructure variables were measured in the boosted regime [35]. The lepton+jets channel is used for a good background reduction, by selecting the lepton from the leptonically decaying top quark. The jet originating from the hadronically decaying top quark is then measured.


Figure 3.1: Differential t¯t production cross section as a function of the jet mass. The result of the CMS analysis at a center-of-mass energy of 8 TeV [33] is shown on the left and the result of the CMS analysis at a center-of-mass energy of 13 TeV [34] on the right.

The jets have a radius parameter of R = 1.0. After a selection aimed at boosted t¯t events, the data is unfolded to particle level, and a normalized differential t¯t production cross section as a function of τ32 is measured. The unfolding is performed with the iterative Bayesian unfolding method. The result is shown in Fig. 3.2 (left). Even though the different simulated predictions describe the data well within uncertainties, differences between them can be observed.

A different measurement was performed by the CMS collaboration [36]. It was carried out in the resolved case, which means that the top quark has only a low boost and every decay product can be reconstructed in a separate jet. To obtain a sample consisting mostly of t¯t events, the selection requires at least four jets with a distance parameter of R = 0.4. At least two b-tagged jets are required, and a W boson candidate is reconstructed from the non-b-tagged jets. Overlapping jets are vetoed, and the N-subjettiness is calculated for each jet from the respective jet constituents. The unfolded distribution is presented for an inclusive set of jets. The unfolding is performed with an unregularized unfolding procedure. The resulting distribution is shown in Fig. 3.2 (right). Because the measurement was performed in the resolved case, it is mainly driven by the particle multiplicity and is shifted to lower values of τ32, showing a poor agreement between data and MC.


Figure 3.2: Distribution of the normalized differential t¯t production cross section as a function of the N-subjettiness ratio τ32. The ATLAS measurement [35] in the boosted regime (left) and the CMS measurement [36] in the resolved regime (right).


4 Experiment

The data analyzed in this thesis were recorded by the Compact Muon Solenoid (CMS) detector at the Large Hadron Collider (LHC). This chapter describes the LHC with its pre-accelerators and experiments in Section 4.1 and the CMS detector with its different subsystems in Section 4.2.

4.1 The Large Hadron Collider

The LHC is a hadron collider with a circumference of 26.7 km operating at CERN (Conseil européen pour la recherche nucléaire). It is designed to collide protons at a center-of-mass energy of √s = 14 TeV in order to search for the Higgs boson and new physics and also to test the SM to very high precision. Before the particles are injected into the LHC and reach their final energy, they need to be pre-accelerated to an energy of 450 GeV by a chain of pre-accelerators as depicted in Fig. 4.1. Within the LHC the particles are accelerated with superconducting cavities. In order to keep the particles on their orbit, superconducting dipole magnets are used. Quadrupole magnets focus the beams, while higher-order magnets correct for higher-order effects to stabilize the beams. Each proton beam consists of bunches. The bunches are collided at four collision points located around the LHC. At these collision points detectors are situated to record the collisions. The ATLAS [38] and CMS [39] detectors are multi-purpose detectors for a variety of physics analyses, like searching for the Higgs boson, studying its properties or searching for new physics beyond the standard model. Two more specialized experiments are LHCb [40] and ALICE [41], which mainly focus on B meson physics and heavy ion collisions, respectively. An important parameter of a particle collider is its luminosity. It is a measure for the number of proton-proton collisions per area and time and is defined as

\[
L = N_b f_{\mathrm{rev}} \frac{n_1 n_2}{4\pi \sigma_x \sigma_y}, \tag{4.1}
\]

where Nb denotes the number of bunches in the accelerator, n1 and n2 the number of protons per bunch and frev the revolution frequency. The factor 4πσxσy describes the spread of the bunches perpendicular to their flight direction.

Figure 4.1: Sketch of CERN’s accelerator complex with the LHC and various smaller accelerators. Taken from [37].

In order to reach the design luminosity L = 10^34 cm^-2 s^-1 the LHC is operated with Nb = 2808 bunches per beam, n1 = n2 = 1.15 × 10^11 protons per bunch and a revolution frequency of frev = 11.25 kHz [42]. This design luminosity was already achieved in 2016. For an estimate of the number N of events of a specific process, the instantaneous luminosity integrated over time, Lint, has to be multiplied by the cross section σ

N = σLint. (4.2)

The total integrated luminosity delivered by the LHC at the CMS collision point in the years from 2010 to 2018 can be seen in Fig. 4.2.
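As a rough numerical cross-check of Eqs. (4.1) and (4.2), the following sketch evaluates the luminosity for the design parameters quoted above. The transverse beam size of 16.7 µm and the cross section of 1 pb are not taken from the text; they are assumed values chosen only for illustration, as are the function names.

```python
import math

def instantaneous_luminosity(n_bunches, n1, n2, f_rev_hz, sigma_x_cm, sigma_y_cm):
    """Eq. (4.1): luminosity in cm^-2 s^-1."""
    return n_bunches * f_rev_hz * n1 * n2 / (4.0 * math.pi * sigma_x_cm * sigma_y_cm)

def expected_events(cross_section_pb, integrated_luminosity_fb):
    """Eq. (4.2): N = sigma * L_int, using 1 fb^-1 = 1000 pb^-1."""
    return cross_section_pb * integrated_luminosity_fb * 1000.0

# ASSUMPTION: transverse beam size at the collision point (16.7 micrometres), not from the text.
SIGMA_BEAM_CM = 16.7e-4

lumi = instantaneous_luminosity(
    n_bunches=2808, n1=1.15e11, n2=1.15e11, f_rev_hz=11.25e3,
    sigma_x_cm=SIGMA_BEAM_CM, sigma_y_cm=SIGMA_BEAM_CM,
)

# ASSUMPTION: a hypothetical process with a cross section of 1 pb in 35.9 fb^-1 of data.
n_events = expected_events(cross_section_pb=1.0, integrated_luminosity_fb=35.9)

print(f"L = {lumi:.2e} cm^-2 s^-1, N = {n_events:.0f} events")
```

With the assumed beam size the result is of the order of the design luminosity of 10^34 cm^-2 s^-1.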


[Plot: CMS Integrated Luminosity Delivered, pp. Data included from 2010-03-30 11:22 to 2018-10-26 08:23 UTC. Total integrated luminosity (fb^-1) versus date (UTC) for 2010 (7 TeV, 45.0 pb^-1), 2011 (7 TeV, 6.1 fb^-1), 2012 (8 TeV, 23.3 fb^-1), 2015 (13 TeV, 4.2 fb^-1), 2016 (13 TeV, 41.0 fb^-1), 2017 (13 TeV, 49.8 fb^-1) and 2018 (13 TeV, 67.9 fb^-1).]

Figure 4.2: Total integrated luminosity delivered by the LHC to the CMS experiment. Taken from [43].

4.2 The Compact Muon Solenoid

The CMS detector is a multi-purpose detector located at one of the four collision points of the LHC. The detector has a total weight of 14000 t and a cylindrical shape with a diameter of 15 m and a length of 28.7 m. It is built as an onion-like structure of sub-detector systems around the beam axis. The detector can be divided into a central region, called the barrel region, and forward/backward regions which are covered by endcaps on each side. The structure of the CMS detector is shown in Fig. 4.3.

The tracker is the innermost layer and is built for a precise and efficient measurement of the trajectories of charged particles. The tracker is followed by the electromagnetic calorimeter, which was built with a focus on fine granularity to detect decays into two photons. This is important for the search for the Higgs boson, which can decay into two photons. The electromagnetic calorimeter uses crystals which return to their ground state fast enough to cope with the high collision rate. The hadronic calorimeter is the next layer and is important for the measurement of jets and missing transverse momentum. The solenoid creates a magnetic field of 3.8 T, which bends the trajectories of charged particles and enables the measurement of their momenta. The muon system comes last; its main purpose is the precise and robust measurement of muons. Embedded into the muon system is the return yoke for the magnetic field. In addition to the detector systems a trigger system is needed, because the collision rate exceeds the possible recording capacities. All sub-systems will be discussed in the following. Additional information can be found in [39].

[Figure labels: CMS detector. Total weight: 14,000 tonnes; overall diameter: 15.0 m; overall length: 28.7 m; magnetic field: 3.8 T. Steel return yoke: 12,500 tonnes. Silicon trackers: pixel (100 × 150 µm²), ~1.9 m², ~124M channels; microstrips (80–180 µm), ~200 m², ~9.6M channels. Superconducting solenoid: niobium titanium coil carrying ~18,000 A. Muon chambers: barrel: 250 drift tube, 480 resistive plate chambers; endcaps: 540 cathode strip, 576 resistive plate chambers. Preshower: silicon strips, ~16 m², ~137,000 channels. Forward calorimeter: steel + quartz fibres, ~2,000 channels. Crystal electromagnetic calorimeter (ECAL): ~76,000 scintillating PbWO4 crystals. Hadron calorimeter (HCAL): brass + plastic scintillator, ~7,000 channels.]

Figure 4.3: Full view of the CMS detector illustrating the different components of the detector. Taken from [44].

4.2.1 Coordinate System

For the description of positions in the CMS experiment a right-handed coordinate system is used. Its origin lies in the center of the CMS detector. The x-axis points toward the center of the LHC, the y-axis points upwards and the z-axis points in the direction of the beam. The azimuthal angle φ is defined as the angle measured in the x-y-plane from the x-axis. The radial coordinate in this plane is denoted by r. The polar angle θ is measured from the z-axis. Differences in θ are not invariant under Lorentz boosts along the z-axis. The pseudorapidity η, defined as

\[
\eta = -\ln\left[\tan\left(\frac{\theta}{2}\right)\right], \tag{4.3}
\]

whose differences are invariant under such boosts in the massless limit, can be used instead. The angular distance ∆R between two objects i and j is measured in the φ-η-plane

\[
\Delta R = \sqrt{(\Delta\phi)^2 + (\Delta\eta)^2} \tag{4.4}
\]

with ∆φ = φi − φj and ∆η = ηi − ηj. Since the momenta of the colliding partons are unknown, only the conservation of the total transverse momentum can be used. Therefore the transverse momentum pT is defined as

\[
p_T = \sqrt{p_x^2 + p_y^2}. \tag{4.5}
\]
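The kinematic quantities of Eqs. (4.3) to (4.5) translate directly into a few helper functions. This is a minimal sketch with hypothetical function names. Wrapping ∆φ into the interval [−π, π) is a practical detail assumed here and not spelled out in the text.

```python
import math

def pseudorapidity(theta):
    """Eq. (4.3): eta = -ln(tan(theta / 2)) for a polar angle theta in (0, pi)."""
    return -math.log(math.tan(theta / 2.0))

def delta_phi(phi_i, phi_j):
    """Azimuthal difference, wrapped into [-pi, pi) (assumed convention)."""
    return (phi_i - phi_j + math.pi) % (2.0 * math.pi) - math.pi

def delta_r(eta_i, phi_i, eta_j, phi_j):
    """Eq. (4.4): angular distance in the phi-eta plane."""
    return math.hypot(eta_i - eta_j, delta_phi(phi_i, phi_j))

def transverse_momentum(px, py):
    """Eq. (4.5): magnitude of the momentum component transverse to the beam."""
    return math.hypot(px, py)

# A particle emitted perpendicular to the beam has eta = 0 (up to rounding).
eta_central = pseudorapidity(math.pi / 2.0)

# Two objects on either side of the phi = +-pi boundary are close, not 6.2 rad apart.
dr_wrapped = delta_r(0.0, 3.1, 0.0, -3.1)
```

Without the wrapping, two objects on either side of the φ = ±π boundary would be assigned an artificially large distance.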

4.2.2 Magnet

A superconducting solenoid with a length of 12.5 m and an inner diameter of about 6 m surrounds the tracking system and the calorimeters. It provides a magnetic field of 3.8 T parallel to the z-axis inside the tracking system. The magnetic field bends the tracks of charged particles and allows the measurement of their momenta and charges from the curvature of the tracks. The magnetic field outside the solenoid is returned by iron yokes, which are interleaved with the muon system. Because the field is returned, it points in the opposite direction outside the solenoid, and muon tracks are bent in the opposite direction relative to the central part of the detector. This allows for a more precise reconstruction of the muon momentum. The return yoke weighs about 11400 t and therefore makes the largest contribution to the total weight of the experiment.
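The relation between track curvature and momentum can be made concrete with the standard expression pT [GeV] ≈ 0.3 · |q| · B [T] · R [m] for a particle of charge q (in units of the elementary charge) moving perpendicular to a uniform field. This formula is not given in the text; the sketch below is an illustration with hypothetical function names.

```python
# Conversion factor from c = 299792458 m/s: pT [GeV] = 0.299792458 * |q| * B [T] * R [m]
GEV_PER_TESLA_METRE = 0.299792458

def bending_radius_m(pt_gev, b_tesla=3.8, charge=1):
    """Radius of the circular track projection in the transverse plane."""
    return pt_gev / (GEV_PER_TESLA_METRE * abs(charge) * b_tesla)

def pt_from_radius_gev(radius_m, b_tesla=3.8, charge=1):
    """Inverse relation: transverse momentum from a measured bending radius."""
    return GEV_PER_TESLA_METRE * abs(charge) * b_tesla * radius_m

# Example: a singly charged particle with pT = 55 GeV in the 3.8 T field.
radius_55 = bending_radius_m(55.0)
print(f"bending radius for pT = 55 GeV: {radius_55:.1f} m")
```

The radius grows linearly with pT, so high-momentum tracks are nearly straight and their momentum resolution is limited by how precisely the small curvature can be measured.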

4.2.3 Tracking System

The tracking system is the innermost component of the detector; a schematic view is given in Fig. 4.4. It is installed closest to the interaction point and lies inside the magnetic field. It has a length of about 5.4 m, a radius of 1.1 m and covers a range of |η| < 2.5. Its purpose is to reconstruct the trajectories of electrically charged particles and to measure their momenta and charges by taking advantage of the Lorentz force, which bends the trajectories of charged particles in the magnetic field. The tracker is also used for identifying secondary vertices originating from B mesons, which is crucial for b tagging.

Closest to the beam pipe is the pixel detector, consisting of three layers in the barrel region at distances of 4.4 cm, 7.3 cm, and 10.2 cm from the beam axis. Two disks on each side cover the forward regions at z-distances of 34.5 cm and 46.5 cm. Each pixel has a size of 100 × 150 µm², leading to a resolution of 10 µm in the r-φ-plane and 20−40 µm in the z-direction. In 2017 the pixel detector was replaced and a fourth layer was added.

The silicon strip detector is built around the pixel detector. It is divided into four subsystems: the tracker inner barrel (TIB), the tracker inner disks (TID), the tracker outer barrel (TOB), and the tracker endcaps (TEC). The TIB has four layers consisting of strip sensors parallel to the z-axis. The TID are three disks with radially oriented strips in the endcaps. These two subsystems provide a resolution of 13−38 µm in the r-φ-plane and a coverage of |z| < 118 cm. The TOB has six layers of strip sensors covering a region of |z| < 118 cm with a resolution of 18−47 µm. The TEC consists of nine disks of silicon strips which cover the region 124 < |z| < 282 cm and has a resolution similar to that of the TOB.

Figure 4.4: Sketch of the tracking system and its different components. Taken from [39].

4.2.4 Calorimeters

The calorimeters are located around the tracking system. They consist of the electromagnetic calorimeter (ECAL), which is designed to absorb photons and electrons and measure their energy, and the hadronic calorimeter (HCAL), which absorbs strongly interacting particles and measures their energy. Both calorimeters are placed inside the solenoid, so that particles have to traverse less material before reaching them and thus lose less energy. This design therefore improves the energy resolution.

Electromagnetic Calorimeter

The ECAL is a homogeneous, hermetic calorimeter and covers the region |η| < 3.0. It can be divided into a barrel region, which covers |η| < 1.479 beginning at r = 129 cm, and two endcaps, which cover 1.479 < |η| < 3.0 starting at |z| = 314 cm. It uses lead tungstate (PbWO4) crystals as scintillator material. Incoming particles excite the crystals, which emit light when returning to their ground state; this light is detected by photodetectors. The lead tungstate crystals have a small radiation length of X0 = 0.89 cm and a small Molière radius of 2.2 cm. The crystals are furthermore radiation hard and emit around 80% of the scintillation photons within 25 ns, and are therefore fast enough to cope with the LHC collision rate. These material characteristics allow for a very compact construction of the ECAL while still having a high granularity. The barrel region consists of lead tungstate crystals with a front area of 22 × 22 mm² and a length of 230 mm, corresponding to 25.8 X0. The crystals in the two endcaps have a front area of 28.6 × 28.6 mm² and a length of 220 mm, corresponding to 24.7 X0. Furthermore, a preshower detector is stationed in front of each endcap to identify neutral pions. Each preshower detector consists of two layers of lead as passive material alternating with silicon strip sensors as active material. Both homogeneity and granularity result in an excellent energy resolution. The energy resolution of the ECAL [45] can be divided into three terms which are added in quadrature:

\[
\frac{\sigma_E}{E} = \frac{2.8\%}{\sqrt{E\,[\mathrm{GeV}]}} \oplus \frac{12\%}{E\,[\mathrm{GeV}]} \oplus 0.3\%. \tag{4.6}
\]

The first term describes the stochastic effects in the shower development within the ECAL; the second term is the so-called noise term, which covers the electronic noise. The last term describes calibration errors and non-uniform light collection.

Hadronic Calorimeter

The HCAL is built around the ECAL and aims to absorb strongly interacting particles and measure their energy. Because hadrons have a larger absorption length, they mostly traverse the ECAL and need to be stopped in the HCAL for a precise energy measurement. Therefore a non-homogeneous sampling calorimeter design is used. In a sampling calorimeter the active material is interspersed with absorber material. The absorber consists of brass plates, while the active material is a plastic scintillator with embedded wavelength shifters. Due to the high density of the absorber material, incoming hadrons start to shower in it; the absorber takes up part of the shower energy, and the shower particles are detected in the active material.

The HCAL consists of four parts. The Hadron Barrel (HB) covers a region of |η| < 1.3 and is split into two half-barrel regions. Each region consists of 18 identical wedges around the beam axis. These wedges consist of absorber and active material, oriented parallel to the beam axis. The first and last absorber layers consist of stainless steel to guarantee structural strength. The active material is divided into multiple towers, leading to a segmentation of 0.087 × 0.087 in the η-φ-plane. The Hadron Outer (HO) part is located outside the magnet and covers |η| < 1.26. Its aim is to absorb and measure the remaining hadrons which were not stopped in the HB and the magnet. The Hadron Endcaps (HE) cover the region 1.3 < |η| < 3.0. Their active material is also divided into towers. For smaller values of |η| the tower segmentation is ∆φ = 0.087 rad and ∆η = 0.087, while for higher values the segmentation becomes ∆φ = 0.175 rad and 0.09 < ∆η < 0.35. The Hadron Forward (HF) covers the most forward region of 3.0 < |η| < 5.0. Its absorber material is made of steel and the active material consists of quartz fibers, to withstand the high-energy particle flux. The segmentation of the towers is ∆φ = 0.175 rad and ∆η = 0.175.

On the one hand the use of a sampling calorimeter degrades the energy resolution due to the absorber material. On the other hand it increases the granularity due to the additional longitudinal segmentation. The energy resolution of the HCAL [46] is

\[
\frac{\sigma_E}{E} = \frac{115.3\%}{\sqrt{E\,[\mathrm{GeV}]}} \oplus 5.5\%. \tag{4.7}
\]

The first term describes the stochastic effects in the shower development within the HCAL and the second term describes calibration errors and non-uniform light collection.
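To get a feeling for the sizes of the individual terms in Eqs. (4.6) and (4.7), the sketch below evaluates both resolution formulas, with the ⊕ symbol implemented as a sum in quadrature. The function names and the example energy of 100 GeV are chosen for illustration only.

```python
import math

def quadrature_sum(*terms):
    """The 'oplus' operation: square root of the sum of squares."""
    return math.sqrt(sum(t * t for t in terms))

def ecal_relative_resolution_percent(energy_gev):
    """Eq. (4.6): stochastic, noise and constant term, in percent."""
    return quadrature_sum(2.8 / math.sqrt(energy_gev), 12.0 / energy_gev, 0.3)

def hcal_relative_resolution_percent(energy_gev):
    """Eq. (4.7): stochastic and constant term, in percent."""
    return quadrature_sum(115.3 / math.sqrt(energy_gev), 5.5)

ecal_100 = ecal_relative_resolution_percent(100.0)
hcal_100 = hcal_relative_resolution_percent(100.0)
print(f"E = 100 GeV: ECAL {ecal_100:.2f}%, HCAL {hcal_100:.1f}%")
```

At 100 GeV the ECAL value is already dominated by the constant term, while the HCAL value is still dominated by the stochastic term.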

4.2.5 Muon System

The last subsystem is the muon system, stationed at the outermost part of the detector. Due to their much higher mass compared to electrons and their typical momenta, muons pass through matter without significant energy loss. They are therefore considered minimum ionizing particles and are expected to traverse the calorimeters and produce a signal in the muon system, where their trajectories can be measured and the muons can be reconstructed. A well-performing muon system is of particular interest because muons often appear in important decay channels of the Higgs boson. In the muon system three types of gaseous detectors are used. In the barrel region, which covers |η| < 1.2, aluminum drift-tube chambers are used. The two inner layers are surrounded by gaseous resistive plate chambers (RPC), which aid in the identification of muons. The muon system is embedded in the iron yoke of the magnet, which provides a magnetic field of 2 T. In the endcap regions (ME), which cover 0.9 < |η| < 2.4, cathode strip chambers (CSC) are installed. The ME is divided into four layers of CSCs interleaved with RPCs. This design is chosen because of the high muon rate expected in the endcaps.

4.2.6 Trigger

At the LHC the proton bunches are brought to collision every 25 ns. At design luminosity, bunch crossings therefore occur with a frequency of 4 × 10^7 Hz. Because the processing capacity and the storage rate are limited, it is impossible to record every event. Therefore, a trigger system is installed which determines which events are stored, in order to reduce the total event rate to 1 kHz. This is done in two steps. In the first step the hardware-based level-1 trigger (L1) [47] uses the information of the muon system and the calorimeters and aims to accept events with high-energy jets, muons, or a significant energy imbalance. After L1 the event rate is reduced to about 50 kHz. Afterwards, an event has to pass the software-based High-Level Trigger (HLT) [48] to be stored. For this decision, the full granularity and the full resolution of the detector can be used. Based on this information, potentially interesting events are kept.
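The rate reduction achieved by the two trigger levels follows from simple arithmetic on the numbers quoted above. The sketch below is illustrative only; the function name and the returned dictionary keys are hypothetical.

```python
def trigger_reduction(bunch_spacing_s=25e-9, l1_rate_hz=50e3, hlt_rate_hz=1e3):
    """Rates and rejection factors for the two-level trigger, using the values from the text."""
    crossing_rate_hz = 1.0 / bunch_spacing_s  # 25 ns spacing corresponds to 4e7 Hz
    return {
        "crossing_rate_hz": crossing_rate_hz,
        "l1_reduction": crossing_rate_hz / l1_rate_hz,      # bunch crossings per L1 accept
        "hlt_reduction": l1_rate_hz / hlt_rate_hz,          # L1 accepts per stored event
        "total_reduction": crossing_rate_hz / hlt_rate_hz,  # bunch crossings per stored event
    }

rates = trigger_reduction()
for name, value in rates.items():
    print(f"{name}: {value:.3g}")
```

With these numbers, only about one in 800 bunch crossings passes L1 and one in 40000 is finally stored.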


5 Reconstruction and Identification of Objects

The CMS detector measures a variety of signals in its subsystems. They are reconstructed as particle candidates using the Particle Flow Algorithm, which is discussed in Section 5.2. In order to identify objects like jets, muons or electrons, different identification criteria are applied. These criteria are discussed in the section on the respective object: muons in Section 5.3, electrons in Section 5.4, hadron cascades, referred to as jets, in Section 5.5, and missing transverse momentum in Section 5.6.

5.1 Signature of Particles in the CMS Detector

In order to identify all particles originating from pp and heavy ion collisions, the measurements of each subsystem are used to assign a distinct signature to a particle candidate. Charged particles generate tracks in the silicon tracker, while uncharged particles traverse it without leaving a track. The bending radius of the path of a charged particle in the magnetic field allows the measurement of its momentum and charge. In the ECAL electrons and photons deposit all their energy and are stopped. Charged and neutral hadrons also deposit some energy inside the ECAL but mostly traverse it. They are then stopped in the HCAL, where they deposit their remaining energy. Muons create hits in the tracker but mostly traverse the ECAL and HCAL. They are, however, the only particles which leave tracks inside the muon chambers. Because the sign of the magnetic field in the muon chambers is flipped, the tracks are bent in the opposite direction, which can also be used for the reconstruction of the muons. Neutrinos only interact weakly and therefore leave no traces in the detector. The different signatures are visualized in Fig. 5.1.


Figure 5.1: Profile of the structure of the CMS detector. Depicted are the different tracks of the particles through the detector. Taken from [49].

5.2 Particle Flow Algorithm

The Particle Flow (PF) Algorithm [50] is used in the CMS experiment to combine information from all subsystems in order to reconstruct and identify the stable particles present in the recorded event. The algorithm creates a list of muon, electron, photon, neutral hadron, and charged hadron candidates. Due to the combination of the information from all subsystems, the energy and angular resolution of the reconstructed particles is greatly improved.

In the first step hits from the tracker and the muon system are assigned to tracks in an iterative procedure. Once a track has been found, its hits in the respective systems are removed from the next iteration. In the beginning of this procedure the criteria for a reconstructed track are very tight, but they get looser with each iteration. ECAL and HCAL cells with a local energy maximum above a certain threshold are used as seeds, around which all adjacent cells with an energy above a second threshold are clustered.

In the next step the information from the subsystems is linked together. Tracks are extrapolated from their last tracker hit into the calorimeters, to assign tracks and clusters to a particle. Bremsstrahlung photons emitted by electrons produce additional energy deposits in the ECAL. These clusters are linked to the corresponding electron tracks to account for the energy loss. If tracks from the tracker and the muon system can be linked within a certain χ² value of the fit, they are assigned to a global muon candidate. Elements of the sub-detectors which are linked together are referred to as blocks.

In the final step, these blocks are associated with certain particles. If the tracks from the tracker and the muon system can be matched and agree in their measured momentum, the global muon is classified as a PF muon. Information from the ECAL and the tracking system is used to identify PF electrons. The tracks and energy deposits of PF muons and electrons are removed from further consideration. All remaining tracks are considered PF charged hadrons. After all previous identification steps, the remaining energy clusters are associated with photons and neutral hadrons. Photons have single ECAL clusters, while hadrons have HCAL and ECAL clusters.
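The order in which blocks are turned into particle candidates in the final step can be summarized in a strongly simplified toy sketch. It is not the CMS implementation: the `Block` class, its boolean fields and the `classify` function are hypothetical, and each field stands in for a decision that the real algorithm derives from detailed track and cluster information.

```python
from dataclasses import dataclass

@dataclass
class Block:
    """Hypothetical summary of the linked elements in one PF block."""
    has_tracker_track: bool = False
    has_muon_track: bool = False
    momenta_agree: bool = False      # tracker and muon-system momenta are compatible
    electron_like: bool = False      # placeholder for the ECAL + track electron criteria
    has_ecal_cluster: bool = False
    has_hcal_cluster: bool = False

def classify(block):
    """Assign a particle type following the order described in the text."""
    if block.has_tracker_track and block.has_muon_track and block.momenta_agree:
        return "muon"
    if block.has_tracker_track and block.electron_like:
        return "electron"
    if block.has_tracker_track:
        return "charged hadron"      # all remaining tracks
    if block.has_ecal_cluster and not block.has_hcal_cluster:
        return "photon"              # ECAL cluster only
    if block.has_hcal_cluster:
        return "neutral hadron"      # HCAL cluster, with or without ECAL cluster
    return "unclassified"

examples = [
    Block(has_tracker_track=True, has_muon_track=True, momenta_agree=True),
    Block(has_tracker_track=True, electron_like=True, has_ecal_cluster=True),
    Block(has_tracker_track=True, has_ecal_cluster=True, has_hcal_cluster=True),
    Block(has_ecal_cluster=True),
    Block(has_ecal_cluster=True, has_hcal_cluster=True),
]
labels = [classify(b) for b in examples]
```

The ordering matters: muons and electrons are identified first, so that their tracks and clusters no longer compete when the remaining tracks and clusters are interpreted as hadrons and photons.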

5.3 Muon Identification

Muon candidates reconstructed as PF muons need to fulfill additional identification criteria to be considered as muons in this analysis. For this purpose three working points [51] are recommended by the CMS collaboration. These working points differ in their efficiencies and misidentification rates. In this analysis the tight working point is used, which has the lowest efficiency but returns the purest collection of muons. A muon candidate has to fulfill the following criteria:

• The candidate is reconstructed in the inner tracker and the muon system.

• The candidate is reconstructed as a PF muon.

• The track fit performed by PF returns χ²/Ndof < 10, where Ndof is the number of degrees of freedom.

• At least one muon chamber hit is included in the track fit.

• At least two muon chamber hits are matched to the candidate.

• Its tracker track has a transverse impact parameter of dxy < 2 mm with respect to the primary vertex.

• The longitudinal distance of the tracker track with respect to the primary vertex is dz < 5 mm.

• At least one hit in the pixel tracker is present.

• Hits in at least five different tracker layers are present.

If all these criteria are fulfilled and the muon candidate further has pT > 55 GeV and |η| < 2.4, it is stored and called a muon in this analysis.
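The selection above can be written as a single boolean function. The sketch below is an illustration and not code from the analysis: the `MuonCandidate` class and all field names are hypothetical, and taking the absolute value of the impact parameters is an assumption, since the text only quotes upper limits.

```python
from dataclasses import dataclass

@dataclass
class MuonCandidate:
    """Hypothetical container for the quantities used in the tight selection."""
    in_tracker_and_muon_system: bool
    is_pf_muon: bool
    chi2_per_ndof: float
    n_muon_hits_in_fit: int
    n_matched_muon_hits: int
    dxy_mm: float
    dz_mm: float
    n_pixel_hits: int
    n_tracker_layers: int
    pt_gev: float
    eta: float

def passes_muon_selection(mu):
    """Tight identification criteria plus the kinematic requirements of this analysis."""
    return (
        mu.in_tracker_and_muon_system
        and mu.is_pf_muon
        and mu.chi2_per_ndof < 10.0
        and mu.n_muon_hits_in_fit >= 1
        and mu.n_matched_muon_hits >= 2
        and abs(mu.dxy_mm) < 2.0       # ASSUMPTION: limit applied to the absolute value
        and abs(mu.dz_mm) < 5.0        # ASSUMPTION: limit applied to the absolute value
        and mu.n_pixel_hits >= 1
        and mu.n_tracker_layers >= 5
        and mu.pt_gev > 55.0
        and abs(mu.eta) < 2.4
    )

good = MuonCandidate(True, True, 1.2, 3, 4, 0.1, -0.5, 2, 10, 80.0, -1.1)
soft = MuonCandidate(True, True, 1.2, 3, 4, 0.1, -0.5, 2, 10, 30.0, -1.1)
```

Here `good` passes all requirements, while `soft` differs only in its transverse momentum of 30 GeV and is rejected by the pT > 55 GeV requirement.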


5.4 Electron Identification

Similar to the muon identification, electron candidates which have been reconstructed as PF electrons also have to fulfill additional requirements to be considered as electrons in this analysis. Again, the CMS collaboration recommends different working points [52]. This analysis uses a tight working point for electron candidates. Because the cuts depend on the absolute pseudorapidity of the cluster, only the variables that are considered are listed. Those variables are

• the shape of the shower in η−direction,

• the distances in η and φ between the track extrapolated to the ECAL and shower itself,

• the ratio of energy deposited in the ECAL and the HCAL matched to the electron candidate,

• the value of |1/E − 1/p| of the electron candidate, where E is calculated from calorimetry and p is taken from tracker information,

• the transverse impact parameter dxy w.r.t. the primary vertex,

• the longitudinal impact parameter dz w.r.t. the primary vertex, and

• the number of missing hits in the tracking system.

In addition to these requirements an electron candidate needs to have pT > 55 GeV and |η| < 2.4, and has to pass a veto against electrons originating from photon conversions (γ → e+e−) to be considered as an electron in this analysis.

5.5 Reconstruction of Jets

Because quarks and gluons cannot exist freely due to confinement, they hadronize into a cascade of color-neutral hadrons. In order to reconstruct the initial parton, all these hadrons need to be considered. This is done by jet clustering algorithms, which cluster the reconstructed PF particles into jets using the FastJet software package [53]. Two important requirements are placed on these algorithms: infrared and collinear safety. The jet algorithm presented in the following fulfills these requirements.

5.5.1 The anti-kT Jet Clustering Algorithm

Within the CMS collaboration, the anti-kT algorithm [54] is the standard jet algorithm. It provides mostly circular jet shapes. This algorithm takes a list of all reconstructed PF particles as input. The inputs are then called pseudojets. Then the distance diB between each pseudojet and the beam axis is calculated via

\[
d_{iB} = k_{T,i}^{2n} \tag{5.1}
\]

with kT,i referring to the transverse momentum of pseudojet i. The distance dij between pseudojets i and j is calculated via

\[
d_{ij} = \min\left(k_{T,i}^{2n}, k_{T,j}^{2n}\right) \frac{\Delta R_{ij}^2}{R^2} \tag{5.2}
\]

with ∆Rij the distance between pseudojet i and pseudojet j, and R a constant parameter which defines the radius of the resulting jet. If the smallest of all distances is a dij, pseudojets i and j are combined and the distances are re-calculated. If the smallest distance is instead a diB, pseudojet i is called a jet and is removed from the list of pseudojets. This procedure is repeated until all pseudojets are removed from the list. The order in which the pseudojets are combined is determined by the parameter n. The anti-kT algorithm uses n = −1 and clusters pseudojets with the highest transverse momentum first. The resulting jet shape is shown in Fig. 5.2. In this analysis, anti-kT jets with radius parameters of R0 = 0.4 and R0 = 0.8 are used. They are referred to as AK4 and AK8 jets, respectively. Furthermore, the four-vectors of leptons overlapping with a jet are subtracted from the four-vector of the jet.

Figure 5.2: Shape of the jets clustered with the anti-kT algorithm. Taken from [54].
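The clustering procedure of Eqs. (5.1) and (5.2) can be sketched in a few lines. This is a deliberately naive illustration and not the FastJet implementation used in the analysis. Several choices are assumptions made for this sketch: pseudojets are combined by adding their four-vectors, the distance ∆Rij is computed from rapidity and azimuth (which coincides with the η-φ distance for massless inputs), all inputs must have non-zero transverse momentum, and all function names are hypothetical.

```python
import math

def massless(pt, eta, phi):
    """Four-vector (px, py, pz, E) of a massless particle."""
    return (pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta), pt * math.cosh(eta))

def pt2(p):
    return p[0] ** 2 + p[1] ** 2

def rapidity(p):
    return 0.5 * math.log((p[3] + p[2]) / (p[3] - p[2]))

def delta_r2(a, b):
    dphi = (math.atan2(a[1], a[0]) - math.atan2(b[1], b[0]) + math.pi) % (2.0 * math.pi) - math.pi
    return (rapidity(a) - rapidity(b)) ** 2 + dphi ** 2

def cluster(particles, radius, n=-1):
    """Sequential recombination; n = -1 corresponds to the anti-kT algorithm."""
    pseudojets = [tuple(p) for p in particles]
    jets = []
    while pseudojets:
        best_d, best_i, best_j = None, None, None
        for i, a in enumerate(pseudojets):
            d_ib = pt2(a) ** n                           # Eq. (5.1): kT^(2n) = (pT^2)^n
            if best_d is None or d_ib < best_d:
                best_d, best_i, best_j = d_ib, i, None
            for j in range(i + 1, len(pseudojets)):
                b = pseudojets[j]
                d_ij = min(pt2(a) ** n, pt2(b) ** n) * delta_r2(a, b) / radius ** 2  # Eq. (5.2)
                if d_ij < best_d:
                    best_d, best_i, best_j = d_ij, i, j
        if best_j is None:
            jets.append(pseudojets.pop(best_i))          # beam distance is smallest: final jet
        else:
            b = pseudojets.pop(best_j)                   # pop j first (j > i) so index i stays valid
            a = pseudojets.pop(best_i)
            pseudojets.append(tuple(x + y for x, y in zip(a, b)))
    return sorted(jets, key=pt2, reverse=True)

# Toy event: a hard particle with two nearby softer ones, and one isolated particle.
event = [massless(100.0, 0.0, 0.0), massless(20.0, 0.1, 0.1),
         massless(1.0, 0.2, -0.1), massless(50.0, 0.0, 2.5)]
jets = cluster(event, radius=0.4)
jet_pts = [math.sqrt(pt2(j)) for j in jets]
```

In this toy event the two softer particles are absorbed by the nearby hard particle, and the isolated particle forms a jet of its own. The sketch recomputes all distances in every iteration, which is far slower than the optimized strategies of dedicated packages but keeps the logic visible.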

37 5.5. Reconstruction of Jets

5.5.2 Pileup Mitigation Techniques

The high collision rate of the LHC comes with some drawbacks. One of these is the high number of additional interactions, called pileup. Pileup interactions lead to additional particles in the event that influence the reconstruction of physical objects like jets. Here additional particles from pileup are clustered into the jet, leading to higher momentum, mass and a worse resolution of the jet. It further leads to additional jets, and a worse muon isolation. Two approaches to reduce the influence of pileup are described in the following.

Charged Hadron Subtraction

One approach to pileup mitigation is the Charged Hadron Subtraction (CHS) technique [55]. As the name suggests, this technique removes charged hadrons associated with pileup vertices from the event. Charged hadrons are identified in the PF algorithm as tracks with hits in the ECAL and HCAL. The primary vertex is chosen as the vertex with the highest sum of the transverse momenta of the tracks associated with it. All other vertices are classified as pileup vertices and are required to have Ndof > 4, where Ndof denotes the number of degrees of freedom in the vertex fit. Charged PF candidates can then be associated with either the primary vertex, a pileup vertex or no vertex. All charged PF candidates associated with a pileup vertex are removed from the input list of the jet clustering algorithms. Because this technique relies on tracker information, it works only for |η| < 2.5 and cannot be applied to neutral PF candidates. For this reason a jet-area-based correction is applied in addition.
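The filtering step of CHS can be sketched as follows. This is only an illustration under simplifying assumptions: the candidate representation (dictionaries with the hypothetical keys `charge` and `vertex`) is invented for this example, and the vertex association is taken as given.

```python
def charged_hadron_subtraction(candidates):
    """Drop charged candidates that are associated with a pileup vertex.

    Each candidate is a dict with the hypothetical keys
    'charge' (int) and 'vertex' ('primary', 'pileup' or None).
    """
    return [c for c in candidates
            if not (c["charge"] != 0 and c["vertex"] == "pileup")]

candidates = [
    {"pt": 30.0, "charge": +1, "vertex": "primary"},
    {"pt": 5.0, "charge": -1, "vertex": "pileup"},  # removed by CHS
    {"pt": 4.0, "charge": 0, "vertex": None},       # neutral: kept
    {"pt": 2.0, "charge": +1, "vertex": None},      # no vertex: kept
]
kept = charged_hadron_subtraction(candidates)
```

Neutral candidates and charged candidates without a vertex association pass the filter unchanged, which is why an additional correction for neutral pileup is needed.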

Pileup Per Particle Identification

Pileup Per Particle Identification (PUPPI) [56,57] is a more sophisticated technique to reduce the influence of pileup, not only for charged PF candidates but also for neutral PF candidates. In this approach a local shape α is calculated for each particle in the event, which is designed to distinguish between particles from the primary vertex and particles from a pileup vertex. In the CMS experiment this local shape α is based on the pT spectrum of the particles and on the distance between particles. In the next step, tracker information on charged PF candidates is used to calculate a weight for each PF candidate by comparing its α value to the mean and RMS of the α distribution of charged PF candidates from pileup. This weight ranges from zero to one, where a pileup candidate should get a weight of zero and a hard scattering candidate a weight of one. These weights are then used to rescale the four-momenta of the respective PF particles. The rescaled PF particles can then be used to reconstruct jets.
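A minimal sketch of the weight calculation is given below. It assumes the form used in the original PUPPI proposal, in which a signed χ² is built from the distance of α to the pileup mean in units of the RMS and the weight is the cumulative χ² distribution with one degree of freedom. The details of the CMS implementation may differ, and the function names are hypothetical.

```python
import math

def puppi_weight(alpha, alpha_pu_mean, alpha_pu_rms):
    """Weight in [0, 1] from comparing alpha to the pileup distribution."""
    if alpha <= alpha_pu_mean:
        return 0.0  # pileup-like: weight zero
    chi2 = ((alpha - alpha_pu_mean) / alpha_pu_rms) ** 2
    # Cumulative chi-square distribution for one degree of freedom
    return math.erf(math.sqrt(chi2 / 2.0))

def rescale(p4, weight):
    """Rescale a four-momentum (px, py, pz, E) by the PUPPI weight."""
    return tuple(weight * component for component in p4)

w_pileup_like = puppi_weight(1.0, 2.0, 1.0)
w_one_sigma = puppi_weight(3.0, 2.0, 1.0)
w_hard = puppi_weight(10.0, 2.0, 1.0)
```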


5.5.3 Jet Energy Corrections

The energy of jets found by the clustering algorithms has to be corrected for additional effects. These effects originate from different sources, e.g. non-linearities in the detector response or a disagreement between data and simulation due to the modeling. The four-momentum $p^{\mathrm{raw}}$ of the jet is corrected according to

\[ p^{\mathrm{corrected}} = C_{\mathrm{JEC}} \cdot p^{\mathrm{raw}} \tag{5.3} \]

with the correction factor

\[ C_{\mathrm{JEC}} = C_{\mathrm{offset}}(p_T^{\mathrm{raw}}) \cdot C_{\mathrm{MC}}(p_T', \eta) \cdot C_{\mathrm{rel}}(\eta) \cdot C_{\mathrm{abs}}(p_T''). \tag{5.4} \]

The first factor $C_{\mathrm{offset}}$ depends on the raw jet pT and corrects for noise effects from electronics and additional energy from pileup. This correction factor is applied only when the CHS technique was used on a jet, to correct for neutral PF candidates. The next factor $C_{\mathrm{MC}}$ corrects the jet momentum such that it becomes equal on average to the generated jet momentum. This correction is derived by calculating the response variable $R = p_T^{\mathrm{rec}} / p_T^{\mathrm{gen}}$ in different η and pT regions and is then applied to the reconstructed jet as $C_{\mathrm{MC}} = 1/\langle R \rangle$, with $\langle R \rangle$ being the average response. It depends on the jet momentum $p_T'$ after the offset correction. The third factor $C_{\mathrm{rel}}$ is η dependent and accounts for small relative differences between data and simulation. The last factor $C_{\mathrm{abs}}$ is derived from data in Z+jets and γ+jets events in order to obtain the absolute energy scale. This factor depends on the jet momentum $p_T''$ after all previous corrections. More details can be found in [58].
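The factorized structure of Eq. (5.4) can be sketched in a few lines. The correction functions passed in below are made-up constants for illustration only and are not real CMS corrections; the function names are hypothetical.

```python
def mc_correction(pt_rec, pt_gen):
    """C_MC = 1 / <R> with R = pT_rec / pT_gen for jets of one (eta, pT) region."""
    responses = [rec / gen for rec, gen in zip(pt_rec, pt_gen)]
    return len(responses) / sum(responses)

def apply_jec(pt_raw, eta, c_offset, c_mc, c_rel, c_abs):
    """Apply the factors of Eq. (5.4) in sequence and return the corrected pT."""
    pt1 = c_offset(pt_raw) * pt_raw  # pT after the offset correction (p_T')
    pt2 = c_mc(pt1, eta) * pt1       # pT after the simulation-based correction
    pt3 = c_rel(eta) * pt2           # pT after all previous corrections (p_T'')
    return c_abs(pt3) * pt3

# Toy inputs: two jets reconstructed at 90% of their generated pT.
c_mc_value = mc_correction([90.0, 180.0], [100.0, 200.0])
pt_corrected = apply_jec(
    100.0, 0.5,
    c_offset=lambda pt: 0.95,         # assumed constant, illustration only
    c_mc=lambda pt, eta: c_mc_value,
    c_rel=lambda eta: 1.01,           # assumed constant, illustration only
    c_abs=lambda pt: 0.99,            # assumed constant, illustration only
)
```

Each factor is evaluated at the momentum obtained after the preceding corrections, which is why the order of application matters.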

5.5.4 b tagging

In analyses which involve top quarks, the identification of bottom quarks is very useful, because the top quark decays almost exclusively into a b quark and a W boson. Due to the relatively long lifetime of the B meson, of the order of $10^{-12}$ s, it travels several millimeters in the detector before it decays. This leads to a displaced vertex, called secondary vertex, at the point where the B meson decayed. Owing to the high resolution of the CMS tracking system, primary and secondary vertices can be resolved. To identify jets originating from b quarks, the CMS Collaboration developed the Combined Secondary Vertex (CSV) algorithm [59] and its updated version CSVv2 [60]. This algorithm returns a discriminator in the interval [0, 1] that evaluates how b-quark-like a jet is, with 1 being the most b-quark-like. The discrimination is based on several input variables like the fraction of charged hadrons within the jet, the invariant mass, and the impact parameter of the reconstructed secondary vertex. The impact parameter is defined as the distance between two studied objects at their closest approach. Different working points [61] are recommended by the CMS Collaboration, which differ in the cut value applied to the CSV discriminator. In this analysis the medium working point is used, corresponding to a cut value of 0.8484.

5.6 Missing Transverse Momentum

The missing transverse momentum $p_T^{\mathrm{miss}}$ in an event is defined as the magnitude of the negative vectorial sum over the transverse momenta of all PF particles. The value is calculated as

\[ p_T^{\mathrm{miss}} = \left| \vec{p}_T^{\,\mathrm{miss}} \right| = \Big| -\sum_i \vec{p}_{T,i} \Big| \tag{5.5} \]

where the sum runs over all PF particles. Missing transverse momentum arises when particles leave the detector undetected or due to mismeasurements. In this analysis it is used as an estimate of the energy that is carried away by neutrinos in the leptonic W boson decay.
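Equation (5.5) translates directly into a short calculation. The sketch below assumes that each PF particle is given by its transverse momentum and azimuthal angle; the function name and the toy event are hypothetical.

```python
import math

def missing_pt(particles):
    """Magnitude of the negative vector sum of transverse momenta.

    particles: iterable of (pt, phi) pairs.
    """
    sum_px = sum(pt * math.cos(phi) for pt, phi in particles)
    sum_py = sum(pt * math.sin(phi) for pt, phi in particles)
    return math.hypot(-sum_px, -sum_py)

# Two back-to-back particles with 50 GeV and 30 GeV leave 20 GeV unbalanced.
met_unbalanced = missing_pt([(50.0, 0.0), (30.0, math.pi)])
# A perfectly balanced event has no missing transverse momentum.
met_balanced = missing_pt([(40.0, 0.0), (40.0, math.pi)])
```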

6 Analysis

In searches for new heavy particles, the top quark plays an important role due to its high mass. The high masses of hypothetical new particles lead to boosted top quark decays, where all decay products of the top quark can be reconstructed within a single large jet. To identify those jets, top tagging algorithms are needed. Those algorithms use information on the jet substructure, such as the N-subjettiness ratio τ3/τ2, further denoted as τ32. It is therefore important to have a good understanding of these variables and a good description by simulations. Substructure measurements were performed in the boosted regime in ATLAS and in the resolved regime in CMS, as described in Chapter 3. The analysis presented here aims at a measurement of the differential t¯t production cross section as a function of τ32 in the boosted regime with the CMS experiment. The data and simulations used for this analysis are described in Section 6.1. The strategy of the analysis is explained in Section 6.2. Studies on particle level and the resulting selection criteria for boosted top quark decays are described in Section 6.3. In Section 6.4 further studies on the detector level to reduce background processes are discussed. To account for detector effects on data, a regularized unfolding is performed with the TUnfold [62] software package. The unfolding technique, the filling of a migration matrix and various validation checks for the unfolding procedure are described in Section 6.5. The uncertainties considered in the unfolding procedure are discussed in Section 6.6. The chapter closes with the result of the unfolding of data in Section 6.7.

6.1 Data Sets and Event Simulation

For the interpretation of the data, analyses in particle physics often compare data to simulated events. Simulations provide the theoretical prediction of the Standard Model. In this section the dataset and the MC simulation samples used in the analysis are discussed.


6.1.1 Data Sets

For the presented analysis, data recorded with the CMS detector in the year 2016 at a center-of-mass energy of 13 TeV are used. The size of the dataset corresponds to an integrated luminosity of 37.76 fb−1. Because the data have to fulfill various requirements on the detector conditions, some runs are excluded and the integrated luminosity is reduced to 35.87 fb−1.

6.1.2 Monte Carlo Samples

For an understanding of the Standard Model background composition, MC simulations are used to describe the data. The generators used were already introduced in Section 2.5. All simulations that provide an estimate of the outcome of a collision as seen in the detector are referred to as reconstruction level or detector level. Simulations providing information on physical processes after the hadronization but without detector effects are referred to as particle level or generator level. The most important simulations for this analysis are the t¯t samples. These samples are used as input for the unfolding procedure and for the estimation of systematic uncertainties. All processes with a final state similar to the t¯t lepton+jets final state are included as background. Those processes are single top production, W+jets, Z+jets, diboson production, and multijet events, referred to as QCD events. The dominant backgrounds are expected to be W+jets and single top production, because both produce exactly one lepton and additional jets in the final state. QCD events are reduced by selecting a high pT lepton, because mostly leptons with low transverse momenta are produced in these events. Z+jets is reduced by vetoing additional leptons. Diboson events have a very low cross section and are therefore mostly negligible, even though WW and WZ production can lead to final states similar to t¯t. A summary of all t¯t and background samples can be found in Table 6.1 and Table 6.2, respectively. Each table contains information on the process in the first column and the sample in the second column. The third column states the generator used for a sample. The fourth column shows the production cross section used to simulate events and the last column shows the number of events.
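The cross section and the number of events listed for each sample are typically combined into a per-event weight that normalizes the simulation to the integrated luminosity of the data. The sketch below shows this standard normalization as an illustration; it assumes unit generator weights (generators at next-to-leading order may produce negative weights, which are ignored here), and the function name is hypothetical.

```python
def lumi_weight(cross_section_pb, n_generated, lumi_fb=35.87):
    """Per-event weight sigma * L / N for a simulated sample."""
    lumi_pb = lumi_fb * 1000.0  # 1 fb^-1 = 1000 pb^-1
    return cross_section_pb * lumi_pb / n_generated

# Example with the first t-tbar sample of Table 6.1:
# 831.76 pb and 76 738 314 generated events.
w_ttbar = lumi_weight(831.76, 76_738_314)
```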


Table 6.1: Summary of the t¯t samples used in the presented analysis. The process name is shown in the first column. The second column shows the sample of the process. The generator used for these samples is displayed in the third column. The calculated cross section and the number of events are presented in the fourth and fifth column, respectively.

Process  Sample                  MC Generator  Cross Section [pb]  Number of Events
t¯t      0 < Mt¯t < 700 GeV      POWHEG        831.76              76 738 314
t¯t      700 < Mt¯t < 1000 GeV   POWHEG        76.605              38 436 000
t¯t      1000 < Mt¯t < ∞ GeV     POWHEG        20.578              24 569 457
t¯t      0 < Mt¯t < ∞ GeV        MADGRAPH      831.76              10 199 051
t¯t      mt = 171.5 GeV          POWHEG        831.76              19 607 502
t¯t      mt = 173.5 GeV          POWHEG        831.76              19 420 550
t¯t      FSR up                  POWHEG        831.76              59 133 437
t¯t      FSR down                POWHEG        831.76              59 150 176
t¯t      ISR up                  POWHEG        831.76              58 977 100
t¯t      ISR down                POWHEG        831.76              58 420 151
t¯t      hdamp up                POWHEG        831.76              58 664 984
t¯t      hdamp down              POWHEG        831.76              58 305 373


Table 6.2: Summary of the background samples used in the presented analysis. The process name is shown in the first column. The second column shows the process sample. The generator used for these samples is displayed in the third column. The calculated cross section and the number of events are presented in the fourth and fifth column, respectively. The QCD samples are binned in pˆT, the transverse momentum transfer for the 2 → 2 process. The HT variable is the scalar sum over the transverse momenta of electrons, muons, jets and missing pT.

Process                  Sample                   MC Generator  Cross Section [pb]  Number of Events
QCD (muon enriched)      15 < pˆT < 20 GeV        POWHEG        3819570             4 141 251
QCD (muon enriched)      20 < pˆT < 30 GeV        POWHEG        2960198.4           31 878 740
QCD (muon enriched)      30 < pˆT < 50 GeV        POWHEG        1652471.46          29 809 492
QCD (muon enriched)      50 < pˆT < 80 GeV        POWHEG        437504.1            19 662 175
QCD (muon enriched)      80 < pˆT < 120 GeV       POWHEG        106033.6648         23 560 662
QCD (muon enriched)      120 < pˆT < 170 GeV      POWHEG        25190.5151          19 809 962
QCD (muon enriched)      170 < pˆT < 300 GeV      POWHEG        8654.4932           17 350 231
QCD (muon enriched)      300 < pˆT < 470 GeV      POWHEG        797.3527            45 961 426
QCD (muon enriched)      470 < pˆT < 600 GeV      POWHEG        79.0255             19 489 276
QCD (muon enriched)      600 < pˆT < 800 GeV      POWHEG        25.0951             19 909 529
QCD (muon enriched)      800 < pˆT < 1000 GeV     POWHEG        4.7074              19 940 747
QCD (muon enriched)      1000 < pˆT < ∞ GeV       POWHEG        1.6213              3 940 447
QCD (electron enriched)  20 < pˆT < 30 GeV        POWHEG        5352960             9 241 500
QCD (electron enriched)  30 < pˆT < 50 GeV        POWHEG        9928000             11 508 842
QCD (electron enriched)  50 < pˆT < 80 GeV        POWHEG        2890800             45 789 059
QCD (electron enriched)  80 < pˆT < 120 GeV       POWHEG        422800              77 800 204
QCD (electron enriched)  120 < pˆT < 170 GeV      POWHEG        77274               74 862 552
QCD (electron enriched)  170 < pˆT < 300 GeV      POWHEG        18810               11 540 163
QCD (electron enriched)  300 < pˆT < ∞ GeV        POWHEG        1350                6 831 522
QCD (b/c to electron)    15 < pˆT < 20 GeV        POWHEG        254596              2 685 602
QCD (b/c to electron)    20 < pˆT < 30 GeV        POWHEG        328999.93           10 987 947
QCD (b/c to electron)    30 < pˆT < 80 GeV        POWHEG        405623.4            15 342 783
QCD (b/c to electron)    80 < pˆT < 170 GeV       POWHEG        38104.43            14 851 987
QCD (b/c to electron)    170 < pˆT < 250 GeV      POWHEG        2635.81332          9 811 991
QCD (b/c to electron)    250 < pˆT < ∞ GeV        POWHEG        711.925875          5 284 762
Z+jets                   70 < HT < 100 GeV        POWHEG        208.977             9 691 660
Z+jets                   100 < HT < 200 GeV       POWHEG        181.302             10 977 326
Z+jets                   200 < HT < 400 GeV       POWHEG        50.4177             9 589 193
Z+jets                   400 < HT < 600 GeV       POWHEG        6.98394             9 725 661
Z+jets                   600 < HT < 800 GeV       POWHEG        1.68141             8 253 178
Z+jets                   800 < HT < 1200 GeV      POWHEG        0.775392            2 673 066
Z+jets                   1200 < HT < 2500 GeV     POWHEG        0.186222            596 079
Z+jets                   2500 < HT < ∞ GeV        POWHEG        0.00438495          399 492
Diboson                  WW                       POWHEG        76.605              38 436 000
Diboson                  WZ                       POWHEG        76.605              38 436 000
Diboson                  ZZ                       POWHEG        76.605              38 436 000
W+jets                   70 < HT < 100 GeV        MADGRAPH      1319                10 020 533
W+jets                   100 < HT < 200 GeV       MADGRAPH      1345                39 449 178
W+jets                   200 < HT < 400 GeV       MADGRAPH      359.7               19 069 732
W+jets                   400 < HT < 600 GeV       MADGRAPH      48.91               7 759 701
W+jets                   600 < HT < 800 GeV       MADGRAPH      12.05               18 687 480
W+jets                   800 < HT < 1200 GeV      MADGRAPH      5.501               7 830 536
W+jets                   1200 < HT < 2500 GeV     MADGRAPH      1.329               6 872 441
W+jets                   2500 < HT < ∞ GeV        MADGRAPH      0.0322              2 637 821
Single Top               t-channel (anti-top)     POWHEG        80.95               38 619 669
Single Top               t-channel (top)          POWHEG        136.02              64 996 300
Single Top               s-channel                aMC@NLO       3.36432             33 033 622
Single Top               tW (anti-top)            POWHEG        19.5741             8 657 573
Single Top               tW (top)                 POWHEG        19.5741             8 657 554


6.2 Analysis Strategy

The analysis presented here aims at a measurement of the differential t¯t production cross section as a function of the N-subjettiness ratio τ32 in boosted top quark decays at the particle level. The measurement is performed in the ℓ+jets channel to suppress background events. Here, ℓ denotes either an electron or a muon. The measurement is performed on the jet originating from the hadronically decaying top quark. At first, a phase space has to be defined on particle level. This phase space should contain a high fraction of top quark decays where all decay products can be found inside a single jet. In the next step, additional selection criteria on detector level have to be found which select a large number of t¯t events and simultaneously reject background events. To avoid a bias on the hadronic jet, these selection criteria are only applied to the leptonic jet. Before the unfolding is performed, background events are subtracted from the τ32 distribution on detector level. After validation tests, the unfolding procedure is applied to data. From the resulting distributions the differential t¯t production cross section is determined and compared to predictions of various simulations.

6.3 Studies on Particle Level

The first step of the presented analysis is finding a proper phase space definition for the measurement in the boosted regime. The selection used for this phase space should provide a sufficient number of events for a measurement and a high fraction of jets originating from top quark decays with all decay products inside the jet. The selection criteria are chosen to be similar to those of previous studies [33,34,63,64] in the boosted regime. In the following, the leading jet in pT is referred to as first jet j1, and the second leading jet in pT is referred to as second jet j2. Only AK8 jets with pT > 170 GeV and |η| < 2.4 are considered. For the measurement phase space each event has to fulfill the following requirements:

• exactly one muon or electron ℓ from the leptonic top quark decay with pT > 55 GeV and |η| < 2.4,

• exactly two AK8 jets with pT,1 > 400 GeV and pT,2 > 200 GeV,

• ∆R(ℓ, j2) < 0.8, and

• Mj1 > Mj2+ℓ.
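The requirements listed above can be summarized in a short sketch. It is an illustration only: the object representation (dictionaries with the hypothetical keys `pt`, `eta`, `phi` and `mass`) is invented for this example, and it is assumed that the lepton and jet counts refer to objects passing the stated kinematic thresholds.

```python
import math

def p4(obj):
    """(px, py, pz, E) from pt, eta, phi and mass."""
    px = obj["pt"] * math.cos(obj["phi"])
    py = obj["pt"] * math.sin(obj["phi"])
    pz = obj["pt"] * math.sinh(obj["eta"])
    return px, py, pz, math.sqrt(px**2 + py**2 + pz**2 + obj["mass"]**2)

def inv_mass(*objs):
    """Invariant mass of the summed four-vectors."""
    px, py, pz, e = (sum(c) for c in zip(*(p4(o) for o in objs)))
    return math.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))

def delta_r(a, b):
    dphi = abs(a["phi"] - b["phi"])
    if dphi > math.pi:
        dphi = 2 * math.pi - dphi
    return math.hypot(a["eta"] - b["eta"], dphi)

def passes_selection(leptons, ak8_jets):
    leptons = [l for l in leptons if l["pt"] > 55 and abs(l["eta"]) < 2.4]
    jets = sorted((j for j in ak8_jets if j["pt"] > 170 and abs(j["eta"]) < 2.4),
                  key=lambda j: j["pt"], reverse=True)
    if len(leptons) != 1 or len(jets) != 2:
        return False
    lep, (j1, j2) = leptons[0], jets
    return (j1["pt"] > 400 and j2["pt"] > 200
            and delta_r(lep, j2) < 0.8
            and j1["mass"] > inv_mass(j2, lep))

# Toy event with made-up values
lepton = {"pt": 80.0, "eta": 0.1, "phi": 3.0, "mass": 0.0}
jet1 = {"pt": 500.0, "eta": 0.0, "phi": 0.0, "mass": 175.0}
jet2 = {"pt": 300.0, "eta": 0.0, "phi": 3.1, "mass": 30.0}
selected = passes_selection([lepton], [jet1, jet2])
```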

Figure 6.1: Distribution of the N-subjettiness ratio τ32 of the first AK8 jet in t¯t simulation on generator level after the selection. The gray area shows the total number of t¯t events, the green line shows the fully merged fraction of t¯t events and the red line displays the not merged fraction of t¯t events.

Because this measurement is done in the ℓ+jets channel, exactly one lepton from the leptonic top quark decay is required. Depending on the electron or muon channel, the respective lepton is chosen. The high pT thresholds of both AK8 jets are chosen to select boosted top quark decays. Exactly two AK8 jets are required so that only events with one jet for each top quark candidate are selected.

The selection on ∆R(ℓ, j2) ensures that the leptonically decaying top quark is boosted, because the lepton is reconstructed inside the jet. Events where not all decay products can be reconstructed in one jet are suppressed by the last requirement, the mass criterion. This criterion should hold for boosted topologies, because the jet from the hadronically decaying top quark includes the full top quark decay, while the jet of the leptonically decaying top quark only includes the b quark and the lepton but misses the neutrino, which cannot be detected in the CMS detector. Therefore the mass of the second AK8 jet, combined with the lepton, is smaller.

The distribution of τ32 of simulated t¯t events after this selection is shown in Fig. 6.1. The plot shows the t¯t distribution (gray) with its fully merged (green) and not merged (red) fractions. Events where every decay product from the hadronic decay can be reconstructed inside the selected jet are categorized as fully merged. All other cases are categorized as not merged. The fraction of fully merged events tends to lower values, while the not merged fraction tends to higher values. This behavior is expected, because the fully merged fraction contains three decay products, while the not merged fraction contains only two or even only one decay product. Therefore the fully merged jet is more likely to have three subjets. The final distribution of τ32 will be used as prediction from POWHEG for the unfolding procedure.

Figure 6.2: Event count (left) and fully merged fraction (right) as a function of the pT threshold of the first AK8 jet in t¯t simulation.

As already mentioned, a high threshold on the pT of the first jet has to be chosen to select boosted topologies. The value was chosen based on a study shown in Fig. 6.2, where the number of t¯t events and the fraction of fully merged t¯t events are shown as a function of the pT threshold of the first jet. While the fully merged fraction increases with the pT threshold of the first AK8 jet until it reaches a plateau of 0.65 at 600 GeV, the number of t¯t events decreases rapidly. The value of pT = 400 GeV for the first jet was chosen as a compromise between a high fully merged fraction of 55% and retaining a sufficient number of t¯t events.
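The threshold study can be sketched as a simple scan over candidate thresholds. The event list below is a made-up toy input and the function name is hypothetical; the sketch only illustrates how the event count and the fully merged fraction are obtained for each threshold.

```python
def scan_thresholds(events, thresholds):
    """Event count and fully merged fraction for each pT threshold.

    events: list of (leading_jet_pt, is_fully_merged) pairs.
    """
    result = {}
    for threshold in thresholds:
        passed = [merged for pt, merged in events if pt > threshold]
        n_events = len(passed)
        fraction = sum(passed) / n_events if n_events else float("nan")
        result[threshold] = (n_events, fraction)
    return result

# Toy events: (leading jet pT in GeV, fully merged flag)
toy_events = [(250.0, False), (420.0, True), (450.0, False), (650.0, True)]
scan = scan_thresholds(toy_events, [200.0, 400.0, 600.0])
```

Raising the threshold removes events but increases the share of fully merged decays, which is the trade-off behind the chosen value.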

6.4 Studies on Reconstruction Level

The same selection criteria as at the particle level are required on reconstruction level. Additional requirements are set in order to reduce the contribution from background processes. Both the electron and the muon channel follow a similar selection. An event has to pass the following requirements:

• single muon trigger as a combination of "HLT_Mu50_v*" and "HLT_TkMu50_v*", in the muon channel,

• single electron trigger as a combination of "HLT_Ele27_WPTight_Gsf_v*" and "HLT_Ele115_CaloIdVT_GsfTrkIdT_v*", in the electron channel,

• exactly one tight lepton with pT > 55 GeV and |η| < 2.4,

• veto on additional leptons,

• two-dimensional lepton isolation criterion: ∆R(lepton, next AK4 jet) > 0.4 or $p_T^{\mathrm{rel}}$(lepton, next AK4 jet) > 40 GeV ¹,

• $p_T^{\mathrm{miss}}$ > 50 GeV, and

• at least one medium b tag.
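The two-dimensional lepton isolation criterion from the list above can be sketched as follows. The relative transverse momentum is computed as the magnitude of the cross product of the two momentum vectors divided by the magnitude of the jet momentum, following the definition given in this section. The function names and the toy vectors are hypothetical.

```python
import math

def three_vector(pt, eta, phi):
    """Momentum three-vector (px, py, pz) from pt, eta and phi."""
    return (pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta))

def pt_rel(a, b):
    """|a x b| / |b| for two three-vectors a (lepton) and b (jet)."""
    ax, ay, az = a
    bx, by, bz = b
    cx = ay * bz - az * by
    cy = az * bx - ax * bz
    cz = ax * by - ay * bx
    return math.sqrt(cx**2 + cy**2 + cz**2) / math.sqrt(bx**2 + by**2 + bz**2)

def passes_2d_isolation(delta_r, pt_rel_value, min_delta_r=0.4, min_pt_rel=40.0):
    """Lepton is kept if it is far from the jet OR has a large pT_rel."""
    return delta_r > min_delta_r or pt_rel_value > min_pt_rel

# Toy example: a 60 GeV lepton perpendicular to a 100 GeV jet
rel = pt_rel((60.0, 0.0, 0.0), (0.0, 100.0, 0.0))
```

Because the two conditions are linked with a logical "or", only leptons that are both close to a jet and nearly collinear with it are rejected.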

Each selection criterion is discussed in the following. The single muon triggers are combined with a logical "or" and require the muon to have $p_T^{\mu} > 50$ GeV. It is recommended by the CMS Collaboration [51] to cut on $p_T^{\mu} > 53$ GeV for this trigger, to reach the plateau of the trigger efficiency. This analysis uses a cut of $p_T^{\mu} > 55$ GeV. To increase the number of events in the electron channel, the single electron triggers are used differently depending on the transverse momentum of the electron. If the electron has $p_T^{e} < 120$ GeV, only the requirements of the first trigger have to be met and an additional isolation criterion has to be fulfilled. For electrons with $p_T^{e} > 120$ GeV the first and second trigger are combined with a logical "or". For the single electron and single muon triggers, scale factors are applied to simulation in order to correct for differences in the trigger efficiency between data and MC [33,34,63,64].

Only one lepton is expected in the ℓ+jets channel of the t¯t decay, therefore a veto on additional leptons is used. This cut reduces the contribution of diboson and Z+jets events.

The next selection criterion is a two-dimensional lepton isolation [33,34,63,64]. This cut is used differently for the electron and muon channel. For the electron channel the two-dimensional isolation is only applied for electrons with $p_T^{e} > 120$ GeV that meet the requirements of both single electron triggers linked with a logical "or". Electrons with a lower transverse momentum already require an isolation criterion to pass the first single electron trigger and are therefore not required to pass the two-dimensional isolation. In the muon channel the two-dimensional lepton isolation is applied to muons that meet the requirements of the single muon triggers linked with a logical "or". These isolation cuts remove the majority of the QCD background. The two-dimensional cut is illustrated in Fig. 6.3 for simulated QCD events (right) and simulated t¯t events (left). Both plots are scaled to unity to allow a better comparison between the two processes. The figures show that this isolation criterion removes the majority of QCD events, while keeping a high fraction of t¯t events. The applied cut is marked by a red rectangle.

¹ $p_T^{\mathrm{rel}}(a, b) = |\vec{p}_a \times \vec{p}_b| \,/\, |\vec{p}_b|$

Figure 6.3: The ∆R-$p_T^{\mathrm{rel}}$ plane for t¯t events (left) and QCD events (right). The area which is cut out by the two-dimensional lepton isolation criterion is marked with a rectangle. Both figures are scaled to unity.

Both the $p_T^{\mathrm{miss}}$ and b tag requirements reduce the background contribution of W+jets and QCD. For b tagging a scale factor is applied that corrects for differences in tagging efficiencies between data and MC. Figure 6.4 shows the number of medium b tagged jets in the event before the selection criterion. Mainly W+jets and QCD events are rejected.

After those selection criteria the resulting distributions are shown in Fig. 6.5. The left column shows the muon channel and the right column shows the electron channel. The fraction of t¯t events after those selections is 67%, which shows the good background suppression of the selection. Furthermore, a trend in the distribution of pT of the first jet can be seen in the data to MC ratio, which is also present in the other distributions. This trend arises because the top quark pT spectrum is softer in data than in simulation. This effect has been observed in several measurements of the t¯t cross section [33,34,65–68]. On top of this baseline selection, the following criteria have to be fulfilled:

• $p_T^{\ell}$ > 55 GeV and |η| < 2.4,

• exactly two AK8 jets with pT,1 > 400 GeV and pT,2 > 200 GeV,

• ∆R(ℓ, j2) < 0.8, and

• Mj1 > Mj2+ℓ.

These are the same selection criteria that were defined at the particle level. The resulting final distributions are shown for the muon channel in Fig. 6.6 and for the electron channel in Fig. 6.7. Because of the softer top quark pT spectrum in data, the t¯t samples have to be scaled by a constant factor of 0.77 to match the total number of events in data and simulation. This scale factor is only applied to obtain the figures showing the data to simulation comparison and is not used in the unfolding procedure. After the scale factor is applied, data and simulation show good agreement within uncertainties.

Figure 6.4: Distribution of the number of b tagged jets using the medium working point, before the selection on the number of b tagged jets.

The remaining background processes are mainly single top and W+jets processes. Because QCD events contribute less than 2%, they are negligible and are not shown in the figures. The fraction of t¯t events after the full selection is 85.12%. The plot of τ32 shows a good separation between background processes and t¯t processes. Background processes are mainly present at high values, while fully merged t¯t events have low values.

For the unfolding procedure the total τ32 distribution is split into a low jet mass region with mj1 < 152 GeV and a high jet mass region with mj1 > 152 GeV to separate fully merged t¯t events from not merged t¯t events. These regions are shown in Fig. 6.8. The low mass region tends to higher values of τ32 due to not merged events, while the high mass region shows more events at lower values of τ32.

Figure 6.5: Distributions after the baseline selection criteria for the muon channel (left) and the electron channel (right). The first row shows the pT of the first AK8 jet. The pT of the respective lepton and $p_T^{\mathrm{miss}}$ are shown in the second and third row, respectively. The gray area shows the statistical uncertainty on MC.

Figure 6.6: Final distributions of the muon channel. In the top row the mass of the first AK8 jet is shown on the left and the N-subjettiness ratio τ32 on the right. The pT distribution of the first AK8 jet and the $p_T^{\mu}$ of the muon are shown on the middle left and middle right, respectively. Displayed on the bottom left is $p_T^{\mathrm{miss}}$ and on the bottom right the number of medium b tagged jets. The gray area shows the statistical uncertainty on MC. The t¯t sample is scaled with a factor of 0.77.

Figure 6.7: Final distributions of the electron channel. In the top row the mass of the first AK8 jet is shown on the left and the N-subjettiness ratio τ32 on the right. The pT distribution of the first AK8 jet and the $p_T^{e}$ of the electron are shown on the middle left and middle right, respectively. Displayed on the bottom left is $p_T^{\mathrm{miss}}$ and on the bottom right the number of medium b tagged jets. The gray area shows the statistical uncertainty on MC. The t¯t sample is scaled with a factor of 0.77.

Figure 6.8: Distribution of the N-subjettiness ratio τ32 for different jet mass regions. In the top row are events with a jet mass of mj1 < 152 GeV and in the bottom row are events with mj1 > 152 GeV. On the left is the muon channel and on the right the electron channel. The gray area shows the statistical uncertainty on MC. The t¯t sample is scaled with a factor of 0.77.


6.5 Unfolding

At the LHC, most measurements compare measured distributions with simulations and sort the events into regions of phase space, called bins. The measured distributions are the result of a folding of the true distribution with detector effects, which cause migrations between bins. These migrations can be determined in MC by comparing the particle level with the detector level. Together with the measured distribution, they can be used to obtain an estimate of the true distribution, which is the quantity of interest. This unfolding problem can be written as

$$\tilde{y}_i = \sum_{j=1}^{m} A_{ij}\,\tilde{x}_j, \qquad 1 \le i \le n, \tag{6.1}$$

with $m$ the number of bins of the true distribution, $n$ the number of bins of the measured distribution, $\tilde{y}_i$ the average expected event count on detector level, $A_{ij}$ the matrix element describing the migration from bin $j$ of the true distribution to bin $i$ of the measured distribution, and $\tilde{x}_j$ the average of the true distribution. A schematic view of this problem is shown in Fig. 6.9. One is interested in the distribution $x$ rather than in its statistical mean $\tilde{x}$. Naively, one could replace $\tilde{y}_i \to y_i$ and $\tilde{x}_j \to x_j$ and solve for $x_j$ by inverting the matrix $A$. It turns out, however, that small statistical fluctuations of $y$ would result in large fluctuations of $x$. To damp this effect, a regularization is used. The TUnfold software package [62] provides a regularized unfolding approach, which is described in the following.
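The instability of the naive inversion can be illustrated with a toy example. The sketch below is not part of the analysis: the 2×2 response matrix and all event counts are invented, and the helper names `invert_2x2` and `apply` are hypothetical. It folds a flat true distribution, then inverts the response matrix once for the expected counts and once for counts that fluctuate by 10%.

```python
# Toy illustration of why direct matrix inversion is unstable.
# All numbers are invented; they are not taken from the analysis.

def invert_2x2(a):
    """Return the inverse of a 2x2 matrix given as nested lists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def apply(matrix, vector):
    """Matrix-vector product for nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# A[i][j]: probability that an event from true bin j is measured in bin i.
A = [[0.6, 0.4],
     [0.4, 0.6]]

x_true = [100.0, 100.0]
y_expected = apply(A, x_true)   # average detector-level counts
y_observed = [110.0, 90.0]      # same total, each bin fluctuated by 10%

A_inv = invert_2x2(A)
x_from_expected = apply(A_inv, y_expected)
x_from_observed = apply(A_inv, y_observed)
```

In this toy, the 10% fluctuation on detector level becomes a fluctuation of about 50% in the unfolded bins (roughly 150 and 50 instead of 100 and 100). The inverse of a matrix with large off-diagonal migrations contains large entries of alternating sign, which amplify the fluctuations.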

6.5.1 Regularized Unfolding

The TUnfold method searches for a stationary point of the Lagrangian

$$L(x, \lambda) = L_1 + L_2 + L_3, \tag{6.2}$$

with

$$L_1 = (y - Ax)^T V_{yy}^{-1} (y - Ax), \tag{6.3}$$

$$L_2 = \tau^2 (x - f_b x_0)^T (L^T L)(x - f_b x_0), \tag{6.4}$$

$$L_3 = \lambda (Y - e^T x), \quad \text{and} \tag{6.5}$$

$$Y = \sum_i y_i, \tag{6.6}$$

$$e_j = \sum_i A_{ij}. \tag{6.7}$$


Figure 6.9: Schematic view of the unfolding procedure. The particle-level distribution $x$ is folded with the detector response $A$. The result is an average detector-level distribution $\tilde{y}$. The real measured distribution $y$ can be different due to statistical fluctuations. Taken from [62].

The first term $L_1$ is a least-squares term, where $V_{yy}$ is the covariance matrix describing the uncertainties of the input $y$.

The second term $L_2$ introduces the regularization, which reduces fluctuations in $x$. The parameter $\tau^2$ is the strength of the regularization. The bias term $f_b x_0$ suppresses deviations of $x$ from $x_0$ for $f_b = 1$ and deviations from zero for $f_b = 0$. The choice of the matrix $L$ determines whether the absolute value, the first derivative, or the second derivative of $x$ is regularized. The analysis presented here uses the option to regularize the absolute value of $x$.

The last term $L_3$ is an optional area constraint, which is controlled by the parameter $\lambda$. It ensures that the resulting values in $x$, corrected with the efficiencies $e$, match the total number of events on detector level, $Y$. In this analysis this constraint is not used.

The regularization parameter $\tau$ is determined by minimizing the global correlation coefficient, defined as

$$\rho_i = \sqrt{1 - \frac{1}{(V_{xx}^{-1})_{ii}\,(V_{xx})_{ii}}}, \tag{6.8}$$

with the covariance matrix $V_{xx}$ of the output $x$. TUnfold implements several ways of choosing the regularization strength. The one used in this analysis is the minimization of the average global correlation including systematic uncertainties, where the minimum is determined in a scan with 75 steps. The resulting scan is shown in Fig. 6.10. With this setup, the value of the regularization parameter $\tau$ that minimizes the average global correlation coefficient is found.
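For a fixed $\tau$, with $L$ equal to the unit matrix and without the area constraint, the stationary point of $L_1 + L_2$ has the closed form $x = E\,(A^T V_{yy}^{-1} y + \tau^2 f_b x_0)$ with $E = (A^T V_{yy}^{-1} A + \tau^2)^{-1}$, and error propagation gives $V_{xx} = E\,A^T V_{yy}^{-1} A\,E$. The following sketch evaluates these expressions and Eq. (6.8) on a grid of $\tau$ values. It is a minimal illustration, not the TUnfold implementation: it assumes two truth bins and a diagonal $V_{yy}$, ignores systematic uncertainties in the scan, and all function names, toy numbers, and the scan range are invented.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def unfold(A, y, var_y, tau, x0, fb=1.0):
    """Stationary point of L1 + L2 for two truth bins, L = identity, diagonal Vyy.

    A[i][j] is the probability to migrate from truth bin j to detector bin i and
    var_y[i] is the variance of y[i]. Returns the estimate x and its covariance Vxx.
    """
    n = len(A)
    # B = A^T Vyy^-1 A and c = A^T Vyy^-1 y
    B = [[sum(A[i][a] * A[i][b] / var_y[i] for i in range(n)) for b in range(2)]
         for a in range(2)]
    c = [sum(A[i][a] * y[i] / var_y[i] for i in range(n)) for a in range(2)]
    E = inv2([[B[a][b] + (tau ** 2 if a == b else 0.0) for b in range(2)]
              for a in range(2)])
    rhs = [c[a] + tau ** 2 * fb * x0[a] for a in range(2)]
    x = [E[a][0] * rhs[0] + E[a][1] * rhs[1] for a in range(2)]
    Vxx = matmul(matmul(E, B), E)  # propagation of Vyy through the linear map y -> x
    return x, Vxx

def global_correlations(Vxx):
    """rho_i = sqrt(1 - 1 / ((Vxx^-1)_ii (Vxx)_ii)), as in Eq. (6.8)."""
    Vinv = inv2(Vxx)
    return [math.sqrt(max(0.0, 1.0 - 1.0 / (Vinv[i][i] * Vxx[i][i])))
            for i in range(2)]

def scan_tau(A, y, var_y, x0, log_tau_values):
    """Return (average rho, log10 tau) of the scan point with the smallest average rho."""
    best = None
    for log_tau in log_tau_values:
        _, Vxx = unfold(A, y, var_y, 10.0 ** log_tau, x0)
        rho = global_correlations(Vxx)
        candidate = (sum(rho) / len(rho), log_tau)
        if best is None or candidate < best:
            best = candidate
    return best

# Invented toy input: four detector bins, two truth bins.
A = [[0.4, 0.1], [0.3, 0.2], [0.2, 0.3], [0.1, 0.4]]
x_true = [200.0, 100.0]
y = [sum(A[i][j] * x_true[j] for j in range(2)) for i in range(4)]
var_y = list(y)  # Poisson-like variances
log_taus = [-3.0 + 0.05 * k for k in range(75)]  # 75 scan points, range chosen by hand
best_rho, best_log_tau = scan_tau(A, y, var_y, x0=x_true, log_tau_values=log_taus)
```

In this toy the correlation between the two unfolded bins is negative for very small $\tau$ and positive for very large $\tau$, so the average global correlation has a minimum at an intermediate value, which is what the scan picks up. Plain Python lists are used only to keep the sketch free of dependencies.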

Figure 6.10: Scan of the average global correlation coefficient $\rho$ as a function of $\log\tau$, shown in red. The optimal value of $\tau$, which minimizes the average global correlation, is shown as a black dot.

6.5.2 Determination of Bin Sizes

A suitable binning has to be found for both the generator level and the detector level. For the generator-level binning, one limiting factor is the resolution of the $\tau_{32}$ distribution. To calculate the resolution, the distribution is first split into different regions of $\tau_{32}^{\mathrm{gen}}$. In each region the ratio $\tau_{32}^{\mathrm{rec}}/\tau_{32}^{\mathrm{gen}}$ is calculated and a Gaussian fit is performed. From the fit the one-sigma band is obtained and the width $w$ is calculated as $w = 2\sigma$. One example of this procedure is shown in Fig. 6.11, where the mean value $\mu$ and the one-sigma band are illustrated in the resulting fit. The obtained width $w$ is defined as the resolution. With this definition the resolution is a measure of how precisely a value is reconstructed compared to the generated value.

Figure 6.11: Distribution of the number of events as a function of $\tau_{32}^{\mathrm{rec}}/\tau_{32}^{\mathrm{gen}}$. The fit of the distribution is displayed with the mean $\mu$ and the width $2\sigma$.

The obtained resolutions are shown in Fig. 6.12 (left). Here AK8 jets with CHS applied and AK8 jets with PUPPI applied are considered. For lower values of $\tau_{32}^{\mathrm{gen}}$ the resolution is worse for CHS jets than for PUPPI jets. For higher values they are similar, as one would expect, because pileup effects have a larger impact on lower values of $\tau_{32}$. From the obtained resolutions an estimate of the bin width can be calculated by multiplying the resolution of a given region by the central value of that region. The resulting bin widths are shown in Fig. 6.12 (right). Because the resolution of PUPPI jets is better, the resulting bin widths are smaller than those of CHS jets. From these studies a bin width of 0.1 is expected to be appropriate for CHS jets over the whole range of $\tau_{32}$.

Figure 6.12: Resolution (left) and estimated bin width (right) as a function of $\tau_{32}^{\mathrm{gen}}$ for CHS jets (blue) and PUPPI jets (red).

A second factor for the final binning at the generator level is given by the purity and stability. The purity $p = N_{\mathrm{rec+gen}}/N_{\mathrm{rec}}$ is the ratio between the number of events that are generated and reconstructed in a given bin $i$ and the number of all events that are reconstructed in bin $i$. The stability $s = N_{\mathrm{rec+gen}}/N_{\mathrm{gen}}$ is the ratio between the number of events that are generated and reconstructed in a given bin $i$ and the number of all events that are generated in bin $i$. After increasing the bin sizes, especially at low $\tau_{32}$, both purity and stability are well above 40%. The purity and stability are shown in Fig. 6.13. The purity is high in the first bin and has a plateau for the other bins. The stability, on the other hand, increases for higher values of $\tau_{32}$. This can be explained by the shift of $\tau_{32}^{\mathrm{rec}}$ from lower to higher values in the lower $\tau_{32}^{\mathrm{gen}}$ region.

Figure 6.13: Purity and stability as a function of $\tau_{32}$ estimated in $t\bar{t}$ simulation.

This shift is further studied in Fig. 6.14. Both panels show the mean value of $\tau_{32}^{\mathrm{rec}}/\tau_{32}^{\mathrm{gen}}$ as a function of $\tau_{32}^{\mathrm{gen}}$, split into a high-pileup region ($N_{\mathrm{PV}} > 20$) and a low-pileup region ($N_{\mathrm{PV}} \le 20$), where $N_{\mathrm{PV}}$ denotes the number of primary vertices. The left panel shows the mean value for CHS jets and the right panel the mean value for PUPPI jets. While a shift between $\tau_{32}^{\mathrm{rec}}$ and $\tau_{32}^{\mathrm{gen}}$ is observed in CHS jets, the effect is largely reduced by the PUPPI algorithm. PUPPI jets are mostly unaffected by pileup, while CHS jets are reconstructed with higher values of $\tau_{32}$ when pileup vertices are present. The resulting bin ranges are shown in Table 6.3. The lower bins have a much larger range than one would initially expect from Fig. 6.12. To achieve high purity and stability, however, the bins have to be larger because of the observed shift to higher $\tau_{32}$ values.

Figure 6.14: Distribution of the mean values of $\tau_{32}^{\mathrm{rec}}/\tau_{32}^{\mathrm{gen}}$ as a function of $\tau_{32}^{\mathrm{gen}}$. The values are displayed for two pileup scenarios. The red line shows the mean value for events with $N_{\mathrm{PV}} > 20$, and the blue line shows the result for $N_{\mathrm{PV}} \le 20$. On the left are the effects for CHS jets, and on the right the effects for PUPPI jets. The uncertainties on the mean values estimated from the fit are small and therefore not visible.

Table 6.3: Bin number and bin range for the generator-level binning.

Bin          $\tau_{32}^{\mathrm{gen}}$ range
underflow    0-0.2
1            0.2-0.45
2            0.45-0.60
3            0.60-0.73
4            0.73-0.85
5            0.85-1

The limiting factor for the reconstruction-level binning is statistical precision. The reconstruction-level binning is chosen such that every bin contains at least 120 events. For the regularized unfolding approach it is further required to use at least twice as many detector-level bins as generator-level bins, so that TUnfold can perform the least-squares minimization. Figure 6.15 shows the input distribution on detector level with two additional sidebands. The number of events in each bin is two times larger than stated before, because the figure shows the combination of the electron and muon channels. Additionally, the first region, indicated by the first green solid line, is split into a lower-mass and a higher-mass region, which is further explained in Section 6.5.3. Therefore the number of events is split between those regions.

Figure 6.15: Input distribution of the combination of the muon channel and electron channel on reconstruction level. Green lines indicate the different regions.
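The two quantities that drive the generator-level binning can be sketched in a few lines. The example below is an illustration under stated assumptions, not the analysis code: the function names and all numbers are invented, the same binning is assumed on both levels for the purity and stability, and the sample standard deviation replaces the Gaussian fit used in the text.

```python
import statistics

def estimated_bin_width(ratios, central_value):
    """Resolution w = 2*sigma of tau32_rec/tau32_gen, multiplied by the central tau32_gen value.

    The sample standard deviation is a simple stand-in for the width of the
    Gaussian fit described in the text.
    """
    sigma = statistics.stdev(ratios)
    resolution = 2.0 * sigma
    return resolution * central_value

def purity_and_stability(counts):
    """counts[g][r]: events generated in bin g and reconstructed in bin r (same binning)."""
    n_bins = len(counts)
    purity, stability = [], []
    for i in range(n_bins):
        n_rec = sum(counts[g][i] for g in range(n_bins))  # all events reconstructed in bin i
        n_gen = sum(counts[i])                            # all events generated in bin i
        n_both = counts[i][i]                             # generated and reconstructed in bin i
        purity.append(n_both / n_rec if n_rec else 0.0)
        stability.append(n_both / n_gen if n_gen else 0.0)
    return purity, stability

# Invented example: event counts for two bins and three ratio values.
counts = [[80, 20],
          [10, 90]]
purity, stability = purity_and_stability(counts)
width = estimated_bin_width([0.9, 1.0, 1.1], central_value=0.5)
```

Widening a bin moves events from the off-diagonal entries of `counts` onto the diagonal, which raises both ratios. This is the mechanism by which the larger bins at low $\tau_{32}$ recover purity and stability.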

6.5.3 Migration Matrix

For the unfolding procedure a migration matrix $A$ with elements $A_{ij}$ has to be filled. Following the convention of Eq. (6.1), an event is filled into $A_{ij}$ if it contributes to bin $j$ of the generator-level distribution and to bin $i$ of the detector-level distribution. The projections onto the two axes provide the particle-level and the detector-level distributions, respectively. For the migration matrix only events from the $t\bar{t}$ sample are considered, and they are normalized to the total number of events in the corresponding bin at particle level. The filled migration matrix is shown in Fig. 6.16. The x-axis shows the particle-level bins of the $\tau_{32}^{\mathrm{gen}}$ distribution and the y-axis the detector-level bins of the $\tau_{32}^{\mathrm{rec}}$ distribution. The migration matrix entries $A_{ij}$ can be read as the probability that an event generated in bin $j$ is reconstructed in bin $i$. Events that pass only one selection are included in the respective underflow bin. The migration matrix is divided into different regions marked with red solid lines [33,34,63,64]. The measurement phase space is the first region enclosed by two red solid lines. It is further divided by the jet mass, to separate fully merged $t\bar{t}$ events in the higher mass region ($m_{\mathrm{jet}}^{\mathrm{gen}} > 155$ GeV, $m_{\mathrm{jet}}^{\mathrm{rec}} > 152$ GeV) from unmerged events in the lower mass region ($m_{\mathrm{jet}}^{\mathrm{gen}} < 155$ GeV, $m_{\mathrm{jet}}^{\mathrm{rec}} < 152$ GeV).

Figure 6.16: The normalized migration matrix derived from $t\bar{t}$ events of the combination of electron and muon channel with all sidebands. Bins on generator and detector level are shown on the x-axis and y-axis, respectively.

This splitting is indicated by a dashed red line. After the unfolding both regions are combined to retain sufficient statistical precision. In the measurement phase space most events are reconstructed along a diagonal, indicating that events generated in a given bin are mostly reconstructed in the corresponding detector-level bins. The diagonal is washed out due to the larger number of detector-level bins compared to the generator level, a slightly shifted mass window, and the finite resolution of $\tau_{32}$. Additional sideband regions are implemented in the migration matrix to reduce the dependence on the simulation model. For these regions, requirements of the measurement phase space are loosened. For the first sideband, the first AK8 jet is required to have a lower transverse momentum of 300 GeV $< p_T <$ 400 GeV. The second sideband is defined by the inverted mass criterion ($m_{j1} < m_{j2+\ell}$). The sidebands show a non-negligible number of events that migrate into the measurement phase space. The migration matrix has a total of 65 bins on detector level and 24 bins on generator level.
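The filling and normalization of the migration matrix can be sketched as follows. This is a simplified illustration, not the analysis code: the function name and the event list are invented, events carry unit weight, and underflow bins and sidebands are not modeled. The indices follow Eq. (6.1), with the first index for the detector-level bin and the second for the generator-level bin.

```python
def fill_migration_matrix(events, n_gen_bins, n_rec_bins):
    """Build A[i][j] from (gen_bin, rec_bin) pairs.

    i runs over detector-level bins and j over generator-level bins. Each
    generator-level column is normalised to the number of events generated
    in that bin, so A[i][j] is a migration probability.
    """
    counts = [[0.0] * n_gen_bins for _ in range(n_rec_bins)]
    for gen_bin, rec_bin in events:
        counts[rec_bin][gen_bin] += 1.0
    gen_totals = [sum(counts[i][j] for i in range(n_rec_bins))
                  for j in range(n_gen_bins)]
    return [[counts[i][j] / gen_totals[j] if gen_totals[j] else 0.0
             for j in range(n_gen_bins)] for i in range(n_rec_bins)]

# Invented events, given as (generator-level bin, detector-level bin).
events = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 2), (1, 2), (1, 3)]
A = fill_migration_matrix(events, n_gen_bins=2, n_rec_bins=4)
column_sums = [sum(A[i][j] for i in range(4)) for j in range(2)]
```

With this normalization every generator-level column sums to one as long as all generated events are also reconstructed. Once events that fail the detector-level selection are placed in an underflow bin, the sum over the remaining detector-level bins gives the efficiency $e_j$ of Eq. (6.7).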

6.5.4 Validation Tests

Before the data are unfolded, various tests are performed to verify the unfolding setup. These tests ensure that the unfolding works independently of the model used and corrects only for detector and reconstruction effects. All validation tests are performed in the electron channel, in the muon channel, and in the combination of both channels, where the combination is performed prior to the unfolding.

First, the MC sample that is used to fill the migration matrix is unfolded with itself. The result is shown in Fig. 6.17. The unfolded MC distribution exactly matches its own particle-level distribution, which confirms that the migration matrix is filled correctly.

Another validation test is done with a split $t\bar{t}$ MC sample. For this test 20% of the events are randomly selected and saved as pseudo data. The remaining 80% are used to fill the migration matrix. This splitting is done for three different sets of pseudo data. All three sets of pseudo data are then unfolded and compared to their prediction. The result is shown in Fig. 6.18. The line thickness of the prediction gives its statistical uncertainty. The unfolded pseudo data agree well with their truth for all three test samples.

The last validation test checks the unfolding setup for model dependencies. For this test the renormalization and factorization scales $\mu_r$ and $\mu_f$ are varied by factors of 0.5 or 2. In Fig. 6.19 only the combination of the electron and muon channel is shown. Both scales are varied by a factor of 0.5 in the left plot and by a factor of 2 in the right plot. This test shows no sign of a model dependence, because the unfolded distribution agrees within uncertainties with its respective prediction (blue) and is not pulled towards the prediction used to fill the migration matrix. This also indicates that the regularization strength is not chosen too large.
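The logic of the first two validation tests can be sketched with a toy sample. The example below is only an illustration: it uses an unregularized least-squares unfolding with two truth bins instead of TUnfold, and the event model, the function names, and the sample size are all invented.

```python
import random

N_GEN, N_REC = 2, 4

def fill_counts(events):
    """counts[i][j]: events reconstructed in detector bin i and generated in truth bin j."""
    counts = [[0.0] * N_GEN for _ in range(N_REC)]
    for gen_bin, rec_bin in events:
        counts[rec_bin][gen_bin] += 1.0
    return counts

def truth_and_reco(counts):
    """Project the counts onto the truth axis and onto the detector axis."""
    truth = [sum(counts[i][j] for i in range(N_REC)) for j in range(N_GEN)]
    reco = [sum(row) for row in counts]
    return truth, reco

def unfold_least_squares(counts, y):
    """Unregularised least-squares unfolding of y for two truth bins."""
    truth, _ = truth_and_reco(counts)
    A = [[counts[i][j] / truth[j] for j in range(N_GEN)] for i in range(N_REC)]
    # Normal equations (A^T A) x = A^T y, solved with the explicit 2x2 inverse.
    b00 = sum(A[i][0] * A[i][0] for i in range(N_REC))
    b01 = sum(A[i][0] * A[i][1] for i in range(N_REC))
    b11 = sum(A[i][1] * A[i][1] for i in range(N_REC))
    c0 = sum(A[i][0] * y[i] for i in range(N_REC))
    c1 = sum(A[i][1] * y[i] for i in range(N_REC))
    det = b00 * b11 - b01 * b01
    return [(b11 * c0 - b01 * c1) / det, (b00 * c1 - b01 * c0) / det]

def toy_event(rng):
    """Invented toy model: the detector bin is smeared around the truth bin."""
    gen_bin = rng.randrange(N_GEN)
    smear = rng.choice([-1, 0, 0, 1, 1, 2])
    rec_bin = min(N_REC - 1, max(0, 2 * gen_bin + smear))
    return gen_bin, rec_bin

rng = random.Random(1)
events = [toy_event(rng) for _ in range(5000)]

# Closure test: unfold the sample with a matrix filled from the same sample.
counts_all = fill_counts(events)
truth_all, reco_all = truth_and_reco(counts_all)
closure = unfold_least_squares(counts_all, reco_all)

# Split test: 20% pseudo data, the remaining 80% fill the migration matrix.
rng.shuffle(events)
n_pseudo = len(events) // 5
pseudo, rest = events[:n_pseudo], events[n_pseudo:]
pseudo_truth, pseudo_reco = truth_and_reco(fill_counts(pseudo))
pseudo_unfolded = unfold_least_squares(fill_counts(rest), pseudo_reco)
```

In the closure test the detector-level input is exactly the folded truth, so `closure` reproduces `truth_all` up to rounding. In the split test `pseudo_unfolded` and `pseudo_truth` are statistically independent of the matrix and are only expected to agree within the statistical uncertainties.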

Figure 6.17: Differential $t\bar{t}$ production cross section as a function of $\tau_{32}$ of the unfolded $t\bar{t}$ sample. On the top left is the result of the electron channel, on the top right that of the muon channel, and at the bottom the result of the combination of both channels. The prediction of the nominal $t\bar{t}$ sample is displayed in red and the unfolded nominal $t\bar{t}$ sample in black. The inner error bars are the statistical uncertainties on the input. The total uncertainty is given by the outer error bars.

Figure 6.18: Differential $t\bar{t}$ production cross section as a function of $\tau_{32}$ of the unfolded pseudo data. On the top left is the electron channel, on the top right the muon channel, and at the bottom the combination of both channels. The thickness of the prediction (red) gives its statistical uncertainty. Each pseudo data set is represented by a black marker.

Figure 6.19: Differential $t\bar{t}$ production cross section as a function of $\tau_{32}$ of the unfolded scale variations of the combination of the electron and muon channel. The result is compared to down variations of $\mu_r$ and $\mu_f$ in the left figure and up variations in the right figure as a test of the used model. In red is the prediction of the nominal $t\bar{t}$ sample, and in blue is the prediction of the model variation. In black is the resulting unfolding.

6.6 Uncertainties

In this section the uncertainties considered in the unfolding procedure are discussed. They are grouped into three categories. The first is the statistical uncertainty arising from the limited number of events in the measurement. The second comprises the experimental uncertainties, including uncertainties on jet corrections and on scale factors that account for efficiency differences between simulation and data. The last contains the uncertainties arising from the choice of modeling parameters in the simulation.

6.6.1 Statistical Uncertainties

The statistical uncertainty on the unfolding output arises from the uncertainty on the input data. This uncertainty is used to fill a covariance matrix.

6.6.2 Experimental Uncertainties

For the total systematic uncertainty at detector level, different sources are considered. The first set arises from the limited number of simulated events used to fill the migration matrix and from the statistical uncertainty on background processes, which are subtracted before the unfolding procedure. The second set accounts for corrections of the simulation applied at detector level. These corrections cause changes in the response matrix. Those changes are estimated with response matrices filled from the $\pm 1\sigma$ variations of each correction. The shift between the nominal and the varied response matrix is propagated to the unfolded output within TUnfold. The resulting shift is used as systematic uncertainty. From the up ($+1\sigma$) and down ($-1\sigma$) variations, only the variation which leads to the largest total shift is used as uncertainty. The total shift is defined as the sum of the absolute values of the shifts across all measurement bins. For each uncertainty a covariance matrix is filled and added to the total covariance matrix. The considered uncertainties arise from jet energy corrections, jet energy resolution smearing, pileup corrections, b tagging scale factors, electron and muon ID scale factors, electron and muon trigger scale factors, and electron reconstruction scale factors.

An additional uncertainty arises from the uncertain production cross sections of background processes, which are subtracted from data. The cross sections of these processes are varied within their respective uncertainties and are again handled within TUnfold. The uncertainty on the production cross section is 19% for W+jets [69], 23% for single top [70], and 100% for all other background processes.

The relative experimental and statistical uncertainties on the unfolding output of the combined channels are shown in Fig. 6.20. The experimental uncertainties are exceeded by the statistical uncertainties of both the input and the migration matrix. It is important to note that the uncertainties on the jet energy scale and the jet energy resolution may be underestimated, because those corrections are applied to the whole jet, which does not change its substructure. An alternative approach would be to apply those corrections directly to each particle used to form the jet. This method is expected to have an impact on the substructure of a jet and would therefore change the influence of the variations on the $\tau_{32}$ distribution.

Figure 6.20: The relative experimental and statistical uncertainties on the differential $t\bar{t}$ production cross section measurement as a function of $\tau_{32}$ of the combined channels.

Figure 6.21: The relative model and statistical uncertainties on the differential $t\bar{t}$ production cross section measurement as a function of $\tau_{32}$ of the combined channels.
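The choice between the up and down variation of an experimental uncertainty can be written down compactly. The sketch below is a hedged illustration with invented numbers and hypothetical function names. In particular, building the covariance matrix as the outer product of the shift vector is an assumption for a fully correlated shift; the text does not specify how the covariance matrices are constructed.

```python
def total_shift(shift):
    """Sum of the absolute shifts across all measurement bins."""
    return sum(abs(s) for s in shift)

def select_shift(nominal, up, down):
    """Return the shift (varied - nominal) of the variation with the larger total shift."""
    shift_up = [u - n for u, n in zip(up, nominal)]
    shift_down = [d - n for d, n in zip(down, nominal)]
    if total_shift(shift_up) >= total_shift(shift_down):
        return shift_up
    return shift_down

def covariance_from_shift(shift):
    """Covariance of a fully correlated shift, V_ij = shift_i * shift_j (assumption)."""
    return [[si * sj for sj in shift] for si in shift]

# Invented unfolded results in two bins for one systematic source.
nominal = [100.0, 50.0]
up = [104.0, 49.0]
down = [98.0, 51.0]
shift = select_shift(nominal, up, down)
cov = covariance_from_shift(shift)
```

Here the up variation has a total shift of 5 and the down variation a total shift of 3, so the up variation is selected.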

6.6.3 Model Uncertainties

Model uncertainties account for effects of the simulation model that was used to determine the migration matrix. To estimate the uncertainty, a sample generated with a varied model is unfolded with a response matrix filled from the nominal $t\bar{t}$ simulation, and the difference between the unfolded result and its truth is calculated. If more than one variation of a model parameter is available, the average of the absolute values of the differences is taken and used to calculate a covariance matrix. This covariance matrix is then added to the total systematic uncertainty. The considered uncertainties come from the renormalization and factorization scales $\mu_r$ and $\mu_f$, respectively, different top quark masses, and parton shower variations of the ISR, FSR, and $h_{\mathrm{damp}}$ parameters.

The relative model and statistical uncertainties on the unfolding output of the combined channels are shown in Fig. 6.21. No single uncertainty dominates, but the contributions from FSR and the choice of the top quark mass have the largest impact.

Figure 6.22: Correlations between the bins in the measurement phase space of the combined channels. The correlation of the statistical uncertainty is shown on the left, the total correlation on the right.

6.6.4 Total Uncertainty and Correlation

To derive the total uncertainty, all covariance matrices are summed. The model uncertainty and the statistical uncertainty contribute the most to the total uncertainty. The dominant contributions to the model uncertainties arise from FSR and the choice of the top quark mass. The experimental uncertainties are exceeded by the statistical uncertainty and are also lower than the model uncertainties. To study the correlations between bins, a correlation matrix is calculated for the statistical uncertainty and for the total uncertainty from the respective covariance matrix. The correlations of the statistical and total uncertainties between the bins of the unfolding output of the combined channel are shown in Fig. 6.22. In both plots the off-diagonal elements next to the diagonal show a negative correlation, which means that if the value in one bin is shifted in one direction, the neighboring bins tend to shift in the other direction. All other bins show a positive correlation.
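The two steps described above, summing the covariance matrices and converting the result into a correlation matrix with $\rho_{ij} = V_{ij}/\sqrt{V_{ii}V_{jj}}$, can be sketched as follows. The function names and the two input matrices are invented for illustration.

```python
import math

def add_matrices(matrices):
    """Element-wise sum of equally sized square matrices."""
    size = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(size)]
            for i in range(size)]

def correlation_matrix(cov):
    """rho_ij = V_ij / sqrt(V_ii * V_jj)."""
    size = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(size)]
            for i in range(size)]

# Invented covariance matrices for two bins.
cov_stat = [[4.0, -1.0], [-1.0, 9.0]]
cov_exp = [[16.0, -4.0], [-4.0, 1.0]]
cov_total = add_matrices([cov_stat, cov_exp])
corr_total = correlation_matrix(cov_total)
```

The square roots of the diagonal elements of `cov_total` give the total uncertainty per bin, and a negative off-diagonal entry in `corr_total` corresponds to the anticorrelation between neighboring bins discussed in the text.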

6.7 Unfolding of Data

After the validation of the unfolding setup and the estimation of the uncertainties, the $\tau_{32}$ distribution is measured in data. The unfolding is performed in the electron and muon channels separately. The compatibility between the electron and muon channels is shown in Fig. 6.23. Both channels are compatible and can therefore be combined. The combination is done prior to the unfolding procedure.

Figure 6.23: Compatibility between the electron and the muon channel. The inner error bar describes the statistical uncertainty and the outer error bar describes the total uncertainty.

The differential $t\bar{t}$ production cross section as a function of $\tau_{32}$ is measured in the combination of the electron and muon channel. Figure 6.24 shows the differential cross section at particle level in data, compared to the prediction of POWHEG. The uncertainty is small for lower values of $\tau_{32}$, where the fraction of fully merged $t\bar{t}$ events is higher, and increases for higher $\tau_{32}$ values, where the fraction of not merged events is higher. The unfolded data distribution is lower than the prediction from POWHEG. The cross section extracted in the fiducial phase space after combining the electron and muon channel is

$$\sigma_{\mathrm{data}} = 231 \pm 9\,(\mathrm{stat}) \pm 10\,(\mathrm{exp}) \pm 19\,(\mathrm{model})\ \mathrm{fb} = 231 \pm 24\,(\mathrm{tot})\ \mathrm{fb}$$

in data and

$$\sigma_{\mathrm{MC}} = 333 \pm 62\,(\mathrm{theo})\ \mathrm{fb}$$

in the POWHEG prediction. The theoretical uncertainty on the cross section in simulation is calculated from the model variations described in Section 6.6.3. The measured cross section in data is not compatible with the cross section from the POWHEG prediction. One reason could be that the top quark $p_T$ spectrum is softer in data than in simulation.

Figure 6.24: The differential $t\bar{t}$ production cross section as a function of $\tau_{32}$ of the unfolded data. The prediction of POWHEG is shown as a red line. The red area shows the contribution from the higher mass region ($m_{\mathrm{jet}} > 155$ GeV), and the lower mass region ($m_{\mathrm{jet}} < 155$ GeV) is displayed in blue. The inner error bar describes the statistical uncertainty and the outer error bar describes the total uncertainty.

Furthermore, the cross section is normalized to the total cross section of the distribution in order to compare the shape of the distribution to different predictions. In Fig. 6.25 the normalized cross section as a function of $\tau_{32}$ is compared between unfolded data and the predictions of POWHEG and its FSR variations. The FSR variations are shown because changes in FSR were found to cause the largest change in the distribution. With this comparison, data can be used to constrain the uncertainty due to FSR variations. In particular, the first bin indicates an overestimation of the FSR uncertainty.

Figure 6.25: The normalized differential $t\bar{t}$ production cross section as a function of $\tau_{32}$ of the unfolded data. The prediction of POWHEG is shown in red, the FSR up variation in blue, and the FSR down variation in green. The inner error bar describes the statistical uncertainty and the outer error bar describes the total uncertainty.

Figure 6.26 shows the unfolded data compared to the predictions of the MADGRAPH and POWHEG generators. The prediction of POWHEG is shown for two tunes, CUETP8M2T4 and CP5. The latter tune was newly introduced in 2017. The figure shows the differential $t\bar{t}$ production cross section (left) and the normalized differential $t\bar{t}$ production cross section (right). The cross sections in the fiducial phase space for the MADGRAPH and POWHEG CP5 predictions are

$$\sigma_{\mathrm{MADGRAPH}} = 373 \pm 52\,(\mathrm{theo})\ \mathrm{fb} \tag{6.9}$$

and

$$\sigma_{\mathrm{CP5}} = 320 \pm 81\,(\mathrm{theo})\ \mathrm{fb}. \tag{6.10}$$

Neither of the predicted cross sections is compatible with the cross section measured in data.

The shape comparison shows that the MADGRAPH prediction describes the data similarly to the POWHEG CUETP8M2T4 prediction. The POWHEG CP5 prediction, on the other hand, describes the data worse, especially for lower values of $\tau_{32}$, where the fraction of fully merged $t\bar{t}$ events is higher. With this comparison the data can be used to constrain different tunes. The measurement is especially sensitive to different tunes and parton shower variations at low values of $\tau_{32}$, where the fraction of fully merged $t\bar{t}$ events is higher.
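The normalization used for the shape comparison can be illustrated with a short sketch. It assumes that the differential cross section in each bin is the cross section in that bin divided by the bin width, and that the total cross section is the sum over the measurement bins. The text does not spell out these conventions, so they are assumptions. The bin edges are those of Table 6.3 without the underflow bin, while the per-bin cross sections and the function names are invented.

```python
def differential(cross_sections, bin_edges):
    """d(sigma)/d(tau32) per bin: cross section in the bin divided by the bin width."""
    widths = [hi - lo for lo, hi in zip(bin_edges[:-1], bin_edges[1:])]
    return [s / w for s, w in zip(cross_sections, widths)]

def normalized_differential(cross_sections, bin_edges):
    """(1/sigma) d(sigma)/d(tau32), with sigma the sum over all bins."""
    total = sum(cross_sections)
    return [d / total for d in differential(cross_sections, bin_edges)]

# Bin edges from Table 6.3; the cross sections per bin (in fb) are invented.
bin_edges = [0.2, 0.45, 0.60, 0.73, 0.85, 1.0]
cross_sections = [50.0, 60.0, 52.0, 48.0, 30.0]
normalized = normalized_differential(cross_sections, bin_edges)
integral = sum(n * (hi - lo)
               for n, lo, hi in zip(normalized, bin_edges[:-1], bin_edges[1:]))
```

By construction the normalized distribution integrates to one, so the comparison with the predictions is sensitive only to the shape and not to the overall difference in the fiducial cross section.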

Figure 6.26: The differential $t\bar{t}$ production cross section (left) and the normalized differential $t\bar{t}$ production cross section (right) as a function of $\tau_{32}$ of the unfolded data. The unfolded data are compared to three different predictions: MADGRAPH (blue), POWHEG CUETP8M2T4 (red), and POWHEG CP5 (green). The inner error bar describes the statistical uncertainty and the outer error bar describes the total uncertainty.


7 Summary and Outlook

In the presented analysis the first measurement with the CMS experiment of the differential cross section as a function of τ32 in the boosted regime has been performed at 13 TeV. The detector level distribution is unfolded to particle level and provides a crucial input for the tune of t¯t simulations. In order to perform the measurement, a phase space definition is found where a large fraction of merged t¯t events is selected. Therefore, a study on the pT of the first jet was performed, to find a good compromise between high statistics and high fraction of jets containing all decay products. This results in a τ32 distribution where the matched fraction peaks at lower values, while the unmatched fraction peaks at higher values. The criteria found for the measurement phase space are also applied on recon- struction level. The resulting phase space on reconstruction level shows a high fraction of t¯t events. In the distribution of τ32 a clear separation between back- ground and t¯t events is visible. To determine the bin ranges for the unfolding, studies were performed on the resolution of τ32. The resulting bin widths are used as a starting point, to op- timize those widths for a high purity and high stability. The final bin widths provide high purity and stability well above 40% over the whole τ32 range. Bins on reconstruction level are chosen in a way, that there are at least two times more detector level bins than generator level bins. With the final binning a migration matrix with sidebands is filled. Various tests and cross checks are performed in order to optimize and validate the unfolding setup. The data are then unfolded to particle level and compared to different models and the nominal t¯t predictions. The measured cross section in data is smaller than predicted by simulation, due to a softer pT spectrum of the top quark in data. 
In order to constrain the uncertainty of the FSR modeling, the unfolded distribution from data is compared to the predictions of the FSR variations. In the first bin the FSR variation is found to be overestimated, so that further steps can be taken to constrain this uncertainty with data. Lastly, the cross section and the shape of the unfolded data are compared to the predictions of different generators and tunes. While the shapes predicted by the MADGRAPH generator and by POWHEG with the CUETP8M2T4 tune show good agreement with the data, POWHEG with the CP5 tune does not describe the data well.

The presented studies were performed on AK8 jets with CHS applied, but studies of the resolution show that AK8 jets with the PUPPI algorithm have a better resolution at low values of τ32. This could be interesting for future studies, because it could allow a smaller bin size in the unfolding and consequently a better sensitivity and a reduced model dependence of the measurement. The presented analysis used ungroomed jet variables to define the selection. A grooming algorithm like Soft Drop [71] would reduce the influence of additional radiation inside the jet, and a better separation between fully merged and not merged jets could be achieved via the mass criterion.

A further improvement could be the use of the combined datasets from 2016, 2017, and 2018, which would reduce the statistical uncertainty. Furthermore, more sidebands could be added, and the sideband and measurement regions could be binned more finely. This would further reduce the model uncertainties.

The presented analysis provides a setup to study jet substructure variables and could be used to measure further variables, providing more information to theorists and to the developers of MC generators.
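The comparison between the unfolded data and the generator predictions summarized in this chapter can be made quantitative in several ways. As a minimal sketch, and not necessarily the procedure used in this analysis, a χ² value can be computed from the unfolded distribution, a prediction, and the covariance matrix of the unfolded result. The function name `chi_square`, the model labels, and all numbers below are hypothetical toy inputs.

```python
import numpy as np


def chi_square(data, prediction, covariance):
    """Return chi2 = d^T V^-1 d with d = data - prediction.

    `covariance` is the covariance matrix V of the unfolded data and is
    assumed to be invertible.
    """
    data = np.asarray(data, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    covariance = np.asarray(covariance, dtype=float)
    residual = data - prediction
    # Solve V x = d instead of inverting V explicitly (numerically more stable).
    return float(residual @ np.linalg.solve(covariance, residual))


# Toy inputs (invented, for illustration only): three bins, uncorrelated errors.
toy_data = [10.0, 20.0, 15.0]
toy_covariance = np.diag([1.0, 4.0, 2.25])
toy_predictions = {
    "model A": [11.0, 20.0, 15.0],
    "model B": [10.0, 24.0, 18.0],
}

chi2_values = {
    name: chi_square(toy_data, prediction, toy_covariance)
    for name, prediction in toy_predictions.items()
}
print(chi2_values)
```

For a shape-only comparison of normalized distributions the covariance matrix becomes singular, so one bin would have to be dropped before applying this formula; that step is omitted here.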

Bibliography

[1] Wikimedia Commons, “Standard Model of Elementary Particles”. https://en.wikipedia.org/wiki/File:Standard_Model_of_Elementary_Particles.svg. [Accessed 20.09.2019].

[2] D. J. Griffiths, “Introduction to Elementary Particles; 2nd rev. version”. Physics textbook. Wiley, New York, NY, 2008.

[3] M. E. Peskin and D. V. Schroeder, “An Introduction To Quantum Field Theory”. Addison-Wesley, Reading, USA, 1995.

[4] LHCb Collaboration, “Observation of J/ψφ Structures Consistent with Exotic States from Amplitude Analysis of B+ → J/ψφK+ Decays”, Phys. Rev. Lett. 118 (Jan, 2017) 022003, doi:10.1103/PhysRevLett.118.022003.

[5] LHCb Collaboration, “Observation of J/ψp Resonances Consistent with Pentaquark States in Λb⁰ → J/ψK⁻p Decays”, Phys. Rev. Lett. 115 (Aug, 2015) 072001, doi:10.1103/PhysRevLett.115.072001.

[6] Particle Data Group, M. Tanabashi, et al., “Review of particle physics”, Phys. Rev. D 98 (2018) 030001, doi:10.1103/PhysRevD.98.030001.

[7] Super-Kamiokande Collaboration, “Evidence for oscillation of atmospheric neutrinos”, Phys. Rev. Lett. 81 (1998) 1562–1567, doi:10.1103/PhysRevLett.81.1562, arXiv:hep-ex/9807003.

[8] P. W. Higgs, “Broken Symmetries and the Masses of Gauge Bosons”, Phys. Rev. Lett. 13 (Oct, 1964) 508–509, doi:10.1103/PhysRevLett.13.508.

[9] CMS Collaboration, “Observation of a New Boson at a Mass of 125 GeV with the CMS Experiment at the LHC”, Phys. Lett. B716 (2012) 30–61, doi:10.1016/j.physletb.2012.08.021, arXiv:1207.7235.

[10] ATLAS Collaboration, “Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC”, Phys. Lett. B716 (2012) 1–29, doi:10.1016/j.physletb.2012.08.020, arXiv:1207.7214.

[11] CMS Collaboration, “Precise determination of the mass of the Higgs boson and tests of compatibility of its couplings with the standard model predictions using proton collisions at 7 and 8 TeV”, Eur. Phys. J. C75 (2015), no. 5, 212, doi:10.1140/epjc/s10052-015-3351-7, arXiv:1412.8662.

[12] ATLAS Collaboration, “Measurements of the Higgs boson production and decay rates and coupling strengths using pp collision data at √s = 7 and 8 TeV in the ATLAS experiment”, Eur. Phys. J. C76 (2016), no. 1, 6, doi:10.1140/epjc/s10052-015-3769-y, arXiv:1507.04548.

[13] ATLAS, CMS Collaboration, “Measurements of the Higgs boson production and decay rates and constraints on its couplings from a combined ATLAS and CMS analysis of the LHC pp collision data at √s = 7 and 8 TeV”, JHEP 08 (2016) 045, doi:10.1007/JHEP08(2016)045, arXiv:1606.02266.

[14] CMS Collaboration, “Combined measurements of Higgs boson couplings in proton-proton collisions at √s = 13 TeV”, Eur. Phys. J. C79 (2019), no. 5, 421, doi:10.1140/epjc/s10052-019-6909-y, arXiv:1809.10733.

[15] ATLAS Collaboration, “Observation of Higgs boson production in association with a top quark pair at the LHC with the ATLAS detector”, Phys. Lett. B784 (2018) 173–191, doi:10.1016/j.physletb.2018.07.035, arXiv:1806.00425.

[16] ATLAS Collaboration, “Cross-section measurements of the Higgs boson decaying into a pair of τ-leptons in proton-proton collisions at √s = 13 TeV with the ATLAS detector”, Phys. Rev. D99 (2019) 072001, doi:10.1103/PhysRevD.99.072001, arXiv:1811.08856.

[17] Planck Collaboration, “Planck 2015 results. XIII. Cosmological parameters”, Astron. Astrophys. 594 (2016) A13, doi:10.1051/0004-6361/201525830, arXiv:1502.01589.

[18] Planck Collaboration, “Planck 2018 results. VI. Cosmological parameters”, arXiv:1807.06209.

[19] CDF Collaboration, “Observation of top quark production in pp̄ collisions”, Phys. Rev. Lett. 74 (1995) 2626–2631, doi:10.1103/PhysRevLett.74.2626, arXiv:hep-ex/9503002.

[20] D0 Collaboration, “Search for high mass top quark production in pp̄ collisions at √s = 1.8 TeV”, Phys. Rev. Lett. 74 (1995) 2422–2426, doi:10.1103/PhysRevLett.74.2422, arXiv:hep-ex/9411001.

[21] ATLAS, CDF, CMS, D0 Collaboration, “First combination of Tevatron and LHC measurements of the top-quark mass”, arXiv:1403.4427.

[22] J. Ellis, “TikZ-Feynman: Feynman diagrams with TikZ”, Comput. Phys. Commun. 210 (2017) 103–123, doi:10.1016/j.cpc.2016.08.019, arXiv:1601.05437.

[23] NNPDF Collaboration, “Parton distributions for the LHC Run II”, JHEP 04 (2015) 040, doi:10.1007/JHEP04(2015)040, arXiv:1410.8849.

[24] A. J. Larkoski, I. Moult, and B. Nachman, “Jet Substructure at the Large Hadron Collider: A Review of Recent Advances in Theory and Machine Learning”, arXiv:1709.04464.

[25] J. Thaler and K. Van Tilburg, “Identifying Boosted Objects with N-subjettiness”, JHEP 03 (2011) 015, doi:10.1007/JHEP03(2011)015, arXiv:1011.2268.

[26] A. J. Larkoski, S. Marzani, and J. Thaler, “Sudakov safety in perturbative QCD”, Physical Review D 91 (Jun, 2015) doi:10.1103/physrevd.91.111501.

[27] S. Alioli, P. Nason, C. Oleari et al., “A general framework for implementing NLO calculations in shower Monte Carlo programs: the POWHEG BOX”, JHEP 06 (2010) 043, doi:10.1007/JHEP06(2010)043, arXiv:1002.2581.

[28] J. Alwall, M. Herquet, F. Maltoni et al., “MadGraph 5 : Going Beyond”, JHEP 06 (2011) 128, doi:10.1007/JHEP06(2011)128, arXiv:1106.0522.

[29] J. Alwall, R. Frederix, S. Frixione et al., “The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations”, Journal of High Energy Physics 2014 (Jul, 2014) doi:10.1007/jhep07(2014)079.

[30] T. Sjostrand, S. Mrenna, and P. Z. Skands, “A Brief Introduction to PYTHIA 8.1”, Comput. Phys. Commun. 178 (2008) 852–867, doi:10.1016/j.cpc.2008.01.036, arXiv:0710.3820.

[31] CMS Collaboration, “Extraction and validation of a new set of CMS PYTHIA8 tunes from underlying-event measurements”, arXiv:1903.12179.

[32] GEANT4 Collaboration, “GEANT4: A Simulation toolkit”, Nucl. Instrum. Meth. A506 (2003) 250–303, doi:10.1016/S0168-9002(03)01368-8.

[33] CMS Collaboration, “Measurement of the jet mass in highly boosted tt̄ events from pp collisions at √s = 8 TeV”, Eur. Phys. J. C77 (2017), no. 7, 467, doi:10.1140/epjc/s10052-017-5030-3, arXiv:1703.06330.

[34] CMS Collaboration, “Measurement of the jet mass distribution and top quark mass in hadronic decays of boosted top quarks in pp collisions at √s = 13 TeV”, 2019.

[35] ATLAS Collaboration, “Measurement of jet-substructure observables in top quark, W boson and light jet production in proton-proton collisions at √s = 13 TeV with the ATLAS detector”, JHEP 08 (2019) 033, doi:10.1007/JHEP08(2019)033, arXiv:1903.02942.

[36] CMS Collaboration, “Measurement of jet substructure observables in tt̄ events from proton-proton collisions at √s = 13 TeV”, Phys. Rev. D98 (2018), no. 9, 092014, doi:10.1103/PhysRevD.98.092014, arXiv:1808.07340.

[37] F. Marcastel, “CERN’s Accelerator Complex. La chaîne des accélérateurs du CERN”. General Photo.

[38] ATLAS Collaboration, “The ATLAS Experiment at the CERN Large Hadron Collider”, JINST 3 (2008) S08003, doi:10.1088/1748-0221/3/08/S08003.

[39] CMS Collaboration, “The CMS Experiment at the CERN LHC”, JINST 3 (2008) S08004, doi:10.1088/1748-0221/3/08/S08004.

[40] LHCb Collaboration, “The LHCb Detector at the LHC”, JINST 3 (2008) S08005, doi:10.1088/1748-0221/3/08/S08005.

[41] ALICE Collaboration, “The ALICE experiment at the CERN LHC”, JINST 3 (2008) S08002, doi:10.1088/1748-0221/3/08/S08002.

[42] L. Evans and P. Bryant, “LHC Machine”, JINST 3 (2008) S08001, doi:10.1088/1748-0221/3/08/S08001.

[43] CMS Collaboration, “Delivered luminosity versus time for 2010-2012 and 2015-2018 (pp data only)”. https://twiki.cern.ch/twiki/bin/view/CMSPublic/LumiPublicResults. [Accessed 21.10.2019].

[44] CMS Collaboration, “Cutaway diagrams of CMS detector”.

[45] P. Adzic et al., “Energy resolution of the barrel of the CMS electromagnetic calorimeter”, JINST 2 (2007) P04004, doi:10.1088/1748-0221/2/04/P04004.

[46] CMS HCAL Collaboration, “Design, performance, and calibration of CMS hadron-barrel calorimeter wedges”, Eur. Phys. J. C55 (2008), no. 1, 159–171, doi:10.1140/epjc/s10052-008-0573-y.

[47] CMS Collaboration, “CMS. The TriDAS project. Technical design report, vol. 1: The trigger systems”.

[48] CMS Collaboration, “CMS: The TriDAS project. Technical design report, Vol. 2: Data acquisition and high-level trigger”.

[49] D. Barney, “CMS Detector Slice”, (Jan, 2016). CMS Collection.

[50] A. Sirunyan, A. Tumasyan, W. Adam et al., “Particle-flow reconstruction and global event description with the CMS detector”, Journal of Instrumentation 12 (Oct, 2017) P10003, doi:10.1088/1748-0221/12/10/p10003.

[51] CMS Collaboration, “Muon ID”. https://twiki.cern.ch/twiki/bin/viewauth/CMS/SWGuideMuonIdRun2. [Accessed 28.10.2019].

[52] CMS Collaboration, “Electron ID”. https://twiki.cern.ch/twiki/bin/view/CMS/CutBasedElectronIdentificationRun2. [Accessed 28.10.2019].

[53] M. Cacciari, G. P. Salam, and G. Soyez, “FastJet user manual”, The European Physical Journal C 72 (Mar, 2012) doi:10.1140/epjc/s10052-012-1896-2.

[54] M. Cacciari, G. P. Salam, and G. Soyez, “The anti-kt jet clustering algorithm”, JHEP 04 (2008) 063, doi:10.1088/1126-6708/2008/04/063, arXiv:0802.1189.

[55] CMS Collaboration, “Pileup Removal Algorithms”, Technical Report CMS-PAS-JME-14-001, CERN, Geneva, (2014).

[56] D. Bertolini, P. Harris, M. Low et al., “Pileup per particle identification”, Journal of High Energy Physics 2014 (Oct, 2014) doi:10.1007/jhep10(2014)059.

[57] CMS Collaboration, “Pileup mitigation at CMS in 13 TeV data”, Technical Report CMS-PAS-JME-18-001, CERN, Geneva, (2019).

[58] CMS Collaboration, “Determination of jet energy calibration and transverse momentum resolution in CMS”, Journal of Instrumentation 6 (Nov, 2011) P11002, doi:10.1088/1748-0221/6/11/p11002.

[59] CMS Collaboration, “Identification of b-Quark Jets with the CMS Experiment”, JINST 8 (2013) P04013, doi:10.1088/1748-0221/8/04/P04013, arXiv:1211.4462.

[60] CMS Collaboration, “Identification of b quark jets at the CMS Experiment in the LHC Run 2”, Technical Report CMS-PAS-BTV-15-001, CERN, Geneva, (2016).

[61] CMS Collaboration, “Btag Recommendation”. https://twiki.cern.ch/twiki/bin/viewauth/CMS/BtagRecommendation. [Accessed 30.10.2019].

[62] S. Schmitt, “TUnfold, an algorithm for correcting migration effects in high energy physics”, Journal of Instrumentation 7 (Oct, 2012) T10003, doi:10.1088/1748-0221/7/10/t10003.

[63] D. Schwarz. PhD thesis, University of Hamburg, in preparation.

[64] T. Dreyer, “First measurement of the jet mass in events with highly boosted top quarks and studies with top tagging at CMS”. PhD thesis, University of Hamburg, 2019.

[65] CMS Collaboration, “Measurement of differential cross sections for the production of top quark pairs and of additional jets in lepton+jets events from pp collisions at √s = 13 TeV”, Phys. Rev. D97 (2018), no. 11, 112003, doi:10.1103/PhysRevD.97.112003, arXiv:1803.08856.

[66] CMS Collaboration, “Measurement of the integrated and differential tt̄ production cross sections for high-pt top quarks in pp collisions at √s = 8 TeV”, Phys. Rev. D94 (2016), no. 7, 072002, doi:10.1103/PhysRevD.94.072002, arXiv:1605.00116.

[67] ATLAS Collaboration, “Measurements of tt̄ differential cross-sections of highly boosted top quarks decaying to all-hadronic final states in pp collisions at √s = 13 TeV using the ATLAS detector”, Phys. Rev. D98 (2018), no. 1, 012003, doi:10.1103/PhysRevD.98.012003, arXiv:1801.02052.

[68] ATLAS Collaboration, “Measurement of the differential cross-section of highly boosted top quarks as a function of their transverse momentum in √s = 8 TeV proton-proton collisions using the ATLAS detector”, Phys. Rev. D93 (2016), no. 3, 032009, doi:10.1103/PhysRevD.93.032009, arXiv:1510.03818.

[69] CMS Collaboration, “Measurement of the production cross section of a W boson in association with two b jets in pp collisions at √s = 8 TeV”, Eur. Phys. J. C77 (2017), no. 2, 92, doi:10.1140/epjc/s10052-016-4573-z, arXiv:1608.07561.

[70] CMS Collaboration, “Observation of the associated production of a single top quark and a W boson in pp collisions at √s = 8 TeV”, Phys. Rev. Lett. 112 (2014), no. 23, 231802, doi:10.1103/PhysRevLett.112.231802, arXiv:1401.2942.

[71] A. J. Larkoski, S. Marzani, G. Soyez et al., “Soft Drop”, JHEP 05 (2014) 146, doi:10.1007/JHEP05(2014)146, arXiv:1402.2657.

Statutory Declaration

I hereby declare that I have written the attached master's thesis independently and have used no aids other than those indicated. All passages taken from other works, either verbatim or in substance, have been clearly marked as borrowed in each individual case, with precise citation of the source. This also applies to all information taken from the internet or from other electronic data collections. I further declare that this master's thesis has not previously been part of a course or examination requirement during my studies, in the same or a similar version. The written version I have submitted corresponds to the one on the electronic storage medium. I agree to the publication of this master's thesis.

Place, date Jan Skottke

Acknowledgments

First of all, I would like to thank Prof. Dr. Johannes Haller for giving me the opportunity to write my master's thesis in his research group on such an interesting topic. I would also like to thank him for the many thought-provoking suggestions during the weekly group meetings and for the opportunity to attend several conferences. I am especially grateful to Dr. Roman Kogler, who not only acted as second reviewer of this thesis but also contributed numerous comments and ideas that moved the analysis forward. Furthermore, I thank Anna Benecke and, in particular, Dennis Schwarz for the great and fun time in the office, the excellent supervision of my work, the answers to many questions, and the proofreading of this thesis. Thank you! I also thank all other members of the working group for the very pleasant atmosphere and for answering many questions. In particular, I would like to thank Henrik Jabusch and Nino Ehlers, who started their master's theses at the same time as I did. Together we solved quite a few problems and bravely fought our way through this year! Finally, I would like to thank my family, especially my parents, and my friends, who have always supported me and believed in me.
