CERN-THESIS-2012-336

University of Trieste

Faculty of Mathematical, Physical and Natural Sciences Master of Science in Physics

Measurement of the associated production of a Z boson and hadronic jets with the CMS detector at LHC

Candidate: Tomo Umer

Supervisor: Dr. Giuseppe Della Ricca

Assistant supervisor: Dr. Fabio Cossutti

Academic Year 2011/2012 - Summer Session

To my best friend and my girlfriend, who are luckily the same person.

Contents

Introduzione

Introduction

1 The Physics of LHC and the CMS detector
   1.1 Large Hadron Collider
      1.1.1 A run-down of the accelerator
      1.1.2 Current LHC operational conditions
      1.1.3 Coordinate system and kinematic variables
   1.2 The Compact Muon Solenoid
      1.2.1 Physics goals
      1.2.2 Overview of the CMS detector
      1.2.3 Tracker
      1.2.4 Electromagnetic calorimeter
      1.2.5 Hadronic calorimeter
      1.2.6 Superconducting solenoidal magnet
      1.2.7 Muon detectors
      1.2.8 Trigger and Data acquisition system

2 Theoretical basis for the Z + jets production
   2.1 Basis of the Standard Model
      2.1.1 Electroweak interactions
      2.1.2 Strong interactions
      2.1.3 Description of a proton-proton collision
   2.2 Z + jets associated production
      2.2.1 Drell-Yan process
      2.2.2 Multijet production
      2.2.3 Study of the associated production of Z boson + jets at the LHC
   2.3 Jets

3 Monte Carlo Event Generators
   3.1 Introduction to Event Generators
   3.2 Matrix Elements based generators
   3.3 Parton Shower based generators
      3.3.1 Initial and Final State Radiation
   3.4 Combining Matrix Element and Parton Shower generators
      3.4.1 Merging
      3.4.2 Vetoed Parton Shower
   3.5 Pythia 6.4
   3.6 MadGraph 5
   3.7 MadGraph + Pythia
   3.8 Sherpa

4 Data - Monte Carlo Comparison and Analysis
   4.1 Jet Production Rates in Association with W and Z Bosons
   4.2 Rivet
      4.2.1 Rivet analyses
      4.2.2 Z + jets analysis in Rivet
   4.3 Estimation of Monte Carlo generators uncertainties
      4.3.1 Central predictions
      4.3.2 Different Pythia tunes
      4.3.3 Renormalization and factorization scales
      4.3.4 Parton density function choice

Conclusions

Bibliography

Introduzione

The study of the associated production of a Z0 boson with hadronic jets considered in this thesis is important as a test of perturbative quantum chromodynamics. Moreover, a precise measurement of the cross-section of the Z + n hadronic jets process is essential, since this process constitutes a significant background to other interesting Standard Model (SM) processes and to processes not included in the Standard Model. This analysis was performed at the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC), using the data collected in 2011. The main goal of this thesis is the study of the systematic uncertainties affecting the Monte Carlo (MC) generator predictions for the observables introduced in the Z + hadronic jets analysis. It is known that, with the calculations currently available at second order in perturbative quantum chromodynamics, the systematic uncertainties of the Monte Carlo predictions range between 10 and 30%. Such large variations are mainly due to uncertainties on the parton distributions and to the very nature of the perturbative calculations, which require the introduction of two unphysical parameters, the renormalization scale and the factorization scale. In practice, the work consists in considering different values of these parameters (the choice of parton distribution and the two scales) and then studying the resulting variations in the distributions of the observables of the analysis. Chapter 1 discusses the structure of the LHC and describes the CMS detector in detail. Chapter 2 summarises the theoretical basis of the Z + hadronic jets analysis. Chapter 3 discusses the theoretical and experimental aspects of the MC generators. Finally, Chapter 4 presents the work carried out for this thesis, starting with a brief description of the strategy of the Z + hadronic jets analysis, which is still under development. The chapter then introduces the program that allows the analysis to be run on the samples generated with the Monte Carlo programs. Finally, the choices of the different MC generators and of the related parameters are explained, together with the resulting plots.

Introduction

The analysis of the associated production of a Z0 boson with hadronic jets considered in this thesis provides a stringent test of perturbative quantum chromodynamics. In addition, a precise measurement of the Z + n hadronic jets cross-section is crucial since this process is a background for Standard Model (SM) and beyond-SM physics. This analysis was performed using the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) and the 2011 dataset. The goal of this thesis is the study of the uncertainties of the Monte Carlo (MC) generator predictions by making use of the above mentioned Z + hadronic jets analysis. In particular, it is known that the systematic uncertainties of the MC predictions range from 10% up to 30% with the currently available next-to-leading order calculations. This large range is mainly due to the uncertainties on the parton distribution functions (PDF) and to the nature of the perturbative calculations, which makes them dependent on the choice of the renormalization and factorization scales. Specifically, one has to consider different values for the aforementioned parameters (the PDF choice and the two scales) and study the results obtained for the observables provided by the Z + hadronic jets analysis. In Chapter 1 the structure of the LHC is described, together with a detailed look at the CMS detector. In Chapter 2 the theoretical background needed for the Z + hadronic jets analysis is explained. In Chapter 3 the theoretical and practical aspects of the MC generators are explored. Chapter 4 presents the core of the work, beginning with a brief description of the strategy of the still-developing Z + hadronic jets analysis. The tool that allows the implementation of the analysis on the MC generated samples is then presented. Lastly, the choices made for the different MC generators are explained, along with the resulting plots.

Chapter 1

The Physics of LHC and the CMS detector

1.1 Large Hadron Collider

High energy physics colliders can be classified according to various criteria, one of them being the type of particles that are collided: one then speaks of hadron colliders or lepton colliders. Due to the larger mass of hadrons with respect to leptons, the energy lost as the particles are bent along the ring (synchrotron radiation, a form of bremsstrahlung) is negligible. This means that, for a fixed radius of a circular collider, a higher centre-of-mass energy can be reached by accelerating hadrons rather than leptons. On the other hand, the composite structure of the hadrons is responsible for multiple interactions in a single proton-proton collision. Furthermore, the exact values of the parton momenta are not known a priori, but follow a probability distribution, which makes measurements more difficult.
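As a rough illustration (a back-of-the-envelope sketch, not taken from the thesis), the energy radiated per turn in a circular machine scales as 1/m^4 at fixed beam energy and bending radius, so the suppression for protons relative to electrons can be estimated as follows:

```python
# Synchrotron-radiation suppression for protons vs electrons at the same
# beam energy and bending radius: dE/turn ~ E^4 / (m^4 * rho).
# Illustrative estimate only; masses in MeV/c^2.
m_e = 0.511      # electron mass
m_p = 938.272    # proton mass

suppression = (m_p / m_e) ** 4
print(f"Proton energy loss is smaller by a factor of about {suppression:.2e}")
# -> roughly 1e13, which is why a hadron ring can reach much higher energies
```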

The Large Hadron Collider (LHC) is a storage ring used to accelerate and collide protons and heavy ions, built at the European Organization for Nuclear Research (CERN) laboratory, close to Geneva. The LHC uses the 26.7 km long circular tunnel dug between 1984 and 1989 for the CERN LEP collider [1]. It was designed to accelerate hadrons up to 7 TeV per beam with an instantaneous luminosity of 10^34 cm^-2 s^-1. The beam energy and the design luminosity have been chosen to study physics at the TeV scale. Specifically, compared to the previous hadron collider experiments there is a seven-fold increase in energy and a hundred-fold increase in integrated luminosity. The main goal of the LHC is to understand the mechanism of electroweak symmetry breaking, for which the Higgs mechanism is presumed to be responsible, thus completing the Standard Model (SM). A candidate for the Higgs boson has been found by two LHC experiments, the ATLAS and CMS collaborations. Other alternatives and extensions to the SM exist, for example supersymmetry, new quarks or new forces.

Figure 1.1: Aerial view of the LHC complex with the six experiments.

The energy range of the LHC should give us the possibility to probe and perhaps validate or discard these new physics scenarios.

To explore the physics at this scale, six experiments have been built along the LHC (Fig. 1.1):

• A Toroidal Lhc ApparatuS (ATLAS),
• Compact Muon Solenoid (CMS),
• Large Hadron Collider beauty (LHCb),
• A Large Ion Collider Experiment (ALICE),
• LHC-forward (LHCf),
• TOTal Elastic and diffractive cross-section Measurement (TOTEM).

In particular, along the beam pipe there are four interaction points at which the ATLAS, CMS, ALICE and LHCb experiments are set. The first two are the so called "general purpose" experiments, meaning that their physics goals are broad, with emphasis on studying the Higgs boson. The latter two are more specialised, with ALICE focusing on heavy-ion collisions and the quark-gluon plasma, while LHCb focuses in great detail on the study of beauty quark decays. The LHCf and TOTEM experiments are instead placed along the beam pipe beyond the interaction points and are focused on studying particles produced at very small angles with respect to the beam pipe.

Figure 1.2: The accelerator complex.

The aim of the LHCf experiment is to study the neutral-particle production in the very forward region, while TOTEM's physics program is dedicated to the precise measurement of the total proton-proton interaction cross-section, as well as to the in-depth study of the proton structure, which is still poorly understood.

1.1.1 A run-down of the accelerator

In this section the technical details of the whole accelerator complex are presented. The chain of accelerators leading up to the LHC is depicted in Fig. 1.2 and can be summarised as follows:

• Linac 2 [2] is a linear accelerator approximately 36 m long that provides a beam of protons (a bunch with a time interval between 20 and 150 µs) of 50 MeV energy, with a current of about 150-180 mA and a repetition rate of 1 Hz. Note that Linac 3 is used to produce heavy ions instead;

• the Proton Synchrotron Booster (simply "Booster" in Fig. 1.2) is a circular accelerator approximately 157 m long, which boosts the protons up to 1.4 GeV energy;

• the Proton Synchrotron (PS) has a bigger circumference of 628 m, increasing the energy of the protons up to 26 GeV;

• the Super Proton Synchrotron (SPS) has an even bigger circumference, 6.9 km long, and is used to accelerate the protons from 26 GeV to 450 GeV, before injecting them into the LHC.

A detailed description of the various sub-parts of the accelerators is beyond the scope of this thesis, whose focus is primarily the LHC itself. As already mentioned, in the LHC the protons are accelerated from 450 GeV to their final energy (4 TeV as of 2012 and ultimately 7 TeV as per design). The ring is made of 8 arcs, each approximately 3 km long, and 8 straight sections about 700 m long. In particle-antiparticle colliders (such as the Tevatron) the two beams set to collide can share the same phase space in a single ring. This is not the case for the LHC, where both beams contain identical particles with the same electric charge, which therefore cannot circulate simultaneously in a single ring in two opposite directions. For this reason the LHC has two rings, in which the beams counter-rotate thanks to twin-bore magnets with oppositely oriented magnetic fields. The magnets are composed of two sets of superconducting coils and beam channels sharing the same mechanical structure and cryostat. There are two different types of magnets along the ring. First, the 1232 dipole bending magnets, 15 m long, used to keep the beams in the right orbit along the circumference. The dipoles develop a field of 8.4 T perpendicular to the beam direction. This high value of the magnetic field is achieved by circulating the current in the superconducting regime, which is in turn achieved by cooling every single magnet to 1.9 K using superfluid helium as a coolant. The second type of magnets are the 392 quadrupole magnets, 5-7 m long, which focus the beams inside the beam-pipe. The next step is of course colliding the hadrons, which happens at the previously mentioned four interaction points, where the beams are collimated and focused by special magnets in order to increase the probability of interaction between the protons (Fig. 1.3).
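As a quick consistency check (an illustrative estimate, not part of the original text), the relation p ≈ 0.3 B ρ between momentum, magnetic field and bending radius shows why fields of this magnitude are needed to keep 7 TeV beams inside a ring of this size:

```python
# Bending radius required for a proton of momentum p in a field B,
# from p [GeV/c] ~= 0.3 * B [T] * rho [m]. Illustrative numbers from the text.
p_design = 7000.0   # design beam momentum, GeV/c
B_dipole = 8.4      # dipole field, T

rho = p_design / (0.3 * B_dipole)
print(f"Required bending radius: {rho:.0f} m")
# ~2.8 km, consistent with dipoles filling most of the 26.7 km circumference
```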

1.1.2 Current LHC operational conditions

The protons in the beams are clustered into bunches of approximately 10^11 protons per bunch. Currently each beam consists of 1374 bunches spaced by 50 ns, so the rate of bunch crossings is 20 MHz. As mentioned before, the energy per beam during the 2012 run is 4 TeV. A very important parameter for a collider is the instantaneous luminosity, defined as:

L = \frac{N_p^2 \, n_b \, f_{rev} \, \gamma}{4 \pi \, \epsilon_n \, \beta^*} \, F

where N_p is the number of particles per bunch, n_b the number of bunches per beam, f_rev the revolution frequency, γ the relativistic gamma factor, ε_n the normalized transverse beam emittance (describing the spread of particles in phase space) and β* the beta function at the collision point. F is the geometric luminosity reduction factor due to the beam crossing angle at the interaction point.

Figure 1.3: LHC schematized.

The value of the instantaneous luminosity can fluctuate during the machine operations. The LHC design luminosity is 10^34 cm^-2 s^-1, and it has increased from a maximum of 10^32 cm^-2 s^-1 in the 2010 run to the 2012 maximum value of about 6.5 × 10^33 cm^-2 s^-1, still rising (Fig. 1.4). The instantaneous luminosity is important because it has a direct impact on the number of interactions per bunch crossing. Specifically, the probability of having n interactions per bunch crossing follows the Poisson distribution

\mathcal{P}(n) = \frac{(L \cdot \sigma)^n}{n!} \, e^{-L \cdot \sigma}

where L is the luminosity per bunch crossing and σ is the total proton-proton cross section. The integrated luminosity, that is the integral of the instantaneous luminosity with respect to time, is used to quantify the amount of interactions produced during a data taking period. The total number of events of a certain process is then given by N_tot = σ_process · ∫L dt. It is evident that, at a fixed integrated luminosity, the number of events of a certain type is proportional to its cross-section. In Fig. 1.6 various processes and their cross-sections are shown. By far the dominant process is the minimum-bias one (uninteresting processes, mostly inelastic scattering), with σ_minbias ∼ 70 mb, while the cross-section for an interesting process, for example Z boson production, amounts to a few nb [3].

Figure 1.4: The values of instantaneous luminosity through 2012 as seen by the LHC.

Figure 1.5: The values of integrated luminosity as collected by the CMS experiment through the last three years of LHC operation.

The consequence is that for each interesting vertex in a bunch crossing there will be several minimum-bias ones, called pile-up. During the 2011 data taking period of the order of 20 pile-up events was measured (Fig. 1.7). In summary, the luminosity is an important parameter in particle physics that influences both the number of interesting events and the amount of pile-up gathered. During 2010 the LHC delivered a total of 47 pb^-1, during 2011 5.7 fb^-1 were collected, while in 2012 it is expected to deliver around 15 fb^-1 (Fig. 1.5). In Table 1.1 a few key parameters are shown.

Parameter                                  Design value   Current value
beam injection energy [TeV]                0.45           0.45
beam energy [TeV]                          7              4
number of bunches per beam                 2808           1374
beam envelope at IP 1 and 5 [m]            0.55           0.6
number of particles per bunch [10^11]      1.15           1.5
norm. transverse emittance [µm rad]        3.75           2.4
colliding beam size [µm]                   16             18
stored beam energy [MJ]                    362            110

Table 1.1: This table shows some key LHC parameters.
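As an illustrative cross-check (a sketch using the current values of Table 1.1; the geometric factor F and the revolution frequency are assumptions, not quoted in the thesis), the luminosity formula above and the corresponding average pile-up can be evaluated numerically:

```python
import math

# Illustrative evaluation of the luminosity formula with 2012 values from
# Table 1.1. The geometric reduction factor F is an assumed value.
N_p   = 1.5e11          # protons per bunch
n_b   = 1374            # bunches per beam
C     = 26.7e3          # LHC circumference [m]
f_rev = 299792458.0 / C # revolution frequency [Hz], ~11.2 kHz
gamma = 4000.0 / 0.938  # relativistic gamma factor for 4 TeV protons
eps_n = 2.4e-6          # normalized transverse emittance [m rad]
beta_star = 0.6         # beta function at the IP [m]
F     = 0.8             # assumed geometric reduction factor

L = N_p**2 * n_b * f_rev * gamma / (4 * math.pi * eps_n * beta_star) * F  # [m^-2 s^-1]
L_cm = L * 1e-4                                                           # [cm^-2 s^-1]
print(f"Instantaneous luminosity ~ {L_cm:.1e} cm^-2 s^-1")  # ~6e33, close to the 2012 peak

# Average pile-up per bunch crossing, using sigma_minbias ~ 70 mb = 7e-26 cm^2
sigma_mb = 70e-27  # cm^2
mu = L_cm * sigma_mb / (n_b * f_rev)
print(f"Average pile-up interactions per crossing ~ {mu:.0f}")  # a few tens
```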

1.1.3 Coordinate system and kinematic variables

All experiments positioned along the LHC use the same coordinate system and a few conventions for the kinematic variables generally used in the field of high energy physics. In the following, natural units will be used, meaning that c = ℏ = 1. Placing the origin at one of the interaction points, the z-axis lies tangent to the accelerator ring, while the x-axis and y-axis lie in the perpendicular plane.

Figure 1.6: Energy dependence of cross-sections of some interesting processes.

Figure 1.7: The CMS event display picture for a collected high pile-up event, where the different reconstructed primary vertices are clearly visible.

Figure 1.8: The relation between pseudorapidity and θ.

The positive side of the x-axis points to the center of the ring, while the positive side of the y-axis points upwards. The coordinate system is thus right-handed. It is customary to use polar coordinates instead, with the following variables:

• distance from the beam axis r = √(x² + y²);
• azimuthal angle φ = arctan(y/x);
• polar angle θ = arctan(r/z), measured from the z-axis.

However, since the polar angle is not invariant under Lorentz boosts, two further variables are defined (see Fig. 1.8):

• rapidity y = ½ ln((E + p_L)/(E − p_L));
• pseudorapidity η = −ln(tan(θ/2)).

In the approximation of massless particles, the two are numerically identical. The coordinates used to locate a particle in space around the interaction point are then (r, φ, η), where the latter two identify the direction of an outgoing particle. It is not possible to determine the exact parton momentum because of confinement (see Section 2). What is generally done is to consider the transverse momentum component of the partons to be negligible with respect to the longitudinal one. It then turns out useful to define a few kinematic variables as follows:

• particle transverse momentum p_T = p sin θ [GeV];
• particle transverse energy E_T = √(p_T² + m²) [GeV];
• missing transverse energy E_T^miss = −Σ_{i=1}^{n} p_T^i [GeV];

where i runs over all measured particles. Since the initial transverse energy of the particles is negligible, if it were possible to measure all outgoing particles the total final transverse momentum would be exactly zero because of momentum conservation. However, that is not the case: there are escaping weakly interacting particles (neutrinos), the geometrical acceptance of the detector is not perfect, and the reconstruction efficiency does not cover the whole 4π solid angle. This can result in large values of the missing transverse energy.
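As a small illustration (not part of the thesis), the kinematic variables above can be computed for an arbitrary example four-momentum; for a particle whose mass is small compared to its momentum, η and y come out nearly identical:

```python
import math

def kinematics(px, py, pz, E):
    """Return (pT, phi, theta, eta, y) for a four-momentum (illustrative helper)."""
    pT = math.hypot(px, py)
    phi = math.atan2(py, px)
    theta = math.atan2(pT, pz)
    eta = -math.log(math.tan(theta / 2.0))
    y = 0.5 * math.log((E + pz) / (E - pz))
    return pT, phi, theta, eta, y

# Example: a ~30 GeV pion-like particle (mass ~0.14 GeV) at a shallow angle
px, py, pz = 5.0, 3.0, 29.4
E = math.sqrt(px**2 + py**2 + pz**2 + 0.14**2)
pT, phi, theta, eta, y = kinematics(px, py, pz, E)
print(f"pT = {pT:.2f} GeV, phi = {phi:.2f}, eta = {eta:.3f}, y = {y:.3f}")
# eta and y agree to better than one part in a thousand for this nearly massless particle
```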

1.2 The Compact Muon Solenoid

With the design luminosity and a total cross-section of ∼100 mb, evaluated for √s = 14 TeV, the expected rate of proton-proton collision events measured by the experiments is around 10^9 events/s. This leads to several technical difficulties that must be overcome in order to be able to perform the measurements. Specifically, the storage of data cannot be faster than 100 events/s, which means the number of proton-proton collision events has to be reduced online by a factor of 10^7 via a selection process called trigger. Furthermore, the short time interval between bunch crossings, 25 ns by design, has important consequences for the design of the read-out and trigger systems. Overall, these operating conditions require careful planning when designing the detectors [4]. At design luminosity and centre-of-mass energy a single event can produce a number of particles of the order of tens or hundreds. Taking into account that for each bunch crossing there will be about 20 minimum bias events (this number increases with higher instantaneous luminosity) superimposed on the interesting one, this results in the production of thousands of particles (excluding neutrinos) every 25 ns that have to be detected. The challenge increases when taking into account that the response time of a detector and its electronics is greater than 25 ns (easily of the order of hundreds of ns). To compensate for this effect a high-granularity detector with good time resolution is needed, in order to distinguish the various particles as precisely as possible. Consequently, this means a large number of detector channels and the corresponding electronics to transport the signals. The resulting millions of detector channels require very good synchronisation and calibration. Furthermore, the large flux of particles coming from the interaction point leads to high radiation levels, especially in the forward region, requiring radiation-hard detectors and front-end electronics. In the following sections the structure of the CMS detector is described in detail.

1.2.1 Physics goals

The CMS experiment has been built in order to study the physics of the Standard Model in great detail, and to investigate theoretical predictions that go beyond the SM. Before moving to the detector structure, a list of the main physics goals follows.

Figure 1.9: Branching ratios of the main different decays of the Higgs boson.

Tests of the standard model

The LHC will continue to improve on several SM measurements (electroweak and QCD physics in particular) with higher precision: to name a few, top-quark physics, W and Z boson physics, and the physics of hadronic jets.

Higgs Boson search

The main focus of the CMS detector is on the study of the Higgs boson, the scalar particle responsible for giving mass to all particles through the mechanism of spontaneous breaking of the SU(2)_L ⊗ U(1)_Y electroweak symmetry. The production cross section of the Higgs boson is very small compared to the minimum bias cross section at the LHC. As for other short-lived particles, it can be searched for through its decay channels. Its branching ratios vs. its hypothesized mass are shown in Fig. 1.9. Both the CMS and ATLAS collaborations have done a tremendous amount of work in trying to determine its existence. In the latest update both experiments have seen an excess of 5σ with respect to a Higgs-less model, at a mass of about 125 GeV, in the two main channels: H → γγ and H → VV → 4l, where V indicates both W and Z, and l stands for lepton (Fig. 1.10).

In order to determine whether this is the long awaited Higgs boson, further studies must be made. To do so, it is essential to understand the backgrounds to these processes, one of them being the topic of this thesis: the Z + jets production.

Figure 1.10: A candidate event for the Higgs to ZZ decay with invariant mass of 126.9 GeV.

1.2.2 Overview of the CMS detector

The CMS detector has a cylindrical shape, measuring 21.6 m in length, with a diameter of 14.6 m and a weight of 12500 t (Fig. 1.11). The structure of the detector is in principle very similar to other particle physics detectors: a cylinder constructed along the beam pipe at the interaction point, with various sub-parts needed for the detection of specific particles. It can be divided into two separate parts: a central region, called the barrel for its shape, and two lateral regions called endcaps, forming the bases of the cylinder. This structure provides very high detector hermeticity. It should be noted that the sub-detectors installed on the endcaps have lower granularity than the ones in the barrel, but are more resistant to radiation.

Figure 1.11: An expanded 3D view of the CMS detector. 16 The Physics of LHC and the CMS detector

Figure 1.12: A cross cut of the CMS detector with subdetectors.

This is consistent with the physics goals, since most of the interesting events happen at higher pT values, covered by the barrel acceptance. As the name implies, the structure of CMS is very compact and uses a large magnetic field, produced by a large-bore superconducting solenoid magnet, for bending and detecting muons and other charged particles. Enclosed in the magnet structure are an all-silicon pixel and strip detector, a lead-tungstate scintillating-crystal electromagnetic calorimeter (ECAL) and a brass-scintillator sampling hadron calorimeter (HCAL). Outside the HCAL lie the four stations of the muon detectors, covering most of the 4π solid angle and serving also as the flux return yoke. Forward sampling calorimeters extend the pseudorapidity coverage to high values (|η| ≲ 5). From the inside out, the structure of the detector can be summarised as follows (Fig. 1.12):

• Tracker: positioned at a radius r < 1.2 m and |η| < 2.5, it consists of a pixel vertex detector surrounded by the Silicon Strip Tracker (SST), used to reconstruct charged particle tracks and locate primary and secondary vertices. It provides high charged-particle momentum resolution and reconstruction efficiency, efficient triggering and offline tagging of b-jets, and is the closest detector to the interaction region;

• ECAL: positioned at a radius 1.2 < r < 1.8 m and |η| < 3. The Electromagnetic CALorimeter is made of lead tungstate (PbWO4) scintillating crystals and a forward preshower detector. It measures electron and photon energies. It has a high electromagnetic energy resolution, good diphoton and dielectron mass resolution (≃ 1% at 100 GeV), wide geometric coverage, π0 rejection and efficient photon and lepton isolation at high luminosities;

• HCAL: positioned at a radius 1.8 < r < 2.9 m and |η| < 5. The Hadron CALorimeter's function is to measure the jet position and transverse energy. Similarly to the ECAL, it is extended in the forward region 3 < |η| < 5 with a very forward hadron calorimeter (HF). It provides good missing-transverse-energy and dijet-mass resolution;

• Magnet: positioned at a radius 2.9 < r < 3.8 m and |η| < 1.5, it is large enough to accommodate the calorimeters and the inner tracker, providing a longitudinal magnetic field of 4 T. Its structure is a superconducting solenoid;

• Muon system: positioned at a radius 4 < r < 7.4 m and |η| < 2.4, embedded in the magnet return yoke, it is used to detect and reconstruct muon tracks and is composed of Drift Tubes (DT) in the barrel, Cathode Strip Chambers (CSC) in the endcaps, complemented overall by Resistive Plate Chambers (RPC) in the region |η| < 2.1. They yield high muon identification efficiency and momentum resolution over a wide range of momenta and angles, good dimuon mass resolution (≃ 1% at 100 GeV) and the ability to determine unambiguously the electric charge of muons with momentum p < 1 TeV. The RPCs are mainly used for triggering.

A more detailed description of the detector sub-parts follows below, focusing on their role in the detection of the various particles. It should be noted that the structure of the detector was strongly influenced by the choice of the magnet, which was selected in order to have a large bending power, enabling precise measurement of high-momentum muons and other charged particles. For the sake of clarity, the following description is in the same order as above.

1.2.3 Tracker

The silicon tracker is the innermost detector of CMS and serves a large number of purposes [5][6]. It is situated in the region |η| < 2.5, r < 1.2 m and |z| < 2.7 m and is made of silicon semiconductor detectors, effectively being the largest Si detector ever designed, with a surface of 198 m². As mentioned before, the function of the tracker is to reconstruct vertices and charged particle tracks. It is known that the charged track density around the interaction point is, in the absence of a magnetic field, proportional to the inverse of the squared distance. Under the 4 T magnetic field produced by the solenoid, the density decreases more gradually than 1/r near the origin, while farther away in the radial direction it decreases faster than predicted by 1/r. To work in the LHC regime of high pile-up the tracker must have two properties: low cell occupancy and large hit redundancy.

Figure 1.13: A visualization of the CMS tracking detector.

Furthermore, the material of the tracker should not degrade the particle energies too much, because that would hamper the functioning of the ECAL, in particular the detection of the Higgs boson in the channel H → γγ. This last requirement limits the total quantity and type of active material in the detector, and the layout of the output cables. It makes sense then to structure the Si detector into an inner silicon pixel detector, surrounded by several layers of silicon microstrip detectors of different size and pitch between the strips (Fig. 1.13). To obtain the low occupancy, the chosen detectors have high granularity (especially the ones closer to the interaction point, because they receive all the direct radiation hits) and fast primary charge collection (thin detectors and overdepleted silicon bulks).

Silicon pixel detector

The silicon pixel detector is the closest to the interaction point, with r < 10 cm, and similarly to the CMS detector it has a cylindrical shape, with three barrel layers and two disks for each endcap. The layers are composed of modular detector units, each one consisting of a thin silicon sensor segmented into n+ pixels on an n-type substrate. Since the main goal of the pixel system is to reconstruct the positions of the charged particle tracks, both the rφ and z coordinates have to be measured as precisely as possible. Choosing square pixel shapes optimizes the resolution in both coordinates simultaneously. In this region the particle flux reaches rates up to 10^7 cm^-2 s^-1, requiring high granularity and spatial resolution in order to distinguish the main vertices from pile-up ones. The pixel size was chosen to be 100 µm × 150 µm, and the detector barrel thickness varies between 200 and 250 µm.

Figure 1.14: The CMS microstrip silicon tracker.

Microstrip silicon tracker

The silicon tracker is situated at a radius 20 < r < 60 cm. The modules in the barrel follow a cylindrical structure, while in the endcaps they are arranged as disks and supported by carbon fibre wheels (Fig. 1.14). The silicon tracker can be further divided into four regions:

• The Tracker Inner Barrel (TIB) is the innermost part of the detector, with |z| < 65 cm, and is composed of four layers of micro-strips. Their width varies between 80 and 120 µm, while the ratio of strip pitch to strip width remains equal to 0.25 for all strip types [7]. The obtained resolution is between 23 and 34 µm in (rφ) and around 230 µm along the z-axis;

• the Tracker Outer Barrel (TOB), as the name implies, is the external part of the barrel, covering the same length along the z-axis as the TIB. It is made of six layers of micro-strips, which are not the same as in the TIB. The resolution varies between 35 and 52 µm in (rφ), while being around 530 µm along the z-axis;

• the last two components are similar, being the Tracker EndCaps (TEC) and the Tracker Inner Disks (TID). Their function is to complement the detector barrel at both ends, lying in the plane perpendicular to the beam. Each endcap is composed of 10 disks that extend the pseudorapidity coverage to the region 1.2 ≤ |η| ≤ 2.5. Beyond |η| ≈ 2.5 the radiation level and track density become too high for the silicon micro-strip detectors.

There are about 5.4 × 10^6 channels in the CMS silicon tracker, equally distributed between the barrel and endcaps, with a total active surface close to 75 m², covering overall |η| < 2.4. In order to operate with the large luminosity available at the LHC, the associated electronics must have a very fast response time.

Tracking

As already stated, the main goal of the tracker is to reconstruct charged particle tracks and vertices; this process will now be discussed in detail. The main algorithm implemented by CMS to reconstruct the tracks is the Kalman Filter [8], which is a recursive formulation of the least-squares method of fitting a set of measurements to a track model [9]. The Combinatorial Kalman Filter (CKF) is a recursive algorithm, starting from an initial trajectory estimate and combining pattern recognition and track fitting. At best one could expect 16 hits from the tracker in total:

• 3 hits in the pixel tracker, with 3 coordinates each and a spatial resolution of 10 µm;

• 14 hits in the micro-strip detector, with 2 coordinates each and a spatial resolution of 30 µm.

In CMS, the track reconstruction is done using the information provided by the tracker to evaluate all possible combinations of hits and afterwards determine the set of hits with the highest probability of constituting a track. The initial estimate of the trajectory, called seed, consists of a track segment made using a set of compatible signals in the pixel layers close to the beam pipe. The CKF algorithm then proceeds by alternating propagation and update steps. The propagation step consists in projecting the current trajectory to the next outward tracker layer, while propagating the covariance matrix and taking into account energy loss and multiple scattering. In the update step, the propagated trajectory at the next layer is combined with the observation. This procedure generates multiple possible tracks (trees) starting from each seed. The number of possible trajectories is limited by the number of missing hits and the value of their χ². Since the track is reconstructed step by step, only the last layer has the preceding information of the whole track candidate, while all the others have only partial information. Because of this partial information, the remaining track candidates are passed through a Kalman smoothing algorithm to obtain the best possible estimates at each detector layer.
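To make the propagate/update alternation concrete, here is a minimal one-dimensional Kalman filter sketch (a generic textbook illustration, not the CMS CKF implementation): the state is propagated together with its covariance, then combined with the next measurement weighted by the Kalman gain.

```python
# Minimal 1D Kalman filter: illustrates the predict (propagate) and update
# steps described above. Purely pedagogical; the CMS CKF works on full
# 5-parameter track states and includes material effects.
def predict(x, P, Q):
    """Propagate state x and covariance P to the next layer (identity model),
    inflating P by the process noise Q (e.g. multiple scattering)."""
    return x, P + Q

def update(x, P, z, R):
    """Combine the predicted state with a measurement z of variance R."""
    K = P / (P + R)          # Kalman gain
    x_new = x + K * (z - x)  # weighted average of prediction and measurement
    P_new = (1 - K) * P      # reduced uncertainty after the update
    return x_new, P_new

# Toy example: noisy position measurements of a straight track at 0.0
measurements = [0.12, -0.05, 0.03, 0.08, -0.02]
x, P = measurements[0], 1.0   # seed from the first hit, large uncertainty
for z in measurements[1:]:
    x, P = predict(x, P, Q=0.01)
    x, P = update(x, P, z, R=0.04)
print(f"fitted position = {x:.3f}, uncertainty = {P**0.5:.3f}")
```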

1.2.4 Electromagnetic calorimeter

The function of the electromagnetic calorimeter is to precisely measure the energy of electrons and photons, making it an important part of the overall CMS detector for a variety of physics goals. In particular, the design of the CMS ECAL has been established in view of the possible observation of a light Higgs boson (mH < 140 GeV) decaying into a pair of photons. In this light mass range the intrinsic Higgs width ΓH is less than 100 MeV, so the invariant γγ mass resolution is dominated by the experimental resolution, which should be at most of the order of 1% to enhance a possible signal in this channel.

Figure 1.15: A schematic view of the CMS electromagnetic calorimeter.

Similarly to the other detectors, the ECAL structure is divided into a barrel region |η| < 1.48 (EB) and a forward region covering up to |η| < 3.0 (EE). ECAL is a homogeneous and hermetic electromagnetic calorimeter with finely segmented crystals of lead tungstate (PbWO4). This material is a radiation-resistant, chemically inert scintillator. It has all the necessary characteristics to operate in the LHC environment of high luminosity and energy. Another requirement posed by the LHC is due to the 25 ns time interval between bunch crossings. Lead tungstate has a short scintillation decay time, τ = 10 ns, which allows collecting 85% of the light during the time interval between collisions. On the same note, the Molière radius is sufficiently small, 21.9 mm, and the radiation length is X0 = 8.9 mm, enabling the calorimeter to have a compact structure. The barrel is composed of 61200 crystals, with granularity 360-fold in φ and (2 × 85)-fold in η. The ECAL has two endcaps, each divided into two halves called Dees. Each Dee contains 3662 crystals (Fig. 1.15).

ECAL structure

As mentioned before, the whole of ECAL is composed of lead tungstate crystals (Fig. 1.16). They have a trapezoidal lateral shape with squared front faces and are slightly different in the two ECAL regions. In the barrel, their length amounts to 230 mm, corresponding to X = 25.8 X0 radiation lengths (Fig. 1.17). The surface of the front face is 22 × 22 mm² and the granularity, Δη × Δφ = 0.0175 × 0.0175, is designed to be high enough to efficiently distinguish photons from neutral pions. In the endcaps the crystals are shorter, measuring 220 mm, with a consequently smaller depth in radiation lengths, X = 24.7 X0. They also have a larger front section, 28.8 × 26.8 mm², and a larger granularity for higher values of |η|.

Figure 1.16: A lead tungstate crystal.

Figure 1.17: A simulation of the shower developing inside the crystal.

The largest granularity, Δη × Δφ = 0.05 × 0.05, is reached in the forward-most crystals [12]. Another important aspect is the crystal light yield, which depends heavily on the temperature. For this reason a cooling system is used to dissipate the heat produced by the electronics and keep the detector at a temperature of about 18 °C. The transparency of the crystals also degrades with the amount of radiation absorbed. Since the measured energy is proportional to the total scintillation light produced by the showering particle, transparency stability is critical for this type of measurement. Therefore, the transparency is constantly measured via a laser calibration system during data taking. In nominal conditions the PbWO4 crystals have good transparency (around 70%) in the main emission band of the scintillation light (420-430 nm).

Figure 1.18: The CMS ECAL barrel from the inside.

Figure 1.19: The CMS ECAL barrel from the outside.

The ECAL barrel (Fig. 1.18) has a modular structure. The crystals are contained in a thin-walled (0.1 mm) alveolar structure (submodule). Each submodule contains 5 × 2 crystals, each 0.35 mm apart from the next. The alveolar wall is made of an aluminium layer, facing the crystals, and two layers of glass fibre-epoxy resin. To prevent aluminium oxidation, a special coating is applied to it.

Figure 1.20: Half of the ECAL endcap, or dee.

The submodules are then assembled together, with a 0.5 mm distance between them, to form the larger structure of a module. The submodules are held in partial cantilever by an aluminium grid, which supports their weight from the rear. The modules are of different types, depending on their position in η, each containing from 400 to 500 crystals. Lastly, four modules, separated by 4 mm thick aluminium conical webs, are assembled into a supermodule, containing 1700 crystals each. The entire CMS ECAL barrel is composed of 36 supermodules. The crystals are tilted at an angle of 3 degrees with respect to the nominal interaction point, both in φ and η, to prevent particle trajectories from coinciding with the cracks in the detector.

The two ECAL endcaps are constructed from four half-disks, or dees (Fig. 1.20), each consisting of 3662 crystals. Each dee contains 138 standard 5 × 5 supercrystal units and 18 specially shaped supercrystals located at the inner and outer radii. The endcaps are located at a distance of 1.3 m from the nominal interaction point along the beam line, and the crystals composing them are tilted by an angle between 2 and 8 degrees (for the same reason as in the barrel). Furthermore, a preshower detector is placed on the front face of the endcap disks. The ECAL preshower detector is a sampling calorimeter consisting of multiple sections of two layers of silicon strip detectors and two disks of lead absorber. Its function is to identify neutral pions in the forward region, improve the electron discrimination against minimum ionizing particles and improve the determination of the spatial position of electrons and photons. The detector acceptance is 1.653 < |η| < 2.6, and its length is about two radiation lengths.

Figure 1.21: The CMS HCAL just before being inserted.

Energy resolution

The electromagnetic energy resolution of ECAL can be parametrised as a function of the incident electron (photon) energy E [GeV] as:

\frac{\sigma_E}{E} = \frac{a}{\sqrt{E}} + \frac{b}{E} + c

where:

• a represents the stochastic term, due to the event-to-event fluctuations in lateral shower containment, photo-statistics and photodetector gain;

• b represents the noise term, depending on the level of electronic noise and event pile-up;

• c represents a constant term, which depends on the non-uniformity of the longitudinal light collection, the leakage of energy from the rear face of the crystal and the accuracy of the detector inter-calibration constants.

The constant term dominates the resolution at high energies and is expected to be 0.3% in the barrel and 1.0% in the endcaps. Tests made with beam electrons with energies between 20 and 250 GeV have shown that the electromagnetic energy resolution and noise performance of the ECAL meet the design goals of the detector [11], [12].
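As a numerical illustration (the values of a and b below are assumed, barrel-like illustrative numbers; only the constant term c = 0.3% is quoted in the text), the parametrisation above can be evaluated to see how the stochastic and noise terms fade with energy:

```python
import math

def ecal_resolution(E, a=0.028, b=0.12, c=0.003):
    """Relative ECAL energy resolution sigma_E/E for energy E in GeV.
    a and b are assumed illustrative values; c = 0.3% is the barrel
    constant term quoted in the text."""
    return a / math.sqrt(E) + b / E + c

for E in (10, 50, 120, 250):
    print(f"E = {E:4d} GeV  ->  sigma/E = {100 * ecal_resolution(E):.2f}%")
# At high energy the constant term c becomes the dominant contribution,
# as stated above.
```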

1.2.5 Hadronic calorimeter

The hadronic calorimeter (HCAL), depicted in Fig. 1.21, has a function similar to that of ECAL, measuring the energy and direction of jets, their transverse energy E_T and the missing transverse energy E_T^miss. Since it has to contain hadronic showers, it has to be larger than ECAL and, similarly, highly hermetic. One limitation derives from its position inside the magnet, which rules out the use of a ferromagnetic absorber material. Similarities with ECAL continue when looking at the body of the HCAL: it is composed of a central cylindrical region, called the barrel (HB), and two endcaps (HE) forming the bases of the cylinder. This part of the calorimeter covers the range up to |η| ≤ 3 and is positioned as the last layer of detectors before the solenoidal magnet. There are two additional calorimeters, one being the Outer Hadronic Calorimeter (HO) and the other the Forward Calorimeter (HF). The HO is placed just outside the magnet (described in the next section), to ensure enough sampling depth in the barrel region, while the HF is placed at a distance of 11.2 m from the interaction point, extending the pseudorapidity coverage up to |η| ≤ 5.2. HCAL is a sampling calorimeter, which means that it is made of alternating layers of absorber (for the particle to develop the hadronic shower) and scintillator (to detect and measure the light from the shower and consequently the energy).

The HB is made of 5 cm thick layers of brass plate absorber alternated with 3.7 mm thick active layers of plastic scintillator. The number of layers varies throughout the detector, but is always sufficient to ensure between 5 and 10 interaction lengths. The granularity, Δη × Δφ = 0.087 × 0.087, is fine enough to allow for an efficient di-jet separation. Furthermore, it is possible to combine the information coming from the ECAL trigger towers (5 × 5 crystals), thanks to the same granularity being available in the HB, to obtain the so called calotower (calorimeter tower). The HE uses the same brass absorber as the HB, with a thickness of each layer equal to 7.9 cm. Between the absorber layers, 70 000 plastic scintillator plates are used. The granularity differs in two spatial regions: for |η| < 1.6 it is 0.087 × 0.087, while for |η| > 1.6 it is 0.17 × 0.17. The HO takes advantage of the solenoid coil, using it as an additional absorber equivalent to 1.4/sin θ interaction lengths, and consists of two scintillator layers with the same granularity as the HB. Finally, the HF had to be made out of different materials, due to the amount of radiation in the forward region (on average, 760 GeV per proton-proton collision is deposited in the HF, compared to 100 GeV for the rest of the detector regions). The two HF calorimeters use quartz fibres as active material, embedded in a 5 mm steel absorber. The output signal of the fibres is Cherenkov light, which is collected by photomultipliers.

1.2.6 Superconducting solenoidal magnet

The magnet has been chosen to have enough bending power to measure the curvature of charged particles. The CMS collaboration opted for a single magnet with a large magnetic field, resulting in a more compact overall design of the detector.

Figure 1.22: The solenoidal magnet.

Figure 1.23: A depiction of the solenoidal magnet.

The solenoid measures 12.5 m in length, with an inner diameter of 5.9 m, and generates a uniform magnetic field of 4 T; it is shown in Fig. 1.22 and 1.23. This makes it the largest and most powerful superconducting solenoidal magnet ever constructed. A good dimensional ratio (length/radius) of the solenoid, accompanied by a high magnetic field, provides the necessary bending power for precision charged-particle tracking and efficient muon detection and measurement up to |η| < 2.5. Using a single magnet has the added bonus of simplifying the design of the muon detectors. As previously explained, the solenoid contains three other parts of the detector: the Tracker, ECAL and HCAL. The magnetic flux is returned via a 1.8 m thick saturated iron yoke (with a return field of 1.8 T) instrumented with four layers of muon chambers. The magnet conductor is divided into three parts: a central flat superconducting cable, a high purity aluminium stabiliser and an external aluminium-alloy reinforcing sheath. The superconducting cable is of the Rutherford type, with 40 NbTi strands surrounded by pure Al as a thermal stabilizer, and is kept at a temperature of 1.9 K by a liquid helium cryogenic system. The current circulating in the magnet is around 20 kA, with a stored energy of 2.7 GJ.
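To get a feeling for these numbers (an illustrative calculation, not taken from the thesis; the lever arm is an assumed tracker-sized value), one can compute the bending radius of a high-pT muon inside the 4 T field and its sagitta, which shows why very stiff tracks look almost straight:

```python
# Bending radius r [m] = pT [GeV/c] / (0.3 * B [T]) and sagitta over a lever
# arm L: s ~ L^2 / (8 r). Illustrative numbers only.
B = 4.0        # solenoid field, T
pT = 200.0     # muon transverse momentum, GeV/c
L = 1.2        # approximate tracker lever arm, m (assumed)

r = pT / (0.3 * B)
sagitta = L**2 / (8.0 * r)
print(f"bending radius ~ {r:.0f} m, sagitta over {L} m ~ {sagitta * 1e6:.0f} micrometers")
# ~167 m radius and ~1 mm of sagitta: very precise hit positions are needed
# to measure the curvature of such stiff tracks.
```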

1.2.7 Muon detectors

Muons with transverse momentum higher than 200 GeV/c are very difficult to measure with the tracker alone, because their trajectory is essentially rectilinear and the tracker is therefore unable to extract the needed curvature/radius information with the desired precision. The solution to this problem is to construct an additional dedicated detector to track muons, making use of the 1.8 T return field outside the solenoid magnet (Fig. 1.24). To make full use of the magnetic return flux, the muon system is placed outside the magnet, embedded in the return yoke. The main goal of this system is of course to identify muons and, together with the information from the tracker, to measure their transverse momentum. High transverse momentum muons are a key, clean signature of multiple interesting physics processes, making the muon detectors a crucial part of the CMS trigger system. The muon system is composed essentially of three different independent sub-systems, determined as usual by the physics.

Figure 1.24: The CMS muon system.

The track occupancy in the forward region is > 100 Hz/cm², much higher than the < 10 Hz/cm² expected in the central region; the residual magnetic field is also higher in the forward region. For these two reasons, drift tube detectors (DT) are installed in the central region, while cathode strip chambers (CSC) are installed in the forward region. These two systems cover the region |η| < 2.4 and have a multi-layer structure in order to efficiently reject single hits produced by short-range particles. To complement the two subsystems, in the region |η| < 2.1 redundancy is provided by the resistive plate chambers (RPC), which have a limited spatial resolution but, on the other hand, a faster response and an excellent time resolution, better than 3 ns. They can also be finely segmented because they do not need a costly read-out system.

• Drift tubes are composed of parallel aluminium plates that are insulated from the perpendicular "I"-shaped aluminium cathodes by a polycarbonate plastic profile. The anodes are 50 µm diameter stainless steel wires placed between the cathodes. The internal volume is filled with a mixture of 80% Ar and 20% CO2 at atmospheric pressure. This gas was chosen because it is non-flammable and can be safely operated underground in large quantities, as required in CMS. The resolution given by the drift tubes is ∼100 µm both in rφ and z.

• Cathode strip chambers are composed of two perpendicular cathode planes, segmented into strips. Parallel to the strips, and between the planes, is located an array of anode wires.

Figure 1.25: The CMS data acquisition flow with the two triggers.

The gaps are filled with a mixture of 30% Ar, 50% CO2 and 20% CF4. The φ coordinate can be measured with a precision of 50 µm, thanks to the interpolation of the signals of neighbouring strips.

• Resistive plate chambers are made of planes of a phenolic resin (bakelite) with a bulk resistivity of 10^10-10^11 Ω cm, separated from aluminium strips by an insulating film. The gaps in between are filled with a non-flammable gas, obtained as a mixture of 94.5% freon (C2H2F4) and 4.5% isobutane (C4H10); the chambers operate in avalanche mode to sustain high rates.

1.2.8 Trigger and Data acquisition system

For the CMS experiment, a 2-level trigger system has been chosen: the first level trigger (L1) and the high level trigger (HLT). The amount of data taken has to be reduced to a rate that is manageable in terms of the time needed to write the information to storage [13][14]. The data flow is presented in Fig. 1.25.

Level-1 trigger

This trigger runs on dedicated processors and accesses coarse-granularity information from the calorimetry and muon systems. The task of the L1 is to quickly discriminate uninteresting events, within about 3.2 µs for each bunch crossing. It has to reduce the data flow from 40 MHz to 100 kHz. The Level-1 trigger selects events on the basis of calorimetry and muon system information, identifying electrons, muons, photons, jets and missing transverse energy (Fig. 1.26). This selection is processed by fast hardware systems.

Figure 1.26: The internal structure of the level-1 trigger.

The two systems, calorimetry and muon system, are analysed independently, and the information is then combined to produce the output sent to the Data Acquisition System. During the time it takes the L1 trigger to accept or reject an event, the data are temporarily put into a pipeline, waiting to be saved or deleted. Because of the short time requirement, the decisions made by L1 are simple and only based on a few logic operations. For example, it can search for objects like electrons, photons and muons, or require a minimum value of missing transverse energy. Only partial information is used when making this selection. As an example, the ECAL information is not used at full granularity but only at the trigger-tower level.
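As a back-of-the-envelope illustration (the input numbers are those quoted in the text; the arithmetic itself is not part of the thesis), the L1 latency and input rate fix how many bunch crossings must be buffered in the pipeline and how large the rejection factor has to be:

```python
# L1 trigger bookkeeping from the quoted numbers: 40 MHz input, 100 kHz output,
# 3.2 us decision latency. Illustrative arithmetic only.
input_rate = 40e6      # bunch-crossing rate seen by L1 [Hz]
output_rate = 100e3    # maximum L1 accept rate [Hz]
latency = 3.2e-6       # L1 decision time [s]

pipeline_depth = input_rate * latency   # crossings stored while L1 decides
rejection = input_rate / output_rate    # events discarded per event kept
print(f"pipeline depth ~ {pipeline_depth:.0f} bunch crossings")  # ~128
print(f"L1 rejection factor ~ {rejection:.0f}")                  # ~400
```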

High Level Trigger

The high level trigger's function is to reduce the data flow from the 100 kHz remaining after the L1 selection to a manageable 100 Hz. The HLT runs completely on commercial processors and performs reconstruction using the information from all sub-detectors. The data from the subdetectors are collected by a builder unit that assigns them to a switching network, which dispatches events to the processor farm. The CMS switching network has a bandwidth of 1 Tbit/s. This simple design has both upsides and downsides. On one hand, the limitation lies in the maximum bandwidth available and the number of processors. On the other hand, it allows easy upgrading or replacement of the processors with newer and faster ones once they are available. Furthermore, the algorithm implementation is purely software, making it easy to modify without having to touch the hardware. The HLT is run event-by-event on a single processor and the time available is of the order of 300 ms. This means that it is not possible to run a complex analysis program on each event. To overcome this obstacle, the HLT is virtually divided into three levels. Most (uninteresting) events fail a few simple requirements, while events that seem interesting are processed using more information:

• Level 2: uses only muon and calorimetry information;

• Level 2.5: additionally uses the pixel information;

• Level 3: makes use of the full information from all the tracking detectors.

The rate of events is then reduced by a factor 10^3 with respect to the L1 trigger. These 100 events/s are written to mass storage, saturating the CMS bandwidth capabilities.

Chapter 2

Theoretical basis for the Z + jets production

In this chapter the basic processes involving the production of a Z boson and accompanying hadronic jets will be discussed. The logical starting point is the underlying theory, the Standard Model (SM) of electroweak and strong interactions, with its fundamental building blocks and SU(3)_C ⊗ SU(2)_L ⊗ U(1)_Y gauge symmetry. After that, the description of the associated production of Z bosons with hadronic jets in proton-proton collisions will be outlined. The study of this process is interesting for several reasons. Firstly, the production of the Z boson is, at tree level, governed by the electroweak coupling constant and provides a test of the SM physics and of the parton model for "hard" hadronic interactions. Furthermore, the largest contributions to the Z production cross-section arising at next-to-leading order (NLO) in QCD come from virtual or real gluon radiation. This results in a complex signature with hadronic jets in addition to the decay products of the Z boson in the final state. Moreover, the final states of the Z with jets serve as background to some predicted decay states of the Higgs boson. For this reason, a section of this chapter is dedicated to briefly describing the Higgs mechanism. The production of Z with jets may also contribute to the background for searches of physics beyond the SM, but this is outside the scope of this thesis.

2.1 Basis of the Standard Model

The fundamental assumption of the Standard Model is that matter is constituted of spin-1/2 fermions, subdivided into leptons and quarks, and their corresponding anti-particles [15]. The leptons and quarks are arranged into three different generations, as in Fig. 2.1. Denoting the quarks in general by q and the anti-quarks by q̄, matter is composed of leptons (elementary particles) and hadrons, which are further separated into mesons (structured as qq̄) and baryons (structured as qqq).

Figure 2.1: The elementary particles of the SM. Note that the Higgs boson is missing in this representation.

The elementary particles interact with each other via the three fundamental forces, which are mediated by spin-1 gauge bosons (Fig. 2.1):

• Electromagnetic force, mediated by the photon γ;

• Weak force, mediated by the bosons Z0 and W±;

• Strong force, mediated by the gluons g.

This already shows that the SM cannot constitute a complete theory of everything, since it does not include a (spin-2) graviton responsible for the gravitational interaction. On the other hand, the strength of the gravitational coupling is completely negligible with respect to the other forces, so a description without gravitation remains in agreement with the experimental data within the statistical uncertainties. As a quantitative comparison, the relative strengths of the four fundamental forces, calculated for two protons with a momentum transfer between the charges equal to 1 GeV, are shown in Tab. 2.1. In Fig. 2.2 the interactions between the SM particles are shown.

Interaction          strong   electromagnetic   weak          gravity
Relative strength    1        1.4 × 10^-2       2.2 × 10^-6   1.2 × 10^-38

Table 2.1: The relative strength of the four fundamental interactions. The values were calculated for two protons, with a momentum transfer between the charges equal to 1 GeV.

Figure 2.2: The interactions of the SM particles. The lines show the possible elementary vertices of interaction.

As mentioned in the introduction to this chapter, the elementary particles interact with each other via a Lagrangian invariant under SU(3)_C ⊗ SU(2)_L ⊗ U(1)_Y. The invariance of the Lagrangian under the various sub-parts of the symmetry, SU(2)_L ⊗ U(1)_Y and SU(3)_C, results in the manifestation of the different forces, which will be explained in the following sections. However, since the fermion and boson masses are not included, the Higgs mechanism needs to be invoked, and the resulting Standard Model Lagrangian is

\mathcal{L} = -\frac{1}{4} F_{\mu\nu} F^{\mu\nu} + \bar{\Psi}(i\slashed{D} - m)\Psi - \frac{1}{4} G^{a}_{\mu\nu} G^{\mu\nu}_{a} + \sum_{j,k} \bar{q}_j (i\slashed{D}_{jk} - M_{jk}) q_k + (D_\mu\Phi)^\dagger D^\mu\Phi - V(\Phi) \, ,

where the first term (F_{\mu\nu}F^{\mu\nu}) represents the photon interactions, the second term (\bar{\Psi}(i\slashed{D} - m)\Psi) is the term for the Dirac fermions and their interactions with the W and Z bosons, the third term (G^{a}_{\mu\nu}G^{\mu\nu}_{a}) represents the interactions of the gluons, the fourth term (\sum_{j,k}\bar{q}_j(i\slashed{D}_{jk} - M_{jk})q_k) represents the interaction between quarks and gluons, and the last two terms ((D_\mu\Phi)^\dagger D^\mu\Phi - V(\Phi)) are the Higgs interactions.

2.1.1 Electroweak interactions

The SM of the electroweak interactions (the unification of the electromagnetic and weak interactions) is a quantum field theory invariant under local SU(2)_L ⊗ U(1)_Y gauge transformations. The generator of SU(2)_L is the weak isospin T, while the weak hypercharge Y is introduced as the generator of U(1)_Y, defined through Q = T_3 + Y/2, where Q is the electric charge and T_3 the third component of the isospin. The symmetry arising from this invariance generates interactions mediated by four gauge bosons: the photon γ, two massive charged bosons, W− and W+, and a massive neutral boson Z0 [15]. The masses of the bosons cannot be directly inserted in the Lagrangian as a consequence of this symmetry. However, the spontaneous breaking of this symmetry generates the masses of the bosons through the so-called Higgs mechanism. The legacy of this mechanism is the existence of a neutral, spinless Higgs particle whose mass is not fixed by the theory, but which may have been found at the LHC. Similarly, the masses of the fermions cannot be directly inserted in the Lagrangian. The Dirac spinors representing the fermions are left-handed (L) and right-handed (R), defined as ψ_{L,R} = 1/2 (1 ∓ γ_5) ψ (eigenstates of the chirality operator γ_5). In the zero-mass limit they would be eigenstates of helicity. The three quark families are defined as doublets and singlets of the symmetry, Q_L and Q_R. The first generation, for example, is:

U_L = \begin{pmatrix} u \\ d \end{pmatrix}_L , \qquad U_R = (u)_R , \qquad D_R = (d)_R ,

where the pair in a generation constitutes an SU(2)_L doublet and the two R states are SU(2)_L singlets. Similarly for the leptons, where only one R singlet exists:

L_L = \begin{pmatrix} \nu_l \\ l \end{pmatrix}_L , \qquad L_R = (l)_R ,

with l = e, µ, τ. Here the right-handed neutrino singlet (ν_l)_R does not exist, because neutrinos are considered massless in the SM and are measured in experiments to have negative helicity, so that they are purely left-handed. Again, the same Higgs mechanism responsible for giving masses to the gauge bosons also gives masses to the fermions, while leaving the neutrinos massless.

2.1.2 Strong interactions

The Lagrangian of the strong interactions among quarks is invariant under SU(3)_C gauge transformations, where C stands for colour. This generates eight gauge fields, mediators of the strong force, the gluons, which are massless spin-1 vector bosons. The conserved charge associated with the strong interactions is called colour charge, and these interactions are described by Quantum Chromodynamics (QCD). The colour charge is a property of the quarks, while the leptons remain colourless, meaning that they do not interact via the strong force. The gluons themselves carry colour charge and can interact both with each other and with quarks (Fig. 2.2). The hadrons, bound states of quarks and/or gluons, are colour singlets. The value of the QCD strong coupling constant decreases logarithmically with increasing momentum transfer. This effect is called asymptotic freedom and justifies the use of perturbation theory when describing the so-called "hard processes" [17] among quarks and gluons (large momentum exchanges, or equivalently short distances). Another peculiarity of the strong force is that the QCD coupling constant becomes much larger than unity at large distances (or equivalently small momentum exchanges, in the so-called soft processes), which holds quarks strongly bound together in a phenomenon known as confinement [16]. This is the reason why no quarks or gluons have been observed free in nature. Quarks cannot survive as free particles because of the postulated colour confinement, and therefore in their propagation they undergo two processes called fragmentation and hadronization (Fig. 2.3). During these processes hadrons are formed from quarks and gluons, and organize themselves into high energy jets, measurable as energy flows in the particle detectors. In a jet the hadrons are collimated, covering a narrow area in the η and φ coordinates of the detector. With the information coming from the jet structure, it is possible to determine the "elementary", underlying quark or gluon predecessor: the sum of the energies

Figure 2.3: A proton-proton collision sketch.

of the particles in the cone gives the energy of the original parton, and the axis of the cone gives the parton direction. The hadronization process is closely connected to the strong interaction. Thus measuring the scale at which the hadronization starts, of the order of Λ_QCD ≈ 200 MeV, gives a lower limit for the applicability of QCD perturbation theory. Because of its non-perturbative nature, the dynamics of the process is not yet fully understood from first principles, since it cannot be treated reliably as a perturbative expansion in the strong coupling constant αs. For this reason the hadronization is modelled and parametrized in a number of phenomenological approaches trying to reproduce the data as closely as possible. One of them is the Lund string model, implemented in Pythia and explained in Sec. 3.5.
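The logarithmic decrease of the coupling mentioned above can be made concrete with the one-loop running formula α_s(Q²) = α_s(M_Z²) / [1 + b_0 α_s(M_Z²) ln(Q²/M_Z²)], with b_0 = (33 − 2 n_f)/(12π). The short sketch below evaluates it at a few scales; the input value α_s(M_Z) = 0.118 and the fixed n_f = 5 are illustrative simplifications (flavour thresholds are ignored), not part of the analysis of this thesis.

    import math

    def alpha_s(Q, alpha_s_mz=0.118, m_z=91.19, n_f=5):
        """One-loop running strong coupling; inputs are illustrative values."""
        b0 = (33.0 - 2.0 * n_f) / (12.0 * math.pi)
        return alpha_s_mz / (1.0 + alpha_s_mz * b0 * math.log(Q**2 / m_z**2))

    for Q in (2.0, 10.0, 91.19, 1000.0):   # GeV
        print(f"alpha_s({Q:7.2f} GeV) = {alpha_s(Q):.3f}")

The printed values decrease from roughly 0.26 at 2 GeV to below 0.09 at 1 TeV, illustrating why perturbation theory is applicable for hard processes but breaks down near the hadronization scale.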

2.1.3 Description of a proton-proton collision

Due to the composite structure of protons, which at high energy appear as collections of quarks and gluons, proton-proton collisions are not simple to describe. In fact, for each event the structure might be as in Fig. 2.3. What we can physically detect are only the initial protons and the outgoing hadrons. It is assumed that the core of the collision is the "main" elementary quark or gluon interaction (the interaction with the highest pT exchange), but other partons from the same two protons may interact as well, forming what are called Multiple Parton Interactions (MPI). Furthermore, additional quarks or gluons can be radiated, as part of the Initial State Radiation (ISR) or Final State Radiation (FSR). It is important to note that the only assumption made here is that the main interaction has the highest pT exchange amongst all interactions. However, this does not exclude that other interactions due to the MPI might still be "hard" interactions. The remaining parts of the protons, called Beam Remnants (BR), proceed almost collinear to the direction of the beams. All the processes which are not part of the main interaction are collectively described as the Underlying Event (UE). The jets coming from the underlying event generally have low pT values and thus can be efficiently separated from the main process by applying some kinematic cuts (requiring a minimum value of pT during the jet reconstruction phase, for example).

Additionally, there is another type of process shrouding the main interaction, due to the multiple proton-proton collisions in each bunch crossing. As explained in the previous chapters, with a number of protons per bunch of approximately 10^11, the probability to have more than one interaction per bunch crossing is almost certain. During the 2011 data-taking period there were on average about 20 interactions per bunch crossing. This kind of contamination of the main process is called pile-up (PU). Charged particles coming from different collisions can be identified by their association with the different vertices reconstructed in the tracker (within the limits of the vertex reconstruction resolution). However, identifying neutral particles coming from the PU is far from trivial, using only the information available in the form of energy collected in the ECAL (photons) or HCAL (neutral hadrons).

2.2 Z + jets associated production

In this section the approach used to calculate the cross-section for the production of a vector boson Z in association with hadronic jets is described.

2.2.1 Drell-Yan process

First consider proton-proton collisions where a quark and an anti-quark annihilate, resulting in the production of a lepton pair (l+l−) with a large invariant mass M² = (p_{l+} + p_{l−})², called the Drell-Yan process (Fig. 2.4):

pp \to l^+ l^- + X \, ,

where, in analogy to deep inelastic scattering, X denotes a generic hadronic final state consistent with energy and momentum conservation, that will not

Figure 2.4: The Drell-Yan process pp → qq̄ → l+l− + X.

be explicitly measured (i.e. inclusive dilepton production). Denoting the momenta of the two protons by P1 and P2 respectively, the centre-of-mass energy will then be s = E²_CM = (P1 + P2)². To obtain the inclusive cross-section σ_{pp→l+l−+X}, consider all possible subprocess cross-sections σ̂_{qq̄→l+l−} given by the quark-antiquark combinations available in the protons. It was postulated by Drell and Yan that once these subprocess cross-sections are calculated, they have to be summed together, weighting them with the parton density functions (PDFs) denoted by the shaded circles in Fig. 2.4. If the two quarks carry momenta p1 = x1 P1 and p2 = x2 P2 as in Fig. 2.4, then symbolically the cross-section can be written as a special case of the more general factorization theorem [18]

\sigma^{DY} = \sum_q \int dx_1 \, dx_2 \, f_q(x_1) f_{\bar{q}}(x_2) \, \hat{\sigma}_{q\bar{q}\to l^+l^-} \, , \qquad (2.1)

where the values of the PDFs f_q(x1) and f_q̄(x2) are measured in a variety of experiments, including the LHC ones. The σ̂_{qq̄→l+l−} represents the hard part of the interaction, while the softer parts are covered by the PDFs. Note here that this equation does not cover the MPI discussed beforehand. Defining the Mandelstam variable ŝ = (p1 + p2)² = x1 x2 s and τ = ŝ/s, Eq. 2.1 is formally valid in the limit where quarks are asymptotically "free" at zeroth order in the QCD coupling constant αs. In other words, the equation holds when √ŝ ≫ M_proton. This is verified in the present case, where M_proton ≃ 1 GeV, while √ŝ ≥ M ≃ 91 GeV (for the Z boson). The lowest-order total cross section for quark-antiquark annihilation into a lepton pair via an off-mass-shell photon γ* (Fig. 2.4) is given by:

\hat{\sigma}_{q(p_1)\bar{q}(p_2)\to l^+l^-} = \frac{4\pi\alpha^2}{3\hat{s}} \frac{1}{N_c} Q_q^2 \, , \qquad (2.2)

where N_c is the number of colours, Q_q the quark fractional charge and α the electromagnetic coupling constant. In a general proton-proton collision the incoming quarks and anti-quarks can have a wide range of collision energies √ŝ, so it is appropriate to consider the differential lepton pair mass distribution. Given a lepton pair invariant mass M, the quark anti-quark process producing it has a differential cross-section:

\frac{d\hat{\sigma}}{dM^2} = \frac{4\pi\alpha^2}{3 M^2 N_c} Q_q^2 \, \delta(\hat{s} - M^2) \qquad (2.3)

Substituting this last equation 2.3 into Eq. 2.1 gives the parton model differential cross section for the Drell-Yan process at leading order (LO):

\frac{d\sigma^{DY}}{dM^2} = \int_0^1 dx_1 \, dx_2 \sum_q f_q(x_1) f_{\bar{q}}(x_2) \times \frac{d\hat{\sigma}}{dM^2}(q\bar{q}\to l^+l^-) + (q \leftrightarrow \bar{q})
 = \frac{4\pi\alpha^2}{3 M^2 N_c} \int_0^1 dx_1 \, dx_2 \, \delta(x_1 x_2 s - M^2) \times \left[ \sum_q Q_q^2 f_q(x_1) f_{\bar{q}}(x_2) + (q \leftrightarrow \bar{q}) \right] \qquad (2.4)
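As a purely numerical illustration of Eq. 2.4, the sketch below evaluates the leading-order dσ/dM² at the Z mass, keeping only the photon-exchange piece and two light flavours and using crude toy parton densities (the functions xf_u, xf_d and xf_sea are invented shapes, not a fitted PDF set such as those used later in this thesis). The delta function is used to remove the x2 integration analytically.

    import numpy as np

    # Toy x*f(x) shapes, for illustration only (not a real PDF parametrisation)
    def xf_u(x):   return 2.0 * x**0.5 * (1 - x)**3
    def xf_d(x):   return 1.0 * x**0.5 * (1 - x)**4
    def xf_sea(x): return 0.2 * x**-0.2 * (1 - x)**7

    def dsigma_dM2(M, sqrt_s=7000.0, alpha=1.0 / 137.0, n_c=3):
        """LO Drell-Yan dsigma/dM^2 (photon exchange only), Eq. (2.4), in GeV^-4."""
        s, tau = sqrt_s**2, (M / sqrt_s)**2
        x1 = np.linspace(tau, 1.0, 20000)[1:-1]
        x2 = tau / x1
        # quark-antiquark luminosities with charges 2/3 (u-type) and 1/3 (d-type)
        lum  = (2.0/3.0)**2 * (xf_u(x1)/x1 * xf_sea(x2)/x2 + xf_sea(x1)/x1 * xf_u(x2)/x2)
        lum += (1.0/3.0)**2 * (xf_d(x1)/x1 * xf_sea(x2)/x2 + xf_sea(x1)/x1 * xf_d(x2)/x2)
        integrand = lum / (s * x1)          # the delta function fixes x2 = tau/x1
        prefactor = 4.0 * np.pi * alpha**2 / (3.0 * M**2 * n_c)
        return prefactor * np.trapz(integrand, x1)

    # result in natural units; multiply by (hbar c)^2 ~ 0.389 mb GeV^2 to convert to mb/GeV^2
    print(dsigma_dM2(91.0))

The same structure, with realistic PDFs, the Z exchange included and the K factor of Eq. 2.5, is what the Monte Carlo generators discussed in Chapter 3 evaluate internally.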

In order to improve the obtained LO result in Eq. 2.4, perturbative QCD corrections at higher orders in αs must be taken into account, with the calculation becoming longer and more difficult with each added order, and thus not shown here. Already at O(αs), the Feynman diagrams arising can be separated into two groups: virtual corrections, which as the name implies are not visible in the final state, and real corrections, which appear in the form of additional jets in the final state. As an example, the possible O(αs) corrections are depicted in Fig. 2.5. The effect of these corrections is such that in the cross-section expression the PDFs acquire a logarithmic mass dependence, so that Eq. 2.1 becomes

\sigma^{DY} = K \sum_q \int dx_1 \, dx_2 \, f_q(x_1, M^2) f_{\bar{q}}(x_2, M^2) \, \hat{\sigma}_{q\bar{q}\to l^+l^-} \, . \qquad (2.5)

The mass-dependent PDFs are available from deep-inelastic scattering measurements, and K represents a constant factor. At the moment, calculations up to the next-to-next-to-leading order (NNLO) are available, and even some beyond.

Up to now, the calculations shown were done for an intermediate state γ* (Fig. 2.4). To apply them to the production of a Z boson instead (σ(qq̄ → Z → l+ l−)), it is enough to replace the subprocess cross-section of Eq. 2.2 with the following:

Figure 2.5: From left column to right, the different possible contributions to the base process, considering O(αs). The two diagrams corresponding to a) depict virtual gluon corrections, the two corresponding to b) depict real gluon corrections and the remaining two under c) depict the quark-gluon scattering process together with the corresponding q̄g contribution.

\hat{\sigma}_{q\bar{q}\to Z\to l^+l^-} = \hat{\sigma}_{q\bar{q}\to Z} \cdot BR(Z \to l^+ l^-) \, ,

where σ̂_{qq̄→Z} is the production cross-section for the Z, and BR(Z → l+l−) the branching ratio of its decay to leptons. Given that the decay width of the Z boson is small (Γ_Z = 2.5 GeV) compared to its mass (M_Z ≃ 91 GeV), it is sufficient to consider the production of effectively stable particles [16]. The production cross-section can therefore be approximated as if the Z boson mass were on shell:

\hat{\sigma}_{q\bar{q}\to Z} = \frac{\pi}{3} \sqrt{2} \, G_F M_Z^2 \, (V_q^2 + A_q^2) \, \delta(\hat{s} - M_Z^2) \, ,

where V_q and A_q are associated with the vector and axial coupling constants of the neutral-current interaction, whose values are predicted by the SM.

2.2.2 Multijet production

The total Z production cross-section can be decomposed as a sum of multijet cross-sections of increasing order in αs:

\sigma_Z = \sigma_{Z+0\,\mathrm{jets}} + \sigma_{Z+1\,\mathrm{jet}} + \sigma_{Z+2\,\mathrm{jets}} + \ldots

where each cross-section can be additionally expanded in orders of αs:

\sigma_{Z+0\,\mathrm{jets}} = a_0 + a_1 \alpha_s + a_2 \alpha_s^2 + \ldots
\sigma_{Z+1\,\mathrm{jet}} = b_1 \alpha_s + b_2 \alpha_s^2 + \ldots
\sigma_{Z+2\,\mathrm{jets}} = c_2 \alpha_s^2 + \ldots

The coefficients a_i, b_j, c_k, ... in these expansions are in general functions of the jet-definition parameters, for example the cone size used to cluster the hadrons into jets, and the transverse momentum, rapidity and separation cuts imposed on jets or clusters. The sum of the coefficients corresponding to the same order of αs in QCD perturbation theory, namely

a_0 = \delta_0 \, , \qquad a_1 + b_1 = \delta_1 \, , \qquad a_2 + b_2 + c_2 = \delta_2 \, , \; \ldots

is, however, independent of the jet parameters and corresponds simply to the perturbative expansion in powers of αs of the total cross-section. The largest contribution to the Z boson + jets cross-section is given by the sum of the very first coefficients of every "exclusive" multijet cross-section, a_i, b_j, c_k, .... It is possible to calculate these coefficients from the Feynman diagrams of the partonic processes xy → Z + j_1 ... j_n, where x, y and the j_i are quarks and gluons. The explicit calculations of these coefficients up to a multiplicity of n_jets ≤ 4 were carried out by Berends and Giele [19]. They also studied the ratio between the cross-section with n jets and that with n − 1 jets in the final state,

f_n(Z) = \frac{\sigma_{Z+n\,\mathrm{jets}}}{\sigma_{Z+(n-1)\,\mathrm{jets}}}

and found that this ratio should be constant. Therefore f_n(Z) can be parametrised as

f_n(Z) = \alpha + \beta \, n_{\mathrm{jets}} \, .

This scaling has been tested in various experiments, for example at the Tevatron [20] and the LHC [21].
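As a small numerical illustration of this parametrisation, the sketch below computes the ratios f_n(Z) from a set of hypothetical exclusive cross-sections (the numbers are invented for illustration, not measured values) and fits the two parameters α and β.

    import numpy as np

    # Hypothetical exclusive Z + n jets cross-sections in pb (illustration only)
    sigma = {0: 1000.0, 1: 180.0, 2: 36.0, 3: 7.5, 4: 1.6}

    n_vals = sorted(sigma)[1:]                               # n = 1 .. 4
    ratios = [sigma[n] / sigma[n - 1] for n in n_vals]       # f_n(Z) = sigma_n / sigma_{n-1}
    beta, alpha = np.polyfit(n_vals, ratios, 1)              # f_n = alpha + beta * n

    for n, f in zip(n_vals, ratios):
        print(f"f_{n}(Z) = {f:.3f}")
    print(f"fit: alpha = {alpha:.3f}, beta = {beta:.4f}")

With the invented numbers above the ratios are close to 0.2 and almost independent of n, which is the behaviour the Berends-Giele scaling predicts.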

2.2.3 Study of the associated production of Z boson + jets at the LHC

With the current LHC centre-of-mass energy of 7 TeV, the production cross-section of the Z boson is about 1 nb. During 2011 about 5.7 fb−1 of data were collected at this energy, giving an expected number of about 10^6 Z0 candidates. The study of the production of the Z boson + jets is important, as already noted, for various reasons, one of them being the quantitative study of QCD, because the rate of multijet production depends on the strong coupling constant. The kinematic distributions of the jets instead give the possibility to study the underlying scattering matrix elements. Measurements of the production of a Z + n jets suffer from theoretical and experimental uncertainties associated with the definition and counting of jets. On the experimental side, the most dominant uncertainties are the energy response of the detector to a jet, additional energy contributions from the underlying event, background from mis-identified non-electroweak events and the jet acceptance. On the theoretical side, the most dominant uncertainties come from the choice of PDFs considered (different parametrisations available in the literature), initial and final state radiation, and the non-perturbative evolution of partons into on-mass-shell particles that can be detected as jets. These effects have been studied in this thesis work.

2.3 Jets

Jets are collimated flows of hadrons that result from the fragmentation of high energy quarks or gluons. They tend to be visually obvious structures when looking at an experimental event display. However, they are important and complex objects which require a careful treatment in order to be reconstructed unambiguously in a variety of different analyses. A set of rules that projects the particles into jets is commonly known as a jet algorithm. The algorithm usually involves one or more parameters that govern its detailed behaviour. The combination of a jet algorithm and its parameters is called a jet definition. It is possible to classify jet algorithms into two broad classes:

• Sequential recombination algorithms. The first step in this group of algorithms is to identify the pair of particles closest in some distance measure, and then recombine them by summing their four-momenta. The previous two steps are repeated until a stopping criterion is reached. The various sequential algorithms in existence differ mainly in their particular choices of distance measure and stopping criterion;

• Cone algorithms. This group of algorithms, on the other hand, puts together particles within specific conical angular regions. For example, what is called a stable cone is made up of particles whose momentum sum coincides with the cone axis. Differences between various cone algorithms essentially deal with the strategy taken to search for the stable cones and the procedure used in cases where the same particle is found in multiple stable cones.

Regardless of the algorithm used, the hadronic jets of interest have high p⊥ values, and thus usually a kinematic cut is applied, requiring for example a minimum value p⊥,min for a jet to be considered. One of the important aspects when dealing with jet algorithms is the infrared and collinear divergences, i.e. divergences that occur when a parton emits a very "soft" parton (infrared divergence) or when a parton splits into two collinear partons (collinear divergence). By convention, the jet clustering algorithm has to remain stable (meaning that the resulting jets do not change drastically) even when these processes occur.

Here only the anti-k⊥ algorithm [22] will be discussed, belonging to the category of the sequential recombination algorithms and currently being implemented by both the CMS and ATLAS experiments at the LHC.

Anti-k⊥ algorithm

The anti-k⊥ algorithm implements a symmetric distance measure, d_ij, between all pairs of particles i and j,

d_{ij} = \min(p_{\perp i}^{-2}, p_{\perp j}^{-2}) \, \frac{\Delta R_{ij}^2}{R^2} \, ,

where p⊥i is the transverse momentum of particle i, ΔR²_ij = (η_i − η_j)² + (φ_i − φ_j)², and R is the jet radius which determines the jet angular reach. The exact value of R is fixed by the user when the algorithm is implemented. Additionally, the anti-k⊥ algorithm involves a distance measure between every particle i and the proton beam

d_{iB} = p_{\perp i}^{-2} \, .

However, given N particles, there are N(N − 1)/2 distances d_ij to calculate in order to find the smallest amongst them, which means O(N^3) operations over the whole clustering. Due to the high number of particles in each collision at the LHC, the calculation can become lengthy. One possible implementation is given, for example, by FastJet [22]. In particular, FastJet makes use of the observation that the smallest pair distance remains the same if one uses the following alternative (non-symmetric) d_ij distance measure:

d_{ij} = p_{\perp i}^{-2} \, \frac{\Delta R_{ij}^2}{R^2} \, , \qquad d_{ji} = p_{\perp j}^{-2} \, \frac{\Delta R_{ij}^2}{R^2} \, .

For a given i, the smallest of the d_ij is simply found by choosing the j that minimises ΔR²_ij, i.e. by identifying its geometrical nearest neighbour on the y−φ cylinder. Without going into too many details, this factorization enables a faster computation, so that instead of O(N^3) operations only O(N ln N) are necessary. The anti-k⊥ algorithm is the default algorithm in various analyses, and in particular its FastJet implementation is used in the Rivet framework which will be discussed later.
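To make the clustering procedure concrete, the following is a minimal, brute-force O(N^3) sketch of the anti-k⊥ algorithm for particles given as (p⊥, η, φ) triplets. It is an illustration only, with a simplified recombination scheme, and is not a substitute for the FastJet implementation used in practice; the toy event at the end is invented.

    import math

    def anti_kt(particles, R=0.5):
        """Minimal anti-kt clustering sketch; particles are (pt, eta, phi) tuples."""
        def delta_r2(a, b):
            dphi = abs(a[2] - b[2])
            if dphi > math.pi:
                dphi = 2.0 * math.pi - dphi
            return (a[1] - b[1]) ** 2 + dphi ** 2

        def combine(a, b):
            # crude recombination: pt-weighted average of directions (ignores phi wrap-around)
            pt = a[0] + b[0]
            return (pt, (a[0] * a[1] + b[0] * b[1]) / pt, (a[0] * a[2] + b[0] * b[2]) / pt)

        objects, jets = list(particles), []
        while objects:
            # smallest beam distance d_iB = pt^-2 (i.e. the hardest object)
            d_min = min(p[0] ** -2 for p in objects)
            i_min = min(range(len(objects)), key=lambda i: objects[i][0] ** -2)
            best, is_beam = None, True
            for i in range(len(objects)):
                for j in range(i + 1, len(objects)):
                    dij = min(objects[i][0] ** -2, objects[j][0] ** -2) \
                          * delta_r2(objects[i], objects[j]) / R ** 2
                    if dij < d_min:
                        d_min, best, is_beam = dij, (i, j), False
            if is_beam:
                jets.append(objects.pop(i_min))       # promote the object to a jet
            else:
                i, j = best
                merged = combine(objects[i], objects[j])
                objects = [p for k, p in enumerate(objects) if k not in (i, j)] + [merged]
        return jets

    # toy event: two hard, nearly collinear particles plus soft activity
    event = [(50.0, 0.1, 0.0), (30.0, 0.15, 0.05), (1.0, 2.0, 1.5), (0.5, -1.0, 3.0)]
    print(sorted(anti_kt(event), reverse=True))

The sketch reproduces the characteristic anti-k⊥ behaviour: soft particles are clustered around the hard ones first, so the hard jets grow outwards in regular, cone-like shapes.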

Chapter 3

Monte Carlo Event Generators

3.1 Introduction to Event Generators

When drawing analogies to real life, there are two kinds of Monte Carlo (MC) programs:

• generators, e.g. Pythia, MadGraph, Sherpa and others [23], which produce events in which two incoming particles collide and outgoing particles are generated. In this sense the generators play the role of the accelerators, like the LHC;

• simulators, e.g. Geant 4 [25], which simulate the interaction of the aforementioned outgoing particles with matter and therefore play the role of a detector, such as CMS.

The work of this thesis focuses on the first type of programs. However, it is important to note that in practice the available data come from the detector information, and so, to compare real events to simulated ones, it is necessary to apply some unfolding technique. The unfolding consists in taking the real data, applying various correction efficiencies (measured beforehand), and then simulating the effect of the detector and subtracting it from the data. Only at this point can the real data be compared with the simulations. Alternatively, the event simulations can be "folded" with the detector simulations and then compared to the data obtained from the measurements. From this point onwards it will be assumed, unless specified otherwise, that real data means the data acquired from the CMS detector and successively unfolded, while by generated data is meant the product of a MC generator.

The role of the MC generators is to give a detailed description of the final state, so that, ideally, any experimental observable or combination of observables can be predicted and compared with real data. This is important in order to gain a very good understanding of the signal and background processes and to separate them. However, it should always be kept in mind that generators are not perfect, because not all the LHC physics is currently completely understood or based on first principles; some parts of the simulations are in fact based on phenomenological models, possibly with no theoretical estimates for the parameter values. Nevertheless quantum mechanics is based on probabilities (e.g. cross-sections), which can be simulated with pseudo-random numbers and Monte Carlo methods, and this is the reason why MC generators are useful. In order to simulate an event, its structure must be subdivided into various parts, as already seen in Fig. 2.3:

• initially two protons are accelerated towards each other - to be viewed as collections of partons;

• a collision of two partons, one from each proton, gives the hard process of interest, for example qq̄ → Z. As explained in the previous chapters, most events will be constituted of uninteresting, soft collisions, or simple elastic scattering;

• short-lived "resonances" produced in the hard process (case in point, the Z boson) and their subsequent decay have to be viewed as part of this process itself, because of the spin correlations being transferred;

• a collision implies accelerated colour and electromagnetic charges, and therefore bremsstrahlung can occur. Emissions associated with the two incoming partons (before the interaction) are called ISR;

• emissions associated with the outgoing partons are called FSR;

• each proton is made up of multiple partons, and further parton pairs may collide within one single proton-proton collision, forming what are called multiple interactions. However, these further interactions are corrections of higher order and are thus suppressed;

• each of these further collisions may contain its own ISR and FSR;

• the colliding partons take only part of the initial proton energy. The remains are found in the beam debris;

• hadronization and fragmentation, during which the final partons combine to form hadrons, which can in turn decay.

The MC method allows these steps to be considered sequentially, and within each step to define a set of rules that can be used iteratively to construct a more and more complex state, which may end in hundreds of particles moving out in different directions. Since each particle has about 10 degrees of freedom (flavour, mass, momentum, production vertex, lifetime, ...), this corresponds to thousands of choices involved for a single event. The goal of

MC generators is to have a sufficiently realistic description of these choices, so that both the average behaviour and the fluctuations about this average are well described. Schematically the cross section for a general final state is provided by:

\sigma_{\mathrm{final\ state}} = \sigma_{\mathrm{hard\ process}} \cdot P_{\mathrm{tot,\ hard\ process \to final\ state}}

This cross-section has to be summed over all possible combinations leading from the hard process to the final state in question (showering, hadronization, etc.), and integrated over the relevant phase-space regions. There are different kinds of MC generators, varying from the general-purpose ones [24] to the more specialized ones. In terms of content, the general-purpose generators are able to produce the final state of an event in terms of outgoing hadrons and other particles measurable by the detector, while the specialized generators are dedicated to better describing a part of the generation process, for example calculating the matrix elements to a higher precision, or at higher orders. This means that the specialized generators cannot generate an event as measurable by a detector, and need further processing. To the first category belong for example Pythia and Sherpa, while MadGraph belongs to the second category. It has to be noted here that no specialized generators exist to handle hadronization, and therefore general-purpose generators can be used as "plugins" to the specialized ones, for example MadGraph + Pythia. All of the aforementioned generators will be explained in depth in the following sections: Pythia in Sec. 3.5, MadGraph in Sec. 3.6 and Sherpa in Sec. 3.8.

3.2 Matrix Elements based generators

Matrix Elements (ME) are calculated starting from the Feynman rules. Afterwards they are combined with the phase space in order to calculate the cross-sections. Consider a simple QCD scattering between two quarks, for example u1 d2 → u3 d4, a process similar to Rutherford scattering, but with a gluon exchange instead of a photon. Consider the Mandelstam variables defined as ŝ = (p1 + p2)², t̂ = (p1 − p3)², û = (p1 − p4)². The differential cross-section is then:

\frac{d\hat{\sigma}}{d\hat{t}} = \frac{\pi}{\hat{s}^2} \, \frac{4}{9} \, \alpha_s^2 \, \frac{\hat{s}^2 + \hat{u}^2}{\hat{t}^2} \, ,

which diverges roughly as dp⊥²/p⊥⁴ for transverse momentum p⊥ → 0. To avoid this divergence, an approximation must be made, selecting a lower cutoff transverse momentum p⊥min, named renormalization scale. Similar cross-sections, differing mainly by colour factors, are obtained for quark-gluon scattering or gluon-gluon scattering. A few further QCD graphs are less singular, for example gg → qq̄. As done for the calculation of the Z production

Figure 3.1: Example of a possible combination of programs in the generation of the events.

cross section, the differential cross-section has to be weighted by the PDFs and summed over parton species, obtaining:

\sigma = \sum_{i,j} \iiint dx_1 \, dx_2 \, d\hat{t} \; f_i(x_1, Q^2) f_j(x_2, Q^2) \, \frac{d\hat{\sigma}_{ij}}{d\hat{t}} \, ,

where Q is the momentum transfer in the interaction. The value of Q² is also known as the factorization scale. The PDFs are strongly peaked at small fractional momentum, further enhancing the peaking of the cross-section at small p⊥ values. The general purpose generators strive to contain a great number of processes, but usually only at the lowest order in perturbation theory, while the experimental interest may lie in higher orders, obtainable with ME generators. The price to pay is the need of a lower cutoff energy below which perturbation theory cannot be applied and thus the ME approach does not work. On the contrary, the Parton Shower approach is well suited for smaller pT values, while failing to describe high pT exchanges, rendering the two approaches complementary. The working scheme of ME + Parton Shower (and other additional programs) can be seen in Fig. 3.1. The process discussed above is only at the lowest order in perturbation theory. Going to O(αs³), virtual and real corrections need to be taken into account, as seen in Fig. 2.5. The cross-section for 2 → 3 processes is almost always divergent when one of the parton energies vanishes (infrared divergences) or two massless partons become collinear (collinear divergences). However, the virtual loop corrections to 2 → 2 graphs contain similar divergences. In particular, the infrared divergences of the two processes cancel each other, with only finite terms remaining. The collinear divergences on

Figure 3.2: The “factorization” of a complex 2 → n process.

the other hand generate the Q² dependence of the PDFs. The problematic part is the calculation of the virtual corrections. If one is satisfied with Born-level diagrams only (without any loops), it is possible to calculate the diagrams to quite high orders in αs, up to something like eight partons in the final state. These partons have to be well separated, avoiding the phase-space regions where the divergences become problematic. In order to cover also the regions of phase space where partons become soft/collinear, the ME approach is complemented by the Parton Shower one, explained in the following section.
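The need for a p⊥min cutoff in the ME approach can be seen numerically: evaluating the 2 → 2 differential cross-section given above at decreasing transverse momenta shows the dp⊥²/p⊥⁴ growth. The sketch below assumes a fixed αs = 0.12, an arbitrary partonic energy and the small-angle relation t̂ ≈ −p⊥², all of which are illustrative simplifications, not choices made in this analysis.

    import math

    def dsigma_dthat(s_hat, t_hat, alpha_s=0.12):
        """LO qq' -> qq' (t-channel gluon exchange) differential cross-section, in GeV^-4."""
        u_hat = -s_hat - t_hat
        return (math.pi * alpha_s**2 / s_hat**2) * (4.0 / 9.0) * (s_hat**2 + u_hat**2) / t_hat**2

    s_hat = 1000.0**2                       # GeV^2, arbitrary partonic energy for illustration
    for pt in (100.0, 30.0, 10.0, 3.0, 1.0):
        t_hat = -pt**2                      # small-angle approximation t ~ -pt^2
        print(f"pt = {pt:6.1f} GeV  ->  dsigma/dt = {dsigma_dthat(s_hat, t_hat):.3e} GeV^-4")

The printed values grow roughly as 1/p⊥⁴ as p⊥ decreases, which is why a lower cutoff must be imposed before the ME cross-section is integrated.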

3.3 Parton Shower based generators

If the rate of emission of a single gluon is large, then the rate of emission of two or more gluons will also be large, hence the need for high orders and many loops in matrix-element calculations. Parton Showers (PS) introduce two new concepts [23] that simplify the calculations:

• an iterative structure that allows simple expressions like q → qg, g → gg and g → qq̄ to be combined together to form complex multiparton final states;

• the Sudakov factor, which offers a physical way to handle the cancellation between real and virtual corrections.

Neither of the above is exact, but they allow for a good approximation of the emissions in the soft and collinear regions of the phase space.

The starting point of the PS approach is to "factorize" the 2 → n process (where n can be a large number of partons) into a simple core process, like

Figure 3.3: A cascade of successive branchings.

2 → 2, convoluted with showers to add the remaining partons, as in Fig. 3.2. The incoming and outgoing partons of the core process must be on mass shell at large timescales, meaning p² = E² − p² = m² ≈ 0 with obvious notation. Due to the uncertainty principle, however, the closer one comes to the hard interaction (shorter timescales), the more off-shell the partons may be. This means that the incoming quarks may radiate a succession of harder and harder gluons, while the outgoing ones radiate softer and softer gluons. The gluons belonging to ISR are spacelike, p² < 0, and these emissions are therefore called spacelike showers. The FSR on the other hand is constituted of timelike showers, with p² > 0. The cross-section of the whole 2 → n process is approximated by that of the core process at virtuality Q², assuming that the other virtualities Q²_i can be neglected. For example, looking at Fig. 3.2, the inequality Q²_i ≪ Q² must hold true for each i = 1, 2, 3, 4. This means that first the hard process can be picked without any reference to the showers, and only afterwards are the showers added with unit probability. Of course, these showers do modify the event structure, so logically the cross-section is affected. For example, the total transverse energy E⊥total of an event is increased by the ISR. It is important to note that when dealing with PS, the hard process has to be picked so that Q²_i ≪ Q² is valid for all other virtualities. In Fig. 3.2, for example, if Q²_1 > Q², then instead of ud → ud as the core hard event, ug → ug has to be picked. Consequently, the content of the ISR and FSR changes. Without this criterion one might double-count a given graph, or even count it multiple times. Moreover, the approximation of neglecting virtualities when considering the core hard process becomes worse the more the incoming and outgoing partons are off-shell, giving an upper limit to the applicability of the PS approach.

3.3.1 Initial and Final State Radiation

The Parton Shower approach enables, as explained above, the "factorization" of a complex process into several subprocesses. Moreover, this can be applied to obtain further branchings from already emitted gluons, as in Fig. 3.3. In this way, the large probability of one q → qg branching is split into several probabilities for successive branchings. The splittings of gluons and quarks follow the so-called DGLAP equations [26]:

d\mathcal{P}_{a \to bc} = \frac{\alpha_s}{2\pi} \frac{dQ^2}{Q^2} P_{a \to bc}(z) \, dz \, , \qquad (3.1)

where

P_{q \to qg} = \frac{4}{3} \, \frac{1 + z^2}{1 - z} \, , \qquad (3.2)

P_{g \to gg} = 3 \, \frac{(1 - z(1 - z))^2}{z(1 - z)} \, , \qquad (3.3)

P_{g \to q\bar{q}} = \frac{n_f}{2} \, (z^2 + (1 - z)^2) \, , \qquad (3.4)

with n_f being the number of quark flavours and z = Q²/(2 p_b p_c). Nevertheless, these probabilities still blow up in the soft and collinear regions. Needless to say, perturbation theory ceases to be meaningful at Q² scales so small that αs(Q) becomes of order unity or exceeds it, where confinement effects and hadronization take over. Typically a lower cutoff is placed at Q = 1 GeV, so that below this scale no further branchings occur. By applying this cutoff one avoids singularities, but probabilities can still grow well over unity. This problem is covered by the second important aspect of the Parton Shower approach: the Sudakov (form) factor. The derivation of the Sudakov form factor is deferred to the literature [27]. What is important for this thesis is that this factor appears in the exponential multiplying the probability, and its function is such that the total probability of a parton to branch never exceeds unity. Note here that the Sudakov factor is only an approximation of the virtual loop corrections to the Feynman diagrams. It takes care of the divergences coming from considering only real corrections, but does not supply the exact values of the finite parts emerging from the cancellation between virtual and real corrections. In other words, this approach does not allow one to simply consider a Born approximation together with a Sudakov factor and make it equivalent to a complete NNLO calculation (for example). Now the approach of the PS is clear: starting from a simple qq̄ system, the two quarks are individually evolved downwards from some initial Q²_max until they branch. At this stage the "mother" parton disappears and two new partons are formed in its wake, which in turn are evolved downwards in Q² and may branch. This process continues until the lower cutoff scale is reached. In reality this applies only to FSR, while the ISR is more complex because of the composite structure of the incoming protons. In particular, instead of starting with the incoming parton a and then branching it forward until the state b is obtained, it is easier to first consider b and then try to reconstruct the shower backwards to obtain the initial a.
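The interplay between the splitting probability (3.1)-(3.2) and the Sudakov factor can be sketched with the standard veto algorithm: an overestimated, analytically solvable emission density is generated first and then corrected by an accept/reject step. The code below is a toy q → qg final-state emission generator under several simplifying assumptions (fixed overestimate coupling, fixed z range, cutoff at Q = 1 GeV, αs evaluated at Q rather than at the branching p⊥); it is an illustration of the technique, not the Pythia implementation.

    import math, random

    def alpha_s(Q, alpha_mz=0.118, m_z=91.19, n_f=5):
        b0 = (33 - 2 * n_f) / (12 * math.pi)
        return alpha_mz / (1 + alpha_mz * b0 * math.log(Q**2 / m_z**2))

    def next_emission(Q2_start, Q2_cut=1.0, z_min=0.01, z_max=0.99, alpha_over=0.4):
        """Return (Q2, z) of the next q -> qg branching, or None if the cutoff is reached."""
        CF = 4.0 / 3.0
        I_over = 2.0 * CF * math.log((1 - z_min) / (1 - z_max))  # integral of overestimate 2*CF/(1-z)
        c = alpha_over / (2 * math.pi) * I_over                  # trial rate per unit ln(Q^2)
        Q2 = Q2_start
        while True:
            Q2 = Q2 * random.random() ** (1.0 / c)               # solve the trial Sudakov for Q2
            if Q2 < Q2_cut:
                return None                                      # no emission above the cutoff
            r = random.random()                                  # trial z from the 1/(1-z) overestimate
            z = 1 - (1 - z_min) * ((1 - z_max) / (1 - z_min)) ** r
            # accept with (true kernel * true alpha_s) / overestimate; ratio is always <= 1
            if random.random() < (alpha_s(math.sqrt(Q2)) / alpha_over) * (1 + z * z) / 2.0:
                return Q2, z

    random.seed(1)
    print(next_emission(Q2_start=100.0**2))

Rejected trial scales are simply used as the new starting point of the evolution, which is exactly how the veto algorithm reproduces the Sudakov no-emission probability without ever evaluating the exponential explicitly.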

3.4 Combining Matrix Element and Parton Shower generators

As explained previously, the ME (Sec. 3.2) and PS (Sec. 3.3) approaches complement each other. Depending on the ME generator used, there currently exist two different possibilities:

• a ME generator that calculates the Born-level diagrams up to about eight partons in the final state (e.g. MadGraph);

• a ME generator that calculates both the real and virtual corrections, currently at NLO and NNLO (e.g. BlackHat).

Clearly the generators implementing a NLO or NNLO calculation yield results at a higher precision. However, what is lost here is the ability to generate events with different multiplicities in the final state and sensibly combine them. In other words, what is possible with a LO ME calculation is to consider Z + 1 jet, Z + 2 jets, Z + 3 jets, ..., afterwards adding the PS and effectively combining them. With NLO and NNLO calculations it is still possible to add partons in the final state, but no prescription currently exists to combine them. Focusing from now onwards on the generators calculating only the Born-level diagrams, the ME approach cannot be used to explore the internal structure of a jet, and it is difficult to match it to the hadronization which is supposed to take over in the very soft/collinear regions. On the other hand, the PS approach is clearly an approximation, but it is easily applicable to a variety of physics, having to know only the main hard process and adding complexity to this state with the evolution of showers, as described. Contrary to ME, this approach fails to describe the process well when the p⊥ exchange in the process is large, but it permits the exploration of the internal structure of a jet and provides a match to the hadronization models. It is highly favourable then to combine the two approaches, while at the same time being careful of double-counting or gaps in the phase space coverage. Several alternative approaches have been developed, each having its strong and weak points.

3.4.1 Merging

The goal of the merging is to cover the whole phase space with a smooth transition from ME to PS. The simplest case to consider would be a process where the ME is known at LO, as well as the real-emission part of the NLO, i.e. an additional gluon. The parton shower should then reproduce

W^{ME} = \frac{1}{\sigma^{LO}} \frac{d\sigma^{LO+g}}{d(\mathrm{phase\ space})}

starting from a LO topology, where W^ME indicates the phase space populated by the ME. Similarly, if W^PS denotes the phase space populated by the PS, then a correction factor W^ME / W^PS needs to be applied. After applying the aforementioned correction factor, only the real part of the ME and PS corrections are equal. However, the PS additionally contains the virtual corrections due to the Sudakov form factor. This introduces an ordering variable Q², so that the whole phase space is covered starting from the "hard" emissions and moving to the "softer" ones. The distribution of the phase space obtained in this way is of the form

W^{PS}_{\mathrm{complete}}(Q^2) = W^{ME}(Q^2) \, \exp\!\left( - \int_{Q^2}^{Q^2_{\mathrm{max}}} W^{ME}(Q'^2) \, dQ'^2 \right)

As such, W^PS_complete agrees with the ME in the hard region of phase space, where the exponential factor is close to unity, and with the PS in the soft/collinear regions, where W^ME ≈ W^PS. This method is especially convenient for resonance decays, as for example Z0 → l+l−, where it is known that the full NLO answer, with virtual corrections included, is σ^NLO = K σ^LO (1 + αs(Q²)/π). Simply rescaling by this factor then allows one to obtain the full NLO answer, starting from the LO one.

3.4.2 Vetoed Parton Shower

The following approach is in some sense an extension of the merging approach described above. The objective is still to combine the real-emission behaviour of the ME with the Sudakov factors of the PS. The difference is that the merging approach only works for combining LO and NLO expressions, while the vetoed parton shower can be extended to several different orders. To understand this algorithm, it is useful to consider a lowest-order process, such as qq̄ → Z0. For each higher order, as long as only Born-level graphs are considered, jets are added in the final state. For example, the first order might be qq̄ → Z0 g, the second order qq̄ → Z0 gg, and so on. Denoting these cross-sections as σ_0, σ_1, σ_2, ..., as previously explained, each σ_i with i ≥ 1 contains soft and collinear divergences, and these are limited by applying a set of ME phase-space cuts, for example on the invariant masses of parton pairs. However, applying the ME approach without the virtual corrections does not ensure a "balance" between the cross-sections, where the addition of a σ_{i+1} should be compensated by a depletion of σ_i. That is, if an event with i jets at some resolution scale is considered, and the minimal jet energy (the energy required to form a jet) is lowered, revealing an additional jet, then the event must be reclassified from being an i-jet event to an (i+1)-jet event. This balance can be achieved with the help of Sudakov showers. Note here that this addition does not yield the finite value that one obtains from the complete real and virtual calculation. A few alternative algorithms exist, but in the following only the MLM approach used in Sec. 3.7 will be discussed:

1) a hard process is picked within the phase space region allowed by the ME cuts, in proportion to the values of the relative cross-sections σ_0, σ_1, σ_2, ... For this purpose an α_s0 is used, larger than the α_s considered in the steps below;

2) a shower history is reconstructed that describes how the event could have evolved from the hard-process to the actual final state;

3) the best choice of the argument of αs in the showers is known to be the squared transverse momentum of the respective branching. All the branchings together produce a factor W_α = \prod_{\mathrm{branchings}} \alpha_s(p_{\perp i}^2)/\alpha_{s0}, with the α_s0 chosen in step 1), which determines the probability that the event should be considered;

It is at this point that the different matching algorithms part ways;

4) a complete parton shower is allowed to develop from the selected parton configuration;

5) the partons obtained after the showering are clustered into a set of jets (e.g. by using the k⊥ algorithm as in Sec. 3.7), with the same jet separation criteria as used with the original parton configuration (pre-shower);

6) each jet is matched to its nearest original parton if possible;

7) the event in question is accepted only if the number of jets is equal to the number of original partons, and if each original parton is sensibly matched to a jet. The idea at the base of the MLM approach is that the probability of not generating any additional final jet activity during the shower production is provided by the Sudakov factors used in step 4). A minimal sketch of this accept/reject step is given below.
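The following is a schematic illustration of the decision taken in steps 5)-7), assuming that both the ME partons and the showered jets are given as (pT, η, φ) triplets and using a simple ΔR-based matching in place of the full k⊥ jet measure; the function names, the r_match parameter and the toy values are purely illustrative and are not the actual MadGraph/Pythia implementation.

    import math

    def delta_r(a, b):
        dphi = abs(a[2] - b[2])
        if dphi > math.pi:
            dphi = 2 * math.pi - dphi
        return math.hypot(a[1] - b[1], dphi)

    def mlm_accept(me_partons, shower_jets, r_match=0.5, highest_multiplicity=False):
        """Schematic MLM veto: every ME parton must match a distinct jet;
        extra jets are vetoed except (in this simplified sketch) in the
        highest-multiplicity sample."""
        unmatched_jets = list(shower_jets)
        for parton in me_partons:
            candidates = [(delta_r(parton, jet), k) for k, jet in enumerate(unmatched_jets)]
            if not candidates:
                return False
            dr, k = min(candidates)
            if dr > r_match:
                return False                 # a parton with no matching jet -> veto
            unmatched_jets.pop(k)            # each jet can match only one parton
        if unmatched_jets and not highest_multiplicity:
            return False                     # extra jets not allowed below the top multiplicity
        return True

    # toy configuration: two ME partons, two reconstructed jets
    partons = [(80.0, 0.5, 1.0), (45.0, -1.2, 2.8)]
    jets    = [(75.0, 0.52, 1.05), (40.0, -1.15, 2.75)]
    print(mlm_accept(partons, jets))         # True

In the real procedure the extra jets allowed in the highest-multiplicity sample are restricted to be softer than the softest ME parton, a refinement omitted here for brevity.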

3.5 Pythia 6.4

Pythia 6.4 [28] is a general purpose generator written in the Fortran programming language. Take note here that Pythia 8 is a complete rewrite of the code in C++, and both versions are still implemented in various analyses. In this thesis only Pythia 6.4 will be considered. It contains a variety of physics aspects, including beyond-SM searches. It is known, for the reasons explained in the previous sections, that Pythia alone does not reproduce well processes at high p⊥ (because it calculates the exact ME only for the 2 → 2 process, and not for the 2 → n), so it is generally implemented in conjunction with other ME generators. However, since it contains a wide array of processes even beyond the SM, it still gets a fair amount of use as a standalone generator. As with all other physics programs, one of the key aspects when implementing it is to retain a critical view. Pythia was developed by the Lund group and implements the so-called string-fragmentation model - in short, the Lund model [28] - which is one of the existing models describing confinement and the related processes of fragmentation and hadronization. The combination of string fragmentation and parton shower has been found to be very successful and important in describing hadronic Z0 events.

When using Pythia, the Monte Carlo program is built as a slave system, meaning that the user has to supply the main program. From this, various subroutines are called to execute specific tasks and then the control is returned to the main program. In order to have control over what is being generated, the user should understand the importance of the various parameters and subroutines. As an example relevant to this analysis, the study of Z → e+e−, all the other decays should be switched off, thus effectively producing only the wanted events. Additionally, the parameters in question can be for example the PDF chosen to describe the proton structure, the decay of unstable particles, and so on. For a general user, however, it is not necessary to start from scratch. A set of working parameters is included in the so-called "tunes", which are obtained by best fitting the generated events to the real data. Afterwards, at the user's discretion, some of these parameters may be changed in order to study their effects on the produced data. It should be noted here that Pythia adopts its own event format, but allows for a conversion to the HepMC event format [29], which is a common convention adopted in order to more easily compare and interface different generators. The difference between the two types of events is that HepMC allows only quantities that can be defined unambiguously (being independent of the program used). The details of an event generator have already been explained in the previous sections, and thus in the following only the Pythia-specific details will be discussed.
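As an illustration of this kind of steering, the sketch below uses the Pythia 8 Python bindings (which must be built separately); the setting strings are Pythia 8 names and are only an analogue of the corresponding Pythia 6.4 common-block parameters and subroutine calls used in this thesis. It selects Drell-Yan production, switches off all Z decay channels and re-enables only Z → e+e−.

    # Sketch with the Pythia 8 Python interface; Pythia 6.4 (Fortran) is configured
    # through equivalent common blocks from a user-supplied main program instead.
    import pythia8

    pythia = pythia8.Pythia()
    pythia.readString("Beams:eCM = 7000.")               # 2011 LHC centre-of-mass energy
    pythia.readString("WeakSingleBoson:ffbar2gmZ = on")  # q qbar -> gamma*/Z
    pythia.readString("23:onMode = off")                 # switch off all Z decay channels ...
    pythia.readString("23:onIfAny = 11")                 # ... and re-enable only Z -> e+ e-
    pythia.init()

    for _ in range(10):                                  # generate a handful of test events
        if not pythia.next():
            continue
    pythia.stat()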

Hard Process and Resonances

The Pythia 6.4 library contains about 300 different hard processes. They can be classified in many different ways, one of them being according to the number of final-state objects: they can be '2 → 1' processes, or '2 → 2', or '2 → 3', and so on. Pythia is optimized for the '2 → 1' and '2 → 2' ones, with no generic treatment for processes with three or more particles in the final state. However, there exist different approaches for some specific processes. Furthermore, the missing activity in the final state is supplied by the parton shower (with known limitations). Another possible classification is according to the physics scenario. One can speak of hard QCD, soft QCD, heavy-flavour production, SUSY, etc. The bulk of the above processes is of the '2 → 2' type. Furthermore, it is important to note that the classification based on the number of particles in the final state can be misleading when considering resonances and their decay, since an s-channel resonance is considered as a single particle, even if it is assumed to always decay into two final-state particles. This means that the process qq̄ → W+W− → q1q̄1' q2q̄2' is classified as '2 → 2' even though the decay treatment of the W pair includes the full '2 → 4' matrix elements. A characteristic of Pythia is that the partial widths and decay rates can be calculated dynamically (thus not being constant), as a function of the actual mass of a particle. This is especially important for resonances with large decay widths and for threshold regions.

String Fragmentation

The default string fragmentation implemented in Pythia is the Lund string fragmentation model [28], but other independent fragmentation models exist (to allow possible comparisons). All the different models have a similar structure, for they are probabilistic and iterative. This means that the fragmentation as a whole is described in terms of one or a few simple underlying branchings, of the type jet → hadron + remainder-jet, or perhaps string → hadron + remainder-string, and so on. At each branching, probabilistic rules are given for the production of new flavours and for the sharing of energy and momentum between the products. Without going into too much detail, to explain the string fragmentation model one can consider the simplest possible colour-singlet system, qq̄. Lattice QCD studies support a linear confinement picture, i.e. that the energy stored by the quark pair increases linearly with the separation between the quarks (if the short-distance Coulomb term is neglected). As the two quarks move apart from each other, the physical picture is that of a colour flux tube stretching in between. The transverse dimensions of the tube are of typical hadronic size, about 1 fm. From hadron spectroscopy the string constant, i.e. the amount of energy per unit length, is deduced to be κ ≈ 1 GeV/fm. As the q and q̄ move apart, the potential energy stored in the string increases linearly, and the string may then break, producing a new q'q̄' pair, so that the new situation is made up of qq̄' and q'q̄. If the invariant mass of either of these pairs is large enough, subsequent breaks may occur. The assumption of the Lund model is that the string break-up process continues until only on-mass-shell hadrons remain, each hadron corresponding to a small piece of string with a quark at one end and an anti-quark at the other. In order to produce the q'q̄' pair, the Lund model makes use of quantum

mechanical tunnelling. This leads to a Gaussian spectrum in the p⊥ distribution of the q'q̄' pairs, and also suppresses the production of heavy quarks, u : d : s : c ≈ 1 : 1 : 0.3 : 10^-11. This means that heavy quarks will essentially be produced only in the perturbative parton-shower branchings g → qq̄. When a quark and an antiquark form a meson, an algorithm is invoked which chooses between the allowed possibilities - either a pseudoscalar or a vector meson. Quantitatively, a ratio of 1 : 3 is expected, but here the Lund model is not particularly predictive. A tunnelling mechanism can also be invoked to explain the production of baryons, but this is still a poorly understood area. Generally speaking, the different string breaks are causally disconnected, meaning that the breaks can be described in any convenient order, following an iterative procedure which will not be explained here. As a final comment on string fragmentation, in the Lund model one of the assumptions is that the gluon exchange does not mess up the initial colour assignment. There exist colour reassignment models, where this is being studied. Experimentally, the evidence so far is that these effects are sufficiently small. Another example of nontrivial effects not covered by the Lund fragmentation model is the Bose-Einstein correlations between identical final-state particles.
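The size of this flavour suppression can be illustrated with the tunnelling probability P ∝ exp(−π m_q²/κ). The sketch below assumes effective quark masses of 0.28 GeV and 1.27 GeV for the s and c quarks (illustrative values, not the parameters of the actual Lund-model tune) and κ ≈ 0.2 GeV², i.e. 1 GeV/fm in natural units, and reproduces the order of magnitude of the quoted u : d : s : c ratios.

    import math

    kappa = 0.2                                            # GeV^2, string tension ~ 1 GeV/fm
    masses = {"u": 0.0, "d": 0.0, "s": 0.28, "c": 1.27}    # illustrative effective masses in GeV

    prob = {q: math.exp(-math.pi * m**2 / kappa) for q, m in masses.items()}
    for q in "udsc":
        print(f"{q}: {prob[q] / prob['u']:.2e}")           # production rate relative to u quarks

The printed ratios are roughly 1 : 1 : 0.3 : 10^-11, matching the suppression pattern quoted above and showing why charm and bottom quarks are not produced in string breaks.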

3.6 MadGraph 5

MadGraph 5 is a complete rewrite of the automated matrix element generator MadGraph [30]. The original code was written in Fortran and has now been partly rewritten in Python, rendering it more flexible. As mentioned in previous sections, this is a specialized event generator which does not generate the whole event. In fact, it calculates the ME more precisely than a general purpose event generator would, but requires the subsequent parton showering to be done by Pythia (for example). The generation of events with MadGraph is done in two subsequent steps. First, the ME are calculated. In order to do this, two files (called cards) have to be set up, where one card defines the process (e.g. qq̄ → e+e−) and the other contains the run parameters. This helps to understand that different processes can be generated (e.g. by adding jets in the final state) while still retaining the same run parameters. In the second step the events are generated. Contrary to a PS generator, MadGraph allows one to calculate the Born-level ME of a 2 → n process, where n can be greater than 2. Two important things are of relevance here. First, the time needed for the calculation of the ME increases exponentially with each added jet in the final state, posing an upper limit on the number of jets in terms of computational power. Secondly, the subsequent addition of the parton shower has to be taken into account, which can increase the number of jets in the final state. Nevertheless, this gives an edge over Pythia, where the generated events are of the inclusive type, meaning that it can only generate Z + n jets, where the number of jets depends only on the cross-section and cannot be controlled otherwise. On the contrary, from the MadGraph point of view, Z + 0 jets is a different event from Z + 1, 2, 3, 4 jets. This is useful because, due to the exponentially decreasing cross-section with the increasing number of jets, the statistics available at high jet multiplicity can be quite low. If however the Z + 3 jets or Z + 4 jets samples can be generated independently from the rest, it is easy to increase the statistics by simply increasing the number of generated events (with 3 and 4 jets). The generated events are written in the HepMC event format, following the common convention, easily accessible by Pythia or other generators containing a parton shower.

3.7 Madgraph + Pythia

As previously explained, MadGraph does not reproduce a complete final state (it is missing the hadronization part), and Pythia itself cannot reproduce well the data at high p⊥. A combination of the two is then highly desirable, utilizing the calculation of the matrix elements at high p⊥ (well separated jets) and the parton shower at small p⊥ (jet structure). MadGraph implements either MLM (explained in Sec. 3.4.2) or CKKW matching. In the following only the first will be considered, which was used for the present analysis.

The final state partons are clustered using the k⊥ jet algorithm (see Sec. 2.3, with p⊥ instead of p⊥^-1) to find the parton shower history of the event. The smallest k⊥ value is restricted to be above the xqcut cutoff scale. In order to mimic the behaviour of a parton shower, the k⊥ value of each vertex corresponding to a QCD emission is used as the renormalization scale for αs in that vertex. The factorization and renormalization scales for the main hard 2 → 1 or 2 → 2 process are fixed by the transverse mass of the produced particle(s), m⊥² = p⊥² + m². Successively, the event is passed to Pythia for the parton showering. After the showering, but before the hadronization and decays, the final state partons are clustered into jets with a cutoff scale Qcut > xqcut. The jets are then compared to the original partons from the ME. A jet is considered to be matched to the closest parton if the jet measure k⊥(parton, jet) is smaller than the cutoff Qcut. For the event not to be vetoed, each jet must be matched to a parton. The only exception is in the highest multiplicity sample, where extra jets are allowed below the k⊥ scale of the softest ME parton in the event (in the phase space not covered by the ME). At this point it follows that xqcut defines the minimal distance in phase space allowed between extra partons.

Figure 3.4: The differential jet rate (DJR) distributions DJR(1→0), DJR(2→1), DJR(3→2) and DJR(4→3), shown for the samples with 0-4 partons as a function of the merging scale. This is an example of a problematic Qcut, as seen in the non-smooth transition in the DJR(1→0) region.

However, since the matching procedure effectively takes place in Pythia, the value of Qcut must also be chosen with care.

The initial value of Qcut is calculated starting from the xqcut value in MadGraph, but it might not be the best choice. In order to check whether the chosen Qcut value enables a correct matching, one can look at the Differential Jet Rate (DJR) plots, which show the transition from the region below the matching scale to the region above. The distributions of the DJR have to be independent of the cutoff chosen, and the transition at the cutoff has to be as smooth as possible. A code is available for download which, once run, plots the DJR for the 1 → 0, 2 → 1, 3 → 2 and 4 → 3 transitions. In short, the combined histograms from the different jet multiplicities have to give a smooth histogram. In Fig. 3.4 a bad choice of Qcut is shown, while in Fig. 3.5 a good one. Before running Pythia on all the MadGraph events it is good practice to first run it on only a handful of them, check that everything works and change the value of Qcut if needed. There is a price to pay because of the merging. Not all MadGraph events can be successfully merged with Pythia, covering the phase space uniformly without overpopulating some parts and leaving others out.


Figure 3.5: The differential jet rate distributions DJR(1→0), DJR(2→1), DJR(3→2) and DJR(4→3) as a function of the merging scale. This is an example of a good choice for the Qcut.

The efficiency of the merging can be around 80% for Z + 0 jets, but can go as low as 6% for Z + 4 jets. The decrease in efficiency can be attributed to the higher number of partons in the MadGraph event that have to be matched while covering the phase space.

3.8 Sherpa

Simulation of High Energy Reactions of PArticles, Sherpa [31], is a Monte Carlo event generator that provides complete hadronic final states in simulations of high-energy particle collisions. It is a standalone general purpose generator. It differs from Pythia and Herwig in that it is a recently developed event generator, written from scratch in C++. In line with object-oriented programming, it retains a modular structure: Sherpa comes with various modules that can be switched on or off, depending on the user's preferences. Additionally, these modules can be replaced with other programs (e.g., a more specialized ME generator like BlackHat [32]). Sherpa differs from MadGraph and Pythia in that it calculates Born-level diagrams up to a current maximum of 8 partons (using a different algorithm than MadGraph, more suitable for parallel processing) and at the same time implements its own scheme for parton showers and for the matching between its ME calculations and the PS. This means that Sherpa cross-sections have LO accuracy, which in some special cases might be extended. In the following a few key aspects of Sherpa will be discussed, focussing on the differences with respect to the aforementioned programs; for more details see [33] and [24].

Parton Shower in Sherpa

Sherpa implements a different showering algorithm than Pythia, based on the Catani-Seymour dipole factorization formalism [35]. The showers are made of splitting dipoles, where a dipole consists of the parton that is supposed to split and a well-defined spectator parton that is colour-connected to the emitter. In this formalism, denoting with (I) a parton in the initial state and (F) a parton in the final state, four different configurations for the dipoles are possible: FF, FI, IF and II. All dipole configurations are treated equally, and thus, contrary to Pythia, there is no formal difference between initial- and final-state radiation. Successive emissions are ordered in terms of the invariant transverse momentum, defined either with respect to the final-state splitting products or with respect to the emitting beam particle. One of the points in favour of this algorithm is that it facilitates the merging between ME and PS.

Matrix Elements and Parton Shower Merging in Sherpa

For the merging, Sherpa implements a CKKW scheme which goes as follows: the cross-sections σ_k for processes with k extra partons are calculated with the constraint that the ME-calculated final states pass the jet criteria. The criteria are defined by the jet measure

Q_ij^2 = 2 p_i·p_j min_k [ 2 / (C_ij,k + C_ji,k) ] ,

where the minimum is taken over the colour-connected partons k, and where for final-state partons i, j the following expression holds if j is a gluon

C_ij,k = (p_i·p_k) / ((p_i + p_k)·p_j) − m_i^2 / (2 p_i·p_j) ,

otherwise C_ij,k = 1. The minimal distance is set by the merging scale Qcut. Then the processes with fixed parton multiplicity are chosen with probability σ_k / Σ_k σ_k. The hard process is picked from the list of possible partonic processes giving rise to the required parton multiplicity, in accordance with their contributions to the cross-section. In order to determine the reweighting, the parton configuration of the ME has to be analysed. The partons are clustered backwards according to the shower measure and the inverted shower kinematics until the original 2 → 2 hard process has been found. The reweighting proceeds according to the reconstructed shower history, and the event is accepted or rejected based on a kinematics-dependent weight. Next, the parton-shower evolution is started with suitably defined scales for intermediate and final-state particles. Intermediate partons undergo truncated shower evolution, which permits parton-shower emissions between the scales of one matrix-element branching and the next. Lastly, parton-shower radiation is subject to the condition that no extra jet is produced: if any emission turns out to be harder than the separation cut Qcut, the event is vetoed, effectively implementing a Sudakov rejection.
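As an illustration of this jet criterion, the following is a minimal numerical sketch (not Sherpa code) that evaluates Q_ij^2 for a set of toy final-state four-momenta, treating every parton as a gluon and every other parton as a colour-connected spectator; these simplifications and the momentum values themselves are assumptions made purely for illustration.

```python
import itertools

def dot(p, q):
    """Minkowski product p·q for four-vectors given as (E, px, py, pz)."""
    return p[0]*q[0] - p[1]*q[1] - p[2]*q[2] - p[3]*q[3]

def c_ijk(pi, pj, pk, mi=0.0, j_is_gluon=True):
    """C_{ij,k} for final-state partons i, j with spectator k."""
    if not j_is_gluon:
        return 1.0
    return dot(pi, pk) / (dot(pi, pj) + dot(pk, pj)) - mi**2 / (2.0 * dot(pi, pj))

def q_ij(pi, pj, spectators, i_is_gluon=True, j_is_gluon=True):
    """Jet measure Q_ij^2 = 2 p_i·p_j * min_k 2 / (C_{ij,k} + C_{ji,k})."""
    best = min(2.0 / (c_ijk(pi, pj, pk, j_is_gluon=j_is_gluon) +
                      c_ijk(pj, pi, pk, j_is_gluon=i_is_gluon))
               for pk in spectators)
    return 2.0 * dot(pi, pj) * best

# toy final-state partons, four-momenta as (E, px, py, pz) in GeV
partons = [(50.0, 30.0, 20.0, 30.8),
           (40.0, -25.0, 10.0, -29.6),
           (30.0, -5.0, -30.0, 0.0)]

for (a, pa), (b, pb) in itertools.combinations(enumerate(partons), 2):
    others = [p for c, p in enumerate(partons) if c not in (a, b)]
    print(f"Q_{a}{b}^2 = {q_ij(pa, pb, others):.1f} GeV^2")
```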

Hadronization in Sherpa

The hadronization in Sherpa differs from the string fragmentation available in Pythia, as it implements a cluster fragmentation [36, 37, 38] which at its base assumes local parton-hadron duality, i.e. the idea that the quantum numbers at hadron level follow very closely the flow of quantum numbers at parton level. During this process the particles close to each other in phase space are combined into pseudo-particles called "clusters". Only the very lightest clusters are converted directly into hadrons, whereas heavier clusters may either convert into heavy hadrons or decay into lighter clusters. Describing the process in more detail:

• first all gluons are forced to decay into quark or diquark pairs, qq̄ or dd̄, and all remaining partons are brought on mass shell; the recoils are handled mainly through colour-connected particles;

• the decays of heavy clusters are achieved by emitting a gluon from a qq̄ pair, which in turn splits as in the previous point;

• two parameters are invoked in this procedure, p⊥^max and p0: p⊥^max serves as the upper limit of the transverse momentum of all non-perturbative decays (e.g. g → qq̄) and p0 serves to determine the effective p⊥^2; lighter-flavour pairs are preferentially produced due to the available phase space, and this is additionally taken care of by using weight parameters;

• lastly, the decays of clusters into hadrons are determined by various weights including flavour wave functions, phase-space factors, flavour and hadron-multiplet weights, and other dynamical measures.

Chapter 4

Data - Monte Carlo Comparison and Analysis

In this chapter the analysis of the jet production rates in association with W and Z bosons [21] will be presented, followed by its extension to the Z + jets analysis and an introduction to Rivet. Afterwards, the choices made for the different generators will be presented and the resulting plots will be shown.

4.1 Jet Production Rates in Association with W and Z Bosons

The measurement of the jet production rates in association with W and Z bosons [21] is the analysis of which the currently ongoing Z + jets analysis is a logical extension. It was published with the 2010 dataset, corresponding to an integrated luminosity of 36 pb−1. Here V denotes either a Z or a W boson. In this analysis the inclusive jet rates f_n(V) were measured and a test of Berends-Giele scaling was performed, both explained in Chapter 2. Furthermore, the ratio of W to Z cross sections and the W charge asymmetry as a function of the number of associated jets were measured. The measurements provide a stringent test of perturbative-QCD calculations and are sensitive to the possible presence of new physics. In the following, the event selection will be briefly described, focusing on the Z production detected through decays into electrons and positrons, relevant for the present analysis. The event selection begins with the identification of a charged lepton, either an electron or a muon, with p⊥ > 20 GeV. This lepton is called the "leading lepton". If it is an electron, it must have an ECAL cluster in the region |η| < 2.5 with the exclusion of the region 1.4442 < |η| < 1.566. This exclusion rejects electrons close to the barrel/endcap transition and regions in the shadow of cables and services. For each electron candidate, a supercluster in the ECAL is defined in order to correct for the potential underestimation of the energy due to bremsstrahlung.

Thus the electron cluster is combined with the group of single clusters attributed to bremsstrahlung photons generated in the material of the tracker. To further reduce the contamination from misidentified electrons and hadronic decays, only electrons isolated from hadronic activity are selected. This selection is based on maximum values allowed for the isolation variables. The sum of E⊥ is made within the isolation cone ∆R = √((∆η)^2 + (∆φ)^2) < 0.3. The values of the isolation variables are chosen in order to obtain an electron efficiency of 80%. The "second leading electron" is then searched for, with p⊥ > 10 GeV and quality requirements such that the efficiency corresponds to 95%. If such a second leading electron is found, and its invariant mass with the leading electron, Mll, lies between 60 GeV and 120 GeV, the event is assigned to the Z + jets sample. If such a second leading electron is not found, the event is assigned to the W + jets sample, thereby ensuring that there is no overlap between the two samples.
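The sample assignment just described can be summarized in a short sketch (illustrative only: the isolation and identification requirements are collapsed into a single boolean flag, and leptons are represented as simple dictionaries with made-up kinematics):

```python
import math

def invariant_mass(l1, l2):
    """Dilepton invariant mass from (pt, eta, phi), treating the leptons as massless."""
    px = lambda l: l["pt"] * math.cos(l["phi"])
    py = lambda l: l["pt"] * math.sin(l["phi"])
    pz = lambda l: l["pt"] * math.sinh(l["eta"])
    e  = lambda l: l["pt"] * math.cosh(l["eta"])
    m2 = ((e(l1) + e(l2))**2 - (px(l1) + px(l2))**2
          - (py(l1) + py(l2))**2 - (pz(l1) + pz(l2))**2)
    return math.sqrt(max(m2, 0.0))

def classify(leptons):
    """Assign an event to 'Z+jets', 'W+jets' or None, following the selection above."""
    good = sorted((l for l in leptons if l["isolated"]), key=lambda l: l["pt"], reverse=True)
    if not good or good[0]["pt"] <= 20.0:
        return None                                  # no leading lepton
    leading = good[0]
    for second in good[1:]:
        if second["pt"] > 10.0 and 60.0 < invariant_mass(leading, second) < 120.0:
            return "Z+jets"                          # second lepton and mass window found
    return "W+jets"                                  # no Z candidate found

# toy event: an e+e- pair compatible with a Z boson
ee = [{"pt": 45.0, "eta": 0.3, "phi": 0.1, "isolated": True},
      {"pt": 38.0, "eta": -0.8, "phi": 3.0, "isolated": True}]
print(classify(ee))   # -> 'Z+jets' (pair mass of about 95 GeV)
```

Note that the real W + jets selection involves further requirements (for example on the missing transverse energy) which are not part of this sketch.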

The Z + jets analysis discussed in this thesis extends the measurement of the jet production rates in association with W and Z bosons by considering a larger data sample, from the 2011 LHC run, with 5.7 fb−1 of integrated luminosity. Additionally, a few adjustments have been made to the event selection and different observables are looked at, which will be explained in the following Rivet section. The data points shown in the following plots refer to this updated analysis [39].

4.2 Rivet

Rivet (Robust Independent Validation of Experiment and Theory) [40] allows for the validation and tuning of Monte Carlo event generators. It is a C++ class library providing the infrastructure and calculational tools for reproducing the analyses performed on real data and easily applying them, independently, to a wide variety of generators. Furthermore, Rivet was conceived as a tool to facilitate data preservation, which is a big problem for high energy physics experiments in general. As time passes, the information relative to a particular high energy physics experiment can be partially lost, or even become completely unavailable (e.g. details relative to an experiment that was shut down long ago). To achieve this durability, Rivet aims to implement an analysis for each published physics paper, and to store the corresponding experimental data in a file format adopted by Rivet. This last step allows for an automated generation of the bin edges of a histogram, starting from the ones available with the real data. Furthermore, it allows for an easy comparison between data and MC with the supplied scripts. Currently, the Rivet package comes with about 100 implemented analyses and the corresponding data, ranging from LEP, HERA, RHIC, KEK-B, Tevatron Runs I and II, to the LHC and more. This means that all the given analyses can still be run, independently of the details of the various experiments, on whichever MC generator is currently available, and compared with the real data. In order to do so Rivet operates purely on the HepMC event record [29], which is a standardized event format for MC generators. This effectively makes Rivet generator-independent. Of particular interest are the stable particles (labelled by "status 1" in HepMC), because only these can be measured by a detector such as CMS. A further distinguishing feature of Rivet is its simplicity: the analysis code should be written in a clean and simple style, ideally self-explanatory, so as to serve as a reference for the experimental analysis algorithm.

4.2.1 Rivet analyses

The Rivet analysis function is the core of the Rivet program: it determines whether an event will be accepted or rejected using a negative logic. The latter means that whenever a variable does not pass a certain cut, the event is vetoed and the analysis proceeds with the next one. To apply the kinematic cuts in Rivet there exist pre-defined objects called Projections. A projection is an object which calculates a property, or properties, of an event, such as a single number (e.g. Q^2), a single complex object (e.g. a sphericity tensor) or a list of such objects (e.g. the set of particles in the final state). The projections include kinematic cuts that are frequently used in various analyses, which means that some specific cuts have to be implemented by the user. It is possible to include multiple projections and additional kinematic cuts in a single analysis. It is important to note here that multiple analyses can be loaded when Rivet is run. This means that it is not necessary to re-run the code for each analysis; one simply defines which analyses have to be run on a certain set of events. For the present work, only the Z + jets analysis will be considered. Lastly, Rivet enables the normalization of the histograms at the end of the event processing. However, if for whatever reason the normalization cannot be implemented, a script exists which allows the histograms relative to a MC simulation to be normalized to the histograms available from real data.

4.2.2 Z + jets analysis in Rivet

As explained in Sec. 4.2, the Rivet analysis should mimic as closely as possible the analysis done on the real data, and such is the present case. First the final state projection is applied, to obtain only the particles measurable by the CMS detector. Next, the e+ and e− with the highest p⊥ have to be found in the event because, as explained in the previous chapters, the main interaction (producing the Z0) should be the one with the highest p⊥ exchange.

In order to select the two electrons, however, it is necessary to additionally consider the photons that are too close to the electrons in phase space for the ECAL to distinguish. To reproduce this effect, introduced by the limited ECAL resolution, the following distance is first defined:

∆R = √((η_l − η_γ)^2 + (φ_l − φ_γ)^2) ,

where l can be either e+ or e−. In order for the ECAL to distinguish between a photon and a lepton, ∆R has to be large enough: for the barrel (covering the region |η| < 1.4442) the minimal distance is ∆R > 0.05, while for the endcap (covering the region 1.5660 < |η| < 2.4) the minimal distance is ∆R > 0.07 (these two estimates were obtained by considering the spatial dimensions of the ECAL crystals, as in Sec. 1.2.4). If the distance between an electron (positron) and a photon is smaller than the given values, the energies of the two particles are summed, while still retaining the original lepton direction - the direction is unaffected because it is measured by the tracker, which does not detect neutral particles. This procedure is repeated for all the electrons, positrons and photons found. In the end, only the electron and the positron combined with photons ("dressed") having the highest combined p⊥ are considered. Additionally, this p⊥ has to be greater than 20 GeV. If, for whatever reason, two such particles are not found, the event is vetoed. Next, the invariant mass of this lepton pair is calculated, requiring it to be in the allowed Z0 mass window, which was chosen to be 71 GeV < m_e+e− < 111 GeV. Once again, if the mass lies outside this window, the event is vetoed. Lastly, since the analysis is Z + hadronic jets, the two electrons (and the adjacent photons) must not be counted as jets and are thus eliminated from the particle list before any further analysis on the jets is done. Rivet makes use of the FastJet anti-k⊥ algorithm to form the jets from the outgoing particles, and the usual choice for the R parameter (see Sec. 2.3) in CMS is 0.5, which has been fixed here as well. A schematic sketch of the lepton-dressing and mass-window steps is given after the list below. Subsequently, the different histograms are filled, which will be shown in the following sections:

• Jet Multiplicity is the distribution of the number of jets found;

• Leading Jet p⊥ is the distribution of the transverse momentum of the jet having the highest p⊥;

• Additionally, if the number of jets is equal to 2, 3 or 4, the histograms containing the transverse momentum distribution of the subleading (second) jet, sub-subleading (third) jet or sub-sub-subleading (fourth) jet are respectively filled.
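The following is a schematic, plain-Python sketch of the lepton-dressing and mass-window steps described above (it is not the actual Rivet C++ analysis code): particles are represented as dictionaries with pt, eta and phi, leptons are treated as massless, and the photon contribution is approximated by adding its p⊥ along the lepton direction.

```python
import math

def delta_r(a, b):
    """Distance in the (eta, phi) plane."""
    dphi = abs(a["phi"] - b["phi"])
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(a["eta"] - b["eta"], dphi)

def invariant_mass(l1, l2):
    """Pair mass from (pt, eta, phi), treating both particles as massless."""
    def four(l):
        return (l["pt"] * math.cosh(l["eta"]), l["pt"] * math.cos(l["phi"]),
                l["pt"] * math.sin(l["phi"]), l["pt"] * math.sinh(l["eta"]))
    e1, x1, y1, z1 = four(l1)
    e2, x2, y2, z2 = four(l2)
    m2 = (e1 + e2) ** 2 - (x1 + x2) ** 2 - (y1 + y2) ** 2 - (z1 + z2) ** 2
    return math.sqrt(max(m2, 0.0))

def dress(lepton, photons):
    """Merge with the lepton the photons the ECAL cannot resolve from it.

    The lepton direction is kept (it comes from the tracker); the photon
    contribution is approximated here by adding its pt along that direction.
    """
    dressed = dict(lepton)
    for gamma in photons:
        threshold = 0.05 if abs(lepton["eta"]) < 1.4442 else 0.07   # barrel / endcap
        if delta_r(lepton, gamma) < threshold:
            dressed["pt"] += gamma["pt"]
    return dressed

def z_candidate(electrons, positrons, photons):
    """Return the dressed (e-, e+) pair, or None if the event would be vetoed."""
    if not electrons or not positrons:
        return None
    ele = max((dress(e, photons) for e in electrons), key=lambda l: l["pt"])
    pos = max((dress(p, photons) for p in positrons), key=lambda l: l["pt"])
    if ele["pt"] < 20.0 or pos["pt"] < 20.0:
        return None
    if not (71.0 < invariant_mass(ele, pos) < 111.0):   # Z0 mass window
        return None
    return ele, pos
```

In the sketch, the returned pair (and its associated photons) would then be removed from the particle list before the remaining particles are clustered into anti-k⊥, R = 0.5 jets.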

The histograms made in Rivet are not directly normalized, because the complete information necessary to do so was not available. For example, in the real analysis the histograms are normalized with the detector efficiency values, which are missing here. As already specified, this problem can be avoided by normalizing the MC histograms to the area given by the real data histograms.
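That area normalization amounts to a single rescaling; a minimal sketch, assuming plain lists of bin contents and ignoring bin widths:

```python
def normalize_to_data(mc_bins, data_bins):
    """Rescale MC bin contents so that their sum equals the data integral."""
    mc_total, data_total = sum(mc_bins), sum(data_bins)
    if mc_total == 0:
        return list(mc_bins)
    scale = data_total / mc_total
    return [content * scale for content in mc_bins]

# toy example: the shapes are compared after the overall rates are equalized
print(normalize_to_data([4.0, 2.0, 1.0], [120.0, 70.0, 20.0]))   # [120.0, 60.0, 30.0]
```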

4.3 Estimation of Monte Carlo generators uncertainties

As explained in Chapter 3, at least a part of each MC generator is based on phenomenological models which depend on certain parameters. The set of parameters is fixed by comparing an analysis performed on simulated events with the same analysis performed on experimental data, and is extracted from the best fit. This has two major implications:

• a configuration of parameters has to be tested on multiple analyses;

• the value of certain parameters might have a larger effect in some analyses than in others.

One of the key aspects of MC generator tuning is therefore to consider observables sensitive to variations of a set of parameters. For this thesis work a single analysis (Z + jets) was considered, together with the observables defined therein, with the goal of studying the behaviour of a few MC generators with different configurations. In particular, the goal was to study the main systematic uncertainties introduced by the variation of a set of parameters. In order to do so, comparisons were made between the central predictions (using the recommended settings) and real data, successively applying variations of the parameters as will be explained in the following sections. The data sample used for comparison was obtained from the 2011 LHC run, with about 5.7 fb−1 of data. The MC samples were produced with approximately equal size (in most cases about 5 fb−1). The number of needed events was calculated using the equation N = σ_tot L_int, where σ_tot is given by the MC generator with a sample run and L_int is the integrated luminosity.
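For instance, the required number of events follows directly from N = σ_tot L_int; a trivial sketch with made-up numbers (not the actual cross sections used here):

```python
def events_needed(sigma_tot_pb, lint_fb):
    """N = sigma_tot * L_int, with the cross section in pb and the luminosity in fb^-1."""
    return sigma_tot_pb * lint_fb * 1000.0          # 1 fb^-1 = 1000 pb^-1

# illustrative numbers only
print(int(events_needed(sigma_tot_pb=1000.0, lint_fb=5.0)))   # 5000000 events
```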

4.3.1 Central predictions

The first comparison is between the central predictions of the different event generators:

• Pythia 6.4 with Z2 tune;

• MadGraph inclusive set + Pythia 6.4 with Z2 tune;

• MadGraph exclusive set + Pythia 6.4 with Z2 tune;

• Sherpa.

Here by "tune" a particular choice of parameters is meant (the details can be found in [41]). Note that Pythia 6.4 with the Z2 tune was considered only for completeness, given its known limitations (it does not reproduce the data well at high p⊥), and it will not be used in the following sections. Furthermore, the MadGraph inclusive set was generated by requiring Z + n jets, where n was determined by the respective cross-section. Again, this production will not be considered in the following, but is shown here for comparison. Lastly, the MadGraph exclusive set was produced by considering separate processes, Z + 0 jets, Z + 1 jet, Z + 2 jets and Z + 3 jets, plus an inclusive Z + n jets sample with n ≥ 4. Roughly the same number of events was produced for the inclusive and exclusive sets. However, in the exclusive case the Z + 3 jets and Z + n jets (n ≥ 4) samples were produced with a higher integrated luminosity (equal to 10 fb−1) in order to increase the statistics. Lastly, the Sherpa sample was generated with an integrated luminosity equal to 1.2 fb−1. The sample was generated with enhancement factors which amplify the production rate of 2, 3 and 4 jets, effectively increasing the statistics. The resulting plots are shown in Fig. 4.1, 4.2, 4.3, 4.4 and 4.5, where in each figure the distribution is shown in the top histogram and the ratio between data and MC in the bottom one. As explained, the events generated by Pythia, which explicitly calculates only 2 → 2 matrix elements and covers the rest with parton showers, do not reproduce well the hard part of the event, i.e. the data at high p⊥. The two different MadGraph productions are similar, with the main difference being the gained statistics for the third and fourth jet. Lastly, the Sherpa generated events are in good agreement with data even though the sample has four times smaller statistics than the MadGraph + Pythia one.

4.3.2 Different Pythia tunes

The next comparison was made by considering the aforementioned MadGraph exclusive set in conjunction with different Pythia tunes (for the PS and hadronization parts). Three different Pythia tunes were considered for comparison: Z2, D6T and Perugia2011 (P11 in short). The plots are shown in Fig. 4.6, 4.7, 4.8, 4.9 and 4.10, where in each figure the distribution is shown in the top histogram and the ratio between data and MC in the bottom one. It can be seen that the different Pythia tunes do not have an important effect on the observables studied. This was expected, since it is MadGraph that calculates the bulk of the cross-section.

4.3.3 Renormalization and factorization scales

The next set of generated events was made in order to see the effect of the renormalization and factorization scales in determining the MC generator uncertainties.


Figure 4.1: The leading jet p⊥ distribution. The sample produced by Pythia 6 is shown in red, MadGraph inclusive set in blue, MadGraph exclusive set in green and Sherpa sample in magenta. The yellow band indicates the total errors (statistical and systematic) associated to the real data.


Figure 4.2: The subleading jet p⊥ distribution. The sample produced by Pythia 6 is shown in red, MadGraph inclusive set in blue, MadGraph exclusive set in green and Sherpa sample in magenta. The yellow band indicates the total errors (statistical and systematic) associated to the real data.


Figure 4.3: The sub-subleading jet p⊥ distribution. The sample produced by Pythia 6 is shown in red, MadGraph inclusive set in blue, MadGraph exclusive set in green and Sherpa sample in magenta. The yellow band indicates the total errors (statistical and systematic) associated to the real data.


Figure 4.4: The sub-sub-subleading jet p⊥ distribution. The sample produced by Pythia 6 is shown in red, MadGraph inclusive set in blue, MadGraph exclusive set in green and Sherpa sample in magenta. The yellow band indicates the total errors (statistical and systematic) associated to the real data.


Figure 4.5: The jet multiplicity distribution. The sample produced by Pythia 6 is shown in red, MadGraph inclusive set in blue, MadGraph exclusive set in green and Sherpa sample in magenta. The yellow band indicates the total errors (statistical and systematic) associated to the real data.


Figure 4.6: The leading jet p⊥ distribution. The sample produced by the use of Pythia 6 with Z2 tune is shown in red, D6T tune in blue and Perugia2011 tune in green. The yellow band indicates the total errors (statistical and systematic) associated to the real data.


Figure 4.7: The subleading jet p⊥ distribution. The sample produced by the use of Pythia 6 with Z2 tune is shown in red, D6T tune in blue and Perugia2011 tune in green. The yellow band indicates the total errors (statistical and systematic) associated to the real data.


Figure 4.8: The sub-subleading jet p⊥ distribution. The sample produced by the use of Pythia 6 with Z2 tune is shown in red, D6T tune in blue and Perugia2011 tune in green. The yellow band indicates the total errors (statistical and systematic) associated to the real data.


Figure 4.9: The sub-sub-subleading jet p⊥ distribution. The sample produced by the use of Pythia 6 with Z2 tune is shown in red, D6T tune in blue and Perugia2011 tune in green. The yellow band indicates the total errors (statistical and systematic) associated to the real data.


Figure 4.10: The jet multiplicity distribution. The sample produced by the use of Pythia 6 with Z2 tune is shown in red, D6T tune in blue and Perugia2011 tune in green. The yellow band indicates the total errors (statistical and systematic) associated to the real data.

Briefly recalling, the factorization scale enters through the scale dependence of the PDFs, while the renormalization scale serves as the scale for αs below which perturbation theory is not applicable. For Pythia only the Z2 tune was used. However, note here that the two scales need to be fixed separately in MadGraph and in Pythia. Sherpa, on the other hand, is used as a stand-alone generator and thus the scale variations are simpler to perform. In order to determine the uncertainties relative to these scales, a "standardized" procedure has been followed:

• Events are generated in accordance with the central predictions;

• The parameter values are multiplied by a factor 2, generating a new set of events;

• The parameter values are divided by a factor 2, generating one last set of events.

In the present case, the factorization and renormalization scales in MadGraph are defined relative to the central m⊥^2 scale, multiplied by the factors alpsfact and scalefact respectively. In the two additional configurations, these factors are multiplied or divided by 2. The same two scale factors should be modified accordingly in Pythia. However, in Pythia the two parameters are factorized in a different way than in MadGraph, through the constants PARP(64) and PARP(72). Lastly, the parameters to be changed in Sherpa are conveniently called factorization scale factor (Fs) and renormalization scale factor (Rs), respectively. Without going into too much detail, the values used for the six different sets of generated events (3 for Sherpa and 3 for MadGraph + Pythia) are presented in Tab. 4.1.

              alpsfact   scalefact   PARP(64)   PARP(72)   Fs     Rs
default         1          1           1         0.25      1      1
scale up        2          2           4         0.125     4      4
scale down      0.5        0.5         0.25      0.5       0.25   0.25

Table 4.1: Values of the scale parameter variations. The names of the variations (up/down) reflect the change of the constants in MadGraph, while Pythia adopts a different logic.
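For bookkeeping, Table 4.1 can be read as the following small structure (a sketch: the variation names and values are those of the table, and the helper simply selects the parameters relevant for one of the three generator setups):

```python
# One entry per generated sample; values taken from Table 4.1.
SCALE_VARIATIONS = {
    "default":    {"alpsfact": 1.0, "scalefact": 1.0, "PARP(64)": 1.0,  "PARP(72)": 0.25,  "Fs": 1.0,  "Rs": 1.0},
    "scale up":   {"alpsfact": 2.0, "scalefact": 2.0, "PARP(64)": 4.0,  "PARP(72)": 0.125, "Fs": 4.0,  "Rs": 4.0},
    "scale down": {"alpsfact": 0.5, "scalefact": 0.5, "PARP(64)": 0.25, "PARP(72)": 0.5,   "Fs": 0.25, "Rs": 0.25},
}

def settings_for(variation, generator):
    """Return the subset of parameters relevant for one generator setup."""
    keys = {"madgraph": ("alpsfact", "scalefact"),
            "pythia":   ("PARP(64)", "PARP(72)"),
            "sherpa":   ("Fs", "Rs")}[generator]
    return {k: SCALE_VARIATIONS[variation][k] for k in keys}

print(settings_for("scale up", "sherpa"))   # {'Fs': 4.0, 'Rs': 4.0}
```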

It is important to notice here that for MadGraph + Pythia the value of Qcut has to be modified for each choice of the aforementioned parameters, so that the DJR graphs are smooth, as discussed in Sec. 3.7. Finally, the resulting histograms relative to the three different generator setups (with the varying scale factors) are presented in Fig. 4.11, 4.12, 4.13, 4.14 and 4.15, where in each figure the distribution is shown in the top histogram and the ratio between data and MC in the bottom one.


Figure 4.11: The leading jet p⊥ distribution. The sample produced by the central prediction scales is shown in red, while the results obtained by varying the two scales in Madgraph by a factor 0.5 (down) or 2 (up) are shown in blue and green, respectively. The yellow band indicates the total errors (statistical and systematic) associated to the real data.

For all three productions, the histograms were obtained with 5 fb−1 of MadGraph + Pythia generated data for the 0, 1 and 2 jet events, and with 10 fb−1 for the 3 and 4 jet events. The histograms relative to the Sherpa event generation, on the other hand, contain 1.2 fb−1 of data and are presented in Fig. 4.16, 4.17, 4.18, 4.19 and 4.20. The scale variations represent the uncertainty due to the particular choice of scales. It can be observed that for both sets of generated data, MadGraph + Pythia and Sherpa, the data are well contained in the band defined by the scale variations. In other words, the MC generators in question reproduce the real data well within the systematic uncertainties, with Sherpa performing a little better in the distribution of the fourth leading jet.
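The statement that the data are contained in the band defined by the scale variations can be phrased as a simple per-bin envelope check; a minimal sketch with toy bin contents (the numbers are invented for illustration):

```python
def scale_band(central, up, down):
    """Per-bin envelope of the central, scale-up and scale-down predictions."""
    return [(min(c, u, d), max(c, u, d)) for c, u, d in zip(central, up, down)]

def data_in_band(data, band):
    """True if every data bin lies inside the corresponding envelope."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(data, band))

# toy bin contents (arbitrary units)
central = [100.0, 40.0, 12.0]
up      = [115.0, 47.0, 15.0]
down    = [ 88.0, 35.0, 10.0]
data    = [105.0, 38.0, 11.0]
print(data_in_band(data, scale_band(central, up, down)))   # True
```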

4.3.4 Parton density function choice

This last test was done in order to study the effect of different PDFs, conveniently stored in the LHAPDF package [34]. For this test only Sherpa is considered, with the original PDF, called "CT10", and two additional ones, labelled "MSTW2008lo68cl" and "NNPDF21 100". The parton density functions in Sherpa mainly influence the structure of the ISR. The results for the generated events are presented in Fig. 4.21, 4.22, 4.23, 4.24 and 4.25.


Figure 4.12: The subleading jet p⊥ distribution. The sample produced by the central prediction scales is shown in red, while the results obtained by varying the two scales in Madgraph by a factor 0.5 (down) or 2 (up) are shown in blue and green, respectively. The yellow band indicates the total errors (statistical and systematic) associated to the real data.


Figure 4.13: The sub-subleading jet p⊥ distribution. The sample produced by the central prediction scales is shown in red, while the results obtained by varying the two scales in Madgraph by a factor 0.5 (down) or 2 (up) are shown in blue and green, respectively. The yellow band indicates the total errors (statistical and systematic) associated to the real data.


Figure 4.14: The sub-sub-subleading jet p⊥ distribution. The sample produced by the central prediction scales is shown in red, while the results obtained by varying the two scales in Madgraph by a factor 0.5 (down) or 2 (up) are shown in blue and green, respectively. The yellow band indicates the total errors (statistical and systematic) associated to the real data.


Figure 4.15: The jet multiplicity distribution. The sample produced by the central prediction scales is shown in red, while the results obtained by varying the two scales in Madgraph by a factor 0.5 (down) or 2 (up) are shown in blue and green, respectively. The yellow band indicates the total errors (statistical and systematic) associated to the real data.


Figure 4.16: The leading jet p⊥ distribution. The sample produced by the central prediction scales is shown in red, while the results obtained by varying the two scales in Sherpa by a factor 0.25 (down) or 4 (up) are shown in blue and green, respectively. The yellow band indicates the total errors (statistical and systematic) associated to the real data.


Figure 4.17: The subleading jet p⊥ distribution. The sample produced by the central prediction scales is shown in red, while the results obtained by varying the two scales in Sherpa by a factor 0.25 (down) or 4 (up) are shown in blue and green, respectively. The yellow band indicates the total errors (statistical and systematic) associated to the real data.


Figure 4.18: The sub-subleading jet p⊥ distribution. The sample produced by the central prediction scales is shown in red, while the results obtained by varying the two scales in Sherpa by a factor 0.25 (down) or 4 (up) are shown in blue and green, respectively. The yellow band indicates the total errors (statistical and systematic) associated to the real data.


Figure 4.19: The sub-sub-subleading jet p⊥ distribution. The sample produced by the central prediction scales is shown in red, while the results obtained by varying the two scales in Sherpa by a factor 0.25 (down) or 4 (up) are shown in blue and green, respectively. The yellow band indicates the total errors (statistical and systematic) associated to the real data.


Figure 4.20: The jet multiplicity distribution. The sample produced by the central prediction scales is shown in red, while the results obtained by varying the two scales in Sherpa by a factor 0.25 (down) or 4 (up) are shown in blue and green, respectively. The yellow band indicates the total errors (statistical and systematic) associated to the real data.

Similar to the factorization and renormalization scale variations, the different parametrizations of the PDFs represent the uncertainty due to the particular PDF choice. The three different PDFs considered thus comprise an uncertainty band, which is in excellent agreement with data.


Figure 4.21: The leading jet p⊥ distribution. The sample produced by the use of the PDF CT10 is shown in red, MSTW2008 in blue and NNPDF21 in green. The yellow band indicates the total errors (statistical and systematic) associated to the real data.


Figure 4.22: The subleading jet p⊥ distribution. The sample produced by the use of the PDF CT10 is shown in red, MSTW2008 in blue and NNPDF21 in green. The yellow band indicates the total errors (statistical and systematic) associated to the real data.


Figure 4.23: The sub-subleading jet p⊥ distribution. The sample produced by the use of the PDF CT10 is shown in red, MSTW2008 in blue and NNPDF21 in green. The yellow band indicates the total errors (statistical and systematic) associated to the real data.


Figure 4.24: The sub-sub-subleading jet p⊥ distribution. The sample produced by the use of the PDF CT10 is shown in red, MSTW2008 in blue and NNPDF21 in green. The yellow band indicates the total errors (statistical and systematic) associated to the real data.


Figure 4.25: The jet multiplicity distribution. The sample produced by the use of the PDF CT10 is shown in red, MSTW2008 in blue and NNPDF21 in green. The yellow band indicates the total errors (statistical and systematic) associated to the real data.

Conclusions

This thesis is focused on the study of Monte Carlo generators. For the present work, the associated production of a Z0 boson with jets was considered, together with the related observables: in particular, the distribution of the jet multiplicity and the distributions of the transverse momentum of the four jets with the highest p⊥. Three different Monte Carlo generators were considered: Pythia, MadGraph and Sherpa. The study was performed by varying the key parameters on which the said Monte Carlo generators are based and successively considering the effect produced on the given observables. Notably, the parameters defining the renormalization and factorization scales were studied, as well as the choice of the parton density function describing the internal structure of the incoming protons. From the presented results it is evident that Pythia 6 is not able to reproduce the hard part of the event. In fact, for increasing p⊥ the agreement between the Pythia predictions and real data becomes worse. In the following tests it was found that the observables obtained from MadGraph + Pythia generated events are not significantly dependent on the particular Pythia tune. This was expected, since the hard part of the event is evaluated by MadGraph, which is kept constant throughout the different Pythia settings. This means that the different Pythia tunes modify the observables relative to soft interactions, which contribute to the shape of the distributions less than the hard part. Nevertheless it can be observed that both the "Z2" tune (currently the default for CMS) and "P11" are in good agreement with the data, while "D6T", which was used at the Tevatron, performs worse. The MadGraph + Pythia combination and Sherpa as a standalone program are both in good agreement with data. Considering additionally the uncertainties provided by the factorization and renormalization scale variations, the generated events are in excellent agreement with data for the jet multiplicity distribution and the first three jet p⊥ distributions, while problems arise with the fourth jet p⊥. This is even less of an issue for Sherpa, which falls short only in the two highest p⊥ bins of the fourth leading jet p⊥ distribution. Moreover, the uncertainty band provided by the different PDF choices in Sherpa additionally confirms the agreement between the real and generated data.

During these studies it was found that a purely PS-based generator cannot reproduce the data well, while a ME + PS approach (at LO) reaches a much better agreement. Within the generated statistics it is not possible to determine whether the best choice of MC generator is MadGraph + Pythia or Sherpa. These simulations can subsequently be used to estimate the background to processes of the SM or beyond. For example, the Z + hadronic jets process constitutes a background for H → bb̄, or for studies on missing energy, important for several searches for new physics.

Acknowledgements

I’d like to thank the kind and sarcastic (greatly appreciated) prof. Nello Paver, who held the two fundamental courses for particle physics giving us “experimentalists” an excellent understanding needed for further studies. Additionally he as my counter-menthor substantially helped by correcting the thesis over and again.

Next I’d like to thank my parents for all the support, both financial and moral, without which I would not be able to achieve what I did.

Then my brother, both as a good friend and someone reliable in my life. We might not agree on everything, but still understand each other.

An enormous thank you goes of course to my Ana. Her support was unwavering throughout everything I did. And she is constantly helping me in every way she can, shaping me as the person I am today.

I’d further like to thank all my friends from the department of physics. In particular Giuseppe, Christian and Nicola, for we survived countless lab hours together. Also all the friends from the CMS Trieste group, Chiara, Andrea, Vieri, Damiana, Matteo and Massimo, who both helped me with my work, and are fun and interesting people to hang around with.

Next, a big thanks to all my good Slovenian friends, Jan, Martin, Gregor, Grega, Remi, Nejc (ho ho), Irina, PrimoˇzJ., PrimoˇzB., it was fun and legen ... wait for it ... dary, legendary and will continue to be.

And lastly, thanks to all the CERN summer students of 2011, you’ve ALL been an awesome lot to hang out with. In particular all the board gamers (and movie watchers), Kaj, Dainel, Silvia, Andrew, Rene, Markus, Sophie and that frakking Cylon Harry. To the airlock with him I say.

Bibliography

[1] L. Evans and P. Bryant, LHC Machine, JINST, 3:S08001, 2008.

[2] CERN Hadron Linacs, http://linac2.home.cern.ch/linac2/default.htm.

[3] A. D. Martin, R. G. Roberts, W. J. Stirling, and R. S. Thorne, Parton distributions and the LHC: W and Z production, Eur. Phys. J., C14:133-145, 2000.

[4] CMS Collaboration, The CMS experiment at the CERN LHC, JINST, 3:S08004, 2008.

[5] CMS Collaboration, CMS Tracker Technical Design Report, (CERN-LHCC-98-006), 1998.

[6] CMS Collaboration, The CMS Tracker: addendum to the Technical Design Report, (CERN-LHCC-2000-016), 2000.

[7] P. Azzurri, The CMS Silicon Strip Tracker, Journal of Physics: Conference Series, 41(1):127, 2006.

[8] E. Widl, R. Frühwirth, and W. Adam, A Kalman Filter for Track-based Alignment, (CMS-NOTE-2006-022), 2006.

[9] S. Cucciarelli, Track and Vertex Reconstruction with the CMS Detector at LHC, (CMS-CR-2005-021), 2005.

[10] CMS Collaboration, Performance and operation of the CMS electromagnetic calorimeter, Journal of Instrumentation, 5(03):T03010, 2010.

[11] P. Adzic et al., Energy Resolution of the Barrel of the CMS Electromagnetic Calorimeter, (CMS-NOTE-2006-148), 2006.

[12] CMS Collaboration, Time Reconstruction and Performance of the CMS Electromagnetic Calorimeter, J. Instrum., 5:T03011, 2009, [arXiv:0911.4044, CMS-CFT-09-006].

[13] S. Dasu et al., CMS, The TriDAS project, Technical design report, vol. 1: The trigger systems, (CERN-LHCC-2000-038), 2000.

[14] S. Cittolin, A. Rácz, and P. Sphicas, CMS trigger and data-acquisition project: Technical Design Report, (CERN-LHCC-2002-026), 2002.

[15] F. Halzen and A. D. Martin, Quarks and leptons: an introductory course in modern particle physics, John Wiley and Sons, 1984.

[16] R. K. Ellis, W. J. Stirling, and B. R. Webber, QCD and collider physics, Cambridge University Press, 2003.

[17] J. M. Campbell, J. W. Huston and W. J. Stirling, Hard Interactions of Quarks and Gluons: a Primer for LHC Physics, [arXiv:hep-ph/0611148v1].

[18] G. Dissertori, I. G. Knowles and M. Schmelling, Quantum Chromodynamics: High Energy Experiments and Theory (International Series of Monographs on Physics), Oxford University Press, 2009.

[19] F. A. Berends, H. Kuijf, B. Tausk, and W. T. Giele, On the production of a W and jets at hadron colliders, Nuclear Physics B, 357(1):32-64, 1991.

[20] L. Cerrito, Measurements of vector boson plus jets at the Tevatron, (FERMILAB-CONF-10-311-E), 2010.

[21] CMS Collaboration, Jet Production Rates in Association with W and Z Bosons in pp Collisions at √s = 7 TeV, [arXiv:1110.3226v1].

[22] M. Cacciari, G. P. Salam, and G. Soyez, FastJet user manual 3.0.1, [arXiv:1111.6097v1].

[23] T. Sjöstrand, Monte Carlo Generators, [arXiv:hep-ph/0611247v1].

[24] A. Buckley, J. Butterworth, S. Gieseke, D. Grellscheid, S. Höche, H. Hoeth, F. Krauss, L. Lönnblad, E. Nurse, P. Richardson, S. Schumann, M. H. Seymour, T. Sjöstrand, P. Skands and B. Webber, General-purpose event generators for LHC physics, [arXiv:1101.2599v1].

[25] Introduction to Geant4, http://geant4.web.cern.ch/geant4/UserDocumentation/Welcome/IntroductionToGeant4/html/index.html

[26] V. N. Gribov and L. N. Lipatov, Sov. J. Nucl. Phys. 15 (1972) 438, ibid. 75; Yu. L. Dokshitzer, Sov. J. Phys. JETP 46 (1977) 641; G. Altarelli and G. Parisi, Nucl. Phys. B126 (1977) 298.

[27] V. V. Sudakov, Zh.E.T.F. 30 (1956) 87; Sov. Phys. J.E.T.P. 30 (1956) 65.

[28] T. Sjöstrand, S. Mrenna and P. Skands, Pythia 6.4 Physics and Manual, [arXiv:hep-ph/0603175].

[29] http://lcgapp.cern.ch/project/simu/HepMC/

[30] T. Stelzer and W. F. Long, Automatic Generation of Tree Level Helicity Amplitudes, [arXiv:hep-ph/9401258v1].

[31] T. Gleisberg, S. Höche, F. Krauss, M. Schönherr, S. Schumann, F. Siegert and J. Winter, Event generation with Sherpa 1.1, JHEP 02 (2009) 007, [arXiv:0811.4622].

[32] http://blackhat.hepforge.org/

[33] Sherpa 1.4.1 online manual, http://sherpa.hepforge.org/doc/SHERPA-MC-1.4.1.html

[34] M. R. Whalley, D. Bourilkov and R. C. Group, The Les Houches Accord PDFs (LHAPDF) and LHAGLUE, [arXiv:hep-ph/0508110].

[35] S. Catani, S. Dittmaier, M. H. Seymour, and Z. Trocsanyi, The dipole formalism for next-to-leading order QCD calculations with massive partons, Nucl. Phys. B627 (2002) 189-265, [arXiv:hep-ph/0201036].

[36] T. D. Gottschalk, An improved description of hadronization in the QCD cluster model for e+e− annihilation, Nucl. Phys. B239 (1984) 349.

[37] T. D. Gottschalk and D. A. Morris, A new model for hadronization and e+e− annihilation, Nucl. Phys. B288 (1987) 729.

[38] B. R. Webber, A QCD model for jet fragmentation including soft gluon interference, Nucl. Phys. B238 (1984) 492.

[39] V. Candelise et al., Study of the associated production of a Z boson and jets in p-p collisions at √s = 7 TeV, CMS-AN-11-451.

[40] A. Buckley, J. Butterworth, L. Lönnblad, H. Hoeth, J. Monk, H. Schulz, J. Eike von Seggern, F. Siegert and L. Sonnenschein, Rivet user manual v. 1.6.0, [arXiv:1003.0694v6].

[41] http://pythia6.hepforge.org/