Arxiv:2010.09206V5 [Hep-Ex] 22 Mar 2021 Large Backgrounds, Is Comparatively Under-Explored
Total Page:16
File Type:pdf, Size:1020Kb
Permutationless Many-Jet Event Reconstruction with Symmetry Preserving Attention Networks Michael James Fenton ∗†,1 Alexander Shmakov ∗,2 Ta-Wei Ho,3 Shih-Chieh Hsu,4 Daniel Whiteson,1 and Pierre Baldi2 1Department of Physics and Astronomy, University of California Irvine 2Department of Computer Science, University of California Irvine 3Department of Physics, National Tsing Hua University, Taiwan 4Department of Physics and Astronomy, University of Washington Top quarks, produced in large numbers at the Large Hadron Collider, have a complex detector signature and require special reconstruction techniques. The most common decay mode, the \all-jet" channel, results in a 6-jet final-state which is particularly difficult to reconstruct in pp collisions due to the large number of permutations possible. We present a novel approach to this class of problem, based on neural networks using a generalized attention mechanism, that we call Symmetry Preserving Attention Networks (Spa-Net). We train one such network to identify the decay products of each top quark unambiguously and without combinatorial explosion as an example of the power of this technique. This approach significantly outperforms existing state-of-the-art methods, correctly assigning all jets in 80.7% of 6-jet, 66.8% of 7-jet, and 52.3% of ≥ 8-jet events respectively. Our code is available at https://github.com/Alexanders101/SPA Net Code Release INTRODUCTION these events is correctly determining which jets originate from each of the parent top quarks. At the Large Hadron Collider (LHC), protons are col- lided at the highest energy ever produced in the labora- The top quark, as the most massive fundamental par- tory. Most of these collisions produce a high multiplicity ticle in the Standard Model, is the only quark to decay of jets; collimated sprays of particles that originate from before hadronisation. This presents a unique opportu- the strongly-coupled quarks and gluons inside the pro- nity to study an isolated quark - if its decay products can ton. Final-states containing only jets occur through a be correctly identified. Top quarks decay via t W b, ! number of physical processes, and the copious produc- and events are categorized by the decay modes of the tion of these \all-jet" topologies presents opportunities W bosons: dileptonic (9%), single-lepton (45%) or all- for precision measurements [1,2], and searches for rare jets (46%) [8]. To date, the most precise measurements Standard Model [3] or new physics [4,5] processes, but of top quark properties are typically performed in the also raises particular challenges. Specifically, it is typi- single-lepton or dilepton channels[9]. The all-jet chan- cally difficult to connect an observed jet with its quark nel, held back by the ambiguous event reconstruction and origin, and the factorial dependence on the number of arXiv:2010.09206v5 [hep-ex] 22 Mar 2021 large backgrounds, is comparatively under-explored. jets leads to the so-called \combinatorial explosion". For example, top quark pair production has a 6 jet final- ≥ In this letter, we propose a novel architecture for as- state in the so-called \resolved" regime in which each signment of particle origin to jets, Symmetry Preserv- of the three decay products produced from each top are ing Attention NETworks (Spa-Net). Applying attention reconstructed as a single jet. In some events this can networks that naturally reflect the permutation symme- be mitigated using \boosted" reconstruction [6], though try of the task, Spa-Net significantly outperforms exist- this is limited to a small subset of events [7]. Thus, the ing state-of-the-art techniques while avoiding combinato- biggest obstacle to extracting physical information from rial explosion. In the following, we define the nature of the jet assignment task, describe invariance and attention mechanisms in neural networks, describe the dataset and * These authors contributed equally. training, and demonstrate the performance of our tech- y [email protected] nique relative to the state-of-the-art. 2 b masses2: q 2 2 2 t (mbqq mb0q0q0 ) (mqq mW ) (mq0q0 mW ) χ2 = − + − + − W q σ2 σ2 σ2 ∆mbqq W W q0 (1) W 0 t0 In our dataset (described below), we find mW = q0 81:3 GeV, σW = 12:3 GeV and σ∆mbjj = 26:3 GeV. b0 χ2 is evaluated for every permutation, and the minimum FIG. 1: Production diagram of a top quark pair, tt0, value is chosen. This method typically uses b-tagging and their decays into qqb and q0q0b0, respectively. to consider only permutations with b-jets in the place of b-quarks. This reduces the permutations but prevents the correct solution being found in the presence of mis- tagged jets. We apply this requirement in our studies for JET ASSIGNMENT consistency with recent experimental results - this imple- mentation has been the preferred reconstruction for AT- The jet assignment task is the identification of the orig- LAS [1, 10], while CMS have used a similar method [11]. inal particle which leads to a reconstructed jet. In a colli- sion which produces N jets, there are N! possible assign- Another approach is the use of Boosted Decision Trees ments. Fortunately, symmetries can reduce this number. or Neural Networks as permutation classifiers, though these have mostly been employed in the leptonic channels Top quarks decay via the chain t W b qqb, sup- where combinatoric explosion is reduced [12, 13], or to ! ! pressing charge labels. A pair of top quarks, tt0, therefore reconstruct tops individually [14]. CMS has also used a 2 produce six quarks, qqbq0q0b0. This process is shown in hybrid χ and BDT method in the all-jet channel [15]. Figure1. The task is to correctly identify six observed A more advanced minimisation technique is imple- jets with six labels: b, b0, 2 q, and 2 q0. The symmetries × × mented in KLFitter [16]. Transfer functions are used between the two tops and between the decay products of to represent the detector response in a likelihood func- the W -bosons reduce this to 6!=(2 2 2) = 90 permuta- × × tion per permutation, which is minimised to find the as- tions. Further complicating things, 50% of tt¯ events at 2 ∼ signments. This operates similarly to the χ technique, the LHC are expected to contain at least one additional though has greater CPU requirements due to the more jet which is not the result of top quark decay, leading complex model. KLFitter has to date been almost exclu- to 7!=(2 2 2) = 630 or 8!=(2 2 2 2) = 2520 sively used in the single lepton channel, and with the χ2 × ×1 × × × permutations in 7- or 8-jet events, respectively. Higher already CPU limited in many cases, it is not discussed jet multiplicity events are also likely; nonetheless, the here. current state-of-the-art techniques depend on enumerat- ing and evaluating each permutation to identify the best candidate. The many incorrect assignments obscure the INVARIANCE AND ATTENTION IN DEEP true assignment, diluting the scientific power of the data, NEURAL NETWORKS and represents a significant computational penalty when all permutations are evaluated for every event. Some- Equivariance and invariance properties can play an im- times, each permutation must be evaluated per system- portant role in the design of both feed-forward and re- atic uncertainty per event, which is often intractable in cursive neural networks across different forms of learn- the datasets typical in high-energy-physics (HEP). ing [17{22]. For instance, classical convolution neu- ral networks can produce object recognition outputs The most common technique is a χ2-minimisation that are invariant with respect to translations in their method, which scores a permutation based on the consis- two-dimensional inputs, and this invariance property tency of the reconstructed W -boson masses with known has been generalized to apply to other manifolds and values and similarity of the two reconstructed top quark 2 2 Explicitly using mtop in the χ equation improves reconstruction efficiency by a few percent at the expense of dramatically sculpt- ing the top mass distribution. We chose to make comparisons to 1 The additional factor of 2 is the invariance of the extraneous jets. the definition used in recent experimental results. 3 groups [23, 24]. In the problem considered here, the network output should be invariant under permutations of the input jet order. Such permutation invariance has been explored in set-based [25, 26] and graph net- works [27]. The output should further identify two dis- tinct interchangeable triplets, qqb and q0q0b0, each includ- ing an interchangeable pair, qq or q0q0. This permuta- tion invariance on the output is a unique property of our dataset which our architecture must account for. Attention mechanisms allow the network to selectively propagate information (gating) - the activities of a set of neurons are multiplied, component-wise, by the activities of another set of neurons. These gating mechanisms allow FIG. 2: High-level structure of Spa-Net. neural networks to dynamically modulate neuron activ- ity as a function of the other neurons or inputs. Recur- sive and attention-based networks, which allow the net- b-jets. work to infer relationships between different elements in The supervised learning technique employed here re- a sequence, have achieved state-of-the-art performance in quires a training sample in which the correct assignments natural language processing in machine translation [28], are identified. We define the correct jet assignments by language understanding [29], and text generation [30]. matching them to the simulated truth quarks within an p Attention architectures are permutation invariant be- angular distance of ∆R = ∆η2 + ∆φ2 < 0:4. Requir- cause rearranging the order of the elements in the input ing all six quarks to be unambiguously matched leaves sequence induces the same rearrangement in the atten- 2,967,955 total events for training, and 119,283 for per- s formance evaluation.