
Particle Identification

Kanglin He, Gang Qin, Bin Huang, Jifeng Hu, Xiaobin Ji, Yajun Mao

email: [email protected]

November 9, 2006

Contents

1 Introduction
2 The PID system of BESIII
  2.1 The dE/dx measurements
  2.2 The TOF counter
  2.3 The CsI(Tl) Calorimeter
  2.4 The muon system
3 The Correlated Analysis in TOF PID
  3.1 General algorithm
  3.2 Errors and correlations of TOF measurements
  3.3 Combining the time-of-flight from two layers' measurements
4 Control sample
  4.1 Hadron Sample
  4.2 Electron Sample
  4.3 Muon Sample
5 The Likelihood Method
  5.1 Probability Density Functions
  5.2 Likelihood
  5.3 Consistency
  5.4 Weighted Likelihood
  5.5 Using likelihood, consistencies, and probabilities
  5.6 An example of TOF and dE/dx PID
  5.7 Cell analysis
  5.8 The role of neural nets
6 The Toolkit for Multiple Variables Analysis
7 The Artificial Neural Network Method
  7.1 The TMVA Algorithm Factories

1 Introduction

Particle identification (PID) will play an essential role in most of the BESIII physics program.

Good µ/π separation is required for precise fD/fDs measurements. Excellent electron identification will help to improve the precision of the CKM matrix elements Vcs and Vcd. The identification of hadrons (π/K/p) is the most common tool in BESIII physics analyses, and sometimes the most crucial one, for example in searches for D0–D̄0 mixing and CP violation.

Each part of the BESIII detector performs its own functions and provides a vast amount of information, which together determines the final efficacy of particle identification. Ideally, particle identification would discriminate correctly and absolutely between particle species. In practice, physicists are imperfect, detectors are imperfect, backgrounds are present, and particles decay and interact as they traverse the detector. In general, the detector response depends on the incident angle and the hit position of the charged track. Such non-uniformities across the different geometrical and physical regions of the detector have been carefully calibrated [?].

The particle identification assignment made for any particular track is therefore not always correct. However, by properly using all available information, one can discriminate between hypotheses most powerfully and can test for consistency between the data and the selected hypotheses. In recent years, a number of PID algorithms have been developed: the likelihood method, the Fisher discriminant, the H-matrix estimator, the artificial neural network, and the boosted decision tree, etc.

2 The PID system of BESIII

The BESIII detector consists of a beryllium beam pipe, a helium-based small-cell drift chamber, Time-Of-Flight (TOF) counters for particle identification, a CsI(Tl) crystal calorimeter, a super-conducting solenoid magnet with a field of 1 Tesla, and a muon identifier using the magnet yoke interleaved with Resistive Plate Counters (RPC).

2.1 The dE/dx measurements

The Main Drift Chamber (MDC) measures the drift times and the energy losses (dE/dx) of charged particles as they pass through the working gas. It consists of 43 layers of sense wires and operates with a 60%/40% He/C3H8 gas mixture. The momentum of a particle is obtained by fitting a helical curve to the set of position coordinates provided by the drift time measurements. The energy loss in the drift chamber provides additional information for particle identification. Figure 1 shows how the normalized pulse height varies with the momentum and the particle species. The normalized pulse height is proportional to the energy loss of the incident particle in the drift chamber, which is a function of βγ = p/m, where γ = 1/√(1 − β²), and p and m are the momentum and mass of the charged particle. Charged particles of different mass have different velocities at the same momentum, so together

with the momentum measurement, the dE/dx can give the mass information of the particle.

Figure 1: Normalized pulse heights (dE/dx) vs. momentum of charged particles.

There are many factors which affect the dE/dx measurements: the number of hits; the average path lengths in each cell; the space charge and saturation effects; the non-uniformity of the electric field, etc. Most of them are related to the incident angle and momentum of the charged particle.

2.2 The TOF counter

Outside the MDC is the TOF system, which is crucial for particle identification. It consists of a two-layer barrel array of 88 50 mm × 60 mm × 2320 mm BC408 scintillators in each layer, and endcap arrays of 48 fan-shaped BC404 scintillators. Hamamatsu R5924 fine-mesh phototubes will be used: two on each barrel scintillator and one on each endcap scintillator. The expected time resolution for kaons and pions with two layers is 100-110 ps, giving a 2σ K/π separation up to 0.9 GeV/c for normal tracks. In an e+e− collision, all produced particles fly from the interaction point (IP) toward the outer detector. The physics goal of the TOF system is to measure the flight time t = L/βc, with β = p/√(p² + m²), for charged particle identification, where c is the velocity of light, m is the mass of the charged particle, and β is its flight velocity. L and p are the flight path and the momentum of the charged particle given by the MDC measurements. Usually, there are two equivalent ways to use the TOF information: comparing the measured time tmea against the predicted time texp and looking for the hypothesis with ∆t = tmea − texp closest to zero; or calculating the measured mass of the charged particle through

β = L / (c × tmea) ,   m² = p² × (1 − β²) / β² .   (2.1)

A typical mass square distribution calculated by Eq.(2.1) is drawn in Figure 2.
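As an illustration, Eq. (2.1) can be sketched in a few lines of Python. The units (mm, ns, GeV) and the kaon test values are assumptions chosen for this example, not BESIII software conventions:

```python
import math

C = 299.792458  # speed of light in mm/ns

def mass_squared(p, L, t_mea):
    """Eq. (2.1): beta = L/(c*t_mea), m^2 = p^2*(1 - beta^2)/beta^2.
    p in GeV/c, L in mm, t_mea in ns; returns m^2 in (GeV/c^2)^2."""
    beta = L / (C * t_mea)
    return p * p * (1.0 - beta * beta) / (beta * beta)

# sanity check: a 1 GeV/c kaon over a 900 mm flight path
m_K = 0.493677
p, L = 1.0, 900.0
t = (L / C) * math.sqrt(p * p + m_K * m_K) / p  # exact flight time for this mass
```

A track that arrives later than expected (larger tmea at the same p and L) reconstructs to a larger mass squared, which is the basis of the hypothesis test.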

The PID ability relies on the time resolution (σt) of the TOF system. σt depends on the pulse height, the hit position, and the beam status. The performances of the scintillators, PMTs and electronics differ, so the value of σt usually varies from one TOF counter to another.


Figure 2: Mass square distribution from TOF measurements.

2.3 The CsI(Tl) Calorimeter

The CsI(Tl) crystal electromagnetic calorimeter (EMC) contains 6240 crystals and is used to measure the energies of photons and electrons precisely. The expected energy and spatial resolutions at 1 GeV are 2.5% and 0.6 cm, respectively. The electromagnetic shower characteristics are different for electrons, muons and hadrons, so the energy deposit and the shape of the shower in the calorimeter can be used to identify particles.

The energy loss per unit length by exciting and/or ionizing atoms is given by dE/dx and is essentially the same for all energetic particles. In CsI(Tl) crystal, the energy loss is approximately 5.63 MeV/cm for a minimum ionizing particle (MIP); the energy deposited by ionization is about 0.165 GeV for charged particles passing at normal incidence through the EMC. Since electrons and positrons produce electromagnetic showers as they pass through a calorimeter, their energy loss is dominated by pair production and bremsstrahlung, even though there is some energy loss by ionizing/exciting atomic electrons. They therefore lose all of their energy in the calorimeter, and the ratio of deposited energy to track momentum (E/p) is approximately unity. Hadrons sometimes have an E/p ratio higher than expected from dE/dx alone, due to nuclear interactions with the material. The energy loss of muons is governed by dE/dx only, so the E/p distribution for muons is lower and narrower. Figure 3(a) shows the energy deposit vs. momentum of electrons, pions and muons in the EMC. Generally, we expect

(E/p)µ < (E/p)π < (E/p)e. (2.2)

The "shape" of a shower can be described by three parameters: Eseed, the energy deposited in the central crystal; E3×3, the energy deposited in the central 3×3 crystal array; and E5×5, the energy deposited in the central 5×5 crystal array. Muons pass through the crystals without generating a shower, leaving a simple track, so Eseed/E3×3 and E3×3/E5×5 are both close to 1. These ratios are different for the electromagnetic showers produced by electrons and by some interacting hadrons. As shown in Figure 3(b), it is expected that

(Eseed/E3×3)e < (Eseed/E3×3)π < (Eseed/E3×3)µ ,
(E3×3/E5×5)e < (E3×3/E5×5)π < (E3×3/E5×5)µ .   (2.3)

The second moment S is defined as

S = Σi Ei · di² / Σi Ei .   (2.4)


Figure 3: (a) Energy deposit in EMC vs. the momentum of electrons, pions and muons; (b) ratio of E3×3/E5×5 for electrons, pions and muons.

where Ei is the energy deposited in the i-th crystal and di is the distance between that crystal and the reconstructed shower center. The original idea of S was developed by the Crystal Ball experiment to distinguish clusters generated by π0's from those generated by γ's. For a single electron or muon, most of the energy is deposited in the central crystal, so S is small. For an interacting hadron, S is relatively larger, which helps to separate pions from electrons.
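A minimal sketch of the second moment of Eq. (2.4); the cluster energies and crystal distances below are invented toy numbers, chosen only to show that a compact electromagnetic cluster gives a smaller S than a spread-out hadronic one:

```python
def second_moment(energies, distances):
    """Second moment S of Eq. (2.4): S = sum(E_i * d_i^2) / sum(E_i)."""
    num = sum(E * d * d for E, d in zip(energies, distances))
    return num / sum(energies)

# compact electromagnetic cluster: most energy in the central crystal (d = 0)
s_em = second_moment([0.90, 0.05, 0.05], [0.0, 5.0, 5.0])
# broader hadronic cluster: energy spread over neighbouring crystals
s_had = second_moment([0.40, 0.30, 0.30], [0.0, 5.0, 10.0])
```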

2.4 The muon system

The magnet return iron has 9 layers of Resistive Plate Chambers (RPC) in the barrel and 8 layers in the endcap to form a muon counter. An average efficiency of 95% is obtained for the chambers, and the spatial resolution obtained is 16.6 mm.

The energy of an electron is exhausted in the calorimeter, so it cannot reach the muon counter. Most hadrons are absorbed in the material of the calorimeter, the magnet and the return iron. Muons, by contrast, have strong penetrating power. Usually a muon produces one hit in each layer, while a hadron may produce many hits in a given layer if an interaction occurs. The distances of the muon hits to the extrapolated positions of the inner track help to reduce the hadron contamination to a lower level, since the hits generated by a secondary muon from pion/kaon decay do not match the inner track very well. Figure ?? shows the distributions of travel depth, average hits per layer, and the distance between the muon hit and the extrapolated position.

3 The Correlated Analysis in TOF PID

When a charged particle passes through the barrel array of the TOF counter, there can be two or four time measurements, corresponding to hits in one or two layers of counters. At BESIII the problem of averaging the TOF measurements is complicated by the fact that the different measurements have correlated errors from the common event start time. The best choice is a properly weighted average of the different measurements.

3.1 General algorithm

Suppose we have n measurements ti of a particular time-of-flight. Since the measurements are correlated, we need more information than just the individual errors. Accordingly, let us define the covariance matrix Vt, whose elements are given by (Vt)ij = <δti δtj>, where δti = ti − t̄ and t̄ is the average of the ti. The best linear estimator for the time-of-flight which accounts for all measurements, including errors and correlations, can be constructed generally as

t̄ = Σi wi ti ,   Σi wi = 1 ,   (3.1)

where the weights wi must be found. Writing δt̄ = Σi wi δti and using the definition of the standard deviation, we get

σt² = Σij wi wj (Vt)ij .   (3.2)

To minimize σt² subject to the condition Σi wi = 1, we use the Lagrange multiplier technique. We write

σt² = Σij wi wj (Vt)ij + λ (Σi wi − 1)   (3.3)

and set the derivatives of Eq. (3.3) with respect to the wi and the Lagrange multiplier λ to zero. This gives the solution

wi = Σk (Vt⁻¹)ik / Σjk (Vt⁻¹)jk .   (3.4)
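The solution (3.4) amounts to solving Vt·y = 1 and normalizing the result. A self-contained sketch (with a small Gaussian-elimination solver standing in for a linear algebra library) might look like:

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def combine_times(times, V):
    """Best linear estimator of Eqs. (3.1)-(3.4):
    w_i = sum_k (V^-1)_ik / sum_jk (V^-1)_jk."""
    y = solve(V, [1.0] * len(times))  # y = V^-1 * (1, 1, ..., 1)
    s = sum(y)
    w = [yi / s for yi in y]
    t_bar = sum(wi * ti for wi, ti in zip(w, times))
    var = sum(w[i] * w[j] * V[i][j]
              for i in range(len(w)) for j in range(len(w)))
    return t_bar, var
```

For a diagonal (uncorrelated) covariance matrix this reduces to the familiar inverse-variance weighted average.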

3.2 Errors and correlations of TOF measurements

The time resolution σt of a TOF counter can be factorized as a function of the pulse height Q and the hit position z [?]. The variation of σt with Q is complicated and needs detailed study on real data. In this paper, only the z dependence of σt is taken into account, since it behaves in a similar manner for electrons, muons and hadrons. Fig. ?? shows a typical σt(z) of a one-end readout as a function of z, from Bhabha events.

The tmea is determined by both the end-time and start-time measurements. The accuracy of the end-time is limited by the detector; the precision of the start-time is controlled by the uncertainties of t0. Thus, for a given TOF counter, the tmea at the two readout ends can be decomposed as

t1 = tc + (tD)1 ,
t2 = tc + (tD)2 ,   (3.5)

where t1,2 are the tmea's at the two readout ends, tc is the correlated part, and (tD)1,2 are the uncorrelated parts. Let us define

t+ = (t1 + t2)/2 ,
t− = (t1 − t2)/2 ,   (3.6)

so that the fluctuation of tc (σc) can be directly extracted by comparing the time resolutions of t+ and t− (σ+,−): σc = √(σ+² − σ−²). σ+(z), σ−(z) and σc(z) are drawn in Fig. ??. As shown in Fig. ??, σc(z) is approximately constant.

For a one-layer barrel TOF measurement, the covariance matrix can be expressed as

Vt = | σ1²  σc² |
     | σc²  σ2² | ,   (3.7)

where σ1,2 are the time resolutions at the two readout ends, which are functions of z. Applying Eq. (3.7) in Eq. (3.1)−Eq. (3.4), we get

w1 = (σ2² − σc²) / (σ1² + σ2² − 2σc²) ,   w2 = (σ1² − σc²) / (σ1² + σ2² − 2σc²) ,   (3.8)

and

σt² = (σ1² · σ2² − σc⁴) / (σ1² + σ2² − 2σc²) ,   (3.9)

from which the average t̄ is easily obtained. The time resolution σt(z) is drawn in Fig. ??.
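The closed-form weights of Eqs. (3.8)-(3.9) are easy to check numerically. The resolutions in this sketch are illustrative values, not measured BESIII numbers:

```python
def two_end_combination(sigma1, sigma2, sigma_c):
    """Weights and combined variance for one barrel layer, Eqs. (3.8)-(3.9)."""
    d = sigma1 ** 2 + sigma2 ** 2 - 2.0 * sigma_c ** 2
    w1 = (sigma2 ** 2 - sigma_c ** 2) / d
    w2 = (sigma1 ** 2 - sigma_c ** 2) / d
    var = (sigma1 ** 2 * sigma2 ** 2 - sigma_c ** 4) / d
    return w1, w2, var

# equal 100 ps readout ends with a 50 ps common (start-time) term
w1, w2, var = two_end_combination(0.100, 0.100, 0.050)
```

Note that the correlated term σc makes the combined variance (6.25e-3 ns² here) worse than the naive uncorrelated average (5.0e-3 ns²), which is exactly why the correlation must be carried through the average.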

3.3 Combining the time-of-flight from two layers' measurements

4 Control sample

4.1 Hadron Sample

4.2 Electron Sample

At electron-positron colliders there is an excellent source of electrons: the QED Bhabha scattering process e+e− → e+e−. The electron is nearly massless, so radiative corrections are very important for this process. The spectrum of radiated photons is in general soft, but there is a long tail which extends up to the beam energy. Roughly half of the photons are emitted from the initial state and the remainder from the final state. Photons from the initial state are seldom detected; they are emitted nearly parallel to the beam direction and so usually remain within the beam pipe. Thus, if the photon from the scattering process e+e− → γe+e− is detected, it is most likely the result of bremsstrahlung from one of the final-state electrons.

This has several fortunate consequences. The final-state electron which does not radiate will have momentum approximately equal to the beam energy. The electron which does radiate will have lower momentum; its momentum distribution peaks at high momentum but extends to very low momentum.

The electron sample was drawn from events produced by radiative Bhabha scattering. For such events, stringent cuts were placed on the photon and on the charged track of higher momentum. The charged track of lower momentum was added to the electron sample. The cuts imposed were:

1) that there must be two and only two tracks in the drift chamber with well measured momenta and showers in the barrel calorimeter.

2) that there must be at least one neutral shower in the barrel calorimeter with a measured energy of at least 200MeV.

3) that the higher momentum track should have momentum and deposited energy consistent with the beam energy.

4) that the lower momentum track must have momentum at least 200 MeV below the beam energy.

5) that the direction of the neutral track should match the direction of the missing momentum.

Neutral tracks were required to have a measured energy of at least 200 MeV for several reasons. The shower detector is almost completely efficient for photons of energy greater than 50 MeV, but there are some neutral showers which are not associated with incident photons; these result from 'split-offs' from the showers of charged tracks or from electronic noise. Most of these spurious neutral tracks have low measured energy, so requiring the energy to be greater than 200 MeV removes the majority of them. The energy requirement has another effect. The electron sample is needed to study the pattern of energy deposition by electrons in the shower detector. The photon and the radiating electron are emitted in approximately the same direction; the larger the photon energy, the larger the angle between their directions. If the photon is required to deposit at least 200 MeV in the EMC, the photon and electron showers will seldom overlap. Figure ?? shows the energy spectrum of the detected photons and the momentum spectrum of the electron sample.

4.3 Muon Sample

To study muon identification (especially for low momentum muons), a large cosmic ray sample with a broad momentum range will be selected from the data. Muons are the dominant component of cosmic rays. Hadrons and other electromagnetic components are filtered by the iron yoke; only a few percent of hadron background remains in the sample. The hadron fraction can be easily estimated by comparison with the e+e− → µ+µ− (dimu) sample.

When a single cosmic ray (near the interaction region) passes through the tracking detector, it is reconstructed as two tracks in most cases. The cosmic ray selection proceeds as follows: two charged MDC tracks are required; a cut on the total EMC energy (less than 1.5 GeV) is applied to remove Bhabha events; both charged tracks should have good TOF information, and the difference of the two time-of-flight measurements is required to be greater than 5 ns.

Compared with collision events, cosmic ray events have a different T0. To get a better tracking resolution, the sample is reconstructed again with a T0 correction. The T0 for a cosmic event is shifted by

T0 = (T1 + T2) / 2   (4.1)

relative to collision events, where T1 and T2 are the two measured times-of-flight. The drift times are corrected as

Tcorr = Tmeas − T0 + { 0                                    (φ > 180°)
                      (T2 − T1) × R_{4×(L−1)+W} / R_TOF     (φ ≤ 180°)   (4.2)


Figure 4: Momentum p, φ and cos θ distributions for the selected cosmic ray sample from BESII data.

Here R_{4×(L−1)+W} is the radius of the wire (L is the layer number, W the wire number) and R_TOF is the radius of the TOF counter. The tracking parameters are improved after the correction, and a cosmic ray with φ > 180° is an ideal "muon" track. Figure 4 shows the momentum, φ and cos θ distributions for the cosmic rays selected from BESII data. The purity of the sample is quite high (greater than 98%). It can be used for detector calibration, including: the momentum and incident-angle dependent position resolution; µ-ID efficiency studies; checks of the uniformity of the detector response (together with the dimu sample); the E/p ratio and the shower shape measured in the EMC; the calibration of the dE/dx curve; the alignment of the tracking system, etc.

5 The Likelihood Method

Using relative likelihoods (likelihood ratios) allows the most powerful discrimination between hypotheses, while using significance levels provides a measure of consistency between the data and the selected hypotheses.

5.1 Probability Density Functions

The response of a detector to each particle species is given by a probability density function (PDF). The PDF, written as P(x; p, H), describes the probability that a particle of species H = e±, µ±, π±, K±, p, p̄ leaves a signature x described by a vector of measurements (dE/dx, TOF, E/p, ...). P(x; p, H)dx is the probability for the detector to respond to a track of momentum p and type H with a measurement in the range (x, x + dx). As with any PDF, the integral over all possible values is unity, ∫P(x; p, H)dx = 1. Note that the momentum is treated as part of the hypothesis for the PDF and is therefore placed to the right of the semicolon. Drift chamber momentum measurements are usually of sufficient precision that they can be treated as a given quantity. In borderline cases when the precision is almost sufficient, this is sometimes handled by assuming that the momentum is perfectly measured and smearing the PDF.

The vector x may describe a single measurement in one detector, several measurements in one detector, or several measurements in several detectors. The measurements may be correlated for a single hypothesis. An example of correlated measurements within a single

device is E/p and the shower shape of electrons in the EMC. An example of correlated measurements in separate detectors is the energy deposited by charged pions in the EMC and in the instrumented flux return. In many cases of interest the correlations are reasonably small and the overall PDF can be determined as a product of the PDFs for individual detectors. For example, the specific ionization deposited by a charged track as it traverses the drift chamber has almost no influence on the time-of-flight measurement in the TOF. The difficult part of a PID analysis is determining the PDFs and their correlations (if any), and understanding the uncertainties on these distributions.

5.2 Likelihood

Given the relevant PDFs, the likelihood that a track with measurement vector x is a particle of species H is denoted by L(H; p, x). The functional forms of the PDF and the corresponding likelihood function are the same:

L(H; p, x) ≡ P(x; p, H) (5.1)

The difference between L(H; p, x) and P(x; p, H) is subtle: probability is a function of the measurable quantities (x) for a fixed hypothesis (p, H); likelihood is a function of the particle type (H) for a fixed momentum p and measured value (x). Therefore, an observed track for which x has been measured has a likelihood for each particle type. Competing particle type hypotheses should be compared using the ratio of their likelihoods. Other variables having a one-to-one mapping onto the likelihood ratio are equivalent. Two commonly used mappings of the likelihood ratio are the difference of log-likelihoods and a normalized likelihood ratio, sometimes called the likelihood fraction. For example, to distinguish between the K+ and π+ hypotheses for a track with measurements xobs, these three quantities would be written as:

L(K+; pobs, xobs) / L(π+; pobs, xobs)   (5.2)

log L(K+; pobs, xobs) − log L(π+; pobs, xobs)   (5.3)

L(K+; pobs, xobs) / [ L(K+; pobs, xobs) + L(π+; pobs, xobs) ]   (5.4)

It can be shown rigorously that the likelihood ratio (Eq. (5.2) and its equivalents Eq. (5.3) and Eq. (5.4)) discriminates between hypotheses most powerfully. For any particular cut on the likelihood ratio there exists no other set of cuts or selection procedure which gives a higher signal efficiency for the same background rejection.

There has been an implicit assumption made so far that there is perfect knowledge of the PDF describing the detector. In the real world, there are often tails on distributions due to track confusion, nonlinearities in detector response, and many other experimental sources which are imperfectly described by the PDFs. While deviations from the expected distributions can be determined from control samples of real data and thereby taken into account correctly, the tails of these distributions are often associated with fake or badly reconstructed tracks. This is one reason why experimentalists should include an additional consistency test.
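The equivalence of the three discriminants (5.2)-(5.4) can be demonstrated with Gaussian TOF PDFs; the flight times and resolution below are illustrative numbers at one fixed momentum, not BESIII calibration constants:

```python
import math

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

T_K, T_PI, SIGMA_T = 3.8, 3.5, 0.1  # ns, illustrative TOF means and resolution

def discriminants(t_obs):
    """The three equivalent K/pi discriminants of Eqs. (5.2)-(5.4)."""
    l_K = gauss_pdf(t_obs, T_K, SIGMA_T)
    l_pi = gauss_pdf(t_obs, T_PI, SIGMA_T)
    ratio = l_K / l_pi                         # Eq. (5.2)
    log_diff = math.log(l_K) - math.log(l_pi)  # Eq. (5.3)
    fraction = l_K / (l_K + l_pi)              # Eq. (5.4)
    return ratio, log_diff, fraction
```

A cut ratio > 1 selects exactly the same tracks as log_diff > 0 or fraction > 0.5, since the three quantities are monotonic functions of one another.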

5.3 Consistency

A statistical test for consistency does not try to distinguish between competing hypotheses: it addresses how well the measured quantities accord with those expected for a particle of type H. The question is usually posed: "What fraction of genuine tracks of species H look less H-like than this track?" This is the prescription for a significance level. For a device measuring a single quantity with a Gaussian response function, a track is said to be consistent with a hypothesis at the 31.7% (4.55%) significance level if the measurement falls within 1(2)σ of the peak value. If the PDF is a univariate Gaussian,

P(x; p, H) = 1/(√(2π) σ(p, H)) exp[ −(1/2) ((x − µ(p, H)) / σ(p, H))² ] ,   (5.5)

the significance level (SL) for hypothesis H of a track measured with x = xobs is defined by

SL(xobs; H) ≡ 1 − ∫ from µH−xobs to µH+xobs of P(x; H) dx .   (5.6)

Notice that the integration interval is defined to have symmetric limits around the central value. This is an example of a two-sided test. Mathematically, one may also define a one-sided test, where the integration interval ranges from xobs to +∞ or from −∞ to xobs. However, for a physicist establishing consistency, it is only sensible to talk about the symmetric, two-sided significance level defined in Eq. (5.6) when presented with a Gaussian PDF. This definition is equally sensible for other symmetric PDFs with a single maximum.

Nature is not always kind enough to provide Gaussian or monotonic PDFs. For example, asymmetric PDFs are encountered when making specific ionization (dE/dx) measurements. Multiple peaks in a PDF might be encountered when considering the energy deposited by a 1 GeV π− in the EMC. Although the π− will typically leave a minimum ionizing signature, some fraction of the time there will be a charge exchange reaction (π− + p → π0 + n) which deposits most of the π− energy electromagnetically.
A particularly useful generalization of the significance level of an observation xobs given the hypothesis H is

SL(xobs; H) = 1 − ∫ over {P(x;H) > P(xobs;H)} of P(x; H) dx .   (5.7)

Although we define the consistency in terms of an integral over the PDF of x, note that the range(s) is (are) specified in terms of the PDF, not in terms of x. This allows a physically meaningful definition. While other definitions of significance level are possible mathematically, we strongly recommend the definition in Eq. (5.7). Note that because the PDF is normalized to 1, the significance level can be defined equivalently as

SL(xobs; H) = ∫ over {P(x;H) < P(xobs;H)} of P(x; H) dx .   (5.8)
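For the Gaussian case the two-sided significance level reduces to a folded-normal tail probability, which the Python standard library can evaluate directly; a minimal sketch:

```python
from statistics import NormalDist

def significance_level(x_obs, mu, sigma):
    """Two-sided SL for a univariate Gaussian PDF: the fraction of genuine
    tracks of this species falling farther from the peak than x_obs."""
    n_sigma = abs(x_obs - mu) / sigma
    return 2.0 * (1.0 - NormalDist().cdf(n_sigma))
```

A measurement exactly 1σ (2σ) from the peak reproduces the 31.7% (4.55%) levels quoted above.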

Using significance levels to remove tracks which are inconsistent with all hypotheses takes a toll on the efficiency (presumably small), and may also discriminate between hypotheses. In general, if a cut is made requiring SL > α, the false negative rate is α. This is identical to the statement that the efficiency of this cut is equal to 1 − α. The false positive rate β(H) can depend on the definition of the SL, i.e., on the design of the test, and is identical to the misidentification probability. The background fraction in a sample is the sum of βi·PA(i), where PA(i) is the fraction of particle species i in the sample. Consistencies control only the efficiency. Minimizing the background, however, depends on the type of sample: a fixed cut on the consistency will produce very different background rates in different analyses.

Any procedure for combining either confidence levels or significance levels into a single consistency is arbitrary, with an infinite number of equally valid alternatives. For example, one method of combining confidence levels is mathematically equivalent to the following recipe:

1) use the inverse of CL = P(χ²|1) to convert each of the n probabilities CLi into a χi²;

2) add them up, i.e. χ² = Σ_{i=1}^{n} χi²;

3) use CL = P(χ²|n) to convert χ² into a new "combined" CL.
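The three-step recipe can be sketched for the special case n = 2, where the chi-square survival function has the closed form P(χ²|2) = exp(−χ²/2) and only the standard library is needed:

```python
from math import exp
from statistics import NormalDist

def cl_to_chi2(cl):
    """Step 1: invert CL = P(chi^2|1), i.e. the two-sided Gaussian tail probability."""
    return NormalDist().inv_cdf(1.0 - cl / 2.0) ** 2

def combine_two_cls(cl1, cl2):
    """Steps 2 and 3 for n = 2: add the chi^2 values and convert back
    using the 2-dof survival function exp(-chi^2/2)."""
    chi2 = cl_to_chi2(cl1) + cl_to_chi2(cl2)
    return exp(-chi2 / 2.0)
```

Two fully consistent measurements (CL = 1) combine to CL = 1, while two marginal ones combine to something worse than either, as one would want.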

5.4 Weighted Likelihood

In cases (such as particle identification) where the a priori probabilities PA(H) of the competing hypotheses are known, likelihoods can be used to calculate the expected purities of given selections. Consider the case of K/π separation: the fraction of kaons in a sample with measurement vector x is given by

F(K; x) = L(K; x)·PA(K) / [ L(π; x)·PA(π) + L(K; x)·PA(K) ] .   (5.9)

This can be considered as a weighted likelihood ratio where the weighting factors are the a priori probabilities. The F(K; x) are also called a posteriori probabilities, relative probabilities, or conditional probabilities, and their calculation according to Eq. (5.9) is an application of Bayes' theorem. The purity, i.e., the fraction of kaons in a sample selected with, say, F(K; x) > 0.9, is determined by calculating the number of kaons observed in the relevant range of values of F and normalizing to the total number of tracks observed there, e.g.,

fraction_H(F > 0.9) = ∫_{0.9}^{1} (dN/dF(H; x)) F(H; x) dF(H; x) / ∫_{0.9}^{1} (dN/dF(H; x)) dF(H; x) ,   (5.10)

where the integration variable is the value of F(H; x).
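Eq. (5.9) itself is one line of code; the default priors below (80% π, 20% K) follow the worked example of Sec. 5.6:

```python
def kaon_fraction(l_pi, l_K, prior_pi=0.8, prior_K=0.2):
    """Weighted likelihood ratio of Eq. (5.9): the expected kaon fraction
    among tracks with this measurement vector."""
    return l_K * prior_K / (l_pi * prior_pi + l_K * prior_K)
```

With equal likelihoods the measurement is uninformative and F(K; x) falls back to the prior kaon fraction.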

5.5 Using likelihood, consistencies, and probabilities

If the PDFs (and a priori probabilities) were perfectly understood, using likelihood ratios (and the probabilities calculated in Eq. (5.9)) to discriminate between hypotheses would suffice. However, the tails of distributions are likely to be unreliable. Some tracks will have signatures

in the detectors that are very unlikely for any hypothesis. Others will have inconsistent signatures in different detectors, not in accord with any single hypothesis. We do not want to call something a K rather than a π when the observed value of some parameter is extremely improbable for either hypothesis, even if the likelihood ratio strongly favors the K hypothesis. Extremely improbable events indicate detector malfunctions and glitches more reliably than they indicate particle species; they should be excluded. For many purposes, this can be done conveniently by cutting on the consistency of the selected hypothesis. If the PDFs are reasonably well understood, this has the additional advantage that it provides the efficiency of the cut.

Only in the case of a single Gaussian distributed variable do consistencies contain all the information needed to calculate the corresponding likelihood functions. There is a two-to-one mapping from the variable to the consistency and a one-to-one mapping from the PDF to the consistency. One can compute probabilities directly from likelihoods only because they are proportional to PDFs. To compare relative likelihoods, one must either retain the likelihoods or have access to the PDFs used to compute the consistencies. If there is more than one variable involved, or the distribution is non-Gaussian, even this possibility evaporates; any consistency corresponds to a surface in the parameter space, and one cannot recover the values of the parameters or the likelihood, even in principle.

5.6 An example of TOF and dE/dx PID

At BESIII, TOF and dE/dx are essential for hadron separation. Consider a TOF detector in which the time-of-flight t is measured with Gaussian resolution σt, which we assume to be constant (∼80 ps); similarly, the energy loss in the drift chamber (dE/dx) is also Gaussian distributed, with a relative resolution σE ∼ 6.5%. If all incident particles are known to be pions, kaons or protons at some fixed momentum, then the distributions of t and dE/dx will consist of the superposition of three Gaussian distributions, centered at the central values (tπ, tK, tp) and ((dE/dx)π, (dE/dx)K, (dE/dx)p) for pions, kaons and protons. The PDFs for the pion hypothesis are the normalized probability functions

P(t; π) = 1/(√(2π) σt) exp[ −(1/2) ((t − tπ) / σt)² ] ,
P(dE/dx; π) = 1/(√(2π) σE) exp[ −(1/2) ((dE/dx − (dE/dx)π) / σE)² ] .   (5.11)

The PDFs for the kaon and proton hypotheses take the same form. Using the observed time of flight t and dE/dx information, the likelihoods for pion, kaon and proton can be constructed as

L(π) = L(π; t, dE/dx) = P(t; π) · P(dE/dx; π) ,
L(K) = L(K; t, dE/dx) = P(t; K) · P(dE/dx; K) ,   (5.12)
L(p) = L(p; t, dE/dx) = P(t; p) · P(dE/dx; p) .

Let us consider the K/π separation in a sample which consists of 80% pions and 20% kaons. Using the observed time of flight t and energy loss in the drift chamber, it is possible to calculate


Figure 5: The relative likelihood constructed by combining the TOF and dE/dx information, for track momenta of 0.6, 0.8, 1.0 and 1.2 GeV. The time-of-flight distribution is calculated for a flight distance of 1.0 m.

the relative probabilities of pions and kaons at the measured t and dE/dx:

F(π) = PA(π)L(π) / [PA(π)L(π) + PA(K)L(K)]
F(K) = PA(K)L(K) / [PA(π)L(π) + PA(K)L(K)]      (5.13)

By construction, F(π) + F(K) = 1. The calculation of the relative probabilities is illustrated in Figure 5. As shown there, the K/π separation at 0.6 GeV is better than that at 1 GeV.
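A minimal sketch of Eq. (5.13), with the a priori weights taken from the assumed 80%/20% sample composition (the likelihood values passed in are placeholders):

```python
def relative_prob(l_pi, l_k, prior_pi=0.8, prior_k=0.2):
    # F(h) = PA(h) L(h) / [PA(pi) L(pi) + PA(K) L(K)], Eq. (5.13)
    norm = prior_pi * l_pi + prior_k * l_k
    return prior_pi * l_pi / norm, prior_k * l_k / norm
```

Because of the common normalization, F(π) + F(K) = 1 for any input likelihoods; note that the kaon can still win when L(K) is large enough to overcome the 4:1 prior.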

5.7 Cell analysis

In the example presented above we assumed there were no correlations between the particle identification provided by the TOF and that provided by dE/dx. This is a fine approach if the TOF is a purely passive detector and there are no other sources of correlation. An approach that takes all correlations into account explicitly is cell analysis. Basically, one makes a multi-dimensional histogram of all the relevant variables and computes the fraction of tracks that land in each cell for each hypothesis. These fractions can then be used as the likelihoods. If the cells are small enough, the result is optimal, with all correlations completely accounted for. The trouble with this approach is that as the number of variables grows, the number of cells quickly gets out of hand: it becomes impossible to find enough "training events" to map out the cell distributions with adequate statistics. Still, it is a viable approach for a small number of variables and is well suited to a problem such as combining E/p and event shape in the calorimeter. This would involve three variables in principle: E/p, shape, and dip angle, but one might get by with relatively large cells. A judicious choice of cells that uses our knowledge of the underlying physics can greatly reduce the number of cells needed; e.g., the dip angle might be eliminated as a variable if a dip-corrected shape can be invented. If groups of highly correlated variables can be treated together, we might be able to construct a set of relatively uncorrelated likelihoods. It may be necessary to combine information from several detectors to construct some of these variables.
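A toy sketch of such a cell analysis in two variables (the binning, the Gaussian toy samples and the "electron/pion" labels are illustrative assumptions, not detector data):

```python
import numpy as np

rng = np.random.default_rng(1)

def toy_sample(mean, n=50000):
    # Correlated toy "training events" in two variables (e.g. E/p and a shape variable).
    cov = [[0.01, 0.004], [0.004, 0.01]]
    return rng.multivariate_normal(mean, cov, size=n)

edges = [np.linspace(0.0, 2.0, 41), np.linspace(0.0, 2.0, 41)]  # 40 x 40 cells

def cell_pdf(sample):
    # Fraction of training tracks landing in each cell -> per-cell likelihood.
    counts, _ = np.histogramdd(sample, bins=edges)
    return counts / counts.sum()

pdf_e = cell_pdf(toy_sample([1.0, 1.0]))   # "electron-like": E/p near 1
pdf_pi = cell_pdf(toy_sample([0.5, 0.8]))  # "pion-like": lower E/p

def cell_likelihood(track, pdf):
    # Look up the cell containing the track and return its training fraction.
    idx = tuple(int(np.searchsorted(e, v, side="right")) - 1
                for e, v in zip(edges, track))
    return pdf[idx]
```

Because the histogram is filled with correlated samples, the per-cell fractions automatically encode the correlation between the two variables, which a product of one-dimensional PDFs would miss.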

5.8 The role of neural nets

If the variables are not highly correlated, multiplying together the likelihoods associated with each variable should suffice. If the correlations are simple enough, a change of variables or a cell analysis may suffice. If the variables are highly correlated, neural nets and other such opaque boxes may construct near-optimal discrimination variables. The PDFs of the resulting variables can then be used as the basis for a likelihood analysis. Using the same formalism for neural-network outputs as for conventional likelihood analyses allows modular design of the analysis software with no loss of information and optimal discrimination between hypotheses.

6 The Toolkit for Multiple Variables Analysis

7 The Artificial Neural Network Method

An artificial neural network [?] is a computational structure inspired by the study of biological neural processing. There are many different types of neural networks, from relatively simple to quite complex, just as there are many theories on how biological neural processing works. In BESIII particle identification, a type of layered feed-forward neural network will be applied.

A layered feed-forward neural network has layers, or subgroups of processing elements. The first layer is the input layer and the last is the output layer. The layers placed between the first and the last are the hidden layers. A layer of processing elements makes independent computations on the data it receives and passes the results to another layer. The next layer may in turn make its own computations and pass the results on to yet another layer. Finally, a subgroup of one or more processing elements determines the output of the network. Each processing element makes its computation based upon a weighted sum of its inputs. The processing elements are seen as units similar to the neurons in a human brain, and hence they are referred to as cells, neuromimes, or artificial neurons. Even though our subject matter deals with artificial neurons, we will simply refer to them as

Figure 6: The structure of the layered feed-forward neural network applied in BESIII particle identification. Four parameters from the MDC, ptrk, pt, goodHits and normPH, act as the input neurons and form the input layer; the right-most layer is the output layer, consisting of a single output neuron, type; the two layers between them are hidden layers of 8 and 3 neurons.

Figure 7: Error vs. number of training epochs for the training and test samples.

neurons. Synapses between neurons are referred to as connections, which are represented by the edges of a directed graph in which the nodes are the artificial neurons.

As discussed in Section 2, the following variables are chosen as the input neurons: the momentum and transverse momentum from track fitting; the normalized pulse height and the number of good hits in the dE/dx measurement; the mass squared calculated from the measured time of flight, together with the hit position and pulse height in the inner barrel, outer barrel and endcap TOF arrays; the energy deposit in the EMC and the shower-shape parameters Eseed, E3×3, E5×5 and the second moment S; and the travel depth, the average number of hits and the distance between the hit and the extrapolated position in the first layer of the muon counter. The data samples are generated and analyzed in the BESIII Offline Software System (BOSS). e/µ/π/K/p/p̄ particles are generated in the momentum range 0.1–2.1 GeV/c.

When all the input variables are ready, they are fed into the network and the training starts. The aim of the training is to minimize the total error over the weighted sum of examples. The number of epochs is the most important parameter of the training. Figure 7 shows its effect on the training result: when this number is about 400 the result is good, while too few or too many epochs lead to a useless or over-trained network. The following figures show the variation with momentum of some input neurons for the different particle species, which gives a better understanding of the input neurons' ability in particle identification.
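To make the structure concrete, here is a toy sketch of a 4-8-3-1 feed-forward network (the Figure 6 topology) trained by full-batch gradient descent on fabricated, well-separated inputs; it is not the BOSS implementation, and the class means, learning rate and epoch count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MLP:
    """4-8-3-1 feed-forward net, mirroring the Figure 6 topology."""
    def __init__(self, sizes=(4, 8, 3, 1)):
        self.W = [rng.normal(0.0, 0.5, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(b) for b in sizes[1:]]

    def forward(self, x):
        # Each layer computes a sigmoid of a weighted sum of its inputs.
        self.acts = [x]
        for W, b in zip(self.W, self.b):
            self.acts.append(sigmoid(self.acts[-1] @ W + b))
        return self.acts[-1]

    def train_epoch(self, X, y, lr=0.5):
        # One epoch of full-batch backpropagation on the mean squared error.
        out = self.forward(X)
        delta = (out - y[:, None]) * out * (1.0 - out)
        for i in range(len(self.W) - 1, -1, -1):
            a = self.acts[i]
            gW = a.T @ delta / len(X)
            gb = delta.mean(axis=0)
            if i > 0:  # propagate the error before updating this layer's weights
                delta = (delta @ self.W[i].T) * a * (1.0 - a)
            self.W[i] -= lr * gW
            self.b[i] -= lr * gb
        return float(np.mean((out - y[:, None]) ** 2))

# Two toy classes of 4-dimensional inputs (the four MDC variables are labels only).
X = np.vstack([rng.normal(0.3, 0.1, (500, 4)), rng.normal(0.7, 0.1, (500, 4))])
y = np.concatenate([np.zeros(500), np.ones(500)])

net = MLP()
errors = [net.train_epoch(X, y) for _ in range(400)]
```

The list `errors` plays the role of the training curve in Figure 7: the per-epoch error should fall from its initial value as the weights are adjusted.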

Figure 8: NN-PID performance: identification efficiencies as functions of momentum P (GeV/c); the three panels compare electron, muon and kaon with pion, respectively.

7.1 The TMVA Algorithm Factories

References
