Neural Networks
SLAC-TN-03-028

Neural Networks

Patrick Smith

Office of Science, Science Undergraduate Laboratory Internship (SULI)
Stanford University
Stanford Linear Accelerator Center
Menlo Park, California

August 14, 2003

Prepared in partial fulfillment of the requirements of the Office of Science, Department of Energy's Science Undergraduate Laboratory Internship under the direction of Tony Johnson.

Participant: Signature

Research Advisor: Signature

Work supported in part by the Department of Energy contract DE-AC03-76SF00515.

INTRODUCTION

Physicists use large detectors to measure particles created in high-energy collisions at particle accelerators. These detectors typically produce signals indicating either where ionization occurs along the path of the particle, or where energy is deposited by the particle. The data produced by these signals is fed into pattern recognition programs that try to identify what particles were produced, and to measure the energy and direction of those particles. Many techniques are used in this pattern recognition software. One technique, neural networks, is particularly suitable for identifying what type of particle caused a given set of energy deposits. Neural networks can derive meaning from complicated or imprecise data, extract patterns, and detect trends that are too complex to be noticed by either humans or other computational processes. To experiment with several neural network techniques, physicists use a tool kit. The goal of this research is to interface a neural network tool kit into Java Analysis Studio (JAS3), an application that allows data from any experiment to be analyzed. As the final result, a physicist will have the ability to train, test, and implement a neural network with the desired output while using JAS3 to analyze the results. Before a neural network can be implemented, a firm understanding of what a neural network is and how it works is beneficial.
A neural network is an artificial representation of the human brain that tries to simulate the learning process [5]. The word artificial in that definition is important: these are computer programs that use calculations during the learning process. In short, a neural network learns by representative examples. Perhaps the easiest way to describe how neural networks learn is to explain how the human brain functions. The human brain contains billions of neural cells that are responsible for processing information [2]. Each one of these cells acts as a simple processor. When individual cells interact with one another, the complex abilities of the brain are made possible. In a neural network, the input data are processed by a propagation function that adds up the values of all the incoming data. The resulting value is then compared with a threshold, and it must exceed that threshold in order to produce output. The activation function is a mathematical function that a neuron uses to produce an output based on its input value [8]. Figure 1 depicts this process. Neural networks usually have three layers: an input, a hidden, and an output layer. These layers create the end result of the neural network. A real-world example is a child associating the word dog with a picture. The child says "dog" and simultaneously looks at a picture of a dog. The input is the spoken word "dog", the hidden layer is the brain's processing, and the output is the category of the word dog based on the picture. This illustration describes how a neural network functions.

MATERIALS AND METHODS

Before the implementation of a neural network tool kit into JAS3, JAS3 and all its components had to be installed. An editor capable of the required processes had to be obtained and downloaded successfully onto the workstation. The neural network editor that we originally used was the Java Object Oriented Neural Engine (JOONE).
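The propagation and activation steps described above can be sketched as a single artificial neuron. This is a minimal illustrative example in Java, not code from the tool kit; the class name, weights, and inputs are hypothetical.

```java
// A minimal artificial neuron: sums its weighted inputs (the
// propagation function), then applies an activation function
// to turn that sum into an output.
public class Neuron {

    // Propagation function: weighted sum of all incoming values.
    public static double propagate(double[] inputs, double[] weights) {
        double sum = 0.0;
        for (int i = 0; i < inputs.length; i++) {
            sum += inputs[i] * weights[i];
        }
        return sum;
    }

    // Sigmoid activation: maps the summed input to an output in (0, 1).
    public static double activate(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    public static void main(String[] args) {
        double[] inputs  = {0.5, 0.8, 0.2};   // arbitrary example data
        double[] weights = {0.4, 0.3, 0.9};   // arbitrary example weights
        double net = propagate(inputs, weights);
        System.out.println("neuron output = " + activate(net));
    }
}
```

In a full network, many such neurons are connected in layers, with each neuron's output feeding the propagation functions of the next layer.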
This editor would eventually have to be replaced, for reasons explained in the conclusion of this paper. Several factors were involved in choosing the editor. It had to be user-friendly, free software, flexible, and capable of handling the amount of data for our purpose. The tool kit will be used by physicists who are analyzing their specific data, so the application must be easy to use and understand. An equally important factor is that the editor be written in the Java programming language, because the editor will be interfaced with JAS3, a Java-based application. Many companies sell neural network software, but these software packages are very expensive, and for the purpose of our research there was no need for one of them; a more general editor would be sufficient. Flexibility was also important: an editor whose functionality includes graphs and other easy-to-use features would improve usability. Many editors available via the World Wide Web met our needs and were free to download along with open source code. Finally, the editor had to be able to handle the specified amount of data. Searching for a neural network editor involved exploring the Internet. Many networks were available; however, five met most of the qualifications. The Stuttgart Neural Network Simulator had met the qualifications of a researcher at SLAC on a previous project. NeuroSolutions met all the requirements except that it was not free; in fact, it was quite expensive. Cortex also met all the requirements except that it was written in C++. JOONE satisfied all the requirements, so JOONE was originally chosen. Once the editor was chosen, understanding its components was important. JOONE provides a graphical user interface (GUI) that allows you to create, modify, and train a neural network. JOONE is for the most part self-explanatory, and there is a tutorial that is somewhat helpful.
In addition, the creator of JOONE, Paolo Marrone, responded to questions concerning aspects of his editor that were unclear to us. Figure 2 shows the JOONE editor. For this study we used a neural network to identify calorimeter clusters in a possible linear collider detector. In such a detector hundreds of particles are created in each collision, each of which deposits its energy into a set of calorimeter cells. Software groups the energy deposits into clusters, each representing the energy deposited by a single particle. By measuring various properties of a cluster and feeding these measurements into a neural network, we can attempt to estimate what type of particle created each cluster. To train the network we use simulated events, in which we already know which type of particle created each energy deposit. The data that we used in the neural net came from Gary Bower, a SLAC physicist who had created a neural network using a different editor. His training and validation data were valuable in a couple of ways. When training a neural network, the data should be trained on more than one net to compare and contrast results, and Bower's data gave us that element of comparison. We could also use it to train in different modes just to see the varying outputs. We constructed a neural network with 3 layers. The input layer consisted of 15 neurons, the hidden layer of 25, and the output layer of 6. That means that in each row of the data set, the first 15 numbers are the input and the last 6 are the desired output corresponding to that combination of inputs. The hidden layer size can vary; according to Kevin Swingler, the hidden layer value should be no more than twice the input. It is practical to experiment with the hidden layer value. When the layers were finished, another component, referred to as the file input, was used to read the data into the network. This component simply read the first 15 columns into the network. Another component, called the teacher, read the last 6 columns, the desired results, into the network.
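The division of labor between the file input and the teacher amounts to splitting each 21-column row of the data set into the 15 network inputs and the 6 desired outputs. The following is a hypothetical sketch of that data handling, not JOONE's actual API; the class and method names are invented for illustration.

```java
import java.util.Arrays;

// Splits one 21-column training row into the 15 values fed to the
// input layer (the "file input" role) and the 6 desired outputs
// fed to the teacher component.
public class TrainingRow {
    public static final int INPUTS  = 15;  // input-layer size
    public static final int OUTPUTS = 6;   // output-layer size

    // First 15 columns: the cluster measurements.
    public static double[] inputColumns(double[] row) {
        return Arrays.copyOfRange(row, 0, INPUTS);
    }

    // Last 6 columns: the desired (target) outputs.
    public static double[] desiredColumns(double[] row) {
        return Arrays.copyOfRange(row, INPUTS, INPUTS + OUTPUTS);
    }
}
```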
Each row of the data used to train our network consisted of 21 numbers. The first 15 inputs, named NE[0], NE[1], NE[2], NE[0]/NE[1], NE[1]/NE[2], firstL, lastL, length, firstDdiff, aveLE5, angsep*10000, aveLHits2, nhits, ClusEtot, and CE[2], were the quantities that described the clusters. The last 6 desired outputs were nngamma, nnchhad, nneuhad, nngammafrag, nnhadfrag, and nnnone. These quantities identified what type of cluster it was. For example, if the last 6 columns for a cluster are "1 0 0 0 0 0", it is considered an nngamma cluster, because "1 0 0 0 0 0" corresponds to nngamma, just as "0 0 1 0 0 0" corresponds to an nneuhad cluster. See Table 1 for the complete translations. We set the initial learning rate η (eta) and momentum term α (alpha). Eta dictates the proportion of the calculated error that contributes to the weight change. Alpha relates to the size of the previous update for each weight [8]. Although we left the values at the editor defaults, the values were important to monitor in order to prevent overfitting the data. We then set the number of cycles (epochs) for the network to train on the data to 10,000. The network does the rest. The purpose of training the net is to minimize the error which the network makes at each output unit over the entire data set [8]. So while training we looked for our error to move closer and closer to zero. It will never actually reach zero; if it did, the network would simply have memorized the training data. So if the training error reached a predetermined target value, flattened out, or started to rise, training should be stopped. The last step was to read the output into a file for comparison. The output was read into AIDA. The AIDA histogram interfaces allowed for a simple and easy-to-interpret presentation of the output.

RESULTS