Monitoring Program Behaviour on SUPRENUM
Markus Siegle, Richard Hofmann
Institut für Informatik VII, Universität Erlangen-Nürnberg
Martensstraße, Erlangen, Germany
email: siegle@immd[...].informatik.uni-erlangen.de

Abstract

It is often very difficult for programmers of parallel computers to understand how their parallel programs behave at execution time, because there is not enough insight into the interactions between concurrent activities in the parallel machine. Programmers do not only wish to obtain statistical information that can be supplied by profiling, for example. They need to have detailed knowledge about the functional behaviour of their programs. Considering performance aspects, they need timing information as well. Monitoring is a technique well suited to obtain information about both functional behaviour and timing. Global time information is essential for determining the chronological order of events on different nodes of a multiprocessor or of a distributed system, and for determining the duration of time intervals between events from different nodes. A major problem on multiprocessors is the absence of a global clock with high resolution. This problem can be overcome if a monitor system capable of supplying globally valid time stamps is used.

In this paper the behaviour and performance of a parallel program on the SUPRENUM multiprocessor is studied. The method used for gaining insight into the runtime behaviour of a parallel program is hybrid monitoring, a technique that combines advantages of both software monitoring and hardware monitoring. A novel interface makes it possible to measure program activities on SUPRENUM. The SUPRENUM system and the ZM hardware monitor are briefly described. The example program under study is a parallel ray tracer. We show that hybrid monitoring is an excellent method to provide programmers with valuable information for debugging and tuning of parallel programs.

Keywords: debugging, event-driven monitoring, multiprocessor, parallel program, performance evaluation, ray tracing, SUPRENUM, tuning

1 Introduction

It is often very difficult for programmers of parallel computers to understand how their parallel programs behave at execution time, because there is not enough insight into the interactions between concurrent activities in the parallel machine. Programmers need to have detailed knowledge of the functional behaviour of their programs, and for the consideration of performance aspects they need timing information as well. Usually, methods such as profiling and accounting do not provide sufficient information; they only give summary statistical results. Therefore, users often resort to rudimentary methods such as writing logfiles during program execution in order to obtain debug information and performance information about their programs. But only a relatively small fraction of the needed information can be obtained that way. A major problem with multiprocessors is the absence of a global clock with high resolution. Global timing information is essential for determining the chronological order of events on different nodes of a multiprocessor or of a distributed system, and for determining the duration of time intervals between events from different nodes.

Facing this problem, our approach is to apply event-driven monitoring techniques in order to find out how a parallel program behaves. In particular, we decided to use hybrid monitoring, which combines advantages of both hardware monitoring and software monitoring. Using software monitoring, it is relatively easy to relate the event traces obtained from the measurements to the measured program. But since monitoring is done within the object system, i.e., within the system under study, and therefore constitutes an extra workload, software monitoring changes the behaviour of the object system. Also, it is usually impossible to obtain global timing information, because most parallel systems do not provide a global clock with high resolution. With hardware monitoring there is no intrusion, and the timing problem can be solved by providing an external clock. But there is no easy way to relate the recorded signals to the source code of the measured program.

In hybrid monitoring, as in software monitoring, the program under study is instrumented by inserting additional instructions at points of interest. The execution of such a measurement instruction marks an event. It causes the output of measurement data, containing a token identifying the event and possibly some additional parameters, to an external interface. A hardware monitor is connected to the interface. It records the event stream coming from the interface and stores the sequence of events together with the respective time stamps as an event trace. Since most of the work is done by the external hardware monitor, hybrid monitoring provides the capabilities of software monitoring at a much lower level of intrusion. The hardware monitor we use is a scalable distributed monitor system called ZM. It is capable of simultaneously recording event streams coming from an arbitrary number of nodes. Since the ZM has a global clock with high resolution, events coming from different nodes can be chronologically ordered by their time stamps.

In this paper we use hybrid monitoring techniques to study the behaviour of a parallel program on the German supercomputer SUPRENUM. The SUPRENUM project was launched in [...] as a government-funded project, and implementation of a prototype started in [...]. The project resulted in a commercial product in [...]. SUPRENUM is a distributed memory multiprocessor in which the processing nodes are interconnected by a hierarchical bus system. The SUPRENUM architecture and its software environment are discussed in more detail in Section 2.

In Section [...] the essential features of the hardware monitor ZM are presented. Section [...] presents the application of our method to the implementation of a parallel ray tracing program on SUPRENUM. We show how measurements supported the implementation of a parallel program by providing programmers with valuable information about the behaviour of their program. Both debugging and tuning of the parallel program were supported considerably by the measurements.

2 The SUPRENUM Multiprocessor

2.1 Hardware Architecture

The SUPRENUM system is a MIMD-type multiprocessor consisting of up to [...] processors (nodes). It is a distributed-memory machine with a two-level interconnection network. All the elements comprising one processing node are accommodated on a single printed circuit board.

The main components of each node are a [...]-Bit microprocessor MC[...] operating at a clock rate of [...] MHz, [...] MByte of main memory protected by [...]-Bit error-detection and [...]-Bit error-correction logic, and four coprocessors:

The paged memory management unit (PMMU) MC[...] checks access rights and page violation when the node memory is being accessed by the CPU or at the beginning of DMA.

The floating-point unit (FPU) MC[...] executes scalar floating-point arithmetic.

The vector floating-point unit (VFPU) consists of the Weitek chip set WTL[...] and [...] KByte of fast [...]

[...] release. The functions of the communication unit are realized mainly by gate arrays and hybrid modules.

Up to [...] processing nodes form a cluster. Nodes of the same cluster communicate via the cluster bus. In order to provide some degree of fault-tolerance, the cluster bus consists of two independent parallel buses, each having a transfer rate of [...] MBytes/s. Thus the total bandwidth available for intra-cluster communication is [...] MBytes/s.

Figure [...] (left) shows the components of one SUPRENUM cluster. In addition to the processing nodes, each cluster contains [...] or [...] special purpose nodes: there are up to [...] communication nodes which handle the communication between clusters. If a processing node in one cluster wants to communicate with another processing node in a different cluster, communication is done via a communication node. There is one disk controller node which can connect up to [...] disks to the cluster. Finally, there is one cluster diagnosis node which monitors the cluster bus and maintains statistical records. Only communication activities can be monitored by the diagnosis node.

The clusters are interconnected in a toroid structure by bit-serial buses called SUPRENUM bus. Figure [...] (right) shows the [...]-cluster SUPRENUM system and a front-end computer. A token ring protocol is employed for the SUPRENUM bus, with a data transfer rate of [...] MBytes/s. By duplicating the torus structure the bandwidth doubles and fault-tolerance is achieved, because the clusters in a ring can always be reached via alternative routes.

2.2 The Programming Model

The SUPRENUM computer is a multi-user machine. Users can access the SUPRENUM kernel via a front-end computer. In order to execute a parallel program, a user must first request a certain number of clusters or nodes. If the requested number of resources is not available at the moment, the user has to wait. The code of the user program is then downloaded from the front-end computer to the partition assigned to the user. When the user's job terminates, all processors are released. There is a certain time limit, which can be set by the operator, after which the resources assigned to a user are released even if that user's job is not yet completed. This is done to prevent monopolization.

The following programming model is employed for SUPRENUM: user programs consist of one or more independent processes. A process can create other processes at any point of time. A user application starts with an initial process. Termination of the initial process causes termination