Appears in the International Symposium on Computer Architecture, Barcelona.

Active Pages: A Computation Model for Intelligent Memory

Mark Oskin, Frederic T. Chong, and Timothy Sherwood

Department of Computer Science
University of California at Davis

Abstract

Microprocessors and memory systems suffer from a growing gap in performance. We introduce Active Pages, a computation model which addresses this gap by shifting data-intensive computations to the memory system. An Active Page consists of a page of data and a set of associated functions which can operate upon that data. We describe an implementation of Active Pages on RADram (Reconfigurable Architecture DRAM), a memory system based upon the integration of DRAM and reconfigurable logic. Results from the SimpleScalar simulator [BA] demonstrate up to 1000X speedups on several applications using the RADram system versus conventional memory systems. We also explore the sensitivity of our results to implementations in other memory technologies.

1 Introduction

Microprocessor performance continues to follow phenomenal growth curves which drive the industry. Unfortunately, memory-system performance is falling behind. Processor-centric optimizations to bridge this memory gap include prefetching, speculation, out-of-order execution, and multithreading [WM]. Several of these approaches can lead to memory-bandwidth problems [BGK]. We introduce Active Pages, a model of computation which partitions applications between a processor and an intelligent memory system. Our goal is to keep processors running at peak speeds by off-loading data manipulation to logic placed in the memory system.

Active Pages consist of a page of data and a set of associated functions that operate on that data. For example, an Active Page may contain an array data structure and a set of insert, delete, and find functions that operate on that array. A memory system that implements Active Pages is responsible for both the storage of the data and the computation of the associated functions.

Rapid advances in fabrication technology promise to make the integration of logic and memory practical. Although Active Pages can be implemented in a variety of architectures and technologies, we focus upon the integration of reconfigurable logic and DRAM. We introduce the RADram (Reconfigurable Architecture DRAM) system. On many applications, our simulations show substantial performance gains for a uniprocessor workstation using a RADram system versus a conventional memory system. RADram can also serve as a conventional memory system with negligible performance degradation. As we shall see in Section 3, RADram is likely to have superior yield, higher parallelism, and better integration with commodity microprocessors when compared to architectures such as IRAM [Pat]. Since memory technologies are a moving target, we measure the sensitivity of our results to the speed of Active Page implementations. This allows us to generalize to currently available technologies such as DRAM macro cells in ASIC (Application-Specific Integrated Circuit) technologies.

This paper starts with a description of Active Pages in Section 2 and continues with our RADram implementation in Section 3. We then describe our experimental methodology in Section 4 and our applications in Section 5. We continue with the reconfigurable logic designs for each application in Section 6. We present our results in Section 7 and generalize these results to other technologies in Section 8. Finally, we conclude with a discussion of related work in Section 9, future work in Section 10, and conclusions in Section 11.

Acknowledgements: Thanks to Andre DeHon, Matt Farrens, Lance Halstead, Tom Simon, Deborah Wallach, and our anonymous referees. This work is supported in part by an NSF CAREER award to Fred Chong, by Altera, and by grants from the UC Davis Academic Senate. More info at http://arch.cs.ucdavis.edu/RAD.

2 Active Pages

Active Pages introduce new programming, system, and fabrication issues. In this section, we shall discuss the programming issues which arise from the Active Page computational model. These issues are: partitioning, coordination, computational scaling, and data manipulation. We will discuss system and fabrication issues in Section 3, where we introduce the RADram Active Page implementation.

To use Active Pages, computation for an application must be divided, or partitioned, between processor and memory. For example, we use Active Page functions to gather operands for a sparse-matrix multiply and pass those operands on to the processor for multiplication. To perform such a computation, the matrix data and gathering functions must first be loaded into a memory system that supports Active Pages. The processor then, through a series of memory-mapped writes, starts the gather functions in the memory system. As the operands are gathered, the processor reads them from user-defined output areas in each page, multiplies them, and writes the results back to the array data structures in memory.

Interface: To simplify integration with commodity microprocessors and systems, the interface to Active Pages is designed to resemble a conventional virtual memory interface. Specifically, the Active Page interface includes the following:

- Standard memory interface functions: write(vaddr, data) and read(vaddr).

- A set of functions available for computation on a particular Active Page: AP_functions.

- An allocation function, AP_alloc(group_id, vaddr), which allocates an Active Page in group group_id at virtual address vaddr. Pages operating on the same data will often belong to a page group, named by a group_id, in order to coordinate operations.

- A function-binding procedure, AP_bind(group_id, AP_functions), which binds a set of functions AP_functions to a group group_id of Active Pages. This set of functions may be redefined through repeated calls to AP_bind. Since implementations may limit the number or complexity of functions associated with each page, rebinding may be necessary to make room for new functions by eliminating old ones.

- Additionally, applications will commonly use several variables in each Active Page as synchronization variables to coordinate between AP_functions and a processor. These variables require no additional support beyond reads and writes. Memory accesses by AP_functions and a processor are atomic.

Figure 1: Expected computation scaling of Active Pages. (Speedup of an Active Page system over a conventional one, and processor/memory non-overlap, versus problem size, through the sub-page, scalable, and saturated regions.)
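The interface above can be made concrete with a small C++ sketch. This is illustrative only: the operation names come from the list above, while the types and exact signatures (GroupId, VAddr, APFunctionSet) are hypothetical details the model itself does not specify.

```cpp
#include <cstdint>

// Hypothetical C++ rendering of the Active Page interface described above.
// GroupId and APFunctionSet are illustrative types; the model specifies the
// operations, not their exact signatures.
using GroupId = int;
using VAddr   = uintptr_t;

struct APFunctionSet;  // set of functions synthesized for a page group

// Standard memory interface functions.
void write(VAddr vaddr, uint64_t data);
uint64_t read(VAddr vaddr);

// Allocates an Active Page in group `group` at virtual address `vaddr`.
// Pages operating on the same data share a group so they can coordinate.
void AP_alloc(GroupId group, VAddr vaddr);

// Binds (or rebinds) a set of functions to every page in `group`.
// Implementations may bound the number/complexity of bound functions,
// so rebinding may be needed to make room for new ones.
void AP_bind(GroupId group, const APFunctionSet& functions);
```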

Active Page functions use virtual addresses and can reference any virtual address available to the allocating process. In our sparse-matrix example, the code begins by calling AP_alloc to allocate a group of pages to store the matrices to be multiplied. Then AP_functions are defined to include a function for index comparison and data gathering. Next, AP_bind is called to associate this function with the pages. To start the page computations, the processor activates the pages with an ordinary memory write to an application-defined location; the AP_functions poll such synchronization variables as soon as AP_bind is called. Once the functions have computed their results and gathered the matrix data to be multiplied, they write to another set of synchronization variables to indicate that the data is ready. The processor polls these variables and begins reading and multiplying the data once it is ready.

Activation Time: Intuitively, a processor working with a memory system that implements Active Pages is similar to a control processor working with a small data-parallel machine. Typically, an application is partitioned by first dispatching a request for a computation to occur on the data within an Active Page. A well-structured application will have to move little, if any, additional data into the page in order for that function to complete. Thus, the majority of time in dispatching a work request is spent communicating to the Active Page the function to invoke and any additional required parameters. We refer to the time it takes to dispatch this request as activation time. Activation time is generally constant for each page for a given function; measurements for each application will be given in Table 4.
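A minimal host-side sketch of this walkthrough, reusing the declarations from the interface sketch above; the flag locations, the START/READY encoding, and the helpers gather_functions() and multiply_and_write_back() are hypothetical stand-ins for application-defined conventions.

```cpp
#include <cstdint>

const APFunctionSet& gather_functions();     // hypothetical bound circuit set
void multiply_and_write_back(VAddr out);     // processor-side FP multiply

enum : uint64_t { START = 1, READY = 1 };

void sparse_matrix_multiply(GroupId grp, const VAddr matrix_pages[], int npages,
                            const VAddr start_flag[], const VAddr ready_flag[],
                            const VAddr out_area[]) {
  for (int i = 0; i < npages; ++i)
    AP_alloc(grp, matrix_pages[i]);       // place matrix data in a page group
  AP_bind(grp, gather_functions());       // bind index-compare/gather logic

  for (int i = 0; i < npages; ++i)
    write(start_flag[i], START);          // activation: ordinary memory write

  for (int i = 0; i < npages; ++i) {
    while (read(ready_flag[i]) != READY) {
      // spin on the synchronization variable until operands are gathered
    }
    multiply_and_write_back(out_area[i]); // FP multiply stays on the processor
  }
}
```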

Coordination: Partitioning computations implies that Active Pages must coordinate with the processor and with each other. Processor-page coordination is accomplished via predefined synchronization variables. Inter-page coordination is accomplished with inter-page memory references.

Synchronization variables are used to coordinate activities between the Active Page functions and the processor. The structure and layout of these variables are implementation- and application-specific. The variables may serve as locks to indicate when inputs or outputs for an Active Page operation are valid. This interface is similar to the memory-mapped registers used for network interfaces.

Our global virtual address space implies that some Active Page functions may reference data in other pages. Such references are meant to be used sparingly, and the implementation of inter-page memory references will be discussed in Section 3. Active Page implementations are intended to function in any system that uses a conventional memory system. For example, pages may coordinate with multiple processors in a Symmetric Multiprocessor using Active Page synchronization variables to enforce atomicity.

The Active Page model of computation does not define an explicit means for inter-page communication. Support for communication between pages can be accomplished in a variety of fashions. Abstractly, all forms of communication are viewed as non-local memory references issued by an Active Page. For performance reasons, an Active Page memory system may choose to combine several references into a contiguous inter-page memory copy. Our RADram implementation (Section 3) simulates such an approach.

Partitioning: In our sparse-matrix example, the application was partitioned between work done at the memory system and work done at the processor. Such partitioning varies in emphasis between efficient use of processor computation and efficient use of Active Page computation. We refer to these two extremes as processor-centric and memory-centric partitioning. Processor-centric partitioning is appropriate for algorithms with complex computations such as floating point. Memory-centric partitioning is appropriate for data manipulation and integer arithmetic.

Sparse-matrix computations require substantial floating-point computation and suggest a processor-centric partitioning: Active Pages compute which operands must be multiplied, with the goal of providing the processor with enough operands to keep it running at peak speed. Our image-processing application, on the other hand, uses integer arithmetic and can be performed almost entirely in Active Pages. Consequently, the goal there is to exploit parallelism and use as many Active Pages as possible.

Computation Scaling: The computational power of Active Pages scales in an unusual way as application problem sizes grow. In this section, we develop some intuition about this scaling; we will verify these intuitions in Section 7. Traditional multiprocessors generally operate with a fixed number of processing engines which must be applied to a variable problem size. With Active Pages, the number of processing engines is coupled to physical memory size. Since many systems are designed to scale memory size to contain the data of their intended applications, more Active Pages will be available for the computation.

Figure 1 shows how we expect Active Page performance to scale as problem size grows. Speedup refers to the performance of a system using Active Pages relative to that of a system using a conventional memory system. Non-overlap time is the time the processor spends waiting for Active Page computation which is not overlapped with processor computation; this is indicative of the quality of partitioning. As illustrated in Figure 1, we expect three regions of speedup as problem sizes scale:

The sub-page region: For very small problem sizes, applications use a small number of Active Pages, and utilization of those pages is poor. Activation time dominates the computation, and speedups do not scale until the Active Page function offloads sufficient work from the processor.

The scalable region: Once the problem is larger, the number of Active Pages involved increases linearly. The corresponding increase in computational power results in linear speedups.

The saturated region: Although the number of Active Pages grows with data size, the number of processors in a system does not. Consequently, we expect speedups to eventually level off as the processor component of the application saturates constant processor resources. This leveling off can also produce a degradation in performance, as an increased number of Active Pages can increase the synchronization and communication overhead.

Ideally, we want speedups which are in the rightmost portion of the scalable region. Fortunately, partitions can be tuned to shift this scalable region towards specific problem sizes.

Data Manipulation: In addition to providing scalable computation, Active Pages allow programmers to optimize for density and indexing rather than data manipulation. Currently, programmers have a wealth of data structures they can choose to use for any given problem. However, these data structures each have advantages and disadvantages. For instance, doubly-linked lists provide fast insertion and deletion of elements, but poor random access. Arrays, on the other hand, provide fast random access but poor performance on insertions and deletions.

To some extent, Active Pages remove the burden of compromise when choosing a data structure. For example, our implementation of the STL array class uses dense arrays but exploits Active Page functions to provide fast insertion and deletion.
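A sketch of what such a class might look like, reusing the earlier interface declarations: a dense array with ordinary random access whose insert and erase are dispatched to bound Active Page functions. The class, its command encoding, and the kCommandOffset slot are hypothetical; the paper describes its library only at the level of the text above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical Active-Page-backed dense array in the spirit of the STL
// array class described above. Random access stays a simple load; insert
// and erase are dispatched to functions bound on the pages holding the
// data, which shift elements inside the memory system.
template <typename T>
class ap_array {
 public:
  ap_array(GroupId grp, VAddr base, std::size_t capacity)
      : grp_(grp), base_(base), size_(0), capacity_(capacity) {}

  T operator[](std::size_t i) const {  // fast random access (dense array)
    return static_cast<T>(read(base_ + i * sizeof(T)));
  }

  void insert(std::size_t i, T value) {
    dispatch(kInsert, i);              // pages shift [i, size) up by one
    write(base_ + i * sizeof(T), static_cast<uint64_t>(value));
    ++size_;
  }

  void erase(std::size_t i) {
    dispatch(kDelete, i);              // pages shift (i, size) down by one
    --size_;
  }

 private:
  enum Op : uint64_t { kInsert = 1, kDelete = 2 };
  void dispatch(Op op, std::size_t i) {
    // Activation is an ordinary memory write to an application-defined
    // location; the encoding of (op, i) here is illustrative only.
    write(base_ + kCommandOffset, (static_cast<uint64_t>(op) << 56) | i);
  }
  static constexpr std::size_t kCommandOffset = 0;  // hypothetical slot
  GroupId grp_;
  VAddr base_;
  std::size_t size_, capacity_;
};
```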

3 The RADram Implementation

In this section we describe the Reconfigurable Architecture DRAM (RADram) system, shown in Figure 2. RADram is an architecture based upon the integration of the next generation of FPGA (Field-Programmable Gate Array) and DRAM technology. To minimize latency and reduce power consumption, large DRAMs are divided into subarrays, each with its own subset of address bits and decoders [I]. RADram exploits this structure by associating a block of reconfigurable logic with each subarray.

Figure 2: The RADram system. (Each DRAM bit-array, with its row-select and column-select logic, is paired with a block of reconfigurable logic.)

RADram Architecture: For gigabit DRAMs, a good subarray size to minimize power and latency is 512 Kbytes. The RADram system associates 256 LEs (Logic Elements, a standard block of logic in FPGAs which is based upon a 4-element Look-Up Table, or LUT) with each of these subarrays. This allows efficient support for Active Page sizes which are multiples of 512 Kbytes.

Each LE requires about 1K transistors of area on a logic chip. The Semiconductor Industry Association (SIA) roadmap [Sem] projects mass production of gigabit DRAM chips by the year 2001. If we devote half of the area of such a chip to logic, we expect the DRAM process to support approximately 32M transistors, which is enough to provide 256 LEs to each 512-Kbyte subarray of the remaining half gigabit of memory on the chip. DeHon [DeHb] gives several estimates of FPGA area.

We adopt a processor-mediated approach to inter-page communication which assumes infrequent communication. When an Active Page function reaches a memory reference that cannot be satisfied by its local page, it blocks and raises a processor interrupt. The processor satisfies the request by reading and writing to the appropriate pages. Once an interrupt is raised, the processor generally satisfies many requests from different pages in the system. Future work will evaluate hardware mechanisms for in-chip communication, increasing the number of outstanding references per page, and processor polling for requests. The processor-mediated methodology, however, functions well for our applications and will greatly simplify future work in paging and virtual memory.

Table 1 lists the parameters of our reference RADram implementation. Several parameters were also individually varied in our experiments with respect to the reference implementation; the range of variation for these parameters is also given in Table 1. Additionally, a memory bus that transfers a fixed-width block of data between memory and cache every few nanoseconds is assumed.

Table 1: Summary of RADram parameters.

  Parameter      Reference   Variation
  CPU Clock      1 GHz       --
  L1 I-Cache     64K         --
  L1 D-Cache     64K         8K-256K
  L2 Cache       1M          256K-8M
  Reconf. Logic  100 MHz     10-100 MHz
  Cache Miss     50 ns       50-600 ns
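A minimal sketch of the processor-mediated scheme just described, assuming a hypothetical ap_pending_requests()/ap_unblock() interface for draining blocked references on an interrupt; the model only requires that the processor perform the reads and writes on behalf of the blocked pages. The read/write calls reuse the interface declarations above.

```cpp
#include <vector>

// Hypothetical record for a non-local reference raised by a blocked page.
struct InterPageRequest {
  VAddr src;      // address the Active Page function tried to reference
  VAddr dst;      // user-visible buffer in the faulting page
  bool  is_read;  // read from another page vs. write to another page
};

std::vector<InterPageRequest> ap_pending_requests();  // drained on interrupt
void ap_unblock(VAddr page);                          // resume the function

// Interrupt handler: once one page blocks, service every pending request
// in the system, since many pages tend to block around the same time.
void on_interpage_interrupt() {
  for (const InterPageRequest& r : ap_pending_requests()) {
    if (r.is_read)
      write(r.dst, read(r.src));   // copy remote word into the local page
    else
      write(r.src, read(r.dst));   // propagate local word to the remote page
    ap_unblock(r.dst);
  }
}
```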

Why Reconfigurable Logic: The potential of gigabit densities in DRAM has prompted research and development in a variety of implementation options for intelligent memory. IRAM [Pat], an integration of processor core and DRAM, is a well-known option studied at Berkeley. RADram, however, is likely to have better yield, higher parallelism, and better integration with commodity processors than IRAM.

The primary advantage of RADram memory devices is that they will be inexpensive to fabricate. Processor chips cost ten times as much as memory chips because their complexity makes their yield, or percentage of working chips, much lower [Prz]. DRAMs are fabricated with redundant memory cells that can replace defective cells through laser modification after chip production. The uniform nature of reconfigurable logic allows for similar measures in RADram chips. In contrast, IRAM chip designers will have to work hard to avoid yields similar to processor chips. If IRAM chips are fabricated at processor costs, systems will be limited to a few IRAM chips and to applications with smaller data. RADram is intended to fabricate at DRAM costs, which allows dozens of chips per system and much larger application data.

Our results will show that RADram can exploit extremely high parallelism by supporting simple application-specific operations in memory. A multi-gigabit RADram can have more than 1000 Active Pages, each of which can execute simultaneously. Processor-in-DRAM solutions cannot support such high parallelism. The variety of custom operations used in our applications also suggests that fixed logic would severely limit the functionality of Active Page applications.

Finally, RADram is specifically designed to support commodity microprocessors. The RADram interface is compatible with standard memory busses. A primary goal of RADram is to supply microprocessors with enough data to keep them running at peak speeds. IRAM technology, however, is intended to compete with commodity processors. This competition may eventually be favorable for IRAM as the importance of single-chip systems increases, but ever-growing applications may always demand larger memories and multiple chips.

Fabrication: Interest in the fabrication of Merged DRAM-Logic (MDL) devices has grown dramatically in the past few years. Major manufacturers currently have the capability to fabricate DRAM cells (macro cells) in logic chips. Processors have also been fabricated in DRAM chips. Current DRAM in logic chips has poor density; logic in DRAM chips has poor speed and density. Merged DRAM-logic processes which can fabricate both kinds of structures well are becoming available [Prz]. Our study, however, is conservative and assumes a DRAM process with associated penalties in logic speed and density.

Power: Power consumption is a major concern for DRAM chips because increased chip temperatures result in higher charge leakage from storage cells. This leakage increases the need for more frequent DRAM refresh. Fortunately, this higher refresh can be bundled into the logic added to each DRAM subarray.

Although a detailed study of power is beyond the scope of this paper, we have been conservative in our use of power in RADram. Our applications use only a small number of bits of bandwidth between data and logic in RADram pages. This could easily be increased to wider datapaths, but would result in higher power consumption. Increasing bandwidth would also require more reconfigurable logic, which is beyond our area constraints for some applications. Application performance, however, is high despite conservative bandwidth.

4 Methodology

To evaluate Active Pages, we conducted a detailed application study. The reference Active Page platform used for this study was described in Section 3. This platform was studied using a three-step approach. First, a simulator was implemented which modeled the RADram Active Page system. Second, a set of applications was chosen which represented various algorithmic domains. Finally, these applications were written and optimized for both the RADram and conventional memory system architectures.

As a base for a simulation environment, we started with the SimpleScalar v2.0 tool set [BA]. This tool set provides the mechanisms to compile, debug, and simulate applications compiled to a RISC architecture. The SimpleScalar RISC architecture is loosely based upon the MIPS instruction set architecture. The SimpleScalar environment was extended by replacing the simulated conventional memory hierarchy with an Active Page memory system. The new simulated memory hierarchy provides mechanisms which simulate RADram application-specific circuits executing within the DRAM memory system. Further, the SimpleScalar instruction set was extended with Intel MMX multimedia instruction opcodes. Finally, the toolset was enhanced by updating the included GNU CC compiler to the latest compiler suite. All applications in this study were compiled with the -O optimization option.

After implementation of this simulation environment, a set of applications was chosen for architectural evaluation. Each application is briefly described in Section 5. Here, we explore the methodology used in choosing, partitioning, and evaluating these applications.

Applications were chosen with three motives in mind. First, the algorithms to be implemented in the application were representative of a broad class of algorithms used in a range of applications. Second, the algorithm or application illustrated a certain kind of partitioning, as described in Section 2. Finally, an MMX-instruction-set-compatible application was chosen to explore Active Page implementations other than RADram. For instance, future work may investigate the possibility of identifying a small key set of data manipulation primitives which should be implemented in fixed logic in the Active Page model.

The first step in studying each application or algorithm described in Section 5 is to implement and optimize it on a conventional memory system. The application is then hand-partitioned for an Active Page memory system. Next, Active Page functions are coded in VHDL and synthesized to FPGA logic; the results of this are discussed in Section 6. The state-transition characteristics of these synthesized circuits are used to simulate the functions with our SimpleScalar simulator.

5 Applications

In order to demonstrate effective partitioning of applications between processor and Active Pages, we chose a range of applications representing both memory- and processor-centric partitioning. Table 2 summarizes the attributes of these applications. This section describes each application and divides those descriptions into each partitioning class.

Table 2: Summary of partitioning of applications between processor and Active Pages.

Memory-Centric Applications
  Name           Application                      Processor Computation        Active Page Computation
  Array          C++ standard template library    C++ code using array class;  Array insert, delete,
                 array class                      cross-page moves             and find
  Database       Address database                 Initiates queries;           Searches unindexed data
                                                  summarizes results
  Median         Median filter for images         Image I/O                    Median of neighboring pixels
  Dynamic Prog.  Protein sequence matching        Backtracking                 Computes MINs and fills table

Processor-Centric Applications
  Name           Application                      Processor Computation        Active Page Computation
  Matrix         Matrix multiply for Simplex      Floating-point multiplies    Index comparison and
                 and finite element                                            gather/scatter of data
  MPEG-MMX       MPEG decoder using               MMX dispatch;                MMX instructions
                 MMX instructions                 discrete cosine transform

Memory-Centric Partitioning

As discussed in Section 2, Active Pages can exploit the parallelism in applications through memory-centric partitioning. Our array, database, median-filtering, and dynamic programming applications are good examples of such partitioning.

STL Array Template: The STL array template is a general-purpose C++ template which permits the storage, access, and retrieval of objects based upon a linear integer index. The template class supports the usual array access operators as well as insert, delete, and binary find/count operations. All of the applications implemented hide the layout of data and the partitioning of algorithmic operations from the application via a simple C++ interface, but the STL array best demonstrates this principle. Library calls derived from a common subclass allow single source files to work with either the Active Page or conventional-system implementation of the array template.

The implementation uses reconfigurable logic to speed up the following operations: array insert, delete, and count. The insert and delete operations involve moving portions of the array in parallel to accommodate the change in array size. The count operation is implemented by a binary comparison circuit.

These three operations are indicative of a broad range of array operations which the RADram system can effectively compute. Further examples from the STL library include: accumulate, partial-sum, random-shuffle, rotate, and adjacent-difference.
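The parallel data movement behind insert and delete can be modeled in software as each page shifting its slice of the array in the same step. In the sketch below, each outer-loop iteration stands in for one page's circuit working concurrently; the page capacity and boundary handoff are illustrative assumptions, not the synthesized design.

```cpp
#include <cstddef>
#include <vector>

// Software model of an Active-Page-style parallel insert: the array is
// divided into fixed-size page slices; conceptually every page shifts its
// slice by one in the same step, handing its top element to the next page.
constexpr std::size_t kElemsPerPage = 4096;  // illustrative page capacity

void parallel_insert(std::vector<int>& a, std::size_t pos, int value) {
  a.push_back(0);  // grow by one element
  // Each iteration of this outer loop corresponds to work done
  // concurrently by one page's logic in a real RADram system.
  for (std::size_t page_end = a.size() - 1; page_end > pos;) {
    std::size_t page_begin = page_end - (page_end % kElemsPerPage);
    if (page_begin <= pos) page_begin = pos;
    // Shift this page's slice up by one (in-memory data movement).
    for (std::size_t i = page_end; i > page_begin; --i) a[i] = a[i - 1];
    page_end = page_begin;  // boundary element handed off to this page
  }
  a[pos] = value;
}
```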

Database Query: Several methods [SKS] exist to speed up database searches if the searches involve indexed fields. Indexing produces a second table within the database which permits the database engine to quickly locate fields in logarithmic or constant time. However, indexing is often not practical for highly-varied queries or under tight storage constraints. Unindexed queries can take time linearly proportional to the number of records. Our database benchmark uses a synthetically generated address book. Custom Active Page functions were written to search for exact matches on any of the string fields contained in the address records.

The RADram-system time complexity of the unindexed database query is O(1); however, the constant bounding it is quite large. The performance gained by the RADram system comes from the parallelism available in the database search. In theory, all records can be searched simultaneously. In practice, the records are grouped into blocks which are roughly the size of a RADram memory page. These blocks are then distributed among the pages in the RADram memory system. Each page is then custom-programmed with the search engine's application-specific circuit. To demonstrate the performance of the RADram system on this application, a count of exact matches for the last name of an individual in the address book is performed. The count is run on the same database on both the RADram system and a conventional implementation.

Image Processing: Image processing and signal processing have been traditional strengths of FPGAs and custom processor technologies [R, AA, K]. We implemented an image median-filtering [RW] application on RADram. Median filtering is a non-linear method which reduces the noise contained in an image without blurring the high-frequency components of the image signal. The RADram implementation divides the image by row blocks among various Active Pages. Each row block contains two additional rows, one above the current row block and one below it, in order to perform the median-filtering kernel computation. The Active Pages are then programmed with a custom circuit designed to find the median of nine short integer values. For comparison, the conventional system uses a hand-coded algorithm which takes a minimal number of comparisons to find the median of nine values.

Because the computational work involved is small in terms of circuit area, the bulk of the median-filtering application runs inside the RADram memory system. Not surprisingly, this application allows RADram to exploit high parallelism and memory bandwidth. RADram uses a custom circuit designed for sorting nine short integer values; the conventional implementation requires several conditional instructions as well as memory I/O operations in order to find the median value.
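For concreteness, one standard way to find the median of nine values with a fixed sequence of compare-exchanges, in the spirit of the minimal-comparison algorithm mentioned above; this particular 19-exchange network is a well-known construction for 3x3 median filtering, not the paper's circuit, and maps naturally onto comparator logic because it has no data-dependent control flow.

```cpp
#include <algorithm>
#include <cstdint>

// Compare-exchange: after the call, a <= b.
static inline void cswap(int16_t& a, int16_t& b) {
  if (a > b) std::swap(a, b);
}

// Median of nine values via a fixed 19-exchange network.
int16_t median9(int16_t p0, int16_t p1, int16_t p2,
                int16_t p3, int16_t p4, int16_t p5,
                int16_t p6, int16_t p7, int16_t p8) {
  cswap(p1, p2); cswap(p4, p5); cswap(p7, p8);
  cswap(p0, p1); cswap(p3, p4); cswap(p6, p7);
  cswap(p1, p2); cswap(p4, p5); cswap(p7, p8);
  cswap(p0, p3); cswap(p5, p8); cswap(p4, p7);
  cswap(p3, p6); cswap(p1, p4); cswap(p2, p5);
  cswap(p4, p7); cswap(p4, p2); cswap(p6, p4);
  cswap(p4, p2);
  return p4;  // p4 now holds the median
}
```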

Largest Common Subsequence: This algorithm is representative of a broad class of string algorithms which form the basis for modern biological research. At the heart of the computer algorithms used to reconstruct DNA sequences are string algorithms such as largest common subsequence, global alignment, and local alignment [Gus]. The largest common subsequence (LCS) computation is typically done using a dynamic-programming construction. This construction runs in O(n^2) time and space for sequences of length n. One can view the construction as a set of computations over a plane. For the LCS algorithm, the computation can proceed in parallel as a wavefront, starting at the upper left corner and ending in the lower right corner of this plane. This wavefront computation runs in O(n log n) time on the RADram system.

The RADram system implements the LCS computation by dividing the algorithm into two steps. The first step is the computation of the LCS result matrix itself. The second step is the backtracking [CLR] required to find the largest common subsequence. The RADram system executes the first step entirely within the reconfigurable logic inside the memory system; backtracking executes entirely within the processor.
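A compact software model of this two-step split, assuming the usual LCS recurrence: the anti-diagonal loop stands in for the wavefront that the memory system evaluates in parallel, and the backtracking loop is the processor-side step.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// LCS split as described above: fill the DP table along anti-diagonals
// (the wavefront a RADram system evaluates in parallel), then backtrack
// on the processor to recover the subsequence itself.
std::string lcs(const std::string& a, const std::string& b) {
  const std::size_t n = a.size(), m = b.size();
  std::vector<std::vector<int>> T(n + 1, std::vector<int>(m + 1, 0));

  // Wavefront fill: all cells on one anti-diagonal are independent.
  for (std::size_t d = 2; d <= n + m; ++d) {
    std::size_t lo = (d > m) ? d - m : 1;
    for (std::size_t i = lo; i <= std::min(n, d - 1); ++i) {
      std::size_t j = d - i;
      T[i][j] = (a[i - 1] == b[j - 1]) ? T[i - 1][j - 1] + 1
                                       : std::max(T[i - 1][j], T[i][j - 1]);
    }
  }

  // Backtracking (processor side): walk from (n, m) to recover the LCS.
  std::string out;
  for (std::size_t i = n, j = m; i > 0 && j > 0;) {
    if (a[i - 1] == b[j - 1]) { out.push_back(a[i - 1]); --i; --j; }
    else if (T[i - 1][j] >= T[i][j - 1]) --i;
    else --j;
  }
  std::reverse(out.begin(), out.end());
  return out;
}
```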

Processor-Centric Partitioning

Active Pages are intended for simple application-specific operations, leaving more complex computations to general-purpose microprocessors. Our MMX and matrix applications are good examples of processor-centric partitioning.

MMX Primitives: The MMX multimedia instruction primitives were chosen for implementation within the RADram system for two reasons. First, they represent a well-known commodity set of architecture primitives. Second, they are simple primitive operations designed for parallel execution.

The simulator was extended to support SimpleScalar MMX instructions and RADram MMX-instruction equivalents. The MMX instructions themselves are highly parallel, simple, and generally complete in a single processor cycle. To improve upon the base SimpleScalar MMX instructions, the RADram equivalents operate on larger data widths. While an MMX instruction in SimpleScalar is restricted to producing only 64 bits of data per instruction, a RADram MMX instruction can produce kilobytes of data per instruction.

While implementation of the complete MMX instruction set is still underway, enough is implemented to carry out key portions of the MPEG encoding and decoding processes. While future work will explore more MPEG routines, current work has focused upon application of the correction matrices within the P and B frames [M]. Future implementations of the MPEG algorithm will partition additional components between the processor and the RADram memory system: the processor will be responsible for the Discrete Cosine Transform (DCT), while the RADram system will handle motion detection, application of motion-correction matrices, run-length encoding and decoding (RLE), and Huffman encoding and decoding.
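To make the widening concrete, a software model of a packed 16-bit add: the first routine is the 64-bit MMX-style primitive, and the second applies the same lane-wise operation across a page-sized buffer, the kind of equivalent a RADram page could execute per activation. Both are illustrative models, not simulator code.

```cpp
#include <cstddef>
#include <cstdint>

// 64-bit MMX-style packed add: four independent 16-bit lanes (wrapping).
uint64_t paddw(uint64_t a, uint64_t b) {
  uint64_t r = 0;
  for (int lane = 0; lane < 4; ++lane) {
    uint16_t x = static_cast<uint16_t>(a >> (16 * lane));
    uint16_t y = static_cast<uint16_t>(b >> (16 * lane));
    r |= static_cast<uint64_t>(static_cast<uint16_t>(x + y)) << (16 * lane);
  }
  return r;
}

// RADram-style equivalent: the same lane-wise add applied across a whole
// page-sized buffer in one activation (here modeled as a loop).
void paddw_page(const uint64_t* a, const uint64_t* b, uint64_t* out,
                std::size_t words) {
  for (std::size_t i = 0; i < words; ++i) out[i] = paddw(a[i], b[i]);
}
```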

Sparse-Matrix Multiply: A wide range of real-world problems can be represented as sparse matrices. We examine both a common scientific benchmark and a more challenging compiler-optimization problem. Our scientific benchmark involves the multiplication of matrices representing finite-element computations taken from the Harwell-Boeing benchmark suite [D]. Our compiler-optimization problem involves using the Simplex method [NM] to perform optimal register allocation [GW].

A key computation in both these applications is the sparse vector-vector dot-product. Conventional implementations of this operation are severely limited by processor-memory bandwidth: sparse-vector FLOPS on a conventional system are often an order of magnitude lower than those for dense vectors. The processor must fetch the indices of each non-zero element, determine which indices in both vectors of the dot product match, fetch the data corresponding to those indices, multiply the data, and write the data back to its appropriate location.

In contrast, the RADram system implements a compare-gather-compute approach. Active Page functions fetch and compare vector indices, fetch the data values for the indices that match, and gather the data into cache-line-size blocks. Vectors are co-located on pages. The processor then reads the packed data, computes the multiplies, and writes back cache-line-size blocks of results. Note that only useful data travels between the processor and memory, greatly conserving bandwidth. With large matrices, the RADram system has enough Active Pages executing to keep the processor computing at peak floating-point speeds.
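A software model of the compare-gather-compute split for a single dot-product, assuming the usual sorted index/value representation for sparse vectors: gather_matches() plays the role of the Active Page function, and the caller's loop is the processor's multiply phase over packed operands.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sorted sparse vector: indices[i] is the coordinate of values[i].
struct SparseVec {
  std::vector<int>    indices;
  std::vector<double> values;
};

// Memory-side step (the Active Page function's role): intersect the two
// index lists and emit packed operand pairs, so only useful data crosses
// the memory bus.
std::vector<std::pair<double, double>>
gather_matches(const SparseVec& a, const SparseVec& b) {
  std::vector<std::pair<double, double>> packed;
  for (std::size_t i = 0, j = 0;
       i < a.indices.size() && j < b.indices.size();) {
    if (a.indices[i] == b.indices[j]) {
      packed.emplace_back(a.values[i], b.values[j]);
      ++i; ++j;
    } else if (a.indices[i] < b.indices[j]) ++i;
    else ++j;
  }
  return packed;
}

// Processor-side step: multiply-accumulate over the packed operands.
double sparse_dot(const SparseVec& a, const SparseVec& b) {
  double sum = 0.0;
  for (const auto& [x, y] : gather_matches(a, b)) sum += x * y;
  return sum;
}
```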

6 Synthesized Logic

In order to estimate the performance and area of RADram logic configurations, each function of an application's Active Pages was hand-coded in a high-level circuit-description language (VHDL [Ash]), and the circuits were synthesized to completely routed designs in contemporary FPGA technology. This provided a means to verify the timing of the simulated circuit implementations, as well as information on circuit area which helped guide the RADram design.

The results of our implementations of the application-specific circuits for the simulated applications are summarized in Table 3. These results were obtained by implementing the circuit designs in behavioral VHDL and synthesizing them with the Synopsys FPGA design tools. After synthesis to a technology-independent logic description, the designs were placed and routed to an Altera FLEX10K part. This allowed us to study the post-routed designs on real FPGA technology. The count of logic-block usage reported in Table 3 includes both completely used and partially used LEs. The speed and code size were directly reported by the Synopsys tools.

Table 3: Active Page functions synthesized for RADram. (Columns: Application, LEs, Speed in ns, Code size in KB; rows: Array-delete, Array-insert, Array-find, Database, Dynamic Prog., Matrix, MPEG-MMX.)

The results obtained from implementation of the application-specific circuits indicate that the RADram Active Page system can execute the application kernels' circuits. The RADram implementation can support designs with approximately 256 LEs per Active Page, and all of our designs are below this amount. Our designs can also be further optimized by implementing common memory interfaces in fixed logic. Our system simulation assumes a 100 MHz clock for our circuits; given modest advances in FPGA technology, this should be achievable for our circuits. Finally, the code size is an indication of the potential code bloat which will happen when transitioning an application to the RADram system. Code size is also indicative of the page-replacement cost for Active Pages, which we anticipate to be several times larger than for conventional pages due to reconfiguration time. However, pages which do not use Active Page functions do not incur this cost, and future reconfigurable technologies may significantly reduce it (see Section 8).

7 Results

In this section we compare our RADram simulation results for each application kernel described in Section 5 to our expectations from the Active Page application characteristics discussed in Section 2. First, we discuss the performance of RADram versus a conventional memory system executing optimized versions of the same applications. Then we explore the memory hierarchy of both memory systems by studying the effects of cache parameters. Finally, we develop an analytical model to describe partitioned application performance and then compute the correlation between this model and our experimental results.

Figure 3: RADram speedup as problem size varies. (Speedup versus problem size in 512K pages for array-delete, array-find, array-insert, database, dyn-prog, matrix-boeing, matrix-simplex, median-kernel, median-total, and mmx.)

Figure 4: Percent cycles the processor is stalled on RADram as problem size varies. (Processor/memory non-overlap, in percent, versus problem size in 512K pages.)

Performance

To evaluate the performance of the RADram Active Page memory system, each application described in Section 5 was executed on a range of problem sizes using a fixed set of machine characteristics, listed in Table 1. The speedups of our applications running on a RADram memory system compared to a conventional memory system are shown in Figure 3. Each application was run on a range of problem sizes, given in terms of the number of Active Pages (512-Kbyte superpages). We make two primary observations about this graph.

First, application kernels execute significantly faster on a RADram memory system than on a conventional memory system. The one exception from our application mix is the array-delete primitive in the sub-page region. The SimpleScalar processor instruction set actually favors array-delete over array-insert. To take advantage of this fast delete, the RADram version of array-delete uses an adaptive algorithm that uses the processor more for arrays that are smaller than one Active Page.

Second, our performance results qualitatively scale as we expected in Figure 1. We observe that most applications show little growth in speedup as data size grows within the sub-page region (below one page for most applications). In this region, RADram applications have little parallelism to offset activation costs. As we leave this region, we enter the scalable region and see that performance on all of our applications grows nicely as data size increases. Four applications (database, matrix-simplex, matrix-boeing, and median-filtering) also reach the saturated region. Here, RADram performance is limited by the progress of the processor. This limitation may be due either to too much work for a given-speed processor or to too much data traveling between the processor and RADram across the memory bus. Performance can actually decrease as coordination costs dominate performance. Given a large enough problem size, all our applications would eventually reach the saturated region.

Processor-Memory Non-overlap

The saturated region of Active Page performance emphasizes the importance of partitioning applications to efficiently use the processor in a system. For processor-centric applications, this dependence is obvious: the goal is to keep the processor computing by providing a steady stream of useful data from the memory system. For memory-centric partitions, however, the processor is still a vital resource. Active Pages cannot compute without activation and inter-page communication, both provided by the processor.

As data size grows in an Active Page application, so does the load upon the processor. We measure the remaining capacity of a processor to handle this load with a metric we call processor-memory non-overlap time. Non-overlap is the time the processor spends waiting for the memory system, and can be used to estimate the boundary between the scalable and saturated regions of application performance.

The relative percentage of time the processor is stalled waiting for memory-system computation is shown in Figure 4. As described earlier, the applications which reached the saturated region of speedup were database, matrix-simplex, matrix-boeing, and median-filtering. As is shown in Figure 4, these applications also reach a point of complete processor-memory overlap; the effect of this is described below.

We also observe that for the array primitives and the dynamic programming application, the non-overlap percentage remains relatively high. These applications are largely memory-centric, with very little processor activity. In fact, the array primitives operate asynchronously with respect to the rest of the application and are artificially forced into synchronous operation for this study. This means that an application can use the insert and delete array primitives with only the cost of RADram function invocation. Modulo dependencies on the array, the time spent by the memory system shifting data can be overlapped with operations outside of the STL array class. This overlap occurs in a natural way, with no additional effort required by the programmer who uses the RADram STL array class. Opportunities for overlapping execution of data-structure operations with data-structure usage are intriguing and are being investigated further.

The dynamic programming example maintains a very high processor-memory non-overlap; however, preliminary results indicate that the processor-mediated communication required by the RADram memory system eventually dominates performance. This occurs for extremely large problems that are well beyond the range of problem sizes presented in this study.

Figure 5: Conventional (left) and RADram (right) execution time versus L1 data cache size. (Cycle count versus L1 D-cache size in Kbytes; the RADram reference is 64 Kbytes.)

Cache Effects

The simulated processor used for this study has a default split instruction/data level-one cache. Each level-one cache is 64 kilobytes and set-associative. The processor also has a combined, set-associative level-two cache of one megabyte. For this study, the level-one data cache size was varied over sizes up to 256 kilobytes, and the level-two cache size was varied from hundreds of kilobytes to several megabytes.

Figure 5 (left) plots total conventional application-kernel execution time versus the size of the level-one data cache. As illustrated, within the range of cache sizes explored, most conventional applications were unaffected. However, at the left edge of Figure 5 (left), we note that some conventional applications are affected by the size of the level-one cache when it falls to the smallest sizes explored.

Figure 5 (right) plots total RADram application-kernel time versus level-one data cache size. As illustrated, all but one application was unaffected by the size of the level-one cache. The median-total application shows various stride effects. The application consists of two phases. The first reads data into an array and transforms it into a special data layout required by the Active Page memory system; the size of the level-one cache plays a role in enhancing the performance of this operation. The second phase simply dispatches the request for median filtering to the Active Page memory system and waits for the result. As evident from the performance of median-kernel, the second phase is unaffected by the size of the level-one cache.

All applications were also executed with a range of level-two cache sizes. Throughout this range, no significant performance differences occurred. This, combined with the level-one cache results, indicates that our applications are sensitive to extremely small cache sizes, but small-to-reasonable-size caches achieve all of the performance of large caches. Active Page applications tend to work with large datasets. Although their primary working set may fit in a small cache, secondary working sets will not fit in realistic cache sizes. Consequently, without migrating to a cache-only architecture, our application performance is bounded by other architectural characteristics such as DRAM memory latency and bandwidth.

Analysis

To achieve a deeper understanding of the performance of application partitions, we introduce an analytic model. This model is based upon an abstract application. From this abstract application, a formula is developed which models performance under various problem sizes. Additionally, total application performance is bounded by Amdahl's Law. We present this model by first developing an intuitive understanding of a partitioned application. Then we characterize processor performance with an Active Page memory system. Finally, we compute the correlation of this analytical model with the results obtained from our RADram simulator.

Model

Section 2 described partitioning and the role it plays in application performance on an Active Page memory system. To investigate partitioning in more detail, an abstract application is depicted in Figure 6. As illustrated in Figure 6, a partitioned algorithm undergoes two phases from the perspective of the processor: activation and post-processing. The activation phase is characterized by increasing Active Page activity. The post-processing phase is characterized by decreasing Active Page activity, but potential processor-memory non-overlap stalls mixed with processor computation.

The abstract application depicted in Figure 6 uses K pages of Active Page memory. The processor spends T_A(i) time activating Active Page i. Initially, the processor activates all K pages in sequence, thus requiring \sum_{i=1}^{K} T_A(i) time to activate all pages. Immediately after activation, an Active Page begins to execute. The time required to complete execution for Active Page i is T_C(i). After dispatching the activation requests to all K pages, the application returns to the first page to perform any follow-up processor computation. Before the processor can perform this computation, however, the processor may be forced to stall and wait for the Active Page in the first memory location to finish execution. At this point in Figure 6, the processor is stalled, waiting in non-overlap time. We account for this as NO(1): non-overlap time spent waiting for Active Page 1. The processor, after waiting for NO(1) time for that Active Page to complete execution, can then perform the follow-up computation T_P(1).

Figure 6: Abstract view of processor and Active Page memory activity. (The processor issues activations T_A(1)...T_A(K), stalls for NO(1), then performs post-computations T_P(1)...T_P(K), while Active Pages 1 through K execute their computations T_C(1)...T_C(K) over time.)

Figure 7 gives the simplified performance model:

NO(i) = \max\left(0,\; T_C(i) - \sum_{n=i+1}^{K} T_A(n) - \sum_{n=1}^{i-1} T_P(n) - \sum_{n=1}^{i-1} NO(n)\right)

Speedup_{partitioned} = \frac{T_{conv}(K)}{\sum_{i=1}^{K} \left( T_A(i) + T_P(i) + NO(i) \right)}

Speedup_{overall} = \frac{1}{\left(1 - Fraction_{partitioned}\right) + \frac{Fraction_{partitioned}}{Speedup_{partitioned}}}

Figure 7: Simplified performance model for Active Pages.
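As a sanity check, the model can be evaluated directly in a few lines; the sketch below takes per-page times as inputs and applies the Figure 7 formulas verbatim (no measured RADram values are assumed).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Direct evaluation of the Figure 7 model for K >= 1 pages. ta/tp are
// per-page activation and post-computation times, tc is per-page Active
// Page computation time, tconv is the conventional execution time.
double predicted_speedup(const std::vector<double>& ta,
                         const std::vector<double>& tp,
                         const std::vector<double>& tc,
                         double tconv, double fraction_partitioned) {
  const std::size_t K = ta.size();
  double later_act = 0.0, earlier_post = 0.0, earlier_no = 0.0;
  for (std::size_t n = 1; n < K; ++n) later_act += ta[n];  // sum T_A(i+1..K)

  double partitioned_time = 0.0;
  for (std::size_t i = 0; i < K; ++i) {
    double no = std::max(0.0, tc[i] - later_act - earlier_post - earlier_no);
    partitioned_time += ta[i] + tp[i] + no;
    if (i + 1 < K) later_act -= ta[i + 1];  // shrink the activation window
    earlier_post += tp[i];
    earlier_no += no;
  }
  const double sp = tconv / partitioned_time;    // Speedup_partitioned
  return 1.0 / ((1.0 - fraction_partitioned) +   // Amdahl's Law bound
                fraction_partitioned / sp);
}
```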

The abstract application shows constant per-page activation time T_A, constant per-page post-computation time T_P, and T_P > T_A. This means that no other stalls, or processor-memory non-overlaps, occur. In the general case, however, an application transitions between post-computation on page i, T_P(i), and non-overlap time, NO(i+1), for the next page. This occurs for all pages within the computation.

Using this abstract application, we observe that all processor time for a single partitioned algorithm is accounted for in three distinct sets of variables: T_A(i), T_P(i), and NO(i). Thus, total kernel execution time for a partitioned application is the summation \sum_{i=1}^{K} \left( T_A(i) + T_P(i) + NO(i) \right).

Figure 7 formalizes this model. Note that an application need not have constant per-page activation and post-activated computation times. Furthermore, an application need not have constant per-Active-Page computation time. From the processor's perspective, each application executes three general phases: dispatch, wait for result, and post-compute.

Figure 7 models conventional application performance in terms of T_conv(K), that is, the time spent by a conventional application working with a particular data set of size K (T_conv is time per item).

We note that the non-overlap time the processor spends before post-processing of page i is a maximum of zero or the computation time of the Active Page minus the time spent by the processor between finishing activation of page i and the current time. This intermediate time is spent either activating subsequent pages, stalled, or post-computing on previous pages.

Table 4: Activation time T_A, computation time T_C, post-activated processor time T_P, and minimum problem size for complete overlap. (Columns: Application, T_A in microseconds, T_P in microseconds, T_C in milliseconds, pages for overlap, and speedup correlation; rows: Array-insert, Array-delete, Array-find, Database, Matrix-simplex, Matrix-boeing, Median-kernel, MPEG-MMX.)

Correlation

In general, an average activation time T_A and an average post-page computation time T_P can be measured using a small-to-medium dataset. Furthermore, an average Active Page computation time T_C can be measured from this small dataset. Using these averages and the model in Figure 7, a rough estimate of the non-overlap time for a particular problem size can be found. Using this estimate, it is possible to predict the performance of a partitioned application for a range of problem sizes. This prediction provides insight into the particular characteristics of a partitioned application. By modeling performance as activation, post-page computation, per-page Active Page computation, and processor-memory non-overlap time, it is possible to gauge performance at a variety of problem sizes and adjust the balance of work between the memory system and processor according to the expected workload of the application.

To illustrate, Table 4 lists the activation time, post-page processor time, and per-page Active Page computation time for a number of application kernels in our workload. Using a simplified version of the formulas in Figure 7, which assumes constant values for these metrics, the number of pages for complete overlap is computed. Furthermore, for each application and for each data point used to construct Figure 3, a predicted speedup is computed using these constant activation and computation times and a measured non-overlap time taken from Figure 4. The correlation between the predicted speedup from the analytical model and the actual speedup observed is shown in the rightmost column of Table 4. Most applications are well-correlated to the analytical model. A notable exception is the matrix-boeing application, which violates the assumption of constant activation and computation times per Active Page. The times are inherently data-specific for this application, and using constant values proved to be less useful than for the other applications studied.

8 Sensitivity to Technology

Our results for the RADram system demonstrate that Active Pages can be implemented with substantial success on a variety of applications. RADram technology, however, is a long-term goal which is several years in the future. Shorter-term and alternative long-term technologies can also be used to implement Active Pages. This section describes such technologies and analyzes the sensitivity of our results to some of the key parameters in the RADram system.

Current technologies exist to implement Active Pages at significantly higher cost than RADram. Such costs would limit the amount of memory available to support Active Pages and, consequently, the problem sizes of the applications. These technologies include small merged FPGA-DRAM or -SRAM chips, DRAM/SRAM macro cells in ASICs, and small processor-in-DRAM or -SRAM chips. In general, logic speeds in these technologies are either equal to or better than RADram assumptions. Chip cost, however, will limit most near-term technologies to substantially smaller problem sizes. SRAM or multi-chip solutions will also have an effect on memory latencies.

We vary two technological parameters in our RADram simulations: memory latency and logic speed. First, Figure 8 plots the sensitivity of RADram speedups to memory latency in terms of cache-miss penalty. In general, the performance advantage of RADram comes from in-DRAM computation, which is unaffected by cache-miss penalty. Cache effects, however, account for slight changes in both RADram and conventional system performance. These changes can result in either increases or decreases in speedup as cache-miss penalties increase. The sign of the slope depends upon the relative ratio of instruction cycles to memory-stall cycles for the conventional versus the partitioned application. If one splits the total application runtime into two components, processor time and memory-stall time, then computes the ratio of these two values for both the conventional and partitioned applications, the slope of application speedup versus memory latency depicted in Figure 8 will depend upon the relative ratio of these two ratios.

Figure 8: RADram speedup as cache-to-memory latency varies. (Speedup versus cache-to-DRAM latency in cycles; the RADram reference is 50.)

Second, Figure 9 plots speedup versus the speed of the application-specific circuit. The speed of application-specific circuits in the simulated RADram system is measured in relative clock divisions of the processor clock; in Figure 9, a higher logic divisor corresponds to a slower reconfigurable-logic clock. To generalize across applications: those operating on problems in the scalable region of their partitioning domain are sensitive to the speed of the Active Page computation, whereas those operating on problems in the saturated region of their partitioning domain are generally insensitive to it.

Figure 9: RADram speedup as logic speed varies. (Speedup versus logic divisor; the RADram reference divisor is 10.)

9 Related Work

The IRAM philosophy goes to the extreme by shifting all computation to the memory system through integration of a processor onto a DRAM chip. This results in dramatically improved DRAM bandwidth and latency to the processor core, but conventional processors are not designed to exploit these improvements [B a]. An interesting alternative is to integrate specialized logic into DRAM to perform operations such as Read-Modify-Write [B b]. This alternative is promising, but we have seen that different applications can exploit significantly different computations in the memory system. Our results have shown that integrating reconfigurable logic is highly effective.

Reconfigurable computing has shown considerable success at special-purpose applications [B], but has had difficulty competing with microprocessors on more general-purpose tasks such as floating-point arithmetic. Some groups focus upon building reconfigurable processors [HW, WH, RS, WC], but face an even more difficult competition with commodity microprocessors. Our approach avoids these difficulties by exploiting the strengths of both microprocessors and reconfigurable logic: we focus upon data manipulation to make the memory system perform better for the processor. DeHon described limited integration of reconfigurable logic and DRAM in an early memo [DeH], but did not evaluate it further.

Our philosophy is reminiscent of the scatter-gather engines from a long line of supercomputers [HT, SH, CG, Bat, EJ, HS, L]. Hockney and Jesshope [HJ] give a good history of such machines. Our approach, however, supports a much wider variety of data manipulations and computations than these machines. Additionally, our emphasis on commodity technologies results in a focus on different applications and design tradeoffs.

Future Work

Active Pages and our RADram implementation have shown great potential in our study. Unlocking this potential involves many interesting issues, including compiler support for automatic application partitioning, operating-system integration, multithreaded application support, complete application runtimes, application-specific circuits vs. data primitives, hierarchical computation structures, and interpage and interchip communication. In addition, a detailed power, yield, and hardware implementation study of RADram is required.

For Active Pages to become a successful commodity architecture, the application partitioning process must be automated. Current work uses hand-coded libraries which can be called from conventional code. Ideally, a compiler would take high-level source code and divide the computation into processor code and Active Page functions, optimizing for memory bandwidth, synchronization, and parallelism to reduce execution time. This partitioning problem is very similar to that encountered in hardware-software codesign systems [GVNG], which must divide code into pieces that run on general-purpose processors and pieces that are implemented by ASICs (Application-Specific Integrated Circuits). These systems estimate the performance of each line of code on alternative technologies, account for communication between components, and use integer programming or simulated annealing to minimize execution time and cost. Active Pages could use a similar approach, but would also need to borrow from parallelizing compiler technology [H] to produce data layouts and schedule computation within the memory system.
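As an illustration of the hand-coded partitioning style described above, the sketch below shows conventional processor code invoking a gather function that in-page logic could serve, in the spirit of the sparse-matrix example; the ap_gather interface and its software stand-in are assumptions for illustration, not the actual library API.

    #include <stddef.h>

    /* Software stand-in for a hypothetical Active-Page gather function.
     * In a RADram system, in-page logic would collect the scattered
     * operands; here the processor does it, preserving the semantics. */
    static void ap_gather(const double *x, const int *col_idx,
                          size_t nnz, double *gathered) {
        for (size_t i = 0; i < nnz; i++)
            gathered[i] = x[col_idx[i]];
    }

    /* Processor-side code: multiply-accumulate over gathered operands. */
    double sparse_row_dot(const double *row_vals, const int *col_idx,
                          size_t nnz, const double *x) {
        double gathered[1024];            /* assumes nnz <= 1024 */
        ap_gather(x, col_idx, nnz, gathered);

        double sum = 0.0;
        for (size_t i = 0; i < nnz; i++)
            sum += row_vals[i] * gathered[i];
        return sum;
    }

An automatic partitioner would have to discover splits like this one, weighing the gather's memory-side cost against the processor's floating-point work.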

Integration of Active Pages with a real operating system poses new challenges. Active Pages are similar to both memory pages and parallel processors. Several open operating-system issues exist, such as allocation policies, paging mechanisms, scheduling, and security. Of particular concern is the high cost of swapping Active Pages to and from disk: current FPGA technologies take tens of milliseconds to reconfigure. New technologies, however, promise to reduce these times by several orders of magnitude [DeHa]. Our future work will address these issues both formally and practically, by clarifying the policy of interaction between an operating system and the Active Page memory system and by simulating a modified operating-system kernel such as Linux [Bee]. In addition to operating-system studies, multithreaded application support will be investigated.
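The swap-cost concern above can be quantified with simple arithmetic: swapping one Active Page pays for the disk transfer of its data plus the reconfiguration of its associated logic. The constants below (page size, disk bandwidth, reconfiguration time) are assumed for illustration only.

    #include <stdio.h>

    /* Back-of-the-envelope swap cost (all constants hypothetical). */
    int main(void) {
        const double page_bytes  = 512.0 * 1024;  /* assumed page size      */
        const double disk_bw     = 10.0e6;        /* bytes/s, assumed       */
        const double reconfig_ms = 20.0;          /* assumed FPGA reconfig  */

        double transfer_ms = page_bytes / disk_bw * 1000.0;
        printf("swap cost: %.1f ms transfer + %.1f ms reconfigure = %.1f ms\n",
               transfer_ms, reconfig_ms, transfer_ms + reconfig_ms);
        return 0;
    }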

Future work shall also address interpage and interchip communication issues. Before mechanisms are formalized for interpage communication, a detailed evaluation of interpage communication requirements is needed. This evaluation must study whether interpage communication is required by a broad class of application domains and, if so, whether it should be simulated via processor intervention or implemented with dedicated hardware support. Along with interpage and interchip communication, a study of interpage synchronization primitives is required. Such primitives, if implemented in hardware, pose additional challenges.
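Processor-mediated interpage communication, the simpler of the two options above, might look like the loop below: the CPU drains one page's output region into another page's input region, so every word crosses the memory bus twice. The buffer layout is hypothetical.

    #include <stddef.h>

    /* Hypothetical processor-mediated interpage transfer. Dedicated
     * hardware support would move this copy off the processor entirely. */
    void interpage_copy(volatile int *src_page_out,
                        volatile int *dst_page_in, size_t words) {
        for (size_t i = 0; i < words; i++)
            dst_page_in[i] = src_page_out[i];  /* two bus crossings per word */
    }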

Finally, further evaluation of application kernels is required. Instruction sets such as MMX codify a set of data-manipulation primitives for a certain application domain. Further study of data-manipulation primitives could distill a common base set of primitives for a broad set of application domains. If such primitives exist, hybrids of the RADram implementation and design tradeoffs should be investigated.
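To make the notion of a distilled base set concrete, the sketch below names a few candidate primitives and a software dispatch stub; the particular set chosen here is an assumption for illustration, not the outcome of the study proposed above.

    #include <stddef.h>

    /* Hypothetical base set of in-memory data-manipulation primitives. */
    typedef enum { AP_GATHER, AP_SCATTER, AP_FIND, AP_COUNT } ap_prim_t;

    /* Software stand-in for dispatching a primitive to in-page logic. */
    size_t ap_dispatch(ap_prim_t op, const int *data, size_t n, int key) {
        size_t hits = 0;
        switch (op) {
        case AP_FIND:                      /* index of first match, or n */
            for (size_t i = 0; i < n; i++)
                if (data[i] == key) return i;
            return n;
        case AP_COUNT:                     /* occurrences of key */
            for (size_t i = 0; i < n; i++)
                if (data[i] == key) hits++;
            return hits;
        default:                           /* gather/scatter need buffers */
            return 0;
        }
    }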

Conclusion

Active Pages provide a general model of computation to exploit the coming wave of technologies for intelligent memory. Active Pages are designed to leverage existing memory interfaces and integrate well with commodity microprocessors. In fact, a primary goal of Active Pages is to provide microprocessors with enough useful data to run at peak speeds.

Our RADram implementation of Active Pages achieves substantial speedups when compared to conventional memory systems. RADram provides a large number of simple reconfigurable computational elements which can achieve speedups of up to 1000X over conventional systems. This high performance, coupled with low cost through high chip yield, makes RADram a highly promising architecture for future memory systems.

References

[A] R. Amerson et al. Teramac: configurable custom computing. In Symp. on FPGAs for Custom Computing Machines, Napa Valley, CA, April.

[AA] P. M. Athanas and A. L. Abbott. Real-time image processing on a custom computing platform. IEEE Computer, February.

[Ash] Peter J. Ashenden. The VHDL Cookbook, 1st ed. Dept. of CS, U. of Adelaide, S. Australia, July.

[B] D. Buell et al. Splash 2: FPGAs in a Custom Computing Machine. IEEE Computer Society.

[Ba] N. Bowman et al. Evaluation of existing architectures in IRAM systems. In Workshop on Mixing Logic and DRAM, Denver, CO, June.

[Bb] A. Brown et al. Using MML to simulate multiple dual-ported SRAMs: Parallel routing lookups in an ATM switch controller. In Workshop on Mixing Logic and DRAM, Denver, CO, June.

[BA] D. Burger and T. Austin. The SimpleScalar tool set, version 2.0. Comp. Arch. News, June.

[Bat] K. E. Batcher. STARAN parallel processor system hardware. AFIPS Conf. Proceedings.

[Bee] Nelson H. F. Beebe. A bibliography of publications about the Linux operating system. Technical report, Ctr. for Scientific Comp., Dept. of Math., U. of Utah, Salt Lake City, UT, May.

[BGK] D. Burger, J. Goodman, and A. Kagi. Quantifying memory bandwidth limitations in future microprocessors. In ISCA, Philadelphia, PA, May.

[CG] A. Charlesworth and J. Gustafson. Introducing replicated VLSI to supercomputing: The FPS-164/MAX scientific computer. IEEE Computer, March.

[CLR] T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms. MIT Press, Cambridge, MA.

[D] I. Duff et al. Users' guide for the Harwell-Boeing sparse matrix collection. Technical Report TR/PA, CERFACS, Ave. G. Coriolis, Toulouse Cedex, France, October.

[DeH] A. DeHon. Notes on integrating reconfigurable logic with DRAM arrays. Transit Note, MIT AI Lab, Tech. Sq., Cambridge, MA, March.

[DeHa] Andre DeHon. DPGA utilization and application. In Proc. of the Int. Symp. on Field Programmable Gate Arrays, ACM/SIGDA, February.

[DeHb] Andre DeHon. Reconfigurable Architectures for General-Purpose Computing. PhD thesis, MIT.

[EJ] A. Evensen and J. Troy. Introduction to the architecture of a 288-element PEPE. In Proc. Sagamore Conf. on Par. Processing.

[Gus] D. Gusfield. Algorithms on Strings, Trees and Sequences. Cambridge University Press.

[GVNG] D. Gajski, F. Vahid, S. Narayan, and J. Gong. Specification and Design of Embedded Systems. Prentice Hall, Inc., Englewood Cliffs, New Jersey.

[GW] D. Goodwin and K. Wilken. Optimal and near-optimal global register allocation using 0-1 integer programming. Software: Practice and Experience.

[H] M. Hall et al. Maximizing multiprocessor performance with the SUIF compiler. IEEE Computer, December.

[HJ] R. W. Hockney and C. R. Jesshope. Parallel Computers: Architecture, Programming and Algorithms. Adam Hilger Ltd., Bristol, UK, second edition.

[HS] W. D. Hillis and G. L. Steele. The Connection Machine. MIT Press.

[HT] R. Hintz and D. Tate. Control Data STAR-100 processor design. In COMPCON.

[HW] J. Hauser and J. Wawrzynek. Garp: a MIPS processor with a reconfigurable coprocessor. In Symp. on FPGAs for Custom Computing Machines, Napa Valley, CA, April.

[I] K. Itoh et al. Limitations and challenges of multigigabit DRAM chip design. IEEE Journal of Solid-State Circuits.

[K] W. King et al. Using MORPH in an industrial machine vision system. In K. L. Pocek and J. Arnold, editors, Proceedings of the IEEE Workshop on FPGAs for Custom Computing Machines, Napa, CA, April.

[L] Charles E. Leiserson et al. The network architecture of the Connection Machine CM-5. In Symposium on Parallel Architectures and Algorithms, San Diego, California, June. ACM.

[M] J. Mitchell et al. MPEG Video Compression Standard. Chapman & Hall, New York.

[NM] J. Nelder and R. Mead. A simplex method for function minimization. Computer Journal.

[Pat] David Patterson. Microprocessors in 2020. Scientific American, September.

[Prz] Steven Przybylski. Embedded DRAMs: Today and toward system-level integration. Technical report, Verdande Group, Inc., San Jose, CA, September.

[R] D. Ross et al. An FPGA-based hardware accelerator for image processing. In W. Moore and W. Luk, editors, More FPGAs: Proc. of the Int. Workshop on Field-Programmable Logic and Applications, Oxford, England.

[RS] R. Razdan and M. Smith. A high-performance microarchitecture with hardware-programmable functional units. In Int. Symp. on Microarchitecture, San Jose, CA, November.

[RW] G. Rafael and R. Woods. Digital Image Processing. Addison-Wesley.

[Sem] Semiconductor Industry Association. The national technology roadmap for semiconductors. http://www.sematech.org/public/roadmap.

[SH] N. Sammur and M. Hagan. Mapping signal processing algorithms on parallel architectures. J. of Par. and Distr. Comp., February.

[SKS] A. Silberschatz, H. Korth, and S. Sudarshan. Database System Concepts. McGraw-Hill.

[WC] R. Wittig and P. Chow. OneChip: An FPGA processor with reconfigurable logic. In Symposium on FPGAs for Custom Computing Machines, Napa Valley, California, April.

[WH] M. Wirthlin and B. Hutchings. A dynamic instruction set computer. In Symposium on FPGAs for Custom Computing Machines, Napa Valley, California, April.

[WM] W. Wulf and S. McKee. Hitting the memory wall: Implications of the obvious. Computer Architecture News, March.