An Optical Delay Line Memory Model with Efficient Algorithms
John H. Reif† (Department of Computer Science, Duke University, Durham, NC 27706)
Akhilesh Tyagi‡ (Department of Computer Science, Iowa State University, Ames, IA 50011)

A preliminary version of this paper appeared in Proc. of the ADVANCED RESEARCH in VLSI: International Conference 1991, MIT Press, March 1991.
† Supported in part by DARPA/ARO contract DAAL03-88-K-0195, Air Force Contract AFOSR-87-0386, DARPA/ISTO contract N00014-88-K-0458, NASA subcontract 550-63 of prime contract NAS5-30428, and by NSF Grant NSF-IRI-91-00681.
‡ Supported by NSF Grant MIP-8806169, NCBS&T Grant 89SE04, and a Junior Faculty Development award from UNC, Chapel Hill. This work was performed when the second author was with UNC, Chapel Hill.

Abstract

The extremely high data rates of optical computing technology (100 MWords per second and upwards) present unprecedented challenges in dynamic memory design. An optical fiber loop, used as a delay line, is the best candidate for primary, dynamic memory at this time. However, such loops pose special problems in the design of algorithms due to the synchronization requirements between the loop data and the processor. We develop a theoretical model, which we call the loop memory model (LMM), to capture the relevant characteristics of a loop based memory. An important class of algorithms, ascend/descend (which includes algorithms for merging, sorting, DFT, matrix transposition and multiplication, and data permutation), can be implemented without any time loss due to memory synchronization. We develop both sequential and parallel implementations of ascend/descend algorithms and some matrix computations. Some lower bounds are also demonstrated.

1 Introduction

1.1 The Success of VLSI Theory

Over the last 15 years, VLSI has moved from being a theoretical abstraction to being a practical reality. As VLSI design tools and VLSI fabrication facilities such as MOSIS became widely available, algorithm design paradigms such as systolic algorithms [Kun79], once thought to be of theoretical interest only, have been used in high-performance VLSI hardware. Along the same lines, the theoretical limitations of VLSI predicted by area-time tradeoff lower bounds [Tho79] have been found to be important limitations in practice.

1.2 The Promise of Optical Computing

Optical computing technology is considered to be one of several technologies that could provide a boost of 2-3 orders of magnitude in computing speed over the currently used semiconductor (silicon) based technology. The field of electro-optical computing, however, is in its infancy, comparable to the state of VLSI technology, say, 10 years ago. Fabrication facilities for many key electro-optical components are not widely available; instead, the crucial electro-optical devices must be specially made in laboratories. However, a number of prototype electro-optical computing systems have been built recently, perhaps most notably at Bell Laboratories under Huang [Hua80, Hua84], and also at Boulder under Jordan (see below). In addition, several optical message routing devices have been built at Boulder [MJR89], Stanford and USC [Lou91], [MG90, MMG90], [SS84]. Thus optical computing technology has matured to the point that prototypical computing machines can be built in research laboratories. But it certainly has not attained the maturity of VLSI technology. What is likely to occur in the future?
The technology for electro-optical computing is likely to advance rapidly through the 90s, just as VLSI technology advanced in the late 70s and 80s. Therefore, following our past experience with VLSI, it seems likely that the theoretical underpinnings of optical computing technology, namely the discovery of efficient algorithms and of resource lower bounds, are crucial to guide its development. This seems to us to be the right moment for algorithm designers to get involved in this enterprise. A careful interaction between the architects and the algorithm designers can lead to a better thought-out design. This paper is an attempt in that direction.

1.3 Data Storage: A Key Problem in Optical Computing

Optical computing technology can attain extremely high data rates, beyond what can be obtained with current semiconductor technology. Therefore, in order to sustain these data rates, dynamic storage must be based on new technologies, which are likely to be completely or partially optical. Jordan at the Colorado Optoelectronic Computing Systems Center [Jor89b] and some other groups have proposed and used optical delay line loops for dynamic storage. In these data storage systems, an optical fiber, whose characteristics match the operating wavelength, is used to form a delay line loop. The system sends a sequence of optically encoded bits down one end of the loop; after a delay that depends on the length and optical characteristics of the loop, the optically encoded bits appear at the other end, to be either utilized at that time and/or sent down the loop once again. This idea of using propagation delay for data storage dates back to the use of mercury delay loops in early electronic computing systems, before the advent of large primary or secondary memory storage. Jordan [Jor89b] has been able to store 10^4 bits per fiber loop with a fiber length of approximately one kilometer. This was achieved in a small, low-cost prototype system with a synchronous loop, without very precise temperature control. Jordan used such a delay loop system to build the second known purely optical computer (after Huang's), which can simulate a counter. Note that this does not represent the ultimate limit on the storage capacity of optical delay loops, which could in principle provide very large storage using higher-performance electro-optical transducers and multiple loops. Maguire and Prucnal [JP89] suggest using optical delay lines to form disk storage.

The main problem with such dynamic storage is that it is not a random-access memory. A delay line loop cannot be tapped at many points, since a larger number of taps leads to excessive signal degradation. This implies that if an algorithm is not designed around this shortcoming of the dynamic storage, it might have to wait for the whole length of the loop for each data access. Systolic algorithms [Kun79] exhibit a similarly tight inter-dependence between the dynamic storage and the data access pattern.

1.4 Loop Memory Model and Our Results

In this paper, we propose the loop-memory model (LMM) as a model of sequential electro-optical computing with delay line based dynamic storage. The loop-memory model contains the basic features that current delay loop systems use, as well as the features that systems in the future are likely to use.
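To make this access discipline concrete, the following is a minimal software sketch of a delay-line loop as the LMM views it: one word emerges at the tap per clock tick, and everything else must recirculate. The class name DelayLineLoop and its interface are our own illustration, not anything defined in the paper; the sketch only shows why an unscheduled access can cost up to a full loop traversal.

```python
from collections import deque

class DelayLineLoop:
    """Toy model of an optical delay-line loop holding `length` words.

    Exactly one word is available at the tap per clock tick; it is either
    consumed and replaced or recirculated unchanged. No other word in the
    loop can be touched, so an arbitrary access costs up to a full traversal.
    """

    def __init__(self, length, fill=0):
        self.buf = deque([fill] * length)  # index 0 is the word at the tap
        self.ticks = 0                     # elapsed clock ticks

    def step(self, new_word=None):
        """Advance one tick: emit the word at the tap and write `new_word`
        (or recirculate the emitted word if None) at the far end."""
        self.ticks += 1
        out = self.buf.popleft()
        self.buf.append(out if new_word is None else new_word)
        return out

    def read_slot(self, i):
        """Read the word currently i positions behind the tap by spinning
        the loop; the processor idles for i ticks waiting for it."""
        for _ in range(i):
            self.step()                    # recirculate, no useful work
        return self.step()

loop = DelayLineLoop(8)
for v in range(8):                         # stream words 0..7 into the loop
    loop.step(v)
print(loop.read_slot(5), loop.ticks)       # -> 5 14: six ticks for one access
```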
It would seem that the restrictive discipline imposed on data access patterns by a loop memory would degrade the performance of most algorithms, because the processor might have to idle while waiting for data. However, we demonstrate that an important class of algorithms, the ascend/descend algorithms, can be realized in the loop memory model without any loss of efficiency. Note that many problems, including merging, sorting, DFT, matrix transposition and multiplication, and data permutation, are solvable with an ascend/descend algorithm, described in Section 3. In fact, we give sequential algorithms covering a broad range for the number of loops required. A parallel implementation performing the optimal amount of work is also shown. The work performed by a parallel algorithm running on p processors in time T is given by p · T.

An ascend or descend phase takes time O(n log n) in LMM using log n loops of sizes 1, 2, 4, ..., n. Note that a straightforward emulation of a butterfly network with O(n log n) time performance requires O(n) loops: n loops of size 1, n/2 of size 2, n/4 of size 4, ..., 1 of size n. An ascend or descend phase can also be implemented in time n^{1.5} with just two loops, of sizes n and √n. This generalizes to an ascend/descend scheme running in time O(nk + n^{1.5}/2^{k/2}) with k loops, for 2 ≤ k ≤ log n. At this point in time, a loop is a precious resource in optical technology, and hence tailoring an algorithm around the number of available loops is an important capability; the k-loop adaptation of the ascend/descend algorithm provides exactly this capability. A single-loop processor takes n^2 time, and a matching lower bound for this case is derived in Section 6 from one-tape Turing machine crossing sequence arguments. Matrix multiplication and matrix transposition can also be performed in LMM without any loss of time. We also consider a butterfly network of p log p LMM processors, where 1 < p ≤ n. The work (processor-time product) of this network for ascend/descend algorithms is shown to be O(n log n).
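To illustrate why loops of sizes 1, 2, 4, ..., n are a natural fit for an ascend phase, here is a small sketch in which a FIFO of length d = 2^j stands in for the size-2^j loop: the low element of each pair waits in the FIFO for exactly d stream positions until its partner arrives, so every step of the pass does useful work. This is our own illustration of the access pattern, not the Section 3 construction; the function names and the list-based write-back are assumptions made for readability.

```python
from collections import deque

def ascend(data, combine):
    """Run the log n phases of an ascend computation over n = 2^m items.

    In phase j, elements whose indices differ only in bit j are combined.
    A FIFO of length d = 2^j stands in for the loop of size 2^j: the 'low'
    partner of each pair waits in it for exactly d stream positions until
    the 'high' partner streams past the processor.
    """
    n = len(data)
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    d = 1
    while d < n:                          # log n phases in total
        fifo = deque()                    # plays the role of the size-d loop
        out = [None] * n
        for i, x in enumerate(data):      # one pass over the stream: n steps
            if i & d == 0:
                fifo.append((i, x))       # low partner enters the delay FIFO
            else:
                lo_i, lo = fifo.popleft() # its partner emerges d steps later
                out[lo_i], out[i] = combine(lo, x)
        data = out
        d *= 2
    return data

# Example: with (a, b) -> (a + b, a - b) the ascend pass computes the
# (unnormalized) Walsh-Hadamard transform; a delta input maps to all ones.
print(ascend([1, 0, 0, 0, 0, 0, 0, 0], lambda a, b: (a + b, a - b)))
```

In an actual LMM implementation the combined results would be written back into loops rather than into a random-access output list, but the pairing pattern, the log n phases, and the n steps per phase are the same, which is consistent with the O(n log n) total stated above.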