Parallel Simulation of a Multi-Span DWDM System Limited by FWM

Total pages: 16

File type: PDF, size: 1020 KB

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 23 March 2021 | doi:10.20944/preprints202103.0558.v1

Article

Parallel simulation of a multi-span DWDM system limited by FWM

Rafael Sanchez-Lara 1, Jose Luis Lopez-Martinez 2, Joel Antonio Trejo-Sanchez 3, Herman Leonard Offerhaus 4 and Jose Alfredo Alvarez-Chavez 4,*

1 Universidad Autonoma del Carmen (UNACAR), Facultad de Ingenieria, C.56, No. 4, C.P. 24180, Cd. del Carmen, Campeche, Mexico; [email protected]
2 Facultad de Matematicas, Universidad Autonoma de Yucatan, Merida, Mexico; [email protected]
3 CONACYT-Centro de Investigación en Matemáticas; [email protected]
4 Optical Sciences Group, University of Twente, Enschede, The Netherlands; [email protected]; [email protected]
* Correspondence: [email protected]

Abstract: One of the non-linear phenomena affecting high-bandwidth, long-reach communication systems is Four-Wave Mixing (FWM). Unfortunately, simulating such systems for parameter optimization takes longer as the number of channels increases. In this paper, we propose a new high-performance computational model to obtain the optimal design parameters of a multi-span Dense Wavelength Division Multiplexing (DWDM) system limited by FWM and by the intrinsic Amplified Spontaneous Emission (ASE) noise of the optical amplifiers employed in each segment. The simulation in this work provides a complete optical design characterization and compares the efficiency and speed of the proposed parallelization model against an earlier sequential model. Additionally, an analysis of the computational complexity of the parallel models is presented, covering two parallel implementations: first, Open MultiProcessing (OpenMP), which is based on a multi-core central processing unit, and second, the Compute Unified Device Architecture (CUDA), which is based on a Graphics Processing Unit (GPU). Results show that parallelism improves simulation performance by up to 40 times when nested parallelization with CUDA is used instead of the sequential method, and by up to 6 times compared with the OpenMP implementation on 12 processors. With our parallel implementation it is possible to simulate numbers of channels that were impractical in the sequential simulation.

Keywords: Computer Simulation; DWDM; FWM; GPU programming; Parallel Architectures; Parallel Programming

1. Introduction

Non-linear phenomena occur in optical fibre systems, especially when the fibres are used at their maximum capacity. They are a limitation for multi-segment, high-bandwidth, long-range Dense Wavelength Division Multiplexing (DWDM) systems. DWDM is used to maximize the bandwidth capacity of optical fibres, whose segments are connected by optical repeaters. Such communication systems are found in land-based or transoceanic links.
Five non-linear effects are known so far to play a role in such systems: Four-Wave Mixing (FWM), Stimulated Raman Scattering (SRS), Stimulated Brillouin Scattering (SBS), Self-Phase Modulation (SPM) and Cross-Phase Modulation (XPM); each appears under certain design circumstances. In any design analysis, basic linear effects such as fibre attenuation and dispersion, defined by the parameters α and D respectively, will also be present. Erbium-Doped Fibre Amplifiers (EDFAs) have been used to compensate for fibre losses and to increase the transmission distance, causing at the same time an increase in amplified spontaneous emission (ASE) noise and in nonlinearities [1,2]. Different kinds of optical fibres have been designed to reduce the effects of dispersion and of the nonlinearities [3-6]. There are numerous studies on non-linear impairments showing the importance and complexity of such phenomena for DWDM [7-12].

The sequential model presented in [13] can be used on a multi-span DWDM system limited by FWM/ASE to obtain the optimal power (Popt) and maximum link length (Lmax) with respect to the dispersion D. The model is highly parallelizable, so different parallelization models can be used to reduce its computational cost. Parallelization can significantly reduce the processing or execution time of a sequential algorithm [14], including the simulation of non-linear phenomena [15]. Several tools exist to implement parallelization, such as MPI (Message Passing Interface) [16], OpenMP [17], and more recently GPU cards [18]. The latter are programmable using the libraries provided by NVIDIA, in particular the CUDA-C language. Other examples using GPU cards are mentioned in [19,20], where this tool significantly improves the execution of algorithms.

In this work we use two schemes to significantly reduce the time complexity of the DWDM simulation and to simulate more channels than in [13]. First, we use a multiprocessing parallel paradigm with OpenMP, since it only uses the processors located in the Central Processing Unit (CPU) and does not require a distributed system as MPI does. Its efficiency has been proven [21], but it is limited by the number of cores in the processor. Secondly, we use Graphics Processing Units for their cost-benefit ratio [22-24]. For CUDA, we use Dynamic Parallelism (DP) [22]. Dynamic parallelism consists of hierarchical kernels in a tree structure, where a thread is defined as a parent kernel; this parent kernel can generate new child kernels to execute tasks in the GPU context. The height of the tree depends on the characteristics of the GPU card. Finally, an analysis of both implementations using metrics such as speedup, efficiency and performance ratio is presented.
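As a minimal sketch of the dynamic-parallelism pattern just described (a generic illustration, not the paper's simulation code; it assumes a GPU of compute capability 3.5 or newer and relocatable device code):

```cuda
// Generic CUDA Dynamic Parallelism sketch: a parent kernel launches
// child kernels from the device, forming a two-level kernel tree.
// Assumed build line: nvcc -arch=sm_35 -rdc=true dp_sketch.cu -lcudadevrt
#include <cstdio>

__global__ void child_kernel(int parent_thread) {
    // Each child thread would process one slice of the parent's task.
    printf("child of parent thread %d: thread %d\n", parent_thread, threadIdx.x);
}

__global__ void parent_kernel() {
    // Every parent thread spawns its own child grid inside the GPU context,
    // with no round trip to the host.
    child_kernel<<<1, 4>>>(threadIdx.x);
}

int main() {
    parent_kernel<<<1, 2>>>();
    cudaDeviceSynchronize();  // waits for parents and all their children
    return 0;
}
```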
The paper is divided into the following sections: Section 2 describes the theory of a multi-span DWDM system limited by FWM; Section 3 describes the proposed sequential model of computation; in Section 4 the architectural models are explained; in Section 5 experimental and simulated results are presented in terms of execution time, efficiency, performance ratio, and speedup. Finally, concluding remarks are presented in Section 6.

2. Theoretical Analysis

The power dependence of the refractive index is governed by the third-order susceptibility χ^(3), and FWM is described using χ^(3). If three optical fields with carrier frequencies f1, f2, and f3 copropagate inside the fibre simultaneously, FWM generates a fourth field whose frequency f4 is related to the other frequencies by f4 = f1 ± f2 ± f3. All frequencies corresponding to the different plus and minus sign combinations are possible. In practice, most do not grow to significance because of phase mismatching. Frequency combinations of the form f4 = f1 + f2 − f3 are troublesome for multichannel communication systems, since they can become nearly phase-matched when the channel wavelengths lie close to the zero-dispersion wavelength [1].

A multi-span DWDM system refers to the multiplexing of multiple information frequency channels (f1, f2, ..., fN) in a single fibre with the aim of transmitting the largest possible bandwidth. To achieve long ranges, the optical fibre is segmented and the segments are connected by optical amplifiers, as shown in Figure 1. The operating bandwidth of the amplifiers must match the spectral bandwidth of the combined channels to be transmitted. The C band is the most commonly used because of its low losses, and the EDFA fits this system [2].

Figure 1. Multi-span DWDM system setup.

Figure 1 represents the multi-span DWDM system setup. fi represents the i-th frequency channel, LS is the length of a segment, i.e. the spacing between amplifiers (typical values between 50 km and 100 km), N is the total number of channels, LT is the total length of the system, and M is the total number of segments.

The total spectral width in band C is W = fN − f1 = 3.75 THz (30 nm). The multi-channel density Δf is a function of the total number of channels N and can be expressed as:

\Delta f = f_i - f_{i-1} = \frac{W}{N-1}. \quad (1)

The central frequency can be expressed as:

f_c = \frac{f_1 + f_N}{2} = 193\ \mathrm{THz}\ (1550\ \mathrm{nm}). \quad (2)
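As a quick worked example of Eqs. (1) and (2), with the channel count N = 76 assumed purely for illustration:

```latex
% Illustrative numbers only: N = 76 channels across W = 3.75 THz
\Delta f = \frac{W}{N-1} = \frac{3.75\ \mathrm{THz}}{76-1} = 50\ \mathrm{GHz},
\qquad
f_c = \frac{f_1 + f_N}{2} = 193\ \mathrm{THz}\ (\lambda \approx 1550\ \mathrm{nm}),
```

which is the familiar 50 GHz DWDM grid spacing.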
An FWM signal appears from the mixing of three frequency channels fi, fj and fk, that is, fn = fi + fj − fk (i, j ≠ k). The power generated at fn is given by [11,12,25]:

P_{i,j,k} = \kappa^2 P_i P_j P_k \, \eta_{i,j,k} \, d_{i,j,k}^2 \left( \frac{M L_{eff}}{A_{eff}} \right)^2 e^{-\alpha L_S}, \quad (3)

with \kappa = \frac{32 \pi^3 \chi^{(3)}}{n^2 c \lambda}, where λ is the wavelength, c is the vacuum light speed, n is the core refractive index, α is the linear loss coefficient, Pi, Pj, Pk are the launched input power levels per channel, L_{eff} = (1 - e^{-\alpha L_S})/\alpha is the fibre effective length, A_{eff} is the effective area of the guided mode, d_{i,j,k} is the degeneracy factor (d = 3 for i = j, d = 6 in case i ≠ j), χ^(3) is the third-order nonlinear electric susceptibility (6 × 10^{−14} m^3 W^{−1}), and η_{i,j,k} is the FWM efficiency, which is given by:

\eta_{i,j,k} = \frac{\alpha^2}{\alpha^2 + \Delta\beta_{i,j,k}^2} \left( 1 + \frac{4 e^{-\alpha L_S} \sin^2\left( \Delta\beta_{i,j,k} L_S / 2 \right)}{\left( 1 - e^{-\alpha L_S} \right)^2} \right), \quad (4)

where \Delta\beta_{i,j,k} is the phase mismatch, which is expressed as:

\Delta\beta_{i,j,k} = \frac{2 \pi \lambda^2}{c} \, |f_i - f_k| \, |f_j - f_k| \left( D + \frac{\lambda^2}{2c} \frac{dD}{d\lambda} \left( |f_i - f_k| + |f_j - f_k| \right) \right). \quad (5)
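Equations (3)-(5) are the kernel of the simulation: one such evaluation is needed for every channel triple (i, j, k), which is the workload the paper distributes across threads. As a hedged illustration (our own code, not the authors'; SI units assumed, names invented), the per-triple computation can be packaged as a single function callable from host code or from the CUDA kernels discussed in the introduction:

```cuda
// Hedged sketch (our code, not the authors'): power of one FWM product
// following Eqs. (3)-(5). SI units; all names are invented for illustration.
#include <cmath>

__host__ __device__
double fwm_power(double fi, double fj, double fk,  // mixing frequencies [Hz]
                 double Pi, double Pj, double Pk,  // launch powers [W]
                 double d,                         // degeneracy: 3 (i==j) or 6
                 double alpha, double Ls, int M,   // loss [1/m], span [m], spans
                 double Aeff, double kappa,        // eff. area [m^2], kappa of Eq. (3)
                 double lambda, double D, double dDdl) {
    const double PI = 3.14159265358979323846;
    const double c  = 2.99792458e8;                 // vacuum light speed [m/s]
    double Leff = (1.0 - exp(-alpha * Ls)) / alpha; // fibre effective length

    // Eq. (5): phase mismatch
    double dfik = fabs(fi - fk), dfjk = fabs(fj - fk);
    double dbeta = (2.0 * PI * lambda * lambda / c) * dfik * dfjk
                 * (D + (lambda * lambda / (2.0 * c)) * dDdl * (dfik + dfjk));

    // Eq. (4): FWM efficiency
    double den = 1.0 - exp(-alpha * Ls);
    double s   = sin(dbeta * Ls / 2.0);
    double eta = alpha * alpha / (alpha * alpha + dbeta * dbeta)
               * (1.0 + 4.0 * exp(-alpha * Ls) * s * s / (den * den));

    // Eq. (3): generated FWM power
    double g = kappa * d * (double)M * Leff / Aeff; // kappa * d * M * Leff / Aeff
    return g * g * Pi * Pj * Pk * eta * exp(-alpha * Ls);
}
```

Because every (i, j, k) triple can be evaluated independently, the triple loop over channels parallelizes naturally; this independence is the property the OpenMP and CUDA implementations exploit.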
Recommended publications
  • Designing an Ultra Low-Overhead Multithreading Runtime for Nim
Designing an ultra low-overhead multithreading runtime for Nim. Mamy Ratsimbazafy, Weave, [email protected], https://github.com/mratsim/weave. Hello! I am Mamy Ratsimbazafy. During the day: blockchain/Ethereum 2 developer (in Nim); during the night: deep learning and numerical computing developer (in Nim) and data scientist (in Python). You can contact me at [email protected]; GitHub: mratsim; Twitter: m_ratsim. Where did this talk come from? Three years ago I started writing a tensor library in Nim; there were two threading APIs at the time, OpenMP and a simple threadpool; one year ago came a complete refactoring of the internals. Agenda: understanding the design space; hardware and software multithreading (definitions and use-cases); parallel APIs; sources of overhead and runtime design; a minimum viable runtime plan in a weekend. Understanding the design space: concurrency vs parallelism, latency vs throughput, cooperative vs preemptive, IO vs CPU. Parallelism is not concurrency. Kernel threading models: 1:1 threading (1 application thread -> 1 hardware thread); N:1 threading (N application threads -> 1 hardware thread); M:N threading (M application threads -> N hardware threads). The same distinctions can be made at the level of a multithreaded language or multithreading runtime. The problem: how to schedule M tasks on N hardware threads? Latency vs throughput: do we want to do all the work in a minimal amount of time (numerical computing, machine learning, ...), or do we want to be fair (client-server, video decoding, ...)? Cooperative vs preemptive: cooperative multithreading:
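To make the M:N question above concrete, here is a generic C++ sketch (ours, unrelated to Weave's actual design): M submitted tasks are drained by N OS threads pulling from one shared queue.

```cpp
// Minimal M:N mapping sketch: M tasks, N worker threads, one shared queue.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~ThreadPool() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }
    void submit(std::function<void()> task) {
        { std::lock_guard<std::mutex> lk(m_); tasks_.push(std::move(task)); }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !tasks_.empty(); });
                if (done_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // one of the M tasks runs on one of the N threads
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
};

// Usage: ThreadPool pool(std::thread::hardware_concurrency());
//        pool.submit([]{ /* task body */ });
```

A single shared queue like this becomes a lock-contention point as N grows, which is exactly the kind of overhead the talk is concerned with.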
  • Fiber Context as a First-Class Object
Document number: P0876R6. Date: 2019-06-17. Authors: Oliver Kowalke ([email protected]), Nat Goodspeed ([email protected]). Audience: SG1. fiber_context - fibers without scheduler. Contents: revision history; abstract; control transfer mechanism; std::fiber_context as a first-class object; encapsulating the stack; invalidation at resumption; problem: avoiding non-const global variables and undefined behaviour; solution: avoiding non-const global variables and undefined behaviour; inject function into suspended fiber; passing data between fibers; termination; exceptions; std::fiber_context as building block for higher-level frameworks; interaction with STL algorithms; possible implementation strategies; fiber switch on architectures with register window; how fast is a fiber switch; interaction with accelerators; multi-threading environment; acknowledgments; API (33.7 Cooperative User-Mode Threads: 33.7.1 General; 33.7.2 Empty vs. Non-Empty; 33.7.3 Explicit Fiber vs. Implicit Fiber; 33.7.4 Implicit Top-Level Function; 33.7.5 Header <experimental/fiber_context> synopsis; 33.7.6 Class fiber_context; 33.7.7 Function unwind_fiber()); references. Revision History: This document supersedes P0876R5. Changes since P0876R5: std::unwind_exception removed; stack unwinding must be performed by platform facilities. fiber_context::can_resume_from_any_thread() renamed to can_resume_from_this_thread(). fiber_context::valid() renamed to empty() with inverted sense. Material has been added concerning the top-level wrapper logic governing each fiber. The change to unwinding fiber stacks using an anonymous foreign exception not catchable by C++ try / catch blocks is in response to deep discussions in Kona 2019 of the surprisingly numerous problems surfaced by using an ordinary C++ exception for that purpose.
  • Fiber-Based Architecture for NFV Cloud Databases
Fiber-based Architecture for NFV Cloud Databases. Vaidas Gasiunas, David Dominguez-Sal, Ralph Acker, Aharon Avitzur, Ilan Bronshtein, Rushan Chen, Eli Ginot, Norbert Martinez-Bazan, Michael Müller, Alexander Nozdrin, Weijie Ou, Nir Pachter, Dima Sivov, Eliezer Levy. Huawei Technologies, Gauss Database Labs, European Research Institute. ABSTRACT: The telco industry is gradually shifting from using monolithic software packages deployed on custom hardware to using modular virtualized software functions deployed on cloudified data centers using commodity hardware. This transformation is referred to as Network Function Virtualization (NFV). The scalability of the databases (DBs) underlying the virtual network functions is the cornerstone for reaping the benefits from the NFV transformation. This paper presents an industrial experience of applying shared-nothing techniques in order to achieve the scalability of a DB in an NFV setup. The special combination of requirements in NFV DBs are not easily met with conventional execution models. [From a later column of the preview:] to a specific workload, as well as to change this allocation on demand. Elasticity is especially attractive to operators who are very conscious about Total Cost of Ownership (TCO). NFV meets databases (DBs) in more than a single aspect [20, 30]. First, the NFV concepts of modular software functions accelerate the separation of function (business logic) and state (data) in the design and implementation of network functions (NFs). This in turn enables independent scalability of the DB component, and underlines its importance in providing elastic state repositories for various NFs. In principle, the NFV-ready DBs that are relevant to our discussion are sophisticated main-memory key-value (KV) stores with stringent high availability (HA)
  • A Thread Performance Comparison: Windows NT and Solaris on a Symmetric Multiprocessor
The following paper was originally published in the Proceedings of the 2nd USENIX Windows NT Symposium, Seattle, Washington, August 3-4, 1998. For more information about the USENIX Association: Phone: 510 528-8649; FAX: 510 548-5738; Email: [email protected]; WWW URL: http://www.usenix.org/. A Thread Performance Comparison: Windows NT and Solaris on a Symmetric Multiprocessor. Fabian Zabatta and Kevin Ying, Logic Based Systems Lab, Brooklyn College and CUNY Graduate School, Computer Science Department, 2900 Bedford Avenue, Brooklyn, New York 11210, {fabian, kevin}@sci.brooklyn.cuny.edu. Abstract: Manufacturers now have the capability to build high performance multiprocessor machines with common PC components. This has created a new market of low cost multiprocessor machines. However, these machines are handicapped unless they have an operating system that can take advantage of their underlying architectures. Presented is a comparison of two such operating systems, Windows NT and Solaris. By focusing on their implementation of threads, we show each system's ability to exploit multiprocessors. We report our results and interpretations of several experiments that were used to compare the performance of each system. What emerges is a discussion on the performance impact of each implementation and its significance on various types of applications. [Figure 1: A basic symmetric multiprocessor architecture.] 1.1 Symmetric Multiprocessing: Symmetric multiprocessing (SMP) is the primary
  • Concurrent Cilk: Lazy Promotion from Tasks to Threads in C/C++
Concurrent Cilk: Lazy Promotion from Tasks to Threads in C/C++. Christopher S. Zakian, Timothy A. K. Zakian, Abhishek Kulkarni, Buddhika Chamith, and Ryan R. Newton. Indiana University - Bloomington, {czakian, tzakian, adkulkar, budkahaw, [email protected]. Abstract: Library and language support for scheduling non-blocking tasks has greatly improved, as have lightweight (user) threading packages. However, there is a significant gap between the two developments. In previous work, and in today's software packages, lightweight thread creation incurs much larger overheads than tasking libraries, even on tasks that end up never blocking. This limitation can be removed. To that end, we describe an extension to the Intel Cilk Plus runtime system, Concurrent Cilk, where tasks are lazily promoted to threads. Concurrent Cilk removes the overhead of thread creation on threads which end up calling no blocking operations, and is the first system to do so for C/C++ with legacy support (standard calling conventions and stack representations). We demonstrate that Concurrent Cilk adds negligible overhead to existing Cilk programs, while its promoted threads remain more efficient than OS threads in terms of context-switch overhead and blocking communication. Further, it enables development of blocking data structures that create non-fork-join dependence graphs, which can expose more parallelism, and better supports data-driven computations waiting on results from remote devices. 1 Introduction: Both task-parallelism [1, 11, 13, 15] and lightweight threading [20] libraries have become popular for different kinds of applications. The key difference between a task and a thread is that threads may block, for example when performing IO, and then resume again.
  • Multiprocessing
Multiprocessing. Contents: 1 Multiprocessing (1.1 Pre-history; 1.2 Key topics: 1.2.1 Processor symmetry, 1.2.2 Instruction and data streams, 1.2.3 Processor coupling, 1.2.4 Multiprocessor Communication Architecture; 1.3 Flynn's taxonomy: 1.3.1 SISD multiprocessing, 1.3.2 SIMD multiprocessing, 1.3.3 MISD multiprocessing, 1.3.4 MIMD multiprocessing; 1.4 See also; 1.5 References). 2 Computer multitasking (2.1 Multiprogramming; 2.2 Cooperative multitasking; 2.3 Preemptive multitasking; 2.4 Real time; 2.5 Multithreading; 2.6 Memory protection; 2.7 Memory swapping; 2.8 Programming; 2.9 See also; 2.10 References).
  • Multiprocessing and Scalability
Multiprocessing and Scalability. A.R. Hurson, Computer Science and Engineering, The Pennsylvania State University. Large-scale multiprocessor systems have long held the promise of substantially higher performance than traditional uni-processor systems. However, due to a number of difficult problems, the potential of these machines has been difficult to realize. This is because of: advances in technology (the rate of increase in performance of uni-processors); the complexity of multiprocessor system design (which drastically affected the cost and implementation cycle); and the programmability of multiprocessor systems (the design complexity of parallel algorithms and parallel programs). Programming a parallel machine is more difficult than programming a sequential one. In addition, it takes much more effort to port an existing sequential program to a parallel machine than to a newly developed sequential machine. The lack of good parallel programming environments and standard parallel languages has further aggravated this issue. As a result, the absolute performance of many early concurrent machines was not significantly better than available or soon-to-be-available uni-processors. Recently, there has been an increased interest in large-scale or massively parallel processing systems. This interest stems from many factors, including: advances in integrated technology. Very
  • Design of a Message Passing Interface for Multiprocessing with Atmel Microcontrollers
DESIGN OF A MESSAGE PASSING INTERFACE FOR MULTIPROCESSING WITH ATMEL MICROCONTROLLERS. A Design Project Report Presented to the Engineering Division of the Graduate School of Cornell University in Partial Fulfillment of the Requirements of the Degree of Master of Engineering (Electrical), by Kalim Moghul. Project Advisor: Dr. Bruce R. Land. Degree Date: May 2006. Abstract (Master of Electrical Engineering Program, Cornell University): Microcontrollers are versatile integrated circuits typically incorporating a microprocessor, memory, and I/O ports on a single chip. These self-contained units are central to embedded systems design, where low-cost, dedicated processors are often preferred over general-purpose processors. Embedded designs may require several microcontrollers, each with a specific task. Efficient communication is central to such systems. The focus of this project is the design of a communication layer and application programming interface for exchanging messages among microcontrollers. In order to demonstrate the library, a small-scale cluster computer is constructed using Atmel ATmega32 microcontrollers as processing nodes and an Atmega16 microcontroller for message routing. The communication library is integrated into aOS, a preemptive multitasking real-time operating system for Atmel microcontrollers. Executive Summary: Some embedded designs may incorporate several microcontrollers, each with a specific task.
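As a generic illustration of such a message layer (our own sketch; the report's actual API and frame format are not shown in this excerpt), a fixed-size frame with source and destination node ids plus a checksum is the kind of unit a router node can forward between microcontrollers:

```c
// Hedged sketch of a minimal message frame for MCU-to-MCU passing.
// Names and layout are invented; the byte-level transport (e.g. UART)
// is platform-specific and omitted.
#include <stdint.h>
#include <string.h>

#define MSG_PAYLOAD_MAX 16

typedef struct {
    uint8_t dest;                      /* destination node id            */
    uint8_t src;                       /* source node id                 */
    uint8_t len;                       /* payload bytes actually used    */
    uint8_t payload[MSG_PAYLOAD_MAX];
    uint8_t checksum;                  /* XOR over header and payload    */
} message_t;

static uint8_t msg_checksum(const message_t* m) {
    uint8_t x = m->dest ^ m->src ^ m->len;
    for (uint8_t i = 0; i < m->len; ++i) x ^= m->payload[i];
    return x;
}

/* Build a frame; a router node would compare 'dest' against its table
   and forward the raw bytes to the matching link. */
static void msg_init(message_t* m, uint8_t dest, uint8_t src,
                     const void* data, uint8_t len) {
    m->dest = dest;
    m->src = src;
    m->len = (len > MSG_PAYLOAD_MAX) ? MSG_PAYLOAD_MAX : len;
    memcpy(m->payload, data, m->len);
    m->checksum = msg_checksum(m);
}
```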
  • The Impact of Process Placement and Oversubscription on Application Performance: A Case Study for Exascale Computing
Konrad-Zuse-Zentrum für Informationstechnik Berlin, Takustraße 7, D-14195 Berlin-Dahlem, Germany. Florian Wende, Thomas Steinke, Alexander Reinefeld: The Impact of Process Placement and Oversubscription on Application Performance: A Case Study for Exascale Computing. ZIB-Report 15-05 (January 2015). Published by Konrad-Zuse-Zentrum für Informationstechnik Berlin; phone: 030-84185-0; fax: 030-84185-125; e-mail: [email protected]; URL: http://www.zib.de; ZIB-Report (Print) ISSN 1438-0064; ZIB-Report (Internet) ISSN 2192-7782. Abstract: With the growing number of hardware components and the increasing software complexity in the upcoming exascale computers, system failures will become the norm rather than an exception for long-running applications. Fault-tolerance can be achieved by the creation of checkpoints during the execution of a parallel program. Checkpoint/Restart (C/R) mechanisms allow for both task migration (even if there were no hardware faults) and restarting of tasks after the occurrence of hardware faults. Affected tasks are then migrated to other nodes, which may result in unfortunate process placement and/or oversubscription of compute resources. In this paper we analyze the impact of unfortunate process placement and oversubscription of compute resources on the performance and scalability of two typical HPC application workloads, CP2K and MOM5. Results are given for a Cray XC30/40 with Aries dragonfly topology. Our results indicate that unfortunate process placement has only little negative impact, while oversubscription substantially degrades the performance.
  • Computer and Computer Systems I: Introduction
Welcome to Computer Team! Noah Singer, Montgomery Blair High School Computer Team, September 14, 2017. Overview: 1 Introduction; 2 Units; 3 Computer Systems; 4 Computer Parts. Section 1: Introduction. Computer Team is: a place to hang out and eat lunch every Thursday; a cool discussion group for computer science; an award-winning competition programming group; a place for people to learn from each other; open to people of all experience levels. Computer Team isn't: a programming club; a highly competitive environment; a rigorous course that requires commitment. Structure: four units, each with five to ten lectures. Each lecture comes with notes that you can take home! Guided activities to help you understand more difficult/technical concepts. Special presentations and guest lectures on interesting topics. Conventions: vocabulary terms are marked in bold. Definitions are in boxes (along with theorems, etc.). Important statements are highlighted in italics. Formulas are written
  • CSS 600: Independent Study Contract - Final Report (Piotr Warczak, Fall 2011)
CSS 600: Independent Study Contract - Final Report. Student: Piotr Warczak. Quarter: Fall 2011. Student ID: 99XXXXX. Credit: 2. Grading: Decimal. Independent Study Title: The GPU version of the MASS library. Focus and Goals: The current version of the MASS library is written in the Java programming language and combines the power of every computing node on the network by utilizing both multithreading and multiprocessing. The new version will be implemented in C and C++. It will also support multithreading and multiprocessing and, finally, the CUDA parallel computing architecture, utilizing both single and multiple devices. This version will harness the power of the GPU, a general-purpose parallel processor. My specific goals for the past quarter were: to create a baseline in the C language using the Wave2D application as an example; to implement a single-threaded Wave2D; to implement a multithreaded Wave2D; to implement a CUDA-enabled Wave2D. Work Completed: During this quarter I created the Wave2D application in the C language as a single-threaded, multithreaded and CUDA-enabled application. The reason for this is that CUDA is an extension of C, and thus we need a baseline against which the other versions of the program can be measured. This baseline illustrates how much faster the CUDA-enabled program is versus the single-threaded and multithreaded versions. Furthermore, once the GPU version of the MASS library is implemented, we can compare the results to identify whether there is any difference in the program's total execution time due to potential MASS library overhead. If any major delays in execution are found, the problem areas will be identified and corrected.
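For readers curious what the CUDA-enabled Wave2D step could look like, here is a hedged sketch (our own code, using a standard 2D finite-difference wave update; names and scheme are illustrative, not the MASS library's actual API):

```cuda
// Hypothetical sketch of one Wave2D time step as a CUDA kernel.
__global__ void wave2d_step(const float* prev, const float* curr,
                            float* next, int n, float c2dt2_dd2) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i <= 0 || j <= 0 || i >= n - 1 || j >= n - 1) return;  // fixed boundary
    int idx = i * n + j;
    float laplacian = curr[idx - n] + curr[idx + n]
                    + curr[idx - 1] + curr[idx + 1] - 4.0f * curr[idx];
    next[idx] = 2.0f * curr[idx] - prev[idx] + c2dt2_dd2 * laplacian;
}

// Host side: launch with a 2D grid, e.g.
//   dim3 block(16, 16);
//   dim3 grid((n + 15) / 16, (n + 15) / 16);
//   wave2d_step<<<grid, block>>>(d_prev, d_curr, d_next, n, k);
// then rotate the prev/curr/next device pointers each time step.
```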
  • Enabling Technologies for Distributed and Cloud Computing
Enabling Technologies for Distributed and Cloud Computing. Dr. Sanjay P. Ahuja, Ph.D., Fidelity National Financial Distinguished Professor of CIS, School of Computing, UNF. Multi-core CPUs and multithreading technologies: CPUs today assume a multi-core architecture with dual, quad, six, or more processing cores. The clock rate increased from 10 MHz for the Intel 286 to 4 GHz for the Pentium 4 in 30 years. However, the clock rate reached its limit on CMOS chips due to power limitations: clock speeds cannot continue to increase because of excessive heat generation and current leakage. Multi-core CPUs can handle multiple instruction threads. The L1 cache is private to each core, the L2 cache is shared, and the L3 cache or DRAM is off the chip. Examples of multi-core CPUs include the Intel i7 and Xeon and the AMD Opteron. Each core can also be multithreaded; e.g., the Niagara II has 8 cores, with each core handling 8 threads, for a maximum of 64 threads. Hyper-Threading (HT) technology: a feature of certain Intel chips (such as Xeon and i7) that makes one physical core appear as two logical processors. At the operating system level, a single-core CPU with Hyper-Threading technology is reported as two logical processors, a dual-core CPU with HT is reported as four logical processors, and so on. HT adds a second set of general, control and special registers. The second set of registers allows the CPU to keep the state of both logical cores and effortlessly switch between them by switching the register set.
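A minimal OpenMP host program (our example, in line with the OpenMP usage of the main paper above) makes the logical-processor count visible: with HT enabled, omp_get_num_procs() reports twice the number of physical cores.

```c
// Minimal OpenMP sketch: query the logical processors (which include
// hyper-threaded ones) and run one thread per logical processor.
// Assumed build line: gcc -fopenmp ht_probe.c -o ht_probe
#include <omp.h>
#include <stdio.h>

int main(void) {
    printf("logical processors: %d\n", omp_get_num_procs());
    #pragma omp parallel  /* defaults to one thread per logical processor */
    {
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}
```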