Jennifer Moore

Pipeline

Pipelining is a technique used in the Xeon Phi processor that divides the fetch-execute cycle into several steps. Today’s computers process instructions faster by using pipelining, which makes it possible to work on multiple instructions at the same time. Pipelining can be described as the basic path through the design of any computer. Shantanu Dutt explains a pipeline as a “concept in which the entire processing flow is broken up into multiple stages, and a new data/instruction is processed by a stage potentially as soon as it is done with the current data/instruction, which then goes onto the next stage for further processing” (Dutt, 2001). Pipelining resembles the Very Long Instruction Word approach in that both rely on parallelism and on separate steps that work together in parallel. The processor can work on more than one instruction at the same time, each in a different stage. These stages describe the steps needed to perform a fetch-execute cycle. For example, each step can be compared to the instructions given to the Little Man Computer (LMC) so that it can execute an operation. Each step is timed, and the result of one step can delay the steps that follow (Englander, 2009). Pipelining is similar to the LMC; however, the LMC uses operation codes to determine what happens to a number input by the user, whereas pipelining uses separate stages to set up a path that moves instructions and data through the processor on a timed basis (Englander, 2009).

Disk Cache

A disk cache is an area of main memory, or memory integrated into most newer disk drives. The disk cache makes it possible to access information from the disk faster by storing frequently used data in temporary memory so that it is promptly accessible. Englander explains: “when a disk read or write request is made, the system checks the disk cache first.
If the required data is present, no disk access is necessary; otherwise, a disk cache line made up of several adjoining disk blocks is moved from the disk into the disk cache area of memory” (Englander, 2009). Caching allows the system to temporarily store commonly used data where it can be reached quickly without accessing the disk. The diagram below shows the server accessing data from the disk cache. If the server finds the requested data in the disk cache, it does not have to access the disk. Once data is accepted and stored, Vickovic, Celar, and Mudnic explain that “when the request is stored, the amount of free space on Disk Cache is decreased and it is pushed on cache queue” (Vickovic, Celar & Mudnic, 2011). In a Xeon Phi system, data held in the disk cache can be transferred faster than data read directly from the drive itself.

[Diagram: server accessing the disk cache (similar to the one in Vickovic, Celar & Mudnic, 2011)]

Very Long Instruction Word (VLIW)

Very long instruction word (VLIW) is a processor architecture, discussed here in the context of the Xeon Phi processor, in which a single instruction word carries several operations so that programs execute efficiently. According to the Englander text, the main purpose of this architecture is “to increase execution speed by processing instruction operations in parallel” (Englander, 2009). A program that must run at high speed can benefit from VLIW. A VLIW processor contains numerous functional units that work together so that the program runs faster. Binu Mathew explains VLIW as “one particular style of processor design that tries to achieve high levels of parallelism by executing long instruction words composed of multiple operations” (Philips, 2008). A CPU with a VLIW design is therefore one way to run programs quickly and efficiently.

One example of a VLIW design is the Transmeta Crusoe processor.
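The idea of a long instruction word "composed of multiple operations" can be illustrated with a small sketch. It is only a toy model under stated assumptions: the names `Operation` and `execute_bundle` are hypothetical, registers are a plain dictionary, and nothing here reflects the actual instruction format of the Xeon Phi or of any real VLIW processor. The one property it tries to show is that all operations in a word read the register values that existed before the word was issued, which is what lets them run in parallel.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Hypothetical operation format for illustration only:
# (destination register, function, first source register, second source register)
Operation = Tuple[str, Callable[[int, int], int], str, str]


def execute_bundle(registers: Dict[str, int], bundle: List[Operation]) -> Dict[str, int]:
    """Execute all operations of one long instruction word as a single step.

    Every operation reads its sources from the register state as it was
    before the bundle started, modelling operations issued in parallel.
    """
    destinations = [op[0] for op in bundle]
    if len(set(destinations)) != len(destinations):
        raise ValueError("operations in one bundle must write distinct registers")

    snapshot = dict(registers)  # state seen by every operation in the bundle
    results = {dest: fn(snapshot[a], snapshot[b]) for dest, fn, a, b in bundle}

    updated = dict(registers)   # leave the caller's dictionary untouched
    updated.update(results)
    return updated


if __name__ == "__main__":
    regs = {"r1": 2, "r2": 3, "r3": 10, "r4": 4, "r5": 0, "r6": 0}
    # Two independent operations packed into one instruction word.
    word = [("r5", operator.add, "r1", "r2"), ("r6", operator.sub, "r3", "r4")]
    print(execute_bundle(regs, word))
```

In this sketch the compiler's job, which is to find operations that do not depend on each other, is assumed to have been done already; the check for distinct destination registers is the only dependency rule enforced.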
The Transmeta Crusoe uses a long instruction word made up of several operations. Englander explains it as “a 128 bit instruction word called molecule. The molecule is divided into four 32-bit atoms. Each atom represents an operation similar to those of a normal 32-bit instruction word” (Englander, 2009). The diagram below demonstrates the 128-bit instruction word that Englander describes. Like the LMC, the Crusoe performs a fetch-execute cycle, and both can add, load, branch on condition, and store numbers. Each molecule carries up to four operations, one per atom. The atoms work together to complete the execution cycle, and through parallelism two cycles can proceed simultaneously (Englander, 2009).

References

Pipeline

Englander, I. (2009). The architecture of computer hardware, systems software, & networking (4th ed., p. 253). Wiley.

Dutt, S. (2001). Pipeline basics-lecture notes #14. Retrieved from http://www.ece.uic.edu/~dutt/courses/ece366/lect-notes.html

Disk Cache

Vickovic, L., Celar, S., & Mudnic, E. (2011). Disk array simulation model development. Retrieved from http://ehis.ebscohost.com.proxygsu-sct1.galileo.usg.edu/eds/pdfviewer/pdfviewer?sid=1eb5e2fd-ad53-4fad-8dc2-8357a74e92b8@sessionmgr14&vid=6&hid=101

Englander, I. (2009). The architecture of computer hardware, systems software, & networking (4th ed., p. 263). Wiley.

Very Long Instruction Word

Englander, I. (2009). The architecture of computer hardware, systems software, & networking (4th ed., p. 244). Wiley.

Philips. (2008). An introduction to very-long instruction word (vliw) computer architecture. Retrieved from http://twins.ee.nctu.edu.tw/courses/ca_08/literature/11_vliw.pdf

IT5200 Kornchai Anujhun

Ring Bus

A ring bus is a substation switching arrangement that may consist of four, six, or more breakers connected in a closed loop, with the same number of connection points. Figure 1 depicts the layout of a ring bus configuration, which is an extension of the sectionalized bus.
In the ring bus, a sectionalizing breaker has been added between the two open bus ends. In other words, the bus forms a closed loop with each section separated by a circuit breaker. This provides greater reliability and allows for flexible operation.

Figure 1. Ring bus

Figure 2. 4-Breaker Ring Bus in ATI Graphic Card

USB

Universal Serial Bus, also known as USB, is a standard type of connection for many different kinds of devices. Generally, USB refers to the types of cables and connectors used to connect these external devices to computers. The Universal Serial Bus standard has been extremely successful. USB ports and cables are used to connect hardware such as printers, scanners, keyboards, mice, flash drives, external hard drives, joysticks, cameras, and more to computers of all kinds, including desktops, tablets, and laptops. In fact, USB has become so common that the connection is available on nearly any computer-like device, such as video game consoles, home audio/visual equipment, and even many automobiles. Many portable devices, like smartphones, eBook readers, and small tablets, use USB primarily for charging. USB charging has become so common that it is now easy to find replacement electrical outlets at home improvement stores with USB ports built in, negating the need for a USB power adapter.

Figure 3. USB Connection

Memory Interleaving

Memory interleaving is a method for increasing the speed of high-end microprocessor systems. It is a memory access technique that divides the system memory into a series of equal-sized banks. These banks are described in terms of n-way interleaving: 2-way interleaving uses two complete address buses, 4-way interleaving uses four complete address buses, and 8-way interleaving uses eight complete address buses.
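The division of memory into n equal banks can be sketched in a few lines. This is a minimal illustration that assumes the common low-order interleaving scheme, in which consecutive word addresses are assigned to consecutive banks; the function names `locate` and `bank_contents` are hypothetical and chosen only for this example.

```python
from typing import List, Tuple


def locate(word_address: int, num_banks: int) -> Tuple[int, int]:
    """Map a logical word address to (bank number, index inside that bank).

    Assumes low-order interleaving: the bank is the address modulo the
    number of banks, and the index is the integer quotient.
    """
    if num_banks < 1:
        raise ValueError("num_banks must be at least 1")
    if word_address < 0:
        raise ValueError("word_address must be non-negative")
    return word_address % num_banks, word_address // num_banks


def bank_contents(total_words: int, num_banks: int) -> List[List[int]]:
    """List which logical word addresses end up in each physical bank."""
    banks: List[List[int]] = [[] for _ in range(num_banks)]
    for address in range(total_words):
        bank, _ = locate(address, num_banks)
        banks[bank].append(address)
    return banks


if __name__ == "__main__":
    # 2-way interleaving: even addresses fall in bank 0, odd ones in bank 1.
    print(bank_contents(8, 2))
    # 4-way interleaving spreads the same addresses over four banks.
    print(bank_contents(8, 4))
```

Because neighbouring addresses land in different banks under this assumption, a sequential access pattern keeps switching banks, which is what allows one bank to start the next access while another is still busy.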
While one section is busy processing a word at a particular location, the other section accesses the word at the next location.

Figure 4. 2-way Interleaved Memory

In a 2-way interleaved memory system, there are two physical banks of DRAM, but logically the system sees one bank of memory that is twice as large. In the interleaved bank, the first long word of bank 0 is followed by the first long word of bank 1, which is followed by the second long word of bank 0, which is followed by the second long word of bank 1, and so on. Figure 4 shows this organization for two physical banks of N long words. All even long words of the logical bank are located in physical bank 0, and all odd long words are located in physical bank 1.

References

Schuette, M. (2011, Jan 02). Intel’s Sandy Bridge I. Architecture & CPU Performance. One Ring Bus to Master Them All. Retrieved from http://www.lostcircuits.com/mambo//index.php?option=com_content&task=view&id=98&Itemid=1&limit=1&limitstart=6

Wikipedia. (2011, Dec). Network Topology. Retrieved from http://en.wikipedia.org/wiki/Network_topology

Shimpi, A. (2010, Sep 14). Intel’s Sandy Bridge Architecture Exposed. Retrieved from http://www.anandtech.com/show/3922/intels-sandy-bridge-architecture-exposed/4

Wikipedia. (2012, Dec). Universal Serial Bus. Retrieved from http://en.wikipedia.org/wiki/Universal_Serial_Bus

PCMag. USB Definition. Retrieved from http://www.pcmag.com/encyclopedia_term/0,2542,t%3DUSB&i%3D53531,00.asp

Matloff, N. (2003, Nov 05). Memory Interleaving. Retrieved from http://heather.cs.ucdavis.edu/~matloff/154A/PLN/Interleaving.pdf

ORNL Physics Division. Interleaved Memory. Retrieved from http://www.phy.ornl.gov/csep/ca/node19.html

Execution Unit

An execution unit, also called a functional unit, is a part of the CPU that performs the operations and calculations called for by the Branch Unit, which receives data from the CPU.
It may have its own internal control sequence unit, some registers, and other internal units such as a sub-ALU.