Dr. Ernesto Gomez : CSE 401 Chapter 5 Topics

We will concentrate on sections 5.3-5.8, with some excursions into others if there is time.

1. Memory

The so-called memory hierarchy is something that comes up in the Von Neumann model of computation. A brief recap: our prime model for an algorithm is the Turing Machine, a finite state machine that stores program, data, and work space on a rewriteable tape. The key is being able to move back and forth on the tape, and the ability to write something, go do something else, then come back to what you wrote, in any order and any number of times. Consider our three basic machines. Finite Automaton: no memory; information is stored in the structure but cannot be changed. Pushdown Automaton: add a stack; now information can be stored, but it can only be retrieved in a specific order (last-in, first-out), and to get at information deeper in the stack you must delete the information above it. Turing Machine: replace the stack with a tape; now you can read and write data in any order, and access it as many times as you need. Unlike the simpler automata, the T.M. can store information about what to do when it reads data - it is programmable. Nothing we know is more computationally powerful than the T.M.

The Von Neumann machine, which is the model we use in most modern computers, is no more powerful than a T.M. in terms of the problems it can solve - but it is faster and easier to program. The idea is that we have a CPU with a limited amount of local workspace, and a separate memory, which can store arbitrarily large amounts of data and which allows random access (the random access property: accessing the information stored at any address in memory takes constant time). This immediately gives the Von Neumann machine a speed and ease-of-programming advantage over the TM: accessing data on a tape takes linear time, because you need to move the tape to the data location, whereas random access can be done in a single instruction, without needing to loop over the tape to reach the data. TM and Von Neumann are not the only models we could use - neural networks, the lambda calculus, cellular automata, and other models are all computationally equivalent to a TM; each has advantages and disadvantages, and a different data access model. Von Neumann is currently the model that we can most easily build, and that we know how to program.

1.1. The memory hierarchy.

We generally have multiple levels of memory; how many levels and what they are depends on physics, technology, and commercial reasons. The first computers we built using the Von Neumann model were just that - CPU and memory. But the memory has since split into multiple layers - a modern machine might have: CPU - cache (possibly multiple levels) - RAM - SSD (solid state drive) - hard drive (mechanical) - removable storage (tape, DVD, other).

1.1.1. Physics and engineering.

It is not physically possible to build memory that combines speed, arbitrary size, and random access, for two basic reasons. First, anything we build has finite size, the speed of light is c ≈ 3 × 10^8 meters/second, and we cannot communicate information faster than light (electric current transmits signals slower than light; it is almost c on a straight, highly conductive wire, and slower in circuits, semiconductors, and other materials). The only way we could get real random access would be to arrange all the memory on a sphere of fixed radius r around the CPU. Second, since our memory elements have finite size, the number of memory elements we can fit is limited by the area of the sphere, so if we need to add memory, r increases and the memory gets slower. It is not practical to enclose a CPU in a sphere of memory, anyway. If we arrange memory in a flat grid (how we actually build circuits), then it is not random access, because different memory elements are at different distances from the input/output connection.

We can simulate random access (recall: random access means same time, not fast!) by using a clock-driven circuit (like we do in the CPU) and setting the clock rate to allow the slowest access time of any memory element on the chip - as in a pipeline, the cycle time is the time needed by the slowest element. By the way, this implies that if you have multiple memory chips in your computer, the clock time is set by the slowest chip (best performance when all the chips match!). It also implies, all other things being equal: bigger memory capacity → more circuitry → longer wires → slower.

How much does this matter? Converting units on the speed of light, we get about 30 cm/nanosecond, so 10 cm takes about 1/3 nanosecond. Since a memory chip is physically much smaller than 10 cm, this may not matter much within an individual chip - placement on the motherboard can matter, however. Recalling that wires don't always take a direct path between two points in a circuit, you can see that, for a 3 GHz clock (a typical midrange workstation), it can take one or two CPU cycles for a signal from the CPU to reach memory.
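As a rough worked version of the argument above (the memory-cell area a is a symbol introduced here for illustration; the 10 cm distance and the 3 GHz clock are the figures from the text):

\[
  N \approx \frac{4\pi r^2}{a}
  \quad\Longrightarrow\quad
  r \propto \sqrt{N}
  \quad\Longrightarrow\quad
  t_{\text{access}} \ge \frac{r}{c} \propto \sqrt{N},
\]
\[
  t_{\text{wire}} = \frac{d}{c} \approx \frac{0.10\ \text{m}}{3 \times 10^{8}\ \text{m/s}} \approx 0.33\ \text{ns},
  \qquad
  T_{\text{cycle}} = \frac{1}{3\ \text{GHz}} \approx 0.33\ \text{ns}.
\]

So the best-case signal delay grows with the square root of capacity even in the idealized sphere layout, and an ideal 10 cm signal path already costs about one cycle each way - roughly the one or two cycles quoted above for a round trip between CPU and memory.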
1.1.2. Technology.

We can store bits in many ways - the fastest right now is as the on-off state of a transistor, which is what we do in CPU registers. The disadvantage is that transistors require power to hold their state. Many transistors packed into a CPU chip use a lot of power and generate a lot of heat, particularly if we want to be able to switch states in under a nanosecond. It is also expensive to build the chips; between expense and power requirements, we cannot build large amounts of storage this way. We use transistor storage for the CPU's working memory, and also for cache on the CPU chip - anywhere else it is too expensive and takes too much power. With current technology we can put megabytes of memory on the CPU chip, but we need gigabytes to hold programs, data, and the operating system.

Current main memory is built on capacitors - these store a voltage from 0 (value 0) up to between 1 and 2 volts (value 1) in current DDR2 and later memory. It would be theoretically possible to store more than one bit by using a range of voltages, but storing a single bit is much more error resistant, because we don't care what the exact voltage is, just whether there is a voltage. Voltage in a capacitor depends on a difference between - and + charge on different plates with an insulator between them, and the attractive electric field between the two charges tends to maintain the voltage without power consumption. Capacitors leak due to contact with the surrounding material, so they need to be refreshed periodically - circuitry in the memory chip periodically reads and rewrites all bits, which resets the charge for bits set to 1. This only happens about every microsecond, for memory with a clock of 1 to 2 GHz, so it does not greatly reduce performance (see Wikipedia, "computer memory refresh", for more info). This applies to dynamic RAM - DRAM. There is also static RAM (SRAM), but it costs more, has more complicated circuitry, and has lower storage density, so it is used mostly in specialized applications.
DRAM means that when we turn off the computer we lose the memory contents (because memory refresh takes power). So we use other devices - typically solid state drives (SSD) or spinning drives (classic hard drives with magnetic storage on a spinning disk) - to store data and programs when the power is off.

1.1.3. Memory hierarchy.

The "memory hierarchy" is the term used to describe the multiple layers of memory storage required by our machine model and technology. The TM, of course, has only one layer of memory - the tape. The Von Neumann architecture explicitly names two layers: 1. working storage inside the CPU (current technology is registers, but a stack has been used in the past); 2. RAM. In modern practice, we have the following:

(1) CPU registers - work space for computation. Fastest, high power draw, complicated circuitry, high cost. Generally between 10 and 100 registers, total storage ~8 KB; access time << 1 cycle, even for cycle times under 1/2 nanosecond.

(2) Cache - fast storage for the program and data we are processing, located in the CPU chip. Access time between 1 and ~10 CPU cycles (mostly from more complex addressing and longer wires); same storage technology as registers, but simpler circuitry, so lower cost per bit. Usually 1-4 MB per CPU core. Some systems also have off-chip cache, which can be up to several hundred MB but runs slower than the CPU; its latency falls between on-chip cache and DRAM.

(3) RAM (SDRAM) - capacitive storage, clock typically between 1 and 2 GHz for bandwidth, but latency is ~10 to 60 nanoseconds. Currently from over 8 GB up to ~1 TB for large multicore servers. Cost per bit much lower than cache.

(4) SSD - permanent solid state storage. Access time ~microseconds, size ~0.5 to 2 TB, cost per bit lower than RAM. These have come down in price and have become much more common in the last ~5 years. Sometimes a smaller SSD is combined in the same machine with a larger, cheaper hard drive. SSD storage is subject to wear - it can only take a certain number of read/write cycles. This is getting better, but SSDs are still not considered permanent data storage.

(5) Hard drives - permanent spinning magnetic disks. Lower cost per bit than SSD or RAM, size ~1 to 10 TB, latency in milliseconds.
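To see these levels from software, here is a minimal C sketch (not from the notes; the 256 MB array size, the 64-byte cache line, and the gcc build line are assumptions about a typical Linux/x86 machine). It times a cache-friendly sequential pass over a large array against a strided pass that touches one element per cache line, so almost every strided load pays DRAM latency while the sequential pass mostly hits cache:

/*
 * Illustrative sketch only: compare cache-friendly vs cache-hostile access.
 * Assumed build: gcc -O2 latency_demo.c -o latency_demo
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64 * 1024 * 1024)   /* 64 Mi ints = 256 MB, far larger than any cache */
#define STRIDE 16              /* 16 ints = 64 bytes, a typical cache line size   */

static double seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    int *a = malloc((size_t)N * sizeof *a);
    if (!a) { perror("malloc"); return 1; }
    for (size_t i = 0; i < N; i++) a[i] = (int)i;   /* touch every page once */

    volatile long sum = 0;
    double t0, t1;

    /* Sequential pass: consecutive addresses, so most loads hit cache lines
       that were just brought in - close to cache/DRAM streaming bandwidth. */
    t0 = seconds();
    for (size_t i = 0; i < N; i++) sum += a[i];
    t1 = seconds();
    printf("sequential: %.3f s\n", t1 - t0);

    /* Strided pass: one access per cache line across the whole 256 MB array,
       repeated for each offset, so almost every load misses cache and pays
       DRAM latency - same total number of additions as the sequential pass. */
    t0 = seconds();
    for (size_t s = 0; s < STRIDE; s++)
        for (size_t i = s; i < N; i += STRIDE)
            sum += a[i];
    t1 = seconds();
    printf("strided:    %.3f s\n", t1 - t0);

    free(a);
    return (int)(sum & 1);   /* keep the compiler from discarding the sums */
}

On a typical desktop the strided pass usually runs several times slower even though both passes perform the same number of additions; the difference is simply which level of the hierarchy services the loads.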
