
JAYARAJ ANNAPACKIAM CSI COLLEGE OF ENGINEERING

Department of Information Technology

CS6303 COMPUTER ARCHITECTURE Anna University 2 & 16 Mark Questions & Answers Year / Semester: III IT A & B / V

Regulation: 2013

Academic year: 2017 - 2018

CS6303 UNIT-I OVERVIEW & INSTRUCTIONS

1. What are the eight great ideas in computer architecture? Apr/May 2015
The eight great ideas in computer architecture are: 1. Design for Moore’s Law 2. Use Abstraction to Simplify Design 3. Make the Common Case Fast 4. Performance via Parallelism 5. Performance via Pipelining 6. Performance via Prediction 7. Hierarchy of Memories 8. Dependability via Redundancy

2. Give an example each of zero-address, one-address, two-address and three-address instructions. Apr/May 2014
Zero-Address: CMA
One-Address: AC <- AC + B
Two-Address: ADD A,B
Three-Address: ADD C,A,B

3. Define – Instruction Set Architecture. Nov/Dec 2015
The instruction set architecture, or simply architecture, of a computer is the interface between the hardware and the lowest-level software. It includes anything programmers need to know to make a binary machine language program work correctly, including instructions, registers, I/O devices, and so on.

4. List the major components of a computer system. Apr/May 2017
 Input Unit  Output Unit  Memory Unit  ALU  Control Unit

5. State the need for indirect addressing mode. Give an example.
The instruction contains the address of a memory location which in turn holds the address of the operand. The effective address of the operand is the contents of a register or of the main memory location whose address is given explicitly in the instruction.

6. Define – Response Time and Throughput. May/June 2012
Response time is also called execution time. The total time required for the computer to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, CPU execution time, and so on, is called response time. Throughput or bandwidth is the total amount of work done in a given time.

8. Write the CPU performance equation.
The classic CPU performance equation in terms of instruction count (the number of instructions executed by the program), CPI, and clock cycle time is:
CPU time = Instruction count × CPI × Clock cycle time = (Instruction count × CPI) / Clock rate

9. If computer A runs a program in 10 seconds, and computer B runs the same program in 15 seconds, how much faster is A than B? Nov/Dec 2015
Performance of A / Performance of B = Execution time of B / Execution time of A = 15 / 10 = 1.5, so A is 1.5 times faster than B.
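To make the two formulas above concrete, here is a minimal C sketch that evaluates the CPU performance equation and the speedup of question 9; the instruction count, CPI and clock rate values are made up purely for illustration.

    #include <stdio.h>

    int main(void) {
        /* Illustrative values only, not taken from the question bank. */
        double instruction_count = 2.0e9;   /* instructions executed           */
        double cpi               = 1.5;     /* average clock cycles per instr. */
        double clock_rate        = 2.0e9;   /* Hz (2 GHz)                      */

        /* Classic CPU performance equation:
           CPU time = Instruction count x CPI / Clock rate */
        double cpu_time = instruction_count * cpi / clock_rate;
        printf("CPU time = %.2f seconds\n", cpu_time);

        /* Question 9: A runs the program in 10 s, B in 15 s.
           Speedup of A over B = Execution time of B / Execution time of A. */
        double time_a = 10.0, time_b = 15.0;
        printf("A is %.1f times faster than B\n", time_b / time_a);
        return 0;
    }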

10. What are the basic components of performance?
The basic components of performance and how each is measured are:
 CPU execution time for a program – seconds for the program
 Instruction count – instructions executed for the program
 Clock cycles per instruction (CPI) – average number of clock cycles per instruction
 Clock cycle time – seconds per clock cycle

11. Write the formula for CPU execution time for a program. Nov/Dec 2015
CPU execution time = CPU clock cycles for the program × Clock cycle time = CPU clock cycles for the program / Clock rate

12. Write the formula for CPU clock cycles required for a program. Nov/Dec 2014
CPU clock cycles = Instructions for the program × Average clock cycles per instruction (CPI)

13. Define – MIPS
Million Instructions Per Second (MIPS) is a measurement of program execution speed based on the number of millions of instructions executed per second. MIPS is computed as:
MIPS = Instruction count / (Execution time × 10^6)

14. What are the fields in a MIPS instruction? Nov/Dec 2014
The MIPS R-format fields are: op (6 bits), rs (5 bits), rt (5 bits), rd (5 bits), shamt (5 bits), funct (6 bits), where
op: basic operation of the instruction, traditionally called the opcode.
rs: the first register source operand.
rt: the second register source operand.
rd: the register destination operand; it gets the result of the operation.
shamt: shift amount.
funct: function.
(A small encoding sketch follows question 18 below.)

15. Write an example for an immediate operand.
The quick add instruction with one constant operand is called add immediate or addi. To add 4 to register $s3, we just write addi $s3,$s3,4 # $s3 = $s3 + 4

16. Define – Stored Program Concept Apr/May 2015
Today’s computers are built on two key principles: 1. Instructions are represented as numbers. 2. Programs are stored in memory to be read or written, just like data. These principles lead to the stored-program concept. Treating instructions in the same way as data greatly simplifies both the memory hardware and the software of computer systems.

17. Define – Addressing Modes Nov/Dec 2008
The different ways in which the operands of an instruction are specified are called addressing modes. The MIPS addressing modes are the following: 1. Immediate addressing 2. Register addressing 3. Base or displacement addressing 4. PC-relative addressing 5. Pseudodirect addressing

18. State Amdahl’s Law. Define SPEC rating. May/June 2012
Amdahl’s Law states that the performance improvement obtained by using a faster mode of execution is limited by the fraction of the time the faster mode can be used.
The SPEC rating is given by:
SPEC rating = Running time on the reference computer / Running time on the computer under test
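As a sketch of how the six R-format fields of question 14 pack into a 32-bit word, the following C fragment shifts each field into place; the example instruction add $t0,$s1,$s2 uses the usual MIPS register numbers ($s1 = 17, $s2 = 18, $t0 = 8).

    #include <stdio.h>
    #include <stdint.h>

    /* Pack the six MIPS R-format fields into a 32-bit instruction word.
       Field widths: op 6, rs 5, rt 5, rd 5, shamt 5, funct 6 bits. */
    static uint32_t encode_rtype(uint32_t op, uint32_t rs, uint32_t rt,
                                 uint32_t rd, uint32_t shamt, uint32_t funct) {
        return (op << 26) | (rs << 21) | (rt << 16) |
               (rd << 11) | (shamt << 6) | funct;
    }

    int main(void) {
        /* add $t0,$s1,$s2 -> op=0, rs=17 ($s1), rt=18 ($s2), rd=8 ($t0),
           shamt=0, funct=32. */
        uint32_t word = encode_rtype(0, 17, 18, 8, 0, 32);
        printf("add $t0,$s1,$s2 encodes as 0x%08X\n", (unsigned)word); /* 0x02324020 */
        return 0;
    }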

19. What is relative addressing mode? When is it used? May/June 2012
The referenced register is the program counter (PC), and hence this addressing mode is also known as PC-relative addressing. The effective address is determined by adding the contents of the PC to the address field: EA = PC + address part of the instruction. The address part is a signed number, so the branch target location can be either before or after the branch instruction. This addressing mode is commonly used to specify the target address in branch instructions. (A small worked sketch follows question 20 below.)

20. Distinguish pipelining from parallelism. Apr/May 2015
Pipelining is a method of increasing system throughput and performance by using the inherent parallelism in instructions. In pipelining, instruction execution is divided into stages and executed using processing elements connected in series. These processing elements often operate in parallel to execute two or more instructions at the same time. Parallelism is the simultaneous execution of the same task on multiple processors in order to obtain better performance.
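A small worked sketch of the PC-relative addressing of question 19, written for the MIPS flavour where the 16-bit offset is counted in words and added to PC + 4; the addresses and offsets below are made-up examples.

    #include <stdio.h>
    #include <stdint.h>

    /* PC-relative (branch) addressing: the target is the address of the next
       instruction (PC + 4) plus the sign-extended 16-bit offset, counted in
       words, i.e. multiplied by four. */
    static uint32_t branch_target(uint32_t pc, int16_t offset_in_words) {
        return pc + 4 + (uint32_t)((int32_t)offset_in_words * 4);
    }

    int main(void) {
        /* Hypothetical branch at 0x00400020: offsets of +5 and -3 words. */
        printf("0x%08X\n", (unsigned)branch_target(0x00400020, 5));   /* 0x00400038 */
        printf("0x%08X\n", (unsigned)branch_target(0x00400020, -3));  /* 0x00400018 */
        return 0;
    }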

PART - B

1. Discuss about the various techniques to represent instructions in a computer system. Apr/May 2015
2. What is the need for addressing in a computer system? Explain the different addressing modes with suitable examples. Apr/May 2015, May/June 2014
3. Discuss in detail about the eight great ideas of computer architecture. Apr/May 2015, May/June 2014
4. Explain in detail about technologies for building processors and memory. Nov/Dec 2014
5. Discuss in detail the various measures of performance of a computer. Apr/May 2015
6. Explain operations and operands of computer hardware in detail and discuss the logical operations and control operations of a computer. May/June 2014

UNIT-II ARITHMETIC OPERATIONS

1. Add 6ten to 7ten in binary and subtract 6ten from 7ten in binary.
Addition: 0111two + 0110two = 1101two (= 13ten)

Subtraction directly: 0111two – 0110two = 0001two (= 1ten)

Or via addition of the two’s complement of –6: 0111two + 1010two = 0001two (the carry out of the most significant bit is discarded)

2. Write the overflow conditions for addition and subtraction.

Operation   Operand A   Operand B   Result indicating overflow

A + B       ≥ 0         ≥ 0         < 0
A + B       < 0         < 0         ≥ 0
A – B       ≥ 0         < 0         < 0
A – B       < 0         ≥ 0         ≥ 0
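The four rows of this table translate directly into sign checks; a minimal C sketch for 32-bit two’s-complement operands:

    #include <stdio.h>
    #include <stdint.h>

    /* Overflow rules from the table above: overflow occurs when the operand
       signs make the observed result sign impossible. Unsigned casts keep the
       wraparound arithmetic well defined. */
    static int add_overflows(int32_t a, int32_t b) {
        int32_t r = (int32_t)((uint32_t)a + (uint32_t)b);
        return (a >= 0 && b >= 0 && r < 0) || (a < 0 && b < 0 && r >= 0);
    }

    static int sub_overflows(int32_t a, int32_t b) {
        int32_t r = (int32_t)((uint32_t)a - (uint32_t)b);
        return (a >= 0 && b < 0 && r < 0) || (a < 0 && b >= 0 && r >= 0);
    }

    int main(void) {
        printf("%d\n", add_overflows(INT32_MAX, 1)); /* 1: (>=0) + (>=0) gave < 0      */
        printf("%d\n", sub_overflows(INT32_MIN, 1)); /* 1: (<0) - (>=0) gave >= 0      */
        printf("%d\n", add_overflows(5, -7));        /* 0: mixed signs cannot overflow */
        return 0;
    }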

3. Define – Moore’s Law
Moore’s Law is the observation that integrated circuit resources double roughly every 18–24 months. Moore’s Law has provided so much more in resources that hardware designers can now build much faster multiplication and division hardware. Whether the multiplicand is to be added or not is known at the beginning of the multiplication by looking at each of the 32 multiplier bits.

4. What are the floating point instructions in MIPS? Nov/Dec 2014
MIPS supports the IEEE 754 single precision and double precision formats with these instructions: ■ Floating-point addition ■ Floating-point subtraction ■ Floating-point multiplication ■ Floating-point division ■ Floating-point comparison ■ Floating-point branch

5. Define – Guard and Round. Nov/Dec 2014
Guard is the first of two extra bits kept on the right during intermediate calculations of floating point numbers. It is used to improve rounding accuracy. Round is a method to make the intermediate floating-point result fit the floating-point format; the goal is typically to find the nearest number that can be represented in the format. IEEE 754, therefore, always keeps two extra bits on the right during intermediate additions, called guard and round, respectively.

6. Define – ULP
Units in the Last Place (ULP) is the number of bits in error in the least significant bits of the significand between the actual number and the number that can be represented.

7. What is meant by sticky bit?
Sticky bit is a bit used in rounding, in addition to guard and round, that is set whenever there are nonzero bits to the right of the round bit. This sticky bit allows the computer to see the difference between 0.50…00ten and 0.50…01ten when rounding.

8. Write the IEEE 754 floating point format.
The IEEE 754 single precision format has a 1-bit sign S, an 8-bit biased exponent, and a 23-bit fraction, and represents the value (–1)^S × (1 + Fraction) × 2^(Exponent – Bias), with a bias of 127 (double precision uses an 11-bit exponent, a 52-bit fraction, and a bias of 1023). The IEEE 754 standard floating point representation is almost always an approximation of the real number.
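A short C sketch that pulls apart the three IEEE 754 single-precision fields described in question 8; it assumes the platform’s float is the 32-bit IEEE format.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Decode IEEE 754 single precision: 1 sign bit, 8 exponent bits
       (bias 127), 23 fraction bits. */
    int main(void) {
        float x = -0.75f;                    /* -1.1two x 2^-1 */
        uint32_t bits;
        memcpy(&bits, &x, sizeof bits);      /* reinterpret the float's bit pattern */

        unsigned sign     = bits >> 31;
        unsigned exponent = (bits >> 23) & 0xFF;
        unsigned fraction = bits & 0x7FFFFF;

        printf("sign=%u exponent=%u (unbiased %d) fraction=0x%06X\n",
               sign, exponent, (int)exponent - 127, fraction);
        return 0;
    }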

9. What is meant by sub-word parallelism? Nov/Dec 2015
Given that the parallelism occurs within a wide word, the extensions are classified as sub-word parallelism. It is also classified under the more general name of data level parallelism. It has also been called vector or SIMD, for single instruction, multiple data. The rising popularity of multimedia applications led to arithmetic instructions that support narrower operations that can easily operate in parallel. For example, ARM added more than 100 instructions in the NEON multimedia instruction extension to support sub-word parallelism, which can be used either with ARMv7 or ARMv8.

10. Multiply 1000ten × 1001ten.
The product is 1001000: the multiplicand is added once for each 1 digit of the multiplier, shifted to line up with that digit.
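A minimal C sketch of the shift-and-add idea behind question 10 and the sequential multiplication hardware asked about in Part B; it treats the operands as plain unsigned integers.

    #include <stdio.h>
    #include <stdint.h>

    /* Shift-and-add multiplication: inspect one multiplier bit per step and
       add the (shifted) multiplicand whenever that bit is 1. */
    static uint64_t shift_add_multiply(uint32_t multiplicand, uint32_t multiplier) {
        uint64_t product = 0;
        uint64_t mcand   = multiplicand;      /* shifted left each step  */
        for (int step = 0; step < 32; step++) {
            if (multiplier & 1u)              /* current multiplier bit? */
                product += mcand;
            mcand <<= 1;
            multiplier >>= 1;
        }
        return product;
    }

    int main(void) {
        /* 1000two x 1001two = 1001000two, i.e. 8 x 9 = 72. */
        printf("%llu\n", (unsigned long long)shift_add_multiply(8, 9));
        return 0;
    }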

11. Divide 1,001,010ten by 1000ten.
The quotient is 1001ten and the remainder is 10ten.

12. Write the instruction format for the jump instruction. Nov/Dec 2015
The destination address for a jump instruction is formed by concatenating the upper 4 bits of the current PC + 4 to the 26-bit address field in the jump instruction and adding 00 as the 2 low-order bits.

13. What is meant by pipelining?
Pipelining is an implementation technique in which multiple instructions are overlapped in execution. Pipelining improves performance by increasing instruction throughput, as opposed to decreasing the execution time of an individual instruction.

14. What are the five steps in MIPS instruction execution? Nov/Dec 2013
1. Fetch the instruction from memory. 2. Read registers while decoding the instruction. The regular format of MIPS instructions allows reading and decoding to occur simultaneously. 3. Execute the operation or calculate an address. 4. Access an operand in data memory. 5. Write the result into a register.

15. What are the steps in floating-point addition? Nov/Dec 2013
The steps in floating-point addition are: 1. Align the decimal point of the number that has the smaller exponent. 2. Add the significands. 3. Normalize the sum. 4. Round the result.

16. What is meant by forwarding?
Forwarding, also called bypassing, is a method of resolving a data hazard by retrieving the missing data element from internal buffers rather than waiting for it to arrive from programmer-visible registers or memory.

17. What is pipeline stall?
Pipeline stall, also called bubble, is a stall initiated in order to resolve a hazard. It can be seen elsewhere in the pipeline.

18. What is meant by branch prediction?
Branch prediction is a method of resolving a branch hazard that assumes a given outcome for the branch and proceeds from that assumption rather than waiting to ascertain the actual outcome.

19. What are the 5 pipeline stages?
The 5 stages of instruction execution in a pipelined processor are: 1. IF: Instruction fetch 2. ID: Instruction decode and register file read 3. EX: Execution or address calculation 4. MEM: Data memory access 5. WB: Write back

20. What are exceptions and interrupts?
Exception, also called interrupt, is an unscheduled event that disrupts program execution; it is used to detect overflow. E.g. arithmetic overflow, using an undefined instruction. Interrupt is an exception that comes from outside of the processor. E.g. I/O device request.

PART - B
1. Explain the sequential version of multiplication and its hardware. Apr/May 2015, Nov/Dec 2014
2. Explain how floating point addition is carried out in a computer system. Give an example for a binary floating point addition. Apr/May 2015, Nov/Dec 2014
3. Explain the multiplication algorithm in detail with diagram and examples.
4. Discuss in detail about the division algorithm with diagram and examples. Apr/May 2014
5. Multiply the following pair of signed 2’s complement numbers: A = 010111, B = 101100. Nov/Dec 2015

UNIT III PROCESSOR AND CONTROL UNIT
1. What is meant by data path element?
A data path element is a unit used to operate on or hold data within a processor. In the MIPS implementation, the data path elements include the instruction and data memories, the register file, the ALU, and adders.

2. What is the use of PC register?
Program Counter (PC) is the register containing the address of the instruction in the program being executed.

3. What is meant by register file?
The processor’s 32 general purpose registers are stored in a structure called a register file.
A register file is a collection of registers in which any register can be read or written by specifying the number of the register in the file. The register file contains the register state of the computer.
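A small C sketch of the register-file idea in questions 3 and 4: an array of 32 registers read and written by register number, with register 0 wired to zero as on MIPS. It illustrates the concept only, not the hardware structure.

    #include <stdio.h>
    #include <stdint.h>

    /* A register file as a data structure: any of the 32 registers can be
       read or written by giving its number. */
    typedef struct { uint32_t regs[32]; } RegFile;

    static uint32_t regfile_read(const RegFile *rf, unsigned n) {
        return (n == 0) ? 0 : rf->regs[n & 31];
    }

    static void regfile_write(RegFile *rf, unsigned n, uint32_t value) {
        if (n != 0)                       /* writes to register 0 are ignored */
            rf->regs[n & 31] = value;
    }

    int main(void) {
        RegFile rf = {{0}};
        regfile_write(&rf, 8, 42);                            /* $t0 = 42  */
        printf("%u\n", (unsigned)regfile_read(&rf, 8));       /* prints 42 */
        printf("%u\n", (unsigned)regfile_read(&rf, 0));       /* always 0  */
        return 0;
    }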

4. What are the two state elements needed to store and access an instruction?
The two state elements needed to store and access instructions are the instruction memory and the program counter (PC).


5. Draw the diagram of the portion of the datapath used for fetching instructions.
(The portion of the datapath used for fetching instructions consists of the program counter, the instruction memory, and an adder that increments the PC to the address of the next instruction.)

6. Define – Sign Extend
Sign-extend is used to increase the size of a data item by replicating the high-order sign bit of the original data item in the high-order bits of the larger, destination data item.
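A one-function C sketch of sign extension as defined above, widening a 16-bit immediate to 32 bits:

    #include <stdio.h>
    #include <stdint.h>

    /* Sign extension: widen a 16-bit value to 32 bits by replicating its
       high-order (sign) bit into the upper 16 bits. */
    static int32_t sign_extend16(uint16_t imm) {
        return (int32_t)(int16_t)imm;     /* the cast chain replicates bit 15 */
    }

    int main(void) {
        printf("0x%08X\n", (unsigned)(uint32_t)sign_extend16(0x0004));  /* 0x00000004      */
        printf("0x%08X\n", (unsigned)(uint32_t)sign_extend16(0xFFFC));  /* 0xFFFFFFFC (-4) */
        return 0;
    }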

7. What is meant by branch target address?
Branch target address is the address specified in a branch, which becomes the new program counter (PC) if the branch is taken. In the MIPS architecture the branch target is given by the sum of the offset field of the instruction and the address of the instruction following the branch.

8. Differentiate branch taken from branch not taken.
Branch taken is a branch where the branch condition is satisfied and the program counter (PC) becomes the branch target. All unconditional jumps are taken branches. Branch not taken (or untaken branch) is a branch where the branch condition is false and the program counter (PC) becomes the address of the instruction that sequentially follows the branch.

9. What is meant by delayed branch?
Delayed branch is a type of branch where the instruction immediately following the branch is always executed, independent of whether the branch condition is true or false.

10. What is meant by pipelining?
Pipelining is an implementation technique in which multiple instructions are overlapped in execution. Pipelining improves performance by increasing instruction throughput, as opposed to decreasing the execution time of an individual instruction.

13. What is meant by forwarding?
Forwarding, also called bypassing, is a method of resolving a data hazard by retrieving the missing data element from internal buffers rather than waiting for it to arrive from programmer-visible registers or memory.

14. What is pipeline stall?
Pipeline stall, also called bubble, is a stall initiated in order to resolve a hazard. It can be seen elsewhere in the pipeline.

15. What is meant by branch prediction? Branch prediction is a method of resolving a branch hazard that assumes a given outcome for the branch and proceeds from that assumption rather than waiting to ascertain the actual outcome.

16. What is Exception?
Exceptions are internally generated unscheduled events that disrupt program execution; they are used to detect overflow. When an exception occurs, the processor saves the address of the offending instruction in the exception program counter and then transfers control to the operating system, which takes the appropriate action.

17. What is the need for speculation?
Speculation is an approach that allows the processor to guess the outcome of an instruction, so as to enable the execution of other instructions that may depend on the speculated instruction.

18. Distinguish between implicit multithreading and explicit multithreading.
Implicit multithreading refers to the concurrent execution of multiple threads extracted from a single sequential program. Explicit multithreading refers to the concurrent execution of instructions from different explicit threads, either by interleaving instructions from different threads on shared pipelines or by parallel execution on parallel pipelines.

20. Explain hazard and its types.
Any reason that causes the pipeline to stall is called a hazard. 1. Structural hazard: due to insufficient resources. 2. Data (dependence) hazard: when an instruction depends on the result of a previous instruction. 3. Control hazard: when a branch or other instruction changes the contents of the program counter.

21. Define VLIW
Very Long Instruction Word (VLIW) – it places multiple instructions in a single word. The operations which may be executed in parallel are placed in the same word.

22. Brief about Multithreading
The instruction stream is divided into several smaller streams, called threads, such that the threads can be executed in parallel. This gives a high degree of instruction level parallelism.

PART B

1. What is Hazard? Explain its types with suitable examples. Apr/May 15
2. Explain in detail about handling data and control hazards. Nov/Dec 15
3. Explain in detail how exceptions are handled in the MIPS architecture. Apr/May 16
4. Explain the basic MIPS implementation with the control implementation scheme. Apr/May 15
5. Explain different types of pipelining.

UNIT-4 PARALLELISM

1. What is Instruction Level Parallelism? (NOV/DEC 2011, 16, May 17)
Pipelining is used to overlap the execution of instructions and improve performance. This potential overlap among instructions is called instruction level parallelism (ILP).

2. Explain various types of dependences in ILP.
 Data dependences  Name dependences  Control dependences

3. What is Multithreading? (NOV/DEC 2016)
Multithreading allows multiple threads to share the functional units of a single processor in an overlapping fashion. To permit this sharing, the processor must duplicate the independent state of each thread.

4. What are multiprocessors? Mention the categories of multiprocessors.
Multiprocessors are used to increase performance and improve availability. The different categories are SISD, SIMD, MIMD.

5. What are the two main approaches to multithreading?
 Fine-grained multithreading  Coarse-grained multithreading

6. What is the need to use multiprocessors?
1. Microprocessors are the fastest CPUs; collecting several of them is much easier than redesigning one. 2. Complexity of current microprocessors: do we have enough ideas to sustain 1.5X/yr, and can we deliver such complexity on schedule? 3. Slow (but steady) improvement in parallel software (scientific apps, databases, OS). 4. Emergence of embedded and server markets driving microprocessors in addition to desktops; embedded functional parallelism, producer/consumer model. 5. Server figure of merit is tasks per hour vs. latency.

7. Write the advantages of Multithreading.
If a thread gets a lot of cache misses, the other thread(s) can continue, taking advantage of the unused computing resources, which thus can lead to faster overall execution, as these resources would have been idle if only a single thread was executed. If a thread cannot use all the computing resources of the CPU (because instructions depend on each other's result), running another thread permits to not leave these idle. If several threads work on the same set of data, they can actually share their cache, leading to better cache usage or synchronization on its values.

8. Write the disadvantages of Multithreading.
Multiple threads can interfere with each other when sharing hardware resources such as caches or translation lookaside buffers (TLBs). Execution times of a single thread are not improved but can be degraded, even when only one thread is executing. This is due to slower frequencies and/or additional pipeline stages that are necessary to accommodate thread-switching hardware. Hardware support for multithreading is more visible to software, thus requiring more changes to both application programs and operating systems than multiprocessing.

9. What is CMP?
Chip multiprocessors - also called multi-core microprocessors or CMPs for short - are now the only way to build high-performance microprocessors, for a variety of reasons. Large uniprocessors are no longer scaling in performance, because it is only possible to extract a limited amount of parallelism from a typical instruction stream using conventional superscalar instruction issue techniques. In addition, one cannot simply ratchet up the clock speed on today's processors, or the power dissipation will become prohibitive in all but water-cooled systems.

10. What is SMT?
Simultaneous multithreading, often abbreviated as SMT, is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads of execution to better utilize the resources provided by modern processor architectures.

11. Write the advantages of CMP.
CMPs have several advantages over single processor solutions: energy and silicon area efficiency i. by incorporating smaller, less complex cores onto a single chip ii. dynamically switching between cores and powering down unused cores iii. increased throughput performance by exploiting parallelism iv. multiple computing resources can take better advantage of instruction, thread, and process level parallelism.

12. What are the disadvantages of SMT?
Simultaneous multithreading cannot improve performance if any of the shared resources are limiting bottlenecks for the performance. In fact, some applications run slower when simultaneous multithreading is enabled. Critics argue that it is a considerable burden to put on software developers that they have to test whether simultaneous multithreading is good or bad for their application in various situations and insert extra logic to turn it off if it decreases performance.

13. Define superscalar processor. (Dec-15)
Superscalar describes a microprocessor design that makes it possible for more than one instruction at a time to be executed during a single clock cycle. In a superscalar design, the processor or the instruction compiler is able to determine whether an instruction can be carried out independently of other sequential instructions, or whether it has a dependency on another instruction and must be executed in sequence with it. The processor then uses multiple execution units to simultaneously carry out two or more independent instructions at a time. Superscalar design is sometimes called "second generation RISC".

14. What is Thread-level parallelism (TLP)?
 Explicit parallel programs already have TLP (inherent)  Sequential programs that are hard to parallelize or ILP-limited can be speculatively parallelized in hardware.

15. Distinguish between explicit multithreading and implicit multithreading. (May-17)
Explicit multithreading: All of the commercial processors and most of the experimental processors so far have used explicit multithreading. These systems concurrently execute instructions from different explicit threads, either by interleaving instructions from different threads on shared pipelines or by parallel execution on parallel pipelines.
Implicit multithreading: Implicit multithreading refers to the concurrent execution of multiple threads extracted from a single sequential program. These implicit threads may be defined either statically by the compiler or dynamically by the hardware.

16. Distinguish between shared memory multiprocessor and message-passing multiprocessor.
 A multiprocessor with a shared address space, where that address space can be used to communicate data implicitly via load and store operations, is a shared memory multiprocessor.
 A multiprocessor with multiple address spaces, where communication of data is done by explicitly passing messages among processors, is a message-passing multiprocessor.

17. Differentiate between strong scaling and weak scaling. (MAY-15)
Strong scalability – where the problem size is fixed and the multiprocessor expands, and our goal is to minimize the time-to-solution. Here scalability means speed-up roughly proportional to the number of processors used. For example, if we have used a multiprocessor configuration with 16 processors and have reduced the uniprocessor configuration computing time by a factor close to 16, we have achieved excellent parallel strong scalability.
Weak scalability – where the problem size and the multiprocessor size expand, and our goal is to achieve constant time-to-solution for larger problems. In this case scalability means the ability of maintaining a fixed time-to-solution while solving larger problems on larger computers.
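A tiny C sketch of the strong-scaling arithmetic in question 17: speedup = T1/Tp and efficiency = speedup/p. The timings are invented illustration values, not measurements.

    #include <stdio.h>

    /* Strong scaling: fixed problem size, so the figures of merit are the
       speedup over the uniprocessor run and the efficiency per processor. */
    int main(void) {
        double t1 = 160.0;      /* uniprocessor time, seconds    */
        double tp = 10.5;       /* time on p processors, seconds */
        int    p  = 16;

        double speedup    = t1 / tp;
        double efficiency = speedup / p;
        printf("speedup = %.2f, efficiency = %.2f\n", speedup, efficiency);
        return 0;
    }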

18. What is multicore?
At its simplest, multi-core is a design in which a single physical processor contains the core logic of more than one processor. It is as if an Intel Xeon processor were opened up and inside were packaged all the circuitry and logic for two (or more) Intel Xeon processors. The multi-core design takes several such processor "cores" and packages them as a single physical processor. The goal of this design is to enable a system to run more tasks simultaneously and thereby achieve greater overall system performance.

19. What is fine-grained multithreading? (MAY-16)
• Switches between threads on each instruction, causing the execution of multiple threads to be interleaved

• Usually done in a round-robin fashion, skipping any stalled threads • CPU must be able to switch threads every clock

20. Compare UMA and NUMA multiprocessors. (APR/MAY 2015)
Uniform Memory Access (UMA): In this model, all the processors share the physical memory uniformly. All the processors have equal access time to all the memory words. Each processor may have a private cache memory. The same rule is followed for peripheral devices.
Non-uniform Memory Access (NUMA): In the NUMA multiprocessor model, the access time varies with the location of the memory word. Here, the shared memory is physically distributed among all the processors as local memories. The collection of all local memories forms a global address space which can be accessed by all the processors.

PART B

Explain hardware multithreading – may15

Explain in detail about Flynn classification of parallel hardware –may 15

Discuss shared memory multiprocessor with a neat diagram – nov16

Explain multicore processor –may16

Explain parallel process challenges

Sub: CS6303 Computer Architecture Staff: Mr. I. Solomon & Mr. J. Ferdinand Class: II IT A & B

UNIT V PART A

1. What is the need to implement memory as a hierarchy? Apr/May 15
Memory should be fast, large and inexpensive. Unfortunately, it is impossible to meet all three of these requirements using one type of memory. Hence it is necessary to implement memory as a hierarchy.

2. What is Cache Memory? Nov / Dec 16 A Small section of SRAM memory added between processor and main memory to speed up execution process is known as Cache memory.

3. Define Hit Rate or Hit Ratio? Nov/Dec 15 The Percentage of accesses where the processor finds the code or data word it needs in the cache memory is called the hit ratio or hit rate.
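The hit ratio feeds directly into the usual average memory access time figure, AMAT = hit time + miss rate × miss penalty. A minimal C sketch with illustrative values:

    #include <stdio.h>

    /* Average memory access time: AMAT = hit time + miss rate x miss penalty. */
    int main(void) {
        double hit_time     = 1.0;    /* cycles to access the cache          */
        double hit_ratio    = 0.95;   /* fraction of accesses found in cache */
        double miss_penalty = 100.0;  /* cycles to fetch from main memory    */

        double amat = hit_time + (1.0 - hit_ratio) * miss_penalty;
        printf("average memory access time = %.1f cycles\n", amat);  /* 6.0 */
        return 0;
    }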

4. What are various memory technologies? Nov/Dec 15
Computer memory is classified as primary memory and secondary memory. Memory technologies in use for primary memory are static RAM, dynamic RAM, ROM and its types (PROM, EPROM, EEPROM), etc.; for secondary memory they are optical disks, flash-based removable memory, hard-disk drives, magnetic tapes, etc.

5. How can DMA improve I/O speed? Apr/May 15
DMA is a hardware-controlled data transfer. The processor does not spend time testing I/O device status and executing a number of instructions for I/O data transfer. In a DMA transfer, data is transferred directly from the disk to the memory location without passing through the processor or the DMA controller.

6. What is Virtual memory?

Virtual memory is a technique that uses main memory as a “cache” for secondary storage. Two major motivations for virtual memory: to allow efficient and safe sharing of memory among multiple programs, and to remove the programming burdens of a small, limited amount of main memory.

7. State the advantages of Virtual Memory. May/Jun 16
The size of a program can be larger than the size of main memory. Memory can be used efficiently because a section of the program is loaded only when it is needed by the CPU. Virtual memory allows sharing of code and data and unlimited degrees of multiprogramming. We can reduce internal fragmentation using segmented paging and eliminate external fragmentation.

8. What is DMA?
A special control unit may be provided to enable the transfer of a block of data directly between an external device and memory without continuous intervention by the CPU. This approach is called DMA.

9. What is programmed I/O?
Data transfer to and from peripherals may be handled using this mode. Programmed I/O operations are the result of I/O instructions written in the computer program.

10. Define Vectored Interrupts
In order to reduce the overhead involved in the polling process, a device requesting an interrupt may identify itself directly to the CPU. Then, the CPU can immediately start executing the corresponding interrupt-service routine. The term vectored interrupts refers to all interrupt-handling schemes based on this approach.

11. What are the uses of interrupts?
* Recovery from errors * Debugging * Communication between programs * Use of interrupts in operating system

12. How is the interrupt handled during an exception?
* The CPU identifies the source of the interrupt * The CPU obtains the address of the interrupt handler * The PC and other CPU status information are saved * The PC is loaded with the address of the interrupt handler and the handling program runs to handle it

13. What do you mean by associative mapping?
The tag of an address received from the CPU is compared to the tag bits of each block of the cache to see if the desired block is present. This is called the associative mapping technique.

14. Differentiate physical address from logical address.
Physical address is an address in main memory. Logical address (or virtual address) is the CPU-generated address that corresponds to a location in virtual space and is translated by address mapping to a physical address when memory is accessed.

15. What are the steps to be taken in an instruction cache miss?
1. Send the original PC value to the memory. 2. Instruct main memory to perform a read and wait for the memory to complete its access. 3. Write the cache entry, putting the data from memory in the data portion of the entry, writing the upper bits of the address into the tag field, and turning the valid bit on. 4. Restart the instruction execution at the first step, which will refetch the instruction, this time finding it in the cache.

16. What are the writing strategies in cache memory? Write-through is a scheme in which writes always update both the cache and the lower level of the memory hierarchy, ensuring that data is always consistent between the two. Write-back is a scheme that handles writes by updating values only to the block in the cache, then writing the modified block to the lower level of the hierarchy when the block is replaced.

17. What is direct mapped cache?
Direct-mapped cache is a cache structure in which each memory location is mapped to exactly one location in the cache. For example, almost all direct-mapped caches use the mapping (Block address) modulo (Number of blocks in the cache) to find a block.
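A short C sketch of the direct-mapped lookup rule above, splitting a byte address into block address, cache index and tag; the block size and number of blocks are hypothetical.

    #include <stdio.h>
    #include <stdint.h>

    /* Direct-mapped lookup: index = (block address) mod (number of blocks);
       the remaining upper bits of the block address form the tag.
       Cache geometry here: 64 blocks of 16 bytes each. */
    enum { BLOCK_SIZE = 16, NUM_BLOCKS = 64 };

    int main(void) {
        uint32_t addr       = 0x0001A3C8;             /* example byte address */
        uint32_t block_addr = addr / BLOCK_SIZE;      /* discard byte offset  */
        uint32_t index      = block_addr % NUM_BLOCKS;
        uint32_t tag        = block_addr / NUM_BLOCKS;

        printf("block address = %u, index = %u, tag = %u\n",
               (unsigned)block_addr, (unsigned)index, (unsigned)tag);
        return 0;
    }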

18. Define rotational latency
Rotational latency, also called rotational delay, is the time required for the desired sector of a disk to rotate under the read/write head, usually assumed to be half the rotation time.
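A one-step calculation of average rotational latency as half a revolution, sketched in C for an assumed 5400 RPM disk:

    #include <stdio.h>

    /* Average rotational latency = half a revolution. One revolution at
       R RPM takes 60/R seconds; 5400 RPM is just an example figure. */
    int main(void) {
        double rpm = 5400.0;
        double avg_latency_ms = 0.5 * (60.0 / rpm) * 1000.0;
        printf("average rotational latency at %.0f RPM = %.2f ms\n",
               rpm, avg_latency_ms);   /* about 5.56 ms */
        return 0;
    }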

19. What are temporal and spatial localities of references? Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon. Spatial locality (locality in space): if an item is referenced, items whose addresses are close by will tend to be referenced soon.

20. What is a mapping function?
The cache memory can store a reasonable number of blocks at any given time, but this number is small compared to the total number of blocks in the main memory. The correspondence between the main memory blocks and those in the cache is specified by a mapping function.

PART B

1. Define Cache Memory? Explain the various mapping techniques associated with cache memories. May/Jun 16
2. Explain about DMA Controller. Nov/Dec 16
3. Explain in detail about various memory technologies and their relevance. Apr/May 15
4. What is Virtual memory? Explain how virtual memory is implemented, with a neat diagram. Nov/Dec 15
5. Explain Arbitration Techniques in DMA. Nov/Dec 14