Performance Analysis of Dual Core, Core 2 Duo and Core I3 Intel Processor

International Journal of Computer Applications (0975 – 8887) Volume 120 – No.10, June 2015 Performance Analysis of Dual Core, Core 2 Duo and Core i3 Intel Processor Taiwo O. Ojeyinka Olusola Olajide Ajayi Department of Computer Science, Department of Computer Science Adekunle Ajasin University, Akungba-Akoko Adekunle Ajasin University, Akungba-Akoko Ondo State, Nigeria. Ondo State, Nigeria. ABSTRACT 3) show methods for exploring processor architectures. One of Performance analysis is a more efficient method of improving the goals of this work is to highlight the advantages of each processor performance. This research work discusses heavily feature in a system and to study how the hardware makes use of on performance analysis of Dual Core, Core 2 Duo and Core i3 CPU resources. Intel architectures. The study described the evolution of Intel architectures and gave the reason for testing the performances of 2. PROCESSOR MICROARCHITECTURE the systems. All experiment will be carried out using Intel 2.1 The Microarchitecture of Intel Core 2 VTune Performance Analyzer, with all the systems running on Duo Windows 7 and 8. It is a well-known fact that, overall The Intel Core 2 Duo processor belongs to the Intel’s mobile performance is a major function of: path length of the core family. It is implemented by using two Intel’s Core application, frequency, and cycle per instruction. Based on the architecture on a single die. The design of Intel Core 2 Duo is analysis from this research, it was confirmed that, Corei3 has chosen to maximize performance and minimize power two distinct advantages: faster core-to-core communication, and consumption. It emphasizes mainly on cache efficiency and does dynamic cache sharing between cores. The research also not stress on the clock frequency for high power efficiency. highlights other areas where Dual Core and Core 2 Duo can be Although clocking at a slower rate than most of its competitors, preferred architectures over Core i3. shorter stages and wider issuing pipeline compensates the performance with higher IPC’s. In addition, the Core 2 Duo General Terms processor has more ALU units. Core 2 Duo employs Intel’s System performance evaluation. Advanced Smart Cache which is a shared L2 cache to increase the effective on-chip cache capacity [13]. Upon a miss from the Keywords core’s L1 cache, the shared L2 and the L1 of the other core are Performance, Multi-core, Processor, Microarcitecture. looked up in parallel before sending the request to the memory 1. INTRODUCTION [9]. The cache block located in the other L1 cache can be fetched without off-chip traffic. Both memory controller and FSB are With the evolution of Intel processor architecture over time, still located off-chip. The off-chip memory controller can adapt most customer (buyers) of Intel architecture don’t really have the new DRAM technology with the cost of longer memory time to test and analyse the architecture before they purchase it access latency. Intel Advanced Smart Cache provides a peak for their various day to day use. In a nutshell, people have not transfer rate of 96 GB/sec (at 3 GHz frequency) being able to analyse the differences in the architectures before purchase. Testing performance of computer system is very 2.2 The Microarchitectures of Nehalem necessary, because it helps consumers decide what type and Nehalem architecture is more modular than the Core architecture configurations of products to purchase for a particular nature of which makes it much more flexible and customizable to the computing job. However, the performance is strongly dependent application. The architecture really only consists of a few basic on a number of factors which include the system architecture, building blocks. The main blocks are a microprocessor core processor microarchitecture, operating systems, type of (with its own L2 cache), a shared L3 cache, a Quick Path compiler, and program implementation etc. Many processor Interconnect (QPI) bus controller, an integrated memory manufacturers including Intel has performance analysis tools controller (IMC), and graphics core [14]. With this flexible which can be used to determine the performance of their architecture, the blocks can be configured to meet what the architecture. Intel Corporation produces different processors market demands. For example, the Bloomfield model, which is with different numbers of cores for different nature of jobs, intended for a performance desktop application, has four cores, however, it is the important for users of processor machines to an L3 cache, one memory controller, and one QPI bus controller. acquire the right processor specifications that would efficiently Another significant improvement in the Nehalem process target applications based on the workloads microarchitecture involves branch prediction. For the Core characteristics of the application program. For instance, some architecture, Intel designed what they call a “Loop Stream specification of machine works better on graphics while others Detector,” which detects loops in code execution and saves the perform best on computation. With the evolution of Intel instructions in a special buffer so they do not need to be processor architectures over time, testing performance is continually fetched from cache [11]. This increased branch necessary. The aim of this study is to measure the performance prediction success for loops in the code and improved of different cores using different applications (both Single and performance. Intel engineers took the concept even further with Multithreaded). The objectives are 1) compare architecture the Nehalem architecture by placing the Loop Stream Detector performance on applications (Single and Multithreaded), 2) after the decode stage eliminating the instruction decode from a measure performance counters on representative processors and, loop iteration and saving CPU cycles. 1 International Journal of Computer Applications (0975 – 8887) Volume 120 – No.10, June 2015 Fig. 1: Intel Core Microarchitectures [10] Fig. 3: Nehalem Microarchitectures [11] Table 1: Processors microarchitecture features µArchitecture Intel Core Nehalem Speed: 3GHz (100%) 2.4GHz Minimum/Maximum/T 1.2GHz - 3GHz 931MHz - 2.4GHz urbo Speed: Peak Processing 24GFLOPS 19.2GFLOPS Performance (PPP): Adjusted Peak 7.2WG 5.76WG Performance (APP): Cores per Processor: 2 Unit(s) 2 Unit(s) Threads per Core: 1 Unit(s) 2 Unit(s) Front Side Bus Speed: 200MHz 133MHz Mobile, Type: Dual-Core Dual-Core Revision/Stepping: 17 / A 25 / 5 Microcode: MU06170A07 MU06250503 2x 32kB, 2x 32kB, L1D (1st Level) Data Write/Back, Write/Back, Cache: 8-Way, 64kB 8-Way, 64kB Line Line Size Size, 2 Thread(s) 2MB, ECC, : 2x 256kB, ECC, Fig. 2: Intel Core 2 Microarchitectures [9] L2 (2nd Level) Unified Advanced, 8-Way, 64kB Line Cache: 8-Way, 64kB Size, 2 Thread(s) 2 International Journal of Computer Applications (0975 – 8887) Volume 120 – No.10, June 2015 Configure Line Size, 2 Choose Target Thread(s) Target 3MB, ECC, Run Analysis Write/Back, Configure L3 (3rd Level) Unified 12-Way, Fully Analysis - Cache: Inclusive, 64kB Interpret Result Line Size, 16 Create Configuration Thread(s) Memory Controller 133MHz 133MHz Interpret Results Speed: Compare MMX - Multi-Media Results Yes Yes eXtensions: Handle SSE - Streaming SIMD Results Yes Yes RESOLVE Extensions: ISSUES SSE2 - Streaming Yes Yes SIMD Extensions v2: WORK FLOW SSE3 - Streaming Yes Yes SIMD Extensions v3: Fig. 6: Work flow in VTune Analyser [6] SSSE3 - Supplemental Yes Yes SSE3: 3. BENCHMARK ANALYSIS METHOD Hyper-Threading In this project work, we used VTune performance analysis tool No Yes Technology offered by Intel to measure the performances of Pentium Core 2 DUO, Intel Dual Core and the Nehalem Core i3 on five different Intel released the Nehalem and it was a leap in performance and benchmarks: VLC, Mcbench, Max Pi, Firefox, and Cinebench. efficiency compared to previous architectures designs, though The data of CPI in all of the analysis are analysed for the sake of still lagging in gamming performance compared to Core 2 Duo completeness. [8] which dozens of review testifying to that fact. 3.1 Hardware Event Counts Hardware Event Counts is a performance metric used by the Intel VTune Amplifier XE [6] when interpreting event-based sampling analysis results. The Hardware Event Counts metric shows the event count for all collected processor events. While the Hardware Sample Counts metric provides the actual number of samples collected for an event, Hardware Event Counts metric estimates the number of times this event occurred during the analysis [16]. We will focus on examining comparable aspect of all these microprocessor with Hardware Event Counts: Such as Clockticks per Instructions (CPI), LLC Misses, Branch Mispredict, Instruction Starvation etc. 3.2 CPI (Clockticks per Instructions) Retired CPI is just a general metric for measuring the processor Fig. 4: Instruction execution cycles in Core-based and efficiency [5]. The CPI value is dependent on application Nehalem CPUs [17] workload and platform and can be best used as a factor to compare processors. Since CPI is a ratio that is dependent on the The goal of performance analysis is to understand the behaviour number of retired instructions, its value will change if the binary of an application on a given platform. Here, real world code size of the application changes. In general, if CPI reduces applications are used to test how hardware makes use of the as a result of resources of CPU. The researchers want to know the advantage of each machine parameter, the program hotspots, and the effects The CPI measured in Dual Core and Core 2 Duo are “Per Core of each hardware feature program. CPI” because there was no SMT. In this case, the instructions were generally executed by the two cores. The aggregate CPIs is VTune Analysis the sum of all logical cores or logical threads per processor. VTune Analyser [4] is an application performance monitoring and analysis profiler that is capable of analysis execution 3.3 LLC Misses bottlenecks in both serial and parallel programs.

Load more