Proceedings of the 7th WSEAS International Conference on DATA NETWORKS, COMMUNICATIONS, COMPUTERS (DNCOCO '08)
ISSN: 1790-5109, ISBN: 978-960-474-020-8

A Multithreading Embedded Architecture

Lan Dong, Xiufeng Sui
Computer Engineering Department
Beijing Jiaotong University
Shang yuan village No.3, Beijing
China

Supported by the Science and Technique Foundation of Beijing Jiaotong University 2007RC086

Abstract: More and more embedded microprocessors are emerging, and more multithreaded tasks need to be executed simultaneously on embedded devices. This paper proposes a multithreaded architecture for embedded processors. It improves system performance considerably while also addressing energy dissipation and the utilization of a size-limited cache.

Key-Words: multithreading technology, thread, cache, fetch policy, decoder, hybrid partition

1 Introduction
Embedded computers have become the fastest-growing part of the computer market. They range from cell phones and video games to most household machines. Embedded systems often handle events under real-time constraints, so the performance of an embedded system must be evaluated under these constraints. In the embedded world, power and size are also important constraints for embedded processors [1, 2], so techniques that reduce system energy dissipation and improve resource utilization must be considered as well.

Against this background, this paper proposes a multithreading embedded architecture. Section 2 describes the architecture of the multithreading embedded processor. Section 3 presents the experimental results for this architecture, and Section 4 concludes the paper.

2 Multithreading Embedded Architecture

2.1 Instruction Set
ARM processors hold the majority of the embedded market. To make the design practical, the architecture proposed in this paper supports the ARM instruction set.

2.2 Architecture
The basis for the proposed multithreading embedded architecture is a superscalar processor with an ARM-derived instruction set [3, 4]. Only the changes needed to enable simultaneous multithreading are added to the base superscalar architecture. These are mainly a program counter and a return stack for each thread, a threaded register bank, etc.

In embedded systems, cache size is usually limited, so it is important to redesign the cache partition mechanism to improve the hit rate in a multi-task environment with a fixed cache size. The hybrid cache partition method used in this architecture, proposed in previous work [5], combines shared and distributed partitioning. It is suitable for embedded systems running multiple threads.

The fetch unit in this architecture supports several fetch policies [6, 7], such as Icount, Brcount and Misscount, whose effect on IPC differs with the characteristics of the programs being run.

A branch prediction estimator combined with the thread switch mechanism [8] is adopted in this architecture. It reduces the number of wrong-path instructions and the frequency of thread switches.

3 Experiment
The simulator used here was developed from simple-arm [9, 10]. A multithreading extension was added to it in our previous work; in this paper, the architectural enhancements described above are added to the new simulator. The experimental configuration is given in Table 1.

Table 1 System configuration
Component              Parameters
L1 instr. cache        size=32KB, associativity=4, replacement=LRU, block size=32B, hit latency=1 cycle
L1 data cache          size=64KB, associativity=2, replacement=LRU, block size=32B, hit latency=1 cycle
L2 instr. cache        size=2MB, associativity=2, replacement=distributed:shared 1:4, block size=64B, hit latency=6 cycles
Function units         ialu=8, imult=2, memport=4, fpalu=8, fpmult=2, divmult=2
ITLB                   size=256KB, associativity=4, replacement=LRU, block size=4KB, hit latency=30 cycles
DTLB                   size=512KB, associativity=4, replacement=LRU, block size=4KB, hit latency=30 cycles
Memory                 latency: 18 cycles for the first block, 2 cycles between successive blocks; access width=8
Pipeline               decode width=8, issue width=8, commit width=8, RUU size=32, LSQ size=16, RUU/LSQ type: shared, fetch queue size=16
Branch predictor       Gshare
Confidence estimator   JRS estimator

[Fig.1 IPC with different thread number: IPC (y-axis, 0 to 3) versus thread number (x-axis, 1 to 6) for the benchmarks sha, bitcnts, cjpeg, pgp, FFT and stringsearch]

From Fig.1, it can be concluded that the multithreading embedded architecture runs well in a multi-task environment with embedded benchmarks: IPC clearly improves as the thread number increases from 1 to 6.

4 Conclusion
In this paper, we design a multithreading embedded architecture. Unlike traditional multithreaded architectures and embedded processors, it combines several techniques, namely ARM instruction set support, hybrid cache partitioning, multiple fetch policies, and a branch prediction estimator with thread switching, to enhance system performance and improve resource utilization.

References:
[1] P. Petrov, A. Orailoglu, Customizable embedded processor architectures, Proceedings of the Euromicro Symposium on Digital System Design, 2003, pp. 468-475.
[2] D. Arora, A. Raghunathan, S. Ravi, N.K. Jha, Architectural support for safe software execution on embedded processors, Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, 2006, pp. 106-111.
[3] J.E. Smith, G.S. Sohi, Microarchitecture of superscalar processors, Proceedings of the IEEE, Vol. 83, No. 12, 1995, pp. 1609-1624.
[4] S. Palacharla, N.P. Jouppi, J.E. Smith, Complexity-effective superscalar processors, 24th IEEE International Symposium on Computer Architecture, 1997, pp. 206-218.
[5] L. Dong, Y. Yang, An approach on distributed and shared dynamic cache partition, 7th WSEAS International Conference on Data Networks, Communications, Computers, 2008, to appear.
[6] D.M. Tullsen, S.J. Eggers, J.S. Emer, H.M. Levy, J.L. Lo, R.L. Stamm, Exploiting choice: Instruction fetch and issue on an implementable simultaneous multithreading processor, 23rd Annual International Symposium on Computer Architecture, May 1996.
[7] D.M. Tullsen, S.J. Eggers, H.M. Levy, Simultaneous multithreading: Maximizing on-chip parallelism, Annual International Symposium on Computer Architecture, June 1995, pp. 392-403.
[8] L. Dong, Z.Z. Ji, An approach of branch multithreading switch mechanism, Journal of Communication and Computer, Vol. 3, No. 5, 2006, pp. 14-16.
[9] www.simplescalar.com
[10] T. Austin, E. Larson, D. Ernst, SimpleScalar: an infrastructure for computer system modeling, Computer, Vol. 35, No. 2, 2002, pp. 59-67.
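The fetch policies named in Section 2.2 (Icount, Brcount, Misscount) differ mainly in which per-thread counter is used to rank threads at fetch time. The Python sketch below illustrates this under stated assumptions: the counter definitions follow the usual descriptions in [6] (fewest instructions in the front end, fewest unresolved branches, fewest outstanding data-cache misses), fetching from two threads per cycle mirrors the ICOUNT.2.8 scheme studied there, and the names ThreadState, POLICY_KEYS and fetch_order are hypothetical, not part of the simulator described in Section 3.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    """Per-thread counters a fetch unit could track (hypothetical field names)."""
    tid: int
    front_end_insts: int       # instructions in decode, rename and issue queues (Icount)
    unresolved_branches: int   # branches fetched but not yet resolved (Brcount)
    outstanding_dmisses: int   # data-cache misses still pending (Misscount)


# Each policy ranks threads by a single counter; a smaller value means higher priority.
POLICY_KEYS = {
    "icount": lambda t: t.front_end_insts,
    "brcount": lambda t: t.unresolved_branches,
    "misscount": lambda t: t.outstanding_dmisses,
}


def fetch_order(threads, policy, n=2):
    """Return the ids of up to n threads to fetch from in this cycle.

    Threads with the smallest counter for the chosen policy come first;
    ties are broken by thread id so the choice is deterministic.
    """
    key = POLICY_KEYS[policy]
    ranked = sorted(threads, key=lambda t: (key(t), t.tid))
    return [t.tid for t in ranked[:n]]


if __name__ == "__main__":
    threads = [ThreadState(0, 12, 1, 0), ThreadState(1, 4, 3, 2), ThreadState(2, 7, 0, 1)]
    for policy in POLICY_KEYS:
        print(policy, fetch_order(threads, policy))
```

With the three example threads, Icount selects threads 1 and 2, Brcount selects 2 and 0, and Misscount selects 0 and 2. Keeping one ranking routine and swapping only the counter makes it easy to compare policies on the same workload, which is useful given that their effect on IPC depends on the program mix.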
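Table 1 pairs a Gshare predictor with a JRS estimator, and Section 2.2 combines the estimator with a thread switch mechanism [8]. A JRS-style confidence estimator keeps a table of resetting counters indexed, like Gshare, by the branch address XORed with global history; a counter grows on each correct prediction, is cleared on a misprediction, and a branch is treated as high-confidence only when its counter reaches a threshold. The Python sketch below shows such an estimator together with a hypothetical round-robin switch rule. The table size, threshold, and the names JRSEstimator and next_fetch_thread are assumptions for illustration; the actual switch mechanism of [8] may differ.

```python
class JRSEstimator:
    """JRS-style branch confidence estimator (table size and threshold are assumptions)."""

    def __init__(self, index_bits=10, counter_max=15, threshold=15):
        self.mask = (1 << index_bits) - 1
        self.counter_max = counter_max
        self.threshold = threshold
        self.table = [0] * (1 << index_bits)  # resetting counters, one per entry

    def _index(self, pc, history):
        # Gshare-style index: word address XOR global branch history.
        return ((pc >> 2) ^ history) & self.mask

    def high_confidence(self, pc, history):
        return self.table[self._index(pc, history)] >= self.threshold

    def update(self, pc, history, prediction_correct):
        i = self._index(pc, history)
        if prediction_correct:
            # Saturating increment on each correct prediction.
            self.table[i] = min(self.counter_max, self.table[i] + 1)
        else:
            # Any misprediction resets the counter to zero.
            self.table[i] = 0


def next_fetch_thread(current, num_threads, low_confidence_branch):
    """Hypothetical switch rule: after fetching a low-confidence branch,
    hand the fetch slot to the next thread in round-robin order."""
    if low_confidence_branch:
        return (current + 1) % num_threads
    return current


if __name__ == "__main__":
    est = JRSEstimator(index_bits=4, threshold=4)
    pc, hist = 0x1000, 0b101
    for _ in range(4):
        est.update(pc, hist, prediction_correct=True)
    print(est.high_confidence(pc, hist))  # True: counter reached the threshold of 4
    est.update(pc, hist, prediction_correct=False)
    print(est.high_confidence(pc, hist))  # False: misprediction reset the counter
    print(next_fetch_thread(3, 4, low_confidence_branch=True))  # 0
```

Switching only on low-confidence branches, rather than on every branch, is one plausible way to cut both wrong-path fetch and switch frequency, the two effects Section 2.2 attributes to the mechanism.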