UPTEC IT 14 020 Examensarbete 30 hp November 2014

Analysis and Design of High Performance Inter-core Process Communication for Linux

Andreas Hammar

Abstract

Today multicore systems are quickly becoming the most commonly used hardware architecture within embedded systems. This is because multicore architectures offer more performance at lower power consumption than singlecore architectures. However, since embedded systems are more limited in terms of hardware resources than desktop computers, the switch to multicore has introduced many challenges. One of the greatest challenges is that most existing programming models for interprocess communication are either too heavy for embedded systems or do not perform well on multicore. This thesis covers a study on interprocess communication for embedded systems and proposes a design that aims to bring the two most common standards for interprocess communication to embedded multicore. Furthermore, an implementation that achieves efficient inter-core process communication using zero-copy is implemented and evaluated. It is shown that shared memory utilization yields much higher performance than conventional communication mechanisms like sockets.

Handledare: Detlef Scholle
Ämnesgranskare: Philipp Rümmer
Examinator: Roland Bol
ISSN: 1401-5749, UPTEC IT 14 020

Contents

1 Introduction
  1.1 Background
  1.2 Problem Statement
  1.3 Relation to Previous Work
  1.4 Method
  1.5 Use Case
    1.5.1 Team Goal
    1.5.2 Team Workflow
    1.5.3 Development Tools
  1.6 Delimitations
  1.7 Contributions
  1.8 Report Structure

2 Background: Interprocess Communication on Multicore Systems
  2.1 Introduction
  2.2 The Importance of Efficient Interprocess Communication
  2.3 Message Passing or Shared Memory
  2.4 Shared Memory
    2.4.1 Data Coherence
    2.4.2 Synchronization
    2.4.3 Distributed Shared Memory
  2.5 Message Passing
  2.6 Message Passing Vs Shared Memory

3 A Survey of Message Passing Libraries
  3.1 Introduction
  3.2 Requirements on a Message Passing API For Embedded Systems
  3.3 Message Passing Interface (MPI)
    3.3.1 Open MPI
    3.3.2 MPICH
  3.4 MCAPI
    3.4.1 Messages
    3.4.2 Packet Channels
    3.4.3 Scalar Channels
    3.4.4 OpenMCAPI: An Open Source MCAPI
  3.5 MSG
  3.6 LINX
    3.6.1 Key Features
    3.6.2 System Overview
  3.7 Summary

4 Requirements on a New Message Passing Implementation
  4.1 Introduction
  4.2 Design 1: ZIMP: Zero-copy Inter-core Message Passing
    4.2.1 Suitability for Implementation
  4.3 Design 2: A New Approach to Zero-Copy Message Passing with Reversible Memory Allocation in Multi-Core Architectures
    4.3.1 Shared Memory Heap Structure
    4.3.2 Smart Pointer Design
  4.4 Suitability for Implementation
  4.5 Design 3: Designing High Performance and Scalable MPI Intra-node Communication Support for Clusters
    4.5.1 Design Overview
  4.6 Conclusion on Suitable Solution for Implementation

5 Providing Traceability in Embedded Systems
  5.1 Introduction
  5.2 Tracing
  5.3 Tracing Goals
    5.3.1 Information Trace
    5.3.2 Error Trace
    5.3.3 Debug Trace
    5.3.4 Test Trace
  5.4 Open Trace Format
    5.4.1 Open Trace Format 2 (OTF2)
  5.5 LTTng
    5.5.1 LTTng Features
  5.6 Evaluation for Design

6 Designing a Standard Messaging API for Embedded Systems
  6.1 Introduction
  6.2 Requirements
    6.2.1 Motivation for the Requirements
  6.3 System Overview
  6.4 eMPI Application Programming Interface
    6.4.1 Built in Information and Debug Trace
  6.5 Boost Shared Memory Mailbox
    6.5.1 Boost Interprocess
    6.5.2 Mailbox Design
    6.5.3 Built in Debug and Error Trace

7 Implementation of Zero-Copy Communication on Linux
  7.1 Introduction
  7.2 Delimitations on the Implementation
  7.3 Tracing With OTF2
  7.4 Implementation Overview

8 Use Case Implementation: Online Face Recognition
  8.1 Introduction
  8.2 OpenCV
  8.3 Application Overview
    8.3.1 Video Reader
    8.3.2 Face Recognition Trainer
    8.3.3 Face Recognition Classifier
    8.3.4 Video Output Module
  8.4 Zero-Copy Integration

9 Evaluation
  9.1 Introduction
  9.2 Zero-Copy Implementation Evaluation
    9.2.1 Evaluation Of the Design
    9.2.2 Performance Evaluation Setup
    9.2.3 Initial Prototype Testing
    9.2.4 Performance Evaluation of Final Implementation
  9.3 LINX Integration Evaluation
    9.3.1 Performance Against Existing LINX Implementation
    9.3.2 Necessity of LINX
  9.4 Conclusion

10 Conclusions and Future Work
  10.1 Thesis Contributions
  10.2 Conclusions
  10.3 Future Work

Acknowledgments

My deepest gratitude goes to my supervisors Detlef Scholle and Cheuk Wing Leung for their guidance and support throughout the work on this master thesis. I also want to thank Alten and all its employees for providing me with a good working environment for my thesis. A special thanks goes to Anton Hou and Fredrik Frantzén for being part of a great team to work in. I would also like to thank my reviewer Philipp Rümmer for ensuring the high quality of my report.

Abbreviations

Abbreviation  Definition
API     Application Programming Interface
CPU     Central Processing Unit
GPC     General Purpose Computer
HPC     High Performance Computing
IPC     Interprocess Communication
MCAPI   Multicore Communications Application Programming Interface
MPI     Message Passing Interface
NUMA    Non-Uniform Memory Access
OTF     Open Trace Format
OS      Operating System

Chapter 1

Introduction

1.1 Background

Embedded systems continually grow more complicated, which creates an increasing need for more computing power in such systems. To support this need, the industry is shifting from centralized single-core systems towards many-core solutions that can provide more computing power in a way that is both more reliable and more power efficient. However, due to Amdahl's law there is a limit to how much performance can be gained from running parts of an application in parallel [1]. This is partly because parallel tasks must be able to synchronize. Synchronization requires communication between the tasks, and the speed of this communication greatly affects the performance gained from parallelization.

To provide efficient embedded systems, both hardware and software must be able to adapt to various situations such as limited battery life or hardware resources. As a result, embedded systems hardware is becoming increasingly complex. To simplify the development of efficient applications on many-core systems, the complexity of the hardware structure must be hidden from the application developer. The abstraction layer must not only provide a simple interface for application developers, but also be dynamically reconfigurable to enable applications to run efficiently on a variety of different hardware configurations. A solution that can provide this flexibility is the use of middleware, which acts as an abstraction layer between the running application and the underlying hardware and operating system configuration.

A very important feature for efficient hardware utilization of many-core systems is to allow processes to share data with each other. Currently, the most efficient way to allow on-chip interprocess communication (IPC) is through the use of shared memory. However, applications that share memory between processes are complicated to develop, since such programming models must preserve data coherence and synchronization between parallel tasks. Another, much simpler, way to share data between processes is message passing, a technique that involves a lot of redundant copying of data between processes. Therefore message passing is less memory efficient and has lower performance than memory sharing. The ideal solution would be an IPC protocol with the simplicity of message passing that utilizes shared memory.

LINX [2], developed by Enea AB, is a good example of a powerful, open source and platform independent tool for message passing that supports zero-copy shared memory between local processes, where local refers to processes running on the same CPU node. LINX also supports a number of heterogeneous link types for remote IPC. However, there is currently no shared memory support in the Linux implementation of LINX. This can be a performance bottleneck for multicore processors, as applications tend to need frequent on-chip IPC.

This master thesis project has been conducted at Alten AB in collaboration with Uppsala University. The thesis project is part of the CRAFTERS [3] project, which aims to create an innovative ecosystem from application to silicon and is funded by ARTEMIS [4].

1.2 Problem Statement

The main goal of this master thesis is to design and implement an efficient solution for on-chip interprocess communication using message passing on Linux. The message passing solution must be able to efficiently utilize the shared memory between cores on multicore processors. This thesis will base its implementation design on the results from a previous master thesis conducted at Alten, where a design for zero-copy shared memory communication was suggested. The goal of this thesis is to cover the following topics:

1. Survey of existing on-chip IPC solutions
The goal of covering this topic is to get a clear picture of what solutions for on-chip IPC already exist and to identify the disadvantages these solutions have on multicore. The knowledge gained from this survey will lay the foundation for proposing a design that aims to solve some of these challenges.

2. Design of efficient on-chip IPC for embedded systems on Linux
The designed solution must emphasize the following key features:
• Shared memory between cores must be efficiently utilized.
• The communication abstraction must be simple to use and provide a high level abstraction of the underlying implementation.
• Low communication overhead suitable for embedded systems.

3. Implementation of efficient IPC
The goal of the implementation part of this thesis is to provide a reference implementation of the design. The implementation may not be able to include all features of the design, but must be able to demonstrate the performance of the design.

1.3 Relation to Previous Work

The implementation developed in this thesis builds on the design suggested in a previous thesis conducted at Alten AB [5], which analyzed and proposed a design for zero-copy message passing on Linux. To be able to create an efficient implementation from that design, several other factors must also be taken into account, such as traceability, existing standards, usability, robustness and performance. Several other message passing implementations targeting embedded systems that utilize the benefits of shared memory exist [6–9]. The results from those implementations will be taken into account in the design of this implementation.

1.4 Method

The project goals have been set and evaluated according to the requirements from UKÄ for a master thesis¹. The project will consist of an academic study to provide background for a good design, followed by an implementation phase, each of which will span about half of the project. The academic study will cover a background study and analysis of message passing solutions. The memory management in the Linux operating system will also be studied in order to determine which solution is suitable for implementation. The practical part will consist of developing an implementation of zero-copy shared memory message passing on Linux in C. The implementation will then be compared to existing IPC mechanisms to evaluate its performance.

¹ http://www.uk-ambetet.se/utbildningskvalitet/kvalitetenpahogskoleutbildningar.4.782a298813a88dd0dad800010221.html

1.5 Use Case

1.5.1 Team Goal

The practical part will also include developing components of a middleware, built by a team of three students, that combines the knowledge gained from all three theses. To demonstrate the features of the middleware, an application is developed that performs online face recognition on a video stream. The contribution from this thesis provides the middleware with an efficient IPC mechanism that is able to transfer the data from the video stream at a high rate between the system components. The team will work using scrum and test-driven development, where the system tests will be modeled using a modeling tool that utilizes UPPAAL [10].

1.5.2 Team Workflow

The team worked with an agile software development approach using scrum and test driven development. The development cycles were kept very short, with every scrum sprint limited to one week. Each week the team met with the project supervisors and presented their progress, to ensure high quality in the work and that the project progressed as planned.

1.5.3 Development Tools

Several tools were used to help the development of the applications for all three of the team's master theses. The team source code was maintained using the version control tool git [11]. Furthermore, to automate the running of software tests and static analysis tools, a continuous integration server running JenkinsCI [12] was used. The server would monitor the version control repository of the source code and periodically run a series of tools, including both static analysis tools like Cppcheck and unit tests developed by the team itself.

1.6 Delimitations

The thesis will only study on-chip message passing, and for the implementation utilize existing functionality in open source alternatives for communication between processor nodes. In particular, techniques that target embedded systems using shared memory will be studied in depth. The project work is limited to a time period of 20 weeks.

1.7 Contributions

The contribution of this thesis is a design that is simple to use and compatible with the most relevant existing standards for message passing, while still achieving high performance and a low communication overhead.

1.8 Report Structure

The remaining content of the report is divided into three main parts. Chapters 2-5 cover the background study and analysis of interprocess communication. Chapter 6 describes the design of the solution proposed in this thesis. Chapters 7 and 8 describe what is covered in the implementation, while chapters 9 and 10 cover the evaluation of and conclusions from the implementation.

Chapter 2

Background: Interprocess Communication on Multicore Systems

2.1 Introduction

Moore's law continues, but processor frequencies have stopped increasing. This is mainly caused by the fact that processors have started hitting the power wall, where the amount of power needed and heat generated by increased processor frequencies outweigh the performance gains. Instead, the computing industry has started shifting towards multicore solutions, which provide more computing power using less power than single core architectures. However, there is a limit to how much performance can be gained from the parallelization of an application, dividing its computation across multiple cores. This limit is known as Amdahl's law [1]. To cope with Amdahl's law and be able to utilize the system hardware efficiently on multicore architectures, new programming models that support parallelism must be developed.

This situation has been present for the past couple of years within the desktop and high performance computing (HPC) environment. Now, however, the problem is also starting to appear within the embedded world, because there is a lot of power and efficiency to be gained when switching from current decentralized systems to multicore. Although the current situation is similar to what has happened in the HPC environment over the past couple of years, the requirements on new programming models that support parallelism are different for embedded systems. This is mainly due to the fact that embedded multicore systems differ from the multiprocessors seen in HPC systems by their tendency to contain multiple heterogeneous processing elements, non-uniform memory structures and non-standard communication facilities such as Network-on-Chip communication [13].

2.2 The Importance of Efficient Interprocess Communication

Interprocess communication (IPC) is the technique of sharing information between parallel processes. This technique is an important part of parallel programming, since in most applications parallel tasks must communicate with each other frequently for their computations to be meaningful. For applications that communicate between tasks frequently, it is crucial that the interprocess communication mechanism used is as fast as possible. If the interprocess communication is slow, the result could be that parallel applications run slower than their sequential counterparts due to the performance bottleneck of the communication. One of the greatest challenges with interprocess communication is to synchronize the data shared between parallel tasks. Currently the most common way to achieve synchronization is to use locks. Locks, however,

may introduce a lot of waiting in program execution, and with the increasing number of cores it becomes more and more difficult to avoid deadlocks.

2.3 Message Passing or Shared Memory

The two most commonly used programming models for interprocess communication today are message passing and shared memory. Both models present an efficient solution to both task communication and synchronization. Although the abstraction presented by message passing is very different from that of shared memory, their underlying implementations can still utilize system resources in the very same way. For example, a message passing API presents the user with a function call that transmits a message from one endpoint to another within the system, while in fact the message passing implementation may simply write the message to a shared memory segment between the two endpoints. As such, the two models can often be seen as a duality of each other, which has caused a long ongoing debate about the differences between the two models [14]. However, this does not mean that one of the models can simply be chosen instead of the other; instead the strengths and weaknesses of both models should always be considered in the design of a system utilizing one of the two. The following sections describe these two programming models in further detail.

2.4 Shared Memory

Shared memory is a technique where processes are allowed to share access to the same memory segment. It is currently the fastest available IPC technique, because once a shared memory segment has been allocated, the kernel does not need to be involved when passing data between processes [15]. However, since the kernel is not involved when passing data, the communicating processes must themselves manage data synchronization. This often puts the responsibility for synchronization on the application developer, making it more complicated to develop robust applications. What makes shared memory IPC more efficient than other IPC mechanisms is that data does not need to be copied from one process memory space to another. Instead, a memory space that is shared between the communicating processes is created. This way, IPC that uses shared memory avoids many costly memory operations.
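As a concrete illustration of this mechanism, the following minimal C sketch shows how a POSIX shared memory segment can be created and mapped on Linux; a second process that opens the same name sees the same bytes. The segment name and size are placeholders chosen for this example, and synchronization (section 2.4.2) is deliberately left out.

/* Minimal POSIX shared memory sketch (illustrative only); build with: gcc demo.c -lrt */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SEG_NAME "/ipc_demo"   /* placeholder name for this example */
#define SEG_SIZE 4096

int main(void)
{
    /* Create (or open) the named segment; another process opens the same name. */
    int fd = shm_open(SEG_NAME, O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, SEG_SIZE) < 0) { perror("ftruncate"); return 1; }

    /* After this mmap() the kernel is no longer involved in the data exchange:
     * both processes simply read and write the mapped region directly. */
    char *region = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (region == MAP_FAILED) { perror("mmap"); return 1; }

    strcpy(region, "hello from the producer");   /* visible to the other process */

    munmap(region, SEG_SIZE);
    close(fd);
    /* shm_unlink(SEG_NAME) would remove the segment once it is no longer needed. */
    return 0;
}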

2.4.1 Data Coherence

One of the greatest challenges for shared memory programming models on multicore systems is maintaining cache coherence between communicating processes running on different cores. This is caused by the common use of Non-Uniform Memory Access (NUMA) memory architectures in multicore systems. On such architectures, shared data must be kept consistent between the local caches of the system cores: when one of two processors that share data updates that data in its local cache, the other processor will still have the old copy of the data in its cache. To solve this problem, the system must ensure that both processors always have the same copy of shared data in their caches. Maintaining consistency between local caches introduces memory overhead that may greatly reduce the performance of the system. As core counts keep increasing, the cost of maintaining coherence also increases. In fact, some researchers have even suggested that cache coherence will no longer be feasible across a single multi-core chip, or that individual cores may perform better in the absence of coherence [16].
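One practical way to keep this coherence traffic down in shared memory code is to lay data out so that fields written by different cores never share a cache line (so-called false sharing). The small C sketch below assumes a 64-byte cache line; the constant and struct names are invented for this illustration and are not taken from the thesis implementation.

#include <stdint.h>

#define CACHE_LINE_SIZE 64   /* assumed line size; query the hardware in real code */

/* Per-core counters padded to separate cache lines, so that a write by one
 * core does not invalidate the line holding another core's counter. */
struct per_core_counter {
    volatile uint64_t value;
    char pad[CACHE_LINE_SIZE - sizeof(uint64_t)];
} __attribute__((aligned(CACHE_LINE_SIZE)));

struct per_core_counter counters[4];   /* one slot per core in this example */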

2.4.2 Synchronization

As mentioned earlier, processes communicating using shared memory must themselves ensure data synchronization, and the most common way to do this is to use locks or semaphores. Locks are binary flags that prevent two or more processes from reading or writing the same shared memory at the same time. Semaphores are similar to locks, with the difference that instead of being a binary lock giving access to one process at a time, an arbitrary number of processes may acquire access to the shared resource at the same time. Synchronization using locks or semaphores may introduce waiting in programs, resulting in processes spending much of their time performing redundant waiting. The use of locks also introduces the possibility of deadlocks within an application, which with increasing parallelism becomes very difficult to avoid. A deadlock occurs when two or more processes are waiting for each other to finish their use of a shared resource, while each one has locked a resource the other process needs to continue execution. This causes both processes to wait infinitely for each other, stalling the system.
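As an illustration of how such synchronization can be set up in practice, the sketch below places a POSIX semaphore inside a shared memory region so that two processes can use it as a lock around a shared counter. The struct layout and function names are an example chosen here, not the thesis implementation.

#include <semaphore.h>
#include <stdint.h>

/* The data and its lock live together in one shared mapping (see section 2.4). */
struct shared_region {
    sem_t    lock;      /* process-shared binary semaphore used as a mutex */
    uint64_t counter;   /* data protected by the lock */
};

/* Called once by the process that creates the mapping. */
static void region_init(struct shared_region *r)
{
    sem_init(&r->lock, /* pshared = */ 1, /* initial value = */ 1);
    r->counter = 0;
}

/* Any process that maps the region can then do: */
static void region_increment(struct shared_region *r)
{
    sem_wait(&r->lock);     /* may block: this is the waiting discussed above */
    r->counter++;
    sem_post(&r->lock);
}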

2.4.3 Distributed Shared Memory

Distributed shared memory is a solution that aims to provide the shared memory abstraction in a system of distributed nodes that do not share the same local memory. What this abstraction provides is a memory segment that is replicated and synchronized on each of the communicating distributed nodes. These solutions typically use message passing to synchronize the shared memory segments, and thus they try to combine the ease of use of shared memory with the distributed performance of message passing.

2.5 Message Passing

Message passing is an IPC programming model where communicating processes share data by sending messages. The main goal of a message passing interface is to provide a transparent communication interface where all name resolution and shared data between processes is managed by the underlying message passing implementation. Hence message passing provides a high level abstraction of the underlying hardware. Message passing can be either synchronous or asynchronous. In synchronous message passing, the communicating processes synchronize at some point in their execution and exchange data. This type of message passing is often robust and ensures data synchronization between tasks, but it also introduces busy waiting, where a task has to wait for the task it wants to communicate with to reach its synchronization point. In asynchronous message passing, the sending process does not need to wait for the receiving process to reach its synchronization point. Instead, the sending process passes the message directly to the underlying message passing interface, which makes sure that the message gets received at the other end. In many existing applications this is done by having a shared message buffer between the two communicating processes. This is very similar to how letters are sent through the postal service, where the sender puts the letter into a mailbox and can then go on with their life while the postal service takes care of delivering the letter. Similarly, the sender of an asynchronous message can continue its execution immediately after the message has been sent, regardless of whether the receiver has received the message or not.
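The difference between the two styles is visible already in the shape of the API they expose. The prototypes below are purely hypothetical and only illustrate the contracts described above; they are not part of any library discussed in this report.

#include <stddef.h>

/* Hypothetical API sketch, for illustration only. */

/* Synchronous send: returns only after the receiver has accepted the message,
 * so the two processes rendezvous at this call. */
int msg_send_sync(int dest_endpoint, const void *buf, size_t len);

/* Asynchronous send: enqueues the message into a shared buffer ("mailbox")
 * and returns immediately; delivery happens later. */
int msg_send_async(int dest_endpoint, const void *buf, size_t len);

/* Receive side: blocks until a message is available in the mailbox. */
int msg_recv(int src_endpoint, void *buf, size_t max_len, size_t *received);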

2.6 Message Passing Vs Shared Memory

This section addresses some of the important benefits and weaknesses of using one programming model over the other. It is important to note, though, that there is no definitive answer as to which programming model is better. While there are benefits to both models, shared memory programming models are preferred from a usability standpoint and have a much lower learning curve [17]. However, as hardware complexity increases in multicore architectures, it is becoming more difficult to provide a simple shared memory abstraction. Shared memory models also suffer more from problems with cache coherence in NUMA architectures, since message passing communication is very explicit and there is mostly only one writer to each memory buffer. As mentioned earlier in subsection 2.4.1, future systems might not even use cache coherent memory, which raises an interesting question about the performance of shared memory in such systems.

In [18], it was pointed out that the trend towards using high level languages that are compiled to specific parallel targets suggests that the difference in ease of use between message passing and shared memory will be eliminated. Thus it is often simply a matter of choice and personal preference when deciding which of the two programming models to use for a specific application. The benefits of both programming models have given rise to several implementations that aim to combine the benefits of both. These implementations can in general be divided into two categories, based on which programming model the interface they provide resembles. One category is message passing libraries that utilize shared memory techniques, which will be covered in the following chapter. The other category is libraries for distributed shared memory that use message passing to maintain memory coherence across distributed nodes.

Chapter 3

A Survey of Message Passing Libraries

3.1 Introduction

This chapter covers several existing message passing libraries that aim to support a wide variety of different hardware configurations and communication media types. The designs studied in this chapter will be analyzed with respect to how well they fit the requirements for IPC on embedded multicore. The design choices of these implementations will lay the foundation for the design of a new message passing API targeted specifically at embedded multicore systems.

3.2 Requirements on a Message Passing API For Embedded Systems

As mentioned in section 2.1, embedded multicore systems tend to have a more complex and heterogeneous architecture than general purpose computers (GPCs). This makes the requirements for software libraries targeting embedded multicore systems different from those targeting GPCs.

1. Small Memory Footprint
The amount of available memory on embedded systems is often very limited. Therefore the library of any API aimed at embedded systems must have a small memory footprint. Furthermore, an efficient IPC library on a multicore processor must be able to fit inside the local cache of each processor core. This puts a huge constraint on the memory footprint of a message passing library targeting embedded multicore systems.

2. Modular Design
Within embedded systems, hardware changes almost as quickly as software. This means that the number of different hardware platforms that a new message passing API must support is massive. Therefore it is crucial that the design of a message passing library targeting embedded multicore systems is modular, so that it is easy to port to a new hardware platform.

3. Transparent Abstraction of Underlying Hardware
For a message passing library to become widely adopted it must provide a good abstraction of the underlying hardware. As hardware is becoming significantly more complex, this requirement becomes more important. Furthermore, the rapid changes in hardware suggest that the abstraction must be high level, so that the library itself configures how to utilize the system resources efficiently. If that task were left to the developer, the hardware specific optimizations made by the developer on one platform might need to be rewritten when porting the application to a new hardware platform.

4. Compatibility with Existing Legacy Code
As mentioned in [13], the integration of existing legacy code within embedded systems is very important, due to the high cost of redeveloping and recertifying systems. This implies that compatibility with existing legacy code is important for new programming models targeting embedded systems.

3.3 Message Passing Interface (MPI)

MPI is a specification standard for message passing libraries. The specification is based on the consensus of the MPI Forum, which has over 40 participating organizations [19]. The main goal of MPI is to provide a highly transparent message passing interface that can provide high performance communication on any type of heterogeneous or homogeneous hardware configuration. The current MPI version 3.0 is a major update to the standard including both blocking and nonblocking operations [20]. Currently there are many available implementations of MPI, both freely available, open source versions and commercial products. MPI is mainly aimed at high performance computing systems and as shown in [8], MPI is not suitable for multicore processors due to its high memory overhead. Instead a more lightweight solution should be used on multicore systems.
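For reference, a minimal MPI point-to-point exchange in C looks as follows; this is ordinary usage of the standard MPI API, not code from the thesis, and the file name in the build comment is an example.

/* Minimal MPI ping example: rank 0 sends an integer to rank 1.
 * Build and run (example): mpicc ping.c && mpirun -np 2 ./a.out */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value = 42;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);          /* blocking send */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                                  /* blocking receive */
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}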

3.3.1 Open MPI

Open MPI is a widely used and open source implementation of the MPI standard. It provides a high performance message passing interface that supports many different platforms. Open MPI is based on and combines the experience gained from the LAM/MPI, LA-MPI and FT-MPI projects [21]. From a communications perspective, Open MPI has two main functional units, the Open Run-Time Environment (OpenRTE) and the Open MPI communications library [22]. The system resources are divided into cells of communication units that share common characteristics. For example a set of 4 homogeneous cores on the same chip could form a cell in a cluster of many processor chips.

OpenRTE

OpenRTE is a complete runtime environment for distributed HPC applications. It provides a transparent interface with automatic configuration and management of hardware resources. This allows a user to take advantage of the system hardware effectively without writing any system dependent code. Currently OpenRTE provides support for a multitude of different heterogeneous and homogeneous hardware configurations. The design of OpenRTE was originally part of the Open MPI project but has later been split off into its own effort [22]. Some of the key design concepts of OpenRTE are [23]:
• Ease of Use: An application should be able to execute efficiently on a variety of different hardware and should be able to scale to increasingly larger sizes without code modification.
• Resilience: The run-time should never fail, even when encountering application errors. Instead it should gracefully terminate the application and provide an informative error message.
• Scalability: The run-time environment must be able to support applications consisting of several thousands of processes that operate across a large number of processing units.
• Extensibility: The run-time environment must support the addition of new features and the replacement of key subsystems with new ones that use a different approach.

Open MPI Communications Library

The Open MPI communications library aims to give a full implementation of the MPI 3.0 message passing standard. It provides support for a wide range of different interprocess communication techniques and transportation media. Open MPI has a very modular design where each supported transportation type is handled by its own module which each provide a framework for communicating across a specific transportation media. When an application process wants to establish a connection with another process, all frameworks are queried to see if they want to run the process and the module with highest priority that wants to manage the connection will be selected [21].

3.3.2 MPICH

MPICH is a widely used and highly optimized open source implementation of MPI. As of November 2013, MPICH and its derivatives are used in 9 out of the top 10 supercomputers in the world [24]. It uses a layered design and aims to achieve portability to a large variety of platforms. The topmost layer, called ADI3, presents the MPI interface to the application layer above it [25]. The second layer, called the device layer, implements the point to point communication. This layer also implements the platform specific part of the MPICH communication system; therefore MPICH can be ported to a new communication subsystem by implementing a device [25]. The standard implementation of MPICH comes with the CH3 device, which is a sample implementation of an MPICH device. The CH3 device gives the possibility of using many different communication channels that each provide communication across a specific media type. Currently the default channel in CH3 is Nemesis, which supports many different communication methods [26]; however, users may choose to implement their own communication channel and interface it with the CH3 device.

3.4 MCAPI

MCAPI is a message passing library specification by the Multicore Association that aims to deliver an efficient and scalable interprocess communication library for embedded multicore devices. The MCAPI specification is mainly targeted at inter-core communication on a multicore chip and is based on both the MPI and socket programming interfaces, which were targeted primarily at inter-computer communication [27]. Since MCAPI focuses on providing a low footprint and low latency for multicore communication only, it is not as flexible as MPI or sockets. However, due to its low footprint it is much more suitable for embedded devices than MPI, and as shown in [7] it achieves much higher performance than MPI on multicore processors. MCAPI also supports communication on cores that run without any operating system, making it highly portable [27]. MCAPI provides inter-core communication using three different communication types: connectionless messages, packet channels and scalar channels, which are visualized in figure 3.1. The three communication types are strictly prioritized. For example, a process that has established a connection with another process using a packet channel may not also communicate with it using connectionless messages.

Figure 3.1: MCAPI message types

3.4.1 Messages

Message communication is the most flexible out of the three protocols provided by MCAPI. Messages provide a connectionless communication interface for applications that can be either blocking or non-blocking. The only requirement is that the application must allocate both the send and receive buffers.

3.4.2 Packet Channels

Processes may communicate through packet channels by first establishing a connection to each other. The benefit of first establishing a connection is that the message header can be reduced significantly [27]. Similarly to messages, packet channels communicate using buffers. The receive buffers are allocated by the MCAPI system and provide a unidirectional communication buffer between two processes. The application must however allocate the send buffer itself.

3.4.3 Scalar Channels

Scalar channels are very similar to packet channels but are customized for sending a stream of data efficiently between two communicating processes in a producer-consumer fashion.

3.4.4 OpenMCAPI: An Open Source MCAPI

OpenMCAPI [28], provided by Mentor Graphics, is an open source full implementation of the MCAPI standard. It is designed for portability and scales down to low-resource embedded systems.

3.5 MSG

MSG, presented in [9], provides an efficient message passing implementation for embedded systems that focuses on portability and performance. The library implements a subset of required functions from the MPI standard and therefore provides some compatibility with already existing MPI applications. The subset of MPI functions was selected on the basis of the 14 most frequently used functions from 14 programs out of the Lawrence Livermore National Laboratory benchmark suite, derived from [29]. The selected functions were divided into four categories, which can be seen in figure 3.2.

Furthermore, the figure shows the number of calls and the occurrence of the function calls in the 14 benchmarks. The category one-sided operations was added as an optional addition for performance considerations and provides function calls for direct memory access.

Figure 3.2: Subset operations of MPI

Looking at figure 3.2, it seems that blocking message calls are more common than non-blocking calls. This means that although many lightweight implementations, such as LINX described in the next section, do not explicitly implement blocking calls, such calls are required by a large number of existing MPI applications.

3.6 LINX

LINX, developed by Enea, is an open source library for high performance interprocess communication. It provides a simple message passing protocol that abstracts both node locations and communication media from the developer. It currently supports the Enea OSE, OSEck and Linux operating systems, and more are planned to be supported in the future. LINX provides a transparent message passing protocol that supports many different heterogeneous system configurations.

3.6.1 Key Features

LINX is designed for high performance and, due to its small footprint, is able to scale well down to microcontrollers [30]. It delivers reliable interprocess communication for a large variety of different hardware configurations. Furthermore, LINX supports many different transportation media, which makes it easy to configure even for systems using a mix of various connection media types.

3.6.2 System Overview

The Enea LINX protocol stack has two layers, the RLNH layer and the Connection Manager (CM) layer. The RLNH layer corresponds to the session layer in the OSI model and implements IPC functions such as name resolution of endpoints [30]. The connection manager corresponds to the transport layer in the OSI model and implements reliable in-order transmission over any communication medium. There is also a link layer, corresponding to the OSI layer of the same name, which implements OS and transportation media specific end-to-end delivery over the communication media. An overview of all these layers is shown in figure 3.3.

Figure 3.3: LINX Architecture [30]

The RLNH Layer

The RLNH is responsible for resolving remote endpoint names and for setting up and removing local representations of remote endpoints [30]. It is also responsible for setting up and tearing down connection managers between two endpoints. The RLNH must also supervise all endpoints that use a particular link and notify communicating endpoints when an endpoint becomes unavailable.

Connection Manager

The connection manager is responsible for providing reliable delivery of messages between endpoints. The purpose of the connection manager is to hide the details of the underlying media from the RLNH. If an endpoint should become unreachable, the connection manager must notify the RLNH. In LINX there are several different connection managers, which are managed by the Connection Manager Control Layer. Examples of available connection managers are the Ethernet, TCP, Shared Memory and Point-To-Point connection managers, which each handle a different type of connection media. Which connection manager will be used for a connection is decided by the RLNH. Regardless of which communication media it uses, every connection manager must be able to deliver messages between endpoints according to the following set of rules [30]:
• Messages are delivered in the order they are sent.
• Messages can be of arbitrary size.
• Messages are never lost.
• The communication channel has infinite bandwidth.
If a connection manager is unable to follow these rules, the connection must be reset and the RLNH must be notified.

Link Layer

The Link layer provides support for reliable communication across a variety of different communication media types. This layer includes platform specific operations for reliable communication across the communication media and therefore must be rewritten for each supported OS platform and communication media type.

Shared Memory Connection Manager

The shared memory connection manager provides efficient point-to-point communication on multicore architectures that support shared memory. The manager handles messages on a first come, first served basis, and since the transportation media is considered reliable, all messages are received in sequential order. The link layer support that must be provided by the operating system for an application to be able to utilize shared memory is defined as a mailbox driver. Apart from implementing the link layer part of the transmission protocol, the mailbox driver should communicate with the shared memory connection manager using a set of callbacks. The full list of callbacks needed for such an implementation is provided in the "mailbox.h" file of the LINX source code [31]. Currently this driver is available for Enea's OSE, but there is no driver available for other operating systems.
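To make the idea of a mailbox driver more concrete, a purely hypothetical sketch of such a callback interface is given below. The struct and function pointer names here are invented for this illustration; the actual callbacks and their signatures are the ones defined in mailbox.h in the LINX source tree and will differ from this sketch.

/* Hypothetical mailbox driver interface, for illustration only.
 * The real interface is defined in mailbox.h in the LINX source code. */
#include <stddef.h>

struct mailbox_ops {
    /* Set up the shared memory mailbox towards a peer core. */
    int  (*open)(unsigned int peer_core);

    /* Hand a frame to the link layer for delivery over shared memory. */
    int  (*transmit)(unsigned int peer_core, const void *frame, size_t len);

    /* Callback invoked by the driver when a frame arrives from a peer core. */
    void (*deliver)(unsigned int peer_core, const void *frame, size_t len);

    /* Tear down the mailbox when the connection is reset. */
    void (*close)(unsigned int peer_core);
};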

3.7 Summary

Perhaps the most interesting consideration in the design of a new message passing library is compatibility with existing legacy code. Figure 3.4 shows an overview of how the surveyed libraries are related to the existing MPI industry standard. As mentioned in section 3.3, a full MPI implementation is not suitable for embedded multicore systems due to its high memory overhead. Therefore the design should implement as much as possible of the MPI standard without introducing too much overhead. Of the implementations analyzed in this chapter, MSG provides the most extensive compatibility with the existing MPI standard. However, as shown in [7], the MSG implementation significantly lacks performance in comparison to their MCAPI implementation, which has more than 30 times lower latency for the smallest possible message size. Therefore, MCAPI should be considered over MSG when providing a standard library for embedded multicore.

The results from figure 3.2 would imply that the group operations of MSG might be redundant due to their low call frequency. However, the significantly higher frequency of blocking messages over non-blocking messages shows that blocking message calls are mandatory to provide compatibility with existing legacy code. In MCAPI, the standard send function call, mcapi_msg_send, is a blocking function, which further adds to this significance. LINX, on the other hand, only supports asynchronous messages, which limits its current support for legacy code.

The third requirement in section 3.2 states that the message passing library must effectively provide a transparent abstraction of the underlying hardware to the developer. Although all of these implementations to some degree provide a good abstraction of the underlying hardware, MCAPI differentiates between connected and connectionless messages. Although this is a design decision that provides a significant performance increase for connected messages, it puts the responsibility on the developer to choose between connectionless and connected communication between endpoints. A better alternative would be for the library itself to predict whether or not to use connected messaging. However, predicting how frequent the communication between two tasks in an application is can be very difficult and create a lot of unnecessary overhead. The ideal solution would therefore be if the difference in performance between connectionless messages and connected messages were insignificant. The implementation in this thesis will try to show whether such a solution can exist. However, implementing a full scale library is not possible within the scope of the thesis. Hence LINX has been chosen to provide the high level part of a message passing library to show this. LINX, however, does not currently have any support for shared memory utilization between cores on Linux. Therefore LINX must be extended with a link layer driver that utilizes shared memory. How this can be achieved will be studied in further detail in the next chapter.

Figure 3.4: MPI Libraries

Chapter 4

Requirements on a New Message Passing Implementation

4.1 Introduction

This chapter covers the requirements on the implementation of a new message passing API that utilizes the shared memory between cores on a multicore processor. Some interesting implementations that aim to solve this are presented, one of which will be selected for implementation. The features required of a lightweight IPC mechanism between cores are the following:

Req 1: Low Memory Overhead
The message passing implementation must be able to scale well as the number of cores in the system increases. In particular, the memory buffer allocation must be flexible so that the memory usage is kept at a minimum. The most basic requirement for scalable memory management is that each core only needs to maintain messaging buffers to the cores it is communicating with.

Req 2: Minimal Communication Overhead
The aim of the design is that the implementation should be nearly as lightweight as the connected messages of MCAPI. Thus the overhead introduced by the message header must be kept to a minimum.

Req 3: Built in User Space
The message passing implementation must be built entirely in user space. This is essential to minimize the usage of expensive system calls. Also, legacy applications that use this programming model have been proven easy to migrate to new multicore architectures [32]. Ideally, the solution should not need to involve the kernel when sending messages between endpoints, to avoid redundant copying where messages must be copied from user space to a shared memory region between endpoints.

4.2 Design 1: ZIMP: Zero-copy Inter-core Message Passing

ZIMP, which was presented in [33], is an inter-core communication mechanism that provides efficient zero-copy message passing that is able to outperform many existing solutions. Specifically, on one-to-many communication calls ZIMP is able to vastly outperform other solutions. The solution uses a shared memory region where messages can be allocated and read directly. Within the shared memory region, communication channels can be created. The size of a communication channel is determined by the number of messages and the maximal size of each message, which are passed as parameters to the create function for the communication channel. Each communication channel uses a circular buffer with an associated set of receivers. The communication channel can be written to by several senders, and all receivers associated with that communication channel will be able to read messages written to the channel. Associated with each channel is a variable called next_send_entry, which indicates to the senders the next entry in the circular buffer where a message should be written [33]. Each entry within the channel is associated with a bitmap. The bitmap is written by the sender and denotes which of the receiving processes should read that message. Each bit represents a receiving process, and bit i is set to one when the receiving process with identity i should read the message stored in that buffer. The receiver then sets that bit to zero whenever it reads the value from the buffer. In that way, the sending process can safely rewrite the position in the circular buffer when all the bits of the bitmap are set to zero. The reading of the channel is managed by an array of variables, called the next_read array, where each variable corresponds to the index of the next value in the buffer that should be read by each process. Similarly to the bitmap associated with each buffer, the variable at position i in the array points to the next value to read for the i-th receiving process.
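The channel layout described above can be pictured with the following C sketch. The field names and sizes are chosen for this illustration and are not taken from the ZIMP source code.

/* Illustrative layout of a ZIMP-style channel (names and sizes invented). */
#include <stdint.h>

#define MAX_RECEIVERS 32          /* one bit per possible receiver            */
#define MSG_SLOTS     64          /* number of entries in the circular buffer */
#define MAX_MSG_SIZE  1024        /* maximal payload size per entry           */

struct zimp_entry {
    uint32_t receiver_bitmap;           /* bit i set => receiver i must still read */
    uint8_t  payload[MAX_MSG_SIZE];
};

struct zimp_channel {
    uint32_t          next_send_entry;          /* next slot a sender writes to    */
    uint32_t          next_read[MAX_RECEIVERS]; /* per-receiver index of next slot */
    struct zimp_entry entries[MSG_SLOTS];       /* the circular message buffer     */
};

/* A sender writes entries[next_send_entry] once its receiver_bitmap is zero,
 * sets the bits for the intended receivers, then advances next_send_entry.
 * Receiver i reads entries[next_read[i]], clears bit i and advances its index. */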

4.2.1 Suitability for Implementation

The implementation satisfies Req 1, since the number of communication channels can be managed through parameters passed to the library. Furthermore, the bitmap associated with each communication channel gives the possibility of using only one communication channel for all messages of the system. This design also meets Req 2, since the only information needed to pass a message is managed using the next_send_entry variable, the next_read array and the bitmap. The only downside to the message overhead of this implementation is that each message must be accompanied by the bitmap, which is as large as the number of possible receivers. The designers themselves note that in configurations using messages smaller than 1 kB and only one receiver, ZIMP is outperformed by Barrelfish MP [33]. Interestingly, though, it is still able to outperform many traditional solutions such as domain sockets and pipes. Looking at the third requirement, this solution is fully implementable in user space. However, each message must, after being read by the receiver, be copied into that process' local memory space. This introduces some redundant message copying when there is only one sender and receiver.

4.3 Design 2: A New Approach to Zero-Copy Message Passing with Reversible Memory Allocation in Multi-Core Architectures

The approach presented in [34] provides a design for a message passing technique that operates entirely in a user space shared memory segment between processes. The main idea is to use a smart pointer design to address shared memory segments, which is automatically able to translate between the virtual address spaces of the logical processes within the system. That way, only a pointer to where the message content is stored in memory needs to be copied between the two communicating processes.

4.3.1 Shared Memory Heap Structure

Figure 4.1: Shared Memory Heap Item

The design is implemented using user space shared memory segments. Within each segment, a shared heap object is allocated. An overview of the structure of a shared heap object is shown in Figure 4.1. The shared heap can contain segments of various sizes, where each segment is identified by its offset from the beginning of the heap. Therefore, the only thing a process needs in order to use the heap is a pointer to the beginning of the heap.

4.3.2 Smart Pointer Design

Figure 4.2: Smart Pointer Structure

The smart pointers are what really makes this approach work. Normal C-type pointers will not work when passed between different processes, because the virtual address of a shared memory segment in one process does not necessarily need to be the same as in another process. Therefore this design instead utilizes smart pointers, which contain metadata about where the message is stored in memory. A UML diagram of the smart pointers used in this approach is shown in Figure 4.2. The pointers are passed between processes through shared memory, where the pointer is copied into the receiving process' address space. Using the metadata stored in each smart pointer, the receiving process is then able to translate the pointer into the correct virtual address in its own address space.
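In code, such a "smart pointer" is essentially an offset relative to the start of a shared heap, plus enough metadata to find that heap again. The sketch below is a simplified illustration with invented names, not the structure from [34].

/* Simplified offset-based pointer, illustrating the idea in [34] (names invented). */
#include <stdint.h>

struct shm_ptr {
    uint32_t heap_id;   /* which shared heap the object lives in        */
    uint64_t offset;    /* distance from the start of that heap, bytes  */
    uint64_t size;      /* size of the referenced message, for checking */
};

/* Each process keeps its own table of where it has mapped each heap. */
extern void *local_heap_base[/* heap_id */];

/* Translate the process-independent shm_ptr into a local virtual address. */
static inline void *shm_ptr_resolve(struct shm_ptr p)
{
    return (char *)local_heap_base[p.heap_id] + p.offset;
}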

4.4 Suitability for Implementation

This approach is able to meet all requirements but Req 1, since each process needs to keep a pointer to each of the other processes' shared memory heaps. This does not scale well to thousands or millions of processes running in the same system, since each process would need to keep thousands of pointers in memory. Thus, for this implementation to meet Req 1, a slight redesign of the pointer management is needed. If the allocation could be redesigned so that, for example, each message type has its own buffer, the number of pointers each process needs to store could be reduced significantly.

4.5 Design 3: Designing High Performance and Scalable MPI Intra-node Communication Support for Clusters

The approach proposed in [35] is a design where small messages are handled differently from large messages. The motivation for this is that small messages are passed more frequently than large messages and should therefore be prioritized and handled more efficiently.

4.5.1 Design Overview

In this design, each process maintains a receive buffer for every other communicating process within the system. In this way, each buffer only contains messages from one sender. Furthermore, each process also maintains a set of send queues to the other processes. The purpose of the send queues is to hold larger messages before they are read by the receiver. As an example, in a set of 4 communicating processes each process would maintain 6 buffers: three receive buffers and three send queues.

Small Message Transfer

When transferring small messages, the message contents are written directly into the receiving process’ receive buffer by the sender. The receiving process then reads the message in its receive buffer and copies the message into its own memory.

Large Message Transfer

For large messages, it is no longer efficient to transfer the whole message contents directly into the receive buffer of the receiving process. Instead, a control message is sent into the receive buffer that provides meta information on where in the sender's send queue the message contents can be fetched. A typical message transfer can be described with the following steps [35], and a code sketch of these steps is given below.
On the sender side:
1. The sending process fetches a free memory cell from its send buffer, copies the message from its source buffer to the free cell, and marks the cell as busy.
2. The process enqueues the cell into the corresponding send queue.
3. The process sends a control message containing the address information of the message contents by writing it into the receiving process' receive buffer.
On the receiver side:
1. The receiving process reads the control message from its receive buffer.
2. Using the information from the control message, the process directly accesses the cell containing the message contents.
3. The process copies the message contents from the cell into its own destination buffer, and marks the cell as free.
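The following C sketch illustrates the control message and the two sides of a large message transfer. All structure and function names are invented for this example, and the queue management and free-cell lookup helpers are assumed to exist elsewhere.

/* Illustrative sketch of the large-message protocol above (names invented;
 * queue handling and free-cell lookup helpers are assumed to exist). */
#include <stdint.h>
#include <string.h>

struct ctrl_msg {            /* small message placed in the receiver's receive buffer */
    uint32_t sender_rank;    /* whose send queue holds the payload                    */
    uint32_t cell_index;     /* which cell in that send queue                         */
    uint64_t length;         /* payload length in bytes                               */
};

struct send_cell {
    volatile int busy;       /* marked busy by the sender, freed by the receiver */
    uint8_t      data[64 * 1024];
};

/* Sender side: copy the payload into a free cell, then post a control message
 * into the receiver's receive buffer (enqueueing into the send queue is omitted). */
void send_large(struct send_cell *queue, int my_rank, int cell,
                const void *payload, uint64_t len, struct ctrl_msg *recv_buf_slot)
{
    memcpy(queue[cell].data, payload, len);    /* sender step 1: fill a free cell   */
    queue[cell].busy = 1;
    recv_buf_slot->sender_rank = my_rank;      /* sender step 3: control message    */
    recv_buf_slot->cell_index  = cell;
    recv_buf_slot->length      = len;
}

/* Receiver side: follow the control message to the cell and copy the payload out. */
void recv_large(struct send_cell *sender_queue, const struct ctrl_msg *cm, void *dst)
{
    struct send_cell *cell = &sender_queue[cm->cell_index];  /* receiver step 2 */
    memcpy(dst, cell->data, cm->length);                     /* receiver step 3 */
    cell->busy = 0;                                          /* mark cell free  */
}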

Suitability for Implementation

The design is not able to meet Req 1, because the number of receive buffers needed by each process does not scale well as the number of communicating processes increases. Furthermore, once read, messages are copied into the local memory of the receiver. Since the design utilizes user space shared memory, the receiver should be able to keep the data in its receive buffer until it is no longer necessary. This gives the design a possibly unnecessary communication overhead, not satisfying Req 2.

4.6 Conclusion on Suitable Solution for Implementation

As derived from figure 3.2, broadcast messages and other collective operations are an important part of a message passing library. However, collective operations are not the most frequent type of communication in a standard system. Thus Design 1 is not really suitable, due to its unnecessary overhead for point-to-point operations. Comparing Design 2 to Design 3, the fact that Design 3 copies the message contents into the receiver's own memory is a clear disadvantage. Design 2, on the other hand, fully utilizes the shared memory, as message contents are kept in the shared memory region until no longer needed by the receiver. Overall, Design 2 seems to be the design that best fits the requirements specified in section 4.1. However, the number of pointers that each logical process must keep in memory to be able to access all shared heaps may cause a large memory overhead for the implementation. A possible way to minimize this overhead could be to use only a limited number of shared heaps that are shared between all logical processes. That way, the number of pointers that each process needs to keep in memory becomes more customizable. Since all of these designs utilize user space shared memory, the message passing implementation cannot rely on the underlying operating system to provide error checking and debugability. Instead the system will have to provide error detection and traceability on its own.

Chapter 5

Providing Traceability in Embedded Systems

5.1 Introduction

In many domains of embedded systems, such as the automotive domain, verification and certification of software is a requirement [13]. As the hardware and software complexity of embedded systems continues to grow, the verification of system software is becoming increasingly difficult. Software behaviour can be verified either with techniques such as formal proof or through built-in tracing support within the software. This thesis aims to support the verification of embedded software by providing a built-in execution trace in the message passing library implementation. This is beneficial not only for software verification but also for software debugging.

Debugging is often a difficult part of developing software for embedded systems, and just as for verification it is becoming increasingly difficult as hardware complexity keeps increasing. To simplify the debugging process it is therefore crucial to record enough information about a program's execution to easily be able to identify the source of errors and bugs. Built-in tracing provides an efficient way to record such information.

5.2 Tracing

Tracing means storing information about a specific state that has been reached within the application. There are currently many lightweight solutions that provide efficient tracing for embedded and general purpose systems, such as the Open Trace Format (OTF) [36] and the Linux Trace Toolkit next generation (LTTng) [37]. OTF provides both a specification standard and a library for tracing on parallel platforms, while LTTng is a complete tracing toolkit for the Linux operating system.

5.3 Tracing Goals

How much understanding of a program's execution can be gained from an execution trace is entirely determined by how much information is stored in the trace. Writing too much information into an execution trace, however, can degrade system performance greatly. This is especially true for embedded systems, which have very limited main memory. It is therefore important to make a good design choice about which information should be stored in the trace. Furthermore, the tracing information that is most relevant to the user may change during different stages of the application's life cycle. For example, more information about the system trace may be needed during testing and debugging than during normal runtime. The design of a good program execution trace should therefore involve several different levels of tracing that can each be turned on or off individually; a minimal sketch of such a level design is given after the trace type descriptions below. A good level design typically contains the following trace types:

5.3.1 Information Trace

The information trace contains information about the state of the system modules. This trace provides important information during system start-up and shut-down, for example about which modules were started or stopped correctly. During normal execution this trace may provide redundant information and should therefore be turned off once system start-up has completed.

5.3.2 Error Trace

The error trace contains information about errors that occur within the system. Typically, this trace is written to when a program exception occurs and is handled. Since this trace is only written to when system errors occur, it does not affect application performance and should therefore never be turned off.

5.3.3 Debug Trace

The debug trace should provide detailed information about the program execution. From this trace a developer should be able to see which methods the program calls and which local variables are available, along with their values. The amount of detail contained in this trace generates huge amounts of traffic, and the trace should therefore always be turned off during normal program execution.

5.3.4 Test Trace

Sometimes a program trace is needed to verify a certain test case. In applications that consist of many connected modules it can be very difficult to read the output of a module directly. Instead, the module can generate a trace of which values are received as input and which values are sent as output. This is particularly useful in black box testing, where the test trace provides the tester with an additional peephole into the module. Since test traces should be used only for verifying certain tests, the trace should be turned off during normal execution.
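The following is a minimal sketch of such a level design: a trace facility where each of the four trace types above can be switched on or off individually, with the error trace enabled by default. The class and function names are illustrative only.

#include <fstream>
#include <string>

// Illustrative trace facility with individually switchable levels,
// mirroring the information/error/debug/test traces described above.
enum TraceLevel { TRACE_INFO, TRACE_ERROR, TRACE_DEBUG, TRACE_TEST };

class Tracer {
public:
    explicit Tracer(const std::string &path) : out_(path.c_str()) {
        for (int i = 0; i < 4; ++i) enabled_[i] = false;
        enabled_[TRACE_ERROR] = true;     // the error trace is never turned off
    }
    void enable(TraceLevel lvl, bool on) { enabled_[lvl] = on; }
    void write(TraceLevel lvl, const std::string &msg) {
        if (enabled_[lvl]) out_ << lvl << " " << msg << "\n";
    }
private:
    std::ofstream out_;
    bool enabled_[4];
};

// Usage: turn the information trace on only during start-up.
// Tracer trc("app.trace");
// trc.enable(TRACE_INFO, true);
// trc.write(TRACE_INFO, "module video_reader started");
// trc.enable(TRACE_INFO, false);        // start-up finished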

5.4 Open Trace Format

OTF is a fast and efficient trace format library with special support for parallel I/O. It achieves good performance on single processor workstations as well as on massively parallel supercomputers [38]. Parallel I/O is important since each trace generated using OTF is distributed into multiple streams. Each stream is represented as a separate file and may store events from one or more processes or threads; each process, however, may only be mapped to one stream exclusively. The traces generated by OTF are stored in ASCII format, where each line represents one trace record. This makes the generated trace not only human readable but also easy to analyze using existing tools like grep.

5.4.1 Open Trace Format 2 (OTF2)

In 2012 the Open Trace Format version 2 was announced, a major redesign of the original Open Trace Format. The redesign was also based on experiences from the EPILOG trace format [39]. The OTF2 library is developed as part of the Score-P project [40] and is available under the BSD open source license. The version 2 design focuses on scalability and performance. One major change is that the ASCII file format was replaced with a binary file format. Because of this, a library for analyzing the generated trace is provided with the OTF2 implementation. This library is aimed to be more closely integrated into Scalasca [41] and Vampir [42], two major tools for trace analysis that used the EPILOG trace format and OTF, respectively, for trace generation. This solves one of the major flaws of EPILOG and OTF: the missing compatibility between the trace formats of Vampir and Scalasca. Compatibility between the trace formats of the two tools is important since, although both tools perform profiling analysis, they have different focus areas and may be able to spot different potential problems in the same execution trace.

OTF2 File Format

As with OTF version 1, each trace is distributed into multiple streams. The files of a trace are collectively called an archive and are divided into four different file types [38]:

• Anchor file - A file with the .otf2 suffix that contains meta data on the archive organization.
• Global definition file - A file with the .def suffix that stores global definitions that are equal for all locations.
• Local definition file - A file, also with the .def suffix, that stores all definitions related to a specific location.
• Local trace file - A file directly related to the local definition file, with the .evt suffix, that stores all events recorded on the location related to the local definition file.

5.5 LTTng

Linux Trace Toolkit next generation (LTTng) is an open source technology for software tracing in Linux. It aims to provide highly efficient tracing tools for Linux.

5.5.1 LTTng Features

LTTng provides both a userspace tracer and a kernel tracer, which come along with many tools for both writing and reading traces. The LTTV viewer is an advanced tool provided with LTTng that can analyze and show traces, both in text format and graphically [37]. The LTTng Userspace Tracer is designed specifically for providing detailed information about userspace activity [43]. It is designed to run entirely in userspace, and hence the tracing module requires no operating system calls to operate. This makes it very fitting for providing lightweight tracing to applications designed to operate only in userspace.

5.6 Evaluation for Design

Both OTF2 and LTTng provide very interesting features. When compared side by side it is difficult, based on features alone, to decide which one to use over the other. However, OTF2 has a benefit that is not visible in the feature list: OTF2 is a defined standard. There is also much research material available for OTF2, which, aside from the official documentation, has proven difficult to find for LTTng during this evaluation. From an evaluation standpoint LTTng would be more interesting to use in an implementation, since there currently seems to be much more development activity in the LTTng project. However, it may not be a reliable choice for this application due to the lack of academic material available.

Chapter 6

Designing a Standard Messaging API for Embedded Systems

6.1 Introduction

This chapter introduces the design proposed by this thesis. The design is built on the requirements from the previous chapters and on how these adapt to the given use case. The main goal of the design is to bridge the gap between MCAPI and MPI while still providing a message passing library that is suitable for embedded multicore. The library, called eMPI where the "e" stands for embedded, will therefore provide compatibility with both the MPI and the MCAPI standards. Several solutions that aim to make MPI more lightweight using MCAPI, such as [44], do exist; however, they require a full MPI implementation to be running on one of the system cores. The design proposed in this thesis takes a different approach and instead provides a more lightweight implementation of the MPI standard that is suitable for embedded multicore. The benefit of such a design is that no full MPI implementation needs to run on the system, which makes the design scale down to embedded microcontrollers.

6.2 Software Requirements

This section lists the requirements that are to be fulfilled by the implementation and motivates why these features have been chosen.

sw req sync The designed message passing library shall provide blocking, synchronous message passing calls.

sw req async The designed message passing library shall provide non-blocking, asynchronous message passing calls.

sw req mcapi comp The designed message passing library shall provide compatibility with the existing MCAPI standard.

sw req mpi comp For each API call there must be a corresponding function call in the MPI standard.

sw req seq order Messages must be received in the same order that they are sent.

sw req user space The message passing library must utilize the shared memory be- tween cores through user space.

sw req ptr scale Each endpoint must only need to maintain a small amount of pointers as the number of processes grows large.

sw req msg del Reliable delivery is ensured by the library.

sw req trc libinit When the core library functionality has been initialized a trace containing information on available resources shall be generated.

sw req trc libfinalize When the core library functionality terminates successfully a trace containing information on available resources shall be generated.

sw req trc register When registering a new endpoint, the mailbox driver shall log the endpoint name along with a timestamp.

sw req trc send When sending a message, a trace containing the message contents and receiving endpoint shall be provided by the sender.

sw req trc recv When receiving a message, a trace containing the message contents and sending endpoint shall be provided by the receiver.

sw req trc snderr If a message cannot be delivered to an endpoint, an error trace shall be generated by the message passing library.

6.2.1 Motivation for the Requirements

It was shown in [29] that synchronous message calls occur even more frequently in MPI programs than asynchronous calls. Thus sw req sync is mandatory for any library. However, the use of asynchronous message calls can in many applications result in a significant performance boost over synchronous calls. Therefore the library must also satisfy sw req async. To provide compatibility with both existing legacy code and future code, it is important that the library satisfies sw req mcapi comp and sw req mpi comp, covering the two currently largest standards for message passing. The library will provide an extension to the currently existing LINX implementation, and the library mailbox driver must therefore satisfy the requirements of a LINX connection manager. One of those requirements is that messages are delivered in sequential order. Since the library operates across a reliable medium, i.e. shared memory, there is no reason for messages to be received out of order by the receiver. Thus sw req seq order can easily be satisfied and simplifies the usability of the library.

The implementation must provide efficient utilization of the shared memory between the cores in a multicore system. To provide this efficiency, the library must be able to operate entirely in user space (sw req user space), to avoid unnecessary memory copies performed by the system kernel. It should be noted that, as mentioned in [CITE], the use of user space shared memory is only faster than regular shared memory techniques involving the kernel when communication is very frequent. The shared memory utilization must also be efficient in terms of memory footprint, and therefore sw req ptr scale must be satisfied for the library to scale well as the number of processes in the system grows large. Since the communication operates entirely in user space, the implementation cannot rely on the kernel to provide any type of error tracing. The library must therefore provide its own set of traces. The necessary traces that the library must provide are given by sw req trc libinit, sw req trc libfinalize, sw req trc register, sw req trc send and sw req trc recv. Since the communication medium is considered reliable, no error traces need to be generated for the shared memory implementation. However, the library still needs to provide a readable trace if a message cannot be delivered to an endpoint. This is covered by sw req trc snderr.

6.3 System Overview

The system design can be divided into two main parts. The first is the design of the MPI compatible application programming interface, which is described further in section 6.4. The second part is the design of an efficient shared memory utilization for the message passing library, described in section 6.5. Figure 6.1 shows an overview of the structure of the message passing library, where the parts highlighted in red are the new features provided by this thesis. LINX provides the base for the library, which makes the system easy to scale down to microprocessors. Furthermore, LINX has a compatibility layer [45] with the OpenMCAPI library that makes it compatible with the MCAPI standard.

Figure 6.1: Design Architecture Overview

6.4 eMPI Application Programming Interface

The eMPI API will provide a subset of function calls from the MPI standard. A summary of all the function calls and their MPI counterparts is provided in figure 6.2. The selection of which MPI calls the API needs was largely based on the results from ??. Furthermore, MCAPI supports all of these operations apart from the MPI Bcast call. The reason for adding this call is to provide more extensive support for collective operations built on top of eMPI: the broadcast message is the only true one-to-many operation, and zero-copy techniques can by their very nature provide scalable support for broadcast, so implementing it on top of eMPI would be wasteful. Support for the remaining collective operations will not be provided, since in ?? they occurred far less often than point-to-point operations, and since MCAPI has no support for collective operations it is sensible to exclude them from an implementation that supports both standards. Collective operations can, however, be implemented on top of both eMPI and MCAPI, and the other collective operations from ?? can be built on top of eMPI without significant performance losses.

Figure 6.2: eMPI Application Programming Interface
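Since Figure 6.2 is not reproduced in text form here, the header sketch below only illustrates what a point-to-point subset with broadcast support could look like. The eMPI_* names and signatures are hypothetical and should not be read as the actual API of the figure.

// Hypothetical sketch of an MPI-like point-to-point subset plus broadcast.
// The eMPI_* names and signatures are illustrative, not the API of Figure 6.2.
extern "C" {

typedef int eMPI_Endpoint;   // analogous to an MPI rank / MCAPI endpoint
typedef int eMPI_Request;    // handle for non-blocking operations

int eMPI_Init(int *argc, char ***argv);            // cf. MPI_Init
int eMPI_Finalize(void);                           // cf. MPI_Finalize

// Blocking, synchronous calls (sw req sync).
int eMPI_Send(const void *buf, int count, eMPI_Endpoint dest);
int eMPI_Recv(void *buf, int count, eMPI_Endpoint src);

// Non-blocking, asynchronous calls (sw req async).
int eMPI_Isend(const void *buf, int count, eMPI_Endpoint dest, eMPI_Request *req);
int eMPI_Irecv(void *buf, int count, eMPI_Endpoint src, eMPI_Request *req);
int eMPI_Wait(eMPI_Request *req);

// The only collective operation kept in the core API (cf. MPI_Bcast).
int eMPI_Bcast(void *buf, int count, eMPI_Endpoint root);

}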

6.4.1 Built in Information and Debug Trace

The eMPI library will be accompanied by a built-in trace mechanism to simplify the debugging and verification of applications that utilize eMPI. The traces provided at the API level will cover detailed information traces about the library initialization and tear-down process. In addition, a more detailed debug trace must be provided that gives full information about the API calls performed and their arguments.

6.5 Boost Shared Memory Mailbox

The shared memory mailbox will act as a driver allowing the Shared Memory Connection Manager of LINX to utilize Zero-Copy shared memory communication. To achieve this the implementation will use Boost Interprocess [46], a C++ library that simplifies the use of common interprocess communication and synchronization mechanisms.

6.5.1 Boost Interprocess

The Boost C++ libraries [47] are a collection of libraries that work well with the C++ standard library. All libraries are available under the Boost Software License, which makes them available for both commercial and non-commercial use.

Boost Interprocess is a library that provides a simple interface for creating and managing shared memory. The shared memory provided by Boost is implemented through memory mapped files, which are mapped into the address space of several processes so that they may be read and written without explicit operating system calls [48]. For this reason, the Boost Interprocess library was used in the implementation to allocate shared memory segments.
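As a minimal sketch of this mechanism, the example below creates a managed, memory mapped segment with Boost Interprocess, allocates a message container in it, and converts the resulting pointer into a process-independent handle that could be passed to a receiver. The file name and sizes are placeholders.

#include <boost/interprocess/managed_mapped_file.hpp>
#include <boost/interprocess/file_mapping.hpp>
#include <cstring>

namespace bip = boost::interprocess;

int main() {
    // Create (or open) a memory mapped file that both processes map into
    // their address space. The file name and size are placeholders.
    bip::managed_mapped_file segment(bip::open_or_create,
                                     "/tmp/empi_heap.bin", 1u << 20);

    // Allocate a message container directly in the shared heap ...
    void *msg = segment.allocate(4096);
    std::strcpy(static_cast<char *>(msg), "hello");

    // ... and obtain a process-independent handle that can be sent to the
    // receiver instead of a raw pointer.
    bip::managed_mapped_file::handle_t h = segment.get_handle_from_address(msg);

    // A receiver that has mapped the same file would translate the handle
    // back with: void *p = segment.get_address_from_handle(h);
    (void) h;

    segment.deallocate(msg);
    bip::file_mapping::remove("/tmp/empi_heap.bin");   // clean up the mapping
    return 0;
}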

6.5.2 Mailbox Design

Following the design of [34], the mailbox driver will utilize shared memory heaps and smart pointers to send data between processes. The structure of the shared heap objects and the design of the smart pointer will be the same as described in section 4.3. The driver will be highly configurable in that the number of shared heaps allocated between processes can be anything from a single heap to multiple heaps. Also, a process is not limited to one mailbox for all types of messages, but may have one mailbox per message type or per sender id.
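The sketch below illustrates the general idea of such a smart pointer and mailbox, assuming the Boost Interprocess handle mechanism from the previous section. The field and type names are illustrative and not taken from [34]; real code would also need atomic updates of the mailbox indices.

#include <boost/interprocess/managed_mapped_file.hpp>
#include <cstddef>

namespace bip = boost::interprocess;

// Illustrative smart-pointer descriptor: instead of a raw address, the
// sender passes the id of the shared heap plus a handle (offset) within it.
struct SmartPtr {
    int heap_id;                                   // which shared heap
    bip::managed_mapped_file::handle_t handle;     // offset within that heap
    std::size_t size;                              // payload size
};

// A very small mailbox: a fixed ring of descriptors kept in shared memory.
// Real code needs atomic head/tail updates; omitted for brevity.
struct Mailbox {
    static const int SLOTS = 64;
    volatile int head;
    volatile int tail;
    SmartPtr slots[SLOTS];
};

// The receiver turns a descriptor back into a local pointer using the heap
// it has mapped under the same id.
inline void *resolve(bip::managed_mapped_file &heap, const SmartPtr &p) {
    return heap.get_address_from_handle(p.handle);
}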

6.5.3 Built in Debug and Error Trace

Since the mailbox driver will be operating entirely in user space, the library cannot rely on any error handling from the operating system. The library must therefore itself provide an efficient way to detect and handle errors. Furthermore, additional and more detailed traces must be provided to simplify debugging.

Chapter 7

Implementation of Zero-Copy Communication on Linux

7.1 Introduction

This chapter describes what the implementation covers and its limitations. The goal of the implementation is to demonstrate some of the key features of the design. Specifically, the performance of the zero-copy mailbox driver needs to be investigated and evaluated. The main focus of the implementation is therefore to evaluate whether the mailbox driver provides enough performance to be worth integrating into LINX.

7.2 Delimitations on the Implementation

Due to the time restrictions of this thesis, some of the desired features of the design could not be implemented and are left to future work. The main part of the design that is left out is the API described in section 6.4. The reason why this part of the design is given lower priority than the mailbox driver is that, without an efficient link layer implementation, no interesting results can be obtained from it. There is no need to prove whether or not LINX can be made more MPI-compatible, and there is no need to evaluate the performance of a subset of MPI, since that has already been done in [9]. The truly interesting evaluation to be made on the eMPI API is its performance against the scalar channels and connectionless messages of MCAPI; however, this requires the link layer to utilize shared memory efficiently.

7.3 Tracing With OTF2

As described in section 5.6, there seems to be much more material on the performance of OTF2 than on LTTng. Therefore OTF2 has been selected for use with this message passing library. In the basic implementation provided by this thesis the trace only covers error messages. For the implementation to be usable within the industry, however, the trace must also be extended with a more detailed debug trace. The error tracing added to the basic implementation is a bare minimum for an implementation that operates entirely in user space and should not affect performance at all during normal operation.

7.4 Implementation Overview

Since the Boost C++ library was used to manage the shared memory of the library, the core of the library was implemented in C++. As such, the core was implemented in an object-oriented fashion. Figure 7.1 shows a UML class diagram of the library core. The core was then linked to a C API through the function calls provided by the mailbox manager class. This part of the implementation could also have been implemented in C in a similar fashion.

Figure 7.1: UML Diagram of Mailbox Driver
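The exact C API is not reproduced here; the sketch below only illustrates how the C++ mailbox manager class could be exposed through extern "C" wrapper functions. The empi_mb_* names are hypothetical, while the mailbox_man methods are those that appear in the Appendix A programs, whose exact parameter meanings are an assumption.

#include <cstddef>
#include "mailbox_man.hpp"   // C++ mailbox manager from the library core

// Hypothetical extern "C" wrapper around the C++ core. Only the mailbox_man
// methods are taken from the benchmark code in Appendix A.
extern "C" {

void *empi_mb_create() {
    return new mailbox_man();
}

void empi_mb_destroy(void *mb) {
    delete static_cast<mailbox_man *>(mb);
}

void *empi_mb_get_container(void *mb, size_t size) {
    return static_cast<mailbox_man *>(mb)->get_message_container(size);
}

int empi_mb_send(void *mb, const char *mailbox, void *msg) {
    static_cast<mailbox_man *>(mb)->send(mailbox, msg);
    return 0;
}

void *empi_mb_read(void *mb, const char *mailbox) {
    return static_cast<mailbox_man *>(mb)->read(mailbox);
}

}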

Chapter 8

Use Case Implementation: Online Face Recognition

8.1 Introduction

To demonstrate the efficiency of the solution presented in this thesis, it will be implemented in a real world application that performs real-time face detection in video. This use case is very suitable for demonstrating the efficiency of the message passing library, since in many video applications the communication is responsible for more than 45% of the overhead of the application [49]. The application will be a combined effort of three master's thesis students. The zero-copy implementation provided in this thesis will be used for communication between the system components. Furthermore, the work of another student will provide the application with runtime adaptation of the program behavior. The third student's contribution will be to parallelize the application using various automatic tools, which is essential for optimal performance of the application on multicore systems.

8.2 OpenCV

The Open Source Computer Vision Library (OpenCV) is an open source library for image analysis. It was originally released by Intel and is available under the BSD license [50]. In this implementation, OpenCV was used to read and modify image data through its C programming interface. The built-in functionality for face detection, however, was not used. Instead, one of the students in the team implemented both a training application and a facial recognition application. These two applications were written as sequential programs that were later parallelized using various tools for automatic parallelization. The functionality of these two applications within the implemented face recognition system, along with the other system components, is described in further detail in the next section.

8.3 Application Overview

The application can be divided into four main components, each of which is described in further detail in the following subsections. In the original design of the application there was also a fifth component, in which the identities of detected faces in the video were established. This module was removed due to lack of time, but could easily be added afterwards. The addition of that fifth module would probably have further demonstrated the efficiency of the zero-copy communication.

8.3.1 Video Reader

The video reader module will be able to read video frames from a video stream and send the frames to the Face Recognition Classifier module. The frames will be sent as IplImages, an image format provided by OpenCV.

8.3.2 Face Recognition Trainer

This part of the application takes a large number of images, both images containing faces and images that do not contain faces, as training data to produce a classifier. Once completed, the classifier is written to a file that is readable by the Face Recognition Classifier.

8.3.3 Face Recognition Classifier

In this module, images from the video reader are received and analyzed to detect faces within them.

8.3.4 Video Output Module

This module receives the image frame that has been modified by the Face Recognition Classifier and outputs it onto a screen. The modified image contains squares drawn around any detected faces within the image frame.

8.4 Zero-Copy Integration

This section describes how the zero-copy message passing library was used within the face recognition application. To utilize the zero-copy mechanism to its fullest, the communication between modules is designed in such a way that the actual image data is never copied between modules. Instead, only smart pointers are passed between the various modules. This means that for each video frame of size 300kB that is processed through the application, only two smart pointers, i.e. about 30 bytes of data, need to be copied between the modules. The flow of data through each stage of the application is illustrated in figures 8.1-8.3.

The large size of each data frame also indicates another reason why zero-copy is a better design choice than many of the alternatives. For instance, sockets have a maximum packet size of 64kB. Using sockets, the data would therefore not only need to be copied multiple times throughout the system, but would also have to be fragmented. This would make the communication many times slower.

Figure 8.1: The video reader requests a message container on the shared heap, reads a frame from the video stream, and stores it directly into the shared heap.

Figure 8.2: The video reader sends a smart pointer to the face recognition classifier, which directly reads the image data from the shared heap. The image is then processed and any faces within the image are detected.

Figure 8.3: The face recognition software sends a smart pointer to the video output module which outputs the processed image to a screen.
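The following condensed sketch shows the video reader stage from Figure 8.1, assuming the mailbox_man interface used in Appendix A and the OpenCV C API. The heap and mailbox names, as well as the numeric arguments to shm_add and shm_add_mailbox, are placeholders whose exact meaning is an assumption.

#include <opencv/highgui.h>    // OpenCV C API: cvCaptureFromFile, cvQueryFrame
#include <cstring>
#include "mailbox_man.hpp"

// Condensed sketch of the video reader stage: each frame is written into a
// container on the shared heap and only a smart pointer is sent downstream.
int main() {
    mailbox_man reader;
    reader.shm_add("FaceRecHeap", 300 * 1024, 16, 1);     // ~300kB per frame (placeholder args)
    reader.shm_add_mailbox("ClassifierMb", 16, 1);

    CvCapture *cap = cvCaptureFromFile("input.avi");
    IplImage *frame;
    while ((frame = cvQueryFrame(cap)) != NULL) {
        // Request a message container on the shared heap and store the frame
        // data in it once; no further copies are made downstream.
        void *container = reader.get_message_container(frame->imageSize);
        std::memcpy(container, frame->imageData, frame->imageSize);
        reader.send("ClassifierMb", container);           // only the smart pointer travels
    }
    cvReleaseCapture(&cap);
    return 0;
}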

Chapter 9

Evaluation

9.1 Introduction

This chapter presents how the implementation was evaluated and the results obtained from that evaluation. The goal of the evaluation is to gain more knowledge about the behavior of the implementation and, through the results, to understand more of its advantages and disadvantages. Furthermore, it is crucial to obtain results where the communication speed of the zero-copy implementation is compared to the old LINX implementation using sockets. The zero-copy implementation is expected to be significantly faster than sockets overall. However, it is also interesting to see whether there are situations where sockets are faster.

9.2 Zero-Copy Implementation Evaluation

9.2.1 Evaluation Of the Design

Providing a true zero-copy implementation for message passing has given a clear view of both the pros and cons of such a solution. The main benefit of a true zero-copy solution is clearly that the number of times data is copied through the system is kept to a minimum. The integration of the zero-copy implementation in the use case application, described in the previous chapter, clearly showed this: the same large message was passed between multiple processes without a single memory copy once it had been allocated in shared memory.

The downside of a true zero-copy implementation is that it is difficult to fully abstract the underlying message passing implementation from the user. Since the message data that the receiver reads remains in shared memory, the implementation must provide a way for data to be marked as free once the receiver is done reading it. This can be solved either by providing a mechanism for the receiver to free the data once read, or by having the library perform its own garbage collection of shared data. A completely different solution, which only works for point-to-point communication between two processes, is the one presented in [49], where the read message is copied into the receiver's local memory. This causes an additional memory copy for each message, but has the benefit that the shared data can be freed immediately once read.

When it comes to robustness, the zero-copy design has one major disadvantage. The shared memory regions are allocated as memory mapped files. This means that the shared memory regions are kernel persistent and thus remain in memory until explicitly deallocated or until a system reboot. This is fine during normal program behavior, where the library can simply deallocate shared memory when it is no longer needed. If a program crashes unexpectedly, however, it may never reach the point where the shared memory is deallocated. This problem can be solved by maintaining link supervision on both sides of the communication: if one endpoint crashes unexpectedly, the other endpoint can deallocate the memory shared between the two.

9.2.2 Performance Evaluation Setup

In an application where communication is very frequent, the most important characteristic of the implementation is the end-to-end latency. The initial performance testing therefore focused on measuring the end-to-end latency. The overall latency of the zero-copy implementation was evaluated by sending messages of various sizes between a sender and a receiver in a simple producer-consumer fashion. The implementation is mainly targeted at passing standard C structs between endpoints, and the data transferred in the evaluation was the simplest possible payload, namely character strings. In each test, the average transfer time of sending 10 messages of the same size was measured. The testing was done on two different machines: one using an Intel Pentium 4 processor at 3.0GHz with 1MB L2 cache and 2GB of RAM, and a second machine equipped with an Intel Core 2 Quad Q9550 at 2.83GHz with 12MB L2 cache and 4GB of RAM.
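A sketch of the measurement loop is shown below: one transfer of a given message size is timed with clock_gettime and the result is averaged over 10 repetitions, in the spirit of the Appendix A program. The transfer_once function is a placeholder for one complete send and receive.

#include <time.h>
#include <cstddef>

// Placeholder for one complete send/receive of a message of the given size.
extern void transfer_once(std::size_t message_size);

// Average end-to-end transfer time over 10 messages of one size,
// mirroring the measurement method described above.
double average_latency_us(std::size_t message_size) {
    timespec t0, t1;
    long total_ns = 0;
    for (int i = 0; i < 10; ++i) {
        clock_gettime(CLOCK_MONOTONIC, &t0);   // monotonic clock avoids wall-clock jumps
        transfer_once(message_size);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        total_ns += (t1.tv_sec - t0.tv_sec) * 1000000000L
                  + (t1.tv_nsec - t0.tv_nsec);
    }
    return (total_ns / 10) / 1000.0;           // average, in microseconds
}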

9.2.3 Initial Prototype Testing

To protect the transferred data from further modification by the sender once sent, the message data was copied into shared memory. This means that the implementation was not a true zero-copy implementation, but it provided a protective layer between the sender and shared memory. Figure 9.1 shows the result of the initial tests. It can be seen from the figure that the transfer time increases almost linearly as the message size grows. This initial result is not desirable for a zero-copy implementation, since the point-to-point latency of passing a smart pointer from sender to receiver, i.e. sending a message, should ideally be the same regardless of message size. The next section shows that removing the additional copy from sender to shared memory removes this undesirable behavior.

Figure 9.1 also shows a result that was rather expected. Since the shared memory is implemented using memory mapped files, the minimal size that can be allocated is equal to one page, i.e. 4kB. Between the message sizes 0 and 4kB the transfer time is almost constant, but between 4kB and 8kB the transfer time more than doubles, from 10 msec to 20.5 msec.

Figure 9.1: Performance of the initial library prototype

9.2.4 Performance Evaluation of Final Implementation

Due to the undesirable performance discovered during the initial testing described in the previous section, the message passing was redesigned so that no memory copies occur between the sender and the receiver. Instead, the sender requests a container from the message passing library where it can store the message contents. In these tests the end-to-end latency was measured as the transfer time from sending a smart pointer at the sender to the point where the receiver has obtained a translated pointer in its own address space. The results of these tests are shown in Figure 9.2 and Figure 9.3.

The results obtained are much more desirable than those of the initial implementation. On the Pentium 4 machine, even though multiple measurements per message size were used, there is a slight but noticeable increase in latency as the message size grows. However, the difference between the lowest measured latency, 9.2 microseconds for a message size of 256 bytes, and the highest measured latency, 15.8 microseconds for a message size of 500kB, indicates that the message size has a very small impact on the overall transfer time. This becomes even more clear when looking at the results of the Core2 machine, which has much better multithreaded performance than the Pentium 4. Here the difference in measured latency is less than one microsecond between the smallest message size of 2 bytes and the largest message size of 512kB. Hence the desired result, where the latency is not significantly affected by message size, is achieved.

Figure 9.2: Performance test on an Intel Pentium 4

Figure 9.3: Performance test on an Intel Core2 Quad

9.3 LINX Integration Evaluation

This section evaluates the performance of the zero-copy implementation from the perspective of whether it would be worth integrating into LINX, and what benefits and trade-offs such an integration would provide.

9.3.1 Performance Against Existing LINX Implementation

The current LINX implementation uses sockets as the communication medium on Linux. There is therefore little point in directly comparing the existing LINX implementation to the zero-copy implementation: when shared memory is present on the test machine, memory mapped files should always be faster than sockets for any reasonable amount of messaging data. However, it is interesting to know how much performance could potentially be gained from integrating the zero-copy implementation into LINX. In this test, the benchmark program provided with the LINX distribution was used and configured so that it measures the execution time for sending 10000 messages of each size. The test was run multiple times on the Intel Pentium 4 machine and the average transfer time of 10 consecutive runs was used. To compare the results with the zero-copy implementation, a test program for the zero-copy implementation that measures the execution time across the same number of messages was written. The source code for this program can be found in Appendix A, and the results of these tests in Figure 9.4.

As can be seen from the results, the zero-copy implementation runs significantly faster, which clearly indicates that integrating it would be of interest. The addition of LINX on top of the zero-copy implementation will introduce some extra overhead. However, the zero-copy implementation is almost 100 times faster for small messages and about 5 times faster for the largest message size, which is a very significant difference. The LINX benchmarking application does not run on message sizes above 64kB, since a single socket packet is limited to that size. This means that for messages larger than 64kB the current LINX implementation would have to fragment the data, which would probably cause a significant slowdown.

Therefore the zero-copy implementation should be much faster above that limit, since it does not need to fragment data packets for sizes larger than 64kB.

Figure 9.4: Performance Comparison Against LINX

9.3.2 Necessity of LINX

Since the zero-copy implementation uses shared memory, which is considered a reliable communication medium, an additional abstraction layer such as LINX for reliable point-to-point communication might seem unnecessary. However, LINX provides link supervision, which is very important for system stability when dealing with kernel persistent memory mappings. If one endpoint in the communication crashes unexpectedly, the mapped data can be freed from memory by the other endpoint. Without link supervision the memory mapped by the application would remain in memory until a system reboot, which in embedded applications is not an option.

Another crucial point is that LINX is a library that is already widely deployed within embedded systems and, as mentioned in section 1.1, the industry is currently shifting from decentralized systems to multicore. The implementation provided in this thesis is therefore not that valuable to the industry on its own. By being integrated into LINX, however, existing applications that use LINX can start utilizing the zero-copy performance immediately without changing a single line of code. The only change necessary to start using the zero-copy implementation through LINX is the LINX installation itself. This is much more beneficial than needing to change the actual source code to start using a new library. In fact, in some cases part of the source code might not be available, and in those cases changing the source code is not even an option.
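A minimal sketch of the clean-up step that link supervision enables is shown below: when the supervising endpoint detects that its peer has died, it removes the kernel persistent mapping. The callback shape and file name are illustrative; only the Boost Interprocess removal call is real.

#include <boost/interprocess/file_mapping.hpp>
#include <cstdio>

namespace bip = boost::interprocess;

// Called by the link supervision mechanism when the peer endpoint is detected
// as dead. Without this, the memory mapped file would stay kernel persistent
// until the next reboot.
void on_peer_lost(const char *mapped_file) {
    std::printf("peer lost, removing shared mapping %s\n", mapped_file);
    bip::file_mapping::remove(mapped_file);   // removes the file backing the mapping
}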

9.4 Conclusion

The evaluation showed that the transfer time is not significantly affected by the message size, which is a very important feature for the overall performance of the application. Furthermore, it showed that a significant performance increase can be gained from adding the zero-copy functionality to LINX.

Chapter 10

Conclusions and Future Work

10.1 Thesis Contributions

Following up from the problem statement presented in section 1.2, the main goal of this thesis was to design and implement a high performance IPC mechanism satisfying the following requirements:

Efficient Utilization Of Shared Memory Between Cores

The implementation provided in this thesis shows that zero-copy based message passing provides highly efficient communication by utilizing the shared memory between cores. It is able to achieve much higher performance than many of the existing IPC mechanisms like sockets.

Low Communication Overhead

By design, zero-copy achieves a minimal communication overhead since there is no copying of the actual message data needed to pass information. Furthermore, no expensive operating system calls are needed for communication once shared memory has been allocated.

Ease Of Use

The designed message passing library provides a simple interface with multiple layers, providing a high level abstraction of the underlying hardware for the application developer. Even at the low level of the zero-copy implementation, much of the complexity of the shared memory structure is abstracted away from the developer. The layered design also lowers the amount of work required when moving to a new hardware platform: since only the lower layers of the message passing library need to change when porting the library to a new platform, no changes to the application are needed.

10.2 Conclusions

Zero-copy communication certainly has advantages over other communication mechanisms when it comes to sending large messages. Overall, when it comes to the performance of communication heavy applications, zero-copy seems to be the obvious choice. It is, however, limited by the fact that the minimum possible message size is 4kB, i.e. one page, which makes it potentially slower than conventional I/O communication in applications where communication is very infrequent.

An important requirement for any zero-copy implementation is that it must be thoroughly verified, either by static analysis or by extensive testing. This is because the sender of messages writes directly into the shared memory and may invalidate the shared memory structure. In many cases throughout the implementation work in this thesis it has been difficult to determine whether a software bug was caused by the application using the library or was a bug within the library itself.

Memory mapped files are also limited by the maximum amount of memory addressable by the host system. This seems to be a rather small drawback in the current application, since about 256MB of data could be addressed during the tests on the Pentium 4 test system with 2GB of RAM and about 512MB on the Core2 Quad test system with 4GB of RAM. This limitation could also be avoided by extending the current implementation to utilize multiple mapped files.

Right now MCAPI seems to be the most adaptable and suitable library for on-chip communication in multicore systems. The current state of the upcoming standard, however, is that its connectionless messages are not as flexible as other solutions like MPI. There also seems to be a lot of room for alternative solutions to inter-core communication, since it is a rather new subject that has only been around for a few years.

10.3 Future Work

The current version of the zero-copy implementation only supports mapping a single memory mapped file but could, as mentioned in the previous section, be extended to utilize multiple mapped files. This would allow the implementation to use more memory than can be mapped into a single file. This extension might, however, run into another barrier instead, namely how much memory can be mapped into a single process's address space.

One important aspect of the zero-copy IPC mechanism that has not been covered in this thesis is the polling overhead for the receiving endpoints. Since no operating system calls are involved when reading from a mailbox, the receiver cannot wait for a kernel interrupt when trying to receive a message. Instead, the receiver must continually poll the mailbox for data, which could potentially have a large performance impact.

LINX provides a compatibility layer with the MCAPI standard, which makes MCAPI able to utilize LINX for connectionless messages. Since MCAPI is currently more limited than MPI and sockets when it comes to connectionless messages, testing the performance of the zero-copy message passing for MCAPI would be an interesting comparison, especially between connectionless messages using LINX and the scalar channels of MCAPI. If zero-copy messaging turns out to have performance similar to scalar channels, it may be a viable alternative to them, with the added benefit of a more flexible message size.
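One possible way to bound this polling overhead, which is not part of the current implementation, is sketched below: the receiver busy-polls for a short burst and then yields the CPU between attempts. The read call is the one used in the Appendix A programs; the burst length is arbitrary.

#include <sched.h>
#include <cstddef>
#include "mailbox_man.hpp"

// Poll the mailbox for a message; after a burst of failed attempts, yield the
// CPU so a busy receiver does not monopolise a core. This mitigation is a
// sketch and not part of the current implementation.
void *poll_read(mailbox_man &receiver, const char *mailbox) {
    void *msg = NULL;
    int spins = 0;
    while ((msg = receiver.read(mailbox)) == NULL) {
        if (++spins > 1000) {      // back off after a short busy-wait
            sched_yield();
            spins = 0;
        }
    }
    return msg;
}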

Appendix A

Initial Benchmark Program

// Note: the original listing lost the targets of its #include directives and
// the underscores in identifiers; both have been reconstructed here.
#include <iostream>
#include <fstream>
#include <string>
#include <cstdio>
#include <cstring>
#include <ctime>
#include <unistd.h>

#include "mailbox_man.hpp"

using namespace std;

timespec ts1;
timespec ts2;
ofstream out_file;

int main(int argc, char *argv[])
{
    // Setup
    mailbox_man sender;
    mailbox_man receiver;

    out_file.open("transfer_time.csv");
    for (int i = 1; i < 678435456; i = i * 2)
    {
        // Do setup: create the shared heap and mailbox on both sides
        sender.shm_add("Benchmark", i, 1, 1);
        sender.shm_add_mailbox("ReceiverMb", 1, 1);
        receiver.shm_add("Benchmark", i, 1, 0);
        receiver.shm_add_mailbox("ReceiverMb", 1, 0);
        // Output message size
        out_file << i << ", ";

        char *message = (char *) sender.get_message_container(i);
        memset(message, 'a', i);                          // fill the payload
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts1);    // works on Linux
        sender.send("ReceiverMb", (void *) message);
        const char *read_message = (char *) receiver.read("ReceiverMb");
        (void) read_message;                              // contents not used
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts2);    // works on Linux

        // transfer time in microseconds (nanosecond fields only)
        out_file << (ts2.tv_nsec - ts1.tv_nsec) / 1000 << endl;
        sender.shm_remove("Benchmark");
        receiver.shm_remove_mailbox("ReceiverMb");
    }

    out_file.close();
    out_file.open("setup_time.csv");

    return 0;
}

Comparison Benchmark for Zero-Copy vs. LINX

// Note: include targets and identifier underscores reconstructed, as above.
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cstddef>
#include <ctime>
#include <pthread.h>
#include <unistd.h>

#include "mailbox_man.hpp"

using namespace std;

void *read_data(void *message_size) {
    mailbox_man receiver;
    size_t size = (size_t) message_size;

    receiver.shm_add("ZeroComparison", size, 1000, 0);
    receiver.shm_add_mailbox("ReceiversMailbox", 1000, 0);

    for (int i = 0; i < 1000; ++i)
    {
        void *message = NULL;
        while (message == NULL) {              // poll until a message arrives
            message = receiver.read("ReceiversMailbox");
        }
        char *data = (char *) message;
        // modify some of the data
        data[0] = 'c';
    }
    receiver.shm_remove_mailbox("ReceiversMailbox");
    receiver.shm_remove("ZeroComparison");
    pthread_exit(NULL);
}

int main(int argc, char const *argv[])
{
    size_t message_size = 16348;
    mailbox_man sender;
    pthread_t receiver;
    timespec ts1;
    timespec ts2;
    void *status;

    if (argc == 2)
    {
        printf("Parsing input message size\n");
        message_size = atoi(argv[1]);
    }

    sender.shm_add("ZeroComparison", message_size, 1000, 1);
    sender.shm_add_mailbox("ReceiversMailbox", 1000, 1);
    pthread_create(&receiver, NULL, read_data, (void *) message_size);

    clock_gettime(CLOCK_REALTIME, &ts1);
    for (int i = 0; i < 1000; ++i)
    {
        void *message = NULL;
        while (message == NULL) {              // wait for a free container
            message = sender.get_message_container(message_size);
        }
        memset(message, 1, message_size);
        sender.send("ReceiversMailbox", (void *) message);
    }
    // Wait for the receiver to finish
    pthread_join(receiver, &status);
    clock_gettime(CLOCK_REALTIME, &ts2);
    // elapsed time in microseconds (nanosecond fields only, as in the original)
    cout << (ts2.tv_nsec - ts1.tv_nsec) / 1000 << endl;
    sender.shm_remove_mailbox("ReceiversMailbox");
    sender.shm_remove("ZeroComparison");
    pthread_exit(NULL);
}

Bibliography

[1] M.D. Hill and M.R. Marty. Amdahl's law in the multicore era. Computer, 41(7):33–38, July 2008.
[2] http://www.enea.com/linx. [Online; accessed 20140206].
[3] http://www.crafters-project.org/. [Online; accessed 20140206].
[4] http://www.artemis-ia.eu/. [Online; accessed 20140206].
[5] Vishnuvardhan Avula. Adapting enea ose to tilepro64 - designing linx for high performance inter-core process communication targeting many-cores. July 2013.
[6] Pierre-Louis Aublin, Sonia Ben Mokhtar, Gilles Muller, and Vivien Quéma. REICoM: Robust and Efficient Inter-core Communications on Manycore Machines. Technical report, INRIA Rhône-Alpes, 2012.
[7] L. Matilainen, E. Salminen, T.D. Hamalainen, and M. Hannikainen. Multicore communications api (mcapi) implementation on an fpga multiprocessor. In Embedded Computer Systems (SAMOS), 2011 International Conference on, pages 286–293, July 2011.
[8] James Psota and Anant Agarwal. rmpi: Message passing on multicore processors with on-chip interconnect. In Proceedings of the 3rd International Conference on High Performance Embedded Architectures and Compilers, HiPEAC'08, pages 22–37, Berlin, Heidelberg, 2008. Springer-Verlag.
[9] Shih-Hao Hung, Wen-Long Yang, and Chia-Heng Tu. Designing and implementing a portable, efficient inter-core communication scheme for embedded multicore platforms. In Embedded and Real-Time Computing Systems and Applications (RTCSA), 2010 IEEE 16th International Conference on, pages 303–308, Aug 2010.
[10] http://www.uppaal.org/. [Online; accessed 20140206].
[11] Git. http://git-scm.com/. [Online; accessed 20140522].
[12] Jenkins ci. http://jenkins-ci.org/. [Online; accessed 20140522].
[13] I. Gray and N.C. Audsley. Challenges in software development for multicore system-on-chip development. In Rapid System Prototyping (RSP), 2012 23rd IEEE International Symposium on, pages 115–121, Oct 2012.
[14] Hugh C. Lauer and Roger M. Needham. On the duality of operating system structures. SIGOPS Oper. Syst. Rev., 13(2):3–19, April 1979.
[15] W.R. Stevens. UNIX Network Programming: Interprocess communications. The Unix Networking Reference Series, Vol. 2. Prentice Hall PTR, 1998.
[16] Irina Calciu, Dave Dice, Tim Harris, Maurice Herlihy, Alex Kogan, Virendra Marathe, and Mark Moir. Message passing or shared memory: Evaluating the delegation abstraction for multicores. In Roberto Baldoni, Nicolas Nisse, and Maarten Steen, editors, Principles of Distributed Systems, volume 8304 of Lecture Notes in Computer Science, pages 83–97. Springer International Publishing, 2013.
[17] A.C. Sodan. Message-passing and shared-data programming models - wish vs. reality. In High Performance Computing Systems and Applications, 2005. HPCS 2005. 19th International Symposium on, pages 131–139, May 2005.
[18] A.C. Klaiber and H.M. Levy. A comparison of message passing and shared memory architectures for data parallel programs. In Computer Architecture, 1994. Proceedings of the 21st Annual International Symposium on, pages 94–105, Apr 1994.
[19] Blaise Barney. Message passing interface (mpi). https://computing.llnl.gov/tutorials/mpi/. [Online; accessed 20140226].
[20] MPI: A Message-Passing Interface Standard. Version 3.0, 2012.
[21] Edgar Gabriel, Graham E. Fagg, George Bosilca, Thara Angskun, Jack J. Dongarra, Jeffrey M. Squyres, Vishal Sahay, Prabhanjan Kambadur, Brian Barrett, Andrew Lumsdaine, Ralph H. Castain, David J. Daniel, Richard L. Graham, and Timothy S. Woodall. Open MPI: Goals, concept, and design of a next generation MPI implementation. In Proceedings, 11th European PVM/MPI Users' Group Meeting, pages 97–104, Budapest, Hungary, September 2004.
[22] R.L. Graham, G.M. Shipman, B.W. Barrett, R.H. Castain, G. Bosilca, and A. Lumsdaine. Open mpi: A high-performance, heterogeneous mpi. In Cluster Computing, 2006 IEEE International Conference on, pages 1–9, Sept 2006.
[23] R. H. Castain, T. S. Woodall, D. J. Daniel, J. M. Squyres, B. Barrett, and G. E. Fagg. The open run-time environment (openrte): A transparent multi-cluster environment for high-performance computing. In Proceedings, 12th European PVM/MPI Users' Group Meeting, Sorrento, Italy, September 2005.
[24] http://www.mpich.org/. [Online; accessed 20140319].
[25] Darius Buntinas, Guillaume Mercier, and William Gropp. Implementation and shared-memory evaluation of mpich2 over the nemesis communication subsystem. In Bernd Mohr, Jesper Larsson Träff, Joachim Worringen, and Jack Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, volume 4192 of Lecture Notes in Computer Science, pages 86–95. Springer Berlin Heidelberg, 2006.
[26] The mpich wiki. http://wiki.mpich.org/mpich/index.php/CH3_And_Channels. [Online; accessed 20140320].
[27] Multicore Communications API (MCAPI) Specification, 2011.
[28] https://bitbucket.org/hollisb/oopenmcapi/wiki/Home. [Online; accessed 20140414].
[29] Shih-Hao Hung, Po-Hsun Chiu, Chia-Heng Tu, Wei-Ting Chou, and Wen-Long Yang. On the portability and performance of message-passing programs on embedded multicore platforms. In Parallel and Distributed Processing Symposium Workshops PhD Forum (IPDPSW), 2012 IEEE 26th International, pages 896–903, May 2012.
[30] Linx protocols. http://linx.sourceforge.net/linxdoc/doc/linxprotocols/book-linx-protocols-html/index.html. [Online; accessed 20140225].
[31] http://sourceforge.net/projects/linx/. [Online; accessed 20140320].
[32] Patrik Strömblad. Enea multicore: High performance packet processing enabled with a hybrid smp/amp os technology. Technical report, 2009.
[33] Pierre-Louis Aublin, Sonia Ben Mokhtar, Gilles Muller, and Vivien Quéma. ZIMP: Efficient Inter-core Communications on Manycore Machines. Technical report, INRIA Rhône-Alpes, 2011.
[34] Brian Paul Swenson and George F. Riley. A new approach to zero-copy message passing with reversible memory allocation in multi-core architectures. In Proceedings of the 2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation, PADS '12, pages 44–52, Washington, DC, USA, 2012. IEEE Computer Society.
[35] Lei Chai, A. Hartono, and D.K. Panda. Designing high performance and scalable mpi intra-node communication support for clusters. In Cluster Computing, 2006 IEEE International Conference on, pages 1–10, Sept 2006.
[36] https://lttng.org/. [Online; accessed 20140226].
[37] http://www.paratools.com/OTF. [Online; accessed 20140206].
[38] Andreas Knüpfer, Ronny Brendel, Holger Brunst, Hartmut Mix, and Wolfgang E. Nagel. Introducing the open trace format (otf). In Proceedings of the 6th International Conference on Computational Science - Volume Part II, ICCS'06, pages 526–533, Berlin, Heidelberg, 2006. Springer-Verlag.
[39] Dominic Eschweiler, Michael Wagner, Markus Geimer, Andreas Knüpfer, Wolfgang E. Nagel, and Felix Wolf. Open trace format 2: The next generation of scalable trace formats and support libraries. In PARCO, pages 481–490, 2011.
[40] http://www.vi-hps.org/projects/score-p/. [Online; accessed 20140217].
[41] http://www.scalasca.org/. [Online; accessed 20140217].
[42] http://www.vampir.eu/. [Online; accessed 20140217].
[43] Lttng userspace tracer. http://lttng.org/ust. [Online; accessed 20140602].
[44] S. Brehmer, M. Levy, and B. Moyer. Using mcapi to lighten an mpi load. EDN, 56(22):45–49, 2011.
[45] http://sourceforge.net/projects/linx/files/MCAPI%20for%20LINX%20compatibility%20layer/. [Online; accessed 20140414].
[46] http://www.boost.org/doc/libs/1_55_0/doc/html/interprocess.html. [Online; accessed 20140414].
[47] Boost c++ libraries. http://www.boost.org/. [Online; accessed 20140603].
[48] Boost interprocess: Sharing memory between processes. http://www.boost.org/doc/libs/1_53_0/doc/html/interprocess/sharedmemorybetweenprocesses.html. [Online; accessed 20140603].
[49] Yu-Hsien Lin, Chiaheng Tu, Chi-Sheng Shih, and Shih-Hao Hung. Zero-buffer inter-core process communication for heterogeneous multi-core platforms. In Embedded and Real-Time Computing Systems and Applications, 2009. RTCSA '09. 15th IEEE International Conference on, pages 69–78, Aug 2009.
[50] Open cv. http://opencv.org/about.html. [Online; accessed 20140530].
