Porting to the Tilera TILEPro64 Architecture

ROBERT RADKIEWICZ and XIAOWEN WANG

KTH Information and Communication Technology

Master of Science Thesis Stockholm, Sweden 2013

TRITA-ICT-EX-2013:69

KTH Royal Institute of Technology

Master Project at KTH: Porting Barrelfish to the Tilera TILEPro64 Architecture

Authors: Robert Radkiewicz and Xiaowen Wang

Examiner: Prof. Mats Brorsson, KTH, Sweden

Abstract

Barrelfish is a research operating system focused on the scalability of manycore architectures and the increasing amount of heterogeneous hardware. Instead of relying heavily on the cache coherency protocol, which has proved to be an inherent bottleneck on manycore systems, Barrelfish borrows ideas from distributed systems and uses a message-passing approach for inter-core communication. The TilePro architecture is a manycore system with up to 64 cores and several mesh networks. Because of this special hardware design, TilePro is considered an ideal vehicle for Barrelfish, making full use of the advantages of its manycore and network structure. Porting Barrelfish to the TilePro architecture involves general set-up of image booting, the virtual memory system, context switches, interrupts and system calls, inter-dispatcher communication and so on. At the beginning, the whole system starts up fully on the first logical core; later the monitor on the first core is responsible for booting the others in order, according to the pre-configuration of memory space on the initial core. Barrelfish originally provides two kinds of communication under the user remote procedure call (URPC) protocol. Local message passing (LMP), used when two dispatchers communicate on the same core, is implemented by invoking system calls and passing all values through reserved registers. User-level message passing (UMP), designed for inter-core communication, depends on a shared-memory approach. Inter-core communication begins as the second core starts.
The project also investigates how to utilize the TilePro mesh network structure for inter-core communication, so that this characteristic of the architecture is exploited thoroughly. TilePro offers several mesh networks with different properties and behaviours. In this work we mainly use the user dynamic network (UDN) instead of UMP to carry out remote core-to-core communication, while still building on the existing URPC protocol. The result shows that Barrelfish boots up completely on at least two cores and that user applications can be executed properly on either the first or the second core, with core-to-core communication working over the TilePro UDN network.

Contents

1 Introduction
  1.1 Current Implementations
    1.1.1 Factored Operating System
    1.1.2 Tessellation
    1.1.3 Barrelfish
  1.2 Core-to-Core Communication
  1.3 TilePro64
  1.4 Contributions
  1.5 Structure of Thesis

2 Implementation
  2.1 Requirements
  2.2 Booting an OS on Tilera
    2.2.1 Shipping the Kernel
    2.2.2 Hypervisor
    2.2.3 Overview of Booting Process
  2.3 newlib
  2.4 Virtual Memory
  2.5 ASIDs
  2.6 Interrupt Handling
  2.7 System Calls
  2.8 Processes and Threads
  2.9 I/O
  2.10 Local Communication
  2.11 Core-to-Core Communication
    2.11.1 Existing Barrelfish Ports
    2.11.2 Implementation of Message Passing on TilePro
      2.11.2.1 Static Network
      2.11.2.2 User Dynamic Network
      2.11.2.3 Other Dynamic Networks
    2.11.3 Implementation of a New Backend

3 Reflection on the Porting Process

4 Results
  4.1 Porting Results
  4.2 Modifications on Barrelfish

5 Conclusion

List of Figures

2.1 Procedure to create a bootrom file
2.2 Virtual memory layout
2.3 Barrelfish internal virtual memory layout
2.4 Physical memory layout
2.5 Dispatcher structure in Barrelfish
2.6 UDN backend

4.1 Bootstrap on TilePro

List of Tables

4.1 Modification to Barrelfish

Chapter 1

Introduction

Computer hardware has changed rapidly over the past two decades. From single-core to dual-core, and then to multi-core architectures, researchers have constantly sought the best way to boost the performance of computers. Owing to some seemingly inevitable defects of the single-core processor, e.g. the lack of parallelism and increasing thermal issues, the single-core architecture seems to have hit a bottleneck and cannot be developed significantly any more [1]. Meanwhile, the multi-core architecture has emerged and is widely used in a variety of workplaces owing to its inherent advantages. Different programs gain differently in speed from the use of multiple processors, depending on how separable their most CPU-intensive tasks are. A naturally separated example is a server that answers several network connections and does some calculation per connection. Tasks that do not separate so naturally can be split into several sub-tasks, which are then distributed. Another example is image processing, where an image can be split into several sub-images, each of which can be processed by one task. In this example all tasks depend on the whole image, from which they select their data.

Given the fast advancement of hardware and the needs of real working environments, future OSes should be able to make full use of these new features. Traditional general-purpose OSes, e.g. the variants of Unix and Windows, use a shared-memory kernel with data structures protected by locks [2], a design that derives from the basic OS model for single-core architectures. They are capable of using multiple cores and distributing the workload between them, but this approach does not scale well as the number of cores rises significantly [3, 4]. Those OSes have been extended to fully support SMP and ccNUMA architectures in order to obtain relatively high computing performance. However, future computer architectures tend to increase both the number of cores and the diversity of hardware [2]. As a result, cache coherence protocols will become prohibitively costly when a large number of cores (a manycore system) is involved.

1.1 Current Implementations

How to exploit such so-called manycore systems efficiently has become a popular topic. Some researchers have proposed new approaches, re-conceiving the OS architecture to avoid the scalability bottlenecks of traditional OSes.

1.1.1 Factored Operating System

Factored Operating System (fos) [5] is a new operating system targeting manycore systems with scalability as the primary design constraint, where space sharing replaces time sharing to increase scalability. The main feature of fos is that it factors an OS into a set of services, where each service is built to resemble a distributed Internet server. The OS kernel services and the user application services are located on different sets of servers, so that they do not interfere with each other, thereby increasing the degree of distribution and parallelism of the system. Each server runs on a given core, and servers communicate with each other through message passing.

1.1.2 Tessellation

Tessellation OS [6] restructures the operating system to support a simultaneous mix of interactive, real-time, and high-throughput parallel applications. It utilizes two novel ideas, Space-Time Partitioning and Two-Level Scheduling, to reach the goals of resource distribution, performance isolation and QoS guarantees. Applications are divided into performance-isolated, gang-scheduled cells communicating through secure channels.

1.1.3 Barrelfish

Barrelfish [2] demonstrates that message passing outperforms the traditional shared-memory approach, which is the main bottleneck when scaling an OS to manycore architectures. It proposes a "share-no-memory" concept for the OS, exploiting explicit and asynchronous message passing for communication between cores in order to reduce the side effects of cache coherence protocols. Message passing is also used to replicate shared OS state to each core, which reduces the load on the system interconnect, memory contention and synchronization overhead. Another remarkable characteristic of Barrelfish is its compatibility with heterogeneous architectures. Driven by demands from real workplaces, there is a trend to combine several different types of architectures on one board, cooperating to deal with some

tasks together. Barrelfish was designed from the start to meet this demand, making it possible to boot different kernels for different hardware.

1.2 Core-to-Core Communication

In a shared-memory scenario, data may be accessed from different processors and is normally placed somewhere in RAM. Accesses to this data must therefore be controlled, to avoid situations where the asynchronous nature of this scenario corrupts the data or the calculations on it. Problems arise when one processor overwrites data from another processor, or when a processor works with outdated data. To maintain control, there are tools like locking and cache coherence protocols. Locking restricts accesses to a piece of memory, so that only the processor holding the lock can access it, while all other accesses are blocked until the lock is free again. Cache coherence keeps the processor-specific caches in sync: processors have local caches, which allow them to access cached data orders of magnitude faster, and for data shared among different processors these caches are updated when one of them changes. Both of these tools may cause severe overhead as the number of processors accessing the data increases [3, 5, 2].

In the research on the operating system Corey [7], the authors tried to make sharing more explicit, so that an application can choose whether a data structure is shared with all threads, instead of everything being shared automatically. In one experiment, they opened file handles in a large number of threads and closed them directly afterwards, without touching them. Since file handles are automatically shared with all threads, the time to open and close one handle on one core increased with the number of cores doing the same. One conclusion drawn here is that applications should decide whether data is shared, instead of it being shared automatically.

Inspired by distributed networking systems, communication between cores through message passing has come to be seen as a more efficient way to adapt to future manycore architectures than relying on cache coherence protocols.
That is because today's computer architectures may already have evolved into distributed systems [8]. With message passing, only the communicating cores establish an explicit message channel; unlike with cache coherence protocols, the remaining cores are not involved in implicitly updating the data. So even when many cores are working on the same problem, only the cores that are actively messaged are concerned with the new value. This makes the communication more explicit and puts it into the foreground. Moreover, message passing can in principle reduce communication latency if the messages are encoded compactly enough. Baumann's experiments [2] have also demonstrated the scalability issues of cache-coherent shared memory, showing that message passing is a promising alternative for manycore systems. Note, however, that message passing does not rule out cache coherence protocols; applications should have the choice of which construct to use.


1.3 TilePro64

The TilePro64 processor implements a manycore architecture with 64 cores, incorporating a matrix of processing elements interconnected through a scalable point-to-point mesh network. The processor cores have been designed to reach an optimal balance between the size of a single core and the number of cores; in general the design aims at high throughput, programmability and power efficiency. Since Barrelfish was designed for manycore architectures from the start, we hope TilePro can be a good vehicle to test its scalability, so that scalability effects become more visible than on previous benchmarks. The TilePro has multiple mesh networks, designed to provide either very low-latency communication or high-throughput communication, depending on the kind of application. The networks are designed to scale well even when many cores communicate with each other. This may support the Barrelfish approach of message passing and seems worth investigating.

1.4 Contributions

The project mainly investigates how to port Barrelfish onto the TilePro architecture and how to use the TilePro network structure to implement core-to-core communication, instead of the shared-memory mechanism provided originally in Barrelfish. The porting process involves general configuration of image booting, the virtual memory system, context switches, interrupts and system calls, inter-dispatcher communication and so on. So far Barrelfish boots up completely on at least two cores, with inter-core communication between them running over the TilePro user dynamic network. Moreover, some user applications execute correctly on these running cores.

1.5 Structure of Thesis

In the rest of this thesis, Chapter 2 describes in detail all the crucial parts we implemented or studied in this project; Chapter 3 reflects on our porting process; the results and conclusions are given in Chapter 4 and Chapter 5 respectively.

Chapter 2

Implementation

2.1 Requirements

The goal of the project is to port Barrelfish onto TilePro with the least effort needed until it is possible to use a message-passing mechanism between cores through a Barrelfish communication interface. The project also investigates how to use the TilePro-specific user dynamic network to implement this core-to-core communication, and runs some user applications on the main core. Other parts of the operating system not needed to fulfil this were ignored due to time constraints.1

2.2 Booting an OS on Tilera

In this section we describe in detail how to generically boot up an operating system on the TilePro architecture, since understanding and satisfying the hardware requirement is the first priority to begin our work.

2.2.1 Shipping the Kernel

Barrelfish intends to ship the kernel to the processor with the help of the Multiboot Specification [9], which is a part of GRUB. There is no Multiboot implementation for Tilera, so we use the Tilera toolset to boot the kernel instead. In this toolset the first booters (the level-0.5 and level-1 boot loaders), the hypervisor, the hypervisor configuration file and the kernel are bundled into a bootrom file, which can be sent to the hardware.

1As we found out, most parts need to be ported.

[Figure: boot.bin (the L0.5 and L1 booters), the hypervisor, the hypervisor configuration file and the client supervisor are bundled into a bootrom file.]

Figure 2.1: Procedure to create a bootrom file

2.2.2 Hypervisor

The operating system can work either directly on the hardware or with a hypervisor in between. The mode in which it runs directly on the hardware is called the "Bare Metal Environment". The hypervisor is a thin layer intended to help with the implementation of an OS on top of TilePro and to make porting to future Tilera architectures easier. It is designed to be very lightweight and only runs when explicitly called by the OS or when an interrupt occurs. This design makes the hypervisor an efficient tool that introduces no critical barriers during the porting process, so we decided to use it.2 The hypervisor is started when the bootrom is loaded onto the architecture. It loads the kernel ELF3 file, puts all segments at the specified physical addresses and jumps to the entry point address to hand control over to the supervisor. The supervisor then has full control over the machine and can carry out its boot-up. The hypervisor cannot handle any relocations defined in the ELF file, because this is not needed for traditional operating systems, where all cores use the same kernel image. In Barrelfish, however, all kernels must be loaded to separate

2The hypervisor keeps this promise in general: there is no documentation indicating behavioural differences between the bare metal environment and the hypervisor. 3Executable and Linkable Format – a standard Unix file format for executable programs, containing meta-information about how to load the program.

memory locations to be able to work on heterogeneous platforms, where different kernel images are needed. To be able to load the kernel image to different memory locations, we use a bootloader instead of loading the kernel directly. This bootloader is loaded by the hypervisor and maps the memory location for the data section depending on the core it is executed on. The real kernel is then loaded into this memory and executed. So the kernel is loaded at the same virtual address on each core, but at a different physical one. An alternative would be to let the bootloader perform some relocations on the kernel image.
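The per-core data-section mapping amounts to a simple address calculation. A minimal sketch in C: the constants follow the physical layout in Figures 2.3 and 2.4 (per-core rodata + data starting at 0x0300 0000, 1 MB per core), but the function name is ours.

```c
#include <stdint.h>

/* Physical base of the per-core (rodata + data) region and the size of
 * one core's slice; values follow Figures 2.3 and 2.4. */
#define DATA_SECTION_PHYS_BASE 0x03000000u
#define DATA_SECTION_SIZE      0x00100000u   /* 1 MB per core */

/* Physical address of the kernel data section for a given core. The
 * bootloader would install a mapping from the fixed virtual
 * data-section address to this per-core physical address. */
static inline uint32_t data_section_phys(uint32_t core_id)
{
    return DATA_SECTION_PHYS_BASE + core_id * DATA_SECTION_SIZE;
}
```

With this scheme the kernel's data section lives at the same virtual address on every core while the physical pages never overlap.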

2.2.3 Overview of Booting Process

Here we provide a broad overview of the booting process, inspired by [10]. The main steps of the booting process for the first core are listed below:

1. Ship the bundled hypervisor, bootloader, kernel & program files;
2. The hypervisor loads the bootloader ELF file;
3. The hypervisor sets the program counter to the entry point of the ELF file;
4. Install the initial page table & activate memory translation;
5. Set up the tile-specific kernel stack;
6. Copy the page table to a tile-specific location;
7. Add the tile-specific data section mapping;
8. Switch to the new page table;
9. Load the kernel module and jump to its entry point;
10. Slice the physical memory into sections, one for each core;
11. Set up init's address space;
12. Load the ELF modules defined in menu.lst;
13. Parse the given command line;
14. Load the init module;
15. A timer should be initialized at this moment;
16. The hypervisor is told to start up all other cores;
17. Context switch to init in user mode;
18. crt0 starts the runtime system, which starts bootstrap and initializes local message passing;
19. init starts monitor & mem_serv, and then exits;


20. monitor starts other modules;
21. spawnd initiates the second core;
22. The monitor on the second core starts up;
23. The two monitors on different cores start remote inter-core communication, establishing a binding between two dispatchers.

The booting process is somewhat different for all following cores:

1. The hypervisor waits until the first core tells it to start the other cores;
2. The hypervisor sets the program counter to the entry point of the ELF file;
3. Install the initial page table & activate memory translation;
4. Set up the tile-specific kernel stack;
5. Copy the page table to a tile-specific location;
6. Add the tile-specific data section mapping;
7. Switch to the new page table;
8. Load the kernel module and jump to its entry point;
9. Wait until the first core starts up this particular core;
10. Set up the monitor's address space;
11. Load the monitor module;
12. A timer should be initialized at this moment;
13. Context switch to monitor in user mode;
14. crt0 starts the runtime system, which starts the bootstrap thread and initializes local message passing;
15. monitor starts remote inter-core communication with the first core by establishing a binding with the corresponding dispatcher;
16. spawnd starts up.

One interesting point here is that the hypervisor only allows starting all cores at once, whereas Barrelfish needs to start one core at a time. To combine these different approaches, we tell the hypervisor to start all cores very early. Those cores then do some startup and wait for a special message. This message is sent via the UDN network, which is explained in Section 2.11.2.2. In contrast to other implementations of Barrelfish, on the TilePro platform some configurations are already defined at compile time:


• Most virtual address mappings are put into the data section of the kernel;
• The memory for the initial virtual mappings is allocated by the hypervisor while allocating the sections;
• For exception handling, the interrupt vector is put into a special section, which is loaded to a well-known address.

In this boot process no devices are initialized, because no devices are supported at the moment.

2.3 newlib

Barrelfish uses newlib [11] as its C standard library. A number of functions inside Barrelfish are implemented against this particular C standard library, including some crucial changes to it, which forces us to use newlib. Newlib contains machine-dependent files implemented for a large number of systems. Unfortunately the official newlib release used in Barrelfish does not support the TilePro platform. Tilera distributes their own version of newlib that supports TilePro, but it is based on a newlib released in 2004, so it cannot be used as a replacement for the existing newlib in Barrelfish. We therefore had to merge some files manually, following the newlib FAQ [11], to get a newlib working on both Barrelfish and the TilePro platform.4
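The 64-bit printf limitation of the merged newlib (see footnote 4) can be worked around by formatting the upper and lower 32 bits separately. A minimal sketch; the helper name is ours:

```c
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit value as two 32-bit halves, avoiding the unreliable
 * 64-bit conversion in the merged newlib. */
static int format_u64(char *buf, size_t n, uint64_t v)
{
    return snprintf(buf, n, "0x%08lx%08lx",
                    (unsigned long)(v >> 32),
                    (unsigned long)(v & 0xffffffffu));
}
```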

2.4 Virtual Memory

A virtual memory system allows different processes to run in a system without needing to know about each other [12]. Virtual addresses are translated to physical addresses with the assistance of a memory management unit (MMU), using a translation scheme that is valid per process. The OS can require some specially mapped virtual addresses pointing to special locations, such as kernel memory areas. Sometimes an OS also has requirements on physical addresses. The platform often requires certain special physical addresses (which may point to the interrupt vector or memory-mapped I/O, for example) and virtual addresses. The memory layout is therefore determined both by the platform and by the OS. Barrelfish requires a large section mapped 1:1 to physical memory, to be able to access physical addresses by a simple calculation (subtracting an offset). Furthermore it needs a part at the beginning, where the multiboot information

4Actually, it is known so far that printf cannot reliably output 64-bit values. The current workaround is to print the upper and lower 32 bits separately.

for the other cores is lying. This 1:1 section must be handled as a special case on TilePro, because its virtual address space is smaller than its physical address space (32-bit vs. 64-bit). Because of this property, we cannot maintain this mapping over the whole memory if a lot of physical memory is installed. So we use a mapping that differs per core, by slicing the memory core-wise. This scheme works because of the Barrelfish property of not sharing any memory between cores.5 An alternative approach would be to use temporary mappings as described in [13]; for this to work, it must be ensured that one mapping is no longer in use before another one is set up. TilePro only reserves some addresses at the end of the virtual memory range, which are used internally by the hypervisor. There are no reserved physical addresses, because an OS on top of the hypervisor only sees "Client Physical Addresses", which are translated rapidly to real physical addresses by the hypervisor. This abstraction works in practice, so we do not need to care about this translation and simply handle client physical addresses as if they were real physical addresses. The shipping process places some requirements on the memory layout, in the sense that the information about which sections (text, data, ...) compose the kernel must be told to the booter. This information is put into the ELF file via a linker script. The initial P=V mode (each virtual address is mapped to the same physical address), which is active until the first page table has been installed, ensures that all needed virtual addresses are statically translated to physical addresses. As mentioned before in Section 2.2.2, we need to map the data section of the kernel separately for each core. To do so, we need a different initial page table per core, differentiated by calculating the core ID.
The stack for the kernel must also differ per core, but we simply use adjacent memory regions; otherwise we would need to write the mappings without having a stack, i.e. in assembly. All these requirements led us to the virtual and physical memory layouts shown in Figure 2.2 and Figure 2.4 respectively.6 Figure 2.2 describes the virtual memory layout. Some addresses are pre-determined by TilePro: e.g. addresses from 0xFD000000 to 0xFE000000 are for the kernel code section and addresses above 0xFE000000 are for the hypervisor, so we cannot use this memory for other purposes. A special part in Figure 2.2, called Barrelfish internal, is pictured in detail in Figure 2.3. Some Barrelfish-related data is put into this area specifically to start up the first user-level process, init, following Barrelfish convention. Furthermore, because the TilePro page sizes (64 KB for small pages and 16 MB for large pages) are much larger than on other architectures (on x86 and ARM, for instance, the small page size is 4 KB), we decided to place more data in this area, such as the kernel data section, to save some memory space. For the physical memory, as shown in Figure 2.4, apart from the statically mapped parts, which mainly

5Actually there is a little shared memory for starting up another core, which is written once by the first core and read by the to-be-started core. For this, memory from the first core is mapped into all other cores. 6Drawn with the help of [14].


0xFE00 0000 – 0xFFFF FFFF: hypervisor
0xFD03 0000 – 0xFE00 0000: kernel text (code section, one large page)
0xFD02 0000 – 0xFD03 0000: bootloader text (code section, one large page)
0xFD01 0000 – 0xFD02 0000: hypervisor glue (code section, one large page)
0xFD00 0000 – 0xFD01 0000: PL 1 interrupt vectors (code section, one large page)
0xFC00 0000 – 0xFD00 0000: kernel stacks, 64 KB per tile (one large page)
0x8000 0000 – 0xFC00 0000: kernel space, ~1.9 GB
0x0100 0000 – 0x8000 0000: ~2 GB
0x0000 0000 – 0x0100 0000: Barrelfish internal (several small pages)

Figure 2.2: Virtual memory layout

0xF0 0000 – 0xFF 0000: rodata + data, mapped per core (1 MB)
0xD0 0000 – 0xF0 0000: – unmapped –
0xC0 0000 – 0xD0 0000: page tables, sliced per core (L1 + 1 × L2 for up to 256 cores)
0xB1 0000 – 0xC0 0000: – unmapped –
0xA0 0000 – 0xB1 0000: bootloader data (size should be stable)
0x28 0000 – 0xA0 0000: – unmapped –
0x27 0000 – 0x28 0000: multiboot data (tells app cores their properties)
0x07 0000 – 0x27 0000: DISPATCHER (mapped dynamically)
0x05 0000 – 0x07 0000: BOOT_ARGS (mapped dynamically)
0x01 0000 – 0x05 0000: BOOT_INFO (mapped dynamically)
0x00 0000 – 0x01 0000: – unmapped – (for NULL pointers)

Figure 2.3: Barrelfish internal virtual memory layout


0x1300 0000 and above: free space, sliced per core
0x0300 0000 – 0x1300 0000: 256 × (rodata + data), mapped per core
0x0200 0000 – 0x0300 0000: kernel stacks, 64 KB per tile
0x0100 0000 – 0x0200 0000: Barrelfish internal
0x0003 0000 – 0x0100 0000: kernel text (statically mapped)
0x0002 0000 – 0x0003 0000: bootloader text (statically mapped)
0x0001 0000 – 0x0002 0000: hypervisor glue (statically mapped)
0x0000 0000 – 0x0001 0000: PL 1 interrupt vectors (statically mapped)

Figure 2.4: Physical memory layout

contain kernel code and data, we have a separate data section for each core, and the remaining free space is sliced per core, so that one core cannot interfere with the others and heterogeneous hardware can be supported.
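The 1:1 mapping with per-core slicing amounts to simple arithmetic: subtract the window base, then add the core's slice base. The window and free-space bases below follow Figures 2.2 and 2.4, but the slice size and the function name are assumptions of this sketch.

```c
#include <stdint.h>

/* Illustrative constants: the 1:1-mapped kernel window starts at
 * 0x8000 0000 (Figure 2.2) and the per-core free space starts at
 * 0x1300 0000 (Figure 2.4). The slice size is an assumed value. */
#define KERNEL_WINDOW_BASE 0x80000000u
#define FREE_SPACE_BASE    0x13000000u
#define CORE_SLICE_SIZE    0x04000000u   /* 64 MB per core, assumed */

/* Translate a virtual address inside the 1:1 window to the physical
 * address within the calling core's slice. */
static inline uint32_t local_phys(uint32_t vaddr, uint32_t core_id)
{
    return (vaddr - KERNEL_WINDOW_BASE)
         + FREE_SPACE_BASE + core_id * CORE_SLICE_SIZE;
}
```

Because each core only ever translates into its own slice, the smaller 32-bit virtual space never has to cover the whole physical memory at once.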

2.5 ASIDs

TilePro supports the use of Address Space Identifiers (ASIDs), and we use them to avoid flushing the Translation Lookaside Buffer (TLB) during context switches. Every dispatcher on a core is assigned a unique ASID; during a context switch the TLB then need not be flushed, since the entries of each process are recognizable by their ASID. However, the TilePro hardware limits each core to 256 ASIDs in total, and when this number is reached it is not possible to spawn new dispatchers.
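A dispatcher-spawn path could hand out ASIDs from a simple per-core bitmap until the hardware limit is hit. This is our sketch, not the actual Barrelfish code, and reserving ASID 0 is an assumption of the sketch:

```c
#include <stdbool.h>

#define MAX_ASIDS 256           /* TilePro hardware limit per core */

static bool asid_used[MAX_ASIDS];

/* Return a free ASID for a new dispatcher, or -1 when all ASIDs are
 * taken, in which case no further dispatcher can be spawned on this
 * core. ASID 0 is kept reserved here (an assumption of this sketch). */
static int asid_alloc(void)
{
    for (int i = 1; i < MAX_ASIDS; i++) {
        if (!asid_used[i]) {
            asid_used[i] = true;
            return i;
        }
    }
    return -1;  /* exhausted: cannot spawn another dispatcher */
}
```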

The kernel memory has no ASID and is marked as global, so it is accessible by all running processes independently. Because the memory is split into user and kernel space, we can provide the same kernel memory mapping for all processes, by copying the mapping from the initial page table. Knowing that the kernel mappings are always the same, we never need to flush them.


2.6 Interrupt Handling

Tilera has a protection scheme to prevent malicious behaviour, such as executing illegal instructions or misusing the networks. The protected parts of the system, typically memory ranges (page-wise), instructions and special-purpose registers [15], each have a specific minimum protection level. A core is in exactly one protection level at any time and can only access resources whose minimum protection level is the same or lower. Protection level 0 is for user space, level 1 for the operating system and level 2 for the hypervisor. Interrupts are handed to the operating system or to the hypervisor. The hypervisor can pass interrupts conditionally or unconditionally to the operating system. Before doing so, it can perform some actions, e.g. try to resolve a TLB miss and hand it to the operating system only if a real page fault occurred. This technique is known as a "downcall" [15]. Barrelfish's requirements for normal interrupts differ from other operating systems in that the registers are not saved on the stack, but in a special area that depends on the state of the current dispatcher [16]. This is needed when an interrupt causes a context switch, because when switching back, execution always jumps into one of several special entry routines, which later restore the saved registers. In Barrelfish there are 3 entry points for handling interrupts; two of them are for page faults, depending on the status of the dispatcher at the moment a page fault takes place. Barrelfish is responsible for deciding how to deal with an interrupt once it receives one from the TilePro hypervisor.
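The two decisions in this path can be modelled compactly: the hypervisor swallows a TLB miss when a valid mapping exists (the "downcall"), and a real page fault is delivered to the entry point matching the dispatcher's mode. Enum and function names below are ours, not from the real code:

```c
#include <stdbool.h>

/* Barrelfish's page-fault entry points (Section 2.8 describes the
 * enabled/disabled dispatcher modes that select between them). */
enum disp_entry { ENTRY_RUN, ENTRY_PAGEFAULT, ENTRY_PAGEFAULT_DISABLED };

/* Hypervisor side: a TLB miss with a valid mapping is refilled and
 * never reaches the operating system. */
static bool hypervisor_resolves(bool mapping_valid)
{
    return mapping_valid;
}

/* Barrelfish side: pick the page-fault entry by dispatcher mode. */
static enum disp_entry pagefault_entry(bool dispatcher_disabled)
{
    return dispatcher_disabled ? ENTRY_PAGEFAULT_DISABLED
                               : ENTRY_PAGEFAULT;
}
```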

2.7 System Calls

In general a system call is a way to request a kernel service from user space. The user code puts some information at a well-known location and causes a trap; the kernel serves the request and may return to the user code [12]. Barrelfish provides a syscall interface allowing user code to invoke kernel services by passing a legal capability and some arguments. A capability is a token that allows a specific access to a resource [17]. Every time a syscall is invoked, the capability is examined before the kernel service begins; if the examination fails, an error is returned to the user. Another special aspect of syscalls in Barrelfish is that the local message-passing mechanism is also implemented through syscalls. That is to say, in some cases a local message is sent and the receiving process should quickly be woken up to receive it, so a context switch might be involved; we need to consider this situation. One solution is to save all caller-save registers, as well as other essential registers, to a specific save-area in the syscall handler, and return the pointer to the syscall function. This procedure also depends on the running status of the dispatcher (see Section 2.8).


[Figure: dispatcher structure in Barrelfish, consisting of a generic part (thread-management data, stacks and other values) and an architecture-related part (status indicators, the ASID, the critical section, the save-areas and the dispatcher entries).]

Figure 2.5: Dispatcher structure in Barrelfish

On the TilePro architecture, the instruction swint signals an interrupt to the corresponding handler. There are four swint interrupt levels, matching the four protection levels of the TilePro architecture: 0 for user level, 1 for the operating system, 2 for the hypervisor and 3 for the virtual machine. We therefore use swint1 to issue a syscall and hand control over to the kernel. According to the TilePro ABI [18], the general-purpose registers r0 to r9 pass the arguments, so we use r10 to hold the syscall number assigned by Barrelfish. In other words, we can pass 10 32-bit values through registers; any remaining arguments are put on the stack.
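The placement rules can be modelled in portable C. This is only an illustrative sketch (the real syscall entry path is TilePro assembly, and the structure below is hypothetical): the first 10 argument words travel in r0..r9, r10 carries the syscall number, and anything beyond that spills to the stack.

```c
#include <string.h>

#define NUM_ARG_REGS 10   /* r0..r9 carry arguments per the TilePro ABI */

/* Hypothetical model of the state handed to the kernel on swint1. */
struct trap_frame {
    unsigned regs[11];        /* r0..r9 arguments, r10 syscall number */
    unsigned stack_spill[8];  /* overflow arguments, stack-resident   */
    unsigned num_spilled;
};

static void marshal_syscall(struct trap_frame *tf, unsigned sysno,
                            const unsigned *args, unsigned nargs) {
    unsigned i;
    memset(tf, 0, sizeof *tf);
    tf->regs[10] = sysno;                    /* r10 holds the syscall number */
    for (i = 0; i < nargs && i < NUM_ARG_REGS; i++)
        tf->regs[i] = args[i];               /* first 10 words in r0..r9 */
    for (; i < nargs; i++)
        tf->stack_spill[tf->num_spilled++] = args[i];  /* rest on the stack */
}
```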

2.8 Processes and Threads

In Barrelfish, a dispatcher is the basic unit of kernel scheduling and manages its own threads; it is similar to the concept of a process in other operating systems. A dispatcher has two main parts: an architecture-related part and a generic part shared by all hardware. The architecture-related part mainly contains status indicators, entry points for various actions, and save-areas into which all registers are saved during a context switch; in our case it also contains an ASID for avoiding TLB flushes (see Section 2.5). The generic part mainly contains information on how to manage threads, and the dispatcher's own stacks. The dispatcher structure is shown in Figure 2.5. The kernel maintains a dispatcher control block (DCB) for each dispatcher. The


DCB contains entries that define the dispatcher's cspace (capability tables), vspace (page tables), some scheduling parameters, and a pointer to a user-space dispatcher structure; this struct manages the scheduling of the dispatcher's threads [19]. Switching between threads happens in user mode. The architecture-specific part of the implementation must be able to save and restore the state of a thread; everything else happens in the architecture-independent part. A dispatcher can be in one of two modes: enabled or disabled [19]. It is enabled when running user threads; for example, whenever a thread is resumed, its dispatcher should be in enabled mode. During a context switch in this mode, all registers are saved into the enabled save-area. A dispatcher running in disabled mode is running the kernel code, for example managing TCBs, and a newly created dispatcher starts out disabled. If it is pre-empted at such a time, all register state is saved into the disabled save-area. Unless it restores a dispatcher from a disabled context, the kernel always enters a dispatcher at one of the 5 entry points: run, page fault, page fault disabled, trap and LRPC [16]. A dispatcher is entered at the run entry point when it was not previously running and, the last time it ran, it was either enabled or had yielded the CPU [16]. We did not implement the other 4 entry points in this project, because they have not been necessary so far.
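The mode-dependent save logic can be sketched as a minimal model with field names of our own choosing (the real dispatcher layout is defined by Barrelfish, and the register count here is truncated for illustration): the register state goes into the enabled or the disabled save-area according to the dispatcher's mode at the moment it is interrupted, so the matching entry point can restore it later.

```c
#include <assert.h>
#include <string.h>

#define SAVE_REGS 8   /* truncated register file, for illustration only */

struct save_area { unsigned long regs[SAVE_REGS]; };

/* Hypothetical dispatcher fragment: just the mode flag and save-areas. */
struct dispatcher {
    int disabled;                    /* nonzero while in disabled mode */
    struct save_area enabled_area;   /* state saved while running user threads */
    struct save_area disabled_area;  /* state saved while in disabled mode     */
};

/* Select the save-area matching the dispatcher's current mode. */
static struct save_area *save_area_for(struct dispatcher *d) {
    return d->disabled ? &d->disabled_area : &d->enabled_area;
}

/* On an interrupt, store the register state into the selected area. */
static void save_context(struct dispatcher *d, const unsigned long *regs) {
    memcpy(save_area_for(d)->regs, regs, sizeof(struct save_area));
}
```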

2.9 I/O

This project does not involve developing I/O drivers for TilePro. The only implementation is the function printf, which is basically a wrapper around a hypervisor call.

2.10 Local Communication

The Barrelfish team has put considerable effort into inter-dispatcher communication (IDC), and has shown that this message-passing-based method can outperform shared-memory methods that depend heavily on the cache-coherency protocol. Local message passing (LMP) is designed for communication between dispatchers on the same core. Barrelfish implements this functionality without any use of the cache-coherency protocol: all data between dispatchers is passed through registers via syscalls, so no shared memory is allocated in the process. Barrelfish also provides an interface for implementing LMP on different architectures. In this project we allocate 7 of the 10 argument-passing registers for LMP and invoke the syscall to deliver the message. So far, LMP on the initial core works without errors.
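A minimal sketch of the packing step, assuming a hypothetical layout (the actual Barrelfish LMP register assignment differs in detail): up to 7 payload words are placed in argument registers, while the remaining registers carry the bookkeeping values the syscall itself needs, such as the endpoint to invoke.

```c
#include <assert.h>

#define NUM_ARG_REGS  10
#define LMP_MSG_WORDS 7    /* payload words carried per LMP message */

/* Model of the argument registers handed to the syscall. */
struct lmp_syscall_args {
    unsigned regs[NUM_ARG_REGS];
};

/* Pack an LMP message into the argument registers.
 * Returns 0 on success, -1 if the payload does not fit in registers. */
static int lmp_pack(struct lmp_syscall_args *a, unsigned endpoint,
                    const unsigned *payload, unsigned len) {
    unsigned i;
    if (len > LMP_MSG_WORDS)
        return -1;                /* larger payloads need a different path */
    a->regs[0] = endpoint;        /* which endpoint to invoke (bookkeeping) */
    a->regs[1] = len;             /* number of valid payload words          */
    for (i = 0; i < len; i++)
        a->regs[2 + i] = payload[i];
    return 0;
}
```

Because the whole message fits in registers, the kernel can hand it to the receiving dispatcher without touching shared memory.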


2.11 Core-to-Core Communication

2.11.1 Existing Barrelfish Ports

Barrelfish has already been ported to two mainstream architectures: x86 and ARM. For both, core-to-core communication is implemented via user-level message passing (UMP). The Barrelfish team implements UMP with a shared-memory method based on a clever use of the cache-coherence protocol [20]; the purpose is to reduce the use of the cache-coherence protocol as much as possible, so as to increase the efficiency of remote communication. However, this can be seen as a compromise for architectures without appropriate hardware support for core-to-core message passing. There is another port, to Intel's “Single-Chip Cloud Computer” [21], in which core-to-core communication is based on UMP and extended with message queues in shared memory and inter-processor interrupts. All supported architectures therefore use shared memory in some way to communicate between cores.

2.11.2 Implementation of Message Passing on TilePro

An overview of the networks can be found in [22], a document about the older Tile Processor. The available networks are:

2.11.2.1 Static Network

The Static Network (SN) is a user-mode accessible network with predefined routes. The word “static” describes the routing: routes are set up statically, in contrast to all other networks on the TilePro platform. This means that at every node (tile), each port (North, East, South, West, Processor) has a fixed route. A packet is switched only by the port (the direction) it comes from, and can either be consumed by the tile itself (sent to the processor) or be sent on to another port. This reduces the communication delay to a minimum, at the cost of fixed routes. Consequently a core can communicate bidirectionally with only one other core, because messages sent from the local processor can be switched in only one of the four directions. This is not sufficient for a general-purpose messaging platform.

2.11.2.2 User Dynamic Network

The User Dynamic Network (UDN) is a user-mode accessible network with dynamic routes. The routing is defined by a destination header on every message. A message consists of a tag and a number of words. With the help of the tag, the network

can do hardware multiplexing [22]. There are 4 input queues per core, used according to the tag of an incoming message. If no queue has a matching tag, the data is stored in a catch-all queue. Messages from one core to another arrive in order, but messages from different cores to the same core can arrive in any order. The words within a message are always in order.
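The hardware demultiplexing just described can be modelled in a few lines of C. The queue counts follow [22]; the code itself is only our illustration, not Tilera's interface: each of the 4 input queues has a programmable tag, and a message whose tag matches none of them lands in the catch-all queue.

```c
#include <assert.h>

#define UDN_NUM_TAGGED_QUEUES 4
#define UDN_CATCH_ALL_QUEUE   UDN_NUM_TAGGED_QUEUES

/* Return the index of the input queue a message with msg_tag lands in:
 * one of the 4 tagged queues on a match, otherwise the catch-all queue. */
static int udn_route(const unsigned queue_tags[UDN_NUM_TAGGED_QUEUES],
                     unsigned msg_tag) {
    int q;
    for (q = 0; q < UDN_NUM_TAGGED_QUEUES; q++)
        if (queue_tags[q] == msg_tag)
            return q;               /* demultiplexed in hardware */
    return UDN_CATCH_ALL_QUEUE;     /* unmatched tags fall through */
}
```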

2.11.2.3 Other Dynamic Networks

IDN: The I/O Dynamic Network is implemented similarly to the UDN, but is intended for device drivers communicating with off-chip devices. It is physically separated from the UDN in order to keep user messages apart from I/O messages and to give the latter higher priority (in terms of interrupt priority). It is not accessible to the supervisor; only device drivers registered inside the hypervisor have access to it. We therefore cannot use the IDN, but its properties are the same as the UDN's.

CDN: The Coherence Dynamic Network is used internally by the cache-coherence protocol. It could be used indirectly through shared-memory communication, but that is not the aim of this work.

MDN: The Memory Dynamic Network manages memory accesses between tiles and external memory; it is only accessible to the cache engine and therefore not used by us.

TDN: The Tile Dynamic Network manages memory accesses between tiles, and is likewise only accessible to the cache engine.

2.11.3 Implementation of a New Backend

In order to make full use of TilePro's mesh network structure and avoid the cache-coherence protocol, we decided to develop a new backend in Barrelfish based on TilePro's hardware. Two networks would in principle fulfil our needs: the UDN and the CDN. Using the CDN would amount to a simple port of the UMP protocol to the TilePro platform, so to exploit what is unique about this platform we used the UDN. For multiplexing, the UDN offers 4 input queues per core, which is not enough for a general-purpose platform. We therefore use only the catch-all input queue and do our own multiplexing based on the tag: a tag either causes a message to be placed in one of the associated software queues, if one exists, or the tag of a message lying in the catch-all queue can be looked up later. To implement inter-dispatcher communication between cores, a dispatcher needs to know whom it wants to talk to and where that party is. So the information needed to send a message to another dispatcher


is the target core ID, the channel ID (used as the UDN tag), and the target ASID, which identifies the destination dispatcher on the target core. The channel ID is unique per core, so a bidirectional channel has two IDs: one for incoming messages and one for outgoing messages. On each core, one Barrelfish channel internally consists of one output channel and one input channel. The output channel holds the target core ID, the channel ID and the ASID needed to send a message; the input channel holds the channel ID needed to retrieve messages from the UDN queue. When a message arrives, a dispatcher checks whether the message is addressed to it by comparing its own ASID with the message's target ASID; if they match, it receives the message, otherwise the operating system switches it out and another dispatcher checks in its place, until one of them accepts the message. A dispatcher always knows whom it talks to: before two dispatchers can communicate, the monitor establishes a binding between them. Figure 2.6 shows the structure of the UDN backend implemented in Barrelfish.

Figure 2.6: UDN backend (each core hosts a dispatcher with an out channel and an in channel; a backend message buffer on each core demultiplexes incoming messages by channel ID over the UDN network)
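The bookkeeping just described can be sketched with hypothetical structs (the layout is ours, not the actual backend code): a channel pairs an output half with an input half, and an arriving message is accepted only when both the channel ID and the target ASID match the current dispatcher.

```c
#include <assert.h>

/* Output half: everything needed to send (all fields hypothetical). */
struct udn_out_channel {
    unsigned target_core;   /* which tile to send to              */
    unsigned channel_id;    /* used as the UDN tag                */
    unsigned target_asid;   /* destination dispatcher on the core */
};

/* Input half: what we need to retrieve messages from the UDN queue. */
struct udn_in_channel {
    unsigned channel_id;    /* tag to look for in the catch-all queue */
};

struct udn_msg {
    unsigned channel_id;
    unsigned target_asid;
};

/* 1: this dispatcher consumes the message; 0: the OS lets another
 * dispatcher on the core try instead. */
static int udn_accept(unsigned my_asid, const struct udn_in_channel *in,
                      const struct udn_msg *m) {
    return m->channel_id == in->channel_id && m->target_asid == my_asid;
}
```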

The UMP in the x86 and ARM implementations uses a polling approach to receive messages. Although the UDN allows interrupts to be used for message delivery, we retain the polling method, because it stays in line with the way messages are implemented in Barrelfish. When using messages inside

the source code, the user only needs to specify whether the communication is local or remote. For remote communication the right backend is chosen through compile-time settings: the needed code is generated and written into build files, so that a different backend can be selected merely by changing a compilation option. To stay within this pattern we use the polling approach, so that using the UDN is just a compilation option.


Chapter 3: Reflection on the Porting Process

This chapter discusses the process we followed to realize the implementation. Our initial approach was platform-first: we first checked what the TilePro platform offers and requires, then what Barrelfish needs. To start the porting, we created a new, empty target architecture and many stub files, each of which causes the code to crash and print the current line number¹, so that we could easily trace where the code was running and locate the crashing point. Once the stubs were created, we had a list of techniques to implement. Our work cycle for these techniques can be described with the following phases:

Platform: Check which primitives TilePro offers and which configuration it needs.
Concept: Understand the basic concept of the technique in general (not TilePro-specific).
Minimum Working Example: Build a minimum working example, which does not necessarily interface with the rest of Barrelfish.
OS: Check the requirements of Barrelfish.
Implementation: Implement the code so that it interfaces with Barrelfish.
Testing: Test the code by starting up Barrelfish, checking conditions, generating outputs or debugging the system.

The process was not always as linear as described above. Understanding the concept is a phase we sometimes had to repeat, or one that lasted from checking TilePro's primitives until the implementation phase. For the first basic concepts, such as booting an operating system, virtual memory management and interrupts, this approach worked well: it let us get a minimal running example without caring too much about Barrelfish, which we could then broaden to fulfil the Barrelfish requirements. We chose this approach because it best matched our need to learn most of these techniques. Another reason is that, for most parts, there were not many

¹ Basically: assert(!"implement me");

documented requirements on Barrelfish. The requirements mostly took the form of Barrelfish interfaces or the Barrelfish source code for other platforms. So we needed to understand each technique and how it works on TilePro before we were able to understand the Barrelfish requirements. Between the general techniques we needed to implement in Barrelfish, there were many small steps where this approach was not feasible. Our work cycle there can be described with these phases:

Run: Run the Barrelfish code until the point where an error occurs.
OS: Find the reason for the error from outputs, stack traces or debugging, i.e. find out which Barrelfish requirement we have not fulfilled yet.
Concept: For requirements that involve a new concept, understand it in general.
Platform: Check which primitives TilePro offers and which configuration it needs.
Implementation: Fix the error or fulfil the requirement.
Testing: Test it.

Notably, the order of the OS and Platform phases is swapped. We also had no minimum-working-example phase here, because we already had the interfaces and requirements and knew how much implementation was needed to fulfil them. In this part, the requirements of the operating system determined the porting process: we needed to find the problem first, before looking into the platform. This chapter merely keeps track of the process of porting Barrelfish to the TilePro architecture. Since we spent most of our time doing engineering work on this large project, it is a reflection on what we did and which principles we stuck to, thereby obtaining some promising results midway through the porting process.

Chapter 4: Results

4.1 Porting Results

By the end of this project we obtained some positive outcomes. First of all, we managed to start Barrelfish completely and correctly on at least two cores of the TilePro architecture. The first process, init, gets started, and then it starts mem_serv and monitor. monitor afterwards boots the remaining processes on the first core, communicating with them via LMP. Finally, spawnd is responsible for initiating another core. Instead of booting the second core, we can also run some simple user applications on the first core by combining them into the bootrom file, such as helloworld or the console. Furthermore, the second core starts up completely. Specifically, after the monitor process on the second core has started, it performs some remote communication with the first core's monitor with the aim of establishing a binding between dispatchers. Afterwards the two dispatchers are able to send messages to each other. At this point all communication is based on our UDN network driver. Finally spawnd is invoked, so we believe the second core is ready according to the Barrelfish team's roadmap shown in Figure 4.1. As on the first core, we can also run some user applications on the second core.

4.2 Modifications to Barrelfish

Since it is normally not possible to modify any hardware setting, there are no modifications on the TilePro side. On the other hand, we added all the essential architecture-related parts to Barrelfish to make the operating system run correctly. The first step is to initialize the machine and boot the kernel, which includes setting up the page tables, allocating memory, handling interrupts and system calls, and preparing the first process to run. When the process is starting, the

Figure 4.1: Bootstrap on TilePro (the hypervisor loads menu.lst and the bootrom images; on core 0 the cpu kernel starts init, mem_serv, monitor, ramfsd, skb and spawnd, while on core 1 the cpu kernel starts monitor and spawnd)

context switch needs to be implemented according to TilePro's registers. In user mode, we needed to implement context switching between threads in a dispatcher, the system-call entry into the kernel code, high-level memory management (the architecture-independent part), the UDN backend and its supporting code, etc. Table 4.1 summarizes the essential modules we added to Barrelfish to make it work on TilePro.

Table 4.1: Modifications to Barrelfish

Mode: kernel code
Functionalities: boot loader, kernel start-up, memory allocation, page table, context switch between dispatchers, register structure for context switch, system call handler, interrupt handler

Mode: user code
Functionalities: context switch between threads, system call entry (including LMP), entry point for threads, high-level memory management (pmap), UDN backend code, UDN support code


Chapter 5: Conclusion

In this project we ported the Barrelfish operating system to the TilePro architecture and obtained some promising results. Barrelfish can start up completely on at least two cores. Local and remote communication between dispatchers works in principle, using LMP and the TilePro UDN network respectively. Moreover, we can run some simple user programs on both running cores.

Judging from the bootstrap of Barrelfish, we can claim that we succeeded in porting Barrelfish to the TilePro architecture, since the other cores will boot in the same way as the second core, provided there is enough memory for each core. We suggest exploiting the UDN network for core-to-core communication, because it not only makes full use of the TilePro hardware but also stays in line with the design principles of Barrelfish. Another issue worth discussing is the so-called bulk-transfer technique in the Barrelfish kernel. Bulk transfer is a mechanism designed to facilitate massive data exchange between processes and depends on shared memory. This mechanism cannot be opted out of, which means we have to use it anyway. In other words, even though we implemented the UDN network instead of UMP and thus require no shared memory, at some point Barrelfish needs to call the bulk-transfer functionality, so shared memory is still involved to some degree. TilePro is in fact a shared-memory-based architecture, even though it offers its own network structure for inter-core messages, so this shared-memory issue is not a problem for us. But if someone wants to port Barrelfish to a truly distributed system without any shared memory, an alternative to bulk data transfer will have to be invented.

One of the original objectives of this project was to evaluate the efficiency of Barrelfish on the TilePro architecture after the porting, including some benchmarking. However, owing to time limitations and the complexity of the engineering work, we could not meet all the goals set at the outset, even though we tried to give the project a positive start for future continuation. Future work may involve: improving the port on the first core by starting the process serial so that the keyboard works, implementing a timer for the system, considering how to implement Barrelfish on heterogeneous and distributed systems, investigating further the advantages and disadvantages of cache-coherence protocols versus message passing, and establishing benchmarks to measure Barrelfish on TilePro. The future work is still tough but meaningful. Hopefully our contribution is not the end, but just the beginning.

Bibliography

[1] H. Sutter, “The Free Lunch Is Over: A Fundamental Turn Toward Concurrency In Software,” Dr. Dobb's Journal, vol. 30, no. 3, Mar. 2005, http://www.gotw.ca/publications/concurrency-ddj.htm.

[2] A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania, “The Multikernel: A new OS architecture for scalable multicore systems,” in Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, ser. SOSP '09. New York, NY, USA: ACM, 2009, pp. 29–44, http://doi.acm.org/10.1145/1629575.1629579.

[3] S. Boyd-Wickizer, A. T. Clements, Y. Mao, A. Pesterev, M. F. Kaashoek, R. Morris, and N. Zeldovich, “An Analysis of Linux Scalability to Many Cores,” in 9th USENIX Symposium on Operating Systems Design and Implementation, Oct. 2010, http://www.usenix.org/event/osdi10/tech/full_papers/Boyd-Wickizer.pdf.

[4] A. Kleen, “Linux multi-core scalability,” Linux Kongress, 2009, http://halobates.de/lk09-scalability.pdf.

[5] D. Wentzlaff and A. Agarwal, “Factored Operating Systems (fos): The Case for a Scalable Operating System for Multicores,” SIGOPS Oper. Syst. Rev., vol. 43, no. 2, pp. 76–85, Apr. 2009, http://doi.acm.org/10.1145/1531793.1531805.

[6] J. A. Colmenares, S. Bird, H. Cook, P. Pearce, D. Zhu, J. Shalf, S. Hofmeyr, K. Asanović, and J. Kubiatowicz, “Resource Management in the Tessellation Manycore OS,” in Proceedings of the Second USENIX Workshop on Hot Topics in Parallelism (HotPar'10), Berkeley, California, Jun. 2010, http://tessellation.cs.berkeley.edu/publications/pdf/TessellationHotPAR10.pdf.

[7] S. Boyd-Wickizer, H. Chen, R. Chen, Y. Mao, F. Kaashoek, R. Morris, A. Pesterev, L. Stein, M. Wu, Y. Dai, Y. Zhang, and Z. Zhang, “Corey: An Operating System for Many Cores,” in Proceedings of the 8th Symposium on Operating Systems Design and Implementation, Dec. 2008, http://pdos.csail.mit.edu/corey.

[8] A. Baumann, S. Peter, A. Schüpbach, A. Singhania, T. Roscoe, P. Barham, and R. Isaacs, “Your computer is already a distributed system. Why isn't your OS?” in Proceedings of the 12th Workshop on Hot Topics in Operating Systems, May 2009, http://www.barrelfish.org/barrelfish_hotos09.pdf.

[9] Multiboot Specification, Free Software Foundation, 2009, http://www.gnu.org/software/grub/manual/multiboot/multiboot.html.

[10] S. Hitz, “Multicore ARMv7-A support for Barrelfish,” bachelor's thesis, Aug. 2012, http://www.barrelfish.org/hitz-bachelor-multicore-arm.pdf.

[11] newlib FAQ, http://sourceware.org/newlib. Accessed: 2012-11-26.

[12] A. S. Tanenbaum, Modern Operating Systems, 3rd ed. Upper Saddle River, NJ, USA: Prentice Hall Press, 2007.

[13] F. Wang, “A Clarification on Linux Addressing,” public note, Nov. 2008, http://users.nccs.gov/~fwang2/linux/lk_addressing.txt. Accessed: 2013-02-13.

[14] M. Demling, “Creating memory maps in LaTeX using the {bytefield} package,” blog entry, Jun. 2011, http://www.martin-demling.de/2011/06/memory-maps-in-latex-using-the-bytefield-package/. Accessed: 2013-02-13.

[15] Tile Processor Architecture Overview for the TILEPro Series, 1st ed., Tilera Corporation, Mar. 2011, http://www.tilera.com/scm/docs/index.html.

[16] A. Baumann, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania, Barrelfish Specification, Barrelfish, May 2012, http://www.barrelfish.org/TN-010-Spec.pdf.

[17] J. Shapiro, “What is a Capability, Anyway?” essay, 1999, http://www.eros-os.org/essays/capintro.html. Accessed: 2013-02-24.

[18] Tile Processor Application Binary Interface, 4th ed., Tilera Corporation, Sep. 2011, http://www.tilera.com/scm/docs/index.html.

[19] I. Kuz and A. Phanishayee, Barrelfish Architecture Overview, 1st ed., Barrelfish, Jun. 2010, http://www.barrelfish.org/TN-000-Overview.pdf.

[20] A. Baumann, Inter-dispatcher communication in Barrelfish, Barrelfish, Dec. 2011, http://www.barrelfish.org/TN-011-IDC.pdf.

[21] S. Peter, A. Schüpbach, D. Menzi, and T. Roscoe, “Early experience with the Barrelfish OS and the Single-Chip Cloud Computer,” Ettlingen, Germany, Jul. 2011, http://www.barrelfish.org/barrelfish_marc11.pdf.

[22] D. Wentzlaff, P. Griffin, H. Hoffmann, L. Bao, B. Edwards, C. Ramey, M. Mattina, C.-C. Miao, J. F. Brown III, and A. Agarwal, “On-Chip Interconnection Architecture of the Tile Processor,” IEEE Micro, vol. 27, no. 5, pp. 15–31, Sep. 2007, http://dx.doi.org/10.1109/MM.2007.89.

