Optimistic Compiler Optimizations for Network Systems Sapan Bhatia
Total Page:16
File Type:pdf, Size:1020Kb
Optimistic compiler optimizations for network systems Sapan Bhatia To cite this version: Sapan Bhatia. Optimistic compiler optimizations for network systems. Software Engineering [cs.SE]. Université Sciences et Technologies - Bordeaux I, 2006. English. tel-00402492 HAL Id: tel-00402492 https://tel.archives-ouvertes.fr/tel-00402492 Submitted on 7 Jul 2009 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. i Ecole´ doctorale de Math´ematiques et Informatique de Bordeaux/No. d’ordre 3169 Optimisations de Compilateur Optimistes pour les Syst`emes R´eseaux (Optimistic compiler optimizations for network systems) THESE` pr´esent´ee et soutenue publiquement le 9 Juin 2006 pour l’obtention du Doctorat de l’universit´ede Bordeaux 1 (sp´ecialit´einformatique) par Sapan Bhatia Composition du jury Pr´esident : Raymond Namyst (Professeur) Rapporteurs : Gilles Muller (Professeur) Marc Shapiro (Directeur de recherche) Directeur de th`ese: Charles Consel (Professeur) Laboratoire Bordelais de Recherche en Informatique ii iii This thesis is dedicated to you, the reader. iv Contents List of Figures viii 1 Introduction 1 1.1 Overview: optimizations and system architecture ........... 2 1.2 Contributions ............................... 3 1.2.1 Protocol-stack specialization ................... 4 1.2.2 Remote specialization ....................... 4 1.2.3 Memory-manager/scheduler co-design to improve server perfor- mance ............................... 5 1.2.4 The broomstick toolkit ...................... 5 1.3 Thesis organization ............................ 6 2 Related Work 7 2.1 Optimizations for network software ................... 8 2.1.1 Optimizing protocol stacks .................... 8 2.1.2 Network-server optimization ................... 9 2.2 Extensible operating systems ....................... 10 2.3 Program specialization .......................... 11 2.4 Cache optimizations ........................... 12 2.5 Miscellaneous work ............................ 12 3 Protocol-Stack Specialization 14 3.1 Specialization of protocol stacks: concept and mechanism ....... 15 3.1.1 Specialization opportunities in the sockets, udp and ip layers . 16 3.1.2 Code guards ............................ 21 3.2 Enabling the specialization ........................ 23 3.2.1 Describing the specialization opportunities to the specializer . 24 3.2.2 Specialization process: local or remote? ............. 25 3.2.3 Specializing locally ........................ 25 3.2.4 Specializing remotely ....................... 25 3.3 Experimental performance evaluation and analysis ........... 26 3.3.1 Experiments ............................ 26 v 3.3.2 Size and performance of specialized code ............ 26 3.4 Conclusion ................................. 29 3.5 Shortcomings and possible extensions .................. 29 4 Remote specialization 30 4.1 Remote Specialization: A birds-eye view ................ 32 4.1.1 From A Programmer’s Point of View .............. 34 4.1.2 From an OS developer’s point of View ............. 36 4.2 Specialization Infrastructure ....................... 38 4.2.1 Functional overview ........................ 38 4.2.2 Detailed descriptions ....................... 40 4.2.3 Code manager ........................... 41 4.2.4 Helper process ........................... 43 4.2.5 Run-time layer .......................... 43 4.2.6 Specialization templates ..................... 44 4.2.7 Specializer and compiler ..................... 44 4.2.8 TCP client and server ...................... 45 4.3 Discussion ................................. 46 4.3.1 Specialization latency ....................... 46 4.4 Discussion ................................. 46 4.4.1 Applicability of our approach .................. 46 4.4.2 Module dependencies ....................... 47 4.5 Conclusion and Future Work ....................... 47 5 Memory-manager/Scheduler Co-design 49 5.1 Event-driven servers ........................... 51 5.1.1 Overview ............................. 51 5.1.2 Performance of event-driven servers ............... 52 5.1.3 Caching behavior ......................... 53 5.2 Eliminating data-cache misses ...................... 55 5.2.1 The stingy allocator ....................... 55 5.2.2 Constraints ............................ 56 5.2.3 Objective function ........................ 57 5.2.4 Scheduling for cache efficiency .................. 58 5.3 Performance evaluation .......................... 58 5.3.1 Benchmarking methodology ................... 58 5.3.2 Tools ................................ 58 5.3.3 Environment ........................... 60 5.4 Performance analysis ........................... 60 5.4.1 Httperf .............................. 60 5.4.2 Apachebench ........................... 60 vi 5.4.3 Analysis .............................. 62 5.5 Shortcomings and possible extensions .................. 63 6 The Broomstick Optimizer 64 6.1 A programmer’s eye view ......................... 64 6.2 Analyses and transformations ...................... 67 6.2.1 Identifying the stages and the scheduler ............. 67 6.2.2 Memory analysis ......................... 69 6.2.3 Parameterizing the stingy allocator ............... 71 6.2.4 Code annotations ......................... 71 6.3 Application to real programs ....................... 72 6.4 Applicability ................................ 72 6.5 Conclusion ................................. 73 7 Case Study: The TUX server 76 7.1 The tux web server ............................ 76 7.1.1 Overview ............................. 76 7.1.2 Server architecture ........................ 77 7.1.3 Staging and scheduling ...................... 77 7.1.4 Memory management. ...................... 78 7.2 Specializing TUX ............................. 79 7.2.1 Protocol served .......................... 79 7.2.2 Number of CPUs, Sockets .................... 80 7.2.3 Server actions ........................... 80 7.2.4 Mime-types handled ....................... 81 7.2.5 Other options ........................... 81 7.2.6 Experiments ............................ 81 7.3 Cache-related slowdowns in TUX .................... 82 7.4 Cache-optimizing the TUX server: an experience study ........ 83 7.4.1 Preparatory step ......................... 83 7.4.2 Memory analysis with memwalk ................. 84 7.4.3 Generating a customized allocator using stingygen ...... 86 7.4.4 Modifying the server to use the customized allocator ..... 87 7.4.5 Modifying the scheduler ..................... 87 7.5 Performance evaluation .......................... 91 7.5.1 Benchmarking methodology ................... 91 7.5.2 Tools ................................ 91 7.5.3 Environment ........................... 92 7.6 Performance analysis ........................... 92 7.6.1 Httperf .............................. 93 7.6.2 Apachebench ........................... 93 vii 7.6.3 Analysis .............................. 93 7.7 Shortcomings and possible extensions .................. 95 8 Conclusion 97 Publications derived from this thesis 99 Bibliography of Related Work 100 viii List of Figures 1.1 High-level view of the optimization framework. (a) Specialization of protocol stacks. (b) Optimization of an event-driven network server. 2 3.1 Fast-path of the UDP send operation in BSD ............. 22 3.2 Specialization declarations ........................ 24 3.3 Specialization results: Performance, code size and overhead. ..... 27 3.4 Original vs. specialized code: maximum throughput on the PIII, 486 and iPAQ for UDP (±5% at 99%) confidence and TCP (±2.5% at 99% confidence) ................................ 28 4.1 Requesting and using a specialized functionality. ............ 33 4.2 A fallback functionality. ......................... 33 4.3 From an application developer’s point of view. ............. 34 4.4 A fragment of the binding-time annotations for the send system call . 37 4.5 Configuration information provided to the stub generator ....... 38 4.6 Taxonomy of functional units of the specialization infrastructure ... 39 4.7 User-support functions for the TCP/IP stack. ............. 40 4.8 Code excerpt: do˙specialize ........................ 42 4.9 Code excerpt: done˙specialize ...................... 42 4.10 Code excerpt: specialized ......................... 43 4.11 Remote specialization infrastructure ................... 45 5.1 An event-driven server. .......................... 52 5.2 Per-task state during program execution ................ 54 5.3 Throughput degradation with increasing L2 cache misses. ....... 54 5.4 (a) Throughput of TUX with increasing concurrency. (b) Correspond- ing increase in L2 cache misses (c) Peak performance of TUX for uni- form load. ................................. 61 5.5 Comparison of the performance of the original thttpd server to that of the optimized thttpd server ....................... 62 6.1 Example annotations and wrappers for TUX. ............. 65 ix 6.2 Interface used to extract the structure and memory utilization behavior. 66 6.3 Modifying the scheduler: A high-level conceptual overview ...... 66 6.4 A fragment of the Stage Call Graph (SCG) of TUX. nf