UvA-DARE (Digital Academic Repository)

On the construction of operating systems for the Microgrid many-core architecture

van Tol, M.W.

Publication date 2013

Citation for published version (APA): van Tol, M. W. (2013). On the construction of operating systems for the Microgrid many-core architecture.

General Bibliography

[1] J. Aas. Understanding the linux 2.6.8.1 cpu scheduler. Technical report, Silicon Graphics Inc. (SGI), Eagan, MN, USA, February 2005.

[2] D. Abts, S. Scott, and D. J. Lilja. So many states, so little time: Verifying memory coherence in the Cray X1. In Proceedings of the 17th International Symposium on Parallel and Distributed Processing, IPDPS ’03, page 10 pp., Washington, DC, USA, April 2003. IEEE Computer Society.

[3] M. J. Accetta, R. V. Baron, W. J. Bolosky, D. B. Golub, R. F. Rashid, A. Tevanian, and M. Young. Mach: A New Kernel Foundation for UNIX Development. In Proceedings of the Summer 1986 USENIX Conference, pages 93–113, July 1986.

[4] O. Agesen, A. Garthwaite, J. Sheldon, and P. Subrahmanyam. The evolution of an x86 virtual machine monitor. SIGOPS Oper. Syst. Rev., 44:3–18, December 2010.

[5] L. Albinson, D. Grabas, P. Piovesan, M. Tombroff, C. Tricot, and H. Yassaie. Unix on a loosely coupled architecture: The chorus/mix approach. Future Generation Computer Systems, 8(1-3):67–81, 1992.

[6] G. Allen, T. Dramlitsch, I. Foster, N. T. Karonis, M. Ripeanu, E. Seidel, and B. Toonen. Supporting efficient execution in heterogeneous environments with cactus and globus. In Supercomputing ’01: Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM), pages 52–52, New York, NY, USA, 2001. ACM.

[7] AMD. AMD Opteron 6000 Series Platform Quick Reference Guide. http://sites.amd.com/us/Documents/48101A_Opteron%20_6000_QRG_RD2.pdf, 2010. (Retrieved January 2011).

[8] G. M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the April 18-20, 1967, spring joint computer conference, AFIPS ’67 (Spring), pages 483–485, New York, NY, USA, 1967. ACM.

[9] M. Annavaram, E. Grochowski, and J. Shen. Mitigating Amdahl’s law through EPI throttling. In Computer Architecture, 2005. ISCA ’05. Proceedings. 32nd International Symposium on, pages 298–309, June 2005.

[10] K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands, K. Keutzer, D. A. Patterson, W. L. Plishker, J. Shalf, S. W. Williams, and K. A. Yelick. The landscape of parallel computing research: A view from berkeley. Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley, December 2006.

[11] M. Azimi, N. Cherukuri, D. N. Jayasimha, A. Kumar, P. Kundu, S. Park, I. Schoinas, and A. S. Vaidya. Integration challenges and tradeoffs for tera-scale architectures. Intel Technology Journal, 11(3):173–184, 2007.

[12] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield. Xen and the art of virtualization. In Proceedings of the nineteenth ACM symposium on Operating systems principles, SOSP ’03, pages 164–177, New York, NY, USA, 2003. ACM.

[13] A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania. The multikernel: a new os architecture for scalable multicore systems. In SOSP ’09: Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, pages 29–44, New York, NY, USA, 2009. ACM.

[14] P. Bellens, J. M. Perez, R. M. Badia, and J. Labarta. Cellss: a programming model for the cell be architecture. In SC ’06: Proceedings of the 2006 ACM/IEEE conference on Supercomputing, page 86, New York, NY, USA, 2006. ACM.

[15] J. A. Bergstra and C. A. Middelburg. On the definition of a theoretical concept of an operating system. ArXiv e-prints, June 2010.

[16] A. Birrell, J. Guttag, J. Horning, and R. Levin. Synchronization primitives for a multiprocessor: a formal specification. In Proceedings of the eleventh ACM Symposium on Operating systems principles, SOSP ’87, pages 94–102, New York, NY, USA, 1987. ACM.

[17] A. D. Birrell and B. J. Nelson. Implementing remote procedure calls. ACM Trans. Comput. Syst., 2:39–59, February 1984.

[18] D. L. Black, R. F. Rashid, D. B. Golub, and C. R. Hill. Translation lookaside buffer consistency: a software approach. In Proceedings of the third international conference on Architectural support for programming languages and operating systems, ASPLOS-III, pages 113–122, New York, NY, USA, 1989. ACM.

[19] M. J. Bligh, M. Dobson, D. Hart, and G. Huizenga. Linux on NUMA systems. In J. W. Lockhart, editor, Proceedings of the Linux Symposium, volume 1, pages 91–104, July 2004.

[20] R. D. Blumofe, M. Frigo, C. F. Joerg, C. E. Leiserson, and K. H. Randall. Dag-consistent distributed shared memory. Parallel Processing Symposium, International, 0:132, 1996.

[21] R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: an efficient multithreaded runtime system. SIGPLAN Not., 30(8):207–216, 1995.

[22] S. Borkar. Thousand core chips: a technology perspective. In DAC ’07: Proceedings of the 44th annual Design Automation Conference, pages 746–749, New York, NY, USA, 2007. ACM.

[23] S. Borkar and A. A. Chien. The future of microprocessors. Commun. ACM, 54:67–77, May 2011.

[24] D. Box, D. Ehnebuske, G. Kakivaya, A. Layman, N. Mendelsohn, H. Frystyk Nielsen, S. Thatte, and D. Winer. Simple object access protocol (SOAP) 1.1. World Wide Web Consortium, Note NOTE-SOAP-20000508, May 2000.

[25] S. Boyd-Wickizer, A. T. Clements, Y. Mao, A. Pesterev, M. F. Kaashoek, R. Morris, and N. Zeldovich. An analysis of linux scalability to many cores. In Proceedings of the 9th USENIX conference on Operating systems design and implementation, OSDI’10, pages 1–8, Berkeley, CA, USA, 2010. USENIX Association.

[26] R. Bryant, B. Hartner, Q. He, and G. Venkitachalam. SMP scalability comparisons of Linux kernels 2.2.14 and 2.3.99. In Proceedings of the 4th annual Linux Showcase & Conference - Volume 4, pages 13–13, Berkeley, CA, USA, 2000. USENIX Association.

[27] G. D. Burns, A. K. Pfiffer, D. L. Fielding, and A. A. Brown. Trillium operating system. In Proceedings of the third conference on Hypercube concurrent computers and applications: Architecture, software, computer systems, and general issues - Volume 1, C3P, pages 374–376, New York, NY, USA, 1988. ACM.

[28] M. Burnside and A. D. Keromytis. High-speed I/O: the operating system as a signalling mechanism. In NICELI ’03: Proceedings of the ACM SIGCOMM workshop on Network-I/O convergence, pages 220–227, New York, NY, USA, 2003. ACM.

[29] M. Butler, L. Barnes, D. Sarma, and B. Gelinas. Bulldozer: An approach to multithreaded compute performance. Micro, IEEE, 31(2):6–15, march-april 2011.

[30] J. B. Carter, J. K. Bennett, and W. Zwaenepoel. Implementation and performance of Munin. In SOSP ’91: Proceedings of the thirteenth ACM symposium on Operating systems principles, pages 152–164, New York, NY, USA, 1991. ACM.

[31] R. P. Case and A. Padegs. Architecture of the IBM System/370. Commun. ACM, 21(1):73–96, Jan. 1978.

[32] B. L. Chamberlain, D. Callahan, and H. P. Zima. Parallel programmability and the Chapel language. Int. J. High Perform. Comput. Appl., 21(3):291–312, 2007.

[33] J. Charles, P. Jassi, N. Ananth, A. Sadat, and A. Fedorova. Evaluation of the intel core i7 turbo boost feature. In Workload Characterization, 2009. IISWC 2009. IEEE International Symposium on, pages 188–197, October 2009.

[34] P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. von Praun, and V. Sarkar. X10: an object-oriented approach to non-uniform cluster computing. In OOPSLA ’05: Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, pages 519–538, New York, NY, USA, 2005. ACM.

[35] J. S. Chase, H. M. Levy, M. J. Feeley, and E. D. Lazowska. Sharing and protection in a single-address-space operating system. ACM Trans. Comput. Syst., 12:271–307, November 1994.

[36] D. R. Cheriton. The V distributed system. Commun. ACM, 31:314–333, March 1988.

[37] D. R. Cheriton and W. Zwaenepoel. The distributed V kernel and its performance for diskless workstations. SIGOPS Oper. Syst. Rev., 17:129–140, October 1983.

[38] E. F. Codd, E. S. Lowry, E. McDonough, and C. A. Scalzi. Multiprogramming STRETCH: feasibility considerations. Commun. ACM, 2(11):13–17, 1959.

[39] J. A. Colmenares, S. Bird, H. Cook, P. Pearce, D. Zhu, J. Shalf, S. Hofmeyr, K. Asanović, and J. Kubiatowicz. Resource management in the Tessellation manycore os. In Proceedings of the Second USENIX conference on Hot topics in parallelism, HotPar’10, Berkeley, CA, USA, 2010. USENIX Association.

[40] F. J. Corbató, M. Merwin-Daggett, and R. C. Daley. An experimental time-sharing system. In Proceedings of the May 1-3, 1962, spring joint computer conference, AIEE-IRE ’62 (Spring), pages 335–344, New York, NY, USA, 1962. ACM.

[41] F. J. Corbató and V. A. Vyssotsky. Introduction and overview of the Multics system. In Proceedings of the November 30–December 1, 1965, fall joint computer conference, part I, AFIPS ’65 (Fall, part I), pages 185–196, New York, NY, USA, 1965. ACM.

[42] R. J. Creasy. The origin of the VM/370 time-sharing system. IBM J. Res. Dev., 25:483–490, September 1981.

[43] Y. Cui, Y. Chen, Y. Shi, and Q. Wu. Scalability comparison of commodity operating systems on multi-cores. In Performance Analysis of Systems Software (ISPASS), 2010 IEEE International Symposium on, pages 117–118, March 2010.

[44] F. Dahlgren and J. Torrellas. Cache-only memory architectures. Computer, 32(6):72–79, June 1999.

[45] R. C. Daley and J. B. Dennis. Virtual memory, processes, and sharing in Multics. In Proceedings of the first ACM symposium on Operating System Principles, SOSP ’67, pages 12.1–12.8. ACM, 1967.

[46] Dell. PowerEdge 11G R815 Spec Sheet. http://www.dell.com/downloads/global/products/pedge/en/poweredge-r815-spec-sheet-en.pdf, 2010. (Retrieved January 2011).

[47] A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis, D. Swinehart, and D. Terry. Epidemic algorithms for replicated database maintenance. In Proceedings of the sixth annual ACM Symposium on Principles of distributed computing, PODC ’87, pages 1–12, New York, NY, USA, 1987. ACM.

[48] P. J. Denning. Virtual memory. ACM Comput. Surv., 2:153–189, September 1970.

[49] J. B. Dennis and E. C. Van Horn. Programming semantics for multiprogrammed computations. Commun. ACM, 9(3):143–155, 1966.

[50] E. W. Dijkstra. The structure of the THE-multiprogramming system. In Proceedings of the first ACM symposium on Operating System Principles, SOSP ’67, pages 10.1–10.6, New York, NY, USA, 1967. ACM.

[51] E. W. Dijkstra. Hierarchical ordering of sequential processes. Acta Informatica, 1(2):115–138, June 1971.

[52] R. P. Draves, B. N. Bershad, R. F. Rashid, and R. W. Dean. Using continuations to implement thread management and communication in operating systems. In Proceedings of the thirteenth ACM symposium on Operating systems principles, SOSP ’91, pages 122–136, New York, NY, USA, 1991. ACM.

[53] M. Eisler. XDR: External Data Representation Standard. RFC 4506 (Standard), May 2006.

[54] D. R. Engler, M. F. Kaashoek, and J. O’Toole, Jr. Exokernel: an operating system architecture for application-level resource management. In SOSP ’95: Proceedings of the fifteenth ACM symposium on Operating systems principles, pages 251–266, New York, NY, USA, 1995. ACM.

[55] H. Esmaeilzadeh, E. Blem, R. St. Amant, K. Sankaralingam, and D. Burger. Dark silicon and the end of multicore scaling. In Proceeding of the 38th annual international symposium on Computer architecture, ISCA ’11, pages 365–376, New York, NY, USA, 2011. ACM.

[56] P. T. Eugster, R. Guerraoui, A.-M. Kermarrec, and L. Massoulié. Epidemic information dissemination in distributed systems. Computer, 37(5):60–67, May 2004.

[57] K. Fatahalian, D. R. Horn, T. J. Knight, L. Leem, M. Houston, J. Y. Park, M. Erez, M. Ren, A. Aiken, W. J. Dally, and P. Hanrahan. Sequoia: programming the memory hierarchy. In SC ’06: Proceedings of the 2006 ACM/IEEE conference on Supercomputing, page 83, New York, NY, USA, 2006. ACM.

[58] B. D. Fleisch and M. A. A. Co. Workplace microkernel and os: a case study. Softw. Pract. Exper., 28(6):569–591, 1998.

[59] I. Foster and C. Kesselman. The globus project: A status report. In Proceedings of the Seventh Heterogeneous Computing Workshop, HCW ’98, pages 4–, Los Alamitos, CA, USA, 1998. IEEE Computer Society.

[60] H. Franke and B. Abbott. Tops-a distributed operating system kernel for transputer systems. In System Theory, 1990., Twenty-Second Southeastern Symposium on, pages 103–107, mar 1990.

[61] M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the cilk-5 multithreaded language. In PLDI ’98: Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, pages 212–223, New York, NY, USA, 1998. ACM.

[62] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam. PVM: Parallel virtual machine: a users’ guide and tutorial for networked parallel computing. MIT Press, Cambridge, MA, USA, 1994.

[63] A. Geist, W. Gropp, S. Huss-Lederman, A. Lumsdaine, E. Lusk, W. Saphir, T. Skjellum, and M. Snir. Mpi-2: Extending the message-passing interface. In L. Bougé, P. Fraigniaud, A. Mignotte, and Y. Robert, editors, Euro-Par’96 Parallel Processing, volume 1123 of Lecture Notes in Computer Science, pages 128–135. Springer Berlin / Heidelberg, 1996.

[64] I. Gelado, J. E. Stone, J. Cabezas, S. Patel, N. Navarro, and W.-m. W. Hwu. An asymmetric distributed shared memory model for heterogeneous parallel systems. In ASPLOS ’10: Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems, pages 347–358, New York, NY, USA, 2010. ACM.

[65] K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy. Memory consistency and event ordering in scalable shared-memory multiprocessors. In ISCA ’90: Proceedings of the 17th annual international symposium on Computer Architecture, pages 15–26, New York, NY, USA, 1990. ACM.

[66] M. Gschwind, H. Hofstee, B. Flachs, M. Hopkin, Y. Watanabe, and T. Yamazaki. Synergistic processing in cell’s multicore architecture. Micro, IEEE, 26(2):10–24, march-april 2006.

[67] D. Guniguntala, P. E. McKenney, J. Triplett, and J. Walpole. The read-copy-update mechanism for supporting real-time applications on shared-memory multiprocessor systems with Linux. IBM Systems Journal, 47(2):221–236, January 2008.

[68] E. Hagersten, A. Landin, and S. Haridi. DDM - a cache-only memory architecture. Computer, 25(9):44–54, September 1992.

[69] T. R. Halfhill. ARM’s smallest thumb. Microprocessor Report, pages 1–6, March 2009.

[70] S. M. Hand. Self-paging in the Nemesis operating system. In Proceedings of the third symposium on Operating systems design and implementation, OSDI ’99, pages 73–86. USENIX Association, 1999.

[71] P. B. Hansen. The nucleus of a multiprogramming system. Commun. ACM, 13:238–241, April 1970.

[72] A. F. Harvey. DMA Fundamentals on Various PC Platforms. Application Note 011, National Instruments, April 1991.

[73] D. A. Heger, S. K. Johnson, M. Anand, M. Peloquin, M. Sullivan, A. Theurer, and P. W. Y. Wong. An application centric performance evaluation of the linux 2.6 operating system. IBM Redbooks paper REDP-3876-00, International Business Machines (IBM) Corp., July 2004.

[74] G. Heiser. Many-core chips – a case for virtual shared memory. In Proceedings of the 2nd Workshop on Managed Many-Core Systems, Washington, DC, USA, March 2009. ACM.

[75] G. Heiser, K. Elphinstone, J. Vochteloo, S. Russell, and J. Liedtke. The Mungi single-address-space operating system. Softw. Pract. Exper., 28:901–928, July 1998.

[76] J. N. Herder, H. Bos, B. Gras, P. Homburg, and A. S. Tanenbaum. Minix 3: a highly reliable, self-repairing operating system. SIGOPS Oper. Syst. Rev., 40:80–89, July 2006.

[77] M. D. Hill, J. R. Larus, S. K. Reinhardt, and D. A. Wood. Cooperative shared memory: software and hardware for scalable multiprocessor. In ASPLOS-V: Proceedings of the fifth international conference on Architectural support for programming languages and operating systems, pages 262–273, New York, NY, USA, 1992. ACM.

[78] W. D. Hillis and L. W. Tucker. The CM-5 Connection Machine: a scalable supercomputer. Commun. ACM, 36:31–40, November 1993.

[79] C. A. R. Hoare. Communicating sequential processes. Commun. ACM, 21(8):666–677, 1978.

[80] M. Homewood, D. May, D. Shepherd, and R. Shepherd. The IMS T800 transputer. Micro, IEEE, 7(5):10–26, oct. 1987.

[81] J. Howard, S. Dighe, Y. Hoskote, S. Vangal, D. Finan, G. Ruhl, D. Jenkins, H. Wilson, N. Borkar, G. Schrom, F. Pailet, S. Jain, T. Jacob, S. Yada, S. Marella, P. Salihundam, V. Erraguntla, M. Konow, M. Riepen, G. Droege, J. Lindemann, M. Gries, T. Apel, K. Henriss, T. Lund-Larsen, S. Steibl, S. Borkar, V. De, R. V. D. Wijngaart, and T. Mattson. A 48-core IA-32 message-passing processor with DVFS in 45nm CMOS. Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2010 IEEE International, pages 108–109, February 2010.

[82] HyperTransport Technology Consortium. HyperTransport I/O Link Specification. Technical Document HTC20051222-0046-0026, July 2008.

[83] IBM. 709 data processing system. http://www-03.ibm.com/ibm/history/exhibits/mainframe/mainframe_PP709.html, 1957.

[84] Intel. Product Brief: Intel Xeon Processor 7500 Series. http://www.intel.com/Assets/en_US/PDF/prodbrief/323499.pdf, 2010. (Retrieved January 2011).

[85] M. Jelasity, R. Guerraoui, A.-M. Kermarrec, and M. van Steen. The peer sampling service: experimental evaluation of unstructured gossip-based implementations. In Proceedings of the 5th ACM/IFIP/USENIX international conference on Middleware, Middleware ’04, pages 79–98, New York, NY, USA, 2004. Springer-Verlag New York, Inc.

[86] K. L. Johnson, M. F. Kaashoek, and D. A. Wallach. Crl: high-performance all-software distributed shared memory. In SOSP ’95: Proceedings of the fifteenth ACM symposium on Operating systems principles, pages 213–226, New York, NY, USA, 1995. ACM.

[87] E. Jul, H. Levy, N. Hutchinson, and A. Black. Fine-grained mobility in the emerald system. ACM Trans. Comput. Syst., 6:109–133, February 1988.

[88] G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt, R. Kolanski, M. Norrish, T. Sewell, H. Tuch, and S. Winwood. sel4: formal verification of an os kernel. In Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, SOSP ’09, pages 207–220, New York, NY, USA, 2009. ACM.

[89] E. J. Koldinger, J. S. Chase, and S. J. Eggers. Architecture support for single address space operating systems. In ASPLOS-V: Proceedings of the fifth international conference on Architectural support for programming languages and operating systems, pages 175–186, New York, NY, USA, 1992. ACM.

[90] O. Krieger, P. McGachey, and A. Kanevsky. Enabling a marketplace of clouds: Vmware’s vcloud director. SIGOPS Oper. Syst. Rev., 44:103–114, December 2010.

[91] R. Kumar, V. Zyuban, and D. Tullsen. Interconnections in multi-core architectures: understanding mechanisms, overheads and scaling. In Computer Architecture, 2005. ISCA ’05. Proceedings. 32nd International Symposium on, pages 408–419, June 2005.

[92] S. Kumar, C. J. Hughes, and A. Nguyen. Carbon: architectural support for fine-grained parallelism on chip multiprocessors. SIGARCH Comput. Archit. News, 35:162–173, June 2007.

[93] G. Kurian, J. E. Miller, J. Psota, J. Eastep, J. Liu, J. Michel, L. C. Kimerling, and A. Agarwal. ATAC: a 1000-core cache-coherent processor with on-chip optical network. In Proceedings of the 19th international conference on Parallel architectures and compilation techniques, PACT ’10, pages 477–488, New York, NY, USA, 2010. ACM.

[94] L. Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21:558–565, July 1978.

[95] L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. Computers, IEEE Transactions on, C-28(9):690–691, September 1979.

[96] B. W. Lampson. Protection. SIGOPS Oper. Syst. Rev., 8:18–24, January 1974.

[97] C. E. Leiserson, Z. S. Abuhamdeh, D. C. Douglas, C. R. Feynman, M. N. Ganmukhi, J. V. Hill, D. Hillis, B. C. Kuszmaul, M. A. St. Pierre, D. S. Wells, M. C. Wong, S.-W. Yang, and R. Zak. The network architecture of the Connection Machine CM-5 (extended abstract). In Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures, SPAA ’92, pages 272–285, New York, NY, USA, 1992. ACM.

[98] M. Lewis and A. Grimshaw. The core legion object model. In HPDC ’96: Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing, page 551, Washington, DC, USA, 1996. IEEE Computer Society.

[99] K. Li and P. Hudak. Memory coherence in shared virtual memory systems. ACM Trans. Comput. Syst., 7:321–359, November 1989.

[100] J. Liedtke. Improving IPC by kernel design. In SOSP ’93: Proceedings of the fourteenth ACM symposium on Operating systems principles, pages 175–188, New York, NY, USA, 1993. ACM.

[101] J. Liedtke. On micro-kernel construction. In SOSP ’95: Proceedings of the fifteenth ACM symposium on Operating systems principles, pages 237–250, New York, NY, USA, 1995. ACM.

[102] J. Liedtke. µ-kernels must and can be small. Object-Orientation in Operating Systems, International Workshop on, 0:152, 1996.

[103] J. Liedtke. Toward real microkernels. Commun. ACM, 39:70–77, September 1996.

[104] R. Liu, K. Klues, S. Bird, S. Hofmeyr, K. Asanović, and J. Kubiatowicz. Tessellation: space-time partitioning in a manycore client os. In Proceedings of the First USENIX conference on Hot topics in parallelism, HotPar’09, pages 10–10, Berkeley, CA, USA, 2009. USENIX Association.

[105] G. H. Loh. 3D-stacked memory architectures for multi-core processors. In Proceedings of the 35th Annual International Symposium on Computer Architecture, ISCA ’08, pages 453–464, Washington, DC, USA, 2008. IEEE Computer Society.

[106] D. May. Occam. SIGPLAN Not., 18:69–79, April 1983.

[107] P. R. McJones and G. F. Swart. Evolving the UNIX system interface to support multithreaded programs. Technical Report SRC-21, Digital Systems Research Center, Palo Alto, California, US, September 1987.

[108] C. Meenderinck and B. Juurlink. A case for hardware task management support for the starss programming model. In Proceedings of the 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools, DSD ’10, pages 347–354. IEEE, September 2010.

[109] R. M. Metcalfe and D. R. Boggs. Ethernet: distributed packet switching for local computer networks. Commun. ACM, 19:395–404, July 1976.

[110] P. V. Mockapetris and K. J. Dunlap. Development of the domain name system. SIGCOMM Comput. Commun. Rev., 18(4):123–133, August 1988.

[111] G. E. Moore. Cramming more components onto integrated circuits. Electronics, 38(8), April 1965.

[112] C. Morin. XtreemOS: A grid operating system making your computer ready for participating in virtual organizations. In Proceedings of the 10th IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing, ISORC ’07, pages 393–402, Washington, DC, USA, 2007. IEEE Computer Society.

[113] S. J. Mullender, G. van Rossum, A. S. Tanenbaum, R. van Renesse, and H. van Staveren. Amoeba: a distributed operating system for the 1990s. Computer, 23(5):44–53, May 1990.

[114] E. B. Nightingale, O. Hodson, R. McIlroy, C. Hawblitzel, and G. Hunt. Helios: heterogeneous multiprocessing with satellite kernels. In SOSP ’09: Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, pages 221–234, New York, NY, USA, 2009. ACM.

[115] R. Nikhil. Parallel symbolic computing in cid. In T. Ito, R. H. Halstead, and C. Queinnec, editors, Parallel Symbolic Languages and Systems, Proceedings of the 1995 International Workshop PSLS’95 Beaune, France, October 24, volume 1068 of Lecture Notes in Computer Science, pages 215–242. Springer-Verlag Berlin / Heidelberg, 1996.

[116] Object Management Group. CORBA component model 4.0 specification. Specification Version 4.0, Object Management Group, April 2006.

[117] K. Olukotun and L. Hammond. The future of microprocessors. Queue, 3:26–29, September 2005.

[118] PCI SIG. PCI Local Bus Specification Revision 2.3 MSI-X ECN. http://www.pcisig.com/specifications/conventional/msi-x_ecn.pdf.

[119] Perihelion Software Ltd. The Helios Operating System. Prentice Hall International, UK, May 1991.

[120] S. Peter, A. Schüpbach, P. Barham, A. Baumann, R. Isaacs, T. Harris, and T. Roscoe. Design principles for end-to-end multicore schedulers. In Proceedings of the 2nd USENIX conference on Hot topics in parallelism, HotPar’10, Berkeley, CA, USA, June 2010. USENIX Association.

[121] S. Peter, A. Schüpbach, D. Menzi, and T. Roscoe. Early experience with the Barrelfish OS and the Single-chip Cloud Computer. In 3rd Many-core Applications Research Community (MARC) Symposium. KIT Scientific Publishing, September 2011.

[122] R. Pike, D. L. Presotto, S. Dorward, B. Flandrena, K. Thompson, and H. Trickey. Plan 9 from bell labs. Computing Systems, 8(2):221–254, 1995.

[123] J. S. Rellermeyer, G. Alonso, and T. Roscoe. R-osgi: distributed applications through software modularization. In Middleware ’07: Proceedings of the ACM/IFIP/USENIX 2007 International Conference on Middleware, pages 1–20, New York, NY, USA, 2007. Springer-Verlag New York, Inc.

[124] D. M. Ritchie and K. Thompson. The UNIX time-sharing system. Commun. ACM, 17:365–375, July 1974.

[125] G. Rodrigues. Taming the OOM killer. LWN.net, February 2009.

[126] T. Roscoe. Linkage in the nemesis single address space operating system. SIGOPS Oper. Syst. Rev., 28(4):48–55, October 1994.

[127] D. Ross. A personal view of the personal work station: some firsts in the fifties. In Proceedings of the ACM Conference on The history of personal workstations, HPW ’86, pages 19–48, New York, NY, USA, 1986. ACM.

[128] D. T. Ross. Gestalt programming: a new concept in automatic programming. In Papers presented at the February 7-9, 1956, joint ACM-AIEE-IRE western computer conference, AIEE-IRE ’56 (Western), pages 5–10, New York, NY, USA, 1956. ACM.

[129] P. Rovner. Extending modula-2 to build large, integrated systems. Software, IEEE, 3(6):46–57, nov. 1986.

[130] S. Rusu, S. Tam, H. Muljono, J. Stinson, D. Ayers, J. Chang, R. Varada, M. Ratta, S. Kottapalli, and S. Vora. A 45 nm 8-core enterprise Xeon processor. Solid-State Circuits, IEEE Journal of, 45(1):7–14, January 2010.

[131] P. Salihundam, S. Jain, T. Jacob, S. Kumar, V. Erraguntla, Y. Hoskote, S. Vangal, G. Ruhl, and N. Borkar. A 2 Tb/s 6 × 4 mesh network for a single-chip cloud computer with DVFS in 45 nm CMOS. Solid-State Circuits, IEEE Journal of, 46(4):757–766, april 2011.

[132] D. Sanchez, R. M. Yoo, and C. Kozyrakis. Flexible architectural support for fine-grain scheduling. In Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems, ASPLOS ’10, pages 311–322. ACM, 2010.

[133] F. Scheler, W. Hofer, B. Oechslein, R. Pfister, W. Schröder-Preikschat, and D. Lohmann. Parallel, hardware-supported interrupt handling in an event-triggered real-time operating system. In CASES ’09: Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems, pages 167–174, New York, NY, USA, 2009. ACM.

[134] M. D. Schroeder and M. Burrows. Performance of the firefly rpc. ACM Trans. Comput. Syst., 8:1–17, February 1990.

[135] A. Schüpbach, S. Peter, A. Baumann, T. Roscoe, P. Barham, T. Harris, and R. Isaacs. Embracing diversity in the Barrelfish manycore operating system. In Proceedings of the Workshop on Managed Many-Core Systems, 2008.

[136] J. L. Shin, D. Huang, B. Petrick, C. Hwang, K. W. Tam, A. Smith, H. Pham, H. Li, T. Johnson, F. Schumacher, A. S. Leon, and A. Strong. A 40 nm 16-core 128-thread SPARC SoC processor. Solid-State Circuits, IEEE Journal of, 46(1):131–144, January 2011.

[137] B. J. Smith. Architecture and applications of the HEP multiprocessor computer system. In Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, volume 298, pages 241–248, January 1981.

[138] M. A. Suleman, O. Mutlu, M. K. Qureshi, and Y. N. Patt. Accelerating critical section execution with asymmetric multi-core architectures. In Proceeding of the 14th international conference on Architectural support for programming languages and operating systems, ASPLOS ’09, pages 253–264, New York, NY, USA, 2009. ACM.

[139] Sun/Oracle. Sun Fire X4470 Server Data Sheet. http://www.oracle.com/us/products/servers-storage/servers/x86/sun-fire-x4470-ds-079894.pdf, 2010. (Retrieved January 2011).

[140] Sun/Oracle. SPARC T3-4 server Data Sheet. http://www.oracle.com/us/products/servers-storage/servers/sparc-enterprise/t-series/sparc-t3-4-ds-173100.pdf, 2011. (Retrieved June 2011).

[141] A. S. Tanenbaum. Modern Operating Systems, 2nd ed. Prentice-Hall, 2001.

[142] A. S. Tanenbaum, H. E. Bal, and M. F. Kaashoek. Programming a distributed system using shared objects. In High Performance Distributed Computing, 1993., Proceedings the 2nd International Symposium on, pages 5–12, 20-23 1993.

[143] A. S. Tanenbaum, J. N. Herder, and H. Bos. Can we make operating systems reliable and secure? Computer, 39(5):44–51, May 2006.

[144] C. P. Thacker and L. C. Stewart. Firefly: a multiprocessor workstation. In Proceedings of the second international conference on Architectural support for programming languages and operating systems, ASPLOS-II, pages 164–172, Los Alamitos, CA, USA, 1987. IEEE Computer Society Press.

[145] Thinking Machines Corporation, Cambridge, MA, USA. Connection Machine CM-5 Technical Summary, November 1993.

[146] J. E. Thornton. Parallel operation in the control data 6600. In Proceedings of the October 27-29, 1964, fall joint computer conference, part II: very high speed computer systems, AFIPS ’64 (Fall, part II), pages 33–40, New York, NY, USA, 1965. ACM.

[147] J. E. Thornton. The CDC 6600 project. Annals of the History of Computing, 2(4):338–348, October-December 1980.

[148] R. M. Tomasulo. An efficient algorithm for exploiting multiple arithmetic units. IBM Journal of Research and Development, 11(1):25–33, Jan. 1967.

[149] A. W. Topol, D. C. L. Tulipe, L. Shi, D. J. Frank, K. Bernstein, S. E. Steen, A. Kumar, G. U. Singco, A. M. Young, K. W. Guarini, and M. Ieong. Three-dimensional integrated circuits. IBM Journal of Research and Development, 50(4.5):491–506, July 2006.

[150] H. S. Tropp. FORTRAN anecdotes. Annals of the History of Computing, 6(1):59–64, January-March 1984.

[151] R. F. van der Wijngaart, T. G. Mattson, and W. Haas. Light-weight communications on Intel’s single-chip cloud computer processor. SIGOPS Oper. Syst. Rev., 45:73–83, February 2011.

[152] R. van Renesse, H. van Staveren, and A. S. Tanenbaum. Performance of the world’s fastest distributed operating system. SIGOPS Oper. Syst. Rev., 22:25–34, October 1988.

[153] G. van Rossum. AIL - a class-oriented RPC stub generator for Amoeba. In W. Schröder-Preikschat and W. Zimmer, editors, Progress in Distributed Operating Systems and Distributed Systems Management, volume 433 of Lecture Notes in Computer Science, pages 13–21. Springer Berlin / Heidelberg, 1990.

[154] S. R. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, A. Singh, T. Jacob, S. Jain, V. Erraguntla, C. Roberts, Y. Hoskote, N. Borkar, and S. Borkar. An 80-tile sub-100-W teraflops processor in 65-nm CMOS. Solid-State Circuits, IEEE Journal of, 43(1):29–41, January 2008.

[155] T. von Eicken, D. E. Culler, S. C. Goldstein, and K. E. Schauser. Active messages: a mechanism for integrated communication and computation. In ISCA ’92: Proceedings of the 19th annual international symposium on Computer architecture, pages 256–266, New York, NY, USA, 1992. ACM.

[156] J. Waldo, G. Wyant, A. Wollrath, and S. C. Kendall. A note on distributed computing. Technical Report TR-94-29, Sun Microsystems Research, November 1994.

[157] W.-D. Weber and A. Gupta. Exploring the benefits of multiple hardware contexts in a multiprocessor architecture: preliminary results. In Proceedings of the 16th annual international symposium on Computer architecture, ISCA ’89, pages 273–280, New York, NY, USA, 1989. ACM.

[158] D. F. Wendel, R. Kalla, J. Warnock, R. Cargnoni, S. Chu, J. Clabes, D. Dreps, D. Hrusecky, J. Friedrich, S. Islam, J. Kahle, J. Leenstra, G. Mittal, J. Paredes, J. Pille, P. J. Restle, B. Sinharoy, G. Smith, W. J. Starke, S. Taylor, J. Van Norstrand, S. Weitzel, P. G. Williams, and V. Zyuban. POWER7, a highly parallel, scalable multi-core high end server processor. Solid-State Circuits, IEEE Journal of, 46(1):145–161, January 2011.

[159] D. Wentzlaff and A. Agarwal. Factored operating systems (fos): the case for a scalable operating system for multicores. SIGOPS Oper. Syst. Rev., 43:76–85, April 2009.

[160] D. Wentzlaff, C. Gruenwald, III, N. Beckmann, K. Modzelewski, A. Belay, L. Youseff, J. Miller, and A. Agarwal. An operating system for multicore and clouds: mechanisms and implementation. In Proceedings of the 1st ACM symposium on Cloud computing, SoCC ’10, pages 3–14, New York, NY, USA, 2010. ACM.

[161] C. Whitby-Strevens. The transputer. In Proceedings of the 12th annual international symposium on Computer architecture, ISCA ’85, pages 292–300, Los Alamitos, CA, USA, 1985. IEEE Computer Society Press.

[162] B. Widrow and S. D. Stearns. Adaptive Signal Processing. Prentice Hall, Englewood Cliffs, NJ, 1985.

[163] E. Witchel, J. Cates, and K. Asanović. Mondrian memory protection. In Proceedings of the 10th international conference on Architectural support for programming languages and operating systems, ASPLOS-X, pages 304–316, New York, NY, USA, 2002. ACM.

[164] A. Wollrath, R. Riggs, and J. Waldo. A distributed object model for the java system. In Proceedings of the 2nd conference on USENIX Conference on Object-Oriented Technologies (COOTS) - Volume 2, Berkeley, CA, USA, 1996. USENIX Association.

[165] W. Wulf, E. Cohen, W. Corwin, A. Jones, R. Levin, C. Pierson, and F. Pollack. Hydra: the kernel of a multiprocessor operating system. Commun. ACM, 17:337–345, June 1974.

[166] Xilinx UG080 (v2.5). ML401/ML402/ML403 Evaluation Platform, User Guide, May 2006.

[167] Xilinx UG081 (v6.0). MicroBlaze Processor Reference Guide, June 2006.

[168] X. Zhou, H. Chen, S. Luo, Y. Gao, S. Yan, W. Liu, B. Lewis, and B. Saha. A case for software managed coherence in many-core processors. In Proceedings of the 2nd USENIX conference on Hot topics in parallelism, HotPar’10, Berkeley, CA, USA, June 2010. USENIX Association.