A Quantitative Performance Evaluation of SCI Memory Hierarchies
Roberto A Hexsel
University of Edinburgh
1994
Abstract
The Scalable Coherent Interface (SCI) is an IEEE standard that defines a hardware platform for scalable shared-memory multiprocessors. SCI consists of three parts. The first is a set of physical interfaces that defines board sizes, wiring and network clock rates. The second is a communication protocol based on unidirectional point-to-point links. The third defines a cache coherence protocol based on a full directory that is distributed amongst the cache and memory modules. The cache controllers keep track of the copies of a given datum by maintaining them in a doubly linked list. SCI can scale up to 65520 nodes.
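The doubly linked list of copies mentioned above can be pictured with a minimal sketch. It is an illustration only and not the protocol as specified by the standard: the class and method names (`MemoryLine`, `CacheLine`, `attach`, `detach`) are hypothetical, all state transitions and network messages are omitted, and it assumes that a new sharer joins at the head of the list.

```python
class CacheLine:
    """One cached copy of a datum; an element of the sharing list."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.forward = None   # next copy, towards the tail of the list
        self.backward = None  # previous copy, towards the head of the list


class MemoryLine:
    """Memory-side directory entry; it only records the head of the list."""

    def __init__(self):
        self.head = None

    def attach(self, node_id):
        # Assumption: a new sharer is prepended and becomes the new head.
        entry = CacheLine(node_id)
        entry.forward = self.head
        if self.head is not None:
            self.head.backward = entry
        self.head = entry
        return entry

    def detach(self, entry):
        # Unlink a copy by patching the pointers of its two neighbours.
        if entry.backward is None:
            self.head = entry.forward
        else:
            entry.backward.forward = entry.forward
        if entry.forward is not None:
            entry.forward.backward = entry.backward
        entry.forward = None
        entry.backward = None

    def sharers(self):
        # Walk the list from head to tail, collecting node identifiers.
        ids = []
        current = self.head
        while current is not None:
            ids.append(current.node_id)
            current = current.forward
        return ids
```

In this sketch the directory state is spread over the sharers themselves: memory holds a single pointer, and each cache holds two, so the storage per memory line does not grow with the number of nodes.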
This dissertation contains a quantitative performance evaluation of an SCI-connected multiprocessor that assesses both the communication and cache coherence subsystems. The simulator is driven by reference streams generated as a by-product of the execution of "real" programs. The workload consists of three programs from the SPLASH suite and three parallel loops.
The simplest topology supported by SCI is the ring. It was found that, for the hardware and software simulated, the largest efficient ring size is between eight and sixteen nodes and that the raw network bandwidth seen by processing elements is limited to about 80 Mbytes/s. This is because the network saturates when link traffic reaches 600-700 Mbytes/s. These levels of link traffic only occur for two poorly designed programs. The other four programs generate low traffic and their execution speed is limited by neither the interconnect nor the cache coherence protocol. An analytical model of the multiprocessor is used to assess the cost of some frequently occurring cache coherence protocol operations.
In order to build large systems, networks more sophisticated than rings must be used. The performance of SCI meshes and cubes is evaluated for systems of up to 64 nodes. As with rings, processor throughput is limited by link traffic for the same two poorly designed programs. Cubes are 10-15% faster than meshes for programs that generate high levels of network traffic. Otherwise, the differences are negligible. No significant relationship between cache size and network dimensionality was found.
Acknowledgements
This dissertation is one of the products of my living in Scotland. It was a period of much discovery, both technically and personally. On the personal side, I lived there long enough to become well acquainted with British culture and politics. Some of the good memories I will keep are from many hours spent with BBC Radio 4, BBC 1 and 2, Channel 4, The Edinburgh Filmhouse, The Cameo Cinema, The Queen's Hall, The Royal Shakespeare Company. Through these media, I met the Bard and Handel, Inspectors Taggart and Frost, Jeremy Paxman and John Pilger, Marina Warner and Glenys Kinnock, Noam Chomsky and Edward Said, Dennis Skinner and Tony Benn, Prime Minister's Question Time and Spitting Image, Peter Greenaway and a wealth of European cinema. Along with many others, these people, their work and the institutions they work for became an important element of my thinking. Being in exile is not easy but it can be an extremely enriching experience. It was for me.
Many people took part, directly or indirectly, in the work that is reported here. Some were related to me in a professional capacity, some helped by just being there. I would like to express my gratitude to them all.
The Coordenadoria de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Ministério da Educação, Brazil, awarded the scholarship that made possible my coming to Britain.
I would like to thank the people at the Department of Computer Science for making my life and endeavours as a graduate student easy. In particular, I am grateful to Angela, Chris (cc), Eleanor, George, Jenny, John (jhb), Murray, Paul, Sam and Todd. I would also like to thank all the people who put up with my simulations hogging the compute servers and/or their workstations.
Dave Gustavson, the chairman of the IEEE-SCI working group, provided useful comments on a paper about SCI rings that evolved into Chapter 4. He kept insisting, thankfully, that "SCI = rings" is not true!
Nigel Topham was my supervisor and I am indebted to him for the opportunity to work on Computer Architecture. I am also indebted to Nigel for his guidance and support. Had I followed his advice more closely on a few occasions, much time and grief would have been saved. Stuart Anderson was my second supervisor and I thank him for being there during the mid-summer crises.
My parents offered much needed support and encouragement. Without their financial support in the later stages of this work, it would not have been completed.
Table of Contents