Understanding SPARC Processor Performance

Understanding SPARC Processor Performance

Understanding SPARC Processor Performance MAY 15 & 16, 2019 CLEVELAND PUBLIC AUDITORIUM, CLEVELAND, OHIO WWW.NEOOUG.ORG/GLOC About the Speaker • Akiva Lichtner • Physics background • Twenty years experience in IT • Enterprise production support analyst • Java developer • Oracle query plan manager … • Spoke here at G.L.O.C. about TDD and Java dynamic tracing Audience • Developers • System administrators • Tech support analysts • IT managers Motivation • I have been working in tech support for a large application • We have run SPARC T4 servers and now we run T7 servers • Application servers, database servers • Environments are all different • Users complained for years about “environment X” being slow, finally figured out why • What I learned can be very useful for users of SPARC servers What is SPARC? • First released in 1987, created by Sun Microsystems to replace the Motorola 68000 in its workstation products • During the .com boom Solaris/SPARC and Windows/Intel were the only supported platforms for the JVM • In 2000 the bubble burst and Sun server sales plunged • Sun acquired Afara Websystems, which had built an interesting new processor, and renamed it the UltraSPARC T1 • Was followed by T2 through M8, evolutions of the same design • More recently Oracle has added significant new functionality Processor Design • High core count (even in the early days) • Many threads per core • “Barrel” processor • Designed to switch efficiently • Non-uniform memory access • Per-processor shared cache • Core-level shared cache A picture speaks a thousand words … What happens if my threads run on the same core? • Core runs one thread at a time • Context switch is instantaneous • Ideally memory latency hides context switch entirely • However the caches have a limited size • Performance will depend on cache utilization • Are the threads on the same core “related” or not? • I had a CPU-intensive application that would slow down by a factor of five at random times, in the presence of another CPU- intensive application – coincidence? • To Solaris one core looks like 8 virtual processors What performance should I expect in general? • It all depends • Common for a Java application to have hundreds of threads • These servers often consolidate independent applications • By default, Solaris uses all the virtual processors • Not a massive SMP machine • Not a traditional CC-NUMA machine Some experiments to illustrate • One Java process, N threads • Random reads/writes on one array of data • Used pbind to influence choice of virtual processors One core 1 core 20 15 10 5 0 7 5 3 1 0-5 5-10 10-15 15-20 Two cores 2 non-adjacent cores 60 2 adjacent cores 50 40 30 20 20 15 10 13 10 0 5 9 13 0 9 5 4 16 5 64 1 256 1 1024 4096 16384 65536 262144 0-10 10-20 20-30 30-40 40-50 50-60 0-5 5-10 10-15 15-20 Four cores 4 non-adjacent cores 4 adjacent cores 100 80 60 60 40 40 20 20 0 0 1 4 1 4 4 9 5 7 64 13 32 10 13 21 17 1024 256 16 19 29 25 16384 2048 22 25 28 262144 16384 31 4194304 131072 1048576 8388608 0-20 20-40 40-60 0-20 20-40 40-60 60-80 80-100 Multi-processor server • For several years I had a problematic server • I finally noticed that it had two processors • Applications were mostly idle, but load average was high • I think thread migration overhead was inflating the run queue • A simple Java memory allocation test was 60% slower than on a single-processor server • Once a single-processor server was installed, the problem disappeared Choose application software carefully • Well-designed concurrent applications • Learn good concurrent programming • Streaming applications – Each core has a Data Analytics Acceleration (DAX) pipeline – compare / filter / translate / compress – Oracle Database, Apache Spark query acceleration – Java Streams API • Neural networks? Solaris Resource Management • Also known as Solaris Zones • Not virtualization, still just one server • Solaris “projects” • Assign application processes to projects • Create (virtual) processor sets • Allocate virtual processors to processor sets Solaris VM Server for SPARC • Also known as Logical Domains • This is virtualization • Hypervisor • Multiple instances of Solaris on the same server • Allocate processors to different VMs Dynamic Hardware Domains • Hardware feature in M-Series servers • Can add/delete/replace processor boards to domains Take-aways • There’s a lot to a modern SPARC processor • This is not an SMP, not a CC-NUMA • Complex, hierarchical structure • With great power comes great responsibility • If performance is a problem, consider your workload sharing • Keep different applications on different cores • Try to keep related apps on adjacent cores • VMs and Zones are not nice to have, you need them.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    19 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us