Processors that can do 20+ GFLOPS per Watt - 2012-08-27 by vincent - StreamComputing - http://www.streamcomputing.eu

Processors that can do 20+ GFLOPS per Watt by vincent – Monday, 27 August 2012 http://www.streamcomputing.eu/blog/2012-08-27/processors-that-can-do-20-gflops-watt/

System for communicating power-efficiency of new equipment. “A” being best, “F” being worst. 2011-A is incomparable with 2012-A.

For yearly power usage there is a rule of thumb: a device that is continuously on costs its power draw in Watt times 1.5 in Euro per year. So the computer in front of me, which draws around 107 Watt, would cost me €160 a year if I left it on. A moderate cluster with several GPUs of a few hundred Watt each would cost a few thousand Euro a year. I would say: very doable for most companies.
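The rule of thumb can be checked with a few lines of Python. This is a minimal sketch; the function names are my own, and the electricity price it derives is only what the factor 1.5 implies for a device that runs 24 hours a day, 365 days a year, not a quoted tariff.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, assuming the device is never switched off


def yearly_cost_rule_of_thumb(watts):
    """Rule of thumb from the text: Watt times 1.5 gives Euro per year."""
    return watts * 1.5


def yearly_cost_from_tariff(watts, eur_per_kwh):
    """Cost in Euro per year for a device that is always on, at a given price per kWh."""
    kwh_per_year = watts * HOURS_PER_YEAR / 1000.0
    return kwh_per_year * eur_per_kwh


def implied_tariff():
    """Electricity price (Euro per kWh) at which the rule of thumb is exact."""
    return 1.5 / (HOURS_PER_YEAR / 1000.0)


if __name__ == "__main__":
    print(yearly_cost_rule_of_thumb(107))   # the 107 Watt desktop from the text
    print(round(implied_tariff(), 3))       # Euro per kWh implied by the factor 1.5
```

If your own tariff differs from the implied one, `yearly_cost_from_tariff` gives the adjusted figure.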

So why does performance per Watt matter? There is more to a Watt than just the costs. The energy needed to cool a cluster is quite high, as most of the energy escapes as heat. And then there is the increasing demand for portable power. In case you are thinking of swiping your credit card for a top 10 supercomputer, these energy costs are extremely high.

In this article I try to get an overview of who is entering the 20+ GFLOPS/Watt area. All processors that do less than 20 GFLOPS/Watt need other advantages to survive. And you’ll see that all the green processors are programmed with OpenCL, the technology StreamComputing is all about.

IMPORTANT: The stated total power sometimes includes and sometimes excludes memory transfers, so the comparison below is not fair. The figures for the graphics cards include memory transfers, while those for the CPUs and SoCs do not.

The list

Understand that since I mix CPUs, GPUs and SoCs (= CPU+GPU) the list is really only an indication of what is possible. Also, a computer is built up of more energy-consuming parts than just the processors: interconnects, memory, hard drives, etc.

Disclaimer: The list below is incomplete and based on theoretical values. The full TDP is assumed to be consumed when the processor is working at maximum performance. Actual FLOPS/Watt values can be much lower, depending on many factors. If you want to buy hardware specifically for the highest FLOPS/Watt, have your software tested on the device.

| Processor | Type | GFLOPS (32bit) | GFLOPS (64bit) | Watt (TDP) | GFLOPS/Watt (32bit) | GFLOPS/Watt (64bit) |
|---|---|---|---|---|---|---|
| Epiphany-IV | Epiphany | 100 | N/A | 2 | 50 | N/A |
| Movidius Myriad | ARM SoC: LEON3+SHAVE | 15.28 | N/A | 0.32 | 48 | N/A |
| ZiiLabs | ARM SoC | 58 | N/A | ? | 20? | N/A |
| Tesla K10 | GPU | 4577 | 190 | 225 | 20.34 | ? |
| ARM + MALI T604 | ARM SoC | 8 + 68 | N/A | 4? | 19? | N/A |
| NVidia GTX 690 | X86 GPU x 2 | 5621 | 234? | 300 | 18.74 | 0.78 |
| GeForce GTX 680 | X86 GPU | 3090 | 128 | 195 | 15.85 | 0.65 |
| AMD HD 7970 GHz | X86 GPU | 4300 | 1075 | 300+ | 14.3 | 3.58 |
| Knight’s Corner (Phi) | ? | 2000? | 1000 | 200? | 10? | 5? |
| AMD A10-5800K + HD 7660D | X86 SoC | 121 + 614 | ? | 100 | 7.35 | ? |
| i7-3770 + HD4000 | X86 SoC | 225 + 294.4 | 112 + 73.6 | 77 | 6.74 | 2.41 |
| NVIDIA CARMA (complete board) | ARM + GPU | ? + 200 | ? | 40 | 5.00 | ? |
| IBM Power A2 | Power CPU | 204? | 204 | 55 | 3.72? | 3.72 |
| Intel Core i7-3770 | X86 CPU | 225 | 112 | ? | ? | ? |
| AMD A10-5800K | X86 CPU | 121 | 60? | ? | ? | ? |

The list contains recent and generally available processors, but I will add any processor you want to see in the list; just request it in a comment.
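The GFLOPS/Watt column is simply the theoretical peak divided by the TDP. The sketch below recomputes it for the rows of the list that carry no question marks in those two columns. The row names and helper functions are my own choices for illustration, and the results are as theoretical as the inputs.

```python
# (name, peak 32-bit GFLOPS, TDP in Watt), copied from the list above.
# Only rows without question marks in these two columns are included.
ROWS = [
    ("Epiphany-IV", 100.0, 2.0),
    ("Movidius Myriad", 15.28, 0.32),
    ("Tesla K10", 4577.0, 225.0),
    ("NVidia GTX 690", 5621.0, 300.0),
    ("GeForce GTX 680", 3090.0, 195.0),
]


def gflops_per_watt(gflops, tdp_watt):
    """Theoretical efficiency, assuming the full TDP is drawn at peak performance."""
    return gflops / tdp_watt


def ranked(rows):
    """Return (name, GFLOPS/Watt) pairs, most efficient first."""
    scored = [(name, gflops_per_watt(g, w)) for name, g, w in rows]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in ranked(ROWS):
        print(f"{name:20s} {score:6.2f} GFLOPS/Watt")
```

Swapping the TDP for a measured power draw under your own workload gives a more honest number than the theoretical one.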

Please also point me to sources where official data can be found on these processors, as it seems to be top-secret data. As not all the data was available, I had to make some guesses.

Below you find a graph of the list, including architectures grouped by GFLOPS + GFLOPS/Watt.

GFLOPS/Watt for 32-bit. Red: CPUs, orange: APUs, yellow: GPUs, light-blue: ARM, green: grid-processors, not circled: Phi. The upper-right area is where we need to go.

Below is a perhaps more interesting view: Watt/GFLOPS. This projection has the advantage that low-power processors (< 2 Watt) don’t get overrated and are closer together.

Watt/GFLOPS (lower is better) vs GFLOPS, excluding the CPUs. You see the Radeons doing best when it comes to performance and Watt/GFLOPS. The upper-left area is where we need to go.
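Watt/GFLOPS is just the reciprocal of GFLOPS/Watt. As a rough sketch (the function name is mine, the numbers are the theoretical values from the list), the snippet below shows how a large absolute gap on the GFLOPS/Watt axis shrinks to a small one on the Watt/GFLOPS axis.

```python
def watt_per_gflops(gflops, tdp_watt):
    """Reciprocal of GFLOPS/Watt: energy budget per unit of performance, lower is better."""
    return tdp_watt / gflops


# (peak 32-bit GFLOPS, TDP in Watt) from the list
epiphany_iv = (100.0, 2.0)
tesla_k10 = (4577.0, 225.0)

# Absolute gap between the two processors in each projection
gap_gflops_per_watt = epiphany_iv[0] / epiphany_iv[1] - tesla_k10[0] / tesla_k10[1]
gap_watt_per_gflops = watt_per_gflops(*tesla_k10) - watt_per_gflops(*epiphany_iv)

if __name__ == "__main__":
    print(round(gap_gflops_per_watt, 2))   # gap on the GFLOPS/Watt axis
    print(round(gap_watt_per_gflops, 4))   # gap on the Watt/GFLOPS axis
```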

CPU vs GPU

Let’s be clear:

1. A GPU needs a CPU as a host.

2. A GPU is great at vector computations, a CPU is much better at scalar computations.

In other words, a mix of a scalar and a vector processor is best. But once a problem can be defined as a vector problem, the GPU is much, much faster than a CPU.

64 bit vs 32 bit

64-bit values take twice as much memory as 32-bit values. Moving memory costs energy, and only half the number of values shows up at the processor per transfer, so we have two reasons why more energy is consumed. Due to architecture differences, CPUs have a penalty for 32 bit and GPUs a penalty for 64 bit.

Notice that most X86-alternatives have no 64 bit support, or just recently started with it. GPUs crunch double precision numbers at a fourth or less of the 32-bit performance-roof.
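The "a fourth or less" claim can be read straight off the list by dividing the 64-bit peak by the 32-bit peak. The sketch below does that for the rows where both numbers are given without question marks. The dictionary layout and function name are my own, and the values are the theoretical peaks from the list, not measurements.

```python
# name -> (peak 32-bit GFLOPS, peak 64-bit GFLOPS), from the list
GPUS = {
    "Tesla K10": (4577.0, 190.0),
    "GeForce GTX 680": (3090.0, 128.0),
    "AMD HD 7970 GHz": (4300.0, 1075.0),
}
CPUS = {
    "Intel Core i7-3770": (225.0, 112.0),
}


def double_to_single_ratio(gflops_32, gflops_64):
    """Fraction of the 32-bit peak that is left when computing in 64 bit."""
    return gflops_64 / gflops_32


if __name__ == "__main__":
    for name, (sp, dp) in {**GPUS, **CPUS}.items():
        print(f"{name:20s} {double_to_single_ratio(sp, dp):.3f}")
```

For these rows the GPUs stay at or below one fourth, while the CPU keeps roughly half of its 32-bit peak.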

Architectures

ARM, X86/X87, Power and Epiphany all make different architecture choices to reach their targeted trade-off between precision, power consumption and performance optimisation. These choices sometimes make it impossible to keep pace with other architectures in a certain direction.

Current winner: Adapteva Epiphany

Their 64-core Epiphany-IV is programmable with OpenCL, and the 50 GFLOPS/Watt makes it worth investing time in porting software if you need a portable device. People who have already ported their software to OpenCL have an advantage here. Adapteva even claims 72 GFLOPS/Watt, as you can read here. With a 100-core CPU coming up, they will probably raise the bar even further.

X86 CPUs have the advantages of precision and legacy code, of which precision is the bigger. As X86 GPUs (with Nvidia on top) have great performance per Watt and are entering the 20+ GFLOPS/Watt area, this could be very interesting for defending the X86 market against ARM.

ARM processors have a lot of software written for them (via Android) and are very flexible in design, while keeping power usage for the CPU part around 1 Watt. For instance, ZiiLabs’ processor can be compared to the design of Adapteva, but with an ARM CPU attached to it.

Conclusion

There is much more to it than just the number of GFLOPS/Watt, and one can only speculate on which architecture will be mainstream in a few years. Luckily, recompiling for other architectures is getting easier with compiler technologies such as LLVM, so we don’t need to worry too much. Except about redesigning our software for multi-core, of course. You have read above that the new architectures are programmed with OpenCL. It is better to invest in this technology now than later.

More reading

As memory-access takes energy, minimising memory-calls can lower consumption. This article on the ARM blog explains how this is done with MALI GPUs.

The Mont Blanc project is a supercomputer based on ARM. This 12 page PDF shows some numbers and specifications of this supercomputer.

As supercomputers eat lots of power, the Green 500 tries to stimulate the building of greener HPC.
