Today, with the release of the latest TOP500 supercomputing list, we see the largest and most powerful high performance computers in the world. And while the list is impressive, it is quickly becoming an antiquated measuring stick for several reasons, including how workload and power efficiency are measured.
From left to right: HPC scientists Costas Bekas, IBM; Pavel Klavik, IBM and Charles University in Prague; Yves Ineichen, IBM; and Cristiano Malossi, IBM
Since 1993 the TOP500 list has charted an exponential increase in performance driven by Moore's Law, but in the last 10 years the core technology has changed dramatically, with chip manufacturers turning to a recipe of multi-core chips and parallelism to keep the effect of Moore's Law alive.
This shift has raised a new set of critical questions about how to program such machines and how to deal with the very fast increase in power requirements.
As the industry moves toward more data-centric computing workloads, IBM scientists believe the answers to both questions are closely connected, and they have recently published a paper on the topic in the Proceedings of The Royal Society A. Some of the paper's authors answered a few questions about their motivation.
Your paper mentions the shortcomings of the FLOPs performance metric. What are they?
Costas Bekas (@CostasBekas): The main shortcoming of the FLOPs metric is that it concentrates on doing as many computations as possible on a given piece of hardware per unit of time. It is not about solving the problem quickly, but essentially about measuring how many FLOPs we can squeeze out of the machine. The metric stems from the days when computing was very expensive, so the thinking was that you had better use every last transistor.
But today, with the cloud and massive high performance computing (HPC) machines, the equation has flipped and energy is a significant cost factor. The real issues now are the results: do you get them fast, and how accurate are they?
So the question clients need to ask is: do you care that you run a machine at its full "nominal" capacity, or that you get the results back as fast as possible and at the lowest possible cost? The answer lies in hardware improvements, such as water-cooled designs and photonics, combined with advanced algorithms.
What should a new benchmark look like for measuring high performance computers?
Cristiano Malossi: The biggest challenge will be to find a metric which is accepted by everyone. From what I hear at various conferences, the industry agrees that the LINPACK benchmark is no longer sufficient. The GRAPH500 is a good start, but we need to take it a step further. I see the need for more than one metric, in fact 13, one for each kernel, which are then weighted, scored and averaged. This is the bottom-up approach popularized by Phil Colella's (or the Berkeley) dwarfs.
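The interview leaves the aggregation formula open. As one illustrative sketch, per-kernel scores could be combined with a weighted geometric mean, the approach used by benchmark suites such as SPEC, so that no single kernel dominates the composite. The function name, scores and weights below are hypothetical, not taken from the paper.

```python
import math

def composite_score(scores, weights):
    """Combine per-kernel benchmark scores into one number using a
    weighted geometric mean. Doubling every score doubles the
    composite, and an extreme result on one kernel cannot dominate."""
    total_w = sum(weights)
    log_sum = sum(w * math.log(s) for s, w in zip(scores, weights))
    return math.exp(log_sum / total_w)

# Hypothetical scores for 3 of the 13 "dwarf" kernels
# (e.g. dense linear algebra, sparse linear algebra, graph traversal),
# with graph traversal weighted twice as heavily.
print(composite_score([120.0, 80.0, 95.0], [1.0, 1.0, 2.0]))
```

A weighted arithmetic mean would work too; the geometric mean is the common choice when the individual scores are ratios against a reference machine.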
Costas: The big drivers are the HPC applications. The nature of applications has changed, and there are now hundreds of them. In the past the focus was numerical linear algebra, but today applications span bioinformatics, finance, traditional engineering, and data analytics. A new benchmark needs to account for both speed to result and energy efficiency to result.
Besides providing a ranking tool for marketing purposes, do benchmarks also help the development of HPC?
Costas: Absolutely, the rankings give scientists and engineers a guide for designing future systems. For instance, here in Zurich we worked at a very early stage with the prototype of the IBM BlueGene/Q. The goal was to incorporate feedback from application developers and algorithm specialists into the design loop as early as possible. So a benchmark is more than just touting first place; it helps close the circle between developers and users.
We shouldn’t forget that benchmarks are also important for politicians and financial officials. These machines require budget, after all, and when using taxpayer funding you need to demonstrate results; coming in at the top of a benchmark is one way to do this. But a better way to demonstrate the ROI of a system is to solve a problem quickly and efficiently, whether that is designing a new drug, simulating a city to make it run more efficiently or engineering a safer product. Ask the average citizen whether they prefer a blue ribbon for first place or a city with less traffic, and the answer is obvious.
Your paper proposes a new benchmark based on application, time to result and energy efficiency. How can HPC users begin testing your benchmark on their systems?
Costas: This publication and our previous paper are the starting points. Together with my colleague, IBM Fellow Alessandro Curioni, we have been evangelizing for several years the need to move away from using the LINPACK benchmark alone. This paper provides a framework and tools for accurate, on-chip and online power measurements in the context of energy-aware performance. The tools described make it easy to instrument user code for detailed power profiling. Beyond the benchmark, we also describe a patented framework that enables users to combine a bulk of relaxed-accuracy, non-reliable calculations with a small fraction of accurate (reliable) calculations in order to achieve full final accuracy. The result is up to two orders of magnitude reduction in energy to solution. The key is to change the computing paradigm.
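One classic, minimal instance of this cheap-bulk-plus-accurate-correction idea is mixed-precision iterative refinement for linear systems: do the expensive solves in float32 and spend only a small amount of float64 work on residual corrections. The sketch below (the function name `mixed_precision_solve` is mine, not the paper's) illustrates the principle; the patented framework described in the interview is more general and also tolerates unreliable hardware.

```python
import numpy as np

def mixed_precision_solve(A, b, iters=20, tol=1e-12):
    """Solve Ax = b with the bulk of the arithmetic in float32 and
    only the residual corrections in float64 (iterative refinement)."""
    A32, b32 = A.astype(np.float32), b.astype(np.float32)
    x = np.linalg.solve(A32, b32).astype(np.float64)    # cheap initial solve
    for _ in range(iters):
        r = b - A @ x                                   # accurate float64 residual
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        d = np.linalg.solve(A32, r.astype(np.float32))  # cheap correction step
        x += d.astype(np.float64)
    return x
```

For a well-conditioned matrix, a handful of cheap corrections recovers full float64 accuracy, so most of the energy is spent in the low-precision (here, low-accuracy) computation.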
The key takeaway is the following: within 10 years we will have HPCs running billions of threads. For comparison, our world record work last year, which led to the Gordon Bell Prize (together with Alessandro and colleagues from ETH Zurich, the Technical University of Munich and Lawrence Livermore National Laboratory), used 6.4 million threads. At the scale of a billion threads you need to develop algorithms that can recognize failure, resolve failure, and also live with it while still delivering a reliable result.
Where does academia fit into this new benchmark? Clearly we need to teach this to the next generation of HPC users.
Costas: Indeed, and we are seeing signs of progress. For example, Pavel Klavik, one of the authors of this paper, was an intern at our lab from Charles University in Prague. He joined us after winning our annual IBM Great Minds Competition, and he clearly understands where this trend is going.
Cristiano: These concepts are starting to be integrated into academic curricula. In fact, we are participating in an EU project with several universities called Exa2Green, whose goal is energy-aware sustainable computing for future exascale systems. This will help these ideas trickle down to the classroom.
Yves Ineichen: I think we are in a transition period as the topic begins to reach the lecture halls. Part of the challenge is that many universities don’t have access to HPC systems at the level required to actually run these algorithms and tests, but with collaborations this will change. First, though, we need a consensus from the HPC industry.
Labels: bluegene, HPC, IBM Research - Zurich, supercomputer