AutoTune Project

Project context and goals: High Performance Computing is a key enabling factor for research and development. It is based on parallel platforms ranging from high-end workstations and servers to large-scale supercomputers. These systems leverage multi- and manycore processors to reach high execution rates. Heterogeneous systems are getting more common due to advancements in accelerator technology and dynamic variation in resource capabilities, e.g., Dynamic Voltage and Frequency Scaling (DVFS).

The heterogeneous nature of current and future HPC systems mandates the use of a variety of programming paradigms, such as MPI, OpenMP, PGAS, CUDA, and OpenCL, which are often combined within a single program. Higher-level approaches have recently become available that facilitate programming of accelerators via directive-based automatic code generation, e.g., HMPP and OpenACC.

Due to the increasing complexity of parallel architectures for HPC, it is extremely difficult to develop programs that exploit the full capability of the hardware. Application developers have to go through a time-consuming tuning process after the program has been written and debugged. The whole development process is therefore slow and cumbersome, which reveals a huge productivity gap.

Program tuning covers many different aspects, e.g., core pipeline utilization, cache optimization, data distribution, idle time reduction in message passing, load balancing, and compiler flag selection. In addition to tuning applications for performance, reducing energy consumption is becoming increasingly important in the context of rising energy prices and the move towards exascale systems. Moreover, since many tuning actions depend on the input data, they need to be verified for different data sets, which requires a large number of experiments.

The goal of the AutoTune project, which started in October 2011, is to develop an extensible tuning environment that automates the tuning process of applications. The framework is called the Periscope Tuning Framework (PTF) and will focus on static tuning, i.e., it will identify tuning recommendations in special application tuning runs. These tuning recommendations can then be applied to optimize the code for later production runs.

Approach

The project builds on the results of many research efforts in performance analysis and automatic application tuning. PTF is unique in providing an extensible framework that combines state-of-the-art analysis and tuning techniques and integrates performance tuning with energy optimization.

PTF includes automated performance analysis strategies for various parallel programming models. The strategies are based on formalized performance properties that specify typical performance bottlenecks, the metrics required for their detection, and the severity of the bottlenecks.

At the center of PTF are so-called tuning plugins that focus on individual tuning aspects. A plugin explores a tuning space, i.e., the cross product of the tuning parameters relevant for the tuning aspect. Since the tuning space is typically quite large, plugins can run performance analysis strategies first and use codified expert knowledge to shrink the tuning space based on the resulting performance properties. The remaining space will be searched by predefined or plugin-specific search strategies. Plugins will be loaded dynamically, which allows them to be developed either as open source modules or as proprietary plugins. The implementation of PTF is based on Periscope, a distributed online performance analysis tool that is under development at Technische Universität München (TUM).
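The idea of a tuning space as a cross product, and of shrinking it with expert knowledge, can be sketched in a few lines of Python. This is only an illustration: the parameter names, their values, the `memory_bound` property, and the pruning rule are all hypothetical and are not taken from PTF or its plugins.

```python
from itertools import product

# Hypothetical tuning parameters; names and values are assumptions for illustration.
tuning_parameters = {
    "opt_level": ["-O1", "-O2", "-O3"],
    "unroll": [1, 2, 4, 8],
    "threads": [4, 8, 16],
}


def tuning_space(parameters):
    """Return every scenario in the cross product of the parameter value lists."""
    names = list(parameters)
    return [dict(zip(names, values)) for values in product(*parameters.values())]


def shrink(parameters, properties):
    """Apply a made-up expert rule: for a memory-bound code, keep only
    small unroll factors and leave all other parameters untouched."""
    reduced = dict(parameters)
    if properties.get("memory_bound"):
        reduced["unroll"] = [u for u in parameters["unroll"] if u <= 2]
    return reduced


full = tuning_space(tuning_parameters)
reduced = tuning_space(shrink(tuning_parameters, {"memory_bound": True}))
print(len(full), len(reduced))  # 36 18
```

In this toy case a single rule halves the number of scenarios that have to be evaluated experimentally, which is the motivation for running an analysis step before the search.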

Figure 1 outlines PTF’s architecture. The PTF frontend executes the tuning plugins and controls the overall tuning process. It starts the parallel application and a hierarchy of agent processes. The number of agents depends on the size of the application, so PTF can tune applications on large-scale HPC systems. The leaf agents communicate at runtime with the monitor, which is linked to the application processes. They can request the measurement of runtime metrics, control the execution of the application, and request the execution of tuning actions that modify certain application parameters at runtime. The framework also provides an interactive user interface based on Eclipse. It allows users to inspect performance properties and tuning actions by linking them to the application’s source code.

The user interface is currently being extended in another project, LMAC, to provide tuning workflows that support the management of experiments with different input data sets.

PTF applies an online tuning approach that can consist of many performance analysis steps and tuning experiments during a single execution of the application. This is realized by exploiting the iterative nature of many HPC applications. The whole execution repeats the same algorithm many times, for example by simulating discrete time steps. During each time step, called a Phase in PTF, a certain analysis step or an evaluation experiment can be executed. If the phase structure is unknown to PTF or the application terminates before the tuning is finished, PTF will automatically restart the application.
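The phase-based scheme described above can be illustrated with a small simulation. It is a sketch under stated assumptions, not PTF code: the function names `run_application` and `tune`, the thread-count scenarios, and the cost model are all hypothetical. Each simulated phase evaluates one scenario, and the application is restarted whenever it ends before all scenarios have been measured.

```python
def run_application(pending, num_phases, phase_time):
    """Simulate one run: evaluate one pending scenario per phase.

    Returns the (scenario, time) measurements and the scenarios still pending.
    """
    pending = list(pending)
    measured = []
    for _ in range(num_phases):
        if not pending:
            break  # tuning finished before the application did
        scenario = pending.pop(0)
        measured.append((scenario, phase_time(scenario)))
    return measured, pending


def tune(scenarios, num_phases, phase_time):
    """Restart the simulated application until all scenarios are evaluated."""
    if num_phases < 1 or not scenarios:
        raise ValueError("need at least one phase and one scenario")
    pending, results, runs = list(scenarios), [], 0
    while pending:
        measured, pending = run_application(pending, num_phases, phase_time)
        results.extend(measured)
        runs += 1
    best = min(results, key=lambda r: r[1])
    return best, runs


# Hypothetical cost model: time per phase as a function of the thread count.
best, runs = tune([1, 2, 4, 8], num_phases=3, phase_time=lambda t: 10.0 / t + 0.5 * t)
print(best, runs)  # (4, 4.5) 2
```

With four scenarios and only three phases per run, the simulated application has to be started twice, which mirrors the automatic restart mentioned in the text.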

Figure 1: Periscope Tuning Framework (PTF)

 

Rich framework supporting the design and implementation of tuning plugins

PTF provides a rich framework facilitating the implementation of tuning plugins. The entire set of performance analysis strategies of Periscope is available to gather performance information. Furthermore, Periscope provides static program information, specifying the program’s region structure and static properties of those regions.

It provides predefined search algorithms that can be used in the plugins to generate scenarios that are experimentally evaluated. The execution of the experiments is fully automated and supports the parallel evaluation of scenarios.
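How interchangeable search algorithms might generate scenarios for a plugin can be sketched as follows. The class names, the `scenarios` method, the example parameters, and the synthetic `measure` function are assumptions made for this illustration; they do not reproduce PTF's actual classes, and scenarios are evaluated one after another here rather than in parallel.

```python
import random
from itertools import product


class ExhaustiveSearch:
    """Generate every scenario in the tuning space."""

    def scenarios(self, space):
        names = list(space)
        for values in product(*space.values()):
            yield dict(zip(names, values))


class RandomSearch:
    """Generate a fixed number of randomly sampled scenarios."""

    def __init__(self, samples, seed=0):
        self.samples = samples
        self.rng = random.Random(seed)

    def scenarios(self, space):
        for _ in range(self.samples):
            yield {name: self.rng.choice(values) for name, values in space.items()}


def evaluate(search, space, measure):
    """Measure each generated scenario and return the best (scenario, value)."""
    return min(((s, measure(s)) for s in search.scenarios(space)), key=lambda r: r[1])


# Hypothetical tuning space and synthetic measurement, for illustration only.
space = {"eager_limit": [1024, 4096, 16384], "threads": [2, 4, 8]}


def measure(s):
    return abs(s["eager_limit"] - 4096) / 1024 + abs(s["threads"] - 4)


print(evaluate(ExhaustiveSearch(), space, measure))
print(evaluate(RandomSearch(samples=5), space, measure))
```

Because both classes expose the same `scenarios` method, the evaluation loop does not need to know which strategy produced a scenario, so a plugin could swap an exhaustive search for a cheaper sampling strategy on a large space.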

Objective functions can be specified at a high level based on runtime metrics that are automatically measured by the underlying monitoring system. The monitoring system implements runtime tuning actions that can be used to set tuning parameters to certain tuning values for individual code regions. PTF supports variable and function runtime tuning actions, which either assign the tuning value to a program variable or execute a function with the tuning value as an argument.
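The difference between the two kinds of runtime tuning action, and an objective function built from measured metrics, can be sketched as follows. The class names, the `apply` method, the metric names, and the choice of the energy-delay product as objective are assumptions for this example and do not describe the actual PTF interfaces.

```python
class VariableTuningAction:
    """Assign the tuning value to a named program variable."""

    def __init__(self, variables, name):
        self.variables = variables
        self.name = name

    def apply(self, value):
        self.variables[self.name] = value


class FunctionTuningAction:
    """Call a function with the tuning value as its argument."""

    def __init__(self, function):
        self.function = function

    def apply(self, value):
        self.function(value)


def energy_delay_product(metrics):
    """Hypothetical objective combining two measured runtime metrics."""
    return metrics["energy_joules"] * metrics["time_seconds"]


# Stand-ins for application state and for a runtime call such as a thread-count setter.
program_variables = {"block_size": 64}
calls = []

VariableTuningAction(program_variables, "block_size").apply(128)
FunctionTuningAction(calls.append).apply(4)

print(program_variables["block_size"], calls)  # 128 [4]
print(energy_delay_product({"energy_joules": 200.0, "time_seconds": 2.0}))  # 400.0
```

A variable action suits parameters that the application reads from its own state, while a function action suits parameters that must be set through a call, for example into a runtime library.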

Within the AutoTune project, a number of tuning plugins will be developed. These plugins will make it possible to tune the selection of compiler flags and of MPI runtime parameters; to optimize code generation for HMPP codes and the execution of high-level parallel programming patterns for heterogeneous architectures; and to reduce the energy to solution of applications based on DVFS.

The development of the tuning plugins and their evaluation will be based on the AutoTune application repository, which includes standard HPC benchmarks and entire applications. The tuning techniques were applied manually at the beginning of the project, and the results achieved automatically will be compared with those manual results.

Expected impact

AutoTune will enhance the international recognition of all participating organizations and facilitate networking between the academic and industrial participants. The scientific impact will include, but not be limited to, extensions of European performance analysis tools towards automatic tuning. The involved universities will incorporate the knowledge gained into their curricula. Future engineers will thus be better trained, which is an important asset for European industry.

The industrial partner CAPS will exploit AutoTune results by enhancing its current and future tool offering with new code optimization techniques. These will help to provide users with more portable performance and more automatic adaptation to new hardware target architectures.

Large supercomputing centers such as the Leibniz Supercomputing Centre will exploit the energy-saving results, which lead directly to lower operating costs for their high-end systems. Because energy savings are of utmost importance not only for HPC but also for high-end servers, the project will have a significant societal impact. The reduction in energy consumption will also contribute to a greener world. In addition, the European economy will profit from the strengthening of companies that work on desktop accelerated systems as well as those that use such systems in their business.

List of major results achieved in the first project period

  • Development of the tuning model specifying the overall approach and the terminology
  • Design of the PTF architecture detailing the integration of automatic tuning into Periscope
  • Implementation of performance analysis support for GPGPUs and high-level parallel patterns in the PTF monitor
  • Implementation of energy monitoring for SuperMUC
  • Development of designs for the following tuning plugins:
    • Parallel Pattern Tuning Plugin
    • HMPP Codelet Tuning Plugin
    • CPU Frequency Tuning Plugin
    • Master-Worker MPI Plugin
    • MPI Runtime Plugin
    • Compiler Flag Selection Plugin
  • Implementation of the base classes for tuning plugins and search algorithms
  • Implementation of a demonstrator plugin based on user-level tuning points
  • Demonstration of the PTF approach based on the demonstrator plugin and HMPP codelet tuning
  • Realization of the AutoTune application repository
  • Manual tuning of applications from the application repository demonstrating the potential of the tuning plugins
  • Joint publication on AutoTune at PARA’12; poster presentations at GTC 2012 and the SC 2012 exhibition; workshop and conference presentations including VI-HPS tuning workshops

Acknowledgement. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n°248481.
