Another observation is that iterative methods, like e.g. multigrid, are characterized by successive sweeps over data sets, which - for representative problems in science and engineering - are much too large to fit in cache. Therefore, conservative implementations of such algorithms often reach disappointing execution speeds, which are far away from the theoretically available peak performances of the machines under consideration.
In this paper we present techniques in order to enhance the cache efficiency of multigrid methods for variable-coefficient problems on regular mesh structures. These techniques comprise both data layout optimizations and data access optimizations. Data layout optimizations are code transformation techniques which change the data arrangement in memory, e.g. array padding and the introduction of cache-aware data structures. In contrast, data access optimizations modify the order in which data are retrieved in the course of the computation. Loop blocking (tiling), for example, belongs to the class of data access optimizations.
A variety of performance results are provided, showing both profiling data and the speedups which can be achieved by applying these kinds of optimization techniques. It can be observed that in most cases speedup factors of 2-3 can be obtained.
For further details, we would like to refer the reader to the web page of our DiME project (Data-local iterative methods for the efficient solution of partial differential equations).