Using Link-Time Optimization to Improve the Performance of MPI Programs
Gregory Andrews,
Saumya Debray,
Benjamin Schwarz,
Matthew Legendre
Department of Computer Science
University of Arizona
Tucson, AZ 85721, U.S.A.
{greg, debray, bschwarz, legendre}@cs.arizona.edu
Abstract
This paper describes PLTO ("Pluto"), a link-time optimizer for Intel
Pentium processors.
PLTO automatically improves the performance of parallel
scientific programs that use the MPI library.
Input to PLTO is a binary program consisting of object code
for the application and for the MPI library.
PLTO produces a functionally equivalent binary program
that executes faster, even when the object code for both the application
and the library has already been highly optimized.
The main techniques that PLTO employs are specializing the
code in MPI library routines to account for exactly how they
are used by an application, inlining the code to avoid function
call overhead and to eliminate redundant loads and stores,
and laying out the code to improve instruction cache performance.
These techniques are general: they can be used for any application
and any communication library.
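
To make the specialization and inlining steps concrete, the following C sketch shows the kind of transformation involved, written at source level for readability. It is an illustration only and is not taken from PLTO, which operates on binaries; the names generic_pack, datatype_t, app_send_original, and app_send_specialized are hypothetical and do not come from MPI. The sketch assumes the optimizer can establish that the call site always passes valid pointers, a count of 4, and a double datatype.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical datatype tags standing in for a library's type handles. */
typedef enum { DT_BYTE, DT_INT, DT_DOUBLE } datatype_t;

/* Generic helper: the library must be prepared for every datatype. */
static size_t datatype_size(datatype_t dt) {
    switch (dt) {
    case DT_BYTE:   return 1;
    case DT_INT:    return sizeof(int);
    case DT_DOUBLE: return sizeof(double);
    }
    return 0; /* unknown datatype */
}

/* Generic library routine: validates arguments and dispatches on type. */
int generic_pack(void *dst, const void *src, int count, datatype_t dt) {
    if (dst == NULL || src == NULL || count < 0) return -1;
    size_t sz = datatype_size(dt);
    if (sz == 0) return -1;
    memcpy(dst, src, (size_t)count * sz);
    return 0;
}

/* Call site as the application wrote it: count and datatype are constants. */
int app_send_original(double *out, const double *in) {
    return generic_pack(out, in, 4, DT_DOUBLE);
}

/* What the call could reduce to after inlining generic_pack and folding
 * the constant arguments: no call, no dispatch, no argument checks.
 * Assumption: the pointers are known to be non-NULL at this call site. */
int app_send_specialized(double *out, const double *in) {
    assert(out != NULL && in != NULL); /* precondition the sketch relies on */
    memcpy(out, in, 4 * sizeof(double));
    return 0;
}
```

For valid inputs, both versions of the call site leave the same bytes in the output buffer; the specialized one simply skips work whose outcome is already known for this particular use of the library routine.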