Amdahl's law and parallelization of the FMLSQ program on the Intel Nehalem architecture
Abstract
This paper describes the parallelization of the FMLSQ program, which performs full-matrix least-squares refinement of large macromolecular structures. Detailed elapsed-time profiling of FMLSQ and analysis of its execution on two different Intel architectures guided the parallelization of all stages of the algorithm, which resulted in a dramatic speedup. Amdahl's law proved to be very useful during this analysis. The results show that processor memory bandwidth may be more important than raw processing power for parallel crystallographic calculations. The new parallelized version of the program has been tested on several protein structures at high resolution. Requirements for a computing architecture intended for full-matrix refinement are discussed in detail.
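For reference, Amdahl's law in its standard textbook form can be written as follows. The symbols are assumptions introduced here for illustration and are not taken from the paper: $p$ is the fraction of single-processor run time that can be parallelized, $N$ is the number of processors, and $S(N)$ is the resulting speedup. The numerical values in the last line are purely illustrative and are not measurements of FMLSQ.

```latex
% Amdahl's law (standard form; p, N, S are illustrative symbols)
\[
  S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
  \qquad
  \lim_{N \to \infty} S(N) = \frac{1}{1 - p}.
\]
% Illustrative example only: p = 0.95, N = 8
\[
  S(8) = \frac{1}{0.05 + 0.95/8} = \frac{1}{0.16875} \approx 5.9,
  \qquad
  \lim_{N \to \infty} S(N) = \frac{1}{0.05} = 20.
\]
```

In this form, the serial fraction $1 - p$ bounds the achievable speedup no matter how many processors are added, which is why profiling every stage of a program matters before parallelizing it.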