Volume 27, Issue 6 pp. 1575-1590
Special Issue Paper

Asymmetric communication models for resource-constrained hierarchical ethernet networks

Jun Zhu

Technical University of Eindhoven, Eindhoven, The Netherlands

Correspondence to: Jun Zhu, Technical University of Eindhoven, Eindhoven, The Netherlands.

Alexey Lastovetsky

University College Dublin, Dublin, Ireland

Shoukat Ali

Dublin Research Laboratory, IBM, Dublin, Ireland

Rolf Riesen

Dublin Research Laboratory, IBM, Dublin, Ireland

Khalid Hasanov

University College Dublin, Dublin, Ireland

First published: 30 July 2014
Citations: 2

Summary

Communication time prediction is critical for parallel application performance tuning, especially in the rapidly growing field of data-intensive applications. However, making such predictions accurately is non-trivial when contention exists on different components of a hierarchical network. In this article, we derive an ‘asymmetric network property’ at the transmission control protocol (TCP) layer for concurrent bidirectional communications in a commercial off-the-shelf (COTS) cluster and develop a communication model, the first effort to characterize communication times on hierarchical Ethernet networks with contention at both the network interface card and backbone cable levels. We develop a micro-benchmark for a set of simultaneous point-to-point message-passing interface (MPI) operations on a parametrized network topology and use it to validate our model extensively. The results show that the model effectively predicts the communication times of simultaneous MPI operations (both point-to-point and collective communications) on resource-constrained networks. We also show that if the asymmetric network property is excluded from the model, the communication time predictions are significantly less accurate than those made with it. In addition, we validate the model on a cluster of the Grid5000 infrastructure, which is a more loosely coupled platform. We therefore see potential for integrating this model into performance analysis of data-intensive parallel applications. Our observation of the performance degradation caused by the asymmetric network property suggests that some part of the software stack below the TCP layer in COTS clusters needs targeted tuning, an issue that has not yet attracted any attention in the literature. Copyright © 2014 John Wiley & Sons, Ltd.
