Volume 32, Issue 3 e5158
SPECIAL ISSUE PAPER

Tail queues: A multi-threaded matching architecture

Matthew G.F. Dosanjh (Corresponding Author)

Center for Computing Research, Sandia National Laboratories, Albuquerque, New Mexico

Department of Computer Science, University of New Mexico, Albuquerque, New Mexico

Correspondence: Matthew G.F. Dosanjh, Center for Computing Research, Sandia National Laboratories, Albuquerque, NM; or Department of Computer Science, University of New Mexico, Albuquerque, NM.

Ryan E. Grant

Center for Computing Research, Sandia National Laboratories, Albuquerque, New Mexico

Department of Computer Science, University of New Mexico, Albuquerque, New Mexico

Whit Schonbein

Center for Computing Research, Sandia National Laboratories, Albuquerque, New Mexico

Department of Computer Science, University of New Mexico, Albuquerque, New Mexico

Patrick G. Bridges

Department of Computer Science, University of New Mexico, Albuquerque, New Mexico

First published: 06 February 2019
Sandia National Laboratories is a multimission laboratory managed and operated by the National Technology and Engineering Solutions of Sandia LLC, a wholly owned subsidiary of Honeywell International Inc. for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

Summary

As we approach exascale, computational parallelism will have to increase drastically to meet throughput targets. Many-core architectures have exacerbated this problem by trading clock speed, core complexity, and per-core computational throughput for greater parallelism. This presents two major challenges for communication libraries such as MPI: the library must exploit the performance advantages of thread-level parallelism, and it must avoid the scalability problems associated with increasing the number of processes to that scale. Hybrid programming models, such as MPI+X, have been proposed to address these challenges. MPI_THREAD_MULTIPLE is MPI's thread-safe mode. Although there has been work to optimize it, it still performs poorly in most implementations. Current applications avoid MPI multithreading because of these performance concerns, but future applications are expected to use it. One of the major data structures in MPI that requires synchronization is the matching engine. In this paper, we present a parallel matching algorithm that can improve MPI matching for multithreaded applications. We then perform a feasibility study to demonstrate the performance benefit of the technique.
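To make the role of the matching engine concrete, here is a minimal Python sketch of a conventional, single-lock matching engine. It is a baseline for illustration only, not the tail-queue algorithm proposed in this paper. It assumes the standard MPI matching rules: a receive matches a message when the communicators are equal and the source and tag are either equal or wildcards, and candidates are considered in order (earliest posted receive, or earliest arrived unexpected message, wins). The names `ANY_SOURCE`, `ANY_TAG`, `MatchingEngine`, `post_recv`, and `arrive` are hypothetical; real MPI exposes the wildcards as `MPI_ANY_SOURCE` and `MPI_ANY_TAG`.

```python
import threading

# Hypothetical stand-ins for MPI_ANY_SOURCE / MPI_ANY_TAG; real MPI
# implementations define their own values for these wildcards.
ANY_SOURCE = -1
ANY_TAG = -1


def envelope_matches(recv, msg):
    """Return True if a posted receive (src, tag, comm) accepts a message envelope."""
    r_src, r_tag, r_comm = recv
    m_src, m_tag, m_comm = msg
    return (r_comm == m_comm
            and r_src in (ANY_SOURCE, m_src)
            and r_tag in (ANY_TAG, m_tag))


class MatchingEngine:
    """Baseline single-lock matching engine (hypothetical; not the paper's design)."""

    def __init__(self):
        self._lock = threading.Lock()  # one lock serializes all threads
        self._posted = []              # posted receives, in posting order
        self._unexpected = []          # unmatched messages, in arrival order

    def post_recv(self, src, tag, comm, handle):
        """Post a receive; return the earliest matching unexpected message, else queue it."""
        recv = (src, tag, comm)
        with self._lock:
            for i, msg in enumerate(self._unexpected):
                if envelope_matches(recv, msg):
                    return self._unexpected.pop(i)
            self._posted.append((recv, handle))
            return None

    def arrive(self, src, tag, comm):
        """Deliver a message; return the handle of the earliest matching receive, else queue it."""
        msg = (src, tag, comm)
        with self._lock:
            for i, (recv, handle) in enumerate(self._posted):
                if envelope_matches(recv, msg):
                    del self._posted[i]
                    return handle
            self._unexpected.append(msg)
            return None


if __name__ == "__main__":
    engine = MatchingEngine()
    engine.post_recv(ANY_SOURCE, 7, 0, "r1")
    engine.post_recv(3, 7, 0, "r2")
    print(engine.arrive(3, 7, 0))                 # r1: earliest posted match wins
    print(engine.arrive(3, 7, 0))                 # r2
    print(engine.arrive(5, 9, 0))                 # None: stored as unexpected
    print(engine.post_recv(5, ANY_TAG, 0, "r3"))  # (5, 9, 0)
```

In this baseline every post and every arrival takes the same lock and walks a queue linearly, so threads that call into the library concurrently serialize on this one structure. That contention is the kind of cost a multithreaded matching design is meant to relieve.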
