Javelin++: scalability issues in global computing
Abstract
Javelin is a Java-based infrastructure for global computing. This paper presents Javelin++, an extension of Javelin intended to support a much larger set of computational hosts. The paper focuses on contributions to scalability and fault tolerance. Two scheduling schemes are presented: probabilistic work stealing and deterministic work stealing. The distributed deterministic work stealing is integrated with a distributed deterministic eager scheduler, which is one of the paper's primary original contributions. An additional fault tolerance mechanism is implemented for replacing hosts that have failed or retreated. A Javelin++ API is sketched, then illustrated on a raytracing application. Performance results for the two schedulers are reported, indicating that Javelin++, with its broker network, scales better than the original Javelin. Copyright © 2000 John Wiley & Sons, Ltd.
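To make the idea of probabilistic work stealing concrete, the following is a minimal single-process sketch of the general technique, not the Javelin++ implementation or its API. All names (`WorkStealingSketch`, `Host`, `stealOnce`, `runAll`) are hypothetical, and the model assumes each host keeps a local deque of task ids, works from the front of its own deque, and, when idle, picks one other host uniformly at random and tries to take a task from the back of that host's deque.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Random;

// Illustrative model only; names and structure are assumptions, not Javelin++ code.
public class WorkStealingSketch {

    /** A host with a local double-ended queue of task ids (hypothetical model). */
    static final class Host {
        final Deque<Integer> tasks = new ArrayDeque<>();
    }

    /**
     * The idle thief picks one other host uniformly at random and, if that
     * victim has work, takes a task from the tail of the victim's deque.
     * Returns the stolen task id, or null if the chosen victim was empty.
     */
    static Integer stealOnce(List<Host> hosts, int thief, Random rng) {
        if (hosts.size() < 2) {
            return null; // nobody to steal from
        }
        int victim = rng.nextInt(hosts.size() - 1);
        if (victim >= thief) {
            victim++; // shift the index so the thief never selects itself
        }
        return hosts.get(victim).tasks.pollLast();
    }

    /**
     * Simulates rounds in which every host either runs one of its own tasks
     * or attempts a single random steal. Returns the number of tasks executed.
     */
    static int runAll(List<Host> hosts, Random rng) {
        int remaining = 0;
        for (Host h : hosts) {
            remaining += h.tasks.size();
        }
        int executed = 0;
        while (remaining > 0) {
            for (int i = 0; i < hosts.size(); i++) {
                Integer task = hosts.get(i).tasks.pollFirst(); // own work first
                if (task == null) {
                    task = stealOnce(hosts, i, rng); // idle: try a random victim
                }
                if (task != null) {
                    executed++; // "executing" a task is just counting it here
                    remaining--;
                }
            }
        }
        return executed;
    }
}
```

In this sketch a failed steal simply leaves the host idle until the next round. A deterministic variant would replace the random victim choice in `stealOnce` with a fixed order of hosts to ask; how Javelin++ defines that order is not stated in this abstract.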