Volume 32, Issue 3 e4432
SPECIAL ISSUE PAPER

An efficient iterative graph data processing framework based on bulk synchronous parallel model

Chao Liu

Chao Liu

School of Computer Science, Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan, China

Search for more papers by this author
Deze Zeng

Corresponding Author

Deze Zeng

School of Computer Science, Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan, China

Deze Zeng, School of Computer Science, China University of Geosciences, No.388, Lumo Road, Wuhan 430074, China.

Email: [email protected]

Search for more papers by this author
Hong Yao

Hong Yao

School of Computer Science, Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan, China

Search for more papers by this author
Xuesong Yan

Xuesong Yan

School of Computer Science, Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan, China

Search for more papers by this author
Linchen Yu

Linchen Yu

School of Computer Science, Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan, China

Search for more papers by this author
Zhangjie Fu

Zhangjie Fu

School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, China

Search for more papers by this author
First published: 22 January 2018
Citations: 5

Summary

Graph data processing has been widely applied in a variety of domains such as industry, science, social network, and so on. It therefore has stimulated many efforts devoted to this area. To embrace the fast development trend of big graph data, graph data processing based on Pregel-like systems has been regarded as one of the most promising ways and has widely attracted the attention of researchers. However, it still remains in its early stage and there still exist many challenges. In Pregel, the superstep synchronization is time consuming as the graph data iteration operation requires multiple synchronizations. Furthermore, the graph data partition strategy adopted by Pregel fails to support load balancing, therefore causing the increase of network I/O overhead as the scale of graph data grows. To address these issues, this paper presents an efficient computational framework for graph data processing based on the bulk synchronous parallel model. The global synchronization control mechanism is improved by determining the start time of the next round of superstep through counting the number of global message files. Furthermore, an improved graph data partition mechanism based on a balanced hash method is proposed to reduce the communication overhead between different partitions of sub-graph computational tasks. We also re-design the PageRank algorithm to verify the effectiveness of the proposed framework. Experimental results on different real-world datasets verify the efficiency of our proposed framework as it outperforms Giraph (an open source Pregel-like system) by 58%−69%, and achieves 10×−17× performance improvement over Hadoop.

The full text of this article hosted at iucr.org is unavailable due to technical difficulties.