Volume 37, Issue 10 pp. 7502-7525
RESEARCH ARTICLE

Measuring and sampling: A metric-guided subgraph learning framework for graph neural network

Jiyang Bai

Department of Computer Science, Florida State University, Tallahassee, Florida, USA

Yuxiang Ren

Corresponding Author

IFM Lab, Department of Computer Science, Florida State University, Tallahassee, Florida, USA

Correspondence Yuxiang Ren, Department of Computer Science, Florida State University, Tallahassee, FL 32306, USA.

Email: [email protected]

Jiawei Zhang

IFM Lab, Department of Computer Science, University of California, Davis, Davis, California, USA

First published: 28 April 2022
Citations: 2

Jiyang Bai and Yuxiang Ren should be considered joint first authors.

Abstract

Graph neural networks (GNNs) have shown convincing performance in learning powerful node representations that preserve both node attributes and graph structural information. However, many GNNs become less effective and less efficient when they are designed with a deeper network structure or applied to large graphs. Several sampling algorithms have been proposed to improve and accelerate the training of GNNs, yet they overlook the source of GNNs' performance gain. Measuring the information within graph data can help sampling algorithms keep high-value information while removing redundant information and even noise. In this paper, we propose a Metric-Guided (MeGuide) subgraph learning framework for GNNs. MeGuide employs two novel metrics, Feature Smoothness and Connection Failure Distance, to guide subgraph sampling and mini-batch-based training. Feature Smoothness is designed to analyze node features so that the most valuable information is retained, while Connection Failure Distance measures structural information to control the size of subgraphs. We demonstrate the effectiveness and efficiency of MeGuide in training various GNNs on multiple data sets.

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are openly available at https://github.com/tkipf/gcn/tree/master/gcn/data and https://github.com/GraphSAINT/GraphSAINT (reference numbers 14, 19, and 43).
