Volume 41, Issue 5 e13184
ORIGINAL ARTICLE

Classification of open source software bug report based on transfer learning

Liao Zhifang

School of Computer Science and Engineering, Central South University, Changsha, China

Wang Kun

School of Computer Science and Engineering, Central South University, Changsha, China

Zeng Qi

School of Computer Science and Engineering, Central South University, Changsha, China

Liu Shengzong

Corresponding Author

School of Information Technology and Management, Hunan University of Finance and Economics, Changsha, China

Correspondence

Liu Shengzong, School of Information Technology and Management, Hunan University of Finance and Economics, No. 139, Fenglin 2nd Road, Yuelu District, Changsha, China.

Email: [email protected]

He Jianbiao, School of Computer Science and Engineering, Central South University, No. 932, Lushan South Road, Yuelu District, Changsha, China.

Email: [email protected]

Zhang Yan

School of Engineering and Built Environment, Glasgow Caledonian University, Glasgow, UK

He Jianbiao

Corresponding Author

School of Computer Science and Engineering, Central South University, Changsha, China

First published: 21 November 2022
Citations: 4

Abstract

In deep learning based bug report classification models, the feature richness of the text encoding vectors is currently limited by the size of the domain dataset and the quality of the text, which makes it difficult to enrich those features further. In addition, most existing bug report classification methods ignore the submitter's personal information. To address these problems, we construct nine personal information characteristics of GitHub bug report submitters through a survey. We then propose a GitHub bug report classification method, the personal information fine-tuning network (PIFTNet), based on transfer learning and the submitter's personal information. PIFTNet transfers the general text feature vectors of Bidirectional Encoder Representations from Transformers (BERT) to the bug report classification domain by fine-tuning BERT's pre-trained parameters, and it combines the text characteristics with the characteristics of the submitter's personal information to construct the classification model. We also propose a two-stage training method to alleviate the catastrophic changes in the pre-trained parameters, and the resulting loss of initially learned knowledge, that direct training of PIFTNet causes. We evaluate PIFTNet on a dataset extracted from GitHub, and the empirical results demonstrate its effectiveness.
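
The abstract gives no implementation details, so the following PyTorch code is only a minimal sketch of the two ideas it names: fusing a text vector with the nine personal-information features, and training in two stages so that the pre-trained encoder is not disturbed at the start. Everything here is an assumption made for illustration: `StandInEncoder` is a hypothetical placeholder for a pre-trained BERT model, the head layout, the class count, and the learning rates are illustrative, and the exact stage split used by PIFTNet may differ.

```python
import torch
from torch import nn

NUM_PERSONAL_FEATURES = 9  # the abstract mentions nine submitter characteristics


class StandInEncoder(nn.Module):
    """Hypothetical placeholder for a pre-trained text encoder such as BERT.

    Maps token ids to one pooled vector per report (mean of token embeddings).
    """

    def __init__(self, vocab_size: int = 1000, hidden_size: int = 32):
        super().__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        pooled = self.embedding(token_ids).mean(dim=1)  # (batch, hidden)
        return torch.tanh(self.proj(pooled))


class FusionClassifier(nn.Module):
    """Concatenates the text vector with the personal-information features."""

    def __init__(self, encoder: nn.Module, hidden_size: int, num_classes: int = 2):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Sequential(
            nn.Linear(hidden_size + NUM_PERSONAL_FEATURES, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, token_ids: torch.Tensor, personal: torch.Tensor) -> torch.Tensor:
        text_vec = self.encoder(token_ids)
        return self.head(torch.cat([text_vec, personal], dim=1))


def make_optimizer(model: FusionClassifier, stage: int) -> torch.optim.Optimizer:
    """Stage 1 freezes the encoder and trains only the head.
    Stage 2 unfreezes the encoder and uses a smaller learning rate."""
    train_encoder = stage == 2
    for p in model.encoder.parameters():
        p.requires_grad = train_encoder
    lr = 2e-5 if train_encoder else 1e-3  # illustrative values, not from the paper
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=lr)


def train_step(model, optimizer, token_ids, personal, labels) -> float:
    """Runs one gradient update and returns the batch loss."""
    model.train()
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(token_ids, personal), labels)
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    encoder = StandInEncoder()
    model = FusionClassifier(encoder, encoder.hidden_size)
    token_ids = torch.randint(0, 1000, (4, 16))      # toy batch of 4 reports
    personal = torch.rand(4, NUM_PERSONAL_FEATURES)  # toy submitter features
    labels = torch.tensor([0, 1, 0, 1])
    for stage in (1, 2):
        optimizer = make_optimizer(model, stage)
        print(stage, train_step(model, optimizer, token_ids, personal, labels))
```

Training the randomly initialised head first, with the encoder frozen, keeps large early gradients away from the pre-trained weights. The second stage then adapts the whole network with a small learning rate.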

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available in GHTorrent at https://ghtorrent.org/. These data were derived from the following resources available in the public domain: GHTorrent, https://ghtorrent.org/downloads.html.
