Journal of Software: Evolution and Process

SPECIAL ISSUE - TECHNOLOGY PAPER

Supporting maintenance and testing for AI functions of mobile apps based on user reviews: An empirical study on plant identification apps

Chuanqi Tao (Corresponding Author)

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China

Ministry Key Laboratory for Safety-Critical Software Development and Verification, Nanjing University of Aeronautics and Astronautics, Nanjing, China

Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University, Nanjing, China

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China

Correspondence: Chuanqi Tao, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China. Email: [email protected]

Hongjing Guo (ORCID: 0000-0001-5286-874X)

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China

Jingxuan Zhang

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China

Ministry Key Laboratory for Safety-Critical Software Development and Verification, Nanjing University of Aeronautics and Astronautics, Nanjing, China

Zhiqiu Huang

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China

Ministry Key Laboratory for Safety-Critical Software Development and Verification, Nanjing University of Aeronautics and Astronautics, Nanjing, China


First published: 27 February 2022

https://doi.org/10.1002/smr.2444


Abstract

Despite the rapid development of artificial intelligence (AI)-based mobile apps, these apps still suffer from quality issues, and their data-driven nature poses challenges for maintenance and quality assurance. Metamorphic testing has been successfully applied to AI software. However, most previous studies require testers to identify metamorphic relations manually, in an ad hoc and arbitrary manner, which makes it difficult for the relations to reflect real-world usage scenarios. Previous work has shown that the information available in user reviews is effective for maintenance and testing tasks, yet few studies have leveraged reviews to support the maintenance and testing of AI functions. This paper proposes METUR, a novel approach to supporting maintenance and testing for AI functions based on user reviews. First, METUR automatically classifies reviews that can be exploited to support the maintenance and evolution of AI functions. Then, it identifies test contexts from reviews in the usage scenario category. Next, METUR instantiates the metamorphic relation pattern with these test contexts to derive concrete metamorphic relations. Finally, a follow-up test dataset is constructed to conduct metamorphic testing. Empirical studies on plant identification apps indicate that METUR effectively categorizes reviews related to AI functions, and that it is feasible and effective in detecting inconsistent behaviors using metamorphic relations constructed from reviews.
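To make the last steps of this pipeline concrete, the following Python script is a minimal sketch on toy data. It is not METUR's implementation, and every name in it is a hypothetical stand-in. The keyword table `CONTEXT_KEYWORDS` approximates review classification and test-context extraction. `darken` and `box_blur` are assumed follow-up transformations for "low light" and "blur" contexts. `toy_model` is a deliberately brittle substitute for an app's plant identification function. The instantiated relation is assumed to be label-preserving: the follow-up image built for a context should receive the same label as its source image, and any mismatch is reported as an inconsistent behavior.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]  # toy grayscale image: pixel intensities in [0, 1]

# HYPOTHETICAL keyword table standing in for review classification and
# test-context extraction; METUR's actual technique is not reproduced here.
CONTEXT_KEYWORDS: Dict[str, List[str]] = {
    "low_light": ["dark", "night", "dim", "low light"],
    "blur": ["blurry", "blurred", "out of focus", "shaky"],
}


def extract_contexts(review: str) -> List[str]:
    """Return the test contexts whose keywords occur in a review (naive substring match)."""
    text = review.lower()
    return [ctx for ctx, words in CONTEXT_KEYWORDS.items()
            if any(word in text for word in words)]


def darken(img: Image, factor: float = 0.5) -> Image:
    """Hypothetical follow-up transformation for the low-light context."""
    return [[p * factor for p in row] for row in img]


def box_blur(img: Image) -> Image:
    """Hypothetical follow-up transformation for the blur context: 3x3 mean filter, edges clamped."""
    h, w = len(img), len(img[0])
    out: Image = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


# Instantiating a label-preserving MR pattern: each test context supplies
# the transformation that turns a source image into a follow-up image.
TRANSFORMS: Dict[str, Callable[[Image], Image]] = {
    "low_light": darken,
    "blur": box_blur,
}


def toy_model(img: Image) -> str:
    """Deliberately brittle stand-in classifier: labels an image by its mean brightness."""
    mean = sum(sum(row) for row in img) / (len(img) * len(img[0]))
    return "sunflower" if mean > 0.4 else "fern"


def run_metamorphic_tests(model: Callable[[Image], str],
                          images: List[Image],
                          review: str) -> List[Tuple[str, int]]:
    """Return (context, image index) pairs whose follow-up label differs from the source label."""
    violations: List[Tuple[str, int]] = []
    for ctx in extract_contexts(review):
        transform = TRANSFORMS[ctx]
        for idx, img in enumerate(images):
            if model(img) != model(transform(img)):
                violations.append((ctx, idx))
    return violations


if __name__ == "__main__":
    source_images = [[[0.6] * 4 for _ in range(4)], [[0.9] * 4 for _ in range(4)]]
    review = "Photos taken at night are always wrong"
    print(run_metamorphic_tests(toy_model, source_images, review))
```

Running the script prints `[('low_light', 0)]`. Darkening the first image pushes the toy model's mean brightness below its threshold, so the label flips from "sunflower" to "fern", which is the kind of inconsistency such a relation is meant to expose. In a realistic setting, the source images would be real plant photos and `model` would call the app under test.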

CONFLICT OF INTEREST

The authors declare no potential conflicts of interest.

Open Research

DATA AVAILABILITY STATEMENT

The data and code that support the findings of this study are openly available on GitHub at https://github.com/TestingAIGroup/METUR.⁵⁹

REFERENCES

1. Du X, Xie X, Li Y, Ma L, Zhao J, Liu Y. DeepCruiser: automated guided testing for stateful deep learning systems. CoRR. 2018; abs/1812.05339.
2. Gao J, Tao C, Jie D, Lu S. Invited paper: what is AI software testing? and why. In: 13th IEEE International Conference on Service-Oriented System Engineering, SOSE 2019, San Francisco, CA, USA, April 4-9, 2019; 2019: 27-2709.
3. Tian Y, Pei K, Jana S, Ray B. DeepTest: automated testing of deep-neural-network-driven autonomous cars. In: Proceedings of the 40th International Conference on Software Engineering, ICSE 2018, Gothenburg, Sweden, May 27 - June 03, 2018; 2018: 303-314.
4. Pei K, Cao Y, Yang J, Jana S. DeepXplore: automated whitebox testing of deep learning systems. In: Proceedings of the 26th Symposium on Operating Systems Principles, Shanghai, China, October 28-31, 2017; 2017: 1-18.
5. Amershi S, Begel A, Bird C, et al. Software engineering for machine learning: a case study. In: Proceedings of the 41st International Conference on Software Engineering: Software Engineering in Practice, ICSE (SEIP) 2019, Montreal, QC, Canada, May 25-31, 2019; 2019: 291-300.
6. Zhang JM, Harman M, Ma L, Liu Y. Machine learning testing: survey, landscapes and horizons. IEEE Transactions on Software Engineering. 2020: 1-1.
7. Wang S, Su Z. Metamorphic testing for object detection systems. CoRR. 2019; abs/1912.12162.
8. Chen TY, Kuo F-C, Liu H, et al. Metamorphic testing: a review of challenges and opportunities. ACM Comput. Surv. 2018; 51(1): 4:1-4:27.
9. Sun C-A, Fu A, Poon P-L, Xie X, Liu H, Chen TY. METRIC+: a metamorphic relation identification technique based on input plus output domains. IEEE Transactions on Software Engineering. 2019: 1-1. doi:10.1109/TSE.2019.2934848
10. Dwarakanath A, Ahuja M, Sikand S, Rao RM, Bose RPJC, Dubash N, Podder S. Identifying implementation bugs in machine learning based image classifiers using metamorphic testing. In: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2018, Amsterdam, The Netherlands, July 16-21, 2018. F Tip, E Bodden, eds. ACM; 2018: 118-128.
11. Zhou Z, Xiang S, Chen TY. Metamorphic testing for software quality assessment: a study of search engines. IEEE Trans. Software Eng. 2016; 42(3): 264-284. doi:10.1109/TSE.2015.2478001
12. Panichella S, Sorbo AD, Guzman E, Visaggio CA, Canfora G, Gall HC. How can I improve my app? Classifying user reviews for software maintenance and evolution. In: 2015 IEEE International Conference on Software Maintenance and Evolution, ICSME 2015, Bremen, Germany, September 29-October 1, 2015; 2015: 281-290.
13. Ciurumelea A, Schaufelbühl A, Panichella S, Gall HC. Analyzing reviews and code of mobile apps for better release planning. In: IEEE 24th International Conference on Software Analysis, Evolution and Reengineering, SANER 2017, Klagenfurt, Austria, February 20-24, 2017; 2017: 91-102.
14. Martin WJ, Sarro F, Jia Y, Zhang Y, Harman M. A survey of app store analysis for software engineering. IEEE Trans. Software Eng. 2017; 43(9): 817-847. doi:10.1109/TSE.2016.2630689
15. Grano G, Ciurumelea A, Panichella S, Palomba F, Gall HC. Exploring the integration of user feedback in automated testing of Android applications. In: 25th International Conference on Software Analysis, Evolution and Reengineering, SANER 2018, Campobasso, Italy, March 20-23, 2018; 2018: 72-83.
16. Wang S, Su Z. Metamorphic object insertion for testing object detection systems. In: 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020, Melbourne, Australia, September 21-25, 2020. IEEE; 2020: 1053-1065.
17. Chen TY, Cheung SC, Yiu S-M. Metamorphic testing: a new approach for generating next test cases. Technical Report HKUST-CS98-01, Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong; 1998.
18. Segura S, Fraser G, Sánchez AB, Cortés AR. A survey on metamorphic testing. IEEE Trans. Software Eng. 2016; 42(9): 805-824. doi:10.1109/TSE.2016.2532875
19. Ding J, Wu T, Lu JQ, Hu X-H. Self-checked metamorphic testing of an image processing program. In: Fourth International Conference on Secure Software Integration and Reliability Improvement, SSIRI 2010, Singapore, June 9-11, 2010. IEEE Computer Society; 2010: 190-197.
20. Zhou ZQ, Sun L, Chen TY, Towey D. Metamorphic relations for enhancing system understanding and use. IEEE Trans. Software Eng. 2020; 46(10): 1120-1154. doi:10.1109/TSE.2018.2876433
21. Wu C, Sun L, Zhou ZQ. The impact of a dot: case studies of a noise metamorphic relation pattern. In: Proceedings of the 4th International Workshop on Metamorphic Testing, MET@ICSE 2019, Montreal, QC, Canada, May 26, 2019. X Xie, P-L Poon, LL Pullum, eds. IEEE/ACM; 2019: 17-23.
22. Zhou Y, Su Y, Chen T, Huang Z, Gall HC, Panichella S. User review-based change file localization for mobile applications. IEEE Transactions on Software Engineering. 2020. To appear.
23. Gu X, Kim S. What parts of your apps are loved by users? (T). In: 30th IEEE/ACM International Conference on Automated Software Engineering, ASE 2015, Lincoln, NE, USA, November 9-13, 2015; 2015: 760-770.
24. Vu PM, Nguyen TT, Pham HV, Nguyen TT. Mining user opinions in mobile app reviews: a keyword-based approach (T). In: 30th IEEE/ACM International Conference on Automated Software Engineering, ASE 2015, Lincoln, NE, USA, November 9-13, 2015; 2015: 749-759.
25. Tao C, Guo H, Huang Z. Identifying security issues for mobile applications based on user review summarization. Inf. Softw. Technol. 2020; 122: 106290. doi:10.1016/j.infsof.2020.106290
26. Bird S. NLTK: the natural language toolkit. In: ACL 2006, 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, Sydney, Australia, 17-21 July 2006; 2006.
27. Palomba F, Salza P, Ciurumelea A, et al. Recommending and localizing change requests for mobile apps based on user reviews. In: Proceedings of the 39th International Conference on Software Engineering, ICSE 2017, Buenos Aires, Argentina, May 20-28, 2017; 2017: 106-117.
28. Khalid H, Shihab E, Nagappan M, Hassan AE. What do mobile app users complain about? IEEE Softw. 2015; 32(3): 70-77. doi:10.1109/MS.2014.50
29. Kowsari K, Meimandi KJ, Heidarysafa M, Mendu S, Barnes LE, Brown DE. Text classification algorithms: a survey. Information. 2019; 10(4): 150. doi:10.3390/info10040150
30. Maalej W, Nabil H. Bug report, feature request, or simply praise? On automatically classifying app reviews. In: 23rd IEEE International Requirements Engineering Conference, RE 2015, Ottawa, ON, Canada, August 24-28, 2015; 2015: 116-125.
31. Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, IJCAI 95, Montréal, Québec, Canada, August 20-25, 1995, 2 Volumes; 1995: 1137-1145.
32. Campos PG, Rodríguez-Artigot N, Cantador I. Extracting context data from user reviews for recommendation: a linked data approach. In: Proceedings of the RecSys 2017 Workshop on Recommendation in Complex Scenarios co-located with 11th ACM Conference on Recommender Systems (RecSys 2017), Como, Italy, August 31, 2017; 2017: 14-18.
33. Yin Y, Chen L, Xu Y, Wan J. Location-aware service recommendation with enhanced probabilistic matrix factorization. IEEE Access. 2018; 6: 62815-62825. doi:10.1109/ACCESS.2018.2877137
34. Zhang M, Zhang Y, Zhang L, Liu C, Khurshid S. DeepRoad: GAN-based metamorphic testing and input validation framework for autonomous driving systems. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, Montpellier, France, September 3-7, 2018; 2018: 132-142.
35. Zhou ZQ, Sun L. Metamorphic testing of driverless cars. Commun. ACM. 2019; 62(3): 61-67. doi:10.1145/3241979
36. Carreño LVG, Winbladh K. Analysis of user comments: an approach for software requirements evolution. In: 35th International Conference on Software Engineering, ICSE '13, San Francisco, CA, USA, May 18-26, 2013; 2013: 582-591.
37. McIlroy S, Ali N, Khalid H, Hassan AE. Analyzing and automatically labelling the types of user issues that are raised in mobile app reviews. Empirical Software Engineering. 2016; 21(3): 1067-1106. doi:10.1007/s10664-015-9375-7
38. Zhang Z, Xie X. On the investigation of essential diversities for deep learning testing criteria. In: 19th IEEE International Conference on Software Quality, Reliability and Security, QRS 2019, Sofia, Bulgaria, July 22-26, 2019; 2019: 394-405.
39. Spieker H, Gotlieb A, Marijan D, Mossige M. Reinforcement learning for automatic test case prioritization and selection in continuous integration. In: Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, Santa Barbara, CA, USA, July 10-14, 2017. T Bultan, K Sen, eds. ACM; 2017: 12-22.
40. Tao Z, Liu H, Fu H, Fu Y. Image cosegmentation via saliency-guided constrained clustering with cosine similarity. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA; 2017: 4285-4291.
41. Iacob C, Harrison R. Retrieving and analyzing mobile apps feature requests from online reviews. In: Proceedings of the 10th Working Conference on Mining Software Repositories, MSR '13, San Francisco, CA, USA, May 18-19, 2013; 2013: 41-44.
42. Villarroel L, Bavota G, Russo B, Oliveto R, Penta MD. Release planning of mobile apps based on user reviews. In: Proceedings of the 38th International Conference on Software Engineering, ICSE 2016, Austin, TX, USA, May 14-22, 2016; 2016: 14-24.
43. Sorbo AD, Panichella S, Alexandru CV, et al. What would users change in my app? Summarizing app reviews for recommending software changes. In: Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2016, Seattle, WA, USA, November 13-18, 2016; 2016: 499-510.
44. Guzman E, El-Haliby M, Bruegge B. Ensemble methods for app review classification: an approach for software evolution (N). In: 30th IEEE/ACM International Conference on Automated Software Engineering, ASE 2015, Lincoln, NE, USA, November 9-13, 2015; 2015: 771-776.
45. Panichella S. Summarization techniques for code, change, testing, and user feedback (invited paper). In: 2018 IEEE Workshop on Validation, Analysis and Evolution of Software Tests, VST@SANER 2018, Campobasso, Italy, March 20, 2018; 2018: 1-5.
46. Pelloni L, Grano G, Ciurumelea A, Panichella S, Palomba F, Gall HC. BECLoMA: augmenting stack traces with user review information. In: 25th International Conference on Software Analysis, Evolution and Reengineering, SANER 2018, Campobasso, Italy, March 20-23, 2018; 2018: 522-526.
47. Gao J, Bai X, Tsai W-T, Uehara T. Mobile application testing: a tutorial. Computer. 2014; 47(2): 46-55. doi:10.1109/MC.2013.445
48. Pan M, Huang A, Wang G, Zhang T, Li X. Reinforcement learning based curiosity-driven testing of Android applications. In: ISSTA '20: 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual Event, USA, July 18-22, 2020. S Khurshid, CS Pasareanu, eds. ACM; 2020: 153-164.
49. Mao K, Harman M, Jia Y. Sapienz: multi-objective automated testing for Android applications. In: Proceedings of the 25th International Symposium on Software Testing and Analysis, ISSTA 2016, Saarbrücken, Germany, July 18-20, 2016. A Zeller, A Roychoudhury, eds. ACM; 2016: 94-105.
50. Lai D, Rubin J. Goal-driven exploration for Android applications. In: 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019, San Diego, CA, USA, November 11-15, 2019. IEEE; 2019: 115-127.
51. Mahmood R, Mirzaei N, Malek S. EvoDroid: segmented evolutionary testing of Android apps. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, (FSE-22), Hong Kong, China, November 16-22, 2014. S-C Cheung, A Orso, M-AD Storey, eds. ACM; 2014: 599-609.
52. Ye H, Cheng S, Zhang L, Jiang F. DroidFuzzer: fuzzing the Android apps with intent-filter tag. In: The 11th International Conference on Advances in Mobile Computing & Multimedia, MoMM '13, Vienna, Austria, December 2-4, 2013. R Mayrhofer, L Chen, M Steinbauer, G Kotsis, I Khalil, eds. ACM; 2013: 68.
53. Sasnauskas R, Regehr J. Intent fuzzer: crafting intents of death. In: Proceedings of the 2014 Joint International Workshop on Dynamic Analysis (WODA) and Software and System Performance Testing, Debugging, and Analytics (PERTEA), WODA+PERTEA 2014, San Jose, CA, USA, July 22, 2014. H Xu, T Xie, S Lu, D Zhang, S Nagarakatte, C Csallner, eds. ACM; 2014: 1-5.
54. Murphy C, Kaiser GE, Hu L, Wu L. Properties of machine learning applications for use in metamorphic testing. In: Proceedings of the Twentieth International Conference on Software Engineering & Knowledge Engineering (SEKE'2008), San Francisco, CA, USA, July 1-3, 2008; 2008: 867-872.
55. Zhu H, Liu D, Bayley I, Harrison R, Cuzzolin F. Datamorphic testing: a method for testing intelligent applications. In: IEEE International Conference on Artificial Intelligence Testing, AITest 2019, Newark, CA, USA, April 4-9, 2019; 2019: 149-156.
56. Bozic J, Wotawa F. Testing chatbots using metamorphic relations. In: Testing Software and Systems - 31st IFIP WG 6.1 International Conference, ICTSS 2019, Paris, France, October 15-17, 2019, Proceedings. C Gaston, N Kosmatov, PL Gall, eds. Lecture Notes in Computer Science, vol. 11812. Springer; 2019: 41-55.
57. Zhang J, Chen J, Hao D, et al. Search-based inference of polynomial metamorphic relations. In: ACM/IEEE International Conference on Automated Software Engineering, ASE '14, Vasteras, Sweden, September 15-19, 2014. I Crnkovic, M Chechik, P Grünbacher, eds. ACM; 2014: 701-712.
58. Chen TY, Poon P-L, Xie X. METRIC: metamorphic relation identification based on the category-choice framework. J. Syst. Softw. 2016; 116: 177-190. doi:10.1016/j.jss.2015.07.037
59. METUR. https://github.com/TestingAIGroup/METUR; 2021.

Volume 35, Issue 11

Special Issue: Intelligent Bug Fixing

November 2023

e2444
