Supporting maintenance and testing for AI functions of mobile apps based on user reviews: An empirical study on plant identification apps
Corresponding Author
Chuanqi Tao
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Ministry Key Laboratory for Safety-Critical Software Development and Verification, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University, Nanjing, China
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Correspondence
Chuanqi Tao, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China.
Email: [email protected]
Hongjing Guo
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Jingxuan Zhang
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Ministry Key Laboratory for Safety-Critical Software Development and Verification, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Zhiqiu Huang
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Ministry Key Laboratory for Safety-Critical Software Development and Verification, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Abstract
Despite the tremendous development of artificial intelligence (AI)-based mobile apps, they suffer from quality issues. Data-driven AI software poses challenges for maintenance and quality assurance. Metamorphic testing has been successfully applied to AI software. However, most previous studies require testers to identify metamorphic relations manually, in an ad hoc and arbitrary manner, and therefore have difficulty reflecting real-world usage scenarios. Previous work showed that the information available in user reviews is effective for maintenance and testing tasks. Yet studies that leverage reviews to facilitate maintenance and testing of AI functions are lacking. This paper proposes METUR, a novel approach to supporting maintenance and testing for AI functions based on reviews. First, METUR automatically classifies reviews that can be exploited to support maintenance and evolution activities for AI functions. Then, it identifies test contexts from reviews in the usage scenario category. METUR instantiates a metamorphic relation pattern to derive concrete metamorphic relations from these test contexts, and a follow-up test dataset is constructed for conducting metamorphic testing. Empirical studies on plant identification apps indicate that METUR effectively categorizes reviews that are related to AI functions, and that it is feasible and effective in detecting inconsistent behaviors by using the metamorphic relations constructed from reviews.
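As a rough illustration of the metamorphic testing step summarized above, the following Python sketch checks one metamorphic relation: a change in the test context should not change the identified label. It is not the METUR implementation. The "low light" context, the `darken` transformation, the `toy_classifier`, and the pixel-list image representation are all hypothetical stand-ins chosen so that the example is self-contained.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for an image: a flat list of pixel intensities in [0, 1].
Image = List[float]


def darken(image: Image, factor: float = 0.5) -> Image:
    """Build a follow-up input for an assumed 'low light' test context."""
    return [pixel * factor for pixel in image]


def toy_classifier(image: Image) -> str:
    """Deliberately brittle stand-in for an app's identification function."""
    mean = sum(image) / len(image)
    return "sunflower" if mean > 0.5 else "fern"


def inconsistency_rate(
    classify: Callable[[Image], str],
    transform: Callable[[Image], Image],
    source_images: Sequence[Image],
) -> float:
    """Fraction of source images whose label changes on the follow-up input.

    The assumed metamorphic relation is that the label of the follow-up
    image equals the label of the source image.
    """
    violations = sum(
        1 for image in source_images if classify(image) != classify(transform(image))
    )
    return violations / len(source_images)


# Source test dataset (made-up values) and the derived follow-up comparison.
sources = [[0.9, 0.8, 0.7], [0.2, 0.3, 0.1], [0.6, 0.6, 0.6]]
rate = inconsistency_rate(toy_classifier, darken, sources)
print(f"Inconsistency rate under the 'low light' relation: {rate:.2f}")
```

In this sketch no ground-truth labels are needed: a violation is reported whenever the source and follow-up outputs disagree, which is how metamorphic testing sidesteps the lack of a test oracle.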
CONFLICT OF INTEREST
The authors declare no potential conflict of interest.
Open Research
DATA AVAILABILITY STATEMENT
The data and code that support the findings of this study are openly available on GitHub at https://github.com/TestingAIGroup/METUR [59].
REFERENCES
- 1. Du X, Xie X, Li Y, Ma L, Zhao J, Liu Y. Deepcruiser: automated guided testing for stateful deep learning systems. CoRR. 2018; abs/1812.05339.
- 2. Gao J, Tao C, Jie D, Lu S. Invited paper: what is AI software testing? and why. In: 13th IEEE International Conference on Service-Oriented System Engineering, SOSE 2019, San Francisco, CA, USA, April 4-9, 2019; 2019: 27-2709.
- 3. Tian Y, Pei K, Jana S, Ray B. Deeptest: automated testing of deep-neural-network-driven autonomous cars. In: Proceedings of the 40th International Conference on Software Engineering, ICSE 2018, Gothenburg, Sweden, May 27 - June 03, 2018; 2018: 303-314.
- 4. Pei K, Cao Y, Yang J, Jana S. Deepxplore: automated whitebox testing of deep learning systems. In: Proceedings of the 26th Symposium on Operating Systems Principles, Shanghai, China, October 28-31, 2017; 2017: 1-18.
- 5. Amershi S, Begel A, Bird C, et al. Software engineering for machine learning: a case study. In: Proceedings of the 41st International Conference on Software Engineering: Software Engineering in Practice, ICSE (SEIP) 2019, Montreal, QC, Canada, May 25-31, 2019; 2019: 291-300.
- 6. Zhang JM, Harman M, Ma L, Liu Y. Machine learning testing: survey, landscapes and horizons. IEEE Transactions on Software Engineering. 2020: 1-1.
- 7. Wang S, Su Z. Metamorphic testing for object detection systems. CoRR. 2019; abs/1912.12162.
- 8. Chen TY, Kuo F-C, Liu H, et al. Metamorphic testing: a review of challenges and opportunities. ACM Comput. Surv. 2018; 51(1): 4:1-4:27.
- 9. Sun C-A, Fu A, Poon P-L, Xie X, Liu H, Chen TY. Metric+: a metamorphic relation identification technique based on input plus output domains. IEEE Transactions on Software Engineering. 2019: 1-1.
- 10. Dwarakanath A, Ahuja M, Sikand S, Rao RM, Bose RPJC, Dubash N, Podder S. Identifying implementation bugs in machine learning based image classifiers using metamorphic testing. In: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2018, Amsterdam, The Netherlands, July 16-21, 2018. F Tip, E Bodden, eds. ACM; 2018: 118-128.
- 11. Zhou Z, Xiang S, Chen TY. Metamorphic testing for software quality assessment: a study of search engines. IEEE Trans. Software Eng. 2016; 42(3): 264-284.
- 12. Panichella S, Sorbo AD, Guzman E, Visaggio CA, Canfora G, Gall HC. How can i improve my app? classifying user reviews for software maintenance and evolution. In: 2015 IEEE International Conference on Software Maintenance and Evolution, ICSME 2015, Bremen, Germany, September 29-October 1, 2015; 2015: 281-290.
- 13. Ciurumelea A, Schaufelbühl A, Panichella S, Gall HC. Analyzing reviews and code of mobile apps for better release planning. In: IEEE 24th International Conference on Software Analysis, Evolution and Reengineering, SANER 2017, Klagenfurt, Austria, February 20-24, 2017; 2017: 91-102.
- 14. Martin WJ, Sarro F, Jia Y, Zhang Y, Harman M. A survey of app store analysis for software engineering. IEEE Trans. Software Eng. 2017; 43(9): 817-847.
- 15. Grano G, Ciurumelea A, Panichella S, Palomba F, Gall HC. Exploring the integration of user feedback in automated testing of android applications. In: 25th International Conference on Software Analysis, Evolution and Reengineering, SANER 2018, Campobasso, Italy, March 20-23, 2018; 2018: 72-83.
- 16. Wang S, Su Z. Metamorphic object insertion for testing object detection systems. In: 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020, Melbourne, Australia, September 21-25, 2020. IEEE; 2020: 1053-1065.
- 17. Chen TY, Cheung SC, Yiu S-M. Metamorphic testing: a new approach for generating next test cases. In: Technical report, Technical Report HKUST-CS98-01, Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong; 1998.
- 18. Segura S, Fraser G, Sánchez AB, Cortés AR. A survey on metamorphic testing. IEEE Trans. Software Eng. 2016; 42(9): 805-824.
- 19. Ding J, Wu T, Lu JQ, Hu X-H. Self-checked metamorphic testing of an image processing program. In: Fourth International Conference On Secure Software Integration and Reliability Improvement, SSIRI 2010, Singapore, June 9-11, 2010. IEEE Computer Society; 2010: 190-197.
- 20. Zhou ZQ, Sun L, Chen TY, Towey D. Metamorphic relations for enhancing system understanding and use. IEEE Trans. Software Eng. 2020; 46(10): 1120-1154.
- 21. Wu C, Sun L, Zhou ZQ. The impact of a dot: case studies of a noise metamorphic relation pattern. In: Proceedings of the 4th International Workshop on Metamorphic Testing, MET@ICSE 2019, Montreal, QC, Canada, May 26, 2019. X Xie, P-L Poon, LL Pullum, eds. IEEE / ACM; 2019: 17-23.
- 22. Zhou Y, Su Y, Chen T, Huang Z, Gall HC, Panichella S. User review-based change file localization for mobile applications. IEEE Transactions on Software Engineering. 2020. to appear.
- 23. Gu X, Kim S. What parts of your apps are loved by users? (T). In: 30th IEEE/ACM International Conference on Automated Software Engineering, ASE 2015, Lincoln, NE, USA, November 9-13, 2015; 2015: 760-770.
- 24. Vu PM, Nguyen TT, Pham HV, Nguyen TT. Mining user opinions in mobile app reviews: a keyword-based approach (T). In: 30th IEEE/ACM International Conference on Automated Software Engineering, ASE 2015, Lincoln, NE, USA, November 9-13, 2015; 2015: 749-759.
- 25. Tao C, Guo H, Huang Z. Identifying security issues for mobile applications based on user review summarization. Inf. Softw. Technol. 2020; 122: 106290.
- 26. Bird S. NLTK: the natural language toolkit. In: ACL 2006, 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, Sydney, Australia, 17-21 July 2006; 2006.
- 27. Palomba F, Salza P, Ciurumelea A, et al. Recommending and localizing change requests for mobile apps based on user reviews. In: Proceedings of the 39th International Conference on Software Engineering, ICSE 2017, Buenos Aires, Argentina, May 20-28, 2017; 2017: 106-117.
- 28. Khalid H, Shihab E, Nagappan M, Hassan AE. What do mobile app users complain about? IEEE Softw. 2015; 32(3): 70-77.
- 29. Kowsari K, Meimandi KJ, Heidarysafa M, Mendu S, Barnes LE, Brown DE. Text classification algorithms: a survey. Information. 2019; 10(4): 150.
- 30. Maalej W, Nabil H. Bug report, feature request, or simply praise? On automatically classifying app reviews. In: 23rd IEEE International Requirements Engineering Conference, RE 2015, Ottawa, ON, Canada, August 24-28, 2015; 2015: 116-125.
- 31. Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, IJCAI 95, Montréal Québec, Canada, August 20-25 1995, 2 Volumes; 1995: 1137-1145.
- 32. Campos PG, Rodríguez-Artigot N, Cantador I. Extracting context data from user reviews for recommendation: a linked data approach. In: Proceedings of the RecSys 2017 Workshop on Recommendation in Complex Scenarios co-located with 11th ACM Conference on Recommender Systems (RecSys 2017), Como, Italy, August 31, 2017; 2017: 14-18.
- 33. Yin Y, Chen L, Xu Y, Wan J. Location-aware service recommendation with enhanced probabilistic matrix factorization. IEEE Access. 2018; 6: 62815-62825.
- 34. Zhang M, Zhang Y, Zhang L, Liu C, Khurshid S. Deeproad: gan-based metamorphic testing and input validation framework for autonomous driving systems. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, Montpellier, France, September 3-7, 2018; 2018: 132-142.
- 35. Zhou ZQ, Sun L. Metamorphic testing of driverless cars. Commun. ACM. 2019; 62(3): 61-67.
- 36. Carreño LVG, Winbladh K. Analysis of user comments: an approach for software requirements evolution. In: 35th International Conference on Software Engineering, ICSE '13, San Francisco, CA, USA, May 18-26, 2013; 2013: 582-591.
- 37. McIlroy S, Ali N, Khalid H, Hassan AE. Analyzing and automatically labelling the types of user issues that are raised in mobile app reviews. Empirical Software Engineering. 2016; 21(3): 1067-1106.
- 38. Zhang Z, Xie X. On the investigation of essential diversities for deep learning testing criteria. In: 19th IEEE International Conference on Software Quality, Reliability and Security, QRS 2019, Sofia, Bulgaria, July 22-26, 2019; 2019: 394-405.
- 39. Spieker H, Gotlieb A, Marijan D, Mossige M. Reinforcement learning for automatic test case prioritization and selection in continuous integration. In: Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, Santa Barbara, CA, USA, July 10-14, 2017. T Bultan, K Sen, eds. ACM; 2017: 12-22.
- 40. Tao Z, Liu H, Fu H, Fu Y. Image cosegmentation via saliency-guided constrained clustering with cosine similarity. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA; 2017: 4285-4291.
- 41. Iacob C, Harrison R. Retrieving and analyzing mobile apps feature requests from online reviews. In: Proceedings of the 10th Working Conference on Mining Software Repositories, MSR '13, San Francisco, CA, USA, May 18-19, 2013; 2013: 41-44.
- 42. Villarroel L, Bavota G, Russo B, Oliveto R, Penta MD. Release planning of mobile apps based on user reviews. In: Proceedings of the 38th International Conference on Software Engineering, ICSE 2016, Austin, TX, USA, May 14-22, 2016; 2016: 14-24.
- 43. Sorbo AD, Panichella S, Alexandru CV, et al. What would users change in my app? summarizing app reviews for recommending software changes. In: Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2016, Seattle, WA, USA, November 13-18, 2016; 2016: 499-510.
- 44. Guzman E, El-Haliby M, Bruegge B. Ensemble methods for app review classification: an approach for software evolution (N). In: 30th IEEE/ACM International Conference on Automated Software Engineering, ASE 2015, Lincoln, NE, USA, November 9-13, 2015; 2015: 771-776.
- 45. Panichella S. Summarization techniques for code, change, testing, and user feedback (invited paper). In: 2018 IEEE Workshop on Validation, Analysis and Evolution of Software Tests, VST@SANER 2018, Campobasso, Italy, March 20, 2018; 2018: 1-5.
- 46. Pelloni L, Grano G, Ciurumelea A, Panichella S, Palomba F, Gall HC. BECLoMA: Augmenting stack traces with user review information. In: 25th International Conference on Software Analysis, Evolution and Reengineering, SANER 2018, Campobasso, Italy, March 20-23, 2018; 2018: 522-526.
- 47. Gao J, Bai X, Tsai W-T, Uehara T. Mobile application testing: a tutorial. Computer. 2014; 47(2): 46-55.
- 48. Pan M, Huang A, Wang G, Zhang T, Li X. Reinforcement learning based curiosity-driven testing of android applications. In: ISSTA '20: 29th ACM SIGSOFT International Symposium On Software Testing and Analysis, Virtual Event, USA, July 18-22, 2020. S Khurshid, CS Pasareanu, eds. ACM; 2020: 153-164.
- 49. Mao K, Harman M, Jia Y. Sapienz: multi-objective automated testing for android applications. In: Proceedings of the 25th International Symposium On Software Testing and Analysis, ISSTA 2016, Saarbrücken, Germany, July 18-20, 2016. A Zeller, A Roychoudhury, eds. ACM; 2016: 94-105.
- 50. Lai D, Rubin J. Goal-driven exploration for android applications. In: 34th IEEE/ACM International Conference On Automated Software Engineering, ASE 2019, San Diego, CA, USA, November 11-15, 2019. IEEE; 2019: 115-127.
- 51. Mahmood R, Mirzaei N, Malek S. Evodroid: segmented evolutionary testing of android apps. In: Proceedings of the 22nd ACM SIGSOFT International Symposium On Foundations Of Software Engineering, (FSE-22), Hong Kong, China, November 16 - 22, 2014. S-C Cheung, A Orso, M-AD Storey, eds. ACM; 2014: 599-609.
- 52. Ye H, Cheng S, Zhang L, Jiang F. Droidfuzzer: Fuzzing the android apps with intent-filter tag. In: The 11th International Conference On Advances In Mobile Computing & Multimedia, MoMM '13, Vienna, Austria, December 2-4, 2013. R Mayrhofer, L Chen, M Steinbauer, G Kotsis, I Khalil, eds. ACM; 2013: 68.
- 53. Sasnauskas R, Regehr J. Intent fuzzer: crafting intents of death. In: Proceedings of the 2014 Joint International Workshop On Dynamic Analysis (WODA) and Software And System Performance Testing, Debugging, and Analytics (PERTEA), WODA+PERTEA 2014, San Jose, CA, USA, July 22, 2014. H Xu, T Xie, S Lu, D Zhang, S Nagarakatte, C Csallner, eds. ACM; 2014: 1-5.
- 54. Murphy C, Kaiser GE, Hu L, Wu L. Properties of machine learning applications for use in metamorphic testing. In: Proceedings of the Twentieth International Conference on Software Engineering & Knowledge Engineering (SEKE'2008), San Francisco, CA, USA, July 1-3, 2008; 2008: 867-872.
- 55. Zhu H, Liu D, Bayley I, Harrison R, Cuzzolin F. Datamorphic testing: a method for testing intelligent applications. In: IEEE International Conference On Artificial Intelligence Testing, AITest 2019, Newark, CA, USA, April 4-9, 2019; 2019: 149-156.
- 56. Bozic J, Wotawa F. Testing chatbots using metamorphic relations. In: Testing Software and Systems - 31st IFIP WG 6.1 International Conference, ICTSS 2019, Paris, France, October 15-17, 2019, Proceedings. C Gaston, N Kosmatov, PL Gall, eds., Lecture Notes in Computer Science, vol. 11812. Springer; 2019: 41-55.
- 57. Zhang J, Chen J, Hao D, et al. Search-based inference of polynomial metamorphic relations. In: ACM/IEEE International Conference On Automated Software Engineering, ASE '14, Vasteras, Sweden - September 15 - 19, 2014. I Crnkovic, M Chechik, P Grünbacher, eds. ACM; 2014: 701-712.
- 58. Chen TY, Poon P-L, Xie X. METRIC: metamorphic relation identification based on the category-choice framework. J. Syst. Softw. 2016; 116: 177-190.
- 59. Metur. https://github.com/TestingAIGroup/METUR; 2021.