ORIGINAL ARTICLE

Textual analysis of insurance claims with large language models

Dongchen Li

Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062 P.R. China

Zhuo Jin

Department of Actuarial Studies and Business Analytics, Macquarie University, 2109 NSW, Australia

Corresponding Author

Linyi Qian

Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062 P.R. China

China Inclusive Aging Finance Research Center, East China Normal University, Shanghai, 200062 P.R. China

Correspondence: Linyi Qian, Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai 200062, P.R. China.

Email: [email protected]

Hailiang Yang

Department of Financial and Actuarial Mathematics, School of Mathematics & Physics, Xi'an Jiaotong-Liverpool University, Suzhou, 215123 P.R. China

First published: 26 March 2025

https://doi.org/10.1111/jori.70004

Abstract

This study proposes a comprehensive, general framework for examining discrepancies in textual content using large language models (LLMs). The framework broadens application scenarios in insurtech and risk management, and we evaluate it empirically on real-world data motivated by practical needs. It uses OpenAI's interface to embed texts and project them onto external categories, and it applies distance metrics to evaluate discrepancies. To identify significant disparities, we design prompts that analyze three types of relationships: identical information, logical relationships, and potential relationships. Our empirical analysis shows that 22.1% of samples exhibit substantial semantic discrepancies, and 38.1% of the samples with significant differences contain at least one of the identified relationships. The average processing time per sample does not exceed 4 s, and every step can be adjusted to actual needs. Backtesting results and comparisons with traditional NLP methods further demonstrate that the proposed method is both effective and robust.
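
The embed, project, and measure steps outlined in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the toy vectors stand in for embeddings that would come from an embedding API, cosine similarity is assumed as the projection onto categories, and Spearman's footrule is assumed as the rank distance. The abstract does not specify these choices, and every function and variable name here is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def project(text_vec, category_vecs):
    """Assumed projection: similarity of one text to each external category."""
    return [cosine(text_vec, c) for c in category_vecs]


def ranks(scores):
    """Rank position of each category (0 = most similar)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, idx in enumerate(order):
        result[idx] = position
    return result


def footrule(r1, r2):
    """Spearman's footrule: sum of absolute rank differences."""
    return sum(abs(a - b) for a, b in zip(r1, r2))


def discrepancy(vec_a, vec_b, category_vecs):
    """Normalized footrule distance between the category rankings of two texts."""
    rank_a = ranks(project(vec_a, category_vecs))
    rank_b = ranks(project(vec_b, category_vecs))
    n = len(category_vecs)
    max_dist = (n * n) // 2  # largest possible footrule value for n items
    return footrule(rank_a, rank_b) / max_dist if max_dist else 0.0


# Toy stand-ins for embeddings (assumption: real vectors would come from an API).
categories = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
claim_a = [0.9, 0.3, 0.1]  # hypothetical embedding of one claim description
claim_b = [0.1, 0.3, 0.9]  # hypothetical embedding of a second description

score = discrepancy(claim_a, claim_b, categories)
```

A pair of texts whose score exceeds a chosen threshold would then be flagged for the prompt-based relationship analysis; the threshold itself is a tuning decision the abstract leaves open.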


Volume 92, Issue 2

June 2025

Pages 505-535
