Volume 92, Issue 2, pp. 505-535
ORIGINAL ARTICLE

Textual analysis of insurance claims with large language models

Dongchen Li

Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062 P.R. China

Zhuo Jin

Department of Actuarial Studies and Business Analytics, Macquarie University, 2109 NSW, Australia

Linyi Qian

Corresponding Author

Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062 P.R. China

China Inclusive Aging Finance Research Center, East China Normal University, Shanghai, 200062 P.R. China

Correspondence: Linyi Qian, Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062 P.R. China.

Hailiang Yang

Department of Financial and Actuarial Mathematics, School of Mathematics & Physics, Xi'an Jiaotong-Liverpool University, Suzhou, 215123 P.R. China

First published: 26 March 2025

Abstract

This study proposes a comprehensive, general framework that uses large language models (LLMs) to examine discrepancies in textual content, broadens the range of LLM applications in insurtech and risk management, and applies the framework in an empirical study grounded in practical needs and real-world data. The framework uses OpenAI's interface to embed texts and project them onto external categories, and it applies distance metrics to evaluate discrepancies. To identify significant disparities, we design prompts that analyze three types of relationships: identical information, logical relationships, and potential relationships. Our empirical analysis shows that 22.1% of samples exhibit substantial semantic discrepancies and that 38.1% of the samples with significant differences contain at least one of the identified relationships. The average processing time per sample does not exceed 4 s, and every step of the process can be adjusted to practical needs. Backtesting results and comparisons with traditional natural language processing (NLP) methods further demonstrate that the proposed method is both effective and robust.
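The abstract names the scoring steps (embed two texts, project them onto external categories, measure the distance, flag large gaps) without fixing their details. The following is a minimal Python sketch of how those steps could fit together, under assumptions that are not taken from the paper: embeddings arrive as plain lists of floats (in practice they would come from an embedding endpoint such as OpenAI's), the "projection" is the vector of cosine similarities between a text and each category embedding, the distance metric is Euclidean, and the 0.5 threshold is purely illustrative. The helper names `project`, `discrepancy` and `is_significant`, and the toy 3-dimensional vectors, are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def project(embedding: Vector, categories: Dict[str, Vector]) -> List[float]:
    """Hypothetical projection: one coordinate per external category,
    equal to the cosine similarity between the text and that category.
    Categories are visited in sorted order so both texts share one axis order."""
    return [cosine_similarity(embedding, categories[name]) for name in sorted(categories)]


def discrepancy(emb_a: Vector, emb_b: Vector, categories: Dict[str, Vector]) -> float:
    """Assumed metric: Euclidean distance between the two category profiles."""
    pa = project(emb_a, categories)
    pb = project(emb_b, categories)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))


def is_significant(
    emb_a: Vector,
    emb_b: Vector,
    categories: Dict[str, Vector],
    threshold: float = 0.5,  # illustrative cutoff, not from the paper
) -> bool:
    """Flag a pair whose discrepancy exceeds the (tunable) threshold."""
    return discrepancy(emb_a, emb_b, categories) > threshold


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real embeddings, which have far more dimensions.
    categories = {
        "vehicle": [1.0, 0.0, 0.0],
        "medical": [0.0, 1.0, 0.0],
        "property": [0.0, 0.0, 1.0],
    }
    claim = [0.9, 0.1, 0.0]        # e.g. a claim description
    consistent = [0.8, 0.2, 0.0]   # e.g. a report that agrees with it
    conflicting = [0.1, 0.9, 0.0]  # e.g. a report that contradicts it

    print(round(discrepancy(claim, consistent, categories), 3))   # 0.134
    print(round(discrepancy(claim, conflicting, categories), 3))  # 1.249
    print(is_significant(claim, conflicting, categories))         # True
```

Only the flagged pairs would then move on to the prompt-based relationship analysis the abstract describes. Swapping in a different distance metric changes only `discrepancy`, but the threshold would then need recalibrating, for example against backtested pairs.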
