Volume 70, Issue 2 pp. 589-606
ORIGINAL PAPER

Cross entropy and log likelihood ratio cost as performance measures for multi-conclusion categorical outcomes scales

Eric M. Warren PhD

SEP Forensic Consultants, Memphis, Tennessee, USA

John C. Handley PhD

Simon Business School, University of Rochester, Rochester, New York, USA

H. David Sheets PhD

Corresponding Author

Computer and Data Sciences, Merrimack College, North Andover, Massachusetts, USA

Correspondence

H. David Sheets, Computer and Data Sciences, Merrimack College, 315 Turnpike St, North Andover, MA 01845, USA.

Email: [email protected]

First published: 10 December 2024
Citations: 1

Abstract

The inconclusive category in forensic reporting is the appropriate response in many cases, but it poses challenges in estimating an “error rate”. We discuss a class of information-theoretic measures related to cross entropy as an alternative set of metrics that allows performance evaluation of results presented using multi-category reporting scales. This paper shows how this class of performance metrics, and in particular the log likelihood ratio cost, which is already in use with likelihood ratio forensic reporting methods and in machine learning communities, can be readily adapted for use with the widely used multiple-category conclusion scales. Bayesian credible intervals on these metrics can be estimated using numerical methods. We apply these metrics to published test results and show that reducing the number of categories used in a proficiency test from five or six to three increases the cross entropy. This indicates that the higher number of categories was justified, as the additional categories increased the level of agreement with ground truth.
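As a rough illustration of the log likelihood ratio cost mentioned in the abstract, the following minimal Python sketch computes it in its standard form, averaging log2(1 + 1/LR) over same-source comparisons and log2(1 + LR) over different-source comparisons. The abstract does not state how categorical conclusions are converted to likelihood ratios, so the `CATEGORY_LR` table, its values, and the helper `cllr_from_conclusions` are hypothetical and are not the authors' procedure.

```python
import math


def cllr(lrs_same_source, lrs_diff_source):
    """Log likelihood ratio cost, in bits, in its standard form.

    lrs_same_source: likelihood ratios reported for true same-source pairs.
    lrs_diff_source: likelihood ratios reported for true different-source pairs.
    """
    if not lrs_same_source or not lrs_diff_source:
        raise ValueError("both lists of likelihood ratios must be non-empty")
    # Same-source pairs are penalized when the reported LR is small.
    penalty_same = sum(math.log2(1 + 1 / lr) for lr in lrs_same_source)
    penalty_same /= len(lrs_same_source)
    # Different-source pairs are penalized when the reported LR is large.
    penalty_diff = sum(math.log2(1 + lr) for lr in lrs_diff_source)
    penalty_diff /= len(lrs_diff_source)
    return 0.5 * (penalty_same + penalty_diff)


# ASSUMPTION: a made-up three-category scale with illustrative LR values.
# These numbers are placeholders, not values taken from the paper.
CATEGORY_LR = {
    "identification": 100.0,
    "inconclusive": 1.0,
    "elimination": 0.01,
}


def cllr_from_conclusions(same_source_calls, diff_source_calls,
                          category_lr=CATEGORY_LR):
    """Hypothetical helper: map categorical conclusions to LRs, then score."""
    same = [category_lr[c] for c in same_source_calls]
    diff = [category_lr[c] for c in diff_source_calls]
    return cllr(same, diff)
```

Under this sketch, an examiner who reports "inconclusive" (LR = 1) for every comparison scores exactly 1 bit, the value of an uninformative system, while conclusions that agree with ground truth push the cost toward 0.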

CONFLICT OF INTEREST STATEMENT

The authors have no conflicts of interest to declare.
