Volume 42, Issue 8 e70093
ORIGINAL ARTICLE

The Performance of Distances Between Time Series: An In-Depth Comparison

Margarida G. M. S. Cardoso

Corresponding Author

Instituto Universitário de Lisboa (ISCTE-IUL), Business Research Unit (BRU-IUL), Lisbon, Portugal

Correspondence:

Margarida G. M. S. Cardoso ([email protected])

Ana A. Martins

CIMA-Research Centre for Mathematics and Applications & CIMOSM–Centro de Investigação Em Modelação e Optimização de Sistemas Multifuncionais, Instituto Superior de Engenharia de Lisboa-ISEL, Instituto Politécnico de Lisboa, Lisbon, Portugal

First published: 24 June 2025

Funding: This research was supported by Fundação para a Ciência e a Tecnologia, grant UIDB/00315/2020 (DOI: 10.54499/UIDB/00315/2020). It was also supported by Instituto Politécnico Lisboa (IPL) with reference IPL/IDI&CA2023/ELForcast2_ISEL and Fundação para a Ciência e a Tecnologia, Portugal, through the project UID/MAT/04674/2013, CIMA and ISEL.

ABSTRACT

The performance of distance measures between time series has been discussed in diverse studies. Most equate performance with the accuracy obtained when a specific distance is used in a 1-Nearest Neighbour (1-NN) classifier. Few studies have addressed the related computation time, and no systematic analysis of the associations between the distances' performance (1-NN-based accuracy and computation time) and the time series' characteristics has yet been presented. We propose to fill this research gap by analysing these relationships considering the following features: the training and test sets' dimensions, the time series' length, the number of classes, and the classes' separability as measured by the Average Silhouette index. This last characteristic was not mentioned in previous studies. A methodological approach is devised to compare nine distance measures, including three recently proposed combined distances (COMB and two variants). We resort to a stepwise method for multiple comparisons, controlling the experiment-wise error rate, to obtain homogeneous groups of distances with indistinguishable performance. The CART algorithm is used to explore the relationships between the accuracy values obtained with each distance measure under study (target) and the time series characteristics (predictors). Our analyses are based on datasets from the UCR time series classification archive. We conclude that the combined distance (COMB), the dynamic time warping distance (DTW), and the complexity invariance distance (CID) are consistently included in the subset of best-performing distances in all experimental scenarios. The last of these (CID) has a significantly lower computational cost. We find that the classes' separability is the time series attribute most associated with the distances' performance.
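
To make the evaluation criterion concrete, the following is a minimal sketch of 1-NN accuracy computed with a pluggable distance. It is not the authors' implementation. It assumes the textbook forms of DTW (unconstrained warping, squared point-wise cost) and CID (Euclidean distance scaled by the ratio of the two series' complexity estimates), and the function names (`euclidean`, `complexity_estimate`, `cid`, `dtw`, `one_nn_accuracy`) are hypothetical. The COMB distances are not sketched because the abstract does not define them.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complexity_estimate(a):
    """Complexity estimate: root of the summed squared first differences."""
    return math.sqrt(sum((a[i + 1] - a[i]) ** 2 for i in range(len(a) - 1)))


def cid(a, b):
    """Assumed textbook CID: Euclidean distance times max/min complexity."""
    ed = euclidean(a, b)
    ca, cb = complexity_estimate(a), complexity_estimate(b)
    lo, hi = min(ca, cb), max(ca, cb)
    if lo == 0.0:
        # Assumption of this sketch: two flat series keep the plain distance;
        # one flat series against a non-flat one is treated as infinitely far.
        return ed if hi == 0.0 else math.inf
    return ed * hi / lo


def dtw(a, b):
    """Assumed textbook DTW with no warping window, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def one_nn_accuracy(train_X, train_y, test_X, test_y, dist):
    """Share of test series whose nearest training series has the same label."""
    hits = 0
    for x, y in zip(test_X, test_y):
        nearest = min(range(len(train_X)), key=lambda k: dist(x, train_X[k]))
        hits += int(train_y[nearest] == y)
    return hits / len(test_X)
```

Passing `dtw`, `cid`, or `euclidean` as `dist` gives one accuracy value per distance on a given train/test split. The quadratic inner loop of `dtw` against the single pass of `cid` illustrates why the two can differ substantially in computation time.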

Data Availability Statement

The datasets were drawn from the University of California, Riverside (UCR) Time Series Classification Archive.
