The Performance of Distances Between Time Series: An In-Depth Comparison
Funding: This research was supported by Fundação para a Ciência e a Tecnologia, grant UIDB/00315/2020 (DOI: 10.54499/UIDB/00315/2020). It was also supported by Instituto Politécnico Lisboa (IPL) with reference IPL/IDI&CA2023/ELForcast2_ISEL and Fundação para a Ciência e a Tecnologia, Portugal, through the project UID/MAT/04674/2013, CIMA and ISEL.
ABSTRACT
The performance of distance measures between time series has been discussed in diverse studies. Most equate performance with the classification accuracy obtained when a specific distance is used in a 1-Nearest Neighbour (1-NN) classifier. Few studies have addressed the associated computation time, and no systematic analysis of the associations between the distances' performance (1-NN-based accuracy and computation time) and the time series' characteristics has yet been presented. We propose to fill this research gap by analysing these relationships for the following features: the sizes of the training and test sets, the time series' length, the number of classes, and the classes' separability as measured by the Average Silhouette index. This last characteristic has not been considered in previous studies. A methodological approach is devised to compare nine distance measures, including three recently proposed combined distances (COMB and two variants). We resort to a stepwise method for multiple comparisons that controls the experiment-wise error rate, obtaining homogeneous groups of distances whose performances are statistically indistinguishable. The CART algorithm is used to explore the relationships between the accuracy values corresponding to each distance measure under study (target) and the time series characteristics (predictors). Our analyses are based on datasets from the UCR time series classification archive. We conclude that the combined distance (COMB), the dynamic time warping distance (DTW), and the complexity invariance distance (CID) are consistently included in the subset of best-performing distances in all experimental scenarios. Of these three, CID has a significantly lower computational cost. We find that the classes' separability is the time series' attribute most associated with the distances' performance.
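As an illustration of the 1-NN-based accuracy referred to in the abstract, the following is a minimal sketch in Python, not the code used in this study. It assumes the commonly cited textbook forms of the Euclidean, DTW (no warping window) and CID distances, which may differ from the exact settings used in the experiments. The function names, the handling of constant series in `cid`, and the toy series are all hypothetical choices made for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Unconstrained DTW with squared point-wise cost (assumed variant)."""
    m = len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # empty prefixes align at zero cost
    for x in a:
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (x - b[j - 1]) ** 2
            # extend the cheapest of: insertion, deletion, match
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[m])

def complexity_estimate(a):
    """Root of the summed squared successive differences."""
    return math.sqrt(sum((a[i + 1] - a[i]) ** 2 for i in range(len(a) - 1)))

def cid(a, b):
    """Euclidean distance scaled by the ratio of the complexity estimates."""
    ed = euclidean(a, b)
    ce_a, ce_b = complexity_estimate(a), complexity_estimate(b)
    lo, hi = min(ce_a, ce_b), max(ce_a, ce_b)
    if hi == 0.0:
        return ed  # assumption: two constant series get a factor of 1
    if lo == 0.0:
        return math.inf  # assumption: constant vs. non-constant series
    return ed * (hi / lo)

def one_nn_accuracy(train_x, train_y, test_x, test_y, dist):
    """Share of test series whose nearest training series has the same label."""
    hits = 0
    for series, label in zip(test_x, test_y):
        nearest = min(range(len(train_x)),
                      key=lambda k: dist(series, train_x[k]))
        hits += train_y[nearest] == label
    return hits / len(test_x)

# Made-up toy data, for illustration only (not from the UCR archive).
train_x = [[0, 1, 2, 3], [3, 2, 1, 0]]
train_y = ["up", "down"]
test_x = [[0, 1, 1, 3], [3, 3, 1, 0]]
test_y = ["up", "down"]

for name, dist in [("ED", euclidean), ("DTW", dtw), ("CID", cid)]:
    print(name, one_nn_accuracy(train_x, train_y, test_x, test_y, dist))
```

In this sketch the DTW computation is quadratic in the series length, whereas the Euclidean and CID distances are linear, which is consistent with the lower computational cost reported for CID.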
Open Research
Data Availability Statement
The datasets were drawn from the University of California Riverside (UCR) Time Series Classification Archive.