Volume 2025, Issue 1 2239983

Research Article

Open Access

Real-Time Traffic Conflict Prediction at Intersections: A Novel Approach Integrating Statistical Models and Machine Learning

Chuanyun Fu
orcid.org/0000-0002-7310-210X
School of Transportation Science and Engineering, Harbin Institute of Technology, Harbin, 150090, Heilongjiang, China

Jiaming Liu
orcid.org/0009-0002-2755-054X
School of Transportation Science and Engineering, Harbin Institute of Technology, Harbin, 150090, Heilongjiang, China

Huahua Liu
orcid.org/0009-0005-9429-7494
School of Transportation Science and Engineering, Harbin Institute of Technology, Harbin, 150090, Heilongjiang, China

Xiaoli Wang
Jiaozhou Transportation Bureau, Jiaozhou, 266300, Shandong, China

Zhaoyou Lu
orcid.org/0000-0001-6389-5583
School of Transportation Science and Engineering, Harbin Institute of Technology, Harbin, 150090, Heilongjiang, China

Jushang Ou
orcid.org/0009-0008-1586-5310
Department of Road Traffic Management, Sichuan Police College, Luzhou, 646000, Sichuan, China
Intelligent Policing Key Laboratory of Sichuan Province, Luzhou, 646000, Sichuan, China

Wei Bai (Corresponding Author)
orcid.org/0009-0006-1596-703X
Department of Road Traffic Management, Sichuan Police College, Luzhou, 646000, Sichuan, China
Intelligent Policing Key Laboratory of Sichuan Province, Luzhou, 646000, Sichuan, China


First published: 04 June 2025

https://doi.org/10.1155/atr/2239983

Academic Editor: Kun Wang


Abstract

Real-time traffic conflict prediction is crucial for developing proactive safety management strategies and improving overall traffic safety. However, existing studies have not fully considered the entire process of traffic conflict generation at both signalized and unsignalized intersections. To address this gap, this study proposes a real-time three-stage approach that integrates statistical and machine learning models to reveal the factors influencing traffic conflicts, identify their occurrence, and predict their frequency. The results show that the proposed approach can effectively predict traffic conflicts at both signalized and unsignalized intersections. The findings provide new ideas for proactive safety management in urban road networks.

1. Introduction

Real-time crash risk prediction is a crucial prerequisite for proactive traffic safety management. Traditional traffic safety analysis has relied predominantly on historical crash data, an approach with several inherent limitations. First, reliable statistical evaluation based on crash data requires data collected over extended periods, which is resource-intensive and time-consuming [1–3]. Moreover, crash data often suffer from availability and quality issues, including underreporting, small sample sizes, scarcity, randomness, overdispersion, an excess of zero observations, unobserved heterogeneity, and temporal and spatial correlations, all of which can make the data inaccurate or biased [4–6]. Furthermore, the issue of data imbalance remains unresolved and directly affects the reliability of traffic safety assessments derived from crash data [7]. Given these limitations, safety evaluation has increasingly shifted toward traffic conflict data. Unlike crashes, traffic conflicts are frequent and can be observed directly. Although they share the same underlying failure mechanisms as crashes, traffic conflicts do not result in actual collisions. Compared with crash data, traffic conflict data can be collected over shorter periods and are easier to access, making them a more viable alternative. In recent years, traffic conflict data have therefore attracted increasing attention from traffic safety researchers because they allow more accurate and timely safety assessments [8, 9].

Typically, conflict-based prediction models fall into two groups: statistical regression models and extreme value theory (EVT) models [10–14]. Commonly used statistical models include linear discriminant analysis (LDA), generalized linear regression models, Bayesian Tobit models, and binary logistic regression models. For instance, Xu et al. [15] developed a Bayesian random parameter logistic regression model to assess the impact of traffic variables on conflicts under different levels of service on highways. Caleffi et al. [16] applied LDA to relate real-time traffic states to the probability of traffic conflicts, and the resulting model showed a satisfactory classification rate. Essa and Sayed [17] used traffic video data from six signalized intersections and a full Bayesian approach to develop a conflict-based safety performance function (SPF) for signalized intersections; the SPF fitted the data well, and all explanatory variables were statistically significant. In recent years, EVT has been widely applied to conflict-based real-time crash risk prediction. For example, Wang et al. [18] proposed a safety evaluation method for signalized intersections that integrates microscopic traffic simulation with EVT, introducing three strategies for calibrating the simulation model: basic, semicalibration, and full calibration. Their results indicated that EVT combined with the full calibration strategy was the better choice for simulation-based safety assessment. Wang et al. [19] proposed a bivariate EVT framework for conflict prediction and found that the bivariate model predicts rear-end and sideswipe conflicts more accurately. Fu and Sayed [20] introduced a dynamic Bayesian hierarchical peak-over-threshold modeling approach to estimate real-time crash risk from traffic conflicts. Zheng and Sayed [21] applied EVT to real-time safety analysis at signalized intersections, building a Bayesian hierarchical extreme value model with traffic conflicts as the response and dynamic traffic parameters as covariates; the model was validated by comparing its cumulative crash estimates with observed crashes. Fu and Sayed [22] introduced a random-parameter Bayesian hierarchical extreme value model with heterogeneity in means and variances (RPBHEV-HMV), which outperformed existing RPBHEV models in goodness of fit, explanatory power, and crash estimation accuracy and precision. Fu and Sayed [23] constructed a Bayesian dynamic extreme value model for conflicts whose parameters vary over time, improving both the real-time accuracy and the transferability of conflict-based crash risk estimation.

Regarding real-time traffic conflict prediction, Katrakazas et al. [24] combined traffic data from UK highway sections with traffic simulation and examined four time intervals of traffic data; their results confirmed the feasibility of using microscopic traffic simulation and the Surrogate Safety Assessment Model (SSAM) for real-time conflict prediction. Fu et al. [25] proposed a dynamic safety warning distance (SWD) and explored its application under adverse weather conditions; SWD identified both longitudinal and lateral conflicts in mixed traffic flow more effectively, particularly in harsh weather. Formosa et al. [26] developed a deep neural network (DNN) model to predict traffic conflicts in real time, and their best DNN model achieved an accuracy of 94%. Hu et al. [27] proposed a real-time traffic safety assessment method that combines traffic states and conflicts based on high-resolution trajectory data. Using the German HighD trajectory dataset aggregated at 1-min and 30-s intervals, with time-to-collision (TTC) as the conflict indicator, they found that a random forest (RF) model trained with resampling performed best. Yuan et al. [28] used the HighD trajectory data to explore the relationship between traffic flow characteristics and conflicts while accounting for data heterogeneity; analyzing 30-s intervals, they found that an Extreme Gradient Boosting (XGBoost) model trained on an undersampled dataset performed best. In addition, Fu and Sayed [23] introduced a real-time conflict-based safety analysis method built on a Bayesian dynamic extreme value model, which combines machine learning techniques with the EVT framework and was applied at the signal-cycle level at signalized intersections. Islam and Abdel-Aty [29] developed a long short-term memory (LSTM) model that uses the trajectory, speed, acceleration, and heading of individual connected vehicles as inputs to predict whether a conflict will occur in the short term, achieving a conflict prediction accuracy of 72%.

In summary, scholars have made considerable progress in the areas of traffic conflict prediction models and real-time traffic conflict prediction, with a relatively mature research framework. However, there remains room for further development in the fields of traffic conflict influencing factors and the construction of traffic conflict prediction models. Existing studies on traffic conflicts tend to focus on one aspect, rarely considering both the factors influencing the occurrence of conflicts and the frequency of conflicts. In particular, in the field of traffic conflict prediction models, due to issues such as high computational costs, difficulty in data acquisition, and data imbalance, limited attention has been given to short-term conflict prediction, especially the prediction of short-term conflict frequencies.

To address these issues, this paper explores the relationship between traffic states and traffic conflicts (including the occurrence and frequency of conflicts) based on the high-resolution trajectory dataset pNEUMA from Greece. It proposes a real-time intersection traffic conflict prediction method that integrates statistical models with machine learning techniques. This research aims to broaden the scope of real-time traffic safety assessment and provide more accurate prediction tools for traffic safety management.

2. Relevant Data Extraction

2.1. Data Acquisition

This study utilizes the pNEUMA high-resolution trajectory dataset from Greece [30]. The pNEUMA dataset was collected in October 2018 by researchers using 10 drones in the city center of Athens. The dataset records data during the morning peak hours (8:00–10:30 AM) over four working days within 1 week. The study area spans 1.3 square kilometers, with over 100 km of roadways and approximately 100 busy intersections. The data cover a 10-h period and include over 500,000 detailed trajectories of nearly all vehicles within the study area.

Given that the focus of this study is on conflicts between motor vehicles at intersections and recognizing that motorcycle trajectory data often deviate from the road network, it was necessary to remove motorcycle-related data from the dataset. After excluding motorcycle data, intersection-related data were extracted. Based on the study area, 102 intersections were identified, of which 60 were signalized intersections and 42 were unsignalized intersections. The study area and the intersection distribution are shown in Figure 1.

Figure 1. Study area and intersection distribution. (a) Study area. (b) Intersection distribution.

2.2. Extraction of Conflict Indicators

This paper uses TTC and postencroachment time (PET) as the indicators for identifying conflicts between motor vehicles at intersections. They are calculated as follows:

$$\mathrm{TTC} = \frac{\Delta l - L}{v_{sub} - v_{pre}}, \qquad v_{sub} > v_{pre}$$

where Δl is the distance between the preceding and following vehicles; L is the length of the preceding vehicle; v_sub is the speed of the following vehicle; and v_pre is the speed of the preceding vehicle.

$$\mathrm{PET} = t_{sub} - t_{pre} = \frac{\Delta l}{v_{sub}}$$

where t_sub is the time when the following vehicle reaches the conflict point; t_pre is the time when the preceding vehicle reaches the conflict point; Δl is the distance between the preceding and following vehicles at time t_pre; and v_sub is the speed of the following vehicle.
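The two indicators can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `ttc`, `pet`, and `conflict_levels` are hypothetical, distances are assumed to be in meters and speeds in m/s, Δl is assumed to be measured front to front, and TTC is treated as infinite when the follower is not closing in.

```python
import math

def ttc(delta_l: float, length_pre: float, v_sub: float, v_pre: float) -> float:
    """Time-to-collision (s) for a car-following pair.

    delta_l: front-to-front distance between preceding and following vehicle (m)
    length_pre: length of the preceding vehicle (m)
    v_sub, v_pre: speeds of the following and preceding vehicle (m/s)
    """
    closing_speed = v_sub - v_pre
    if closing_speed <= 0:            # follower not faster: no collision course
        return math.inf
    gap = delta_l - length_pre        # bumper-to-bumper spacing
    return gap / closing_speed

def pet(t_sub: float, t_pre: float) -> float:
    """Postencroachment time (s): follower's arrival minus leader's arrival at the conflict point."""
    return t_sub - t_pre

def conflict_levels(value: float, thresholds=(1, 2, 3, 4, 5)) -> list[bool]:
    """Whether an indicator value is at or below each threshold (TTC1..TTC5 or PET1..PET5)."""
    return [value <= th for th in thresholds]

print(ttc(24.5, 4.5, 15.0, 10.0))   # 20 m gap closed at 5 m/s
print(conflict_levels(2.5))
```

One observed TTC of 4 s, for example, counts as a conflict under TTC4 and TTC5 but not under TTC1 to TTC3.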

For the conflict indicators TTC and PET, thresholds must be set to determine whether a conflict occurs and how severe it is. Based on the relevant literature and the scope of this study, thresholds of 1, 2, 3, 4, and 5 s are considered to reflect different levels of conflict severity. Within each 1-min interval, conflicts are described by two variables: the binary safety condition (Bi_conflict, abbreviated as z) and the conflict frequency (Num_conflict, abbreviated as n). Bi_conflict is derived from Num_conflict: z = 1 if n > 0, and z = 0 if n = 0.
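The aggregation from individual conflict events to the per-interval variables n and z can be sketched as follows. The event tuple layout, the explicit list of observed windows (so that intervals without conflicts still appear with n = 0), and the function name `conflict_counts` are all assumptions made for illustration.

```python
from collections import Counter

def conflict_counts(events, windows, threshold, interval_s=60):
    """Num_conflict (n) and Bi_conflict (z) for each (intersection, interval) window.

    events: iterable of (intersection_id, t_seconds, indicator_value) tuples,
            one per conflicting vehicle pair (hypothetical layout)
    windows: iterable of (intersection_id, interval_index) pairs observed in the data
    threshold: indicator threshold in seconds, e.g. 1-5 s
    """
    n = Counter()
    for inter_id, t, value in events:
        if value <= threshold:
            n[(inter_id, int(t // interval_s))] += 1
    # Counter returns 0 for missing keys, so conflict-free windows get n = 0, z = 0
    return {w: (n[w], 1 if n[w] > 0 else 0) for w in windows}

events = [("A", 5.0, 0.8), ("A", 30.0, 2.5), ("A", 70.0, 4.2), ("B", 10.0, 3.1)]
windows = [("A", 0), ("A", 1), ("B", 0)]
print(conflict_counts(events, windows, threshold=3))
```

Raising the threshold can only add events to a window, which is why the share of z = 0 samples in Tables 1 and 2 falls as the threshold increases.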

From the dataset processed in the previous section, conflict data were extracted, yielding 52,244 one-minute samples, of which 31,816 were from signalized intersections and 20,428 were from unsignalized intersections. The traffic conflict data are summarized in Tables 1 and 2, where “Mean” denotes the average value and “SD” the standard deviation.

Table 1. Descriptive statistics of conflict data at signalized intersections.

Conflict indicator	Definition^∗	Num_conflict mean	Num_conflict SD	Bi_conflict z = 0, count (%)
TTC1	TTC ≤ 1 s	0.1674	0.7354	29,980 (94.23%)
TTC2	TTC ≤ 2 s	0.3617	0.8623	27,852 (87.54%)
TTC3	TTC ≤ 3 s	0.9870	1.2316	22,093 (69.44%)
TTC4	TTC ≤ 4 s	1.7059	1.6002	15,278 (48.02%)
TTC5	TTC ≤ 5 s	2.3614	2.1284	11,282 (35.46%)
PET1	PET ≤ 1 s	0.1805	0.7203	29,837 (93.78%)
PET2	PET ≤ 2 s	0.7709	0.9611	23,366 (73.44%)
PET3	PET ≤ 3 s	1.2642	1.3521	18,501 (58.15%)
PET4	PET ≤ 4 s	1.6212	1.5223	15,758 (49.53%)
PET5	PET ≤ 5 s	2.0754	1.7758	14,874 (46.75%)

^∗Represents the number of conflicts within 1-min intervals under a certain threshold of conflict indicators.

Table 2. Descriptive statistics of conflict data at unsignalized intersections.

Conflict indicator	Definition^∗	Num_conflict mean	Num_conflict SD	Bi_conflict z = 0, count (%)
TTC1	TTC ≤ 1 s	0.1846	0.8324	19,002 (93.02%)
TTC2	TTC ≤ 2 s	0.3679	0.7357	17,774 (87.01%)
TTC3	TTC ≤ 3 s	1.0655	1.1087	13,388 (65.54%)
TTC4	TTC ≤ 4 s	2.0059	1.6942	9170 (44.89%)
TTC5	TTC ≤ 5 s	2.2068	1.9387	7730 (37.84%)
PET1	PET ≤ 1 s	0.1974	0.8342	19,031 (93.16%)
PET2	PET ≤ 2 s	0.9634	0.8977	14,269 (69.85%)
PET3	PET ≤ 3 s	1.1231	1.2143	11,123 (54.45%)
PET4	PET ≤ 4 s	1.9798	1.5753	10,267 (50.26%)
PET5	PET ≤ 5 s	2.2769	1.5532	9693 (47.45%)

^∗Represents the number of conflicts within 1-min intervals under a certain threshold of conflict indicators.

2.3. Extraction of Traffic State Variables

In line with the research objectives of this paper and the characteristics of the selected dataset, three types of traffic state variables were chosen: indicators based on spatiotemporal trajectories (Type 1), first-order indicators (Type 2), and second-order indicators (Type 3). Following Edie's generalized definitions of traffic variables, for any spatiotemporal region S, the Type 1 indicators can be derived as follows:

$$K = \frac{\sum_{i=1}^{n} t_i}{\Delta x \, \Delta t}, \qquad Q = \frac{\sum_{i=1}^{n} d_i}{\Delta x \, \Delta t}, \qquad V = \frac{Q}{K} = \frac{\sum_{i=1}^{n} d_i}{\sum_{i=1}^{n} t_i}$$

where K is the density, veh/km; Q is the traffic flow, veh/h; V is the traffic speed, km/h; Δx is the spatial extent of the intersection, m; Δt is the time interval, s; n is the number of vehicles, veh; t_i is the travel time of vehicle i within S, s; and d_i is the travel distance of vehicle i within S, m. The results are converted to veh/km, veh/h, and km/h as needed.
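Edie's definitions reduce to sums over the vehicles present in the region. The sketch below is illustrative only: the function name `edie_state` is hypothetical, and it assumes the per-vehicle travel times and distances inside the region have already been clipped from the trajectories.

```python
def edie_state(travel_times_s, travel_dists_m, dx_m, dt_s):
    """Edie's generalized density, flow, and speed for one space-time region.

    travel_times_s: time t_i (s) each vehicle spends inside the region
    travel_dists_m: distance d_i (m) each vehicle travels inside the region
    dx_m, dt_s: spatial extent (m) and duration (s) of the region
    Returns (K in veh/km, Q in veh/h, V in km/h).
    """
    area = dx_m * dt_s                        # space-time "area" of the region, m*s
    k = sum(travel_times_s) / area            # veh/m
    q = sum(travel_dists_m) / area            # veh/s
    v = q / k if k > 0 else float("nan")      # m/s, equal to sum(d_i) / sum(t_i)
    return k * 1000.0, q * 3600.0, v * 3.6

# Two vehicles in a 100 m x 60 s region: K = 5 veh/km, Q = 120 veh/h, V = 24 km/h
print(edie_state([10, 20], [100, 100], dx_m=100, dt_s=60))
```

Computing V as the ratio Q/K rather than averaging spot speeds is what makes the three quantities mutually consistent (Q = K·V) by construction.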

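Furthermore, by dividing the time interval into very short slices, the spatiotemporal region S can be partitioned into m small, nonoverlapping subregions S_i such that

$$S = \bigcup_{i=1}^{m} S_i, \qquad S_i \cap S_j = \varnothing \quad (i \ne j)$$

where m represents the number of subregions.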

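Consequently, the indicators of the other two categories, the first-order indicators (Type 2) and the second-order indicators (Type 3), can be obtained as follows:

$$Me = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad Sd = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(x_i - Me\right)^2}$$

$$Me_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{i,j}, \qquad \overline{Me} = \frac{1}{m}\sum_{i=1}^{m} Me_i$$

where x_i represents the value of a specific index within the i-th subregion; Me is the mean of x_i; Sd is the standard deviation of x_i; x_i,j is the index value of the j-th vehicle within the i-th subregion; n_i is the number of vehicles in the i-th subregion; Me_i is the average value of the index within the i-th subregion; and $\overline{Me}$ is the mean of Me_i. The standard deviation of Me_i across subregions and the mean of the within-subregion standard deviations are computed analogously.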
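The first- and second-order indicators can be sketched with the standard library. This assumes population standard deviations (dividing by m or n_i) and that empty subregions are skipped; both choices, and the function names `first_order` and `second_order`, are illustrative assumptions rather than the authors' specification.

```python
from statistics import fmean, pstdev

def first_order(x):
    """Type 2: mean (Me) and SD (Sd) of a subregion-level series x_1..x_m,
    e.g. vehicle counts (Nm, Ns) or headways (Hm, Hs)."""
    return fmean(x), pstdev(x)

def second_order(values_by_subregion):
    """Type 3 indicators from vehicle-level values x_ij grouped by subregion i.

    Returns (mean of Me_i, SD of Me_i, mean of within-subregion SD); with
    vehicle speeds these correspond to Vmm, Vms, and Vsm in Table 3.
    """
    groups = [g for g in values_by_subregion if len(g) > 0]   # skip empty subregions (assumption)
    me_i = [fmean(g) for g in groups]     # Me_i: mean within subregion i
    sd_i = [pstdev(g) for g in groups]    # spread within subregion i
    return fmean(me_i), pstdev(me_i), fmean(sd_i)

speeds = [[30, 40], [50, 50], [20]]       # km/h, three subregions
print(second_order(speeds))               # Vmm, Vms, Vsm
```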

The descriptions of the three types of traffic state variables are presented in Table 3.

Table 3. Definitions and explanations of traffic state variables.

Type	Variable symbol	Variable explanation	Unit
Type 1	Q	Volume	veh/h
	K	Density	veh/km
	V	Speed	km/h

Type 2	Hm	Headway mean	s
	Hs	Headway standard deviation	s
	Nm	Vehicle count mean	—
	Ns	Vehicle count standard deviation	—

Type 3	Vmm	Mean of average vehicle speed	km/h
	Vsm	Mean of vehicle speed standard deviation	km/h
	Vms	Standard deviation of average vehicle speed	km/h
	Amm	Mean of average vehicle acceleration	m/s²

3. Models and Methods

3.1. Binary Logistic Model

Due to the inherent lack of interpretability in machine learning classifiers, it is essential to prefilter the factors influencing the occurrence of traffic conflicts. Therefore, this study employs a binary logistic regression model to conduct a preliminary screening and identify statistically significant factors associated with conflict occurrence. By incorporating logistic regression, the analysis can effectively quantify the influence of each factor while enhancing the transparency and interpretability of the overall modeling framework. The specific expression of the model is presented as follows:

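$$p = \frac{\exp\left(\sigma_0 + \sum_i \sigma_i x_i\right)}{1 + \exp\left(\sigma_0 + \sum_i \sigma_i x_i\right)}, \qquad \ln\frac{p}{1 - p} = \sigma_0 + \sum_i \sigma_i x_i$$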

where p represents the probability of conflicts occurring in the sample data; x_i stands for the selected traffic state variable; σ_i represents the regression coefficient of the variable x_i; and σ₀ refers to the constant.

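In addition, to quantify the association between conflicts and the selected traffic state variables, the odds ratio (OR) is introduced. The OR of variable x_i is the factor by which the odds of a conflict, p/(1 − p), are multiplied for each one-unit increase in x_i, holding the other variables constant:

$$\mathrm{OR}_i = \frac{p(x_i + 1)\,/\,\left[1 - p(x_i + 1)\right]}{p(x_i)\,/\,\left[1 - p(x_i)\right]} = e^{\sigma_i}$$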

In the binary logistic regression model, OR serves as a metric to measure the degree of influence of a particular independent variable. The range of OR values is from 0 to infinity, with specific interpretations as follows: when OR = 1, it indicates that the independent variable has no effect on the occurrence of the dependent variable, implying that the two are unrelated; when OR > 1, it suggests that the independent variable is a risk factor, indicating a positive correlation between the two; and when OR < 1, it indicates that the independent variable is a protective factor, implying a negative correlation between the two.
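To make the link between the coefficients σ_i and the ORs concrete, the sketch below fits the logistic model by Newton-Raphson (iteratively reweighted least squares) in NumPy and converts the coefficients to ORs with Wald 95% intervals. In practice a statistical package would normally be used; the function names `fit_logit` and `odds_ratios` are hypothetical, and the data in the usage lines are synthetic.

```python
import numpy as np

def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Fit ln(p/(1-p)) = sigma_0 + sum_i sigma_i x_i by Newton-Raphson (IRLS).

    X: (n, k) matrix of traffic state variables; y: (n,) array of 0/1 labels z.
    Returns coefficients [sigma_0, ..., sigma_k] and their standard errors.
    """
    X1 = np.column_stack([np.ones(len(X)), X])    # intercept column for sigma_0
    sigma = np.zeros(X1.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X1 @ sigma)))   # fitted conflict probability
        w = p * (1.0 - p)                          # IRLS weights
        info = X1.T @ (X1 * w[:, None])            # Fisher information matrix
        step = np.linalg.solve(info, X1.T @ (y - p))
        sigma = sigma + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X1 @ sigma)))
    info = X1.T @ (X1 * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))    # standard errors from inverse information
    return sigma, se

def odds_ratios(sigma, se, z_crit=1.96):
    """OR_i = exp(sigma_i) with a Wald 95% interval; the intercept is skipped."""
    b, s = sigma[1:], se[1:]
    return np.exp(b), np.exp(b - z_crit * s), np.exp(b + z_crit * s)

# Synthetic check: one risk factor (sigma = 0.8) and one protective factor (sigma = -0.5)
rng = np.random.default_rng(0)
X = rng.normal(size=(20000, 2))
true = np.array([-1.0, 0.8, -0.5])
y = (rng.random(20000) < 1.0 / (1.0 + np.exp(-(true[0] + X @ true[1:])))).astype(float)
sigma, se = fit_logit(X, y)
print(odds_ratios(sigma, se)[0])   # close to exp(0.8) and exp(-0.5)
```

The Wald statistic σ_i / SE_i from the same fit is one common way to decide which variables are significant enough to keep for the machine learning stage.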

3.2. Machine Learning Algorithms

3.2.1. Algorithm Selection

In machine learning, learning tasks can be broadly divided into supervised learning and unsupervised learning. Supervised learning relies on labeled data, where each sample has a known label; the model is trained on these labeled samples and then used to generate predictions. When the model's output variable is continuous, the task is a regression problem; when the output variable is discrete, it is a classification problem. In contrast, unsupervised learning deals with unlabeled samples, so labels cannot be used in model training. Instead, unsupervised learning analyzes the hidden structure within the data to uncover its underlying patterns and characteristics. Based on these definitions, the real-time prediction of intersection conflict occurrence studied in this paper is a binary classification problem within supervised learning. To address it, the study employs four machine learning algorithms: support vector machine (SVM), K-nearest neighbors (KNN), RF, and XGBoost.

SVM is a robust classification algorithm that can effectively handle high-dimensional data, which suits real-time traffic conflict prediction. SVMs have already been applied to real-time prediction of crashes and traffic flow. For instance, Li et al. [31] compared SVMs with the negative binomial model for predicting highway crashes and found that the SVM model fitted the data better.

The KNN algorithm is simple, easy to understand, and easy to implement; it often performs well in classification problems and is suitable for real-time prediction. KNN has recently been applied in traffic prediction. For example, Lin et al. [32] predicted conflicts over 5- and 10-min intervals and found KNN to be effective.

RF is an ensemble learning method based on Bagging. Data for real-time prediction of traffic conflicts often contain complex nonlinear relationships, which RF algorithms can effectively handle. For instance, Hu et al. [27] proposed a real-time traffic safety assessment method that combines traffic state and conflict based on high-resolution trajectory data. The RF prediction model achieved optimal performance using resampling techniques.

The XGBoost algorithm is an ensemble learning method based on Boosting, characterized by its high accuracy and robust performance on large-scale datasets. For real-time prediction of traffic conflicts, the XGBoost algorithm is capable of processing large-scale and diverse feature data, providing predictions with high precision. For example, Yuan et al. [28] explored the relationship between conflicts and traffic flow characteristics under the premise of considering heterogeneity, and the results indicated that XGBoost trained on the undersampled dataset was the optimal model.

The advantages and disadvantages of the four algorithms are presented in Table 4.

Table 4. Advantages and disadvantages of four machine learning algorithms.

Algorithm	Advantage	Disadvantage
SVM	Strong nonlinear processing capability; handles high-dimensional data effectively	Not natively suited to multiclass problems or to training on large-scale datasets; sensitive to missing data
KNN	The algorithm is simple and easy to implement, making it suitable for large-scale data	Computationally intensive, the presence of noise data affects prediction accuracy, and the prediction speed is slow
RF	The training speed is fast, overfitting is minimal, it can handle both categorical and continuous prediction variables, and the model variance is low	Not suitable for multiclass classification problems and sensitive to noise
XGBoost	Easy to use, fast, effective, and regularized to limit overfitting	High memory and time consumption; not well suited to data with extremely high-dimensional features

3.2.2. Data Preprocessing

Within the dataset, conflict data are imbalanced: the number of samples without conflicts greatly exceeds the number with conflicts, particularly at low thresholds. With imbalanced data, a classifier's predictions tend to be biased toward the majority class, so the minority class is predicted poorly. Therefore, before conflict prediction, the Borderline SMOTE algorithm is used to oversample the minority class until both classes are equal in size. At higher thresholds (e.g., TTC4 and TTC5), the minority class is z = 0 rather than z = 1. Table 5 lists the sample counts before and after oversampling.
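The resampling idea can be sketched as a simplified Borderline-SMOTE1 in NumPy: only minority samples whose m nearest neighbors are at least half, but not all, majority samples ("danger" samples) seed new points, which are interpolated toward one of their k nearest minority neighbors. In practice a library implementation such as imbalanced-learn's `BorderlineSMOTE` would typically be used; the function below, its defaults (m = 10, k = 5), random seed selection, and plain-SMOTE fallback are illustrative assumptions.

```python
import numpy as np

def borderline_smote(X, y, m=10, k=5, seed=0):
    """Balance a binary dataset by oversampling its minority class (simplified Borderline-SMOTE1)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]          # may be z = 1 or z = 0
    n_new = counts.max() - counts.min()
    idx_min = np.flatnonzero(y == minority)
    X_min = X[idx_min]

    # Brute-force distances from each minority sample to all samples (O(n^2) memory).
    d_all = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    d_all[np.arange(len(idx_min)), idx_min] = np.inf   # exclude the point itself
    nn_all = np.argsort(d_all, axis=1)[:, :m]
    n_major = np.sum(y[nn_all] != minority, axis=1)
    danger = np.flatnonzero((n_major >= m / 2) & (n_major < m))
    if danger.size == 0:                               # fall back to plain SMOTE seeds
        danger = np.arange(len(idx_min))

    # k nearest minority neighbors of each minority sample.
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)
    nn_min = np.argsort(d_min, axis=1)[:, :k]

    seeds = rng.choice(danger, size=n_new)
    partners = nn_min[seeds, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                       # interpolation factor in [0, 1)
    X_new = X_min[seeds] + gap * (X_min[partners] - X_min[seeds])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (300, 2)), rng.normal(1.5, 1, (40, 2))])
y = np.array([0] * 300 + [1] * 40)
Xr, yr = borderline_smote(X, y)
print(np.bincount(yr))   # both classes now have 300 samples
```

Focusing synthesis near the class boundary is the design choice that distinguishes Borderline SMOTE from plain SMOTE, which interpolates from every minority sample.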

Table 5. Sample sizes before and after oversampling.

Intersection type	Conflict indicator	Before: z = 0	Before: z = 1	After: z = 0	After: z = 1
Signalized intersections	TTC1	29,980	1836	29,980	29,980
	TTC2	27,852	3964	27,852	27,852
	TTC3	22,093	9723	22,093	22,093
	TTC4	15,278	16,538	16,538	16,538
	TTC5	11,282	20,534	20,534	20,534
	PET1	29,837	1979	29,837	29,837
	PET2	23,366	8450	23,366	23,366
	PET3	18,501	13,315	18,501	18,501
	PET4	15,758	16,058	16,058	16,058
	PET5	14,874	16,942	16,942	16,942

Unsignalized intersections	TTC1	19,002	1426	19,002	19,002
	TTC2	17,774	2654	17,774	17,774
	TTC3	13,388	7040	13,388	13,388
	TTC4	9170	11,258	11,258	11,258
	TTC5	7730	12,698	12,698	12,698
	PET1	19,031	1397	19,031	19,031
	PET2	14,269	6159	14,269	14,269
	PET3	11,123	9305	11,123	11,123
	PET4	10,267	10,161	10,267	10,267
	PET5	9693	10,735	10,735	10,735

3.2.3. Model Training and Hyperparameter Optimization

To mitigate overfitting, improve generalization to unseen data, and reduce the influence of random factors on the prediction results, this study used grid search combined with 5-fold cross-validation to tune the hyperparameters of the four machine learning models. In 5-fold cross-validation, the data are divided into five equal parts; each part serves once as the validation set while the other four are used for training. For each candidate hyperparameter combination, the five validation scores are averaged, and the combination with the best average score is selected. The final hyperparameters are shown in Table 6.
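The selection procedure can be sketched for one model and one hyperparameter. The NumPy KNN below and the function names `knn_predict` and `grid_search_cv` are hypothetical; the grid uses odd values of k so that majority votes on 0/1 labels cannot tie, and the data are synthetic.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Majority vote among the k nearest training samples (Euclidean distance, 0/1 labels)."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

def grid_search_cv(X, y, k_grid, n_folds=5, seed=0):
    """Return the k with the highest mean validation accuracy over n_folds folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    mean_scores = {}
    for k in k_grid:
        scores = []
        for i in range(n_folds):
            val = folds[i]                                    # held-out fold
            train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
            pred = knn_predict(X[train], y[train], X[val], k)
            scores.append(np.mean(pred == y[val]))
        mean_scores[k] = float(np.mean(scores))              # average over the folds
    best_k = max(mean_scores, key=mean_scores.get)
    return best_k, mean_scores

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (150, 2)), rng.normal(3, 1, (150, 2))])
y = np.array([0] * 150 + [1] * 150)
best_k, scores = grid_search_cv(X, y, [1, 3, 5, 7, 9])
print(best_k, scores[best_k])
```

If oversampling is applied before splitting, synthetic points built from validation samples can leak into the training folds; resampling inside each training fold avoids this.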

Table 6. The value of hyperparameters.

Model	Hyperparameter	Value
SVM	C	5
	Kernel	RBF
	Gamma	0.005

KNN	n_neighbors	7
	Metric	Euclidean distance

RF	n_estimators	150
	max_depth	12
	min_samples_split	5
	min_samples_leaf	3

XGBoost	learning_rate	0.07
	max_depth	6
	Subsample	0.8
	colsample_bytree	0.7
	Gamma	0.2
	Lambda	1
	Alpha	0.5

3.2.4. Model Evaluation

In evaluating the real-time prediction model of intersection conflict occurrence based on machine learning, it is essential to select appropriate evaluation methods and metrics to obtain effective and reliable results. Before evaluation, the confusion matrix must be computed, with its explanation provided in Table 7.

Table 7. Confusion matrix.

	Actual: positive (z = 1)	Actual: negative (z = 0)	Total
Predicted: positive (z = 1)	TP	FP	TP + FP
Predicted: negative (z = 0)	FN	TN	FN + TN
Total	TP + FN	FP + TN	TP + FN + FP + TN

Here, TP denotes the correct prediction of conflict occurrence z = 1, TN represents the correct prediction of safe conditions z = 0, FP indicates samples that did not experience a conflict but were incorrectly predicted as having a conflict, and FN refers to conflicting samples that were incorrectly predicted as being free of conflict.

Through the confusion matrix, three model evaluation metrics required in this paper can be calculated as follows:

(1) Accuracy: the percentage of correctly predicted samples among all test samples:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$

(2) False alarm rate (FAR): the percentage of incorrect predictions among the samples predicted as positive (conflict):

$$\mathrm{FAR} = \frac{FP}{TP + FP}$$

(3) Missed alarm rate (MAR): the percentage of incorrect predictions among the samples predicted as negative (no conflict):

$$\mathrm{MAR} = \frac{FN}{FN + TN}$$
Here, a higher Accuracy and lower FAR and MAR indicate superior performance of the intersection conflict prediction model.
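The three metrics follow directly from the confusion-matrix counts. In this sketch (hypothetical function name `evaluation_metrics`), FAR and MAR follow the wording of the text, that is, errors among samples predicted as conflicts and among samples predicted as safe. Many real-time safety studies instead normalize by the actual classes, FAR = FP/(FP + TN) and MAR = FN/(FN + TP); the denominators should be swapped if that convention is intended.

```python
def evaluation_metrics(tp, fp, fn, tn):
    """Accuracy, FAR, and MAR from confusion-matrix counts (layout of Table 7)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    far = fp / (tp + fp) if tp + fp else 0.0     # wrong among predicted conflicts
    mar = fn / (fn + tn) if fn + tn else 0.0     # wrong among predicted safe samples
    return accuracy, far, mar

print(evaluation_metrics(tp=80, fp=20, fn=10, tn=90))
```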

3.3. Bayesian Spatial Poisson Model

The Poisson model is commonly used to analyze traffic-related count data. It assumes that events occur randomly and independently of one another within a given time or spatial range at a constant average rate. The Poisson model is simple, easy to understand, and widely applicable, and Poisson-based models have been used extensively in crash frequency analysis because they capture the randomness and discreteness of crash counts. The Poisson-lognormal model adds a random error term to the log-mean to account for overdispersion and heteroscedasticity, which makes it suitable when the Poisson distribution alone is inadequate because the data are overdispersed.

Conflict counts at neighboring intersections are often spatially correlated, which can reduce the accuracy of intersection-level conflict frequency prediction. This paper therefore introduces a conditional autoregressive (CAR) term into a Bayesian spatial Poisson-lognormal model to predict conflict frequency at intersections. Let Y_i be the conflict frequency at the i-th intersection; the model structure is as follows:

$$Y_i \sim \mathrm{Poisson}(\lambda_i), \qquad \ln \lambda_i = \beta_0 + X_i \beta + \theta_i + \phi_i$$

where λ_i is the intensity parameter, i.e., the expected number of conflicts at intersection i; X_i is the vector of explanatory variables; β₀ is the intercept to be estimated; β is the vector of coefficients to be estimated; θ_i is the unstructured random effect capturing unobserved heterogeneity; and ϕ_i is the spatially structured random effect.

Compared with the maximum likelihood estimation method, which only uses the collected sample data for parameter estimation without considering other information about the parameters, Bayesian estimation emphasizes the consideration of uncertainty, allowing for a more comprehensive and flexible inference of models and parameters. In essence, the core idea of Bayesian estimation is to use both sample information and prior information to solve for the posterior information of the parameters. The fundamental Bayesian formula is presented as follows:

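$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{\int f(x \mid \theta)\,\pi(\theta)\,d\theta}$$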

where f(x|θ) is the likelihood function; π(θ) is the prior distribution of the parameter; and π(θ|x) is the posterior distribution of the parameter.
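The Bayes formula can be illustrated with a toy case that has a closed-form answer: Poisson counts with a Gamma(a, b) prior (shape a, rate b) on the rate λ give a Gamma(a + Σy, b + n) posterior. The sketch compares that result with a direct grid evaluation of likelihood × prior. It only illustrates the formula; the spatial model in this section is not conjugate and would be estimated by MCMC, and the function names and prior values are hypothetical.

```python
import numpy as np

def gamma_poisson_posterior(y, a, b):
    """Conjugate posterior (shape, rate) for lambda given Poisson counts y and a Gamma(a, b) prior."""
    return a + float(np.sum(y)), b + len(y)

def grid_posterior_mean(y, a, b, grid):
    """Posterior mean of lambda from posterior ∝ likelihood × prior on a uniform grid."""
    y = np.asarray(y, dtype=float)
    log_lik = y.sum() * np.log(grid) - len(y) * grid     # Poisson log-likelihood, log(y!) dropped
    log_prior = (a - 1.0) * np.log(grid) - b * grid      # Gamma log-density up to a constant
    log_post = log_lik + log_prior
    w = np.exp(log_post - log_post.max())                # unnormalized posterior weights
    return float(np.sum(grid * w) / np.sum(w))           # normalizing constant cancels

y = [3, 5, 2, 4, 6]                                      # conflicts in five intervals
shape, rate = gamma_poisson_posterior(y, 2.0, 1.0)
grid = np.linspace(1e-6, 20, 200001)
print(shape / rate, grid_posterior_mean(y, 2.0, 1.0, grid))   # both about 3.667
```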

In Bayesian estimation, noninformative prior distributions are assigned to β and θ_i.

()

The spatial correlation term ϕ_i is prespecified using a CAR model.

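$$\phi_i \mid \phi_{j \ne i} \sim N\!\left(\frac{\sum_{j \ne i} \omega_{ij}\,\phi_j}{\sum_{j \ne i} \omega_{ij}},\ \frac{1}{\tau_c \sum_{j \ne i} \omega_{ij}}\right)$$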

When intersections i and j are adjacent, the value of ω_ij is 1; when intersections i and j are not adjacent, the value of ω_ij is 0.
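Under an intrinsic CAR prior (assumed here to take the standard form ϕ_i | ϕ_{j≠i} ~ N(Σ_j ω_ij ϕ_j / Σ_j ω_ij, 1/(τ_c Σ_j ω_ij))), each spatial effect is pulled toward the average of its neighbors, with a variance that shrinks as the number of neighbors grows. The sketch computes these full-conditional moments from a 0/1 adjacency matrix, as a Gibbs-type sampler would at each step; the function name `car_conditionals` is hypothetical, and intersections without neighbors are rejected because the conditional is undefined for them.

```python
import numpy as np

def car_conditionals(W, phi, tau_c):
    """Full-conditional mean and variance of each phi_i under an intrinsic CAR prior.

    W: (N, N) symmetric 0/1 adjacency matrix (omega_ij = 1 for adjacent intersections)
    phi: current spatial effects; tau_c: CAR precision
    """
    W = np.asarray(W, dtype=float)
    n_neighbours = W.sum(axis=1)
    if np.any(n_neighbours == 0):
        raise ValueError("every intersection needs at least one neighbour")
    mean = (W @ phi) / n_neighbours          # average of neighbouring effects
    var = 1.0 / (tau_c * n_neighbours)       # more neighbours -> tighter conditional
    return mean, var

# Three intersections along a corridor: 0 - 1 - 2
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
mean, var = car_conditionals(W, np.array([0.2, -0.1, 0.4]), tau_c=2.0)
print(mean, var)
```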

To help the model converge, this paper adopts an established prior form for the precision parameters:

()

where τ_θ is the precision of the random effect θ_i, reflecting the overdispersion caused by the random (unstructured) effect, and τ_c is the precision of the CAR term ϕ_i, reflecting the overdispersion caused by the spatial effect.

4. Results and Discussion

4.1. Analysis of Factors Influencing Traffic Conflicts

Given the diverse nature of the selected variables, there may exist strong correlations among them, leading to multicollinearity. This can adversely affect the accuracy of the constructed model, thereby impacting the analysis of factors influencing traffic conflicts. Consequently, before modeling the factors affecting traffic conflicts, a multicollinearity test is conducted using the Pearson correlation coefficient method. A correlation coefficient heatmap for the 13 variables is depicted in Figure 2.

The heatmap shows five pairs of variables in deep red: flow Q and density K; flow Q and velocity V; the average value of vehicle average speed Vmm and velocity V; the standard deviation of headway Hs and the average headway Hm; and density K and the average number of vehicles Nm. Their correlation coefficients exceed 0.7, indicating strong correlations. To improve model accuracy and better analyze the factors influencing conflict occurrence, four traffic state variables were excluded: flow Q, the average value of vehicle average speed Vmm, the standard deviation of headway Hs, and the average number of vehicles Nm. Apart from these five pairs, the correlation coefficients among the remaining traffic state variables are all below 0.5, and most are below 0.1. The remaining nine traffic state variables therefore do not exhibit multicollinearity and can be used as independent variables in the subsequent modeling.
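The screening rule described above (flagging pairs with |r| > 0.7) can be sketched with NumPy as follows. The data are synthetic, and the function name `high_correlation_pairs` is hypothetical. The code only flags candidate pairs; deciding which variable in each pair to drop remains a modeling judgment, as in the text.

```python
import numpy as np


def high_correlation_pairs(X, names, threshold=0.7):
    """Return (name_i, name_j, r) for variable pairs whose |Pearson r| exceeds threshold."""
    R = np.corrcoef(X, rowvar=False)   # columns of X are variables
    pairs = []
    k = len(names)
    for i in range(k):
        for j in range(i + 1, k):
            if abs(R[i, j]) > threshold:
                pairs.append((names[i], names[j], round(float(R[i, j]), 3)))
    return pairs


rng = np.random.default_rng(1)
n = 1000
K = rng.uniform(10, 60, n)            # synthetic density values
Q = 0.8 * K + rng.normal(0, 2, n)     # synthetic flow, strongly tied to density
Hm = rng.uniform(1, 5, n)             # synthetic headway, unrelated to the others
X = np.column_stack([K, Q, Hm])
pairs = high_correlation_pairs(X, ["K", "Q", "Hm"])
print(pairs)
```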

A binary logistic regression model was constructed to analyze the significant factors affecting the occurrence of conflicts, with the results presented in Table 8.

Table 8. Results of the binary logistic regression model.

Model		K	V	Hm	Ns	Vsm	Vms	Amm	Asm	Ams
TTC1	Coefficient	0.369¹	−0.373¹	0.001	0.038	−0.053	0.105	0.429²	0.265	0.046
	p	0.001	0.001	0.876	0.463	0.621	0.265	0.005	0.071	0.604
	OR	1.481	0.663	1.001	1.043	0.905	1.114	1.536	1.279	1.054

TTC2	Coefficient	0.419¹	−0.351¹	−0.001	0.059	−0.039	0.182²	0.314³	0.323	0.098
	p	0.001	0.001	0.756	0.221	0.544	0.005	0.021	0.025	0.074
	OR	1.543	0.711	0.998	1.063	0.924	1.183	1.428	1.424	1.105

TTC3	Coefficient	0.503¹	−0.339¹	−0.002	0.093³	0.137²	0.299¹	0.421¹	0.158	0.064
	p	0.001	0.001	0.325	0.023	0.003	0.001	0.001	0.231	0.118
	OR	1.713	0.736	0.996	1.098	1.136	1.324	1.554	1.201	1.097

TTC4	Coefficient	0.635¹	−0.254¹	−0.004	0.169¹	0.254¹	0.394¹	0.453¹	0.124	0.073
	p	0.001	0.001	0.374	0.001	0.001	0.001	0.001	0.153	0.068
	OR	1.916	0.779	0.994	1.222	1.268	1.499	1.679	1.165	1.063

TTC5	Coefficient	0.707¹	−0.229¹	−0.008²	0.383¹	0.331¹	0.472¹	0.514¹	0.117	0.059
	p	0.001	0.001	0.009	0.001	0.001	0.001	0.001	0.324	0.089
	OR	2.132	0.793	0.992	1.492	1.451	1.698	1.733	1.154	1.063

PET1	Coefficient	0.461¹	0.183³	0.009	0.323¹	−0.113	−0.006	0.231	0.124	0.063
	p	0.001	0.032	0.425	0.001	0.164	0.868	0.352	0.164	0.565
	OR	1.684	1.185	1.008	1.424	0.913	0.993	1.232	1.178	1.072

PET2	Coefficient	0.498¹	−0.004	−0.002	0.203¹	0.063	0.247¹	0.342¹	0.102	−0.028
	p	0.001	0.894	0.265	0.001	0.143	0.001	0.001	0.254	0.533
	OR	1.702	0.994	0.996	1.197	1.072	1.259	1.471	1.109	0.968

PET3	Coefficient	0.523¹	−0.123¹	−0.008²	0.186¹	0.178¹	0.376¹	0.406¹	0.114	−0.047
	p	0.001	0.001	0.009	0.001	0.001	0.001	0.001	0.163	0.137
	OR	1.732	0.894	0.992	1.189	1.176	1.493	1.521	1.149	0.949

PET4	Coefficient	0.546¹	−0.164¹	−0.012¹	0.223¹	0.216¹	0.453¹	0.413¹	0.131	−0.034
	p	0.001	0.001	0.001	0.001	0.001	0.001	0.001	0.094	0.413
	OR	1.721	0.876	0.990	1.211	1.202	1.668	1.539	1.183	0.953

PET5	Coefficient	0.577¹	−0.183¹	−0.009¹	0.241¹	0.225¹	0.439¹	0.443¹	0.158	−0.026
	p	0.001	0.001	0.001	0.001	0.001	0.001	0.001	0.132	0.667
	OR	1.764	0.857	0.992	1.252	1.213	1.647	1.652	1.203	0.960

¹Indicates a significance level of α = 0.001.
²Indicates a significance level of α = 0.01.
³Indicates a significance level of α = 0.05.

Table 8 shows that the traffic state variables significantly associated with conflict occurrence differ across conflict indicators and across levels of conflict severity (i.e., thresholds). Higher thresholds tend to yield more significant explanatory variables. Specifically, for TTC1 (TTC threshold of 1 s) through TTC5 (TTC threshold of 5 s), the numbers of significant factors influencing conflict occurrence are 3, 4, 6, 6, and 7, respectively. For PET1 (PET threshold of 1 s) through PET5 (PET threshold of 5 s), they are 3, 4, 7, 7, and 7, respectively.

The regression results show that, regardless of the conflict indicator (TTC or PET) and its threshold, density K and velocity V are consistently significant factors influencing conflict occurrence. The effects of the other traffic state variables, however, vary with conflict severity. A plausible explanation is that severity governs the amount of available data: the more severe the conflict (i.e., the lower the threshold), the sparser the corresponding dataset. This sparsity makes it harder to identify the traffic variables that significantly affect conflict occurrence, which leads to differences in the significant factors across thresholds.

For TTC and PET with thresholds of 3 s, 4 s, and 5 s, the significant variables are largely consistent: density K, velocity V, the standard deviation of vehicle count Ns, the average value of vehicle speed standard deviation Vsm, the standard deviation of vehicle average speed Vms, and the average value of vehicle average acceleration Amm all show statistically significant correlations with conflict occurrence. Whenever K, Ns, Vsm, Vms, and Amm are significant, their coefficients are positive, whereas velocity V and the average headway Hm have negative effects on conflict occurrence (with the exception of velocity V in PET1). These results indicate that the likelihood of conflicts within an intersection increases markedly when vehicle density is high, average acceleration is large, and the vehicle count varies substantially. In addition, the risk of conflicts rises significantly as the temporal (Vms) and spatial (Vsm) variability of vehicle speeds within the intersection increases. Conversely, when vehicles travel at higher speeds with larger headways, traffic conditions within the intersection are good, and the traffic flow is stable, the driving environment is safer and conflicts are less likely. However, such a state implies that few vehicles are waiting at the signal during the current cycle, so the sparse traffic can pass freely without queuing.
From another perspective, density K, velocity V, and the average value of vehicle average acceleration Amm are the three most important factors influencing conflicts: compared with the other traffic state variables, they are significantly correlated with conflict occurrence across the 10 TTC and PET models (except Amm and V in PET1).
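For readers who want to see what fitting the binary logistic regression in this section involves, the following is a minimal Newton-Raphson (iteratively reweighted least squares) sketch with two-sided Wald p-values. It runs on simulated data with made-up coefficients; it is not the authors' code, and in practice a statistics package would normally be used.

```python
import math

import numpy as np


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Fit a binary logistic regression by Newton-Raphson.

    X : (n, p) design matrix WITHOUT an intercept column
    y : (n,) array of 0/1 outcomes (1 = conflict observed in the interval)
    Returns coefficients (intercept first), standard errors, and Wald p-values.
    """
    X1 = np.column_stack([np.ones(len(X)), X])   # prepend the intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X1 @ beta)))           # fitted probabilities
        score = X1.T @ (y - p)                           # gradient of the log-likelihood
        info = X1.T @ (X1 * (p * (1.0 - p))[:, None])    # Fisher information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X1 @ beta)))
    info = X1.T @ (X1 * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    z = beta / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    pvals = np.array([math.erfc(abs(zi) / math.sqrt(2.0)) for zi in z])
    return beta, se, pvals


rng = np.random.default_rng(7)
n = 20_000
X = rng.normal(size=(n, 3))                     # three standardized synthetic predictors
true_beta = np.array([-1.0, 0.8, -0.5, 0.0])    # intercept, x1, x2, x3 (x3 has no effect)
eta = true_beta[0] + X @ true_beta[1:]
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

beta_hat, se_hat, p_hat = fit_logistic(X, y)
for name, b, p in zip(["intercept", "x1", "x2", "x3"], beta_hat, p_hat):
    print(f"{name}: coef={b:.3f}, p={p:.4f}")
```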

4.2. Real-Time Prediction of Conflict Occurrence Based on Machine Learning

Four machine learning algorithms—SVM, KNN, RF, and XGBoost—are selected to construct real-time prediction models for conflict occurrence at signalized and unsignalized intersections, respectively. The model prediction results are presented in Tables 9 and 10.

Table 9. Comparison of conflict prediction results for signalized intersections using machine learning algorithms.

	Accuracy (%)				FAR (%)				MAR (%)
	SVM	KNN	RF	XGB	SVM	KNN	RF	XGB	SVM	KNN	RF	XGB
TTC1	84.2	77.3	91.8	84.2	38.5	40.3	0.0	8.9	16.8	16.8	5.5	15.6
TTC2	70.5	74.8	90.9	83.8	30.2	31.5	8.4	10.7	29.6	21.9	6.4	17.4
TTC3	69.7	72.1	89.3	82.5	31.3	28.3	7.8	11.3	29.9	23.8	6.8	18.9
TTC4	69.3	72.6	87.6	81.6	30.8	29.4	7.4	10.6	30.4	24.5	5.9	19.3
TTC5	68.9	70.5	86.4	78.7	32.1	24.8	7.7	12.5	30.2	28.4	6.6	20.8
TTC (mean)	72.5	73.5	89.2	82.1	32.6	30.9	6.3	10.8	27.4	23.1	6.3	18.4
PET1	77.3	75.4	89.5	82.4	46.2	42.8	24.1	23.6	22.9	20.1	8.3	18.3
PET2	65.8	68.2	83.7	79.7	44.7	36.5	22.8	25.3	33.4	27.3	13.8	25.1
PET3	66.4	66.9	81.3	75.3	44.9	35.9	24.3	26.8	31.8	30.4	17.2	25.9
PET4	64.9	64.2	79.9	72.7	43.8	34.6	27.1	25.9	33.7	33.8	18.5	28.8
PET5	64.7	63.7	78.6	72.2	42.6	31.2	26.8	28.6	33.1	34.3	20.0	30.4
PET (mean)	67.8	67.7	82.6	76.5	44.4	36.2	25.0	26.0	31.0	29.2	15.6	25.7

Table 10. Comparison of conflict prediction results for unsignalized intersections using machine learning algorithms.

	Accuracy (%)				FAR (%)				MAR (%)
	SVM	KNN	RF	XGB	SVM	KNN	RF	XGB	SVM	KNN	RF	XGB
TTC1	80.8	75.4	88.1	91.5	38.9	42.5	5.3	3.4	20.3	20.7	7.9	8.9
TTC2	66.4	70.6	86.5	89.4	31.8	33.6	8.8	8.6	30.8	25.2	10.3	9.7
TTC3	68.6	71.1	84.9	87.5	31.1	31.5	13.0	9.9	29.7	27.6	14.3	10.6
TTC4	67.9	68.5	82.7	86.1	32.3	29.8	11.9	10.1	31.2	30.3	16.9	13.5
TTC5	65.4	66.9	80.5	84.7	32.9	26.7	13.8	11.4	31.7	32.1	17.4	15.3
TTC (mean)	69.8	70.5	84.5	87.8	33.4	32.8	10.6	8.7	28.7	27.2	13.4	11.7
PET1	70.8	70.7	86.4	85.5	48.9	45.5	26.9	19.5	25.1	25.2	12.4	13.8
PET2	65.4	68.3	80.9	82.1	45.2	40.8	27.5	21.1	35.4	30.3	16.1	15.2
PET3	63.9	64.8	77.5	78.5	45.5	39.4	28.9	23.4	36.8	33.7	18.8	17.6
PET4	63.0	63.2	75.8	76.4	47.7	37.5	30.6	23.9	34.7	34.9	20.7	18.9
PET5	61.6	60.5	73.3	73.3	44.6	32.3	33.4	25.8	34.3	36.8	23.4	21.4
PET (mean)	64.9	65.5	78.8	79.2	46.4	39.1	29.5	22.7	33.3	32.2	18.3	17.4

The tables show that, for predicting traffic conflict occurrence at both signalized and unsignalized intersections, models using TTC as the conflict indicator generally outperform those using PET. The average accuracy of the TTC models (72.5%–89.2% for signalized intersections and 69.8%–87.8% for unsignalized intersections) is higher than that of the PET models (67.7%–82.6% and 64.9%–79.2%, respectively); the average FAR of the TTC models (6.3%–32.6% and 8.7%–33.4%) is lower than that of the PET models (25.0%–44.4% and 22.7%–46.4%); and the average MAR of the TTC models (6.3%–27.4% and 11.7%–28.7%) is lower than that of the PET models (15.6%–31.0% and 17.4%–33.3%). Furthermore, for the same machine learning algorithm and the same conflict indicator, predictive performance generally declines as the threshold of the conflict indicator increases.
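The three classification metrics can be computed from a confusion matrix. This section does not restate their definitions, so the sketch below assumes the common ones: FAR as the share of non-conflict intervals wrongly flagged, FP/(FP + TN), and MAR as the share of conflict intervals missed, FN/(FN + TP). The labels are toy values.

```python
def classification_rates(y_true, y_pred):
    """Accuracy, false alarm rate (FAR), and missed alarm rate (MAR), in percent.

    Assumed definitions (check against the paper's own):
      FAR = FP / (FP + TN)  -- non-conflict intervals flagged as conflicts
      MAR = FN / (FN + TP)  -- conflict intervals the model failed to flag
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = 100.0 * (tp + tn) / len(pairs)
    far = 100.0 * fp / (fp + tn) if (fp + tn) else 0.0
    mar = 100.0 * fn / (fn + tp) if (fn + tp) else 0.0
    return accuracy, far, mar


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # toy labels: 1 = conflict
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]   # toy predictions
acc, far, mar = classification_rates(y_true, y_pred)
print(acc, far, mar)
```

Because FAR and MAR are conditioned on different classes, a model can trade one against the other by shifting its decision threshold, which is why the text reports both alongside accuracy.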

In terms of algorithm comparison, RF performs best overall in predicting traffic conflicts at signalized intersections across both conflict indicators and all severity levels. Its average accuracies for TTC and PET are 89.2% and 82.6%, its average FARs are 6.3% and 25.0%, and its average MARs are 6.3% and 15.6%, respectively. For unsignalized intersections, XGBoost performs best overall, with average accuracies for TTC and PET of 87.8% and 79.2%, average FARs of 8.7% and 22.7%, and average MARs of 11.7% and 17.4%, respectively.

In addition, Figures 3 and 4 show the receiver operating characteristic (ROC) curves and area under the curve (AUC) values of the conflict occurrence prediction models for signalized and unsignalized intersections, respectively. Consistent with the evaluation metrics, RF performs better than the other algorithms at signalized intersections, with AUC values ranging from 0.745 to 0.845, particularly at higher conflict indicator thresholds. At unsignalized intersections, XGBoost performs best, with AUC values ranging from 0.730 to 0.822.
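AUC has a convenient interpretation: it equals the probability that a randomly chosen conflict sample receives a higher predicted score than a randomly chosen non-conflict sample. The pairwise sketch below computes it directly from that definition, counting ties as one half. The scores are toy values; for large datasets a rank-based or library implementation would be preferable.

```python
def auc_pairwise(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0]            # toy labels: 1 = conflict
scores = [0.9, 0.4, 0.6, 0.1]    # toy predicted conflict probabilities
auc = auc_pairwise(y_true, scores)
print(auc)
```

Here three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75; a value of 0.5 would correspond to random ranking.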

Based on the evaluation metrics and ROC curves, the AUC values of all four machine learning algorithms exceed 0.6, indicating satisfactory predictive performance. RF is the best-performing algorithm for predicting traffic conflicts at signalized intersections, and XGBoost is the best-performing algorithm at unsignalized intersections. These two algorithms are therefore used in the subsequent prediction of traffic conflict frequency.

4.3. Analysis of Factors Influencing Traffic Conflict Frequency

A Bayesian spatial Poisson model is constructed, and its parameters are estimated with the WinBUGS software. To ensure convergence and the accuracy of the results, two Markov chain Monte Carlo (MCMC) chains are run for 50,000 iterations, and the first 20,000 iterations are discarded as burn-in.
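The text does not say which convergence diagnostic was checked. With two or more chains, a common choice is the Gelman-Rubin potential scale reduction factor (R-hat), which compares between-chain and within-chain variance after burn-in. The sketch below uses the classic formula with the iteration settings from the text; the "chains" are independent normal draws standing in for MCMC output, purely for illustration.

```python
import numpy as np


def gelman_rubin(chains):
    """Classic potential scale reduction factor (R-hat) for one parameter.

    chains : array of shape (m, n) -- m chains, n post-burn-in draws each.
    Values close to 1 suggest the chains have mixed.
    """
    chains = np.asarray(chains, dtype=float)
    _, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled estimate of the posterior variance
    return float(np.sqrt(var_hat / W))


rng = np.random.default_rng(42)
n_iter, burn_in = 50_000, 20_000                   # settings taken from the text
draws = rng.normal(0.3, 0.1, size=(2, n_iter))     # stand-in for two chains of one coefficient
kept = draws[:, burn_in:]                          # discard the burn-in iterations
r_hat = gelman_rubin(kept)
print(f"R-hat = {r_hat:.4f}")

# Two chains stuck around different values give a clearly inflated R-hat
stuck = np.vstack([rng.normal(0.0, 1.0, 1000), rng.normal(3.0, 1.0, 1000)])
print(f"R-hat (not mixed) = {gelman_rubin(stuck):.2f}")
```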

Table 11 presents the parameter estimates of the 10 Bayesian spatial Poisson models trained on the conflict dataset. As expected, the spatial model results are broadly consistent with those of the binary logistic regression model in that the significant factors vary across conflict indicators and across severity levels of the same indicator. However, for the same indicator and severity level, the significant factors identified by the binary logistic model and the Bayesian spatial Poisson model differ, suggesting that conflict occurrence and conflict frequency are not affected by the same significant factors. This implies that, before predicting conflict frequency at intersections, it is necessary to identify which explanatory variables are significantly associated with conflict frequency.

Table 11. Parameter estimation results of the Bayesian spatial Poisson model.

Model	Variable	TTC1/PET1		TTC2/PET2		TTC3/PET3		TTC4/PET4		TTC5/PET5
		Mean	Credible interval	Mean	Credible interval	Mean	Credible interval	Mean	Credible interval	Mean	Credible interval
TTC	K	0.328∗	(0.052, 0.751)	—	—	0.328	(0.032, 0.623)	0.355	(0.125, 0.582)	0.463	(0.278, 0.667)
	V	—	—	—	—	—	—	−0.295	(−0.528, −0.063)	−0.397	(−0.613, −0.221)
	Hm	—	—	—	—	−0.143	(−0.288, −0.020)	−0.169	(−0.698, −0.229)	−0.156	(−0.251, −0.067)
	Vms	—	—	—	—	−0.549	(−0.866, −0.213)	−0.458	(−0.843, −0.165)	−0.481	(−0.678, −0.293)
	Amm	—	—	0.433	(−0.100, 1.221)	—	—	—	—	—	—
	Asm	−0.143	(−0.837, 0.106)	−0.289∗	(−1.163, 0.020)	−0.600∗	(−1.073, −0.145)	−0.512	(−0.828, −0.177)	−0.509	(−0.769, −0.208)
	Ams	—	—	—	—	0.456	(0.199, 0.726)	0.392	(0.193, 0.599)	0.344	(0.183, 0.522)
	α	0.211	(0.058, 0.482)	0.167	(0.045, 0.418)	0.259	(0.085, 0.508)	0.201	(0.073, 0.417)	0.212	(0.069, 0.500)
	sd(ϕ)	0.214	(0.019, 1.375)	0.082	(0.009, 0.327)	0.227	(0.013, 1.933)	0.066	(0.019, 0.231)	0.099	(0.012, 0.487)
	sd(θ)	0.473	(0.210, 1.906)	0.295	(0.203, 0.564)	0.378	(0.143, 2.123)	0.232	(0.175, 0.397)	0.301	(0.212, 0.725)

PET	K	—	—	0.046∗	(−0.837, 0.106)	—	—	0.131∗	(−0.078, 0.364)	0.246	(0.041, 0.499)
	V	—	—	—	—	—	—	−0.354	(−0.612, −0.117)	−0.422	(−0.653, −0.176)
	Hm	—	—	—	—	−0.075	(−0.210, 0.011)	−0.161	(−0.277, −0.040)	−0.162	(−0.278, −0.053)
	Vms	−0.462∗	(−1.138, 0.089)	−0.177∗	(−0.544, −0.061)	−0.349∗	(−0.593, 0.115)	−0.322	(−0.545, −0.044)	−0.358	(−0.592, −0.033)
	Asm	—	—	—	—	—	—	−0.448	(−0.693, −0.157)	−0.455	(−0.788, −0.103)
	Ams	—	—	—	—	0.211	(0.025, 0.448)	0.258	(0.039, 0.455)	0.311	(0.111, 0.523)
	α	0.165	(0.045, 0.441)	0.281	(0.083, 0.499)	0.203	(0.080, 0.441)	0.199	(0.056, 0.408)	0.214	(0.065, 0.501)
	sd(ϕ)	0.093	(0.011, 0.459)	0.704	(0.022, 2.903)	0.066	(0.021, 0.266)	0.073	(0.014, 0.260)	0.092	(0.021, 0.412)
	sd(θ)	0.362	(0.238, 0.736)	1.277	(0.144, 4.719)	0.224	(0.163, 0.449)	0.231	(0.178, 0.411)	0.266	(0.187, 0.611)

Note: An asterisk (∗) indicates that the estimate is credible at the 90% level (90% credible interval); all other estimates are credible at the 95% level.

The table shows that the models with conflict indicators TTC4, TTC5, PET4, and PET5 share the same significant variables, and these models have the largest number of significant variables among the 10 models. Moreover, the direction of each variable's effect on conflict frequency is consistent across models: density K, the standard deviation of vehicle average acceleration Ams, and the average value of vehicle average acceleration Amm are positively correlated with conflict frequency, whereas velocity V, the average headway Hm, the standard deviation of vehicle average speed Vms, and the mean of vehicle acceleration standard deviation Asm are negatively correlated with it. An unexpected result is that the three acceleration-related variables (Amm, Asm, and Ams) behave differently in the spatial model than in the binary logistic model: the coefficients of Amm and Ams are positive, while the coefficient of Asm is negative. This suggests that conflicts occur more frequently at an intersection when the temporal variability of vehicle acceleration (Ams) is larger and its spatial variability (Asm) is smaller.

Regarding the residual terms, α represents the ratio of the sum of spatial residuals to overdispersion residuals. The closer α is to 0, the stronger the spatial correlation, indicating a greater mutual influence between conflicts at adjacent intersections. Generally, a smaller α suggests a better model fit; otherwise, the treatment of spatial correlation in the model should be reconsidered. Based on the estimated α values (0.167 to 0.259 for the TTC models and 0.165 to 0.281 for the PET models), the model fit is satisfactory; that is, accounting for spatial correlation between adjacent intersections helps to predict conflict frequency at intersections more accurately.

In terms of explanatory variables, and unlike in the analysis of conflict occurrence, the average value of vehicle speed standard deviation Vsm and the standard deviation of vehicle count Ns are not statistically significant in any of the 10 spatial models. This suggests that variations in vehicle speed and vehicle count over a given time interval are not suitable for predicting conflict frequency at intersections. In contrast, density K and the standard deviation of vehicle average speed Vms are statistically significant in most models (K in seven models and Vms in eight). It can therefore be inferred that a high vehicle density within an intersection and large temporal fluctuations in average speed may increase conflict frequency. In addition, the most important variable differs between the TTC and PET models: for the TTC models it is the mean of vehicle acceleration standard deviation Asm, and for the PET models it is the standard deviation of vehicle average speed Vms, because each remains significant under all thresholds of its respective indicator. Consequently, speed variation indicators appear better suited to models based on PET, while acceleration variation indicators appear better suited to models based on TTC.

4.4. Analysis of Conflict Frequency Prediction Accuracy

The samples predicted as conflicts (label 1) by the RF algorithm (signalized intersections) and the XGBoost algorithm (unsignalized intersections) are input into the Bayesian spatial Poisson model to predict the traffic conflict frequency at signalized and unsignalized intersections, respectively.
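The two-stage logic can be sketched as below. Everything here is hypothetical: `toy_classifier` stands in for the trained RF or XGBoost model, and the coefficient and spatial-effect values are made up. Plugging posterior means into exp(β₀ + X_iβ + ϕ_i), ignoring θ_i and posterior uncertainty, is a simplification. Assigning zero to samples classified as non-conflicts is likewise an assumption, since the text does not say how those samples are treated.

```python
import math


def predict_conflict_frequency(samples, classifier, beta0, beta, phi):
    """Two-stage prediction: classify first, then estimate the expected conflict count.

    samples    : list of (intersection_id, feature_vector) pairs
    classifier : callable returning 1 (conflict) or 0 (no conflict) for a feature vector
    beta0, beta: hypothetical posterior-mean intercept and coefficients
    phi        : dict of hypothetical posterior-mean spatial effects per intersection
    """
    predictions = []
    for intersection_id, x in samples:
        if classifier(x) == 1:
            # Stage 2: expected count lambda = exp(beta0 + x . beta + phi_i)
            eta = beta0 + sum(b * xi for b, xi in zip(beta, x)) + phi.get(intersection_id, 0.0)
            predictions.append(math.exp(eta))
        else:
            # Assumed handling: samples classified as non-conflicts get a frequency of 0
            predictions.append(0.0)
    return predictions


def toy_classifier(x):
    """Hypothetical stand-in for the trained RF/XGBoost model."""
    return 1 if x[0] > 0.5 else 0


samples = [("A", [1.0, 0.2]), ("B", [0.1, 0.0])]   # (intersection id, standardized features)
preds = predict_conflict_frequency(
    samples, toy_classifier, beta0=0.1, beta=[0.35, -0.3], phi={"A": 0.05, "B": -0.02}
)
print(preds)
```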

Once the method for predicting intersection conflict frequency has been proposed, it must be evaluated. Following previous studies, this paper selects three evaluation metrics to assess model performance: root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE).

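$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^2}, \qquad \mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{x_i - \hat{x}_i}{x_i}\right|, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - \hat{x}_i\right|$$

where x_i represents the actual value of the conflict frequency, x̂_i represents the predicted value of the conflict frequency, and n is the number of samples.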
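The three metrics translate directly into code. In this sketch the frequencies are toy values. Intervals whose actual frequency is zero are excluded from MAPE, since the ratio is undefined there; this is an assumption, as the text does not specify how such cases were handled.

```python
import math


def frequency_errors(actual, predicted):
    """RMSE, MAPE (%), and MAE between actual and predicted conflict frequencies.

    Intervals with an actual value of 0 are skipped for MAPE (assumed handling).
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    nonzero = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if nonzero:
        mape = 100.0 * sum(abs((a - p) / a) for a, p in nonzero) / len(nonzero)
    else:
        mape = float("nan")
    return rmse, mape, mae


actual = [2, 1, 3, 4]        # toy observed conflict counts
predicted = [2, 2, 2, 4]     # toy predicted conflict counts
rmse, mape, mae = frequency_errors(actual, predicted)
print(rmse, mape, mae)
```

RMSE squares each error before averaging, so it is never smaller than MAE and penalizes occasional large misses more heavily.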

Tables 12 and 13, respectively, present the final results of traffic conflict prediction at signalized and unsignalized intersections. The smaller the values of RMSE, MAPE, and MAE in the tables, the better the performance of the model.

Table 12. Final results of conflict prediction at signalized intersections.

	TTC1	TTC2	TTC3	TTC4	TTC5	PET1	PET2	PET3	PET4	PET5
RMSE	0.153	0.359	0.568	0.986	1.232	0.134	0.376	0.661	0.883	1.102
MAPE (%)	8.9	13.4	19.7	25.5	30.6	8.5	13.8	20.8	22.9	27.1
MAE	0.010	0.032	0.085	0.297	0.526	0.007	0.065	0.193	0.276	0.352

Table 13. Final results of conflict prediction at unsignalized intersections.

	TTC1	TTC2	TTC3	TTC4	TTC5	PET1	PET2	PET3	PET4	PET5
RMSE	0.168	0.332	0.586	1.059	1.313	0.175	0.413	0.624	1.132	1.400
MAPE (%)	9.9	12.2	19.9	26.3	35.8	12.3	18.3	20.1	27.9	38.2
MAE	0.015	0.027	0.135	0.320	0.575	0.012	0.077	0.231	0.306	0.628

For signalized intersections, as the threshold increases from TTC1 to TTC5 and from PET1 to PET5, all three evaluation metrics (RMSE, MAPE, and MAE) increase, which means that model performance gradually deteriorates. Comparing the two conflict indicators, the TTC2 and TTC3 models outperform the PET2 and PET3 models, respectively, whereas the PET1, PET4, and PET5 models outperform the TTC1, TTC4, and TTC5 models, respectively. Among the 10 models for traffic conflict prediction at signalized intersections, the PET1 model performs best, with the lowest RMSE (0.134), MAPE (8.5%), and MAE (0.007); the TTC5 model performs worst, with the highest RMSE (1.232), MAPE (30.6%), and MAE (0.526).

For unsignalized intersections, as for signalized intersections, model performance deteriorates gradually as the threshold increases from TTC1 to TTC5 and from PET1 to PET5. Unlike at signalized intersections, however, when the thresholds are the same, the TTC model outperforms the corresponding PET model in RMSE and MAPE at every threshold (PET yields a marginally lower MAE only at thresholds of 1 s and 4 s). Among the 10 prediction models, the TTC1 model performs best, with an RMSE of 0.168, a MAPE of 9.9%, and an MAE of 0.015, whereas the PET5 model performs worst, with an RMSE of 1.400, a MAPE of 38.2%, and an MAE of 0.628.

Comparing signalized and unsignalized intersections, for the same threshold of the same conflict indicator, the conflict prediction model for signalized intersections outperforms that for unsignalized intersections (except for TTC2 and PET3). Among all 20 models, the PET5 model for unsignalized intersections performs worst. Even so, when it predicts the number of conflicts within a one-minute interval at an intersection, its accuracy is 66.23%; that is, 66.23% of the predicted values match the actual values, which is relatively satisfactory given the characteristics of the data. In practice, thresholds of 3 s and 4 s are the most commonly used in conflict studies, and the corresponding models (TTC3, TTC4, PET3, and PET4) perform well, with RMSE ranging from 0.568 to 1.132, MAPE from 19.7% to 27.9%, and MAE from 0.085 to 0.320. Therefore, the proposed method performs well in predicting real-time conflicts.

In addition, Figure 5 further intuitively depicts the actual conflict frequencies, the predicted conflict frequencies, and the prediction accuracy at the intersection.

As Figures 5(a) and 5(b) show, conflict occurrences vary across intersections. However, the spatial heat maps exhibit clustering, with adjacent intersections showing similar conflict levels, which further indicates the need to consider spatial correlation when predicting traffic conflicts. Figures 5(a) and 5(b) also show that the predicted conflict frequencies are nearly consistent with the actual ones, highlighting the good performance of the method. Moreover, the prediction accuracy for each intersection in Figure 5(c) (the average over the 10 models) exceeds 66%. This shows that the proposed prediction method is reasonable and effective.

5. Conclusion

This study proposes a real-time prediction method for traffic conflicts at intersections that integrates statistical models and machine learning. First, high-resolution unmanned aerial vehicle (UAV) trajectory data are utilized to extract two conflict indicators, namely, TTC and PET. Subsequently, four machine learning algorithms are employed to construct prediction models for the occurrence of traffic conflicts at signalized and unsignalized intersections, respectively, to predict whether conflicts will occur or not. Finally, the samples predicted as “1” are further input into the Bayesian spatial Poisson model to predict the conflict frequencies.

(1)
A real-time prediction model for the occurrence of traffic conflicts at intersections based on machine learning was constructed. The RF algorithm achieved the best results in the real-time prediction of conflict occurrences at signalized intersections. For the TTC model, its average accuracy rate is 89.2%, while for the PET model, its average accuracy rate is 82.6%. The XGBoost algorithm performs best in the real-time prediction of conflict occurrences at unsignalized intersections. For the models with conflict indicators of TTC and PET, their average accuracy rates are 87.8% and 79.2%, respectively.
(2)
A Bayesian spatial Poisson prediction model for conflict frequencies at intersections was constructed. The Bayesian spatial Poisson model explains the correlation between traffic state variables and conflict frequencies, indicating that it is necessary to consider the spatial correlation among adjacent intersections and analyze the influencing factors of conflict frequencies for predicting conflict frequencies. In addition, the Bayesian spatial Poisson model effectively estimates conflict frequencies. The results of model evaluation indicators show that the model has good prediction performance, and the prediction accuracy of the model is above 66%, which demonstrates that it is reasonable and effective to use the Bayesian spatial Poisson model to predict conflict frequencies at intersections.
(3)
A real-time prediction method for traffic conflicts at intersections that integrates statistical models and machine learning is proposed. First, the binary logistic model is used to identify the significant factors influencing the occurrence of conflicts. Then, based on traffic state variables and conflict data, machine learning algorithms are employed to conduct real-time prediction of the occurrence of intersection conflicts. Finally, the Bayesian spatial Poisson model is adopted to predict the frequency of conflicts marked as “1” identified by the machine learning algorithms in the previous step and to identify the significant factors influencing the conflict frequencies.

The limitations of this paper are as follows:

(1)
This paper studies intersections rather than road sections or highways, so the signal cycle at intersections should be taken into account. However, owing to the limitations of the dataset, only fixed time intervals were considered. In future research, a more comprehensive dataset could be used to incorporate the signal cycle and thereby improve the integrated method proposed in this paper.
(2)
In the analysis of the influencing factors for conflict occurrence and conflict frequency, this paper only considered dynamic variables related to vehicle speed, the number of vehicles, headway, and acceleration, without taking static variables such as the number of lanes and road channelization into consideration. In future research, a combination of dynamic and static variables can be considered to make the model prediction results more accurate.

Conflicts of Interest

The authors declare no conflicts of interest.

Author Contributions

Chuanyun Fu: conceptualization, methodology, writing the original draft, reviewing and editing, and funding acquisition. Jiaming Liu: conceptualization, methodology, and writing the original draft. Huahua Liu: methodology and reviewing and editing. Xiaoli Wang: data collection and software. Zhaoyou Lu: visualization. Jushang Ou and Wei Bai: reviewing and editing and interpretation of results.

Funding

The work was jointly supported by the National Natural Science Foundation of China (72371082), the Natural Science Foundation of Sichuan Province of China (2024NSFSC0184), the Opening Project of Intelligent Policing Key Laboratory of Sichuan Province (ZNJW2023KFZD001), and the Fundamental Research Funds for the Central Universities (FRFCUAUGA5710010222).

Open Research

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

1 Sayed T. and Zein S., Traffic Conflict Standards for Intersections, Transportation Planning and Technology. (1999) 22, no. 4, 309–323, https://doi.org/10.1080/03081069908717634.
2 Autey J., Sayed T., and Zaki M. H., Safety Evaluation of Right-Turn Smart Channels Using Automated Traffic Conflict Analysis, Accident Analysis & Prevention. (2012) 45, 120–130, https://doi.org/10.1016/j.aap.2011.11.015.
3 Zheng L., Ismail K., and Meng X., Traffic Conflict Techniques for Road Safety Analysis: Open Questions and Some Insights, Canadian Journal of Civil Engineering. (2014) 41, no. 7, 633–641, https://doi.org/10.1139/cjce-2013-0558.
4 Lord D. and Mannering F., The Statistical Analysis of Crash-Frequency Data: A Review and Assessment of Methodological Alternatives, Transportation Research Part A: Policy and Practice. (2010) 44, no. 5, 291–305, https://doi.org/10.1016/j.tra.2010.02.001, 2-s2.0-77952089434.
5 Fu C. and Sayed T., Identification of Adequate Sample Size for Conflict-Based Crash Risk Evaluation: An Investigation Using Bayesian Hierarchical Extreme Value Theory Models, Analytic Methods in Accident Research. (2023) 39, https://doi.org/10.1016/j.amar.2023.100281.
6 Kumar S., Toshniwal D., and Parida M., A Comparative Analysis of Heterogeneity in Road Accident Data Using Data Mining Techniques, Evolving Systems. (2017) 8, no. 2, 147–155, https://doi.org/10.1007/s12530-016-9165-5.
7 Tarko A., Measuring Road Safety with Surrogate Events, 2019, Elsevier.
8 Li H., Wang L., Yang M., and Bie Y., Collaborative Effects of Vehicle Speed and Illumination Gradient at Highway Intersection Exits on Drivers’ Stress Response Capacity, Accident Analysis & Prevention. (2025) 209, https://doi.org/10.1016/j.aap.2024.107829.
9 Li H., Wang L., and Bie Y., Dynamic Illumination Method for Rural Highway Intersections With Traffic Flow Changes, Transportation Research Record: Journal of the Transportation Research Board. (2024) 2678, no. 7, 977–991, https://doi.org/10.1177/03611981231211895.
10 Reyad P., Sacchi E., Ibrahim S., and Sayed T., Traffic Conflict–Based Before–After Study With Use of Comparison Groups and the Empirical Bayes Method, Transportation Research Record: Journal of the Transportation Research Board. (2017) 2659, no. 1, 15–24, https://doi.org/10.3141/2659-02.
11 Li D., Fu C., Sayed T., and Wang W., An Integrated Approach of Machine Learning and Bayesian Spatial Poisson Model for Large-Scale Real-Time Traffic Conflict Prediction, Accident Analysis & Prevention. (2023) 192, https://doi.org/10.1016/j.aap.2023.107286.
12 Zheng L. and Sayed T., A Bivariate Bayesian Hierarchical Extreme Value Model for Traffic Conflict-Based Crash Estimation, Analytic Methods in Accident Research. (2020) 25, https://doi.org/10.1016/j.amar.2020.100111.
13 Fu C., Lu Z., Liu H., and Wumaierjiang A., Dynamic Short-Term Crash Risk Prediction From Traffic Conflicts at Signalized Intersections With Emerging Mixed Traffic Flow: A Novel Conflict Indicator, Accident Analysis & Prevention. (2025) 217, https://doi.org/10.1016/j.aap.2025.108065.
14 Fu C. and Sayed T., A Multivariate Method for Evaluating Safety From Conflict Extremes in Real Time, Analytic Methods in Accident Research. (2022) 36, https://doi.org/10.1016/j.amar.2022.100244.
15 Xu C., Liu P., Wang W., and Li Z., Identification of Freeway Crash-Prone Traffic Conditions for Traffic Flow at Different Levels of Service, Transportation Research Part A: Policy and Practice. (2014) 69, 58–70, https://doi.org/10.1016/j.tra.2014.08.011.
16 Caleffi F., Anzanello M. J., and Cybis H. B. B., A Multivariate-Based Conflict Prediction Model for a Brazilian Freeway, Accident Analysis & Prevention. (2017) 98, 295–302, https://doi.org/10.1016/j.aap.2016.10.025.
17 Essa M. and Sayed T., Full Bayesian Conflict-Based Models for Real Time Safety Evaluation of Signalized Intersections, Accident Analysis & Prevention. (2019) 129, 367–381, https://doi.org/10.1016/j.aap.2018.09.017.
18 Wang C., Xu C., Xia J., Qian Z., and Lu L., A Combined Use of Microscopic Traffic Simulation and Extreme Value Methods for Traffic Safety Evaluation, Transportation Research Part C: Emerging Technologies. (2018) 90, 281–291, https://doi.org/10.1016/j.trc.2018.03.011.
19 Wang C., Xu C., and Dai Y., A Crash Prediction Method Based on Bivariate Extreme Value Theory and Video-Based Vehicle Trajectory Data, Accident Analysis & Prevention. (2019) 123, 365–373, https://doi.org/10.1016/j.aap.2018.12.013.
20 Fu C. and Sayed T., Dynamic Bayesian Hierarchical Peak Over Threshold Modeling for Real-Time Crash-Risk Estimation From Conflict Extremes, Analytic Methods in Accident Research. (2023) 40, https://doi.org/10.1016/j.amar.2023.100304.
21 Zheng L. and Sayed T., A Novel Approach for Real Time Crash Prediction at Signalized Intersections, Transportation Research Part C: Emerging Technologies. (2020) 117, https://doi.org/10.1016/j.trc.2020.102683.
22 Fu C. and Sayed T., Random-Parameter Bayesian Hierarchical Extreme Value Modeling Approach With Heterogeneity in Means and Variances for Traffic Conflict-Based Crash Estimation, Journal of Transportation Engineering, Part A: Systems. (2022) 148, no. 9, https://doi.org/10.1061/jtepbs.0000717.
23 Fu C. and Sayed T., Bayesian Dynamic Extreme Value Modeling for Conflict-Based Real-Time Safety Analysis, Analytic Methods in Accident Research. (2022) 34, https://doi.org/10.1016/j.amar.2021.100204.
24 Katrakazas C., Quddus M., and Chen W. H., A Simulation Study of Predicting Real-Time Conflict-Prone Traffic Conditions, IEEE Transactions on Intelligent Transportation Systems. (2018) 19, no. 10, 3196–3207, https://doi.org/10.1109/tits.2017.2769158.
25 Fu C., Lu Z., Ding N., and Bai W., Distance Headway-Based Safety Evaluation of Emerging Mixed Traffic Flow Under Snowy Weather, Physica A: Statistical Mechanics and its Applications. (2024) 642, https://doi.org/10.1016/j.physa.2024.129792.
26 Formosa N., Quddus M., Ison S., Abdel-Aty M., and Yuan J., Predicting Real-Time Traffic Conflicts Using Deep Learning, Accident Analysis & Prevention. (2020) 136, https://doi.org/10.1016/j.aap.2019.105429.
27 Hu Y., Li Y., Huang H., Lee J., Yuan C., and Zou G., A High-Resolution Trajectory Data Driven Method for Real-Time Evaluation of Traffic Safety, Accident Analysis & Prevention. (2022) 165, https://doi.org/10.1016/j.aap.2021.106503.
28 Yuan C., Li Y., Huang H., Wang S., Sun Z., and Li Y., Using Traffic Flow Characteristics to Predict Real-Time Conflict Risk: A Novel Method for Trajectory Data Analysis, Analytic Methods in Accident Research. (2022) 35, https://doi.org/10.1016/j.amar.2022.100217.
29 Islam Z. and Abdel-Aty M., Traffic Conflict Prediction Using Connected Vehicle Data, Analytic Methods in Accident Research. (2023) 39, https://doi.org/10.1016/j.amar.2023.100275.
30 Barmpounakis E. and Geroliminis N., On the New Era of Urban Traffic Monitoring With Massive Drone Data: The pNEUMA Large-Scale Field Experiment, Transportation Research Part C: Emerging Technologies. (2020) 111, 50–71, https://doi.org/10.1016/j.trc.2019.11.023.
31 Li X., Lord D., Zhang Y., and Xie Y., Predicting Motor Vehicle Crashes Using Support Vector Machine Models, Accident Analysis & Prevention. (2008) 40, no. 4, 1611–1618, https://doi.org/10.1016/j.aap.2008.04.010.
32 Lin L., Wang Q., and Sadek A. W., A Novel Variable Selection Method Based on Frequent Pattern Tree for Real-Time Traffic Accident Risk Prediction, Transportation Research Part C: Emerging Technologies. (2015) 55, 444–459, https://doi.org/10.1016/j.trc.2015.03.015.