Volume 2025, Issue 1 5595809
Research Article
Open Access

LBN-YOLO: A Lightweight Road Damage Detection Model Based on Multiscale Contextual Feature Extraction and Fusion

Guizhen Niu
School of Mechanical, Electrical and Information Engineering, Shandong University, Weihai 264209, China

Guangming Li (Corresponding Author)
School of Mechanical, Electrical and Information Engineering, Shandong University, Weihai 264209, China
Shandong University–Weihai Research Institute of Industrial Technology, Weihai 264209, China

Chengyou Wang
School of Mechanical, Electrical and Information Engineering, Shandong University, Weihai 264209, China
Shandong University–Weihai Research Institute of Industrial Technology, Weihai 264209, China

Kaixuan Hui
School of Mechanical, Electrical and Information Engineering, Shandong University, Weihai 264209, China
First published: 26 July 2025
Academic Editor: Tzu-Kang Lin

Abstract

Detecting and classifying road damage are crucial for road maintenance. To address the limitations of existing road damage detection methods, including insufficient fine-grained contextual feature extraction and model complexity unsuitable for deployment, this paper proposes a lightweight backbone-and-neck road damage detection model named LBN-YOLO. First, the backbone and neck of the original model are made lightweight, and the C2f-dilation-wise residual (C2f-DWR) module is integrated into the backbone to extract multiscale contextual information. Second, a simplified bidirectional feature pyramid network is employed in the neck to optimize the feature fusion network, reducing the number of parameters and the model complexity. Finally, a dynamic head with self-attention is introduced to enhance the sensing capability of the detection head, improving the precision of detecting occluded small objects. The proposed model's detection ability is evaluated on a custom road damage dataset. The experimental results demonstrate that the proposed LBN-YOLO model achieves superior performance compared with the YOLOv8n model, with an increase of 4.3% in mAP@0.5 and a 5.2% enhancement in precision, outperforming other detection models. In addition, the model is evaluated on two public datasets, showing improved detection performance compared with the original model and demonstrating strong generalization capability. Code and dataset are available at https://github.com/gzNiuadc/Road-crack-dataset.

1. Introduction

Roads are an important part of the transportation system. As usage duration and traffic loading increase, various kinds of damage gradually appear, with road cracks being the most common [1]. Serious road damage poses a potential threat to the safety of pedestrians and vehicles, so monitoring road damage and its severity in a timely manner is particularly important. Achieving high-precision, automated road damage detection with high efficiency and low cost remains a major challenge.

Over the past two decades, conventional image processing techniques such as edge detection [2], morphological operations [3], and threshold segmentation have been widely employed for road damage detection. However, these methods are sensitive to illumination variation and noise, which lowers the precision of detection results. To overcome these limitations, researchers have applied deep learning approaches to road damage detection. Compared with traditional image processing methods, deep learning methods are faster and more robust [4].

1.1. Related Works

As a classical deep learning task, object detection can locate and classify road damage. Object detection models are divided into one-stage and two-stage detectors [5]. One-stage models include SSD [6], RetinaNet [7], and the YOLO series [8, 9]. Typical two-stage models are Fast R-CNN [10] and Faster R-CNN [11]. One-stage detectors predict detection boxes directly, eliminating the candidate-box generation step and thus achieving faster detection. The YOLO series is a classical one-stage approach that balances detection speed and precision.

To further improve the detection precision of YOLO series models, Liu et al. [12] optimized the YOLOv3 model by adding a four-scale detection layer and replacing the intersection over union (IoU) loss with the EIoU loss, effectively enhancing the detection of small and hidden pavement cracks. Xing et al. [13] incorporated an additional Swin-Transformer prediction head (SPH) into the YOLOv5 model to enhance feature extraction in complex environments; the improved YOLOv5 model increased road crack detection accuracy by 15.4%, with the ability to detect cracks as small as 1.2 mm. Xiong et al. [14] proposed an improved model, YOLOv8-GAM-wise-IoU, based on YOLOv8, which significantly improves the accuracy and efficiency of bridge crack detection by introducing a global attention module and the wise-IoU loss function. Dong et al. [15] integrated visual attention networks (VANs), a large convolutional attention (LCA) module, and a large separable kernel attention (LSKA) module into the YOLOv8 model to enhance the extraction of damage shapes and localized features on concrete surfaces, achieving a 15% improvement in mAP@0.5.

A lightweight model facilitates deployment across various applications. To this end, Yu et al. [16] proposed YOLOv4-FPM, based on YOLOv4, for real-time bridge crack detection; a pruning algorithm was applied to simplify the network structure, increasing detection speed by 20 times. Diao et al. [17] replaced the backbone network of YOLOv5 with the lightweight MobileNetV3 and substituted the squeeze-and-excitation (SE) channel attention mechanism in MobileNetV3 with the shuffle attention (SA) mechanism; this adaptation not only improved information exchange between channel and spatial dimensions but also reduced the model's parameter count, providing an important reference for lightweight road damage detection algorithms. Ning et al. [18] proposed an improved YOLOv7-RDD model, which integrates a lightweight aggregation network, an enhanced spatial feature pyramid structure, and a similarity-based attention mechanism (SimAM), achieving 145 FPS on the CQURDD dataset and demonstrating high detection efficiency. Zhao et al. [19] introduced a lightweight real-time road damage detection model by adopting MobileNetV3 as the backbone and incorporating the efficient channel attention (ECA) module, reducing computational complexity by 46.2% while maintaining real-time performance.

Although numerous advancements have been made in improving either the speed or the accuracy of road damage detection models, research addressing both aspects simultaneously remains limited. Xing et al. [20] enhanced the YOLOv5 model by introducing an efficient decoupled head and replacing traditional convolution with the GCC3 module for global context modeling, improving road damage detection; on the RDD2022 dataset, accuracy increased by 1.5%, but parameters and computational cost doubled compared with the original model. Wang and Chen [21] proposed a transformer-based detector incorporating receptive field attention blocks and a feature assignment mechanism to enhance crack detection accuracy; however, the 11% and 15% increases in GFLOPs and parameters, respectively, reduced detection speed. Wan et al. [22] improved the BR-DETR model for bridge damage detection by introducing deformable Conv2d and employing copy–paste augmentation, boosting accuracy, but the high complexity of the transformer limits real-time detection. Meng et al. [23] developed a lightweight pavement crack detection method based on YOLOv8 and a knowledge distillation model with multiple teacher–assistants (KDMTA), achieving a 79.6% improvement in image processing speed at the cost of a 2.14% drop in detection accuracy and a 4.64% drop in mean average precision (mAP). He et al. [24] replaced standard convolution in YOLOv7 with ghost convolution and applied depthwise separable convolution to reduce parameters; by combining channel width and depth optimization with knowledge distillation, the model achieves a lightweight design with only a slight degradation in detection accuracy.

1.2. Motivations

In summary, road damage detection requires a balanced emphasis on both detection precision and real-time capability. However, current detection models are often complex and inefficient [20–23]. The intricate morphology of different road damages, coupled with complex background interference and fluctuating lighting conditions, presents significant challenges for accurate road damage recognition. Therefore, this study aims to develop a robust road damage detection model capable of accurately identifying and categorizing various road damages while minimizing training cost and detection time.

1.3. Contributions

This study introduces the road damage detection model LBN-YOLO, which is based on YOLOv8n and achieves a balance between precision and speed. The primary contributions are as follows.
1. A dataset of road damage in real scenes has been constructed, covering seven types of road damage under various lighting conditions and diverse backgrounds. This dataset enriches the data available for road damage detection.

2. The proposed LBN-YOLO model is characterized by its lightweight design. It integrates the dilation-wise residual (DWR) structure into the C2f module of the feature extraction network, forming the C2f-DWR module, which enhances the backbone network's ability to extract multiscale contextual and global information.

3. The neck of the detection model is enhanced by eliminating nodes with minimal impact on feature fusion to reduce feature redundancy, yielding a simplified bidirectional feature pyramid network (SBiFPN) based on the feature fusion concept of the bidirectional feature pyramid network (BiFPN). In addition, cross-level connections are introduced to fuse feature information at multiple scales, improving the detection of objects at various scales.

4. A dynamic head (Dyhead) is employed as the detection head, applying scale awareness across feature pyramid levels, spatial awareness across spatial positions, and task awareness across output channels. This significantly enhances the classification and localization precision for small objects affected by shadows and occlusions.

The rest of the paper is organized as follows. Section 2 presents the details of the road damage detection model proposed in this study. Section 3 elaborates on the dataset used, experimental configurations, and outcomes. Sections 4 and 5 summarize the paper’s findings and outline potential directions for future research.

2. Algorithms for Road Damage Detection

This section introduces the improved design of the LBN-YOLO model, including the C2f-DWR module in the backbone network, the SBiFPN feature fusion network, and the Dyhead detection head incorporating multiple self-attention mechanisms, aiming to enhance the accuracy and efficiency of road damage detection.

2.1. LBN-YOLO Algorithm

YOLOv8 is an enhanced version of YOLOv5 and has recently demonstrated excellent performance in road damage detection [25, 26]. YOLOv8 is released in five pretrained versions of different model sizes: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x [19]. Among them, YOLOv8n has the fewest parameters, saving computational resources while maintaining high detection precision, which is consistent with our research goal; this paper therefore chooses YOLOv8n as the baseline model. YOLOv8 introduces three significant improvements over YOLOv5. First, it replaces the C3 module in the backbone with the C2f module, which carries rich gradient-flow information while reducing computational complexity. Second, it eliminates the 1 × 1 convolution preceding the upsampling layer in the neck. Lastly, the head uses a decoupled structure that separates the classification task from the regression task, and the anchor-based mechanism is replaced with an anchor-free one [27, 28].
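For readers unfamiliar with the C2f design, the following is a minimal PyTorch sketch of its split–transform–concatenate pattern, which is what gives the module multiple gradient-flow paths. It is a simplified illustration under assumed channel handling, not the Ultralytics implementation.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    # Residual block: two 3x3 convs with an identity shortcut.
    def __init__(self, c):
        super().__init__()
        self.cv1 = nn.Sequential(nn.Conv2d(c, c, 3, padding=1, bias=False),
                                 nn.BatchNorm2d(c), nn.SiLU())
        self.cv2 = nn.Sequential(nn.Conv2d(c, c, 3, padding=1, bias=False),
                                 nn.BatchNorm2d(c), nn.SiLU())

    def forward(self, x):
        return x + self.cv2(self.cv1(x))

class C2f(nn.Module):
    # Split the features, pass one half through n bottlenecks, and
    # concatenate every intermediate output so that gradients flow
    # through multiple parallel paths.
    def __init__(self, c_in, c_out, n=2):
        super().__init__()
        self.c = c_out // 2
        self.cv1 = nn.Conv2d(c_in, 2 * self.c, 1)
        self.m = nn.ModuleList(Bottleneck(self.c) for _ in range(n))
        self.cv2 = nn.Conv2d((2 + n) * self.c, c_out, 1)

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))
        for m in self.m:
            y.append(m(y[-1]))
        return self.cv2(torch.cat(y, dim=1))
```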

The road damage images we collected contain complex backgrounds, such as water stains, leaves, and shadows, which interfere with detection and make the YOLOv8n model prone to misdetections and omissions. To address these challenges, this study proposes an improved model named LBN-YOLO; Figure 1 illustrates its architecture. In the backbone, two C2f-DWR modules replace the final C2f modules, enhancing the extraction of contextual information and multiscale object features. In the neck, SBiFPN is introduced for bidirectional fusion and interaction of multiscale feature information while simultaneously achieving a lightweight design. To further enhance the precision of detecting occluded small objects, the dynamic detection head Dyhead, with its self-attention mechanisms, is employed.

Figure 1: Overall structure of LBN-YOLO.

2.2. C2f-DWR Module

The C2f module in YOLOv8 acquires rich gradient-flow information through residual concatenation; its construction is illustrated in Figure 1. The objects in our collected road damage images appear at widely varying scales, but the standard convolution in the C2f module limits the network's receptive field and lacks the ability to extract multiscale contextual information, potentially leading to false detections of damaged objects. To resolve this limitation, this paper proposes the C2f-DWR module, which efficiently captures multiscale contextual information based on C2f. C2f-DWR contains multiple DWR modules, as illustrated in Figure 2.

Figure 2: Structure of the C2f-DWR and DWR modules. (a) C2f-DWR and (b) DWR.

The DWR module has a residual structure, featuring three dilated convolution branches with dilation rates of 1, 3, and 5 [29], alongside two standard convolution modules with 1 × 1 and 3 × 3 kernels, as shown in Figure 2(b). It extracts the contextual information of multiscale features in two steps. First, initial feature extraction is carried out with a 3 × 3 convolution. Subsequently, the resulting feature maps are fed into the three dilated convolution branches with different dilation rates to extract features rich in contextual semantic information, and the outputs of the three branches are concatenated along the channel dimension. Finally, the integrated feature map is added to the original feature map through the residual connection, producing a feature map that contains both the original feature information and multiscale contextual information.
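The two-step design described above can be sketched in PyTorch as follows; the per-branch channel widths and the 1 × 1 fusion convolution are illustrative assumptions rather than the exact configuration in [29].

```python
import torch
import torch.nn as nn

def conv_bn_act(c_in, c_out, k=3, d=1):
    # Convolution + BN + activation; padding keeps the spatial size.
    p = d * (k - 1) // 2
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, padding=p, dilation=d, bias=False),
        nn.BatchNorm2d(c_out), nn.ReLU())

class DWR(nn.Module):
    # Step 1: initial 3x3 feature extraction.
    # Step 2: three parallel dilated 3x3 convs (rates 1, 3, 5) capture
    # multiscale context; their outputs are concatenated, fused by a
    # 1x1 conv, and added back to the input as a residual.
    def __init__(self, c):
        super().__init__()
        self.pre = conv_bn_act(c, c, 3)
        self.branches = nn.ModuleList(conv_bn_act(c, c, 3, d=r) for r in (1, 3, 5))
        self.fuse = conv_bn_act(3 * c, c, 1)

    def forward(self, x):
        y = self.pre(x)
        y = torch.cat([b(y) for b in self.branches], dim=1)
        return x + self.fuse(y)
```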

The scales and shapes of the captured road damage vary significantly. The DWR module expands the receptive field without sacrificing feature map resolution, thereby enhancing multiscale feature extraction. Consequently, integrating the C2f-DWR module into the backbone network enables more comprehensive feature extraction, allowing local fine-grained features, captured when the receptive field is small, to be fused and interact with global coarse-grained features, captured when the receptive field is large.

2.3. Improved Feature Fusion Network SBiFPN

The feature fusion network of YOLOv8 combines a feature pyramid network (FPN) and a path aggregation network (PAN), as shown in Figure 3(a). PANet [30] achieves bidirectional feature fusion through upsampling and downsampling, which may cause feature information loss or redundancy. The BiFPN [31] network performs more elaborate bidirectional cross-scale fusion on top of PAN while eliminating single-input nodes that contribute little to feature fusion. In Figure 3(b), the first node on the right side of P7 exemplifies this kind of node: because its contribution to feature fusion is minimal, its removal has negligible impact on the overall network. However, the BiFPN network remains structurally complex.

Figure 3: Comparison of different feature fusion networks. (a) PANet, (b) BiFPN, and (c) SBiFPN.
To better fuse the multiscale road damage features, overcome the information loss problem, and simplify the network structure, this study further optimizes the feature fusion network. The enhanced feature fusion network first aligns the number of channels through three layers of convolution for subsequent feature fusion. Then, bottom-up and top-down bidirectional feature fusion flows are constructed, and additional lateral connections are added to the bidirectional cross-scale information flow to weight the original feature information against cross-level feature information. Taking P6 from Figure 3(c) as an example, equation (1) gives the intermediate (top-down) feature and equation (2) gives the output feature:

$$P_6^{\mathrm{td}} = \mathrm{Conv}\left(\frac{w_1 \cdot P_6^{\mathrm{in}} + w_2 \cdot \mathrm{Resize}(P_7^{\mathrm{in}})}{w_1 + w_2 + \varepsilon}\right) \quad (1)$$

$$P_6^{\mathrm{out}} = \mathrm{Conv}\left(\frac{w_1' \cdot P_6^{\mathrm{in}} + w_2' \cdot P_6^{\mathrm{td}} + w_3' \cdot \mathrm{Resize}(P_5^{\mathrm{out}})}{w_1' + w_2' + w_3' + \varepsilon}\right) \quad (2)$$

where the w terms are learned fusion weights, ε is a small constant for numerical stability, and "Resize" denotes upsampling or downsampling to match feature map sizes.
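A minimal PyTorch sketch of the weighted fusion node underlying equations (1) and (2) is given below, assuming the fast normalized fusion form; tensor shapes and channel counts are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    # Fast normalized fusion: learn one non-negative weight per input
    # and normalize so the weights sum to one (plus eps for stability).
    def __init__(self, n_inputs, channels):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.eps = 1e-4

    def forward(self, feats):
        # feats: list of maps already resized to a common resolution.
        w = F.relu(self.w)
        w = w / (w.sum() + self.eps)
        fused = sum(wi * f for wi, f in zip(w, feats))
        return self.conv(fused)

# Example: the P6 output node fusing its input, the intermediate
# (top-down) feature, and the resized lower-level output, as in eq. (2).
fuse_p6 = WeightedFusion(n_inputs=3, channels=64)
p6_in, p6_td = torch.randn(1, 64, 20, 20), torch.randn(1, 64, 20, 20)
p5_out = torch.randn(1, 64, 40, 40)
p6_out = fuse_p6([p6_in, p6_td, F.interpolate(p5_out, size=(20, 20))])
```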

Since feature fusion primarily occurs at the P4 and P6 levels and large-scale objects are relatively rare, the P3-level output node shown in Figure 3(c) is removed to enhance small object detection performance, simplify the network, and further reduce computational overhead. Road cracks are typically small, irregularly shaped, and defined by subtle features, relying heavily on detailed information, such as edges and textures, embedded in low-level features. In small object detection, low-level features carry abundant positional information critical for accurate localization and boundary delineation. Directly connecting the low-level P3 features to the high-level output allows detailed information from the lower layers to propagate efficiently to the final output, preserving the fine-grained features critical for effective detection.

The optimized feature fusion network fuses multiscale positional and semantic information with different weights, removes redundant feature information, significantly improves the detection precision of small objects, and overcomes the defects of the traditional feature pyramid network.

2.4. Dyhead

Images captured in real scenes contain small objects under uneven illumination and shadow occlusion, which are difficult to detect. To improve detection precision, the Dyhead [32] detection head, based on self-attention mechanisms, is introduced. Dyhead combines a scale-aware attention module, a spatial-aware attention module, and a task-aware attention module. Incorporating these three self-attention modules enhances the feature representation ability of the detection head and improves the detection precision for small objects.

The three attention modules in Dyhead are decoupled, each focusing on a single dimension, and are applied sequentially [33]. The attention applied to the detection features is given by equation (3):

$$W(F) = \pi_C\left(\pi_S\left(\pi_L(F) \cdot F\right) \cdot F\right) \cdot F \quad (3)$$

where F is an L × S × C 3D tensor obtained from the backbone network and input to the three attention modules, π_L is the scale-aware attention function, π_S is the spatial-aware attention function, and π_C is the task-aware attention function.
The scale-aware attention module dynamically fuses feature information of different scales along the L dimension; π_L is defined in equation (4):

$$\pi_L(F) \cdot F = \sigma\left(f\left(\frac{1}{SC}\sum_{S,C} F\right)\right) \cdot F \quad (4)$$

where S and C represent the spatial and task (channel) dimensions, respectively, σ denotes the sigmoid activation function, and f(·) is a linear function approximated by a 1 × 1 convolution.
The spatial-aware attention module is applied after the scale-aware attention module; it adds sparsity to attention learning through deformable convolution to aggregate information from different levels at the same spatial location, as shown in equation (5):

$$\pi_S(F) \cdot F = \frac{1}{L}\sum_{l=1}^{L}\sum_{k=1}^{K} w_{l,k} \cdot F\left(l;\ p_k + \Delta p_k;\ c\right) \cdot \Delta m_k \quad (5)$$

where p_k + Δp_k is the shifted position obtained by applying the self-learned spatial offset Δp_k to position p_k, Δm_k denotes the self-learned importance factor at that position, and K is the number of sparsely sampled positions.
The task-aware attention module dynamically controls the opening and closing of feature channels for different tasks, as shown in equation (6):

$$\pi_C(F) \cdot F = \max\left(\alpha^1(F) \cdot F_c + \beta^1(F),\ \alpha^2(F) \cdot F_c + \beta^2(F)\right) \quad (6)$$

where F_c denotes the feature slice of the c-th channel and the parameters [α^1, β^1, α^2, β^2] are learned functions that control the activation thresholds of the channels.

The structure of Dyhead is shown in Figure 4. A single Dyhead block consists of the three attention modules in series, and multiple Dyhead blocks can be stacked.

Figure 4: Structure of the Dyhead module.
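As an illustration, the following is a simplified PyTorch sketch of the scale-aware attention in equation (4); treating f(·) as a linear layer over the level dimension is an assumption made for compactness.

```python
import torch
import torch.nn as nn

class ScaleAwareAttention(nn.Module):
    # pi_L: average the feature over space and channels, map the
    # per-level statistic through a linear layer (standing in for the
    # 1x1-conv approximation of f), and gate each pyramid level with
    # a sigmoid weight.
    def __init__(self, num_levels):
        super().__init__()
        self.f = nn.Linear(num_levels, num_levels)
        self.sigma = nn.Sigmoid()

    def forward(self, x):
        # x: (B, L, S, C) tensor; L pyramid levels, S flattened positions.
        stat = x.mean(dim=(2, 3))        # (B, L): mean over S and C
        w = self.sigma(self.f(stat))     # (B, L): one weight per level
        return x * w[:, :, None, None]

attn = ScaleAwareAttention(num_levels=3)
out = attn(torch.randn(2, 3, 400, 64))  # e.g., 20x20 positions, 64 channels
```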

3. Experiment and Analysis

This section provides an overview of the experimental setup, evaluation methods, and analysis for model training, including data preprocessing, evaluation metrics, ablation experiments for individual modules, performance comparison with other algorithms, and validation results on different datasets.

3.1. Dataset and Preprocessing

Since publicly available road damage datasets containing complex environmental disturbances are scarce, this paper captured 1300 road damage images of 3000 × 4000 pixels using a smartphone. The images cover damage on various road surfaces, including asphalt concrete, cement concrete, and masonry pavement. The acquired data were then filtered to eliminate excessively blurred, poorly lit, and overexposed images, ensuring high image quality. The filtered road damage dataset covers seven types of road damage: transverse crack (TC), longitudinal crack (LC), oblique crack (OC), alligator crack (AC), block crack (BC), pothole (Po), and repair (Re); examples are shown in Figure 5. Since repair damage is relatively uncommon in real road scenes and the amount of collected image data is limited, some high-quality repair-type images from the public dataset RDD2020 [34] were selected to augment the self-constructed dataset. The merged dataset is divided into training, validation, and test sets in a 7:2:1 ratio.

Figure 5: Examples of road damage in the dataset.
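For illustration, a 7:2:1 split can be implemented as in the following sketch; the directory paths and random seed are assumptions, and the per-class proportion consistency described below would additionally require a stratified split.

```python
import random
from pathlib import Path

# Shuffle the merged dataset and split it 7:2:1 (paths are illustrative).
images = sorted(Path("dataset/images").glob("*.jpg"))
random.seed(0)
random.shuffle(images)
n = len(images)
n_train, n_val = int(0.7 * n), int(0.2 * n)
splits = {
    "train": images[:n_train],
    "val": images[n_train:n_train + n_val],
    "test": images[n_train + n_val:],
}
# Write one file list per split, as commonly consumed by YOLO tooling.
for name, files in splits.items():
    Path(f"dataset/{name}.txt").write_text("\n".join(map(str, files)))
```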

The richness of a dataset is closely related to detection effectiveness, and training on datasets containing a variety of complex environments yields better results and stronger robustness [35], effectively preventing the model from overfitting. Therefore, this paper employs both offline and online data augmentation. Online augmentation uses Mosaic augmentation, while offline augmentation includes common methods such as random flipping and color perturbation. Augmentation enriches the positional and color information of the damage images, and the augmented dataset contains about 3000 road damage images in total. The sample counts for the various road damage types are shown in Table 1 (a sketch of the offline pipeline follows the table). The proportion of each damage type in the training, validation, and test sets remains consistent with the overall dataset distribution.

Table 1. Number of images of various road damage types in the dataset.
Type of road damage Total number Training set Validation set Test set
Transverse crack 1221 917 225 79
Longitudinal crack 1608 1108 335 165
Oblique crack 2059 1538 329 192
Block crack 377 243 80 54
Alligator crack 447 333 77 37
Pothole 341 251 62 28
Repair 602 438 126 38
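The offline augmentation mentioned above might look like the following torchvision sketch; the exact transforms and parameter values used in this work are not specified, so these are assumptions. Note that geometric flips also require flipping the bounding box labels, which is omitted here for brevity.

```python
from pathlib import Path
from PIL import Image
from torchvision import transforms

# Offline augmentation: random flips and color perturbation, saved back
# to disk to enlarge the dataset before training (paths illustrative).
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.2),
])

src, dst = Path("dataset/images/raw"), Path("dataset/images/augmented")
dst.mkdir(parents=True, exist_ok=True)
for img_path in src.glob("*.jpg"):
    img = Image.open(img_path).convert("RGB")
    augment(img).save(dst / f"aug_{img_path.name}")
```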

Training on 3000 × 4000 pixel images is time consuming and computationally demanding, while directly cropping damaged images may discard critical information. Therefore, determining an input image size that balances detection performance and training efficiency is crucial. To this end, a comparative experiment was conducted with different input sizes (320 × 320, 640 × 640, and 1280 × 1280) while keeping the other hyperparameters constant; the results are shown in Table 2. The results indicate that 640 × 640 provides the best balance between detection accuracy and computational efficiency. The smaller input size (320 × 320) reduces detection accuracy, whereas the larger size (1280 × 1280) does not improve accuracy in our experiments while significantly increasing memory consumption: GPU limits force the batch size down from 32 to 8, prolonging training. Furthermore, previous research [36] indicates that increasing image resolution does not substantially enhance detection accuracy. Therefore, images are resized to 640 × 640 for training to optimize performance and efficiency.

Table 2. Comparison of detection results for different image sizes on the LBN-YOLO model.
Image size mAP@0.5 (%) mAP@0.5:0.95 (%) GPU memory usage (GB) Training time (h)
320 × 320 63.4 44.9 2.5 1.4
640 × 640 74.0 51.9 9.2 3.4
1280 × 1280 64.5 38.4 9.2 9.2

To ensure the accuracy and reliability of the annotations, the collected road damage image dataset is manually annotated by experts using LabelImg. Detailed annotation guidelines are established based on international standards, with a strict verification procedure. Randomly selected annotated samples are validated to enhance dataset consistency and credibility.

3.2. Experimental Parameter Setting and Model Training

The model training and testing were conducted on an Intel Core i5-12400F CPU and an NVIDIA GeForce RTX 4060 Ti GPU. The experimental framework is PyTorch 1.13.1 with CUDA 11.3, and the algorithms are implemented in Python 3.9.18.

The hyperparameter settings used during training are shown in Table 3. Considering the available computational resources, the batch size is set to 32 and the model is trained for 400 epochs with the stochastic gradient descent (SGD) optimizer; an early stopping strategy is used to prevent overfitting. A sketch of this configuration follows Table 3.

Table 3. Training parameter settings.
Hyperparameters Value
Batch size 32
Epoch 400
Initial learning rate 0.01
Warmup 3
Optimizer SGD
Momentum 0.9
Weight decay 0.0005
Training mechanism EarlyStopping
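Under the Ultralytics training API, the Table 3 configuration corresponds roughly to the following call; the dataset configuration file name and the early stopping patience value are assumptions not given in the paper.

```python
from ultralytics import YOLO

# Baseline training run reproducing the Table 3 settings.
model = YOLO("yolov8n.yaml")   # or a custom LBN-YOLO model definition
model.train(
    data="road_damage.yaml",   # dataset config (assumed name)
    epochs=400,
    batch=32,
    imgsz=640,
    optimizer="SGD",
    lr0=0.01,                  # initial learning rate
    momentum=0.9,
    weight_decay=0.0005,
    warmup_epochs=3,
    patience=50,               # early stopping patience (value assumed)
)
```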

3.3. Evaluation Indicators

This paper uses evaluation metrics common in object detection, namely precision (P), recall (R), mAP@0.5, mAP@0.5:0.95, parameter count (Para), frames per second (FPS), and F1-score (F1), to evaluate the road damage detection model. In object detection, predictions are classified as positive or negative samples. A positive sample is a correctly detected object, where the IoU between the predicted bounding box and the ground-truth box exceeds a predefined threshold; a negative sample is one where no object is detected or the IoU falls below the threshold.
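The IoU criterion can be computed as in the following sketch:

```python
def iou(box_a, box_b):
    # Boxes as (x1, y1, x2, y2); returns intersection over union.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

# A prediction counts as a true positive when IoU >= 0.5, the threshold
# used for mAP@0.5 in this paper.
print(iou((0, 0, 100, 60), (20, 10, 110, 70)))  # ~0.54
```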

Precision denotes the proportion of samples predicted as positive that are truly positive, also known as the check precision rate, as shown in equation (7):

$$P = \frac{N_{\mathrm{TP}}}{N_{\mathrm{TP}} + N_{\mathrm{FP}}} \quad (7)$$

where N_TP denotes the number of samples correctly predicted as positive, N_TN the number correctly predicted as negative, N_FP the number incorrectly predicted as positive, and N_FN the number incorrectly predicted as negative.
Recall denotes the proportion of true positive samples that are correctly predicted as positive, as shown in equation (8):

$$R = \frac{N_{\mathrm{TP}}}{N_{\mathrm{TP}} + N_{\mathrm{FN}}} \quad (8)$$
The F1-score is an evaluation index that jointly considers precision and recall, as shown in equation (9):

$$F_1 = \frac{2 \times P \times R}{P + R} \quad (9)$$
When 0.5 is set as the IoU threshold, the average precision (AP) is computed over different recall values; AP measures the detection quality for a single category, and mAP averages the AP values over all categories, as shown in equation (10):

$$\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{AP}_i \quad (10)$$

where N denotes the total number of categories and AP_i is the AP value of category i.
The evaluation of a detection model covers both detection precision and detection speed; FPS is used to evaluate detection speed and is calculated as shown in equation (11):

$$\mathrm{FPS} = \frac{N_{\mathrm{image}}}{T_{\mathrm{total}}} \quad (11)$$

where T_total is the sum of inference time, preprocessing time, and nonmaximum suppression (NMS) time, and N_image is the total number of images.
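A compact sketch of these metric computations (equations (7)–(11)) follows; the model_fn passed to measure_fps is assumed to wrap preprocessing, inference, and NMS.

```python
import time

def detection_metrics(n_tp, n_fp, n_fn):
    # Precision, recall, and F1 from true/false positive and false
    # negative counts, following equations (7)-(9).
    p = n_tp / (n_tp + n_fp + 1e-9)
    r = n_tp / (n_tp + n_fn + 1e-9)
    f1 = 2 * p * r / (p + r + 1e-9)
    return p, r, f1

def mean_ap(ap_per_class):
    # mAP: the average of per-class AP values, equation (10).
    return sum(ap_per_class) / len(ap_per_class)

def measure_fps(model_fn, images):
    # FPS = N_image / T_total, where T_total covers preprocessing,
    # inference, and NMS inside model_fn, equation (11).
    start = time.perf_counter()
    for img in images:
        model_fn(img)
    return len(images) / (time.perf_counter() - start)
```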

3.4. Model Validation and Results

The model was trained on the self-built dataset, and the detection results for the seven road damage types are shown in Table 4, where "Total" denotes the overall results across all categories. Table 4 shows that with the proposed LBN-YOLO, the total mAP@0.5 across all types increased by 4.3%.

Table 4. Comparison of detection results of seven road damage types.
Class P (%) R (%) F1 (%) mAP@0.5 (%) mAP@0.5:0.95 (%)
YOLOv8n:
Total 69.9 63.7 66.7 69.7 43.7
Alligator crack 84.4 76.7 80.4 83.9 49.0
Block crack 31.4 100.0 47.8 99.5 84.5
Longitudinal crack 68.2 53.2 59.8 59.1 31.6
Oblique crack 67.1 47.9 55.9 53.1 31.9
Pothole 91.7 41.7 57.3 48.2 15.7
Repair 71.5 81.1 76.0 77.1 59.3
Transverse crack 75.3 59.1 66.2 68.0 34.2
  
LBN-YOLO:
Total 75.1 65.9 70.6 74.0 51.9
Alligator crack 89.8 73.3 80.7 87.4 55.2
Block crack 35.9 100.0 52.8 99.5 94.5
Longitudinal crack 72.0 51.6 60.1 60.0 37.3
Oblique crack 69.8 45.6 55.2 55.5 32.7
Pothole 92.9 41.7 57.6 62.5 32.1
Repair 82.0 86.2 84.1 83.8 71.9
Transverse crack 83.4 59.1 69.2 69.5 39.9

When the original YOLOv8n model is used for road damage detection, the results for the pothole type are notably low, with an mAP@0.5 of only 48.2%, owing to the scarcity of pothole images and their widely varying sizes and shapes. The LBN-YOLO model incorporates the C2f-DWR module and SBiFPN, enabling more effective extraction and fusion of the irregular features of potholes across multiple scales. With LBN-YOLO, the mAP@0.5 for potholes increases by 14.3% and the mAP@0.5:0.95 by 16.4%, the largest improvements among the damage categories. This indicates the strong performance of the proposed model on the more challenging damage categories, reducing both missed detections and false alarms.

The LBN-YOLO model demonstrates clear improvements in P, R, and F1 over the original YOLOv8n model, as shown by the curves in Figure 6 and the confusion matrices in Figure 7. It is evident from Figures 6(a) and 6(d) that the enhanced LBN-YOLO model surpasses the original model trained for 400 epochs. Figure 7 compares the confusion matrices of the two models; the horizontal axis represents the actual damage categories, and the vertical axis the predicted categories. The classification precision in Figure 7(b) surpasses that in Figure 7(a), confirming the superior performance of LBN-YOLO over YOLOv8n in detecting road damage.

Figure 6: Comparison curves before and after improvement of the YOLOv8n model. (a) Precision curve, (b) recall curve, (c) mAP@0.5 curve, and (d) F1-score curve.
Figure 7: Comparison of confusion matrices before and after YOLOv8n model improvement. (a) YOLOv8n confusion matrix (normalized) and (b) LBN-YOLO confusion matrix (normalized).

3.5. Ablation Experiment

To verify the effectiveness of the modules proposed in this paper, ablation experiments are designed in which the proposed C2f-DWR, SBiFPN, and Dyhead are added to the model individually and in combination. The experiments are conducted on the self-constructed dataset presented in Section 3.1, and the results are shown in Table 5, where a check mark indicates that the corresponding module is enabled.

Table 5. Results of ablation experiments.
YOLOv8n C2f-DWR SBiFPN Dyhead F1 (%) mAP@0.5 (%) mAP@0.5:0.95 (%) Para (M)
✓ — — — 67.7 69.7 43.7 3.0
✓ ✓ — — 69.4 71.7 46.1 2.9
✓ — ✓ — 65.7 69.0 43.1 2.0
✓ — — ✓ 70.0 71.2 48.9 3.4
✓ ✓ ✓ — 69.7 72.8 51.1 1.9
✓ ✓ — ✓ 69.2 72.1 51.0 3.4
✓ — ✓ ✓ 68.9 71.9 49.5 2.5
✓ ✓ ✓ ✓ 70.6 74.0 51.9 2.4

In the first row of Table 5, the detection results of the original YOLOv8n model serve as the baseline. Adding the C2f-DWR module alone increases mAP@0.5 by 2%. Adding SBiFPN alone makes the model lighter, reducing the parameter count by a substantial 34% while maintaining precision comparable to YOLOv8n. Combining two modules surpasses using each module individually; notably, the combination of C2f-DWR and SBiFPN raises mAP@0.5 from 69.7% to 72.8%, a 3.1% improvement, while also reducing computational cost. Finally, incorporating all three modules yields the best performance: compared with the original model, F1 increases by 2.9%, mAP@0.5 by 4.3%, and mAP@0.5:0.95 by 8.2%, with a 20% reduction in parameter count. These results demonstrate the clear advantages of the proposed LBN-YOLO model in detection precision and parameter count, with each module making a positive contribution.

To further verify the effectiveness of the improved SBiFPN and evaluate its impact on road damage detection, this study compares the original BiFPN and SBiFPN, as shown in Table 6. The results indicate that incorporating the original BiFPN into YOLOv8n improves all performance metrics, primarily owing to BiFPN's strong multiscale feature fusion capability. Integrating SBiFPN into YOLOv8n reduces the number of fusion nodes, simplifies the feature fusion network, and retains efficient feature fusion while reducing parameters by approximately 34%, with only a slight decline in the evaluation metrics. Compared with BiFPN, the integration of SBiFPN with the C2f-DWR and Dyhead modules yields significant mAP improvements, with mAP@0.5 increasing by 1.5% and mAP@0.5:0.95 by 4.7%. SBiFPN thus reduces model complexity and parameter count while achieving better detection performance.

Table 6. Comparison of detection performance between BiFPN and SBiFPN.
Models F1 (%) mAP@0.5 (%) mAP@0.5:0.95 (%) Para (M)
YOLOv8n 66.7 69.7 43.7 3.0
YOLOv8n + BiFPN 67.4 70.1 44.3 3.1
YOLOv8n + SBiFPN 65.7 69.0 43.1 2.0
YOLOv8n + C2f-DWR + Dyhead + BiFPN 70.2 72.5 47.2 3.4
LBN-YOLO (ours) 70.6 74.0 51.9 2.4

To visualize the image regions that the detection model attends to when making predictions, this study uses heatmap visualization, generating heatmaps from selected layers of the model. Figure 8 highlights the image regions on which the model relies when identifying cracks. As the improvement modules are added one by one, the model's attention becomes progressively focused on the region where the cracks are located. In particular, in Figure 8(d), after all improvement modules have been added, the model performs best, focusing its attention almost exclusively on the crack region.

Figure 8: Heatmap visualization. (a) YOLOv8n, (b) YOLOv8n + SBiFPN, (c) YOLOv8n + SBiFPN + Dyhead, and (d) YOLOv8n + C2f-DWR + SBiFPN + Dyhead.
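The paper does not name its exact heatmap method; the following is a minimal activation-based sketch in PyTorch that hooks a selected layer, averages its activation over channels, and upsamples the result to image size.

```python
import torch
import torch.nn.functional as F

activations = {}

def hook(_module, _inp, out):
    # Store the selected layer's feature map during the forward pass.
    activations["feat"] = out.detach()

def activation_heatmap(model, layer, image):
    # image: (1, 3, H, W) tensor; layer: a module inside the model.
    handle = layer.register_forward_hook(hook)
    with torch.no_grad():
        model(image)
    handle.remove()
    heat = activations["feat"].mean(dim=1, keepdim=True)  # channel mean
    heat = F.relu(heat)
    heat = F.interpolate(heat, size=image.shape[-2:], mode="bilinear")
    # Normalize to [0, 1] for overlaying on the input image.
    return (heat - heat.min()) / (heat.max() - heat.min() + 1e-9)
```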

3.6. Comparing Different Object-Detection Algorithms

According to previous studies on road damage detection, models such as Faster R-CNN [11], SSD-MobileNet [6], YOLOv5 [37], and YOLOv7 [38] have been used for this task. To evaluate the detection performance of the improved LBN-YOLO model, comparative experiments were conducted against these models. The results are shown in Table 7 and Figure 9, with all inference times measured on the same RTX 4060 Ti.

Table 7. Comparison of LBN-YOLO with other state-of-the-art algorithms.
Models F1 (%) mAP@0.5 (%) mAP@0.5:0.95 (%) Para (M) FPS (f·s⁻¹)
Faster R-CNN 62.0 65.7 39.7 28.3 16.2
SSD 51.6 61.7 31.1 4.3 83.8
YOLOv5s 66.1 66.8 41.1 7.0 85.1
YOLOv7-tiny 67.1 67.7 42.5 6.0 61.7
YOLOv8n 66.7 69.7 43.7 3.0 74.5
LBN-YOLO (ours) 70.6 74.0 51.9 2.4 87.7
Figure 9: Comparison of various metrics between LBN-YOLO and other advanced algorithms.

Our model outperforms YOLOv5s, YOLOv7-tiny, YOLOv8n, SSD, and Faster R-CNN in detecting pavement damage. The mAP@0.5 of LBN-YOLO is 74.0%, a 6.3% improvement over YOLOv7-tiny, the best-performing of the other state-of-the-art models, and a notable 4.3% improvement over YOLOv8n. Faster R-CNN, as a two-stage detector, runs more slowly and uses far more parameters. In contrast, the proposed model has 20% fewer parameters than YOLOv8n, thanks to the adoption of SBiFPN as the feature fusion network, achieving a lightweight design. Moreover, its FPS reaches 87.7 frames/s, surpassing all other models. These results validate that the proposed model performs exceptionally well in both detection speed and precision, offering strong overall cost-effectiveness.

3.7. Visualization of Test Results

After training, the detection results are visualized to verify detection performance and to compare the strengths and weaknesses of the models more intuitively. A comparison of the visualization results is shown in Figure 10.

Figure 10: Visualization of the comparison experiment results. (a) Faster R-CNN, (b) YOLOv5s, (c) YOLOv7-tiny, and (d) LBN-YOLO.

In the first image set of Figure 10, YOLOv7-tiny misidentifies leaves as pothole damage, whereas LBN-YOLO makes no misidentification. Road damage typically exhibits spatial continuity, making it susceptible to misdetection or duplicate detections when a model cannot adequately extract global information. In the second set, Faster R-CNN, YOLOv5s, and YOLOv7-tiny fail to detect the damage, while LBN-YOLO successfully detects all the fine cracks, indicating stronger extraction of multiscale contextual information. In the third set, YOLOv5s misses detections due to shadow interference, and Faster R-CNN and YOLOv7-tiny generate multiple detection boxes; LBN-YOLO, aided by Dyhead's self-attention mechanism, detects the occluded cracks more reliably. In the fourth set, alligator cracks, a rare and difficult damage type, are missed entirely by Faster R-CNN, while YOLOv5s and YOLOv7-tiny detect the cracked areas but produce redundant detection boxes, indicating insufficient global feature extraction; in contrast, LBN-YOLO's detection boxes are accurate with high confidence. In the fifth set, Faster R-CNN misidentifies road markings as transverse cracks, while YOLOv5s and YOLOv7-tiny miss more detections; LBN-YOLO detects all road damage in the images with excellent performance.

Based on the experimental results, the LBN-YOLO model exhibits superior performance in road damage detection. It effectively detects road damage across various scales and under complex conditions while maintaining high detection precision. Moreover, the model has few parameters and high detection speed, making it well suited for road damage detection applications.

3.8. Performances on China_MotorBike Dataset and NEU-DET

To verify the generality of LBN-YOLO, the China_MotorBike subset of the publicly available road damage dataset RDD2022 [39] and the steel surface defect dataset NEU-DET [40] are selected for model validation, with training parameters kept the same as in Section 3.2.

The China_MotorBike dataset is a road damage dataset containing five label types. Detection results are presented in Table 8. Experimental results show that the proposed LBN-YOLO model surpasses the baseline YOLOv8n model in detecting all five damage types while reducing the miss rate. Figure 11 shows the confusion matrix, clearly illustrating the improved detection performance for each class.

Table 8. Comparison of detailed detection results for YOLOv8n and its improved model on the China_MotorBike dataset.
Class P (%) R (%) F1 (%) mAP@0.5 (%) mAP@0.5:0.95 (%)
YOLOv8n:
Total 87.6 84.5 86.0 91.3 62.8
D00 83.2 74.5 78.6 84.9 52.7
D10 84.7 62.2 71.7 80.1 46.7
D20 85.8 88.1 86.9 93.5 59.9
D40 89.9 97.5 93.5 98.6 60.8
Repair 94.5 100 97.2 99.5 93.9
  
LBN-YOLO:
Total 90.9 88.2 89.5 92.9 66.5
D00 86.8 80.0 83.3 87.1 60.1
D10 87.0 71.1 78.3 83.9 49.3
D20 88.9 89.8 89.3 94.1 66.3
D40 92.8 100 96.3 99.5 63.1
Repair 95.5 100 97.7 99.7 94.2
Figure 11: Comparison of confusion matrices for YOLOv8n and its improved model on the China_MotorBike dataset. (a) YOLOv8n confusion matrix (normalized) and (b) LBN-YOLO confusion matrix (normalized).

The NEU-DET dataset, designed for steel surface defect detection, contains 1800 images with six label types. As shown in Table 9, the experimental comparison between YOLOv8n and LBN-YOLO demonstrates that the proposed LBN-YOLO model outperforms YOLOv8n in mAP@0.5 and mAP@0.5:0.95 for most defect types. The precision for Patches and Pitted_surface decreases slightly, owing to their significant shape differences from road damage; nevertheless, the LBN-YOLO model achieves a lower miss rate for these categories. Figure 12 presents the confusion matrices for the improved and original models.

Table 9. Comparison of detailed detection results for YOLOv8n and its improved model on the NEU-DET dataset.
Class P (%) R (%) F1 (%) mAP@0.5 (%) mAP@0.5:0.95 (%)
YOLOv8n:
Total 75.5 67.4 71.2 74.9 39.8
Crazing 51.5 26.2 34.7 32.7 11.6
Inclusion 80.6 81.0 80.8 84.8 44.6
Patches 82.7 78.1 80.3 85.2 52.7
Pitted_surface 81.0 74.4 77.6 81.2 45.6
Rolled_in_scale 77.4 63.5 69.8 77.1 36.2
Scratches 79.7 81.4 80.5 88.5 48.4
  
LBN-YOLO:
Total 73.2 73.9 73.5 77.7 44.6
Crazing 51.6 35.5 42.1 40.6 13.8
Inclusion 81.3 83.4 82.4 84.9 45.5
Patches 80.8 79.1 79.9 87.7 58.8
Pitted_surface 77.1 74.6 75.8 83.4 52.6
Rolled_in_scale 72.8 80.8 76.6 81.2 41.9
Scratches 77.4 89.8 83.1 88.7 55.2
Figure 12: Comparison of confusion matrices for YOLOv8n and its improved model on the NEU-DET dataset. (a) YOLOv8n confusion matrix (normalized) and (b) LBN-YOLO confusion matrix (normalized).

The experimental results above indicate that LBN-YOLO improves detection performance on both public datasets while reducing parameters by 20%. Thus, the proposed model exhibits strong generalization ability and is suitable for most datasets.

4. Discussion

Existing deep learning-based road damage detection models generally suffer from low detection precision and struggle with multiscale and small-object road damage. This paper therefore focuses on overcoming these difficulties and proposes the improved model LBN-YOLO. According to the experimental results above, the model can accurately locate and classify a variety of road damage.

The experimental results demonstrate that integrating DWR modules into the last two C2f modules of the backbone network significantly enhances the model’s ability to extract global context information and obtain a more comprehensive feature representation. Utilizing SBiFPN as the feature fusion network further reduces redundant features and minimizes the number of model parameters. In addition, incorporating cross-level horizontal connections effectively fuses multiscale features with original features. Adopting Dyhead as the detection head, which combines scale-aware, spatial-aware, and task-aware attention mechanisms, enhances the model’s ability to detect occluded small objects. The proposed LBN-YOLO effectively balances precision and speed in road damage detection, reducing model parameters and computational load without compromising precision, thereby facilitating the model’s deployment in practical applications.

Although this study achieves high detection precision in road damage detection, the dataset remains limited, especially in images captured under complex conditions such as foggy weather. Future work will focus on collecting more diverse road damage images under complex conditions and on integration with road damage segmentation models such as U-Net, FCN, and PSPNet, aiming to further improve detection precision by extracting shape features and refining the road damage detection system.

5. Conclusions

This paper proposes a road damage detection model named LBN-YOLO, aiming to improve both detection precision and speed. To this end, the following work was done. (1) To address the challenges of a small original dataset and class imbalance, which make the model prone to overfitting, data augmentation is carried out using both offline and online methods. (2) To address feature information loss caused by downsampling, DWR modules are introduced into the last two C2f modules of the backbone network. (3) To improve feature fusion efficiency and reduce redundancy, SBiFPN is used as the feature fusion network, achieving cross-level weighted fusion. (4) Dyhead is adopted as the detection head to enhance scale-, spatial-, and task-awareness for small objects, improving the classification and localization of occluded small objects.

This study employs LBN-YOLO to detect seven types of road damage. Experimental results show an mAP@0.5 of 74.0% at 87.7 FPS, a 4.3% improvement over YOLOv8n. Furthermore, compared with other advanced object detection models, namely Faster R-CNN, SSD, YOLOv5s, and YOLOv7-tiny, the proposed method achieves mAP@0.5 gains of 8.3%, 12.3%, 7.2%, and 6.3%, respectively. Moreover, the model has been validated on the publicly available China_MotorBike road damage dataset and the steel surface defect dataset NEU-DET, obtaining better detection results than the original model and indicating good generalizability. Ablation experiments and visual validations further confirm the effectiveness of the adopted modules. In future work, the model might be deployed on edge devices, such as UAVs, to achieve real-time automatic detection of road damage.

Conflicts of Interest

The authors declare no conflicts of interest.

Funding

This work was supported in part by Weihai Sunshine Engineering Technology Co., Ltd. (Project No. 1010025055); and in part by the Science and Technology Development Plan Project of Weihai Municipality under Grant 2022DXGJ13.

Acknowledgments

This research received no external funding.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.
