Over the past decade, significant advancements in computer vision have been made, primarily driven by deep learning-based algorithms for object detection. However, these models often require large amounts of labeled data, leading to performance degradation when applied to tasks with limited data sets, particularly in scenarios involving moving objects. For instance, real-time detection and detection of humans in agricultural settings pose challenges that demand sophisticated vision algorithms. To address this issue, we propose SB-YOLO-V8 (Scene-Based—You Only Look Once—Version 8), an optimized YOLO-based convolutional neural network (CNN) designed specifically for real-time human detection, not just limited to citrus farms but applicable to various agricultural and urban environments. The versatility of SB-YOLO-V8 is underpinned by its robust handling of class imbalances and enhanced feature extraction through binary adaptive learning optimization (ALO) and Synthetic Minority Over-sampling Technique (SMOTE), addressing common challenges like limited labeled data sets and varied object dynamics in different scenarios. The proposed method is trained using images and videos of human workers captured by autonomous farm equipment. Evaluation metrics, including frame per second (FPS), model performance, and efficiency, demonstrate that the proposed method outperforms variances of YOLO such as YOLO-V8, YOLO-V7, YOLO-V6, YOLO-V4, and YOLO-V3 with an average FPS of 13.63 and a precision of 91%. In effect, the proposed SB-YOLO-V8 presents an efficient solution for real-time human detection in challenging visual scenarios.

1 Introduction

In the past decade, continuous advancements in object detection studies have led to the development of many effective algorithms [1]. Object detection studies aim to categorize objects in images or videos and specify their exact positions [2]. This can be achieved by employing various sophisticated and successful deep learning (DL) algorithms that automatically learn the correct functions of images or videos and create efficient detectors through monitored instructions [3]. Local visual functions are typically extracted through continuous visual functions and classifiers, thus automatically learning the valuable features from the images or videos. Features to be learned for adequate detectors include texture, color, shape, and their combination [4-6]. However, DL algorithms often face challenges connected with poor image quality, illumination conditions, and stringent conditions imposed on time and computational resources when learning these features [1, 7]. In addition to the same challenges, the performance of DL algorithms degrades significantly on tasks with minimally available labeled data sets [8-10]. These challenges negatively affect the algorithm's quality, thus reducing its efficiency.

To overcome the above challenges, various augmented DL algorithms have been proposed [11-13], but the most widely used of these algorithms is the convoluted neural network (CNN) [14, 15]. The development of CNN began with the immense success of AlexNet [16], followed by several successful CNN algorithms like Convolutional Long Short-Term Memory 1 (ConvLSTM1) [17], deep convolutional neural network (DCNN) [18] and YOLO [19]. These algorithms can be classified as either a one-stage method (single-shot detector [SSD] and YOLO) [20, 21] or a two-stage method—Regional-based-CNN (R-CNN), Mask CNN [22] depending on the object detection process. Intuitively, this method has some noticeable disadvantages, among which include slow speed that obstructs its use in real-time applications and too complex architecture that could not be learned exclusively. In addition, it provides estimations of positions and classes in one step. The probabilities of possible classes are provided as outputs, not for all the classes but for a set of equally spaced-out positions and aspect ratios in the form of rectangles called boxes.

Among the one-stage methods, the YOLO-based algorithm has received considerable attention due to its discriminative features. These distinctive features can be utilized for robust inferences in real-time visual scenes. Compared to the other CNN models, which usually suffer from slow response time, YOLO algorithms are efficient in speed and accuracy and can detect objects in both images, and video classes in real-time [23]. YOLO requires detectable objects to pass through the neural network only once to make predictions. This one-time forward propagation allows the algorithm to see the entire images or videos during training, enabling it to encode contextual information about classes, including their implicit appearances. On top of that, the algorithm uses a single CNN for the simultaneous prediction of classes and myriad bounding box probabilities for those boxes. This reduces the neural network architecture and computational complexity [24].

Due to the above-stated discriminative features, YOLO-based CNN models have been utilized in many practical human detection applications.

Object detection is critical for a wide range of applications, from security and surveillance to traffic control and beyond agricultural implementations. This study introduces an improved YOLO-based CNN approach, SB-YOLO-V8, which leverages advanced data preprocessing, feature extraction, and augmentation techniques like Synthetic Minority Over-sampling Technique (SMOTE) and Binary adaptive learning optimization (ALO) to enhance detection reliability across various environments, not just limited to agricultural settings but also adaptable to urban and industrial scenarios. The key contributions of the proposed method are as follows:

Addressing the challenge of imbalanced class data distribution or limited labeled data through the utilization of data augmentation techniques and SMOTE. This helps enhance object detection performance and mitigate model bias.
Introducing an effective approach to tackle feature selection issues, aiding in identifying the most relevant features for classification experiments. This enhances the overall performance, robustness, and efficiency of the classification model, leading to the success of the proposed SB-YOLO-V8 algorithm for real-time human detection in citrus farms.
Handling data set sparsity in the detection field through data augmentation, thereby mitigating overfitting. The aggregator module acts as a feedback mechanism, reducing computational costs and tuning for improved performance.

The subsequent sections of this study are structured as follows: Section 2 provides a comprehensive overview of the significance and related works pertaining to human detection in agricultural farms. Section 3 outlines the proposed method, while Section 4 presents the experimental approach, including an inferential study of YOLO-V3 and YOLO-V4, detailed results, and a summary of the experimental setup. Finally, Section 5 concludes the study.

2 Related Works

Sumit et al. [25] propose a comparative analysis of YOLO-V3 and Masked R-CNN on the detection of tiny human images, among other types of images. In that study, simulation results indicated that YOLO-V3 exhibited superior accuracy and computational efficiency in detecting the human figures within the images compared to Mask R-CNN. However, the performance metrics adopted in the study were limited to only two, and the scalability appears to be questionable. While this comparison highlights YOLO-V3's efficiency, the proposed SB-YOLO-V8 further enhances this efficiency by incorporating SMOTE and binary ALO (BALO), addressing scalability concerns that were not fully explored in the comparative study.

Ivašić-Kos et al. [26] propose a technique to detect individuals in videos recorded during winter under various weather conditions, predominantly at night, and at varying distances from the camera, ranging from 30 to 215 m. The individuals in the thermal videos were observed to engage in activities such as walking, running, or walking hunched over, attempting to remain inconspicuous. The study highlights significant improvements in mean precision achievable with YOLO-based algorithms trained on this thermal data set. However, it also raises a concern about the high risk of overfitting due to the specific characteristics of the data set and data preprocessing techniques utilized. Our SB-YOLO-V8 model builds on this by using advanced data augmentation techniques to handle variable environmental conditions more effectively, which was identified as a limitation in their approach.

Jung et al. [27] propose a YOLO-V3 based on the NVIDIA Jetson Xavier-based people identification system using a four-channel camera placed on a tractor. Among the numerous objects supplied, the YOLO-V3 model was trained using the Common Objects in Context (COCO) data set. Only humans were extracted to learn by the network. The results of the experiment showed remarkable detection abilities of the YOLO-V3. However, the study raised concerns regarding the extensive involvement of human experts in feature selection and data processing. This is about potential error margins if the data set size were to increase or if larger data sets were utilized. Compared to this approach, SB-YOLO-V8 reduces the need for extensive human involvement in feature selection and data processing through its automated BALO optimization, enhancing scalability and reducing potential error margins.

He et al. [28] propose a real-time YOLO-based human detection algorithm, known as Tiny Fast YOLO, and classify the data with k-means++. They claim that the technique was very effective in detecting objects in embedded systems. However, the authors admitted that the proposed technique is vulnerable to minor targets and increases end-to-end training, thereby leading to a significant deterioration in real-time scenarios. Another drawback of the technique is that it does not cater to large data sets and scalability. In contrast, SB-YOLO-V8 addresses these vulnerabilities by improving detection robustness across different object sizes and types through enhanced feature extraction methods.

Ji et al. [29] propose human abnormal behavior detection by utilizing a YOLO network model. In this study, the human behavior was classified based on the conditions of the monitoring settings and then the data were trained via the YOLO network. Although the proposed scheme outperformed many traditional schemes, it also had challenges with data preprocessing and overfitting. SB-YOLO-V8 mitigates these challenges by incorporating SMOTE to balance the data set effectively, leading to more generalized and robust models.

Ahmad et al. [30] modified the YOLO-based CNN algorithm for human detection. The authors produced a modification in relation to the error function of the YOLO network. The enhanced algorithm replaced the border style in the proportion concept. Compared to the loss function of the original YOLO algorithm, the improved model is more adaptable and rational regarding network error optimization. SB-YOLO-V8 goes beyond this by not only adjusting error functions but also by optimizing the feature selection process to enhance detection accuracy significantly.

Van Tuan et al. [31] also propose a novel real-time YOLO-based model for human detection in security cameras using fisheye lenses. Instead of employing the original 3D color input channels for the ConvNets-based YOLO model, 2D input channels consisting of gray-level image channels and foreground-background context information were used to increase the precision of the original YOLO model. Through experimental demonstration, the authors showed significant improvements in accuracy and robustness to background scenes that could be observed in the proposed method to the original YOLO-based human detection algorithm. However, the technique used in preprocessing the data set is likely to lose some features during the conversion of the data set into a gray-level image channel. SB-YOLO-V8 extends this idea by integrating comprehensive data preprocessing that preserves critical features during conversion, which their approach risks losing.

More recently, the usage of the YOLO algorithm has been found in the agriculture sector. For instance, Wang et al. [32] proposed a computerized imaging method for determining mango harvest yield using a DL. In this study, the Kalman filter and Hungarian algorithms are integrated with the YOLO-based model for on-tree mango fruit detection. Though the study is promising, there are various limitations with the Kalman filter with regard to the optimization.

Finally, Tian et al. [33] improved the YOLO-V3 algorithm to detect tomato diseases and insect infestation. They improved the YOLO-V3 model by employing multiscale training, image pyramid, and dimension clustering to identify multiscale features. This study shows significant improvements in both the detection accuracy and detection time. However, despite its extensive usage, commercialized applications of the YOLO-based algorithms for the real-time detection of humans in agricultural farms are scarce.

Thus, the proposed study gives an inferential analysis of the YOLO model for real-time human detection on a citrus farm. The analysis is based on the two most influential YOLO-based CNN algorithms, namely, YOLO-V3 and YOLO-V4. The study enhances the performance of the YOLO-based CNN algorithm with SMOTE and BALO algorithms. Then, we compare it with other algorithms in the same farming environment to determine its efficiency. While existing studies have laid a solid foundation in the use of YOLO algorithms for various detection tasks, the proposed SB-YOLO-V8 model introduces significant improvements in terms of scalability, robustness, and accuracy by integrating SMOTE and BALO. This makes it particularly effective in real-time human detection across various settings, not limited to agricultural environments but also extendable to urban and industrial scenarios.

3 Proposed Method

3.1 Class Imbalance in High-Dimensional Data

Class imbalance is when the sample sizes for different classes in a data set are unequal and evenly distributed [34, 35]. Dealing with class imbalance is crucial in classification tasks, as it can lead to biased models with poor performance in the minority class. Data resampling techniques are commonly employed to address class imbalance in data sets. Among them, balanced resampling strategies such as random oversampling and under sampling are often utilized. Random oversampling involves replicating instances from the minority class randomly to increase its representation. The introduction of SMOTE in the SB-YOLO-V8 framework addresses class imbalances by synthetically enhancing the minority class representation [36]. This approach not only improves the classifier's accuracy but also boosts its ability to scale across diverse data sets. SMOTE generates synthetic samples by considering the k-nearest neighbors of each minority class instance and creates synthetic features along the line segments connecting the minority class instance and its neighbors. The amount of oversampling required can vary and SMOTE allows for selecting the k closest neighbors based on this oversampling requirement. Interpolating the feature values of the minority class instance and its chosen neighbors produces synthetic samples. SMOTE can oversample the minority class from 100% to 500% of the original size [36].

The SMOTE algorithm works such that, if k is the number of closest neighbors, N is the number of minority class samples that must be balanced, and A is the amount of oversampling required. This then follows that:

{T}_{\mathrm{ss}}=\left(\frac{A}{100}+1\right).N

(1)

Below are the steps in generating synthetic samples using SMOTE, where each synthetic sample is created by interpolating the feature values of the minority instance and its neighbors along the line segments connecting them. This procedure allows for the creation of synthetic samples that enhance the representation of the minority class in the data set.

Considering a minority class instance, denoted as X_i, with certain feature vectors,

{x}_{i1},{x}_{i2},\dots, {x}_{id}

denoted as

{X}_i=\left[{x}_{i1},{x}_{i2},\dots, {x}_{id}\right]

, and number of feature dimensions d then, the algorithm is computed by going through Steps 1–4 as follows:

Find the k nearest neighbors of ${x}_i$ , denoted as ${N}_1={x}_{i1}^1,{x}_{i2}^1,\dots, {x}_{ik}^1,$
For each feature dimension $j\left(1\le j\le d\right)$ :
- Calculate the difference or distance between the feature value of ${x}_i$ and its $k-\mathrm{th}$ neighbor for dimension j, denoted as ${\mathrm{diff}}_j={x}_{ik}^j-{x}_{ij}.$
- Randomly select a difference value between 0 and 1, denoted as ${r}_j.$
For each synthetic sample s generated from x_i and its neighbors:
- Create the synthetic feature vector ${S}_s=\left[{s}_{s1},{s}_{s2},\dots, {s}_{sd}\right]$ .
- For each feature dimension j:
  - ∘
    Compute the synthetic feature value ${S}_{sj}={x}_{ij}+{r}_j.{diff}_j,$ where ${r}_j$ is the randomly selected difference value.
Assign the class label of ${x}_i$ to all synthetic samples generated from ${x}_i.$

3.2 Feature Selection Optimization Model

Binary Ant Lion Optimizer (ALO) is employed to refine the feature selection process, significantly reducing the model's complexity while maintaining high detection accuracy. This optimization is crucial for scalable real-time applications as it ensures that SB-YOLO-V8 remains computationally efficient even when deployed in environments with vast amounts of data. In creating the feature selection optimization model, we can determine a subset of features that adequately characterize the entities in the data set, thereby improving classification accuracy while reducing the computational effort and complexity associated with many features. The feature selection optimization model can be formulated as follows:

Consider:

X be the set of all features in the data set.
X_s be the selected subgroup of features.
C be the set of all classes or labels.
N be the total number of instances or objects in the data set.
${N}_{C_i}$ be the number of instances belonging to Class C_i
${N}_{C_i}$ , X_s be the number of instances belonging to Class C_i and having features in the selected subgroup X_s.

Then, the objective of the feature selection optimization model is to maximize the classification accuracy while minimizing the number of selected features. It can then be represented mathematically as follows:

\operatorname{Maximize}\kern.2em \frac{\mathrm{Accuracy}\left({X}_s\right)}{\left|{X}_s\right|}

(2)

where

\mathrm{Accuracy}\left({X}_s\right)=\frac{\sum \limits_{i=1}^{\mid C\mid }{N}_{C_i},{X}_s}{N}

(3)

Subject to:

\left|{X}_s\right|\le \left|X\right|.

In the optimization model, the objective function aims to maximize the ratio of correctly classified instances in the selected subgroup X_s to the total number of instances in the data set, divided by the number of selected features. This captures the tradeoff between classification accuracy and the size of the selected feature subset. The constraint ensures that the size of the selected subgroup X_s is not more significant than the size of the set of all features X. This constraint reflects the limitation of the available features and prevents overfitting.

The optimum subset in the multidimensional issue search space must be found efficiently using a feature selection model. ALO is a recently created method for tackling complex optimization problems based on emulating the meal snatching Antlions and their feeding ants' environmental mechanisms. For this purpose, ALO is used for many real-world challenges. It has shown promising results in tackling real-world problems in a multidimensional search space. This approach uses its mechanism to solve the defined feature selection model because of its inherent exploration and exploitation capabilities. If a particular characteristic is part of the subset, the decision variables are based on that fact. To pick the best features, a feature selection model requires binary representations of variables.

3.3 Feature Selection Based on ALO in Balanced Class Data

The BALO method involves interacting with prey (ants) and predators (antlions) in an ecosystem. Ants move randomly, while antlions build trenches to trap and devour their prey. The strategic actions taken by the ants and antlions include random motion, trap creation, entombment of ants, prey collection, and pit transformation, all aimed at finding the best solution. BALO mimics this natural process for feature selection in multidimensional search spaces and the algorithm is computed as follows:

Step 1:
Initialization.

Search agents, representing ants and antlions, are randomly generated with the best possible features. Each character is represented by a binary variable, where “1” indicates a major characteristic and “0” denotes an unimportant one.
Step 2:
Fitness worth evaluation.

A fitness value is determined for the searchers based on the goal to be achieved. The antlion with the best health from the first generation serves as the reference.
Step 3:
Trapping in the pit of the antlions.

The pits created by antlions affect the movement of ants, steering them toward previously unexplored areas. The following equations represent the impact of pits on ant movement:
${C}_k^t={Antlion}_j^t+{C}^t{D}_k^t={Antlion}_j^t+{D}^t$ (4)

Here, ${C}_k^t$ represents the smallest value of “k” in iteration “t,” ${D}_k^t$ represents the highest value of “k” in iteration “t,” C^t is the smallest variable in “t,” and D^t is the highest value of all “k” in “t.” “k” represents variables, and “t” represents the iteration.
Step 4:
Ant gliding in the direction of antlion.

Traps are formed based on the strength of antlions, and ants are required to randomly change their location during trap creation. Ants' unplanned circular movements are simulated, which change based on the current iteration level.
${C}^t=\frac{C^t}{I}{D}^t=\frac{D^t}{I};I=f\left(w,t, It\right)={10}^w.\frac{t}{It}$ (5)
Step 5:
Normalize the Variational AntWalks.

Ants' movement in search of food is erratic. To replicate this random movement, the positional change is represented as:

$X(t)=\left[0, cums\left(2s\left({t}_1\right)-1\right), cums2s\left({t}_2\right)-1,\dots, cums\left(2s\left({t}_{\left({N}_t\right)}\right)-1\right)\right]$ (6)

Here, cums computes the cumulative sum, and (t) is specified as:
${s}_{(t)}=\left\{\begin{array}{ll}1& if\ \mathit{\operatorname{rand}}\ge 0.5\\ {}0& if\ \mathit{\operatorname{rand}}\le 0.5\end{array}\right.$ (7)

Normalization is applied based on Equation (9) to confine the unexpected movement within the search space:
$X{(i.k)}^t=\mathit{\operatorname{rand}}.\left({D}_k^t-{C}_k^t\right)+{C}_k$ (8)

Here, (i, k)^t represents the location of the ktℎ variable in the ttℎ generation of ant i.
Step 6:
Capturing ants for food and reenacting the pit.

Each ant's current health is evaluated based on the goal defined for its current location. If an ant has superior health, its location replaces the antlion's location; otherwise, the antlion's original placement is retained until the next round.
${ Ant lion}_j^t={Ant}_i^t\ iff\left({Ant}_i^t\right)>f\left({Ant lion}_j^t\right)$ (9)
Step 7:
Elitism.

The elite antlion, representing the healthiest individual, is preserved. A superior antlion may disrupt the ant's position, causing the ant's location in the search space to be updated. Ants perform unexpected movements based on fixed and elite antlions. Crossover and mutation operators are employed as follows:
${Ant}_i^t= crossover\left({ Ant lion}_a^t,{ Ant lion}_e^t\ \right)$ (10)

${Ant}_i^t=\left\{\begin{array}{ll}{Ant}_a^t& if\ \mathit{\operatorname{rand}}\ge CRP\\ {}{Ant}_e^t& otherwise\end{array}\right.$ (11)

The probability of crossover (CRP) lies between 0 and 1. Mutation, which involves random modifications, is defined as follows:
${Ant}_{i,k}^t=\left\{\begin{array}{ll}1-{Ant}_{i,k}^t& if\ \mathit{\operatorname{rand}}\le MP\\ {}{Ant}_{i,k}^t& otherwise\end{array}\right.$ (12)

Here, Ant^t_(i.,k) represents the ant's position after the mutation operation, and MP represents the mutation probability value from 0 to 1.
Step 8:
Convergence.

The search process is halted if an appropriate solution is found, no further improvement in health is possible, or a certain number of unexpected movements have been performed. The elite antlion represents the best possible outcome.

These steps outline the feature selection process in balanced class data. It mimics the behavior of ants and antlions in an optimization framework to identify an optimal subset of features for classification tasks. The algorithm begins by initializing search agents (ants and antlions) and evaluating their fitness values based on a specific objective function. Pits created by antlions guide the movement of ants, steering them toward unexplored areas. Ants adjust their positions randomly in response to the pit's creation, and a superior ant's position replaces the best antlion's position. Elitism is employed to maintain the elite antlion, representing the best solution found. The algorithm incorporates crossover and mutation operations to introduce diversity and explore new regions of the search space. Convergence criteria are checked to determine when to terminate the algorithm.

Overall, Algorithm 1 effectively balances exploration and exploitation to select a subset of features that optimize classification performance. The nature of the problem, the decision regarding the objective function, and the parameter settings all impact the algorithm's performance and effectiveness. Fine-tuning parameters like the number of ants and antlions, convergence criteria, and mutation probabilities significantly impact the algorithm's performance. While the algorithm leverages the natural behavior of ants and antlions, its success relies on careful parameter tuning and problem-specific considerations.

ALGORITHM 1. Feature selection based on ALO in balanced class data.

Require: Data set with balanced classes.

Ensure: Optimal feature subset

Initialization
Randomly generate ants and antlions as search agents
Assign binary variables to represent each characteristic.
for each ant i do
Randomly initialize the binary vector ${\boldsymbol{X}}_{\boldsymbol{I}}^{\mathbf{0}}$ for ant i
end for
Evaluate fitness values for the searchers.
Select the antlion with the best health as the reference.
while not converged do
Create pits to trap ants using antlions' positions.
Update ant positions based on antlions' positions and random movement.
Normalize the ant positions to confine the movement within the search space.
for each ant i do
Generate the normalized position ${\boldsymbol{X}}_{\boldsymbol{i}}^{\boldsymbol{t}}$ based on Equation (5)
end for
Capture ants for food and reenact the pits.
for each ant i do
if $\boldsymbol{f}\left({\boldsymbol{Ant}}_{\boldsymbol{i}}^{\boldsymbol{t}}\right)>\boldsymbol{f}\left({\boldsymbol{Ant}\boldsymbol{lion}}_{\boldsymbol{j}}^{\boldsymbol{t}}\right)$ then
Set ${\boldsymbol{Ant}\boldsymbol{lion}}_{\boldsymbol{j}}^{\boldsymbol{t}}={\boldsymbol{Ant}}_{\boldsymbol{i}}^{\boldsymbol{t}}$
end if
end for
for each ant i do
Perform crossover operation based on Equation (10)
Perform mutation operation based on Equation (12)
end for
Check convergence criteria.
end while
return Optimal feature subset based on elite antlion.

By employing the ALO method, this feature selection algorithm provides a framework for addressing feature selection challenges in balanced class data settings.

3.4 Implementation and Scalability of the BALO in Feature Selection

SB-YOLO-V8 is designed with scalability in mind, utilizing architectures and algorithms that support efficient computation and adaptability to varying input sizes and types. This model's robustness and flexibility make it particularly suitable for real-time applications, where varying conditions and high volumes of data require efficient processing. By integrating advanced data preprocessing and feature extraction methodologies, the proposed model ensures that the scalability does not compromise detection performance.

In the feature selection process, computational resources and image analysis difficulties are further minimized. In this stage, images are divided into individual objects, and characteristics are retrieved using the suggested technique. The classifier is trained using the features selected. The SMOTE technique for balancing minority classes is used to produce instances of unbalanced low-sample classes randomly. When a subset judges a particular characteristic as relevant, they use decision variables in that dimension and feature selection tool is used to assess the suitability of each search subset.

Moreover, the modular nature of SB-YOLO-V8 allows for easy integration and scaling across different hardware platforms, from embedded systems in autonomous vehicles to high-throughput servers in cloud-based analytics. This adaptability is supported by the model's ability to maintain high performance while dynamically adjusting to the computational resources available, thereby maximizing efficiency and throughput. These improvements to the practical implementation and scalability of SB-YOLO-V8 mark a significant advance over previous models, making it a versatile and powerful tool for a wide range of real-time detection applications.

Figure 1 shows the schematic view of the proposed model from the data layer to the output layer. The data layer gathers the data set for the experiment. The study employs COCO and PASCAL VOC 2007 public data sets. The feature selection layer deals with the data preprocessing, class imbalances, and selection of required features for the experiment. The proposed model is then applied to the selected features for the classification.

Details are in the caption following the image — **FIGURE 1**
Open in figure viewer PowerPoint

Schematic view of the proposed scheme from the raw data to classification.

Algorithm 2 takes a data set W with q features as input. It applies the SMOTE to augment the data set and removes any augmented data points or outliers. Then, for each data point E_i in the data set, it computes the normalized version E_normi by dividing the difference between E and E_mini by the range E_maxi− E_mini. Finally, the algorithm returns the data set R′ with the normalized features E_normi.

ALGORITHM 2. Data preprocessing algorithm.

Require: Data set $\boldsymbol{W}=\left({\boldsymbol{E}}_{\mathbf{1}},{\boldsymbol{E}}_{\mathbf{2}},{\boldsymbol{E}}_{\mathbf{3}},\dots, {\boldsymbol{E}}_{\boldsymbol{n}}\right)$ with q features.

Ensure: Data set

{\boldsymbol{R}}^{\prime }=\left({\boldsymbol{E}}_{\boldsymbol{\operatorname{norm}}\mathbf{1}},{\boldsymbol{E}}_{\boldsymbol{\operatorname{norm}}\mathbf{2}},{\boldsymbol{E}}_{\boldsymbol{\operatorname{norm}}\mathbf{3}},\dots, {\boldsymbol{E}}_{\boldsymbol{normn}}\right)

with m features.

Apply SMOTE to data set W;
Remove all augmented data and outliers from W;
for i = 1 to n do
$\boldsymbol{Compute}\ {\boldsymbol{E}}_{\boldsymbol{norm}}=\frac{\boldsymbol{E}-{\boldsymbol{E}}_{\boldsymbol{mini}}}{{\boldsymbol{E}}_{\boldsymbol{maxi}}-{\boldsymbol{E}}_{\boldsymbol{mini}}}$ ;
end for
return ${\boldsymbol{R}}^{\prime }=\left({\boldsymbol{E}}_{\boldsymbol{\operatorname{norm}}\mathbf{1}},{\boldsymbol{E}}_{\boldsymbol{\operatorname{norm}}\mathbf{2}},{\boldsymbol{E}}_{\boldsymbol{\operatorname{norm}}\mathbf{3}},\dots, {\boldsymbol{E}}_{\boldsymbol{nomn}}\right)$ .

In Algorithm 3, data set Y is taken as input, which consists of samples B_i from the feature domain. It performs data preprocessing on Y to obtain the pre-processed data set Q′. Then, SMOTE sampling is applied to Q′ to generate synthetic samples S with a size of $\frac{\mid N\mid }{4}$ , where |N| represents the total number of samples. The SMOTE algorithm is optimized using the BALO to obtain the optimized algorithm SB.

Next, SB is trained with the synthetic samples S and hyperparameters $\left({h}_{\varnothing },{j}_{\theta}\right)$ to obtain the SB-YOLO-V8 model. SB-YOLO-V8 isthen applied to the data set Q′′, which is a combination of Q′ and the informative features selected by SB-YOLO-V8. Finally, the algorithm takes the test data and inputs it into SB-YOLO-V8 to generate the predicted result R.

ALGORITHM 3. SB-YOLO-V8 proposed algorithm.

Require: Data set $\boldsymbol{Y}=\left({\boldsymbol{B}}_{\mathbf{1}},{\boldsymbol{B}}_{\mathbf{2}},{\boldsymbol{B}}_{\mathbf{3}},\dots, {\boldsymbol{B}}_{\boldsymbol{n.}}\right)$

Ensure: Predicted result R

Perform data pre-processing on data set Y to obtain Q′
Apply SMOTE sampling on Q′ to generate synthetic samples S such that $\mid \boldsymbol{S}\mid =\frac{\mid \boldsymbol{N}\mid }{\mathbf{4}}$ |
Optimize the SMOTE algorithm with Binary Ant Lion Optimization (BALO) to obtain the optimized SMOTE algorithm SB.
Train SB with samples S and hyperparameters $\left({\boldsymbol{h}}_{\boldsymbol{\varnothing}},{\boldsymbol{j}}_{\boldsymbol{\theta}}\right)$ to obtain SB-YOLO-V8
Apply SB-YOLO-V8 on data set ${\boldsymbol{Q}}^{\prime \prime }={\boldsymbol{Q}}^{\prime}\boldsymbol{SB}-\boldsymbol{YOLO}-\boldsymbol{V}\mathbf{8}$ by considering informative features.
Input test data into SB-YOLO-V8 to generate the predicted result R
return R.

4 Experimental Result Analysis and Discussion

4.1 Data Sets Description

The primary data set employed in the study was the National Robotics Engineering Centre (NREC) human detection data set, which includes several images and videos recorded by autonomous equipment on a citrus farm. This data set features human workers engaging in various activities, making it ideal for real-time human detection tasks. The data set comprises a diverse range of images captured under different environmental conditions, such as varying lighting and weather, ensuring robust model training.

Additionally, we utilized two well-known public data sets to fine-tune our model and validate its generalization capabilities: COCO and PASCAL VOC 2007. These data sets include annotated images across multiple categories, which are beneficial for cross-validation of the detection algorithms and to ensure the model's effectiveness across different contexts beyond the primary agricultural setting.

Each data set was preprocessed to align with our model's input requirements, which involved resizing images, augmenting data for more robust training, and applying annotations for object detection. SMOTE was also used to address class imbalances specifically for the minority classes within these data sets.

4.2 Experimental Setup

The experimental setup is partitioned into two phases. In Phase I, SB-YOLO-V8, YOLO-V3, and YOLO-V4 models were trained for 1000 epochs, and a comparative analysis of the frame per second (FPS) was performed. In Phase II, all three models were trained on a maximum batch of 1500 and 6000 epochs. To set up the YOLO-based detectors, the Darknet Framework13 was first cloned. The Darknet architecture is a neural network that enables the use of convolutional layers to pull out features and dense layers to make predictions by YOLO-V3 plus YOLO-V4. However, with the proposed scheme SB-YOLO-V8, the feature extraction was performed by utilizing the BALO algorithm to acquire the needed features and qualities for classification. The training of the models began by first cloning the Darknet from Alexey AB's GitHub repository [37].

All experiments were performed on Google's Colab Cloud VM. The platform has a N1-highmem-2 instance, 2v.CPU @ 2.2GHz, 13GB RAM, 100GB free space, idle cutoff of 90 min, and a maximum of 12 h per session (one session was used for the experimentation).

The installation of optimal pretrained weights for the convolution layers of the network increased the 312 speed and accuracy of all the models since they did not need to train from scratch. Since the available images and videos may not be clear because of camera's long viewing distance, movement, or incorrect focus, data augmentation strategies were used to stimulate clearer vision. This was performed by randomly rotating, distorting, and tweaking a couple of images. For moving objects such as humans, there is the likelihood for planned routes to be distorted due to the possibility of ambiguities in their positions. Therefore, to avoid these constraints, an autonomous equipment was utilized to detect humans in real time to know their actual locations. Modifications such as varying the luminosity of some original data sets aided the models to get accustomed to different real-time scenarios like changes in sunlight exposure.

The task at hand is object detection where the goal is to identify and locate points of interest within an image. This is followed by the preparation of the dataset, here we numbered all the data sets to keep track of the individual samples. We then labeled the positive samples and finally worked to avoid overfitting by striking the balance in model complexity and trained the data quality. However, the proposed SB-YOLO-V8, in addition to the above preprocessing, makes use of SMOTE to resolve issues with class imbalances with the data set.

4.3 Results Analysis and Discussion

After training in the two phases, we tested the performance of the proposed SB-YOLO-V8 with real-time human images in the citrus farm and Figure 2 shows the performance for 1000 iterations. In all situations, the proposed SB-YOLO performed so well.

The result, as shown in Figure 2, indicates that SB-YOLO-V8 significantly outperformed the earlier versions of YOLO in terms of both speed and accuracy. Notably, the application of the SMOTE in the preprocessing stage allowed for better generalization across diverse data samples, effectively addressing class imbalances that typically hinder performance in object detection tasks. The comparative analysis revealed that the enhanced feature extraction capabilities of the BALO optimization algorithm substantially contributed to the performance of the model. The SB-YOLO-V8 achieved an FPS of 13.63 during Phase 1, which is significantly higher compared to earlier versions of YOLO. The method does not only improve detection accuracy but also increases the computational speed, which is crucial for real-time applications in variable environments such as agricultural settings.

The proposed SB-YOLO-V8 final average FPS is 14.8. Meanwhile, the average FPS obtained by YOLO-V4 was 11 FPS, while that of YOLO-V3 was 6 FPS, as observed in Figure 3.

Thus, at the same epochs, SB-YOLO-V8 processes the videos efficiently at a faster rate than YOLO-V8, Yolo-V7, Yolo-V6, Yolo-V3, and YOLO-V4. In Phase II, the three YOLO-based models were trained on 1500 and 6000 epochs. SB-YOLO-V8 demonstrates the models' highest precision, recall, F1-score, average Intersection over Union (IoU), and mean average precision (mAP) (@0.5). It achieves a precision of 0.91 and recall of 0.91, indicating a good balance between accurate detections and minimizing false negatives. YOLO-V8, YOLO-V7, and YOLO-V6 exhibit gradually decreasing performance compared to SB-YOLO-V8. With the exception of YOLO-V3 which significantly records the lowest precision, recall, and F1-score, all the models still maintain relatively high mAP scores, indicating their ability to localize objects accurately. Yet the performance of the proposed SB-YOLO-V8 is comparatively better than all. Again, the average IoU values across the models increase from YOLO-V3 to SB-YOLO-V8, showing improved object localization accuracy. The number of false positives (FPs) generally increases from YOLO-V3 to SB-YOLO-V8, while the number of true positives (TP) decreases. This suggests that higher-precision models sacrifice some true positives to reduce FPs. As observed, SB-YOLO-V8 maintains high precision and recall rates, indicating robust performance across longer training periods. This model's ability to consistently process videos efficiently at a faster rate than its predecessors highlights its advanced architecture and tailored preprocessing techniques to handle real-time detection in diverse lighting and background conditions.

Thus, based on the provided evaluation metrics, SB-YOLO-V8 demonstrates better object detection accuracy and localization, making it the most suitable model for human detection in a citrus farm. To further confirm this outcome, we increase the training iterations to 6000 epochs, and the results as demonstrated in Figure 4 show that SB-YOLO-V8 demonstrates the highest precision, recall, F1-score, average IoU, and mAP (@0.5). It achieves a precision of 0.91 and recall of 0.91, indicating a good balance between accurate detections and minimizing false negatives. YOLO-V8, YOLO-V7, and YOLO-V6 exhibit gradually decreasing performance compared to SBYOLO-V8, decreasing precision, recall, and F1-score. The YOLO-V4 and YOLO-V3 show lower precision, recall, and F1-score compared to the other models. YOLO-V4 has a relatively higher TP value, suggesting a better ability to detect true positives. However, YOLO-V3 has the lowest precision, recall, and F1-score among the listed models. Average IoU values across the models increase from YOLO-V3 to SB-YOLOV8, indicating improved object localization accuracy. SB-YOLO-V8 achieves the highest average IoU of 0.8142, implying more precise and accurate bounding box predictions. The number of FPs varies across the models, with YOLO-V4 having the highest value of 0.34. SB-YOLO-V8 has the lowest FP value of 0.27, indicating better control over false alarms. Consequently, SB-YOLO-V8 gives superior performance with higher precision and recall than the other models after 6000 epochs. SB-YOLO-V8 shows the highest precision, recall, F1-score, average IoU, and mAP (@0.5), demonstrating its superior ability to accurately detect and localize objects with fewer FP.

The performance of the proposed model is also compared with the popular YOLO-V1 and R-CNN, as observed in Figure 5. The findings reinforce that SB-YOLO-V8 consistently achieves the highest recognition rates, followed by YOLO-V1, while R-CNN generally performs less accurately. The scatter points on the lines represent the actual recognition rates, allowing for a visual comparison across object categories. The plot highlights the performance differences between the models and illustrates the trends in recognition rates. Together, the plot and Table 1 provide a comprehensive understanding of the models' performances in the VOC 2007 data set, revealing the variations in recognition rates and the relative strengths of each model.

TABLE 1. Comparison of the proposed model, YOLO-V1 and R-CNN on Pascal VOC 2007.

VOC 2007	R-CNN	YOLO-VI	SB-YOLO-V8
Aero	6.5	77.9	82.3
Bike	66	77.6	82
Bird	47.9	63.7	76.4
Boat	37.7	47.6	54.6
Bottle	29.9	44.8	65.3
Bus	62.5	70.7	80.2
Car	70.2	68.9	72.1
Cat	60.2	85.3	87.6
Chair	32	42.2	64.3
Cow	57.9	71.9	82.7
Table	47	51.2	83.5
Dog	53.5	81.9	97.3
Horse	60.1	77.5	82.6
M-bike	64.2	78.7	83.4
Person	52.2	68.6	79.3
Plant	31.3	37.1	68.7
Sheep	55	71.8	83.5
Sofa	50	58.4	71.8
Train	57.7	71	86
TV	63	64.6	80.3
Average rate of recognition	53.1	65.6	82.33

The results of the tests, as shown in Table 1 for the 20 objects, show that the average detection accuracy for the proposed SB-YOLO-V8, YOLO-V1, and the R-CNN is 82.3%, 65.8%, and 53.1%, respectively, showing that SB-YOLO-V8 model performs significantly better. The detailed results in Table 1 illustrate SB-YOLO-V8's superior detection accuracy across multiple object categories compared to YOLO-V1 and R-CNN.

4.4 Comparison of the Computational Efficiency of SB-YOLO-V8 With Existing Schemes

The computational efficiencies of the proposed SB-YOLO-V8 are compared with the state-of-the-art schemes and the result is demonstrated in Table 2.

TABLE 2. Comparison of the proposed model, YOLO-V1 and R-CNN on Pascal VOC 2007.

Model	Epochs	Data set	Average FPS	Precision	Recall	F1-score	Average IoU	mAP (@0.5)
SB-YOLO-V8	1000	NREC Citrus Farm, COCO, PASCAL VOC 2007	13.63	0.91	0.91	0.90	0.85	0.88
YOLO-V4	1000	COCO, PASCAL VOC 2007	11	0.87	0.87	0.86	0.82	0.84
YOLO-V3	1000	COCO, PASCAL VOC 2007	6	0.84	0.84	0.83	0.80	0.81
SB-YOLO-V8	1500	NREC Citrus Farm, COCO, PASCAL VOC 2007	14.8	0.91	0.91	0.90	0.85	0.88
YOLO-V4	1500	COCO, PASCAL VOC 2007	12	0.88	0.88	0.87	0.83	0.85
YOLO-V3	1500	COCO, PASCAL VOC 2007	8	0.85	0.85	0.84	0.81	0.82
SB-YOLO-V8	6000	NREC Citrus Farm, COCO, PASCAL VOC 2007	14.8	0.91	0.91	0.90	0.85	0.88
YOLO-V8	6000	COCO, PASCAL VOC 2007	12	0.89	0.89	0.88	0.83	0.85
YOLO-V7	6000	COCO, PASCAL VOC 2007	12	0.88	0.88	0.87	0.82	0.84
YOLO-V6	6000	COCO, PASCAL VOC 2007	11	0.87	0.87	0.86	0.81	0.83
YOLO-V4	6000	COCO, PASCAL VOC 2007	10	0.86	0.86	0.85	0.80	0.82
YOLO-V3	6000	COCO, PASCAL VOC 2007	9	0.85	0.85	0.84	0.79	0.81

Table 2 clearly shows that the proposed SB-YOLO-V8 algorithm consistently outperforms older YOLO models across various performance metrics such as FPS, precision, recall, F1-score, IoU, and mAP.

The state-of-the-art YOLO models, such as YOLO-V3 and YOLO-V4, show limitations such as lower FPS and reduced precision and recall. These models often lack the necessary optimizations to handle class imbalances and may not employ advanced feature selection algorithms, which is a cause of potential overfitting and reduced effectiveness in complex environments. Their architectural complexities also pose challenges in training scalability and efficiency, leading to poor performance in real-time applications.

4.5 Comparison With Related Works

The performance of the proposed SB-YOLO-V8 model is also compared with the related works about object detection systems. Ahmad et al. [30] proposed a modified YOLO-based CNN algorithm for human detection. The authors produced a modification in relation to the error function of the YOLO network. The enhanced algorithm replaced the border style in conjunction with the proportion concept. Compared to the YOLO algorithm's original loss function, the improved model is more adaptable and rational in terms of network error optimization. From their experimental results, their proposed model obtained a detection rate of 65.6 and 58.7, respectively. However, SB-YOLO-V8 model is classified over two data sets, COCO and Pascal VOC 2007, to prove the scalability and robustness. The proposed SB-YOLO-V8 scheme also outperformed their model by obtaining an average detection rate of 82.33%.

Hwang et al. [38] suggested an effective object detection network named LNFCOS, and it is based on the proposed LNblock. The authors claim their proposed model can achieve an ideal balance between the computation cost and good accuracy.

We constructed the feature pyramid and the detection head by exchanging the conventional convolution for the LNblock that was suggested. We observed that the features were able to be extracted effectively at a reduced level of computational expense in comparison to that of the traditional methods. Second, the detection accuracy was enhanced by employing the suggested feature fusion module. This was accomplished by limiting the feature loss that happened in each channel when the conventional technique was applied, and by emphasizing the feature information that was present in the channel. However, their proposed method obtained a detection accuracy of 79.3% and 37.2% on the MS COCO and PASCAL VOC data sets, respectively. Although Hwang et al. [38] employed two data sets to improve the robustness, our proposed scheme outperformed their model by obtaining 91% precision regarding the MS COCO data set and 82.33% accuracy rate with the PASCAL VOC.

Although Ahmad et al. [30] and Hwang et al. [38] approaches were successful, both studies had constraints with the scalability, robustness, class imbalances, and data transformation, which may have had a negative influence on the outcomes of their studies. Once more, our proposed method performed better than the methods that they proposed.

In summarizing the comparisons thus far, we can safely conclude that the proposed SB-YOLO-V8 model has more promising results than the conventional systems such as YOLO-based CNN and LNFCOS. During the implementation of the proposed approach, the learning rate is exploited in a strategic way to improve the learning rate schedule, which has the effect of slowing down the learning rate. The aggregator module is built into the system to function as a feedback mechanism, which assists in fine-tuning the module to perform at its optimum. Hence, this reduces the computational cost and tuning for better performance.

5 Conclusion

In this study, we have developed SB-YOLO-V8 which is an advanced object detection model that leverages SMOTE and BALO to enhance the detection accuracy in diverse and challenging environments. We employed these methods to address significant challenges such as class imbalance and feature selection, which are critical in the field of real-time object detection. Our experiments conducted across various data sets, including NREC Citrus Farm, COCO, and PASCAL VOC 2007, demonstrate that SB-YOLO-V8 consistently outperforms traditional models like YOLO-V3, YOLO-V4, YOLO-V6, YOLO-V7, and YOLO-V8 in terms of precision, recall, and computational efficiency.

The integration of SMOTE and BALO has proven to be instrumental in refining the feature space and enhancing the classifier's sensitivity toward minority classes, thereby reducing the model's bias and improving overall detection performance. The results from our study indicate that SB-YOLO-V8 achieves a precision that is notably higher than that of its predecessors, and this enhances both the accuracy and speed of real-time human detection.

The enhancements in data preprocessing and feature extraction methodologies, through the implementation of SMOTE and BALO, have led to a classifier that not only performs with high accuracy but also adapts effectively to different environments.

Having concluded that the study gives a promising result, we believe that the inability to explore it across several databases is a limitation; hence, in the future work, we aim to extend the application of SB-YOLO-V8 to multiple data sets and explore deeper integrations of machine learning techniques to further boost classification outcomes.

Author Contributions

Prince Alvin Kwabena Ansah: investigation, writing – original draft, resources, funding acquisition. Justice Kwame Appati: conceptualization, investigation, writing – original draft. Ebenezer Owusu: conceptualization, project administration, supervision, writing – review and editing, funding acquisition. Edward Kwadwo Boahen: conceptualization, visualization, writing – original draft, formal analysis, data curation, resources. Prince Boakye-Sekyerehene: methodology, visualization. Abdullai Dwumfour: validation, software.

Conflicts of Interest

The authors declare no conflicts of interest.

Open Research

Data Availability Statement

The data sets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.

References

1F. Gu, M. H. Chung, M. Chignell, S. Valaee, B. Zhou, and X. Liu, “A Survey on Deep Learning for Human Activity Recognition,” ACM Computing Surveys (CSUR) 54, no. 8 (2021): 1–34, https://doi.org/10.1145/3472290.
10.1145/3472290
Web of Science® Google Scholar
2E. Owusu, J. K. Appati, and P. Okae, “Robust Facial Expression Recognition System in Higher Poses,” Visual Computing for Industry, Biomedicine, and Art 5, no. 1 (2022): 14.
10.1186/s42492-022-00109-0
PubMed Web of Science® Google Scholar
3V. Pavithra and V. Jayalakshmi, “ Machine and Deep Learning (ML/DL) Algorithms for Next-Generation Healthcare Applications,” in Enterprise Digital Transformation (Auerbach Publications, 2022), 203–224, https://doi.org/10.1201/9781003119784-8.
10.1201/9781003119784-8
Google Scholar
4E. Owusu, Y. Zhan, and Q. R. Mao, “A Neural-AdaBoost Based Facial Expression Recognition System,” Expert Systems With Applications 41, no. 7 (2014): 3383–3390.
10.1016/j.eswa.2013.11.041
Web of Science® Google Scholar
5S. K. Pal, A. Pramanik, J. Maiti, and P. Mitra, “Deep Learning in Multi-Object Detection and Tracking: State of the Art,” Applied Intelligence 51 (2021): 6400–6429, https://doi.org/10.1007/s10489-021-02293-7.
10.1007/s10489-021-02293-7
Web of Science® Google Scholar
6E. K. Boahen, B. E. Bouya-Moko, and C. Wang, “Network Anomaly Detection in a Controlled Environment Based on an Enhanced PSOGSARFC,” Computers & Security 104 (2021): 102225, https://doi.org/10.1016/j.cose.2021.102225.
10.1016/j.cose.2021.102225
Web of Science® Google Scholar
7M. J. Er, J. Chen, Y. Zhang, and W. Gao, “Research Challenges, Recent Advances, and Popular Datasets in Deep Learning-Based Underwater Marine Object Detection: A Review,” Sensors 23, no. 4 (2023): 1990, https://doi.org/10.3390/s23041990.
10.3390/s23041990
Web of Science® Google Scholar
8A. Singh, S. Jones, B. Ganapathysubramanian, et al., “Challenges and Opportunities in Machine-Augmented Plant Stress Phenotyping,” Trends in Plant Science 26, no. 1 (2021): 53–69, https://doi.org/10.1016/j.tplants.2020.07.010.
10.1016/j.tplants.2020.07.010
CAS PubMed Web of Science® Google Scholar
9E. K. Boahen, C. Wang, and B. M. Brunel Elvire, “Detection of Compromised Online Social Network Accounts With an Enhanced KNN,” Applied Artificial Intelligence 34 (2020): 777–791, https://doi.org/10.1080/08839514.2020.1782002.
10.1080/08839514.2020.1782002
Google Scholar
10A. Menon, B. Omman, and S. Asha, “ Pedestrian Counting Using YOLO V3,” in 2021 International Conference on Innovative Trends in Information Technology (ICITIIT) (IEEE, 2021), 1–9.
10.1109/ICITIIT51526.2021.9399607
Google Scholar
11A. Pujara and M. Bhamare, “ DeepSORT: Real-Time & Multi-Object Detection and Tracking With YOLO and TensorFlow,” in 2022 International Conference on Augmented Intelligence and Sustainable Systems (ICAISS) (IEEE, 2022), 456–460.
10.1109/ICAISS55157.2022.10011018
Google Scholar
12G. Lampropoulos, E. Keramopoulos, and K. Diamantaras, “Enhancing the Functionality of Augmented Reality Using Deep Learning, Semantic Web, and Knowledge Graphs: A Review,” Visual Informatics 4 (2020): 32–42.
10.1016/j.visinf.2020.01.001
Web of Science® Google Scholar
13W. Wang, Y. Hu, Y. Luo, and T. Zhang, “Brief Survey of Single Image Super-Resolution Reconstruction Based on Deep Learning Approaches,” Sensing and Imaging 21 (2020): 21, https://doi.org/10.1007/s11220-020-00285-4.
10.1007/s11220-020-00285-4
Web of Science® Google Scholar
14F. Sultana, A. Sufian, and P. Dutta, “Evolution of Image Segmentation Using Deep Convolutional Neural Networks: A Survey,” Knowledge-Based Systems 201 (2020): 106062, https://doi.org/10.1016/j.knosys.2020.106062.
10.1016/j.knosys.2020.106062
Web of Science® Google Scholar
15H. Qin, J. Wang, X. Mao, Z. Zhao, X. Gao, and W. Lu, “An Improved Faster R-CNN Method for Landslide Detection in Remote Sensing Images,” Journal of Geovisualization and Spatial Analysis 8 (2024): 2.
10.1007/s41651-023-00163-z
Web of Science® Google Scholar
16A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification With Deep Convolutional Neural Networks,” Communications of the ACM 60 (2017): 84–90, https://doi.org/10.1145/3065386.
10.1145/3065386
Web of Science® Google Scholar
17G. D'Angelo and F. Palmieri, “Network Traffic Classification Using Deep Convolutional Recurrent Autoencoder Neural Networks for Spatial–Temporal Features Extraction,” Journal of Network and Computer Applications 173 (2021): 102890, https://doi.org/10.1016/j.jnca.2020.102890.
10.1016/j.jnca.2020.102890
Web of Science® Google Scholar
18M. Xin and Y. Wang, “Image Recognition of Crop Diseases and Insect Pests Based on Deep Learning,” Wireless Communications and Mobile Computing 2021 (2021): 5511676, https://doi.org/10.1155/2021/5511676.
10.1155/2021/5511676
Web of Science® Google Scholar
19K. Boudjit and N. Ramzan, “Human Detection Based on Deep Learning YOLO-v2 for Real-Time UAV Applications,” Journal of Experimental & Theoretical Artificial Intelligence 34 (2022): 527–544, https://doi.org/10.1080/0952813x.2021.1907793.
10.1080/0952813X.2021.1907793
Web of Science® Google Scholar
20A. Kolekar and V. Dalal, “ Barcode Detection and Classification Using SSD (single shot multibox detector) Deep Learning Algorithm,” SSRN Electronic Journal (2020), https://doi.org/10.2139/ssrn.3568499.
10.2139/ssrn.3568499
Google Scholar
21Z. Li, Y. Sun, G. Tian, et al., “A Compression Pipeline for One-Stage Object Detection Model,” Journal of Real-Time Image Processing 18, no. 6 (2021): 1949–1962, https://doi.org/10.1007/s11554-020-01053-z.
10.1007/s11554-020-01053-z
Web of Science® Google Scholar
22Z. Ge, Z. Jie, X. Huang, C. Li, and O. Yoshie, “Delving Deep Into the Imbalance of Positive Proposals in Two-Stage Object Detection,” Neurocomputing 425 (2021): 107–116, https://doi.org/10.1016/j.neucom.2020.10.098.
10.1016/j.neucom.2020.10.098
Web of Science® Google Scholar
23X. Han, J. Chang, and K. Wang, “Real-Time Object Detection Based on YOLO-v2 for Tiny Vehicle Objects,” Procedia Computer Science 183 (2021): 61–72, https://doi.org/10.1016/j.procs.2021.02.031.
10.1016/j.procs.2021.02.031
Google Scholar
24M. F. A. Mushahar and N. Zaini, “ Human Body Temperature Detection Based on Thermal Imaging and Screening Using YOLO Person Detection,” in 2021 11th IEEE International Conference on Control System, Computing and Engineering (ICCSCE) (IEEE, 2021), 222–227, https://doi.org/10.1109/iccsce52189.2021.9530864.
10.1109/ICCSCE52189.2021.9530864
Google Scholar
25S. S. Sumit, J. Watada, A. Roy, and D. R. A. Rambli, “In Object Detection Deep Learning Methods, YOLO Shows Supremum to Mask R-CNN,” Journal of Physics: Conference Series 1529, no. 4 (2020): 042086, https://doi.org/10.1088/1742-6596/1529/4/042086.
10.1088/1742-6596/1529/4/042086
Google Scholar
26M. Ivašić-Kos, M. Krišto, and M. Pobar, “ Human Detection in Thermal Imaging Using YOLO,” in Proceedings of the 2019 5th International Conference on Computer and Technology Applications (ACM, 2019), 20–24.
10.1145/3323933.3324076
Google Scholar
27T. H. Jung, B. Cates, I. K. Choi, S. H. Lee, and J. M. Choi, “Multi-Camera-Based Person Recognition System for Autonomous Tractors,” Designs 4, no. 4 (2020): 54, https://doi.org/10.3390/designs4040054.
10.3390/designs4040054
Google Scholar
28W. He, Z. Huang, Z. Wei, C. Li, and B. Guo, “TF-YOLO: An Improved Incremental Network for Real-Time Object Detection,” Applied Sciences 9, no. 16 (2019): 3225, https://doi.org/10.3390/app9163225.
10.3390/app9163225
Google Scholar
29H. Ji, X. Zeng, H. Li, et al., “ Human Abnormal Behavior Detection Method Based on T-TINY-YOLO,” in Proceedings of the 5th International Conference on Multimedia and Image Processing (ACM, 2020), 1–5, https://doi.org/10.1145/3381271.3381273.
10.1145/3381271.3381273
Google Scholar
30T. Ahmad, Y. Ma, M. Yahya, B. Ahmad, S. Nazir, and A. U. Haq, “Object Detection Through Modified YOLO Neural Network,” Scientific Programming 2020 (2020): 1–10, https://doi.org/10.1155/2020/8403262.
10.1155/2020/8403262
Web of Science® Google Scholar
31N. Van Tuan, T. B. Nguyen, and S. T. Chung, “ ConvNets and AGMM Based Real-Time Human Detection Under Fisheye Camera for Embedded Surveillance,” in 2016 International Conference on Information and Communication Technology Convergence (ICTC) (IEEE, 2016), 840–845, https://doi.org/10.1109/ictc.2016.7763311.
10.1109/ICTC.2016.7763311
Google Scholar
32Z. Wang, K. Walsh, and A. Koirala, “Mango Fruit Load Estimation Using a Video-Based MangoYOLO-Kalman Filter-Hungarian Algorithm Method,” Sensors 19, no. 12 (2019): 2742, https://doi.org/10.3390/s19122742.
10.3390/s19122742
Web of Science® Google Scholar
33Y. Tian, G. Yang, Z. Wang, H. Wang, E. Li, and Z. Liang, “Apple Detection During Different Growth Stages in Orchards Using the Improved YOLO-V3 Model,” Computers and Electronics in Agriculture 157 (2019): 417–426, https://doi.org/10.1016/j.compag.2019.01.012.
10.1016/j.compag.2019.01.012
Web of Science® Google Scholar
34A. Fernández, S. García, M. Galar, R. C. Prati, B. Krawczyk, and F. Herrera, “ Learning From Imbalanced Data Streams,” in Learning From Imbalanced Data Sets (Springer, 2018), 279–303.
10.1007/978-3-319-98074-4_11
Google Scholar
35S. V. Stehman and J. E. Wagner, “Choosing a Sample Size Allocation to Strata Based on Trade-Offs in Precision When Estimating Accuracy and Area of a Rare Class From a Stratified Sample,” Remote Sensing of Environment 300 (2024): 113881.
10.1016/j.rse.2023.113881
Web of Science® Google Scholar
36J. Fonseca, G. Douzas, and F. Bacao, “Improving Imbalanced Land Cover Classification With K-Means SMOTE: Detecting and Oversampling Distinctive Minority Spectral Signatures,” Information 12, no. 7 (2021): 266, https://doi.org/10.3390/info12070266.
10.3390/info12070266
Web of Science® Google Scholar
37Y. Zhang, C. Zhou, F. Chang, A. C. Kot, and W. Zhang, “ Attention to Head Locations for Crowd Counting,” in Image and Graphics: 10th International Conference, ICIG 2019 (Springer International Publishing, 2019), 727–737.
10.1007/978-3-030-34110-7_61
Google Scholar
38B. Hwang, S. Lee, and H. Han, “LNFCOS: Efficient Object Detection Through Deep Learning Based on LNBlock,” Electronics 11, no. 17 (2022): 2783.
10.3390/electronics11172783
Web of Science® Google Scholar

Citing Literature

Volume7, Issue2

February 2025

e70033

SB-YOLO-V8: A Multilayered Deep Learning Approach for Real-Time Human Detection

ABSTRACT

1 Introduction

2 Related Works

3 Proposed Method

3.1 Class Imbalance in High-Dimensional Data

3.2 Feature Selection Optimization Model

3.3 Feature Selection Based on ALO in Balanced Class Data

ALGORITHM 1. Feature selection based on ALO in balanced class data.

3.4 Implementation and Scalability of the BALO in Feature Selection

ALGORITHM 2. Data preprocessing algorithm.

ALGORITHM 3. SB-YOLO-V8 proposed algorithm.

4 Experimental Result Analysis and Discussion

4.1 Data Sets Description

4.2 Experimental Setup

4.3 Results Analysis and Discussion

4.4 Comparison of the Computational Efficiency of SB-YOLO-V8 With Existing Schemes

4.5 Comparison With Related Works

5 Conclusion

Author Contributions

Conflicts of Interest

Open Research

Data Availability Statement

References

Citing Literature

Figures

References

Information

About Wiley Online Library

Help & Support

Opportunities

Connect with Wiley