To explore action recognition, tracking, and optimization analysis of the training process based on the SVR model and multimedia technology, this paper builds on the radial basis function model and studies a surrogate model technique, support vector regression (SVR). We first introduce the basic principles of SVR and the selection of its parameters and then describe the basic steps of SVR modeling. The validity of the support vector regression method is then verified through numerical examples and application examples designed and optimized with multimedia technology. The experiments give the following results. The comparison of SVR1 and SVR2 shows that multiscale temporal feature maps should be combined after the temporal evaluation module (TEM), as in SVR2, rather than fused directly in the feature dimension, as in SVR1, mainly because small-scale information degrades the resolution of large-scale information. To verify the effectiveness of the SVR and DR-Dvc algorithms, the proposed algorithm is compared on data sets such as ActivityNet with the baseline before improvement and with current mainstream algorithms. The results show that the proposed algorithm improves significantly on the baseline and outperforms most current mainstream algorithms, which demonstrates its feasibility and effectiveness. Introducing regression effectively improves the performance of temporal action proposal and event description algorithms and offers certain performance advantages over current mainstream methods.

1. Introduction

As shown in Figure 1, a surrogate model is a mathematical model that replaces complex numerical calculations or physical experiments while meeting the required accuracy, at low computational cost and with high computational efficiency. A surrogate model is generally constructed in two steps: (1) sample points are generated according to a chosen experimental design; (2) a mathematical approximation method is used to fit a model to the sample points that meets the accuracy requirements [1]. From a mathematical point of view, therefore, a surrogate model uses fitting or interpolation to construct, from the sample points, a function that predicts the response at unknown points. This paper adopts the Latin hypercube design method and focuses on the approximation methods [2]. The surrogate models most commonly used in multidisciplinary design optimization (MDO) are the radial basis function (RBF) interpolation model, the Kriging model, the polynomial response surface model (RSM), the BP neural network model, and the support vector machine (SVM) model. According to the literature, each surrogate model has advantages and disadvantages, so in the MDO design process the most suitable model must be chosen according to the characteristics of the physical object being studied. Support vector machines are attractive because they handle nonlinear problems well, do not need to pass through every sample point, and smooth data effectively [3].

Computer vision has achieved outstanding results in image analysis tasks. Studies on face recognition and image retrieval have demonstrated the success of deep learning in image analysis, while the models proposed for image object detection and segmentation, such as the Faster R-CNN series, YOLOv4, and SSD, are close to mature, and the error rate of object detection and classification tasks in the PASCAL VOC and ImageNet competitions is far below 0.1. Video intelligent analysis has also made a series of advances, and enabling machines to understand video content is its most critical step. Although many image techniques and research methods can be borrowed, the added temporal dimension of video raises many problems that are yet to be solved, and research on video intelligent analysis is still at an early stage. The main research directions include video behavior recognition, temporal action proposal, temporal action detection, and video description. Large-scale video understanding data sets such as THUMOS-14 and ActivityNet, together with the corresponding video understanding competitions, have greatly promoted research and development in this area. At the same time, the maturing of video understanding research and the success of computer vision on images provide a good foundation for further work on video understanding.

Based on current research, this paper builds on the radial basis function model and studies a surrogate model technique, support vector regression (SVR). The validity of the SVR method is verified through numerical examples and application examples designed and optimized with multimedia technology. To verify the effectiveness of the SVR and DR-Dvc algorithms, the proposed algorithm is compared on data sets such as ActivityNet with the baseline before improvement and with current mainstream algorithms. The experimental results show that the proposed algorithm improves significantly on the baseline and outperforms most current mainstream algorithms, which demonstrates its feasibility and effectiveness.

2. Literature Review

Dutt et al. proposed a behavior recognition scheme based on dense trajectories (DT), a traditional feature extraction method, as shown in Figure 2. The basic idea is to first obtain feature trajectories in the video frame sequence from the optical flow field and then extract four types of features along these trajectories: HOF, HOG, MBH, and trajectory features [4]. Because the features extracted by the DT algorithm are subject to environmental constraints, Liu et al. proposed an improved DT algorithm (iDT). It matches the optical flow between consecutive video frames with SURF key points to eliminate or reduce the effect of camera motion; it also encodes the features with Fisher vectors (FV) and improves the feature normalization method. These changes made iDT the most effective, stable, and reliable method before deep learning entered the field [5].

The earliest video feature extraction method based on deep learning is the two-stream method proposed by Hu et al. Dense optical flow is computed for every two frames in a video sequence to obtain a dense optical flow map of the frame sequence, which carries the temporal information. A 2D CNN feature extraction model is then trained on the RGB images and another on the dense optical flow maps. Each branch infers the action category with a single-shot method, and the classification scores of the two branches are fused, by either simple averaging or a support vector machine (SVM), to give the final classification result [6]. On this basis, Abrahamyan et al. used a CNN to fuse spatial and temporal features and replaced the basic spatial and temporal networks with a VGG-16/19 structure; the accuracy on the UCF101 and HMDB51 data sets is 92.5% and 65.4%, respectively [7]. In the same year, Zakaria et al. carried out extensive work on the two-stream method and proposed the TSN network, a two-stream scheme that is now widely used. For input data, in addition to the traditional RGB image and optical flow, TSN also tried RGB image difference and warped optical flow; the best results were obtained with the combination of RGB, optical flow, and warped optical flow. For network structure, TSN tried VGG-16, GoogLeNet, and BN-Inception, of which BN-Inception performed best. For training strategy, TSN introduced cross-modal pretraining, regularization, and data augmentation. The resulting accuracy on the UCF101 and HMDB51 data sets reached 94.2% and 69.4% [8]. Sharma et al. improved the fusion part of TSN by letting the network learn different weights for the features of different segments when they are fused [9]. To better analyze correlations across different temporal scales of video, Li et al. proposed the TRN model based on temporal inference, which has clear advantages in short-video classification tasks [10].

Standard research on temporal action detection began with the work of Huang, in which hand-centric and object-centric features are used to detect specific actions in kitchen cooking videos captured by a fixed camera. Temporal action detection in broader domains took off after the release of the classic video understanding data set THUMOS-14 [11]. Among these studies, Chen et al. used DT features, single-frame CNN features, fused audio features, and others, and designed a temporal action detection framework that generates candidate proposals with sliding windows. Temporal action detection methods based on spatio-temporal information also began to appear, although they still generate action proposals with sliding windows in the time dimension [12]. Xia et al. proposed an end-to-end method for temporal action detection that directly infers the temporal boundaries of an action. The network consists of two parts, an observation network and a recurrent network: the observation network encodes frame-level video features, and the RNN processes these observation features and determines the next frame to observe and the time at which to predict the action [13].

Building on this research, this paper studies support vector regression (SVR), a surrogate model technique based on the radial basis function model. The validity of the method is verified through numerical examples and application examples designed and optimized with multimedia technology, and the SVR and DR-Dvc algorithms are compared on data sets such as ActivityNet with the baseline before improvement and with current mainstream algorithms.

3. Proxy Model Based on Support Vector Regression

3.1. Basic Principles of SVR

From a geometric point of view, consider a sample set (x₁, y₁), (x₂, y₂), …, (x_n, y_n), with x ∈ R and y ∈ R. The basic form of the prediction model of the support vector regression method is as follows:

$$f(x) = \mu + \sum_{i=1}^{n} \omega_i \, \phi(x, x_i) \tag{1}$$

where μ is a constant; ω_i are the coefficients; and ϕ is the basis function.

The support vector regression method introduces the ε-insensitive loss function: if the difference |y_i − f(x_i)| between the predicted value f(x_i) and the sample value y_i is less than the given ε, the prediction is considered to incur no loss, even though the predicted value and the observed value may not be exactly equal.

As shown in Figure 3, a sample point located between the two dashed lines has a loss of 0; the region bounded by the two dashed lines is called the ε zone, and a loss is incurred only when a sample falls outside it. The ε-insensitive loss function thus treats predictions that lie within ε of their sample values as “completely consistent” with them, a property that many other loss functions do not have [14].
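The ε zone described above can be illustrated with a minimal sketch. The function name and the sample values are hypothetical and chosen only for illustration; the loss is assumed to grow linearly with the distance outside the zone, which is the usual form of the ε-insensitive loss.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Loss is zero inside the eps zone and grows linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


# Hypothetical (sample value, predicted value) pairs.
pairs = [(1.0, 1.05), (1.0, 1.3), (2.0, 1.5)]
losses = [eps_insensitive_loss(y, f, eps=0.1) for y, f in pairs]
```

The first pair differs by less than ε, so it contributes no loss; the other two lie outside the zone and are penalized only for the part of the deviation that exceeds ε.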

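In linear regression problems, with the linear model f(x) = μ + ω · x, constructing a surrogate model becomes the following constrained convex quadratic optimization problem:

$$\min_{\omega, \mu} \; \frac{1}{2} \|\omega\|^2 \quad \text{s.t.} \quad y_i - \omega \cdot x_i - \mu \le \varepsilon, \quad \omega \cdot x_i + \mu - y_i \le \varepsilon \tag{2}$$

To allow for errors, slack variables ξ_i+ and ξ_i− (both nonnegative real numbers) and a penalty parameter C (a nonnegative real number) can be introduced, and the problem becomes

$$\min_{\omega, \mu, \xi^{+}, \xi^{-}} \; \frac{1}{2} \|\omega\|^2 + \frac{C}{n} \sum_{i=1}^{n} \left( \xi_i^{+} + \xi_i^{-} \right) \quad \text{s.t.} \quad y_i - \omega \cdot x_i - \mu \le \varepsilon + \xi_i^{+}, \quad \omega \cdot x_i + \mu - y_i \le \varepsilon + \xi_i^{-}, \quad \xi_i^{+}, \xi_i^{-} \ge 0 \tag{3}$$

Solving the above problem by the Lagrangian multiplier method gives the prediction model of support vector regression in the linear case:

$$f(x) = \mu + \sum_{i=1}^{n} \left( a_i^{+} - a_i^{-} \right) (x_i \cdot x) \tag{4}$$

where a_i+ and a_i− are the Lagrange multipliers to be solved; the sample points for which a_i+ − a_i− is nonzero are the support vectors [15].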

For nonlinear regression problems, a nonlinear transformation x ⟶ ψ(x) maps the sample space into a high-dimensional feature space, and a linear model is constructed in that space; a kernel function solves this problem effectively because only dot products in the feature space are needed. The radial basis function is taken as the kernel function, as shown in the following formula:

$$\psi(x_i, x_j) = \exp\left( -\frac{\| x_i - x_j \|^2}{2 \sigma^2} \right) \tag{5}$$

where σ is the coefficient of the kernel function; the kernel defines the nonlinear transformation from the sample space to a high-dimensional feature space. Each basis function center corresponds to a support vector. According to functional theory, when the kernel function ψ(x_i, x_j) that realizes the linearizing transformation satisfies the Mercer condition, it corresponds to the dot product in a certain transformed space [16].
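As a hedged illustration of the kernel and the Mercer condition, the sketch below assumes the common Gaussian form exp(−‖x_i − x_j‖²/(2σ²)) for the radial basis function kernel. The helper names, the sample points, and the value of σ are hypothetical. Evaluating the quadratic form c^T K c for random vectors c is only a spot check of positive semidefiniteness, not a proof.

```python
import math
import random


def rbf_kernel(xi, xj, sigma):
    """Gaussian RBF kernel for scalar or vector inputs (assumed form)."""
    xi = xi if isinstance(xi, (list, tuple)) else [xi]
    xj = xj if isinstance(xj, (list, tuple)) else [xj]
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gram_matrix(samples, sigma):
    """Kernel (Gram) matrix K with K[i][j] = kernel(x_i, x_j)."""
    return [[rbf_kernel(a, b, sigma) for b in samples] for a in samples]


def quadratic_form(K, c):
    """Value of c^T K c, which is nonnegative for a Mercer kernel."""
    n = len(c)
    return sum(c[i] * K[i][j] * c[j] for i in range(n) for j in range(n))


samples = [0.0, 0.5, 1.0, 2.0]          # hypothetical 1D sample points
K = gram_matrix(samples, sigma=0.8)     # hypothetical kernel width

rng = random.Random(0)
forms = [
    quadratic_form(K, [rng.uniform(-1.0, 1.0) for _ in samples])
    for _ in range(200)
]
```

The Gram matrix is symmetric with ones on the diagonal, and none of the sampled quadratic forms is negative, which is what the Mercer condition requires of a valid kernel.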

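Therefore, the dual form of (3) is

$$\max_{a^{+}, a^{-}} \; -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \left( a_i^{+} - a_i^{-} \right) \left( a_j^{+} - a_j^{-} \right) \psi(x_i, x_j) - \varepsilon \sum_{i=1}^{n} \left( a_i^{+} + a_i^{-} \right) + \sum_{i=1}^{n} y_i \left( a_i^{+} - a_i^{-} \right) \quad \text{s.t.} \quad \sum_{i=1}^{n} \left( a_i^{+} - a_i^{-} \right) = 0, \quad 0 \le a_i^{+}, a_i^{-} \le \frac{C}{n} \tag{6}$$

For this problem, this article uses the quadratic programming routine in MATLAB to solve for a₊ and a₋.

For any sample (x_i, y_i) whose multiplier lies strictly inside the interval (0, C/n), μ is calculated as follows:

$$\mu = \begin{cases} y_i - \sum_{j=1}^{n} \left( a_j^{+} - a_j^{-} \right) \psi(x_j, x_i) - \varepsilon, & 0 < a_i^{+} < C/n \\ y_i - \sum_{j=1}^{n} \left( a_j^{+} - a_j^{-} \right) \psi(x_j, x_i) + \varepsilon, & 0 < a_i^{-} < C/n \end{cases} \tag{7}$$

Once the multipliers a₊ and a₋ and the constant μ have been solved, the prediction model of SVR is obtained as

$$f(x) = \mu + \sum_{i=1}^{n} \left( a_i^{+} - a_i^{-} \right) \psi(x_i, x) \tag{8}$$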
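To make the final prediction step concrete, the following sketch evaluates a kernel expansion of the form f(x) = μ + Σ (a_i+ − a_i−) ψ(x_i, x) for multipliers that are assumed to have been solved already. It does not solve the quadratic program. The Gaussian kernel form, the function names, and all numeric values are hypothetical and serve only as an illustration.

```python
import math


def rbf_kernel(xi, xj, sigma):
    """Gaussian RBF kernel for scalar inputs (assumed form)."""
    return math.exp(-((xi - xj) ** 2) / (2.0 * sigma ** 2))


def svr_predict(x, train_x, a_plus, a_minus, mu, sigma):
    """Evaluate mu + sum_i (a_i+ - a_i-) * kernel(x_i, x)."""
    return mu + sum(
        (ap - am) * rbf_kernel(xi, x, sigma)
        for xi, ap, am in zip(train_x, a_plus, a_minus)
    )


def support_vector_indices(a_plus, a_minus, tol=1e-9):
    """Samples with a nonzero net multiplier contribute to the model."""
    return [
        i for i, (ap, am) in enumerate(zip(a_plus, a_minus))
        if abs(ap - am) > tol
    ]


# Hypothetical solved multipliers; they satisfy sum(a_plus - a_minus) = 0.
train_x = [0.0, 1.0, 2.0]
a_plus = [0.5, 0.0, 0.0]
a_minus = [0.0, 0.0, 0.5]
mu, sigma = 1.0, 1.0

sv = support_vector_indices(a_plus, a_minus)
f_mid = svr_predict(1.0, train_x, a_plus, a_minus, mu, sigma)
f_left = svr_predict(0.0, train_x, a_plus, a_minus, mu, sigma)
```

Only the first and last samples carry nonzero multipliers here, so they alone act as support vectors; the middle sample does not influence the prediction, which is the sparsity property of the model.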

3.2. SVR Parameter Selection

In this method, the penalty parameter C determines whether the prediction model is “overfitted” or “underfitted,” so it must be chosen to give the model good generalization and the ability to filter sample noise. ε determines the number of support vectors and makes the surrogate model robust and sparse. If ε is too small, the regression estimation accuracy is high and the number of support vectors increases; if ε is too large, the regression estimation accuracy decreases, the number of support vectors decreases, and the support vector machine becomes sparser [17]. The parameters ε and C therefore control the complexity of the model in different ways. The kernel function coefficient σ reflects the distribution or range of the training sample data and determines the width of the local neighborhood; a larger σ means a lower variance.
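The qualitative effect of ε and σ can be sketched with a small numeric example. The sketch makes two simplifying assumptions: the fitted function is held fixed while ε changes (in practice the fit itself changes with ε), and the kernel has the Gaussian form exp(−d²/(2σ²)). The residuals and parameter values are hypothetical.

```python
import math


def count_outside_tube(residuals, eps):
    """Number of samples on or outside the eps zone (candidate support vectors)."""
    return sum(1 for r in residuals if abs(r) >= eps)


def rbf_value(distance, sigma):
    """Kernel value at a given distance from the basis function center."""
    return math.exp(-distance ** 2 / (2.0 * sigma ** 2))


# Hypothetical residuals y_i - f(x_i) of a fixed fit.
residuals = [0.02, -0.08, 0.15, -0.30, 0.05, 0.22]
counts = {eps: count_outside_tube(residuals, eps) for eps in (0.01, 0.1, 0.25)}

# Influence of a center on a point at distance 1 for a narrow and a wide kernel.
narrow = rbf_value(1.0, sigma=0.5)
wide = rbf_value(1.0, sigma=2.0)
```

As ε grows, fewer samples fall outside the zone, so the model becomes sparser; as σ grows, each center influences a wider neighborhood, which gives a smoother model.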

3.3. Basic Steps of SVR Modeling

Following the solution process of the SVR method described above, the procedure for building an approximate model is shown in Figure 4.

3.4. SVR Method Calculation and Example Validation

All programs used to build the surrogate model with the SVR method are written in MATLAB.

Performance indicators: to evaluate the fitting quality of the support vector regression method quantitatively, the following performance indicators are defined:

(1) Relative error R_ei, which describes the prediction quality at a given sample:

$$R_{ei} = \frac{f(x_i) - y_i}{y_i} \tag{9}$$

where f(x_i) is the predicted value and y_i is the actual value.

(2) Average relative error M_re, which comprehensively evaluates the overall prediction performance:

$$M_{re} = \frac{1}{n} \sum_{i=1}^{n} \left| R_{ei} \right| \tag{10}$$

where n is the number of samples.

(3) Mean squared error M_se, which measures the deviation of the predicted values from the actual values:

$$M_{se} = \frac{1}{n} \sum_{i=1}^{n} \left( f(x_i) - y_i \right)^2 \tag{11}$$
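The three indicators can be computed as in the sketch below. It assumes the conventional definitions: relative error (f(x_i) − y_i)/y_i, average relative error as the mean of the absolute relative errors, and mean squared error as the mean of the squared deviations. The function names and the sample data are hypothetical.

```python
def relative_errors(y_true, y_pred):
    """Signed relative error of each prediction (actual values must be nonzero)."""
    return [(f - y) / y for y, f in zip(y_true, y_pred)]


def mean_relative_error(y_true, y_pred):
    """Mean of the absolute relative errors over all samples."""
    errs = relative_errors(y_true, y_pred)
    return sum(abs(e) for e in errs) / len(errs)


def mean_squared_error(y_true, y_pred):
    """Mean of the squared deviations between predicted and actual values."""
    return sum((f - y) ** 2 for y, f in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted values.
y_true = [2.0, 4.0, 5.0]
y_pred = [2.2, 3.8, 5.0]

r_e = relative_errors(y_true, y_pred)
m_re = mean_relative_error(y_true, y_pred)
m_se = mean_squared_error(y_true, y_pred)
```

The relative error is undefined when an actual value is zero, so such samples would need separate handling in practice.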

4. Experimental Results and Analysis

This section describes the experimental process in detail and compares the experimental results to verify the effectiveness of the proposed method. The experiments verify the two optimizations proposed in SVR: the first introduces SVR to generate temporal evaluation proposals on multiscale temporal feature maps; the second applies 2D convolution to the original feature map to jointly model the temporal and channel dimensions [18]. The evaluation metrics are the area under the average recall (AR) curve (AUC), which measures the performance of temporal action proposal generation, and the mean average precision (mAP), which measures the performance of temporal action detection.
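As a rough illustration of how proposal quality is scored, the sketch below computes the temporal intersection over union (tIoU) between segments and the recall of the top-N proposals, averaged over several tIoU thresholds. It is a simplified single-video version under stated assumptions: proposals are already sorted by confidence, and the segments, thresholds, and function names are hypothetical rather than the evaluation code used in the experiments.

```python
def temporal_iou(seg_a, seg_b):
    """tIoU of two (start, end) segments on the time axis."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_n(ground_truth, proposals, n, tiou_threshold):
    """Fraction of ground-truth actions matched by one of the top-n proposals."""
    top = proposals[:n]  # proposals are assumed sorted by confidence
    hits = sum(
        1 for gt in ground_truth
        if any(temporal_iou(gt, p) >= tiou_threshold for p in top)
    )
    return hits / len(ground_truth)


def average_recall(ground_truth, proposals, n, thresholds):
    """Recall at n proposals averaged over a set of tIoU thresholds."""
    return sum(
        recall_at_n(ground_truth, proposals, n, t) for t in thresholds
    ) / len(thresholds)


# Hypothetical ground-truth actions and ranked proposals, in seconds.
ground_truth = [(0.0, 10.0), (20.0, 30.0)]
proposals = [(1.0, 9.0), (18.0, 26.0), (40.0, 50.0)]

ar_at_2 = average_recall(ground_truth, proposals, n=2, thresholds=(0.5, 0.75))
```

Plotting the average recall against the number of proposals and integrating the curve would give an AUC-style summary of the kind reported in the tables.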

The comparative experiment on the effect of the FPN first reproduced BSN as the baseline; based on the proposals it generates, combined with the TSN behavior recognition model, the baseline temporal action detection pipeline is obtained. Then, with the 400-dimensional features after the fc layer as input and 1D convolution, FPN is applied directly to generate candidate proposals at multiple scales, followed by behavior recognition [19]. Three different multiscale temporal evaluation schemes using FPN were tested. SVR1 takes a simple weighted average of the five-scale temporal feature maps obtained from the feature pyramid. SVR2 feeds all five temporal feature maps to the temporal evaluation module and fuses the results. In the third scheme (SVR3 in Table 1), the feature maps at scales 16, 32, and 64 are merged at the temporal scale of 64 in a top-down manner, while the feature maps at scales 128 and 256 remain unchanged [20].
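The top-down merging of the 16, 32, and 64 scales can be sketched as repeated upsampling and element-wise addition. This is a deliberately simplified illustration: each time step carries a single scalar instead of a feature vector, nearest-neighbor upsampling is assumed, and the learned lateral convolutions of a real feature pyramid network are omitted. The function names and values are hypothetical.

```python
def upsample_nearest(feature, target_len):
    """Repeat each time step so the map reaches target_len (integer factor)."""
    factor = target_len // len(feature)
    return [value for value in feature for _ in range(factor)]


def top_down_merge(coarse_to_fine):
    """Merge temporal maps from coarsest to finest by upsample-and-add."""
    merged = coarse_to_fine[0]
    for finer in coarse_to_fine[1:]:
        upsampled = upsample_nearest(merged, len(finer))
        merged = [u + f for u, f in zip(upsampled, finer)]
    return merged


# Hypothetical single-channel temporal maps at scales 16, 32, and 64.
f16 = [float(t) for t in range(16)]
f32 = [2.0] * 32
f64 = [3.0] * 64

merged_64 = top_down_merge([f16, f32, f64])
```

The merged map has the temporal length of the finest input, so coarse context is added to every fine-scale position while the 128 and 256 scales would be passed through untouched.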

The final experimental results are shown in Table 1.

(1) The comparison of SVR1 and SVR2 indicates that multiscale temporal feature maps should be combined after the temporal evaluation module (TEM), as in SVR2, rather than fused directly in the feature dimension, as in SVR1. The main reason is that small-scale information degrades the resolution of large-scale information.
(2) The comparison of the BSN baseline and SVR shows that SVR, a temporal action detection algorithm based on FPN, improves performance significantly. The main reason is that generating proposals at different resolutions improves the recall of actions when the length of the target action sequence varies widely [21, 22].

To extract video features better, the output of TSN before the fully connected layer, of size (512, 1536), is taken as the input of the temporal action proposal module, and 2D convolution is used to jointly model the temporal and channel characteristics. The experimental results are shown in Table 2.

Table 1. SVR experimental process results.

Method	AR@10	AR@100	AUC
BSN-baseline		75.68	59.27
SVR1	49.87	72.48	56.78
SVR2	54.61	74.28	69.83
SVR3	49.86	73.15	61.45

Table 2. Experimental results of 2D convolution temporal-channel joint modeling.

Method	AR@10	AR@100	AUC
BSN-baseline		68.96	55.83
SVR3	52.84	72.36	67.72
SVR	58.41	77.06	68.45

The experimental results show that jointly modeling the temporal and channel dimensions is very effective. Compared with using only 1D convolution to extract temporal features, 2D convolution connects contextual information better and also accounts for the different effects of different channel features on the results [23]. The final implementation of SVR is therefore the variant with 2D convolution temporal-channel joint modeling. The resulting relationship between the recall and the number of candidate proposals under different tIoU thresholds is shown in Figure 5.

The experimental results show that (1) the quality of the temporal action proposals directly affects the result of temporal action detection and (2) the SVR algorithm significantly improves temporal action detection under different tIoU requirements [24, 25].

5. Conclusion

The proposed method works as follows. First, the input video passes through the preliminary feature extraction network to obtain video features; second, a multiscale temporal feature map is obtained from the video features through 2D temporal-channel convolution and the FPN structure; then, multiple candidate proposals are obtained through the temporal evaluation module (TEM) and the proposal evaluation module (PEM); finally, the temporal action detection result is obtained through the action recognition classifier. The proposed SVR algorithm was evaluated in two comparative experiments. The first verifies the effectiveness of the improvement by comparison with the baseline before improvement; it shows that introducing the FPN module to generate proposals on multiscale feature maps covers target action regions of varying temporal scale better and yields a better set of candidate proposals. The second demonstrates the competitiveness of the algorithm by comparison with existing state-of-the-art methods. Future research could extend the SVR method to other application areas.

Conflicts of Interest

The authors declare no conflicts of interest.

Open Research

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

References

1 Rouxia C., Xiaodong C., Shifang T., and Donghai Y., Research on inverse simulation of physical training process based on wireless sensor network, International Journal of Distributed Sensor Networks. (2020) 16, no. 4, 155014772091426, https://doi.org/10.1177/1550147720914262.
2 Tian H., Jiang W., Shu G., Wang W., Huo D., and Shakir M. Z., Analysis and optimization of thermally-regenerative ammonia-based flow battery based on a 3-d model, Journal of the Electrochemical Society. (2019) 166, no. 13, A2814–A2825, https://doi.org/10.1149/2.0711912jes, 2-s2.0-85073679351.
3 Tran D.-P. and Hoang V.-D., Adaptive learning based on tracking and ReIdentifying objects using convolutional neural network, Neural Processing Letters. (2019) 50, no. 1, 263–282, https://doi.org/10.1007/s11063-019-10040-w, 2-s2.0-85065450195.
4 Dutt S., Dash S., Nandi S., and Trivedi G., Analysis, modeling and optimization of equal segment based approximate adders, IEEE Transactions on Computers. (2019) 68, no. 3, 314–330, https://doi.org/10.1109/tc.2018.2871096, 2-s2.0-85053602019.
5 Liu N., Zhao R., Qiao L., Zhang Y., Li M., Sun H., Xing Z., and Wang X., Growth stages classification of potato crop based on analysis of spectral response and variables optimization, Sensors. (2020) 20, no. 14, 3995, https://doi.org/10.3390/s20143995.
6 Nkouagnou C. J., Haman D., and Kenfack J. A., Trajectory tracking controller for birotor coaxial unmanned aerial vehicle using nonlinear continuous-time generalized predictive control combined to pi-observer, Russian Aeronautics. (2022) 4, no. 646, 660.
7 Cui Y. and Liu X., Adaptive consensus tracking control of strict-feedback nonlinear multi-agent systems with unknown dynamic leader, Neural Computing and Applications. (2022) 34, no. 8, 6215–6226, https://doi.org/10.1007/s00521-021-06801-1.
8 Zakaria N. J., Shapiai M. I., and Wahid N., A study of multiple reward function performances for vehicle collision avoidance systems applying the dqn algorithm in reinforcement learning, IOP Conference Series: Materials Science and Engineering. (2021) 1176, no. 1, 13, 012033, https://doi.org/10.1088/1757-899x/1176/1/012033.
9 Sharma A. and Kalra M., A blockchain based approach for improving transparency and traceability in silk production and marketing, Journal of Physics: Conference Series. (2021) 1998, no. 1, 012013, https://doi.org/10.1088/1742-6596/1998/1/012013.
10 Li Z., Yan P., Liang J., and Tian X., The influence of clahe on the accuracy stability of the automatic classification of mars surface lineament structure based on dem image, Journal of Physics: Conference Series. (2021) 2003, no. 1, 012013, https://doi.org/10.1088/1742-6596/2003/1/012013.
11 Chabuda A., Dovgialo M., Duszyk A., Ygierewicz J., and Durka P., Rendering stimuli for ssvep-bci and attention tracking with blinker, Acta Physica Polonica A. (2021) 139, no. 426, https://doi.org/10.12693/APhysPolA.139.426.
12 Chen M.-S., Hwang C.-P., Ho T.-Y., Wang H.-F., Shih C.-M., Chen H.-Y., and Liu W. K., Driving behaviors analysis based on feature selection and statistical approach: a preliminary study, The Journal of Supercomputing. (2019) 75, no. 4, 2007–2026, https://doi.org/10.1007/s11227-018-2618-9, 2-s2.0-85053854959.
13 Xia Q., Wang Z., Ren Y., Yang D., Sun B., Feng Q., and Qian C., Performance reliability analysis and optimization of lithium-ion battery packs based on multiphysics simulation and response surface methodology, Journal of Power Sources. (2021) 490, no. 7411, 229567, https://doi.org/10.1016/j.jpowsour.2021.229567.
14 Geeth K. M., Reddy M. C. S., and Kumar M. S., Optimization of dry-sliding wear parameters on carbon fiber reinforced polyester composites using taguchi based greyrelation analysis, IOP Conference Series: Materials Science and Engineering. (2021) 1185, no. 1, 012003, https://doi.org/10.1088/1757-899x/1185/1/012003.
15 Zbunjak Z., Kuzle I., and Maar D., Overload mitigation sips based on dc model optimization and pmu technology, Tehnički Vjesnik. (2020) 27, no. 1, 213–220.
16 Mohammadi M., Fallahi V., and Seifouri M., Optimization and performance analysis of all-optical compact 4 and 5-channel demultiplexers based on 2d pc ring resonators for applications in advanced optical communication systems, Silicon. (2021) 13, no. 8, 2619–2629, https://doi.org/10.1007/s12633-020-00614-y.
17 Liu C., Zheng X., and Ren Y., Parameter optimization of the 3pg model based on sensitivity analysis and a bayesian method, Forests. (2020) 11, no. 12, https://doi.org/10.3390/f11121369.
18 Liu Z., Zhou J., Feng W., and Chen Y., Modeling, analysis, and multi-objective optimization of cold extrusion process of clutch outer gear hub using response surface method and meta-heuristic approaches, International Journal of Advanced Manufacturing Technology. (2021) 116, no. 1, 229–239, https://doi.org/10.1007/s00170-021-07451-2.
19 Zhu C., Jiang S., Li S., and Lan X., Efficient and practical correlation filter tracking, Sensors. (2021) 21, no. 3, 790–796, https://doi.org/10.3390/s21030790.
20 Pandiyan A., Arunkumar G., and Premkumar G., Design analysis and topology optimization of a connecting rod for single cylinder 4-stroke petrol engine, International Journal of Vehicle Structures & Systems. (2020) 11, no. 4, 439–442.
21 Jia D., Li F., Zhang C., and Li L., Design and simulation analysis of trimaran bulkhead based on topological optimization, Ocean Engineering. (2019) 191, no. Nov.1, 106304–106304.39, https://doi.org/10.1016/j.oceaneng.2019.106304, 2-s2.0-85072759104.
22 Xu X. and Wu L., Heat transfer optimization of blast furnace stave based on entransy dissipation and entropy generation analysis, Heat Transfer Research. (2019) 50, no. 5, 501–517, https://doi.org/10.1615/heattransres.2018026547, 2-s2.0-85065025927.
23 Smoleń M., Consistency of outputs of the selected motion acquisition methods for human activity recognition, Journal of Healthcare Engineering. (2019) 2019, no. 3, 10, 9873430, https://doi.org/10.1155/2019/9873430, 2-s2.0-85069763086.
24 Yu X. and Xie W., Real-time recovery and recognition of motion blurry QR code image based on fractional order deblurring method, IET Image Processing. (2019) 13, no. 6, 923–930, https://doi.org/10.1049/iet-ipr.2018.5792, 2-s2.0-85065247827.
25 Long T., Liang Z., and Liu Q., Advanced technology of high-resolution radar: target detection, tracking, imaging, and recognition, Science China Information Sciences. (2019) 62, no. 04, 1–26, https://doi.org/10.1007/s11432-018-9811-0, 2-s2.0-85063032931.


Action Recognition, Tracking, and Optimization Analysis of Training Process Based on SVR Model and Multimedia Technology
