To address the low accuracy and high time cost of manually recording and compiling basketball statistics, an automatic method for analyzing motion actions in basketball scenes based on the spatial temporal graph convolutional neural network is proposed. A graph structure is used to model the joints and limbs of the human body, and a spatial temporal graph structure is used to model posture actions, which realizes the extraction and estimation of human posture in basketball scenes. Then, by dividing and applying label subsets and training with transfer learning, postures with fuzzy motion boundaries are recognized. Finally, NBA basketball videos are collected and calibrated with a self-developed OpenCV-based tool, and the effectiveness of the proposed method is verified by analyzing the motion actions. The results show that the proposed method can recognize various movements in different basketball scenes, and the average recognition accuracy is more than 75%, which indicates that the method has practical application value. Compared with common motion analysis methods such as feature descriptors, the proposed method has higher identification accuracy and can be used for motion action analysis in actual basketball scenes.

1. Related Work

Basketball is one of the most popular competitive sports. Analyzing motion actions in basketball scenes helps players improve their skills and lets coaches and athletes quickly grasp their own movement characteristics. At present, the analysis relies mainly on manual work: posture estimation is carried out by manually marking basketball videos. This approach suffers from low efficiency, low accuracy, and high cost. To address these problems, Yu et al. proposed using MeanShift to process and track the features of motion videos; the tracking and recognition accuracies of this method are 96.04% and 97.10%, respectively [1]. Liu et al. proposed an improved ghost suppression and adaptive visual background extraction algorithm that effectively removes ghosting in motion videos [2]. Li et al. detected and tracked moving targets by combining FPGA and image processing, realizing functions such as image acquisition, grayscale conversion, image filtering, and interframe difference [3]. Wu et al. recognized students' standing behavior in class based on the region of interest (ROI) and face tracking [4]. Huang detected targets in 3D images and introduced a deep learning algorithm, greatly improving detection accuracy [5]. In addition to the above studies, Sun et al. and Manikandaprabu et al. also proposed target detection and tracking methods [6–12]. This research provides many useful methods for tracking moving targets. Building on it, this paper detects and recognizes basketball motions so as to provide a new method for processing sports video images.

Motion action analysis in basketball scenes has made great progress, but its overall performance still needs to be improved. On the one hand, the prediction of human posture joint points in basketball scenes is not satisfactory. On the other hand, the boundaries between motions in basketball are relatively fuzzy, which increases the difficulty of the research [13–15]. To solve these problems, and building on existing research, this study uses the strong learning ability of the spatial temporal graph convolutional network (ST-GCN) to propose a method for analyzing motion actions in basketball scenes based on image processing and the spatial temporal graph convolutional neural network. A graph structure is used to model the human joint points and limbs, and a spatial temporal graph structure is used to model posture actions, so that the human posture in the basketball scene is extracted and estimated. Then, by dividing and applying label subsets and combining them with transfer learning, postures with fuzzy motion boundaries are recognized.

2. Introduction of Spatial Temporal Graph Convolutional Neural Networks

The spatial temporal graph convolutional neural network redefines convolution according to the graph structure, which enables convolution operations to be performed on graphs. In 2D image convolution, the feature maps throughout the process are two-dimensional pixel grids. The convolution stride is set to 1, and zeros are padded at the appropriate boundary positions so that the output feature map has the same size as the input feature map. For an input f with c channels and a convolution kernel of size a ∗ b, the output feature map at position (x, y) is as follows [16]:

\begin{matrix} g (x, y) = f (x, y) * W (x, y) = \sum_{s = - a}^{a} \sum_{t = - b}^{b} f (s, t) W (s - x, t - y) . \end{matrix}

(1)

In the convolutional neural network, the convolution of the convolution kernel W is the weighted overlay of the corresponding position of the image and the convolution kernel, so the above equation can be rewritten as

\begin{matrix} g (x, y) = \sum_{h = - a}^{a} \sum_{w = - b}^{b} f (p (x, y, h, w)) \cdot W (h, w), \end{matrix}

(2)

where p represents the sampling function, which extracts the neighborhood of (x, y) together with (x, y) itself and can be expressed as

\begin{matrix} p (x, y, h, w) = (x, y) + p^{'} (h, w) . \end{matrix}

(3)

Here, W represents the weight function. It provides a weight vector over the c channels, and the weighted result is the inner product of this vector with the sampled c-channel input.
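Formulas (2) and (3) can be illustrated with a minimal single-channel sketch. It assumes that the offset function is p'(h, w) = (h, w) and follows the index ranges of formula (2), so the kernel has (2a + 1) × (2b + 1) entries; the function names and the toy image are hypothetical and are not taken from the original implementation.

```python
def sample(x, y, h, w):
    """Sampling function p of formula (3), assuming the offset p'(h, w) = (h, w)."""
    return x + h, y + w


def conv2d_same(f, W, a, b):
    """Single-channel form of formula (2) with stride 1 and zero padding.

    f is a list of rows; W is indexed as W[h + a][w + b]
    for h in [-a, a] and w in [-b, b].
    """
    rows, cols = len(f), len(f[0])
    g = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            total = 0.0
            for h in range(-a, a + 1):
                for w in range(-b, b + 1):
                    px, py = sample(x, y, h, w)
                    # Positions outside the image contribute zero (zero padding).
                    if 0 <= px < rows and 0 <= py < cols:
                        total += f[px][py] * W[h + a][w + b]
            g[x][y] = total
    return g


# Toy 3 x 3 image and an all-ones 3 x 3 kernel (a = b = 1).
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
box = [[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]]
out = conv2d_same(image, box, a=1, b=1)
```

The output has the same size as the input, which is the effect of the stride of 1 and the zero padding described above.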

Formula (2) is extended and graph convolution is defined as follows:

(1)
Feature mapping of all nodes (including c-dimensional feature vector) is
$\begin{matrix} f_{in}^{t} : V^{t} ⟶ R^{c} . \end{matrix}$ (4)
(2)
In the image field, the sampling function p(h, w) extracts the points around the center position. In the graph structure, for node v_ti, the sampling function extracts its adjacent point set B(v_ti) = {v_tj | d(v_tj, v_ti) ≤ D}, where d(v_tj, v_ti) represents the minimum distance between the nodes v_tj and v_ti. Therefore, the sampling function p : B(v_ti)⟶V can be expressed as
$\begin{matrix} p (v_{t i}, v_{t j}) = v_{t j} . \end{matrix}$ (5)
Considering that the connection between human body joints is sparse, this study takes the joint whose adjacent distance is 1, so D = 1 is set.
(3)
Two-dimensional image pixels are arranged on a square grid, so the neighborhood of any location has a fixed order from top to bottom and from left to right. For a general graph structure, however, adjacent nodes have no fixed order. So instead of labeling every adjacent node and building a weight function node by node, this paper divides the adjacent node set B(v_ti) of the node v_ti into a fixed number of K subsets. The weights are then stored in a (c, K) dimensional tensor, and each adjacent node is mapped to its label subset:
$\begin{matrix} l_{t i} : B (v_{t i}) ⟶ \{0, \dots, K - 1\} . \end{matrix}$ (6)
The weight function can be expressed as
$\begin{matrix} W (v_{t i}, v_{t j}) = W^{'} (l_{t i} (v_{t j})) . \end{matrix}$ (7)
Rewriting formula (2) with the newly defined sampling function and weight function, we get
$\begin{matrix} f_{out} (v_{t i}) = \sum_{v_{t j} \in B (v_{t i})} \frac{1}{Z_{t i} (v_{t j})} f_{in} (p (v_{t i}, v_{t j})) \cdot W (v_{t i}, v_{t j}), \end{matrix}$ (8)
where Z_ti(v_tj) is the normalizing term, which is equal to the number of nodes in the subset that contains v_tj. It balances the influence of different subsets on the output result and can be calculated as follows:
$\begin{matrix} Z_{t i} (v_{t j}) = |\{v_{t k} | l_{t i} (v_{t k}) = l_{t i} (v_{t j})\}| . \end{matrix}$ (9)
Substituting formulas (5) and (7) into formula (8), we obtain
$\begin{matrix} f_{out} (v_{t i}) = \sum_{v_{t j} \in B (v_{t i})} \frac{1}{Z_{t i} (v_{t j})} f_{in} (v_{t j}) \cdot W^{'} (l_{t i} (v_{t j})) . \end{matrix}$ (10)
(4)
Formula (10) is used to establish the spatial temporal graph convolution model of the human posture sequence. First, the same node in two adjacent frames is connected to form the edge set E_F. Then, the spatial graphs of the individual frames are connected into a spatial temporal structure, on which the spatial temporal graph convolution is carried out. Finally, the spatial adjacent point set is extended to the nodes of adjacent frames as follows [17, 18]:
$\begin{matrix} B (v_{t i}) = \{v_{q j} | d (v_{t j}, v_{t i}) \leq K, |q - t| \leq \lfloor \frac{Γ}{2} \rfloor\} . \end{matrix}$ (11)

Here, Γ is a parameter representing the temporal length of the spatial temporal convolution kernel. It sets the distance threshold along the time axis: only nodes whose frames lie within Γ/2 of v_ti are added to the adjacent node set.
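The single-frame graph convolution of formula (10), with the normalizing term of formula (9), can be sketched as follows. This is a minimal illustration under stated assumptions: one output channel, D = 1, and partition by distance (K = 2). The three-joint chain, the feature values, and the function names are hypothetical.

```python
def graph_conv(features, neighbors, label, weights):
    """Formula (10): f_out(v_i) = sum_j f_in(v_j) . W'(l_i(v_j)) / Z_i(v_j).

    features:  dict node -> list of c input channels
    neighbors: dict node -> list of nodes in B(v_i), root included (D = 1)
    label:     function (root, node) -> subset index in 0..K-1
    weights:   list of K weight vectors of length c (one output channel)
    """
    out = {}
    for i, nbrs in neighbors.items():
        # Z_i(v_j): number of neighbors sharing the subset of v_j, formula (9).
        subset_size = {}
        for j in nbrs:
            k = label(i, j)
            subset_size[k] = subset_size.get(k, 0) + 1
        total = 0.0
        for j in nbrs:
            k = label(i, j)
            dot = sum(f * w for f, w in zip(features[j], weights[k]))
            total += dot / subset_size[k]
        out[i] = total
    return out


def distance_label(root, node):
    """Partition by distance with D = 1: the root is subset 0, neighbors are subset 1."""
    return 0 if node == root else 1


# Hypothetical chain of three joints 0 - 1 - 2 with 2-channel features.
neighbors = {0: [0, 1], 1: [1, 0, 2], 2: [2, 1]}
features = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [3.0, 0.0]}
weights = [[1.0, 1.0], [0.5, 0.5]]
result = graph_conv(features, neighbors, distance_label, weights)
```

For the middle joint, the two neighbors fall into the same subset, so each contribution is divided by 2 before it is added to the contribution of the root.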

The sampling function of the spatial temporal convolution is the same as the single-frame sampling function in formula (5). For the weight function with root node v_ti, the label mapping l_ST(v_qj) of the adjacent node set in the spatial temporal graph structure can be expressed as

\begin{matrix} l_{ST} (v_{q j}) = l_{t i} (v_{t j}) + (q - t + \lfloor \frac{Γ}{2} \rfloor) \times K . \end{matrix}

(12)

Here, l_ti(v_tj) represents the label mapping of the adjacent node set of node v_ti in each frame.
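A short sketch of the label mapping in formula (12) is given below. It assumes that the bracket around Γ/2 denotes the floor operation, and it uses Γ = 9 and K = 3 only as example values; the function name `st_label` is hypothetical.

```python
def st_label(spatial_label, q, t, gamma, K):
    """Formula (12): l_ST(v_qj) = l_ti(v_tj) + (q - t + floor(gamma / 2)) * K."""
    half = gamma // 2
    if abs(q - t) > half:
        raise ValueError("frame q lies outside the temporal window of frame t")
    return spatial_label + (q - t + half) * K


gamma, K, t = 9, 3, 10
# Enumerate every frame in the temporal window and every spatial label.
labels = [st_label(s, q, t, gamma, K)
          for q in range(t - gamma // 2, t + gamma // 2 + 1)
          for s in range(K)]
```

Under these assumptions, each combination of frame offset and spatial label receives its own index, so the spatial temporal kernel distinguishes Γ × K subsets in total.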

3. Basketball Motion Analysis Method Based on Spatial Temporal Graph Convolutional Network

3.1. Overall Process

Based on the characteristics of the spatial temporal graph convolution network described above, the process of the basketball motion analysis method is designed as shown in Figure 1. First, for the node sequence formed by the human body joints of the input frames, the label subsets are divided by the label partition strategy. Then, the input tensor is constructed for the spatial temporal graph convolution. Finally, the spatial temporal graph convolutional neural network is trained and outputs the classification, which realizes the analysis of basketball movement. Each key part is explained as follows.

Figure 1. Basketball movement analysis flow based on spatiotemporal graph convolution network.

3.2. Construction of the Structural Input of Human Body Joint Sequence Diagram

According to the multiple joint matching algorithm, the graph structure in the data structure is adopted to model the human body joints and limbs, and the spatial temporal graph structure is adopted to model the posture action, as shown in Figure 2 [19–22].

A basketball movement video of T frames, with a posture sequence of N joints in each frame, can be defined as an undirected spatial temporal graph G = (V, E), where V is the node set given by formula (13); it contains all the joints in the posture sequence and serves as the input of the convolutional neural network [23]. For each node in formula (13), the coordinates and the corresponding confidence are obtained from the heat map output by posture estimation. In addition, the edges of the spatial temporal graph structure can be decomposed into the edge set within each frame and the edge set between two adjacent frames, which are expressed as formulas (14) and (15), respectively, where H represents the set of joint pairs connected by human limbs, and the edges of E_F represent the trajectories of the joints over time [24].

\begin{matrix} V = \{v_{t i} | t = 1, \dots, T, i = 1, \dots, N\} . \end{matrix}

(13)

\begin{matrix} E_{s} = \{v_{t i} v_{t j} | t = τ, (i, j) \in H\}, \end{matrix}

(14)

\begin{matrix} E_{F} = \{v_{t i} v_{(t + 1) i}\} . \end{matrix}

(15)
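The node set and the two edge sets of formulas (13) to (15) can be built with a few lines of code. The sketch below assumes that frame and joint indices start at 1 and uses a hypothetical three-joint limb list in place of the 18-joint skeleton; `build_st_graph` is an illustrative name, not part of the original implementation.

```python
def build_st_graph(T, N, limbs):
    """Return (V, E_S, E_F) for T frames, N joints, and limb pairs H."""
    # Formula (13): one node per joint and frame.
    V = [(t, i) for t in range(1, T + 1) for i in range(1, N + 1)]
    # Formula (14): limbs connect joints inside the same frame.
    E_S = [((t, i), (t, j)) for t in range(1, T + 1) for (i, j) in limbs]
    # Formula (15): each joint is linked to itself in the next frame.
    E_F = [((t, i), (t + 1, i)) for t in range(1, T) for i in range(1, N + 1)]
    return V, E_S, E_F


# Hypothetical arm with three joints: shoulder (1), elbow (2), wrist (3).
limbs = [(1, 2), (2, 3)]
V, E_S, E_F = build_st_graph(T=4, N=3, limbs=limbs)
```

With T = 4 and N = 3, the graph has 12 nodes, 8 intra-frame edges, and 9 inter-frame edges; following one inter-frame chain gives the trajectory of a single joint.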

3.3. Label Subset Partition Strategy

The label subsets in this study are divided with reference to the ST-GCN partition strategies, which include unified partition, partition by distance, and partition by spatial structure, as shown in Figure 3. Figure 3(b) shows the unified partition strategy, which is the most direct and simple one. The whole adjacent point set is treated as a single subset, so the corresponding graph convolution is the inner product of the weight vector with the feature vector of each node adjacent to v_ti. In effect, the unified partition strategy computes the inner product of the weight vector and the average feature vector of all adjacent nodes, which easily leads to the loss of local features. So, this strategy is not the best choice for posture sequence classification [25].

Figure 3(c) shows the partition strategy by distance, which is based on the distance d(⋅, v_ti) from each node to the root node v_ti. In this study, D is set to 1, so the root node itself, at distance d = 0, forms one subset, and the adjacent nodes at distance d = 1 form another. Therefore, this partition strategy provides two different weight vectors to model local differential characteristics. The number of labels under partition by distance is K = 2, and the label is

\begin{matrix} l_{t i} (v_{t j}) = d (v_{t j}, v_{t i}) . \end{matrix}

(16)

Figure 3(d) shows the partition of adjacent point labels based on the spatial distribution of human body joints, where X is the center of the human body, and the adjacent points are divided into three label subsets, namely the root node itself, the centrifugal group, and the centripetal group. In this paper, the center of gravity of the human body is obtained by averaging the coordinates of all the nodes. According to the spatial distribution, the number of labels is K = 3, and the labels are

\begin{matrix} l_{t i} (v_{t j}) = \begin{cases} 0, & \text{if } r_{j} = r_{i}, \\ 1, & \text{if } r_{j} < r_{i}, \\ 2, & \text{if } r_{j} > r_{i} . \end{cases} \end{matrix}

(17)

Here, r_i represents the average distance from the center of gravity to the joint i over all frames in the training set.
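The three partition strategies can be compared with a small sketch. It makes two simplifying assumptions: the distance r is computed from a single frame rather than averaged over the training set, and the spatial labels follow the ST-GCN convention in which label 1 marks neighbors closer to the center of gravity than the root and label 2 marks neighbors farther away. The joint coordinates and function names are hypothetical.

```python
import math


def gravity_distances(coords):
    """Distance of every joint from the center of gravity (mean of all joints)."""
    cx = sum(x for x, _ in coords.values()) / len(coords)
    cy = sum(y for _, y in coords.values()) / len(coords)
    return {j: math.hypot(x - cx, y - cy) for j, (x, y) in coords.items()}


def uniform_label(root, node, r):
    """Unified partition: every neighbor shares one subset (K = 1)."""
    return 0


def distance_label(root, node, r):
    """Partition by distance with D = 1 (K = 2)."""
    return 0 if node == root else 1


def spatial_label(root, node, r):
    """Partition by spatial structure (K = 3)."""
    if r[node] == r[root]:
        return 0
    return 1 if r[node] < r[root] else 2


# Hypothetical joints on a line; the center of gravity is at x = 2.
coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (5.0, 0.0)}
r = gravity_distances(coords)
root, neighborhood = 1, (0, 1, 2)
uniform = [uniform_label(root, j, r) for j in neighborhood]
by_distance = [distance_label(root, j, r) for j in neighborhood]
by_structure = [spatial_label(root, j, r) for j in neighborhood]
```

In this toy example the two neighbors of the root share one label under partition by distance but receive different labels under partition by spatial structure, which is the extra information the latter strategy provides.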

3.4. Implementation of ST-GCN Based on Label Subset

The method of ST-GCN in the case of single frame is shown in the formula as follows:

\begin{matrix} f_{out} = \Lambda^{- 1 / 2} (A + I) \Lambda^{- 1 / 2} f_{in} W, \end{matrix}

(18)

\begin{matrix} \Lambda^{i i} = \sum_{j} (A^{i j} + I^{i j}), \end{matrix}

(19)

where Λ^{ii} represents the normalizing term; A represents the adjacency matrix of the human joint connections; I stands for the identity matrix of self-connections; and W represents the weight matrix formed by stacking the weight vectors of the output channels.

When there are multiple label subsets, as is the case in practice, the spatial temporal graph convolution can no longer use the single matrix Λ^{−1/2}(A + I)Λ^{−1/2}. Instead, the input is multiplied, as a tensor, with the normalized adjacency matrix of each subset, and the result is convolved along the time dimension with a standard convolution of size 1 × Γ. The input feature map can be expressed as a (C, T, V) dimensional tensor, where C is the channel dimension holding the (x, y) coordinates and score, V represents the number of joints, and T represents the sequence length. The adjacency matrix is decomposed into multiple matrices A_j, namely (A + I) = ∑_jA_j. So, formula (18) can be rewritten as formula (20), with the normalizing term given by formula (21):

\begin{matrix} f_{out} = \sum_{j} \Lambda_{j}^{- 1 / 2} A_{j} \Lambda_{j}^{- 1 / 2} f_{in} W_{j}, \end{matrix}

(20)

\begin{matrix} \Lambda_{j}^{i i} = \sum_{k} (A_{j}^{i k}) + a . \end{matrix}

(21)

To prevent the denominator from being 0, this article sets a = 0.001.
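The partitioned graph convolution with the normalizing term a can be sketched for a single frame as follows. Plain Python lists are used to keep the example dependency-free; the three-joint chain, the partition by distance, and the weights are hypothetical, and the temporal 1 × Γ convolution is left out.

```python
def normalize(A_j, a=0.001):
    """Return Lambda_j^{-1/2} A_j Lambda_j^{-1/2}, Lambda_j^{ii} = sum_k A_j^{ik} + a."""
    n = len(A_j)
    degree = [sum(row) + a for row in A_j]
    return [[A_j[i][k] / (degree[i] * degree[k]) ** 0.5 for k in range(n)]
            for i in range(n)]


def spatial_graph_conv(f_in, partitions, weights, a=0.001):
    """Sum over label subsets j of normalize(A_j) @ f_in @ W_j for one frame.

    f_in:       V x C_in feature rows, one per joint
    partitions: list of V x V matrices A_j whose sum is A + I
    weights:    list of C_in x C_out matrices W_j
    """
    n = len(f_in)
    c_out = len(weights[0][0])
    f_out = [[0.0] * c_out for _ in range(n)]
    for A_j, W_j in zip(partitions, weights):
        A_hat = normalize(A_j, a)
        for i in range(n):
            for k in range(n):
                for ci, x in enumerate(f_in[k]):
                    for co in range(c_out):
                        f_out[i][co] += A_hat[i][k] * x * W_j[ci][co]
    return f_out


# Chain 0 - 1 - 2 split by distance: A_0 holds the roots, A_1 the neighbors.
A_0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A_1 = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
f_in = [[1.0], [2.0], [3.0]]
weights = [[[1.0]], [[0.5]]]
f_out = spatial_graph_conv(f_in, [A_0, A_1], weights)

# A subset with empty rows still normalizes without a division by zero.
empty = normalize([[0.0, 0.0], [0.0, 0.0]])
```

The last line shows the purpose of a: a joint that has no neighbor in some subset yields an all-zero row, and the small constant keeps the normalization well defined.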

4. Results and Analysis

4.1. Experimental Environment and Basketball Movement Classification

The experiment is implemented in Python: the results are counted and displayed with PyQt5 and OpenCV, and posture estimation is carried out with OpenPose. Basketball movement classification is the premise of action analysis. This study classifies basketball movements with reference to the commonly used Kinetics dataset, which contains four types of basketball-related actions, namely running with the ball, layup, pitching, and playing basketball. Among them, playing basketball covers a series of basketball actions and therefore belongs to multiple action categories. So this experiment selects only three of these actions, running with the ball, layup, and pitching, as basketball movement categories. In addition, considering the possible states of basketball players on the court, this experiment adds four types of actions: running without the ball, passing the ball, catching the ball, and standing or defending. In total, the basketball action categories in this experiment contain seven kinds of actions, as shown in Table 1.

Table 1. Classification of basketball movement.

Basketball action category
Pass the ball	Catch a ball	Layup	Pitching	Run without the ball	Run with the ball	Standing or defending

4.2. Data Sources and Preprocessing

In this experiment, video clips of standard NBA games collected with a self-developed basketball action capture tool are used as the experimental data. The tool supports play, stop, fast forward, fast backward, and jump to a specified frame. In addition, a tracking algorithm consisting of two parallel forward networks is built into the tool, where one network computes the representation of the template features, and the other is the tracking network. The center point feature and the template feature are used to find the most similar location, which is taken as the bounding box.

The center of the calibration frame produced by the tracking algorithm is usually the target center, and the size of the calibration frame varies with the size of the target, which has a great influence on target posture extraction. Therefore, this study sets the center of the clipping frame to the center of the calibration frame and fixes its size to 368 × 368, consistent with the network input size. In addition, to enlarge the dataset, the captured videos are flipped horizontally. Because the calibration tool may fail to track targets in complex scenes, manual calibration is used for tracking in those cases. The number of videos obtained in this experiment is shown in Table 2.
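The fixed-size clipping and the horizontal flip can be sketched on a frame stored as nested lists. The sketch assumes that pixels falling outside the frame are filled with 0, which the text does not specify, and it uses a 4 × 4 window on a toy 6 × 6 frame in place of the 368 × 368 window; the function names are hypothetical.

```python
def crop_centered(frame, cx, cy, size=368):
    """Cut a size x size window centered at (cx, cy); outside pixels become 0."""
    half = size // 2
    top, left = cy - half, cx - half
    height, width = len(frame), len(frame[0])
    return [[frame[y][x] if 0 <= y < height and 0 <= x < width else 0
             for x in range(left, left + size)]
            for y in range(top, top + size)]


def flip_horizontal(frame):
    """Mirror every row, which doubles the data when applied to each video frame."""
    return [row[::-1] for row in frame]


# Toy 6 x 6 frame whose pixel value encodes its position as 10 * y + x.
frame = [[10 * y + x for x in range(6)] for y in range(6)]
patch = crop_centered(frame, cx=3, cy=3, size=4)
mirrored = flip_horizontal(patch)
corner = crop_centered(frame, cx=0, cy=0, size=4)
```

For image arrays, OpenCV offers the same horizontal flip through `cv2.flip(frame, 1)`.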

Table 2. Collection quantity statistics.

	Pass the ball	Catch a ball	Layup	Pitching	Run without the ball	Run with the ball	Standing or defending
Training set	105	91	74	61	145	68	89
Testing set	47	68	18	21	61	19	23

4.3. Network Structure and Parameter Settings

This study constructs the basketball motion action analysis method on the basis of spatial temporal graph convolution. The structure of the ST-GCN network is shown in Figure 4. The left part of the figure is a spatial temporal graph convolution network formed by stacking seven layers of ST-GCN modules. The fourth layer compresses the feature information along the time dimension and doubles the number of feature channels; the convolution stride of this layer is 2. The middle part of the figure is the specific form of the ST-GCN module, whose input dimension is (B, C, T, V, N), where B represents the batch size, C represents the (x, y) coordinates and score obtained from the posture estimation model, T represents the sequence length with an initial value of 300, V = 18 represents the number of joints, and N represents the maximum number of persons output by posture estimation. Since this study focuses only on the action of the central target, N is set to 1. After the tensor is multiplied by its corresponding normalized transformation matrix, the graph convolution can be carried out as an ordinary two-dimensional convolution with the weights W_j.

Furthermore, to achieve basketball movement classification, it is necessary to map the output characteristic information of the ST-GCN module. Here, average pooling is used to compress the output features, and full convolution is used to map the features to seven types of basketball action channels. Finally, the dimensions are changed into (1, 7) for classification.
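The mapping from the ST-GCN output features to the seven action categories can be sketched as global average pooling followed by a linear map, which is what a 1 × 1 convolution reduces to on a pooled feature. The feature values and weights below are hypothetical placeholders for trained parameters, and the class order follows Table 1.

```python
ACTIONS = ["Pass the ball", "Catch a ball", "Layup", "Pitching",
           "Run without the ball", "Run with the ball", "Standing or defending"]


def classify(features, weight, bias):
    """Pool a C x T x V feature map and map it to one score per action.

    features: C channels, each a T x V nested list
    weight:   7 x C matrix, bias: 7 values (stand-ins for trained parameters)
    """
    # Average pooling over the time and joint dimensions of every channel.
    pooled = [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
              for channel in features]
    scores = [sum(w * p for w, p in zip(w_row, pooled)) + b
              for w_row, b in zip(weight, bias)]
    predicted = max(range(len(scores)), key=scores.__getitem__)
    return scores, predicted


# Toy feature map with C = 2, T = 2, V = 3.
features = [[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
            [[0.0, 2.0, 4.0], [0.0, 2.0, 4.0]]]
weight = [[float(k), 1.0] for k in range(7)]
bias = [0.0] * 7
scores, predicted = classify(features, weight, bias)
```

The seven scores correspond to the (1, 7) output described above, and the index of the largest score selects the predicted action.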

Finally, the experiment sets the temporal graph convolution kernel size of the spatial temporal convolution network to 9. Following the label subset division strategy, the spatial graph convolution kernel size is set to 1, 2, or 3. The initial parameters of the spatial temporal graph convolution network are transferred from an ST-GCN network pretrained on Kinetics, and the parameters of the final classification layer are initialized from a Gaussian distribution. An Adam optimizer is used for training, with a base learning rate of 0.001. Training runs for 960 epochs, and the learning rate is reduced by 90% at epochs 320, 480, 640, and 800.
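The training schedule can be written as a small step function. The sketch assumes that the 90% reduction applies to the learning rate, that is, the rate is multiplied by 0.1 at each of the listed epochs, and that epochs are counted from 0; the function name is hypothetical.

```python
def learning_rate(epoch, base_lr=0.001, milestones=(320, 480, 640, 800), factor=0.1):
    """Learning rate in effect during the given epoch (0-based)."""
    # Count how many milestones have already been reached.
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** drops


schedule = {epoch: learning_rate(epoch) for epoch in (0, 319, 320, 480, 640, 800, 959)}
```

Under this reading, the rate falls from 0.001 to roughly 1e-7 over the 960 training epochs.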

4.4. Experimental Results

To analyze the influence of the input frame length on the recognition effect of the proposed method, the model is trained with frame lengths of 130, 150, 170, 190, 210, and 230 as input while all other parameters remain unchanged. The results are shown in Figure 5. As can be seen from the figure, the recognition of most motion actions improves as the frame length increases, while the recognition of some motion actions fluctuates or declines. Overall, the Top1 accuracy improves with the frame length. When the frame length exceeds 190, the recognition effect no longer improves because the excessive frame length introduces redundancy: space is wasted and the loss of effective frames increases. Therefore, this study sets the frame length to 190.

To analyze the influence of the label subset division strategy on the motion recognition effect, the label subsets are divided according to the unified division, distance division, and spatial structure division strategies, and the proposed method is used for identification. The results are shown in Figure 6. As can be seen from the figure, label subsets divided by distance and by spatial structure give better results than unified division. The reason is that unified division yields a single subset, which contains less information and has weak expressive ability, whereas the subsets obtained by distance division and spatial structure division carry more information. Compared with spatial structure division, distance division provides a weaker representation, although the action recognition accuracies of the two strategies are close. Therefore, this study chooses the spatial structure division strategy to divide label subsets.

To analyze the influence of different network structures on the recognition results, different network structures are adopted after the input frame length and label subset division strategy are determined, as shown in Figure 7. The recognition effect of each model on the motion actions is tested, and the results are shown in Figure 8. As can be seen from the figure, changes in the number of network layers and in the network structure have a limited effect on the accuracy of the recognition results. The Top1 accuracy of these models is lower than that of the model trained with transfer learning. The reason is that the amount of data in the dataset is limited and, without transfer learning, no additional information is obtained, so the adaptability of the models cannot be effectively improved.

To verify its effectiveness, the proposed method is tested on the experimental dataset and compared with other motion action recognition methods. The results are shown in Table 3. It can be seen from the table that the proposed method has the best identification effect for most basketball actions. Although its identification effect for running with the ball is lower than that of the feature descriptor method, its overall identification effect is better. Therefore, the method proposed in this study is effective to some extent.

Table 3. Comparison of identification results of different methods.

Method	Indicator	Pass the ball (%)	Pitching (%)	Run without the ball (%)	Run with the ball (%)	Standing (%)	Catch a ball (%)	Layup (%)	Total (%)
LSTM	Top1	21.28	23.81	47.54	5.26	17.39	14.71	22.22	24.51
LSTM	Top2	31.91	42.86	50.82	10.53	30.43	48.53	33.33	40.08
Res-CNN	Top1	25.53	28.57	49.18	10.53	30.43	19.12	38.89	29.96
Res-CNN	Top2	51.06	47.62	55.74	15.79	47.83	63.24	44.44	51.75
Paper method	Top1	38.30	42.86	75.41	10.53	47.83	29.41	55.56	45.53
Paper method	Top2	74.47	66.67	83.61	21.05	69.57	95.59	66.67	76.67
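The Top1 and Top2 indicators in Table 3 count a sample as correct when the true action is among the one or two highest-scoring classes. A minimal sketch of this computation follows; the score rows and targets are hypothetical and do not come from the experiment.

```python
def top_k_accuracy(score_rows, targets, k):
    """Percentage of samples whose true class is among the k highest scores."""
    hits = 0
    for scores, target in zip(score_rows, targets):
        # Class indices sorted from the highest score to the lowest.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(targets)


# Four hypothetical samples scored over three classes.
score_rows = [[0.10, 0.70, 0.20],
              [0.50, 0.30, 0.20],
              [0.20, 0.30, 0.50],
              [0.40, 0.35, 0.25]]
targets = [1, 1, 0, 2]
top1 = top_k_accuracy(score_rows, targets, k=1)
top2 = top_k_accuracy(score_rows, targets, k=2)
```

Top2 can never be lower than Top1 for the same model, which matches the pattern of every row pair in the table.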

Notably, the test results show that the recognition accuracy differs greatly between two similar movements, running without the ball and running with the ball. The Top1 accuracy of the proposed method for running without the ball is more than 75%, while even the Top2 accuracy for running with the ball is only about 21%. To analyze the causes, this study selects typical movements of running with and without the ball from the experimental dataset, as shown in Figure 9. Both movements involve swinging the arms and running with both legs, so the posture joints of the two actions are highly similar. Running with the ball has more arm swing than running without the ball. After the images are input into the network and the misjudged results are checked, it can be found that the low recognition failure rate of running without the ball may be because its samples occupy a large proportion of the training set, and the high recognition failure rate of running with the ball is because it is easily misjudged as running without the ball. In addition, adding the ball to the posture estimation as a joint was considered. However, because of the small amount of calibration data, the recognition effect did not reach the expected standard, so this study has not obtained a satisfactory solution to this problem.
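The class imbalance mentioned above can be checked directly against the training counts in Table 2. The short calculation below only restates those counts; the variable names are hypothetical.

```python
# Training-set video counts copied from Table 2.
train_counts = {
    "Pass the ball": 105,
    "Catch a ball": 91,
    "Layup": 74,
    "Pitching": 61,
    "Run without the ball": 145,
    "Run with the ball": 68,
    "Standing or defending": 89,
}
total = sum(train_counts.values())
# Share of each class in the training set, in percent.
share = {name: 100.0 * count / total for name, count in train_counts.items()}
largest = max(share, key=share.get)
```

Running without the ball is the largest class with about 22.9% of the training videos, roughly twice the share of running with the ball (about 10.7%), which is consistent with the explanation given for the gap in recognition accuracy.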

5. Conclusion

To sum up, a motion action analysis method for basketball scenes based on the spatial temporal graph convolutional neural network is proposed. The human joints and limbs are modeled with a graph structure, and the posture movement is modeled with a spatial temporal graph structure, which realizes the extraction and estimation of body posture in basketball scenes. Postures with fuzzy motion boundaries are recognized by dividing and applying label subsets and training with transfer learning. When the spatial temporal graph convolution network has 11 layers, the input length is 190 frames, and the label subsets are divided by spatial structure, the network achieves its best recognition effect in the basketball scene, with a recognition accuracy of more than 75%.

Compared with other identification methods such as feature descriptors, this method has higher identification accuracy and can be used for motion action identification and analysis in actual basketball scenes. Although this study has made some achievements, shortcomings remain. In particular, running with the ball has low recognition accuracy and is easily misjudged as running without the ball; future work should incorporate methods for identifying the ball to distinguish the two actions and improve recognition accuracy.

Conflicts of Interest

The author declares no conflicts of interest.

Open Research

Data Availability

The experimental data used to support the findings of the study are available from the corresponding author upon request.

References

1 Yu H., Sharma A., and Sharma P., Adaptive strategy for sports video moving target detection and tracking technology based on mean shift algorithm, International Journal of System Assurance Engineering and Management. (2021) 1–11.
2 Liu L., Chai G.-h., and Qu Z., Moving target detection based on improved ghost suppression and adaptive visual background extraction, Journal of Central South University. (2021) 28, no. 3, 747–759, https://doi.org/10.1007/s11771-021-4642-9.
3 Li W. et al., Moving target detection method based on FPGA, Scientific Journal of Intelligent Systems Research. (2021) 3, no. 3.
4 Wu B., Wang C., Huang W., Huang D., and Peng H., Recognition of student classroom behaviors based on moving target detection, Traitement du Signal. (2021) 38, no. 1, 215–220, https://doi.org/10.18280/ts.380123.
5 Huang L., Moving target detection method of three-dimensional image of whip leg technique in s, Journal of Physics: Conference Series. (2021) 1744, no. 4, 042216, https://doi.org/10.1088/1742-6596/1744/4/042216.
6 Sun W., Yan D., Huang J., and Sun C., Small-scale moving target detection in aerial image by deep inverse reinforcement learning, Soft Computing. (2020) 24, no. 8, 5897–5908, https://doi.org/10.1007/s00500-019-04404-6.
7 Bharat Kumar M. and Rajesh Kumar P., Bayesian fusion strategy for moving target detection in multichannel SAR framework, Evolutionary Intelligence. (2020) 1–14.
8 Zhang W. and Sun W., Research on small moving target detection algorithm based on complex scene, Journal of Physics: Conference Series. (2021) 1738, no. 1, 012093, https://doi.org/10.1088/1742-6596/1738/1/012093.
9 Yaofeng L. and Ma Y., Internet of moving target detection method based on nonparametric background model, International Journal of Computers and Applications. (2021) 43, no. 2, 193–198, https://doi.org/10.1080/1206212x.2018.1537096, 2-s2.0-85055545239.
10 Zhao Z. and Lu G., Target motion detection algorithm based on dynamic threshold, Journal of Physics: Conference Series. (2021) 1738, no. 1, 012085, https://doi.org/10.1088/1742-6596/2033/1/012085.
11 Manikandaprabu N. and Vijayachitra S., Moving human target detection and tracking in video frames, Studies in Informatics and Control. (2021) 30, no. 1, 119–129, https://doi.org/10.24846/v30i1y202111.
12 Yang Q., Shi W., Chen J., and Tang Y., Localization of hard joints in human pose estimation based on residual down-sampling and attention mechanism, The Visual Computer. (2021) 1–13.
13 Chen W., Fan Y., and Zhang Ye, Dynamic gesture recognition based on iCPM and RNN, Journal of Physics: Conference Series. (2020) 1684, no. 1, 012066, https://doi.org/10.1088/1742-6596/1684/1/012066.
14 Zhang H., Dou H., and Li B., Research on human action recognition algorithm based on sine feature, Journal of Physics: Conference Series. (2020) 1518, no. 1, 012024, https://doi.org/10.1088/1742-6596/1518/1/012024.
15 Hu Li-Q., Cai Z.-Q., Xing Li-N., and Tan Xu, Human action recognition via learning joint points information toward big AI system, Journal of Visual Communication and Image Representation. (2019), https://doi.org/10.1016/j.jvcir.2019.102688.
16 Khan M., Mustaqeem, Amin U., Ali Shariq I., Muhammad S., Mustafa Servet K., Giovanna S., and de Albuquerque V. H. C., Human action recognition using attention based LSTM network with dilated CNN features, Future Generation Computer Systems. (2021) 125.
17 Chen J., Samuel R., Jackson D., and Parthasarathy P., LSTM with bio inspired algorithm for action recognition in sports videos, Image and Vision Computing. (2021) 104214, https://doi.org/10.1016/j.imavis.2021.104214.
18 Lei Ye and Ye S., Deep learning for skeleton-based action recognition, Journal of Physics: Conference Series. (2021) no. 1.
19 Wu K.-H. and Chiu C.-T., Action recognition using multi-scale temporal shift module and temporal feature difference extraction based on 2D CNN, Journal of Software Engineering and Applications. (2021) 14, no. 05, 172–188, https://doi.org/10.4236/jsea.2021.145011.
20 Vijay Anant A., Deepak K., and Suresh C. G., Human action recognition using CNN-svm model, Advances in Science and Technology. (2021) 105, 282–290, https://doi.org/10.4028/www.scientific.net/AST.105.282.
21 Yang J., Wang F., and Yang J., A review of action recognition based on Convolutional Neural Network, Journal of Physics: Conference Series. (2021) 1827, no. 1, 012138, https://doi.org/10.1088/1742-6596/2029/1/012138.
22 Kumar A., Kushwaha S., and Khurana R., Fusing dynamic images and depth motion maps for action recognition in surveillance systems, International Journal of Sensors, Wireless Communications & Control. (2021) 11, no. 1, 107–113, https://doi.org/10.2174/2210327909666191209155141.
23 Zheng D., Li H., Li H., and Yin S., Action recognition based on the modified t, International Journal of Mathematics and Soft Computing. (2020) 6, no. 6, 15–23, https://doi.org/10.5815/ijmsc.2020.06.03.
24 Lee J. and Jung H., TUHAD: taekwondo unit technique human action dataset with key frame-based CNN action recognition, Sensors. (2020) 20, no. 17, https://doi.org/10.3390/s20174871.
25 Hoshino S., Niimura K., and Niimura K., Robot vision system for human detection and action recognition, Journal of Advanced Computational Intelligence and Intelligent Informatics. (2020) 24, no. 3, 346–356, https://doi.org/10.20965/jaciii.2020.p0346.
