Volume 2022, Issue 1, Article ID 9273474
Research Article | Open Access

[Retracted] Multifunctional Robot Grasping System Based on Deep Learning and Image Processing

XinYu Zhang (Corresponding Author)
School of Automation Science and Technology, Xi’an Jiaotong University, Xi’an 710049, Shaanxi, China
School of Information Engineering, Xi’an University, Xi’an 710065, Shaanxi, China

Kai Ye
School of Automation Science and Technology, Xi’an Jiaotong University, Xi’an 710049, Shaanxi, China
First published: 31 March 2022
Academic Editor: Kalidoss Rajakani

Abstract

This paper first introduces the general architecture of the multifunctional harvesting robot grasping system. Deep learning is then used to investigate object target recognition, and a target detection algorithm based on a convolutional neural network is implemented. Finally, image processing technology is used to realize target object localization, which guides the multifunctional harvesting robot to pick the target fruit. The experimental results show that the multifunctional harvesting robot has a small error in the computed fruit coordinates and a strong fruit recognition and positioning capability.

1. Introduction

With the development of automation technology, robotics has grown by leaps and bounds, and a variety of intelligent robots are used in industrial, medical, educational, and agricultural applications. Robot grasping is an important robot function in industrial scenarios, traditionally implemented with manual teaching or with 2D/3D model matching to obtain grasping postures [1]. The former cannot handle grasping arbitrary objects in arbitrary poses; the latter requires a large number of templates to be created in advance for different objects, and the same object may need more than one template, making it laborious to build a template search library [2]. A common way to address this problem is to build grasping classifiers through machine learning algorithms for grasping pose planning, but such classifiers usually rely on hand-designed features, which are empirical and heuristic; different features must be designed for different grasping conditions, making manual feature design difficult, inefficient, and impractical [3].

In recent years, with the increase in GPU computing power and the rapid development of deep learning technology, deep neural networks driven by big data can learn the deep features of objects and behaviours, especially for feature learning and representation of two-dimensional data. In research fields such as image classification, object detection, semantic segmentation, and behaviour recognition, convolutional neural networks have far surpassed traditional detection algorithms and even surpass human recognition in some areas [4]. Introducing deep learning into robot grasping research, where convolutional neural networks extract feature information of grasping poses in a hierarchical manner, solves an important challenge that previously required hand-designed grasping features [5].

Furthermore, the use of convolutional neural networks allows for autonomous and efficient learning of feature representations of grasping poses from a large training dataset compared to inefficient human-designed features, resulting in improved performance of grasping pose detection algorithms [6].

In summary, the application of deep learning techniques to robot grasping pose detection algorithms not only eliminates the tedious work of building templates and human-designed features but also allows for efficient grasping planning of target objects, which is of great value for research. In this study, we investigate how to identify and locate objects in a scene and design a well-structured convolutional neural network as a feature extractor for grasping poses.

2. Related Work

2.1. Object Detection

Current deep learning-based object detection methods can be categorised into two types depending on the implementation: Two-Stage object detection algorithms and One-Stage object detection algorithms. The R-CNN algorithm proposed by [7] outperforms the OverFeat end-to-end processing method proposed by [8] and improves performance by about 50% compared with traditional object detection algorithms. The YOLO algorithm proposed in [9] follows the regression-based One-Stage approach of the OverFeat algorithm to achieve true end-to-end object detection, and its improved version, YOLOv2 [10], can reach a detection speed of 155 fps. However, YOLO-like algorithms also have shortcomings: they perform poorly on small objects and on objects with overlapping parts, and their generalization capability and bounding-box localization accuracy are insufficient. In [11], the SSD detection algorithm is proposed, which predicts object regions on the feature maps output by different convolutional layers, outputs discrete default box coordinates at multiple scales and aspect ratios, and uses small convolutional kernels to predict the coordinate offsets and category confidences of the candidate boxes.

2.2. Current Status of Robot Gripping Research

The application of machine vision technology to robot grasping is to determine the relationship between the camera coordinate system and the robot base coordinate system, use vision algorithms to detect the 3D pose of the target object, and execute the grasping action according to the detection results. In [12], a matching method is proposed for estimating the target pose: a library of templates of the target object is created, its SIFT feature points and corresponding descriptors are detected, and the homography matrix between the template and the target is derived by a matching algorithm. Reference [13] also used SIFT features to build a sparse 3D model as a template from the feature point pairs and used a matching algorithm to estimate the target’s pose. Reference [14] proposed an object recognition and pose detection algorithm with real-time feedback based on object colour and shape characteristics; however, the algorithm is not stable under varying illumination and background. Reference [15] used a sliding window approach to generate multiple candidate grasping poses and then used a neural network to select the optimal grasping pose, which is inefficient because the optimal solution is obtained by exhaustive search. In [16], a random sampling method was attempted to reduce the time-consuming candidate box generation, but the final improvement was not significant. In [17], a cascaded convolutional neural network was proposed, with the first stage applying an R-FCN network to locate the target grasping position and predict a coarse grasping angle class from RGB images and the second stage refining the grasping angle prediction with Angle-Net. Such networks clearly require a large amount of training data to achieve good results. Reference [18] proposed transforming the pose detection problem into an angle classification problem by using the rotation angle as the label; however, labelling such a high-quality grasping angle dataset requires considerable labour and resources. To simplify the annotation of grasp angles, [19] proposed a grasp detection network, GDN, that takes both the image and a grasp angle as input and outputs a rating of that grasp angle, selecting the highest-rated of 18 candidate grasp angles (180° divided into 18 bins) as the detection result. This approach simplifies dataset production at the expense of computational efficiency.

In order to improve the efficiency of dataset production, [20] proposed conducting simulated grasping experiments in a simulation environment to collect the dataset; their Dex-Net 2.0 system, trained with its own deep learning pipeline on 10,000 3D object models with different characteristics in a virtual object model library, can achieve fast and accurate grasping of known objects.

In an end-to-end implementation, [18] uses a neural network to generate the grasp centroid and the coordinates of the gripper fingers directly from a local view of the target object. Reference [19], on the other hand, proposed a deep reinforcement learning algorithm for autonomous grasping based on deep Q-functions; its experiments showed that the reinforcement learning algorithm could learn complex movements and even plan the robot’s motion to open a door.

3. Overall Design of the Multifunctional Harvesting Robot Gripping System

The studied multifunctional harvesting robot grasping system consists of an industrial camera, an image processing module, a multifunctional robot carrier, and a data processing unit (PC). The industrial camera is mounted on the robot arm and collects information about the orchard environment and the fruit targets; the image processing module processes the image information collected by the industrial camera; the multifunctional robot carrier is responsible for movement and for grasping the fruit targets; and the data processing unit further analyses the image information to identify and position the fruit targets. The workflow of the multifunctional harvesting robot is shown in Figure 1.

Figure 1. Workflow diagram of the multifunctional harvesting robot.

When working, the industrial camera captures images of the orchard environment and the fruit targets; the images are lightly processed and compressed by the robot’s image processing module and sent to the backend PC. The backend PC applies image processing and deep learning algorithms to perform preprocessing, threshold segmentation, feature extraction, and camera calibration, and then matches and identifies the position and coordinates of the fruit to be picked. After obtaining the position of the target fruit relative to the robot, the backend PC communicates with the robot via the TCP/IP protocol, sends the coordinates of the fruit to be picked, and guides the robot through the picking operation.
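To make the PC-to-robot link concrete, the following minimal Python sketch sends one set of detected fruit coordinates over TCP/IP. The host address, port, and JSON message layout are illustrative assumptions; the paper only states that coordinates are transmitted via the TCP/IP protocol.

```python
import json
import socket

# Hypothetical endpoint of the multifunctional robot controller (assumed values).
ROBOT_HOST = "192.168.1.10"
ROBOT_PORT = 5000

def send_pick_command(x: float, y: float, z: float) -> None:
    """Send the target fruit coordinates to the robot over TCP/IP.

    The JSON message layout is an assumption for illustration; the paper
    only states that coordinates are sent via the TCP/IP protocol.
    """
    message = json.dumps({"cmd": "pick", "x": x, "y": y, "z": z}).encode("utf-8")
    with socket.create_connection((ROBOT_HOST, ROBOT_PORT), timeout=5.0) as sock:
        sock.sendall(message)

# Example usage with one of the coordinates reported in Table 1.
if __name__ == "__main__":
    send_pick_command(-0.593, -0.27, 0.855)
```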

4. Target Identification

The PC quickly identifies the fruit, locates it in three dimensions, and sends the coordinates and a picking instruction to the multifunctional robot, which then completes the picking operation.

As noted in Section 2.1, deep learning-based detection algorithms fall into region-based (Two-Stage) and regression-based (One-Stage) methods. The former uses RCNN, FCN, and related algorithms to detect targets in image regions, while the latter transforms detection into a regression problem. The latter is faster but less accurate and is therefore unsuitable for the multifunctional robot, which requires high target recognition accuracy. Therefore, RCNN is used as the target detection algorithm for the multifunctional robot.

The RCNN regional target detection algorithm differs from other algorithms in that it uses a convolutional neural network to extract the target features.

The RCNN regional target detection process consists of the following four steps:
  • (1)

    Create a convolutional neural network model for extracting image features

  • (2)

    Extract all the suspected regions in the image using a selective search method

  • (3)

    Train an SVM classifier to classify the extracted features

  • (4)

    Use a regressor to refine the candidate target regions to achieve target fruit recognition

Compared with traditional target feature extraction methods, the convolutional neural network structure allows multilevel features to be extracted and recognised; i.e., RCNN can ensure accurate recognition while avoiding complex manual feature extraction. RCNN is also effective at detecting targets in complex surroundings, as shown in Figure 2.
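The following Python sketch illustrates the inference path of such a four-step R-CNN-style pipeline, assuming OpenCV’s selective search for region proposals and a pretrained CNN as the feature extractor; the SVM classifier and box regressor are hypothetical pretrained models, so this is a schematic under those assumptions rather than the paper’s actual implementation.

```python
import cv2
import numpy as np
import torch
import torchvision
from torchvision import transforms

# Step 1: a pretrained CNN (ResNet-18 with its classifier removed) as the feature extractor.
backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((224, 224)),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def propose_regions(image_bgr, max_proposals=200):
    """Step 2: selective search region proposals (requires opencv-contrib-python)."""
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(image_bgr)
    ss.switchToSelectiveSearchFast()
    return ss.process()[:max_proposals]  # each proposal is (x, y, w, h)

def extract_features(image_bgr, boxes):
    """Step 1 applied per region: crop each proposal and run it through the CNN."""
    feats = []
    for (x, y, w, h) in boxes:
        crop = cv2.cvtColor(image_bgr[y:y + h, x:x + w], cv2.COLOR_BGR2RGB)
        with torch.no_grad():
            feats.append(backbone(preprocess(crop).unsqueeze(0)).squeeze(0).numpy())
    return np.stack(feats)

def detect_fruit(image_bgr, svm_classifier, box_regressor):
    """Steps 3 and 4: classify each region with the SVM and refine boxes with the regressor.

    `svm_classifier` and `box_regressor` are hypothetical pretrained models
    (e.g. sklearn estimators); training them is outside this sketch.
    """
    boxes = propose_regions(image_bgr)
    feats = extract_features(image_bgr, boxes)
    scores = svm_classifier.decision_function(feats)    # step 3: classification scores
    refined = boxes + box_regressor.predict(feats)      # step 4: offsets added to (x, y, w, h)
    keep = scores > 0
    return refined[keep], scores[keep]
```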

Figure 2. RCNN regional target detection flowchart.

5. Target Localisation Based on Image Processing

After the backend PC identifies the fruit target, it must also obtain the target’s location before the multifunctional robot can perform the picking operation; this is very similar to the positioning method used by tennis-ball-collecting robots. The image processing-based target positioning involves the backend PC and the industrial camera: the camera captures the scene, and the PC uses image processing algorithms to locate the target fruit based on the previous recognition results.

Let $O_wX_wY_wZ_w$ denote the world (real) coordinate system; $Oxy$ and $O_{uv}$ are the image and pixel coordinate systems, respectively; $o_c$ is the optical centre of the industrial camera; $o$ is the centre of the image plane; $P$ is the real-world coordinate point of the target fruit; $p$ is the projection of $P$ in the image coordinate system; and $f$ is the camera focal length. The relationship between the world and camera coordinate systems is obtained through a rotation and a translation, and the conversion equation is
$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = R \begin{bmatrix} X_W \\ Y_W \\ Z_W \end{bmatrix} + T, \tag{1}$$
where $R$ is the rotation matrix and $T$ is the translation matrix between the world and camera coordinate systems.
In homogeneous form, the world coordinates $P(X_W, Y_W, Z_W)$ expressed in the camera coordinate system $p(X_c, Y_c, Z_c)$ are
$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} = \begin{bmatrix} R & T \\ 0^{\mathrm{T}} & 1 \end{bmatrix} \begin{bmatrix} X_W \\ Y_W \\ Z_W \\ 1 \end{bmatrix}. \tag{2}$$
The coordinate transformation between the camera and the image plane follows from the perspective projection principle:
$$Z_c \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}. \tag{3}$$
Then, the coordinates $p_0(x, y)$ in the image coordinate system of the point $p(X_c, Y_c, Z_c)$ in the camera coordinate system are
$$x = \frac{f X_c}{Z_c}, \qquad y = \frac{f Y_c}{Z_c}. \tag{4}$$
The conversion between the image and pixel coordinate systems is
$$u = \frac{x}{d_x} + u_0, \qquad v = \frac{y}{d_y} + v_0, \tag{5}$$
where $d_x$ and $d_y$ are the pixel equivalents (the physical size of a pixel along each axis) and $(u_0, v_0)$ are the pixel coordinates of the image coordinate system origin.
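To make the chain of transformations concrete, the Python sketch below composes equations (1)–(5) to map a world point to pixel coordinates. The numeric values of R, T, f, dx, dy, u0, and v0 in the example are placeholder assumptions, since the paper does not report its calibration parameters.

```python
import numpy as np

def world_to_pixel(P_world, R, T, f, dx, dy, u0, v0):
    """Map a 3D world point to pixel coordinates using equations (1)-(5).

    R: 3x3 rotation matrix, T: 3-vector translation (world -> camera),
    f: focal length, dx/dy: pixel equivalents, (u0, v0): image origin in pixels.
    """
    # Equations (1)-(2): world -> camera coordinates.
    Xc, Yc, Zc = R @ np.asarray(P_world, dtype=float) + np.asarray(T, dtype=float)
    # Equations (3)-(4): perspective projection onto the image plane.
    x = f * Xc / Zc
    y = f * Yc / Zc
    # Equation (5): image -> pixel coordinates.
    u = x / dx + u0
    v = y / dy + v0
    return u, v

# Example with placeholder calibration values (assumed, for illustration only).
if __name__ == "__main__":
    R = np.eye(3)                   # camera aligned with the world frame
    T = np.array([0.0, 0.0, 0.5])   # camera offset 0.5 m along the optical axis
    u, v = world_to_pixel([0.1, -0.05, 1.0], R, T,
                          f=0.008, dx=5e-6, dy=5e-6, u0=640, v0=360)
    print(f"pixel coordinates: ({u:.1f}, {v:.1f})")
```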

6. Experimental Results and Analysis

In order to verify that the multifunctional harvesting robot grasping system based on deep learning and image processing meets the design requirements, practical grasping experiments were carried out with the system. In the gripping experiments, the accuracy of the grasping system was judged mainly by the coordinate error of the target fruit. The coordinate data of the multifunctional robot grasping system are shown in Table 1.

Table 1. Coordinate data of the multifunctional robot gripping system.

No.   Fruit actual coordinates           Coordinate error
      x          y          z            Δx      Δy      Δz
1     -0.593     -0.27      0.855        -0.2    -0.3    0
2     -12.357    -21.252    16.738       -0.1    -0.5    0.2
3     8.58       -5.264     8.625        0.3     0.2     0.1
4     15.714     8.521      0.859        -0.4    -0.1    0.2
5     -6.431     -1.371     23.083       -0.2    -0.5    0.1

Table 1 shows that the absolute coordinate errors on all three axes are no greater than 0.5 and that the z-axis errors are within 0.2, indicating that the system has a strong fruit identification and positioning capability.
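As a quick check of this reading, the snippet below computes the per-axis maximum and mean absolute errors from the values in Table 1; it is only a sanity check on the tabulated numbers, not part of the original evaluation.

```python
import numpy as np

# Coordinate errors (Δx, Δy, Δz) for the five test fruits in Table 1.
errors = np.array([
    [-0.2, -0.3, 0.0],
    [-0.1, -0.5, 0.2],
    [ 0.3,  0.2, 0.1],
    [-0.4, -0.1, 0.2],
    [-0.2, -0.5, 0.1],
])

max_abs = np.abs(errors).max(axis=0)    # per-axis maximum absolute error
mean_abs = np.abs(errors).mean(axis=0)  # per-axis mean absolute error
print("max |error| (x, y, z):", max_abs)    # [0.4 0.5 0.2]
print("mean |error| (x, y, z):", mean_abs)  # [0.24 0.32 0.12]
```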

To further verify that the pose output of the proposed grasping pose detection system can drive actual two-finger robotic grasping, an autonomous grasping experiment was set up. The human-computer interface is shown in Figure 3 and includes image display controls, function buttons, and parameter display controls. The Get_Image button triggers image acquisition and preprocessing, while the Detect button runs object detection and displays the detection result at the top right of the interface.

Figure 3. Robot gripping control software interface.

As shown in Figure 4, the target recognition result can be inspected by adjusting the value of the Class control to display the grasping position parameters of the corresponding object. The Connect_TCP button establishes communication between the computer and the robot console; the Translation button maps the grasping position to the robot grasping parameters and displays their values; the Robot_Grasp button drives the robot to the corresponding position to grasp the object; the Grasp on and Grasp off buttons control the opening and closing of the gripper, respectively; and the Destination button moves the robot to the object placement point after the grasping action has been completed. The robot grasps the object using this control software [21].
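As a rough illustration of how such a control panel could wire its buttons to the pipeline stages described above, the following Tkinter sketch mirrors the button names in Figure 3; the callback bodies are placeholders, not the authors’ actual control software.

```python
import tkinter as tk

# Placeholder callbacks standing in for the pipeline stages named in Figure 3.
def get_image():    print("acquire and preprocess an image")        # Get_Image
def detect():       print("run object detection, show result")      # Detect
def connect_tcp():  print("open TCP/IP link to the robot console")  # Connect_TCP
def translation():  print("map grasp pose to robot parameters")     # Translation
def robot_grasp():  print("move robot to the grasp position")       # Robot_Grasp
def grasp_on():     print("open the gripper")                       # Grasp on
def grasp_off():    print("close the gripper")                      # Grasp off
def destination():  print("move robot to the placement point")      # Destination

root = tk.Tk()
root.title("Robot gripping control (sketch)")
buttons = [
    ("Get_Image", get_image), ("Detect", detect),
    ("Connect_TCP", connect_tcp), ("Translation", translation),
    ("Robot_Grasp", robot_grasp), ("Grasp on", grasp_on),
    ("Grasp off", grasp_off), ("Destination", destination),
]
for name, callback in buttons:
    tk.Button(root, text=name, command=callback, width=14).pack(pady=2)
root.mainloop()
```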

Figure 4. Target identification results.
Multiple target objects are placed in the detection area. Figure 5 shows the flow chart of the robot’s grasping action; the specific grasping sequence is as follows (a minimal command-sequence sketch is given after the list):
  • (a)

    The robot moves to a priming position with the air gripper in an open state

  • (b)

    The robot controller receives the grasping position of the target object, drives the robot to a point directly above the target, and then adjusts the end joint angle to the yaw angle of the corresponding grasping position

  • (c)

    The robot moves downwards until it reaches the gripping position

  • (d)

    The air grip closes and the gripping action is performed

  • (e)

    The robot grips the target object and lifts it upwards

  • (f)

    The robot transports the target object directly above the set placement point

  • (g)

    The robot moves downwards to the object placement point

  • (h)

    The air grip opens and releases the object, completing a single object gripping action
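The step sequence above can be summarised as a short command script. The sketch below assumes a hypothetical robot interface (move_to, set_end_joint_yaw, open_gripper, close_gripper) purely to illustrate the ordering of steps (a) through (h); it is not an actual robot SDK.

```python
from dataclasses import dataclass

@dataclass
class GraspTarget:
    x: float
    y: float
    z: float
    yaw: float  # yaw angle of the grasping pose

def grasp_and_place(robot, target: GraspTarget, place_point, approach_height=0.15):
    """Execute steps (a)-(h) of the autonomous grasping flow.

    `robot` is a hypothetical controller object exposing move_to(x, y, z),
    set_end_joint_yaw(angle), open_gripper(), and close_gripper().
    """
    robot.open_gripper()                                           # (a) priming, gripper open
    robot.move_to(target.x, target.y, target.z + approach_height)  # (b) move directly above the target
    robot.set_end_joint_yaw(target.yaw)                            # (b) align end joint to the yaw angle
    robot.move_to(target.x, target.y, target.z)                    # (c) descend to the gripping position
    robot.close_gripper()                                          # (d) close and grip
    robot.move_to(target.x, target.y, target.z + approach_height)  # (e) lift the object
    px, py, pz = place_point
    robot.move_to(px, py, pz + approach_height)                    # (f) move above the placement point
    robot.move_to(px, py, pz)                                      # (g) descend to the placement point
    robot.open_gripper()                                           # (h) release the object
```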

Figure 5. Flow diagram of the robot’s autonomous gripping action.

After a total of 100 actual gripping tests for each object tested, the overall gripping success rate was 84%. In the case of the metal bottle and the data cable, the results were not satisfactory, with a success rate of 60% for both items.

Figure 6 shows the optimised target recognition results. The low success rate on the metal bottle is mainly because the smooth bottle body degrades the depth-map imaging, which ultimately lowers the success rate of effective grasping position detection. In the data cable grasping experiment, the low grasping success rate is mainly limited by the hardware configuration of the two-finger gripper: the pneumatic gripper used in the experiment cannot close fully, leaving a 16 mm gap between the two fingers after closing, which is too wide to hold the cable’s small cross-section securely, so the gripper fails to make firm contact when closed. This is the main reason for the 7% difference between the overall gripping success rate and the effective grasping posture detection success rate; it could be reduced by replacing the gripper with a better-performing one.

Figure 6. Optimised target recognition.

The autonomous robot grasping experiments in this section demonstrate that the proposed grasping pose detection system is effective in the robot grasping planning task by mapping the pose results to the robot base coordinates for the target object.

7. Conclusions

Using an industrial camera as the image acquisition sensor, the multifunctional robot transmits the collected image information to the backend PC in real time over the network. The PC uses image processing and deep learning algorithms to recognize and analyze the images, sends the coordinates of the fruit to be picked to the multifunctional robot, and guides it through the picking operation. This realizes the robot’s grasping and positioning of the target fruit and is of great significance for improving the recognition accuracy of the multifunctional robot and automating fruit picking.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding this work.

Data Availability

The datasets used in this paper are available from the corresponding author upon request.
