Volume 2022, Issue 1 4820618

Research Article

Open Access

Surface Quality Automatic Inspection for Pharmaceutical Capsules Using Deep Learning

Hao Dong,

Hao Dong

orcid.org/0000-0002-5101-7430

School of Mechanical Engineering, Guizhou University, Guiyang 550025, China gzu.edu.cn

State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China gzu.edu.cn

Search for more papers by this author

Jing Yang,

Jing Yang

orcid.org/0000-0003-1915-9487

School of Mechanical Engineering, Guizhou University, Guiyang 550025, China gzu.edu.cn

State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China gzu.edu.cn

Key Laboratory of Advanced Manufacturing Technology of Ministry of Education, Guizhou University, Guiyang 550025, China gzu.edu.cn

Search for more papers by this author

Jun Wang,

Jun Wang

orcid.org/0000-0003-3209-1149

School of Mechanical Engineering, Guizhou University, Guiyang 550025, China gzu.edu.cn

State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China gzu.edu.cn

Search for more papers by this author

Shaobo Li,

Corresponding Author

Shaobo Li

[email protected]

orcid.org/0000-0003-4759-6000

School of Mechanical Engineering, Guizhou University, Guiyang 550025, China gzu.edu.cn

State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China gzu.edu.cn

Key Laboratory of Advanced Manufacturing Technology of Ministry of Education, Guizhou University, Guiyang 550025, China gzu.edu.cn

Search for more papers by this author

Hao Dong,

Hao Dong

orcid.org/0000-0002-5101-7430

School of Mechanical Engineering, Guizhou University, Guiyang 550025, China gzu.edu.cn

State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China gzu.edu.cn

Search for more papers by this author

Jing Yang,

Jing Yang

orcid.org/0000-0003-1915-9487

School of Mechanical Engineering, Guizhou University, Guiyang 550025, China gzu.edu.cn

State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China gzu.edu.cn

Key Laboratory of Advanced Manufacturing Technology of Ministry of Education, Guizhou University, Guiyang 550025, China gzu.edu.cn

Search for more papers by this author

Jun Wang,

Jun Wang

orcid.org/0000-0003-3209-1149

School of Mechanical Engineering, Guizhou University, Guiyang 550025, China gzu.edu.cn

State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China gzu.edu.cn

Search for more papers by this author

Shaobo Li,

Corresponding Author

Shaobo Li

[email protected]

orcid.org/0000-0003-4759-6000

School of Mechanical Engineering, Guizhou University, Guiyang 550025, China gzu.edu.cn

State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China gzu.edu.cn

Key Laboratory of Advanced Manufacturing Technology of Ministry of Education, Guizhou University, Guiyang 550025, China gzu.edu.cn

Search for more papers by this author

First published: 26 August 2022

https://doi.org/10.1155/2022/4820618

Citations: 3

Academic Editor: Binghua Cao

Share a link

Email
Wechat
Bluesky

Abstract

Capsules are commonly used as containers for most pharmaceutical products. Thus, the quality of a capsule is closely related to the therapeutic effect of the products and patient health. At present, surface quality testing is an essential task in the actual production of pharmaceutical capsules. In this study, a deep learning-based capsule defect detection model, called CapsuleDet, is proposed to classify and localize defects in image sensor data from capsule production for practical application. A guided filter-based image enhancement method and hybrid data augmentation method are used in improving the quality and quantity of the raw data, respectively, to mitigate the low contrast issue and enhance the robustness of the model training. Deformable convolution module and attentional fusion feature pyramid are also used to improve the detection effect of capsule defects by effectively utilizing the semantic and geometric information in the extracted feature maps and catering to the detection of defects with different shapes and scales. The evaluation results on the capsule defect dataset demonstrate that the proposed method achieves 92.91% mean average precision and 22.16 frames per second. Moreover, its overall performance in terms of training time, model size, detection accuracy, and speed is better than that of the currently popular detectors.

1. Introduction

For the pharmaceutical industry, drug quality is relevant to the effectiveness of the treatment for the corresponding symptoms and patient health [1]. In recent years, compared with traditional tablets, granules, and other types of drugs, pharmaceutical capsules have been widely used in the pharmaceutical market owing to their ability of masking odors and quick absorption. The corresponding scale of production and sales has been also expanding continuously.

However, various types of capsule defects (e.g., dents and holes) are inevitable during mass production. These defects affect the appearance quality of a product and even directly affect the control of the dosage in the subsequent filling process by a pharmaceutical manufacturer [2]. If the defective products are not eliminated in time and put on the market, the actual efficacy of the capsules cannot be guaranteed, resulting in potential safety hazards that can cause economic losses and negative impacts on the enterprise. Hence, surface quality inspection of pharmaceutical capsules is an integral part of the production process.

In general, the enterprise must not only improve production efficiency but also strictly ensure product quality for the entire industrial manufacturing domain. Thus, improving the quality of manufactured products without affecting production is the core competence of the manufacturing industry. As one of the important components in the manufacturing process, surface defect detection is an essential link to control industrial product quality [3]. Similar to most defect detection tasks, surface defect detection of pharmaceutical capsules has some pressing issues to be addressed because of the complexity of the following defects: (1) Shape anisotropy: uncertainty in the manufacturing process usually leads to large irregular variations in the shape of similar defects. As shown in Figure 1(a), the shape of the dented area is quite different (marked by the red curve), thereby placing high demands on the robustness of the method. (2) Scale difference: the extent of product damage is not always similar because of the disruption in online manufacturing processes and environmental factors. Figure 1(b) shows the large difference in scale between the same type of defects. Among them, the feature information of tiny defects after high abstraction is easily lost, making it difficult to be accurately located. (3) Low contrast: defect detection largely depends on the imaging effect of the features in the image, which is particularly important for learning-based methods via deep feature extraction. Figure 1(c) presents that detecting the weak scratch in the yellow box is more difficult than detecting the hole in the red box.

Details are in the caption following the image — Open in figure viewer PowerPoint

These challenges, which are common during surface quality inspection of industrial products, are gradually being addressed because of the effective progress achieved by numerous extensive and in-depth research works in the field of defect detection. Jin et al. [4] resolved the problem of detecting defects inside castings by combining traditional image processing with a classifier. Similarly, Wu and Lu [5] used support vector machines to complete the inspection of product packaging appearance quality. Other methods based on manually crafted features have also achieved good results [6–8]. However, these methods are weak when facing multiple types of defect features. They also fail to represent the detection results effectively (particularly for the minor defects). Deep learning-based methods have recently succeeded in computer vision tasks. Deep learning-based methods have been proved to be a more efficient solution than traditional image processing or machine learning methods [9, 10]. Thus, introducing deep learning methods into the field of surface defect detection has become increasingly favored. At present, deep learning approaches based on the convolutional neural network (CNN) are used to accomplish the quality control tasks for various industrial products through detection models with diverse network architectures. For shape anisotropy, the model must have strong deformation modeling capabilities to capture position shifts during visual perception [11]. However, the standard convolution in most common feature extraction networks does not have the expected adaptive ability. Inspired by [12], this study introduces deformable convolution in the feature extraction module to accommodate irregular variations in the shape of defects. Thus, adjusting the feature sampling area correspondingly to the shape information of the respective defect becomes possible, thereby reducing the background information contained in the defect area. For scale difference, the features at different levels of CNN involve different object scales [13], whereas the previous approaches [14, 15] focusing on feature learning at the end of the network lack small object information contained in the shallow features, further leading to unsatisfactory detection performance. A feature pyramid module is used in this study to address the issue by fusing multiple feature maps at different scales, thereby obtaining semantic information between different levels and enhancing the ability to recognize tiny defect features. For low contrast, the faint defect information in the image causes the model to have a problem from the initial feature extraction phase [16]. Moreover, the only remaining residual features gradually fade away with the deepening of the network. In this scenario, the model should have excellent effects at the primary phase of feature extraction and be biased toward the region of interest. Therefore, a guided filter is used in this study as an image enhancement method to highlight low-contrast defective features. The attention module is added to the feature fusion process to assure that the enhanced semantic information can be effectively transmitted.

Based on the above theoretical developments and research analysis, an automatic inspection method for surface defects in pharmaceutical capsules using deep learning is proposed in this study. The main contributions of this article are listed as follows:

(1)
In the data preparation phase, the guided filter-based image enhancement method is used to the low contrast of defects. A novel hybrid data augmentation method is proposed to improve the generalization ability of the training model. This efficient way of processing raw image sensor data from quality and quantity perspectives can provide a solution for relevant research when facing similar situations
(2)
A defect detection model named CapsuleDet based on deformable convolution and attentional fusion feature pyramid is proposed to adapt to the differences in shape and scale of surface defects in pharmaceutical capsules. It copes well with irregular variations in defect shapes and can explore rich feature information in multilevel feature fusion, thereby making a reference for the field of multiclass minor defect detection
(3)
Compared with other typical detectors, such as SSD, Faster R-CNN, YOLOv3, FCOS, and Cascade R-CNN, the proposed method in this article has certain advantages in terms of comprehensive performance, including training time, model size, detection accuracy, and inference speed. Therefore, it has an excellent chance to complete deployment in the actual production process

The rest of this article is organized as follows. Section 2 presents an overview of the related works. Section 3 provides the details of the proposed methods. Section 4 describes the experimental details and the corresponding discussions. Finally, Section 5 concludes this article.

2. Related Work

In the past, surface defects in pharmaceutical products were mainly identified by manual visual inspection, which was too costly and inefficient in meeting the demands of quality control [17]. Nowadays, automated inspection technology based on visual perception is widely used for quality control tasks in industrial manufacturing [18], commodity packaging [19], and fruit grading [20] by replicating and surpassing the effects of the human eye. Therefore, surface quality automatic inspection, which can be mainly divided into two categories, is undoubtedly the ideal solution to ensure high throughput and high reliability of products in the pharmaceutical industry.

2.1. Traditional Detection Approaches

The related enterprises are eager to transition and upgrade from manual to automated detection to safeguard the quality of pharmaceutical products while reducing labor costs [21]. Many research works for the early exploration of automated detection have used the most common traditional detection approaches in machine vision to monitor pharmaceutical production quality, including two categories of methods: image processing and traditional machine learning.

2.1.1. Image Processing-Based Approaches

Mozina et al. [22] established a statistical model for pharmaceutical tablets using image processing algorithms, which were used to evaluate the quality of imprints by comparing the tablets to be inspected with the qualified tablets. Similarly, Kekre et al. [23] used a series of image transformation methods to count image information, such as greyscale, color, and texture, to determine the differences among defect features on the surface of pharmaceutical capsules and to classify various defect types. Gabriele et al. [24] accomplished the detection of defects, such as cracks on the surface of pharmaceutical glass tubes, using image filters and threshold segmentation algorithms.

2.1.2. Traditional Machine Learning-Based Approaches

Wang et al. [25] used the support vector machine (SVM) to achieve the binary classification of pharmaceutical capsule head defects based on image binarization. For multiple types of defects, Qi and Jiang [26] used the Canny edge-detection algorithm to extract features of defect areas in pharmaceutical capsules and then completed multiclass classification by the hierarchical SVM. Bahaghighat et al. [27] proposed a method for counting the number of cards within drug packages based on machine learning, which can be used to identify the accuracy of drug distribution on the production line.

These approaches achieved good performance in feature description and defect detection by relying on statistical modeling or cluster analysis. However, most of them are applied to homogenous textures or salient features, and they heavily depend on expert knowledge. When faced with complex textures and multiclass defects, these methods are also complex and do not have good robustness.

2.2. Deep Learning-Based Detection Approaches

Unlike traditional detection approaches, deep learning methods supported by massive data information and powerful computing power can learn autonomously and determine representation effectively [28]. Moreover, they are flexible and stable in processing complex semantic information. Thus, they have the potential to replace traditional methods in the field of quality inspection and broad development prospects. At present, deep learning-based detection approaches have two main types: image classification and object detection.

2.2.1. Image Classification-Based Approaches

At present, deep learning methods in image classification mainly use CNN-based classification networks, which include two parts: feature extractor (composed of the cascaded convolutional and pooling layers) and classifier. The images for inspection are processed by the well-trained classification model, which can output the class of images and its class confidence. Accordingly, Zhou et al. [29] proposed a CNN-based defect detection model for pharmaceutical capsules. This model can classify various capsule defects, such as dent, hole, and stain. However, it cannot locate the defects precisely. Other researchers have also used this method for surface quality inspection tasks on manufactured products, such as steel [30] and fabric [31], and have achieved good results. Although image classification approaches can achieve great classification accuracy and high stability, they are only applicable in detecting single defect types. Representing accurately the specific semantic and location information of each defect in the image is difficult when a single image contains several defects of the same or different types.

2.2.2. Object Detection-Based Approaches

Compared with image classification-based approaches, object detection-based approaches that use CNN-based detectors for quality inspection tasks have the ability of manual visual inspection in the traditional sense of defect detection (i.e., to achieve defect localization and classification). The current deep learning-based object detection models can be roughly divided into two categories: two-stage methods (e.g., Faster R-CNN [32] and Cascade R-CNN [33]) and one-stage methods (e.g., SSD [34], YOLOv3 [35], and FCOS [36]). The common idea of both is to first abstract the important semantic features of the image by a specially designed feature extractor, so as to obtain a deep representation effect, and then complete the category division and border localization of the object by the classifier and regressor, respectively. The difference between the two methods is determined by whether the generation phase of the candidate object boxes is contained. The former focuses on detection accuracy and uses this phase to perform an initial screening of objects, which provides high-quality localization options for subsequent classification and localization, and also increases the detection time to a certain extent. Conversely, the latter further relies on well-designed feature fusion operations to directly provide accurate semantic information for subsequent processing, which enables it to guarantee detection accuracy while having certain advantages in detection speed. Therefore, both methods have relatively wide application in the existing research of product surface defect detection when facing diverse practical needs. In the early exploration of drug defect localization, a method based on R-CNN for pharmaceutical capsule defect identification and localization was proposed in [37]. However, this method only achieves the defect detection of two types: scratch and stain. Zhang et al. [38] proposed a detection method for foreign particles in liquid pharmaceutical products. This method first locates the suspect objects with Faster R-CNN and then classifies the foreign particles and noise with the random forest classifier. However, it does not achieve a classification of specific classes for foreign particles because of the various uncertainties in the appearance features.

In general, these existing research works must be improved based on the following aspects: (1) Most methods cannot locate defects in pharmaceutical capsules. (2) The few methods that can locate defects achieve only a few types of detection. (3) These methods are also only used for large or salient defect features. Therefore, a deep learning-based surface quality inspection method for pharmaceutical capsules is proposed in this study. In contrast to the previous methods, the proposed method in this study can accomplish the accurate classification and localization of multiclass defects while effectively addressing the existing challenges.

3. Methods

3.1. System Overview

Nonplanar defect inspection of pharmaceutical capsules can generally be regarded as an object detection task for multiview images. We focus on ensuring the detection performance of the proposed method in this study. Therefore, the detection scheme of the method includes three parts: (i) image collection, (ii) data processing, and (iii) defect detection, as shown in Figure 2.

(1)
First, image collection is the primary link in the task of surface defect detection. However, direct irradiation by light sources caused by the easily reflective property of the capsule surface leads to local highlights in the image, which interfere with the feature extraction and learning effect of the model. Therefore, we propose an image collection scheme, as shown in Figure 2. The collection process is illuminated by the lateral LED light source. A cubic photomask composed of light diffuse plates is designed and processed to avoid image highlights. This cubic photomask feeds the capsule with transmitted light. This process can prevent light spots on the raw image captured by the image sensor due to the reflection of bright light on the capsule surface
(2)
Then, the link of data processing after the completion of image collection is set in this study to address the shortcomings of the original data in terms of quantity and quality. Therefore, this link can provide a data guarantee for the subsequent network training phase, thereby enabling the model to have good convergence and detection accuracy
(3)
Finally, a surface defect detection model based on deep learning is designed for the characteristics of the pharmaceutical capsule dataset to classify and locate multiclass defects in capsules accurately. The specific method is described in detail below

3.2. Image Enhancement

As mentioned earlier, the greyscale information in some of the defect areas is interfered with by the scattering of the imaging light source, resulting in a low contrast between the areas (i.e., some texture of the capsule defect is not clearly visible in the raw image). This scenario affects the accuracy of subsequent manual annotation, thereby weakening the feature extraction and learning effect of the detection model. The image must be enhanced to increase the visibility of defect contours. Instead of the ordinary isotropic filter, a guided filter [39], as an image processing algorithm with preserved edge information, can process noise while avoiding important textures and details being filtered out of the image. As shown in Figure 3(a), the algorithm requires a weighting operation based on the guide image I for the raw image p to obtain the filtered image q. The output q_i of the filter can be defined as

(1)

where i and j are pixel indexes and the filter kernel ∑_jW_ij is the weight to be determined by the guide image I. In particular, we choose the raw image p to be the corresponding guide image I for achieving the effect of preserving the edge detail of the defective region.

After the smoothing process of the guided filter, we apply the subtraction operation between the raw image p and the filtered image q to obtain the specific details and general contours of the defect area. Then, we perform the enlargement operation on the defect area and finally add it to the filtered image q to make the defect contours and textures clear. The obtained enhanced image I_e is as follows:

(2)

where δ denotes the adjustment factor, which is set to 4.5 in this study. After these two steps, a visual comparison between Figures 3(b) and 3(c) shows that the fogging effect of the defects in the raw image is reduced, and the contour details and texture information are enhanced, both from the perspective of human-eye inspection (represented by visual presentation) and machine recognition (represented by pixel gradient distribution). These findings suggest that the guided filter algorithm used in this study can considerably improve the contrast between the defects and the background, thereby promoting the subsequent manual annotation efficiently to ensure the reliability of the training dataset and enhance the identifiability of the defects.

3.3. Data Augmentation

Detection methods based on deep learning usually require a large amount of data to overcome the overfitting phenomenon of the model during the training process. However, collecting numerous high-quality defect images during the actual production process is difficult because of the workshop environment, production equipment, and other factors. Therefore, motivated by [40], we propose a hybrid data augmentation method for object detection in this study. It consists of two parts: transformation and mix-up. First, the linear transformation (rotation or flip) and spatial transformation (noise addition or brightness variation) are used to change the position information and semantic information of the objects in the raw image space, respectively. Then, the mix-up operation randomly blends the two transformed images to generate a new sample image, thereby enhancing the variety and validity of sample data (as shown in Figure 4). In particular, the mix-up operation can be formulated as

(3)

where x_i and x_j denote two randomly selected images to be processed,

denotes the image after the mix-up operation, and μ denotes the scale factor for blending, which is set to 0.5 in this study.

With 7380 sample images, the amount of available data is increased two times through data augmentation operation. Compared with the commonly used single transformation method, the hybrid approach not only increases the training samples but also expands the type and number of objects contained in a single image, which can further enhance the generalization performance of the model while avoiding poor model fitting.

3.4. Defect Detection

We choose the one-stage detector technology route to combine the detection advantages of speed and accuracy in designing our CapsuleDet, which can be adapted to the actual rhythm of capsule production and control the application cost of detection technology. In the following, the construction of the network structure is introduced in detail, and the two main functional modules, namely, the deformable convolution module and the attentional fusion module (AFM), are described separately.

3.4.1. Network Architecture

A one-stage object detector is commonly made up of a backbone network, a detection neck that is typically a feature pyramid network (FPN), and a detection head for object classification and localization. The CapsuleDet proposed in this study also consists of these three parts, and its overall architecture is shown in Figure 5. The corresponding detection components are introduced as follows.

(1) Backbone. We choose ResNet50 [41], which is widely used for computer vision tasks, as the feature extractor for our CapsuleDet. It consists of five feature layers, namely, {C1, C2, C3, C4, C5}. Figure 5 shows that we only use C3, C4, and C5 for subsequent feature fusion to trade off the memory usage against efficiency. We introduce the deformable convolutional network (DCN) to enhance the effectiveness of the extracted features for various defect shapes by replacing the 3 × 3 convolution operation in the three output layers on the basis of cost control.

(2) Detection Neck. The main function of the detection neck is to process the features extracted from the backbone network at all levels to fuse and utilize the semantic information and geometric details contained in the features and to lay a solid foundation for subsequent classification and regression of the detection head. We opt for the pyramid-style feature fusion module to accomplish this task. Figure 6 illustrates the comparison of the attentional fusion FPN (AF-FPN) proposed in the study with different classical FPNs. Based on [42], the designed AF-FPN uses shared convolution operations and interpolation operations to exchange the same feature map size between the upper and lower feature layers P_i+1, P_i−1 and feature layer P_i, which can ensure the consistency of the corresponding semantic features while avoiding the excessive computation caused by independent convolution in traditional FPN. We use the attention-based feature fusion module instead of the traditional element-wise addition operation to enhance the effectiveness of the last crucial step in FPN (i.e., feature fusion). The details of which are described subsequently.

(3) Detection Head. CapsuleDet has a very simple detection head. As shown in Figure 5, it consists of two branches: classification and regression. Each of these contains only one layer of 3 × 3 convolution for the normalization of the features from the FPN, unlike the typical one-stage detector with four convolutional layers in the detection head. This setting is chosen because we determine in our experiments that the number of convolutional layers has little effect on the detection accuracy of the capsule dataset and that additional convolutional layers increase memory usage and detection time. Each branch uses one layer of 3 × 3 convolution for classification or regression, where C denotes the number of categories for classification and 4 represents the predicted value for bounding box localization. For the loss function, we adopt the scheme proposed by generalized focal loss (GFL) [46], which is described as follows.

The loss calculation for the classification branch is completed by the quality focal loss, which is formulated as

(4)

where y is the quality label of the classes from 0 to 1, σ is the predicted results, and β is the adjustment factor, which is set to 2 in this article.

The regression branch loss calculation is accomplished via GIOU loss [47], which is defined as

(5)

where I represents the area of the intersection between the prediction box A_p and the annotation box A_g, U denotes the area of the concatenation between the two boxes, and A_c refers to the area of the smallest enclosing box between A_p and A_g.

Distribution focus loss is used to enable the network to focus quickly and accurately on the adjacent of the labeled location during the regression phase by optimizing the distribution of the prediction boxes before regression, which is calculated as

(6)

where i is the index value of distribution, b is the location information of labeled boxes, S_i, S_i+1 denote the detected boxes adjacent to b, and b_i, b_i+1 are the location information of the corresponding distribution box.

Finally, the total loss function can be expressed as

(7)

where N_pos is the number of positive samples, z denotes the localization objective of the pyramidal feature map at each level,

is the indicator function, being 1 if c_z > 0 and 0 if otherwise, and φ₀, φ₁ denote the loss weights, which are set to 2 and 0.25, respectively, in this study.

3.4.2. Deformable Convolution Module

Restricted by the default sampling method (matrix sampling) of traditional CNN, the pure standard convolution lacks the adaptive ability to the geometry of the input object when processing the image, making its operation process suffer from certain transformation defects. As such, it falls into a bottleneck of difficulty in improving accuracy when identifying irregular defects. For the network to have certain invariance in the process of affine transformation, it is usually necessary to change the original sampling method as a way to enhance the deformation modeling capability of ordinary convolution. Therefore, by adding offset prediction to the sampling points of conventional convolution to obtain new sampling points [12], DCN can use the offset to learn how to adaptively adjust the sampling area after the shape features of the instance objects are processed by additional convolution layers. Inspired by this, we apply DCN to the feature extraction module in the backbone network.

As illustrated in Figure 7(a), the sampling offset is predicted in parallel to adjust the regular sampling area by incorporating an additional convolution operator. More specifically, in the implementation of the deformable convolution operator, an additional convolution layer is added after the input feature map to obtain an offset field of the same size as the input feature map. Note that if the 3 × 3 convolution is implemented, its channel dimension direction includes 3 × 3 two-dimensional offset weights (as a representation of the bias in the x and y directions). And each position in the offset field represents the offset of the original convolution kernel from the corresponding position on the input feature map. When the final deformable convolution calculation is performed, the offset at the corresponding position will be added to the sampling position of the original convolution kernel to obtain the sampling position after the offset. Then, the output feature map can be calculated by the same feature sampling procedure as the standard convolution. Compared with the standard convolution, the deformable convolution is therefore allowed to learn the offsets of diverse defect shapes effectively and focus intensely on the semantic regions of interest to improve the network’s own ability in modeling deformation. Furthermore, Figure 7(b) simulates the difference in sampling between the two when dealing with irregular defective regions in the capsule. The use of deformable convolution instead of a standard one can effectively improve the effect of feature extraction. However, the extra convolution operator inevitably increases the computational effort, thereby prompting us to consider the balance between accuracy and speed when introducing DCN.

3.4.3. Attentional Fusion Module

The main idea of FPN is to form multiscale feature blocks through feature fusion [48] (i.e., the aggregation of features from various levels or branches), and its ultimate aim is to improve the generalization of the network to different scale objects. However, most classical FPNs usually implement feature fusion at different levels only by simple operations (e.g., summation or concatenation), which may result in the loss of the combined focus for contextual semantic information during the fusion of features. We propose AFM to combine effectively the adjacent features that are inconsistent in semantics and scales.

As shown in Figure 8, the input to the AFM consists of the low-level feature X and the high-level feature Y (both of which have been previously processed by the interpolation operation to the same feature map sizes H × W). In the early stages of fusion, X and Y first perform a summation operation to obtain the primary feature fusion map z. Then, the feature map z enters the local attention channel and the global attention channel on the left and right sides (with the channel reduction factor r set to 4). The outputs of the two channels are summed again, and the channel dimension is raised through 1 × 1 convolution to obtain the contextual information of the refined features. Finally, the fused feature maps from the attention channel are transformed by the Sigmoid function into attention weights ∆(αX + βY) and combined with the original X and Y. The final fusion feature map Z generated by the AFM is defined as

(8)

where α and β are the ratio factors for low-level feature X and high-level feature Y, respectively, which are both set to 0.5 in this study. All convolution operations in AFM are followed by batch normalization.

4. Experiments

4.1. Dataset

The experimental results of the CapsuleDet presented in this study were evaluated on the capsule defect dataset that we constructed. In this dataset, 7380 capsule images are included, along with five common types of capsule surface defects (i.e., dent, scratch, hole, stain, and gap). Each image has a resolution of 512 × 512. The detection task of the dataset is to classify and locate the corresponding defects as accurately as possible while determining whether the capsule is good or bad in the image. A common dataset labeling tool (LabelImg) was used to label the capsule dataset to annotate data. During this time, the specific capsules and the corresponding defects were labeled by professionals in the form of rectangular boxes, which contain the respective coordinates and category information.

In addition, the annotation information was preserved as XML files in PASCAL VOC format. Each image corresponds to an XML file. Table 1 shows the distribution of various types of data in our capsule dataset. In particular, we randomly divided the entire dataset into training and testing data in a ratio of 8 : 2 and avoided the uneven distribution of various defects, ensuring the fairness of subsequent experimental comparisons.

1. Distribution of the capsule dataset.

Dataset	Number of images	Number of capsules		Number of defects
Dataset	Number of images	OK	NG	Dent	Scratch	Hole	Stain	Gap
Training	5904	5994	8006	1431	1636	1679	1573	1687
Testing	1476	2221	2754	445	581	587	557	584
Total	7380	8215	10760	1876	2217	2266	2130	2271

4.2. Evaluation Metrics

General research comparisons only evaluate model performance by various types of accuracy metrics. Considering the actual landing demands for engineering applications, we chose the following four aspects in performing the multidimensional evaluation.

4.2.1. Training Time

For deep learning-based approaches, the length of training time determines the speed of model updating. In particular, the shorter the time spent on training, the more efficient is the learning of the new classes for the model. Therefore, the training time of the detection model was recorded to evaluate its adaptability when facing complex and changeable situations in industrial manufacturing.

4.2.2. Model Size

In practical industrial applications, the matching costs for the corresponding detection technology should be as low as possible to cater to the hardware facility conditions. As the direct object of the transfer from the experimental phase to the industrial environment, model size affects the compatibility of the application client. Thus, we used the memory size occupied by the weight parameters of the model after training as an important basis for evaluating this metric.

4.2.3. Detection Accuracy

As one of the indispensable evaluation metrics in reflecting the superiority of detection results, detection accuracy can demonstrate the direct effectiveness of the model for capsule datasets. Therefore, mean average precision (mAP) was used to measure the overall detection accuracy of the model for the capsule dataset.

4.2.4. Inference Speed

Detection speed is another essential factor in evaluating model performance that cannot be ignored to fit the actual production rhythm. Hence, the detection speed (i.e., the number of detectable capsule images per second) of the model during inference was evaluated using frames per second (FPS).

4.3. Implementation Details

4.3.1. Computation Platform

The hardware configuration for the experiments is an Intel Xeon Silver 4210R @ 2.40 GHz processor and an NVIDIA GeForce RTX 2080Ti (with 11 G memory). The software environment includes a 64-bit Windows 10 operating system, the open-source deep learning framework PyTorch, and the integrated development environment PyCharm.

4.3.2. Parameter Setting

The initial parameters of the models for all methods at the beginning of training were derived from the corresponding backbone network weights that have been pretrained on the ImageNet dataset to shorten the respective training cycles. The specific training details and hyperparameter settings are shown in Table 2. For the model training scheme, we set the epoch size to 40 and the initial learning rate to 0.001. The learning rate was also adjusted via piecewise decay to ensure that each model could achieve convergence. Given the GPU memory limitations, the batch size was set to 1 for the two-stage method to drive the model training properly and 8 for the rest of the methods. The optimizer during training is Momentum, whereas the parameter regularization method is L2. In addition, the warmup strategy was used for model training to ensure the stability of the later learning.

2. Hyperparameters during the model training.

Parameter	Value
Epoch size	50
Batch size	8
Learning rate	0.001
Decay_factor	[0.5, 0.1, 0.05, 0.01]
Decay_milestones	[5, 15, 25, 40]
Warmup_factor	0.1
Warmup_steps	1000
Momentum	0.9
L2_factor	0.0001
Score_threshold	0.025
NMS_threshold	0.6

4.4. Main Results

4.4.1. Comparison with Baseline

We first compared the proposed CapsuleDet with the baseline model GFL [46] to examine the model performance of the former. Table 3 shows the overall performance comparison between the two. The overall performance of CapsuleDet is better than that of the baseline owing to the lightweight design of the network structure and the addition of functional modules. CapsuleDet discards the two scale feature maps (P6 and P7) derived from the C5 feature layer, and the number of convolution layers used by the detection head is reduced, thereby making the model light. Therefore, compared with those of the baseline, the training time and the FPS during inference of our CapsuleDet are reduced by nearly 0.5 h and improved by approximately 3.6, respectively. CapsuleDet introduces DCN and AC-FPN with shared convolution to ensure model accuracy, effectively improving the mAP of the model by approximately 2.2% while avoiding an excessive increase in model parameters.

3. Overall performance comparison of our CapsuleDet with baseline.

Methods	Backbone	Training time (h)	Parameters (M)	mAP (%)	FPS (Hz)
Baseline [46]	ResNet50	3.76	186.16	90.67	17.58
Our CapsuleDet	ResNet50+DCN	3.28	182.64	92.91	21.16

As shown in Figure 9, the classification confidence of CapsuleDet is higher than that of the baseline when the two localization effects are similar. The localization accuracy of CapsuleDet is higher than that of the baseline when the difference in classification results between the two is small. As shown in Figure 9(b), the baseline model incorrectly determines the corresponding defective capsule as a defect rather than a bad capsule while detecting a defect. Figure 9(c) presents its missed detection when facing minor defects. By contrast, our CapsuleDet solves these problems well and illustrates the effectiveness of the model in detecting objects with different scales.

4.4.2. Comparison with Currently Popular Object Detectors

Table 4 shows the overall effectiveness of our CapsuleDet compared with modern mainstream methods, including two-stage versus one-stage methods. The performances of our model in all evaluation aspects, which are described below, are better than those of modern mainstream methods. It is worth noting that the backbone of each popular method follows the optimal framework proposed in their original studies, and the training settings of the comparison process are all consistent to ensure the fairness of the experiments.

4. Comparison of our CapsuleDet with other fashionable detectors.

Methods	Backbone	Training time (h)	Parameters (M)	mAP (%)	FPS (Hz)
TTFNet [49]	DarkNet53	4.12	263.35	79.96	26.57
SSD [34]	VGG16	4.82	141.37	84.05	24.76
Faster RCNN [32]	ResNet50	13.48	189.76	88.07	5.45
FCOS [36]	ResNet50	5.97	186.22	89.78	14.27
Cascade RCNN [33]	ResNet50	9.26	398.87	90.85	12.33
YOLOv3 [35]	DarkNet53	4.46	353.94	91.10	30.15
Our CapsuleDet	ResNet50+DCN	3.28	182.64	92.91	21.16

(1) Training Time. The two-stage method takes much more time to train than the one-stage method does because of the separation of the candidate box generation and selection stages from the subsequent classification and regression phases. Compared to the one-stage method, our CapsuleDet has 0.84 h less training time than the suboptimal method TTFNet [49] after several lightweight operations. Moreover, compared with the accuracy of the suboptimal method YOLOv3, CapsuleDet has an advantage of 1.18 h. Thus, CapsuleDet is highly efficient in learning and can quickly adapt to the identification and localization of new classes of defects for complex and changing manufacturing environments.

(2) Model Size. CapsuleDet has 41 M more weight parameters than the minimal model SSD has. However, compared with the memory usage size of other models, particularly the accuracy suboptimal method YOLOv3, the memory usage size of our CapsuleDet is only nearly half. Therefore, our model can be configured at a low cost for the same hardware facilities and easily arranged for the industrial application side.

(3) Detection Accuracy. As seen in Table 4, CapsuleDet has the best detection accuracy in this important detection metric. In particular, our CapsuleDet outperforms the training time suboptimal method TTFNet by approximately 13% mAP and by roughly 8.9 points compared to the model SSD with the minimum number of parameters. Importantly, CapsuleDet also has an advantage of 1.8 points over the accuracy of the suboptimal method YOLOv3.

(4) Inference Speed. The introduction of DCN inevitably reduces the inference speed of the model. However, CapsuleDet still has a speed advantage over the two-stage methods of the RCNN family, despite the difference of nearly 9 points compared to the suboptimal accuracy method YOLOv3. However, the difference in the inference speed between these models is minor compared to other one-stage methods. Moreover, our CapsuleDet has an inspection speed of 21.16 FPS to meet the actual capsule production rhythm.

4.4.3. Comparison with the Classical FPN

We compared the AF-FPN proposed in the study with the current classical FPN to examine the effectiveness of the former. The specific structure of each type of FPN is shown in Figure 6. We only switched the feature fusion module in the detection neck part of CapsuleDet accordingly, and the rest of the network structure and experimental parameter settings were kept the same to ensure the fairness of this comparison experiment. The results of the comparative experiments are shown in Table 5. The specific conclusions can be summarized as follows:

(i)
In terms of training time and model parameters, compared with the independent convolution in other feature fusion modules, the shared convolution used in our AF-FPN can effectively reduce the training time and compress the model size to a certain extent
(ii)
In terms of detection accuracy and speed, the single top-down fusion path (i.e., FPN) lacks effective use of low-level features. However, the addition of other bottom-up augmented paths (i.e., PAN and Bi-FPN) can effectively aggregate semantic information from the upper and lower feature layers to improve the detection accuracy, but the increase is less than 1 point, and it also slows down the inference speed in a certain extent
(iii)
In general, similar to the bidirectional fusion path, AF-FPN chooses the interactive fusion of the upper and lower feature layers to maximize the use of the feature information at all levels. Meanwhile, we chose to weigh the fusion operation with an attention module to avoid the semantic information of different feature layers from being confused in the fusion phase by ensuring that the fusion process was performed correctly. Although the AFM reduces the inference speed, this effect is relatively minor. Compared with PAN and Bi-FPN, the detection accuracy of our proposed optimized version of the feature pyramid module (AF-FPN) is improved by 1.1 points over the baseline module FPN

5. Comparison of AF-FPN with other classical feature fusion modules.

Feature fusion	Training time (h)	Parameters (M)	mAP (%)	FPS (Hz)
FPN [43]	3.37	185.21	91.78	22.97
PAN [44]	3.55	186.47	92.16	21.42
Bi-FPN [45]	3.48	188.63	92.37	22.53
Our AF-FPN	3.28	182.64	92.91	21.16

4.5. Ablation Experiments

4.5.1. Ablation Studies of Data Augmentation

To analyze the impact of data augmentation, we first compared the single transformation method with the hybrid data augmentation proposed in the article. Before the experiment, the training data used in both schemes were augmented to the same amount by the respective method. And the testing data contain both types of augmented data as a way to promote a fair comparison.

As shown in Table 6, compared with the single transformation method, the hybrid data augmentation method has data diversity that enables the model to be trained effectively under defective samples with rich semantic information while not affecting the training efficiency, memory usage size, and inference speed of the model. Specifically, it can deliver an accuracy advantage of approximately 1.2%. And Figure 10 further demonstrates the detection effect of both data augmentation methods on the same test image. As can be seen, compared to the single transformation method, the hybrid enhancement method enhances the localization effect of minor defects and the detection rate of low contrast objects by the data diversity. Meanwhile, in the case of overlapping capsules, the single transformation method is limited by its monotonicity, which leads to false detection or missed detection of overlapping capsules. In contrast, the hybrid augmentation proposed in the article can solve this problem well. The multiple expansions of quantity and object prove the effectiveness of the hybrid data augmentation method in prompting model learning and fitting.

6. Comparison of single transformation augmentation and hybrid data augmentation.

Augmentation method	Training time (h)	Parameters (M)	mAP (%)	FPS (Hz)
Single	3.28	182.64	91.67	21.16
Hybrid	3.28	182.64	92.91	21.16

4.5.2. Ablation Studies of Model Components

A series of ablation studies were performed to analyze the importance and the corresponding contribution of the various components in CapsuleDet. As shown in Table 7, the experiments were conducted by incrementally adding the hybrid data augmentation, deformable convolution, feature pyramids with normal fusion, and feature pyramids with attentional fusion on CapsuleDet, thereby observing the effect of each component.

7. Contribution of four components in our CapsuleDet. DCN, NF-FPN, and AF-FPN represent deformable convolution, feature pyramid with normal fusion, and feature pyramid with attentional fusion, respectively.

DCN	NF-FPN	AF-FPN	Training time (h)	Parameters (M)	mAP (%)	FPS (Hz)
			2.58	170.73	89.75	23.29
✓			3.23	174.94	91.56	22.63
✓	✓		3.25	181.78	92.49	22.24
✓		✓	3.28	182.64	92.91	21.16

Several valuable conclusions can be drawn from the results in Table 7: (i) The efficacy brought by the addition of each component is not conflicting. The progressive improvement in mAP shows that the power of each module can effectively reduce the difficulty of detecting capsule defects. (ii) Although the lightweight operation degrades the performance of the baseline model, the introduction of deformable convolution greatly enhances the performance of the model in detecting multiscale defects, by approximately 1.8 points of accuracy improvement compared to the original model. It makes the model worth the cost in terms of the increased number of parameters and computation. And compared to not using attentional fusion, feature pyramids with attention can further improve the detection accuracy by enhancing feature association. (iii) In automatic defect detection, the real-time performance of the detector is also considered an important metric. In the last column of Table 7, the inference speeds of the model for capsule defect detection under different configurations are compared. Although adding the designed module to the model prolongs the inference time, CapsuleDet has no considerable reduction in FPS compared with the incomplete model. Moreover, CapsuleDet can still meet the time requirements of the existing tasks. Furthermore, compared with those of the currently popular detectors and baseline models, the increase in training time and model memory usage of CapsuleDet remains within acceptable limits.

4.5.3. Ablation Studies of DCN

As mentioned earlier, DCN inevitably affects the performance of the model in other aspects while increasing accuracy. We conducted experiments on permutations between the three feature layers {C3, C4, C5} to explore a good choice for introducing DCN in the backbone network ResNet50 by observing the balance point of CapsuleDet performance in each aspect. Among them, the original remaining training parameters and model configuration remain unchanged. However, the setting of the DCN introduction is different. Figure 11 illustrates the performance effect of introducing DCN at different stages on CapsuleDet.

Figure 11(a) shows that the training time and memory usage of the model gradually increases with the introduction of DCN in other feature layers. The training time of the model when the DCN is introduced into the feature layer containing C5 is shorter than that in the opposite case. In addition, each increase in the number of DCN introduced results in an increase in memory usage of less than 2 M, which is negligible compared with the model size of the majority of the currently popular detectors. Figure 11(b) shows that the additional DCNs result in not only increased accuracy but also a corresponding slowdown in inference speed. Although the trends of increase and decrease are similar, the loss of less than 0.6 FPS (i.e., approximately 0.6 more capsule images can be detected per second) appears minor compared with the implication of the accuracy gain. Therefore, the introduction of DCN in the feature layers {C4, C5} is the good choice for our CapsuleDet training configuration to obtain the highest detection accuracy possible while considering the training time factor. The previous experiments were also conducted under this choice.

4.6. Further Analysis and Discussion

From the above experimental results, it can be concluded that the proposed method has a more balanced and comprehensive performance under multiple metrics assessment. The performance leadership of the method cannot be separated from the guidance of diversity data, the adaptation of different shape, and the fusion of multiscale features. Among them, the hybrid data ensures the diversity of the learning process and lays the foundation for the model to handle defect detection of multiple objects. The introduction of deformable convolution effectively enhances the self-adaptive ability of the model to cope with different defect shapes and improves its detection effect of multiple types of defects. And the multiscale fusion mechanism enables the network to adapt to different defect scales by enriching the semantic information of each stage. These are the strengths of the proposed method in this article over other methods.

However, our method continues to have failure cases when facing some difficult samples. As can be seen in Figure 12(a), our method suffers from missing detection in detecting some local defects. This situation may occur in the detection process of low-contrast defects under weak illumination or noise interference. Although the model is able to determine its entirety as an unqualified capsule, it is difficult to filter and locate the specific defect location from the network. This is mainly because with the in-depth extraction of features by the network, the location information of the defects under light or noise interference is gradually lost, which affects the detection effect of the model. In addition, the model may also have a false detection situation as shown in Figure 12(b). When the defective capsules overlap with the qualified capsules, if the junction area is close to the defect, it will cause the network to regard the originally qualified capsules as having defects, which will cause the interference of discrimination. Whether missing detection or false detection, these confusing situations were not expected earlier. For this, the above problems are mainly attributed to the oversimplicity of the detection head, which makes the network too sensitive to complex interference variations. And the robustness of the model can be further extended by enhancing the traceability and differentiation of the location and semantic information output with the detection head. This will be the future work that we need to investigate.

5. Conclusions

Given the practical landing needs of engineering applications, a deep learning-based method for detecting surface defects in pharmaceutical capsules was proposed in this study. Through efficient processing of raw data (including quality enhancement and quantity augmentation) and well-designed detection model architecture (including lightweight operation and the addition of functional modules), our CapsuleDet achieves a more encouraging overall performance than the baseline models and existing popular detectors. In addition to its powerful recognition and localization capabilities, it has good training time and model size. Compared with the methods in the existing relevant research, which have few types of recognition and lack the ability to locate defects, our method can achieve the detection of multiclass capsule defects while discriminating between good and bad capsules. It can effectively improve the product appearance quality and market competitiveness of the relevant pharmaceutical enterprises.

Deep learning requires a huge amount of training data to produce continuous improvement in performance. Thus, it greatly increases the cost of collecting industrial data and the manual annotation cost during training. The use of few data for defect detection or automated sample annotation will be one of the future directions of our research. In addition, pixel-level segmentation of capsule defects via image semantic segmentation technique to characterize the defect shapes excellently is also in our plans for future work.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Key Research and Development Program of China (No. 2020YFB1713304), the National Natural Science Foundation of China under Grant 62166005, the Key Laboratory of Ministry of Education Project (No. QKHKY (2020) 245), and in part by the Guizhou University Research Projects under Grant [2019]22 and Grant [2020]16.

Open Research

Data Availability

Code and dataset have been made available at https://github.com/D-ai-Y/CapsuleDet.

References

1 Trippe Z., Brendani B., Meier C., and Lewis D., Identification of substandard medicines via disproportionality analysis of individual case safety reports, Drug Safety. (2017) 40, no. 4, 293–303, https://doi.org/10.1007/s40264-016-0499-5, 2-s2.0-85010780389, 28130773.
10.1007/s40264-016-0499-5
CAS PubMed Web of Science® Google Scholar
2 Hindelang F., Zurbach R., and Roggo Y., Micro computer tomography for medical device and pharmaceutical packaging analysis, Journal of Pharmaceutical and Biomedical Analysis.(2015) 108, 38–48, https://doi.org/10.1016/j.jpba.2015.01.045, 2-s2.0-84923102753, 25710902.
10.1016/j.jpba.2015.01.045
CAS PubMed Web of Science® Google Scholar
3 Yu L., Wang Z., and Duan Z., Detecting gear surface defects using background-weakening method and convolutional neural network, Journal of Sensors. (2019) 2019, 13, 3140980, https://doi.org/10.1155/2019/3140980.
10.1155/2019/3140980
Web of Science® Google Scholar
4 Jin C., Kong X., Chang J., Chen H., and Liu X., Internal crack detection of castings: a study based on relief algorithm and Adaboost-SVM, The International Journal of Advanced Manufacturing Technology. (2020) 108, no. 9-10, 3313–3322, https://doi.org/10.1007/s00170-020-05368-w.
10.1007/s00170-020-05368-w
Web of Science® Google Scholar
5 Wu Y. and Lu Y., An intelligent machine vision system for detecting surface defects on packing boxes based on support vector machine, Measurement & Control. (2019) 52, no. 7–8, 1102–1110, https://doi.org/10.1177/0020294019858175, 2-s2.0-85069930510.
10.1177/0020294019858175
Web of Science® Google Scholar
6 Jia L., Chen C., Liang J., and Hou Z., Fabric defect inspection based on lattice segmentation and Gabor filtering, Neurocomputing. (2017) 238, 84–102, https://doi.org/10.1016/j.neucom.2017.01.039, 2-s2.0-85011977346.
10.1016/j.neucom.2017.01.039
Web of Science® Google Scholar
7 Zhou Z., Wang C., Gao X., Zhu Z., Hu X., Zheng X., and Jiang L., Fabric defect detection and classifier via multi-scale dictionary learning and an adaptive differential evolution optimized regularization extreme learning machine, Fibres & Textiles in Eastern Europe. (2019) 27, no. 1(133), 67–77, https://doi.org/10.5604/01.3001.0012.7510, 2-s2.0-85060701001.
10.5604/01.3001.0012.7510
Web of Science® Google Scholar
8 Herrero F., Garcia D., and Usamentiaga R., Surface defect system for long product manufacturing using differential topographic images, Sensors. (2020) 20, no. 7, https://doi.org/10.3390/s20072142.
10.3390/s20072142
Web of Science® Google Scholar
9 Dong H., Song K., He Y., Xu J., Yan Y., and Meng Q., PGA-Net: pyramid feature fusion and global context attention network for automated surface defect detection, IEEE Transactions on Industrial Informatics. (2020) 16, no. 12, 7448–7458, https://doi.org/10.1109/TII.2019.2958826.
10.1109/TII.2019.2958826
Web of Science® Google Scholar
10 He Y., Song K., Meng Q., and Yan Y., An end-to-end steel surface defect detection approach via fusing multiple hierarchical features, IEEE Transactions on Instrumentation and Measurement. (2020) 69, no. 4, 1493–1504, https://doi.org/10.1109/TIM.2019.2915404.
10.1109/TIM.2019.2915404
Web of Science® Google Scholar
11 Liu Z., Tang R., Duan G., and Tan J., Truing Det: towards high-quality visual automatic defect inspection for mental surface, Optics and Lasers in Engineering. (2021) 138, article 106423.
Web of Science® Google Scholar
12 Zhu X., Hu H., Lin S., and Dai J., Deformable Conv Nets V2: more deformable, better results, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2019, Long Beach, CA, USA, 9300–9308.
Google Scholar
13 Guo C., Fan B., Zhang Q., Xiang S., and Pan C., Aug FPN: improving multi-scale feature learning for object detection, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2020, Seattle, WA, USA, 12592–12601.
Google Scholar
14 Ye J., Ito S., and Toyama N., Computerized ultrasonic imaging inspection: from shallow to deep learning, Sensors. (2018) 18, no. 11, https://doi.org/10.3390/s18113820, 2-s2.0-85056329124.
10.3390/s18113820
Web of Science® Google Scholar
15 Ouyang W., Xu B., Hou J., and Yuan X., Fabric defect detection using activation layer embedded convolutional neural network, IEEE Access. (2019) 7, 70130–70140, https://doi.org/10.1109/ACCESS.2019.2913620, 2-s2.0-85067244205.
10.1109/ACCESS.2019.2913620
Web of Science® Google Scholar
16 Guo R., Liu H., Xie G., and Zhang Y., Weld defect detection from imbalanced radiographic images based on contrast enhancement conditional generative adversarial network and transfer learning, IEEE Sensors Journal. (2021) 21, no. 9, 10844–10853, https://doi.org/10.1109/JSEN.2021.3059860.
10.1109/JSEN.2021.3059860
Web of Science® Google Scholar
17 Xu X., Vallabh C. K. P., Hoag S. W., Dave V. S., and Cetinkaya C., Early detection of capping risk in pharmaceutical compacts, International Journal of Pharmaceutics. (2018) 553, no. 1–2, 338–348, https://doi.org/10.1016/j.ijpharm.2018.10.052, 2-s2.0-85055634754, 30367987.
10.1016/j.ijpharm.2018.10.052
CAS PubMed Web of Science® Google Scholar
18 Yun J., Shin W., Koo G., Kim M., Lee C., and Lee S. J., Automated defect inspection system for metal surfaces based on deep learning and data augmentation, Journal of Manufacturing Systems. (2020) 55, 317–324, https://doi.org/10.1016/j.jmsy.2020.03.009.
10.1016/j.jmsy.2020.03.009
Web of Science® Google Scholar
19 David J. and Sidnei A., An intelligent vision system for detecting defects in glass products for packaging and domestic use, The International Journal of Advanced Manufacturing Technology. (2015) 77, no. 1-4, 485–494, https://doi.org/10.1007/s00170-014-6442-y, 2-s2.0-84925484161.
10.1007/s00170-014-6442-y
Web of Science® Google Scholar
20 Ren A., Zahid A., Zoha A., Shah S. A., Imran M. A., Alomainy A., and Abbasi Q. H., Machine learning driven approach towards the quality assessment of fresh fruits using non-invasive sensing, IEEE Sensors Journal. (2020) 20, no. 4, 2075–2083, https://doi.org/10.1109/JSEN.2019.2949528.
10.1109/JSEN.2019.2949528
Web of Science® Google Scholar
21 Duong N. M., Chew M. T., Demidenko S., Pham Q. H., Pham D. K., Ooi M. L., and Kuang Y. C., Vision inspection system for pharmaceuticals, Proc. IEEE Sensors Applications Symposium. (SAS), 2014, Queenstown, New Zealand, 201–206.
Google Scholar
22 Mozina M., Tomazevic D., Pernus F., and Likar B., Automated visual inspection of imprint quality of pharmaceutical tablets, Machine Vision and Applications. (2013) 24, no. 1, 63–73, https://doi.org/10.1007/s00138-011-0366-4, 2-s2.0-84872339519.
10.1007/s00138-011-0366-4
Web of Science® Google Scholar
23 Kekre H. B., Mishra D., and Desai V., Detection of defective pharmaceutical capsules and its types of defect using image processing techniques, Proc. International Conference on Circuits, Power and Computing Technologies, 2014, Nagercoil, India, 1190–1195.
Google Scholar
24 Gabriele A., Pierfrancesco F., and Cosimo A., Row-level algorithm to improve real-time performance of glass tube defect detection in the production phase, IET Image Processing. (2020) 14, no. 12, 2911–2921.
10.1049/iet-ipr.2019.1506
Web of Science® Google Scholar
25 Wang Q., Zhang T., Cai Z., Jiang N., Wu J., and Zhang X., Visual quality inspection of capsule heads utilizing shape and gray information, Journal of Electronic Imaging. (2015) 24, no. 6, https://doi.org/10.1117/1.JEI.24.6.061121, 2-s2.0-84954103109.
10.1117/1.JEI.24.6.061121
Web of Science® Google Scholar
26 Qi D. and Jiang Z., Capsule defects classification based on hierarchical support vector machines, Advanced Materials Research. (2014) 926, 3373–3378.
10.4028/www.scientific.net/AMR.926-930.3373
Google Scholar
27 Bahaghighat M., Akbari L., and Xin Q., A machine learning-based approach for counting blister cards within drug packages, IEEE Access. (2019) 7, 83785–83796, https://doi.org/10.1109/ACCESS.2019.2924445, 2-s2.0-85068960043.
10.1109/ACCESS.2019.2924445
Google Scholar
28 Ma X., Kittikunakorn N., Sorman B., Xi H., Chen A., Marsh M., Mongeau A., Piché N., WilliamsR. O.III, and Skomski D., Application of deep learning convolutional neural networks for internal tablet defect detection: high accuracy, throughput, and adaptability, Journal of Pharmaceutical Sciences. (2020) 109, no. 4, 1547–1557, https://doi.org/10.1016/j.xphs.2020.01.014, 31982393.
10.1016/j.xphs.2020.01.014
CAS PubMed Web of Science® Google Scholar
29 Zhou J., He J., Li G., and Liu Y., Identifying capsule defect based on an improved convolutional neural network, Shock and Vibration. (2020) 2020, 9, 8887723, https://doi.org/10.1155/2020/8887723.
10.1155/2020/8887723
Web of Science® Google Scholar
30 He Y., Song K., Dong H., and Yan Y., Semi-supervised defect classification of steel surface based on multi-training and generative adversarial network, Optics and Lasers in Engineering. (2019) 122, 294–302, https://doi.org/10.1016/j.optlaseng.2019.06.020, 2-s2.0-85067859969.
10.1016/j.optlaseng.2019.06.020
Web of Science® Google Scholar
31 Jing J., Ma H., and Zhang H., Automatic fabric defect detection using a deep convolutional neural network, Coloration Technology. (2019) 135, no. 3, 213–223, https://doi.org/10.1111/cote.12394, 2-s2.0-85063010591.
10.1111/cote.12394
CAS Web of Science® Google Scholar
32 Ren S., He K., Girshick R., and Sun J., Faster R-CNN: towards real-time object detection with region proposal networks, IEEE Transactions on Pattern Analysis and Machine Intelligence. (2017) 39, no. 6, 1137–1149, https://doi.org/10.1109/TPAMI.2016.2577031, 2-s2.0-85019258369, 27295650.
10.1109/TPAMI.2016.2577031
PubMed Web of Science® Google Scholar
33 Cai Z. and Vasconcelos N., Cascade R-CNN: high quality object detection and instance segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence. (2021) 43, no. 5, 1483–1498, https://doi.org/10.1109/TPAMI.2019.2956516, 31794388.
10.1109/TPAMI.2019.2956516
PubMed Web of Science® Google Scholar
34 Liu W., Anguelov D., Erhan D., Szegedy C., Reed S., Fu C. Y., and Berg A. C., SSD: single shot multi box detector, Proc. European Conference on Computer Vision (ECCV), 2016, Amsterdam, The Netherlands, 21–37.
Google Scholar
35 Redmon J. and Farhadi A., YOLOv3: an incremental improvement, 2018, https://arxiv.org/abs/1804.02767.
Google Scholar
36 Tian Z., Shen C., Chen H., and He T., FCOS: fully convolutional one-stage object detection, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2019, Seoul, Korea (South), 9626–9635.
Google Scholar
37 Liu R., Gu Q., Wangand X., and Yao M., Region-convolutional neural network for detecting capsule surface defects, Boletín Técnico. (2017) 55, no. 3, 92–100.
Google Scholar
38 Zhang H., Zhao M., Liu L., Zhong H., Liang Z., Yang Y., Zhou X., Wu Q. M. J., and Wang Y., Deep multimodel cascade method based on CNN and random forest for pharmaceutical particle detection, IEEE Transactions on Instrumentation and Measurement. (2020) 69, no. 9, 7028–7042, https://doi.org/10.1109/TIM.2020.2973843.
10.1109/TIM.2020.2973843
Web of Science® Google Scholar
39 He K., Sun J., and Tang X., Guided image filtering, IEEE Transactions on Pattern Analysis and Machine Intelligence. (2013) 35, no. 6, 1397–1409, https://doi.org/10.1109/TPAMI.2012.213, 2-s2.0-84882376954.
10.1109/TPAMI.2012.213
PubMed Web of Science® Google Scholar
40 Zhang H., Cisse M., Dauphin Y. N., and Lopez-Paz D., Mixup: beyond empirical risk minimization, 2017, https://arxiv.org/abs/1710.09412.
Google Scholar
41 He K., Zhang X., Ren S., and Sun J., Deep residual learning for image recognition, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, Las Vegas, NV, USA, 770–778.
Google Scholar
42 Wang X., Zhang S., Yu Z., Feng L., and Zhang W., Scale-equalizing pyramid convolution for object detection, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2020, Seattle, WA, USA, 13356–13365.
Google Scholar
43 Lin T., Dollár P., Girshick R., He K., Hariharan B., and Belongie S., Feature pyramid networks for object detection, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, Honolulu, HI, USA, 936–944.
Google Scholar
44 Liu S., Qi L., Qin H., Shi J., and Jia J., Path aggregation network for instance segmentation, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, Salt Lake City, UT, USA, 8759–8768.
Google Scholar
45 Tan M., Pang R., and Le Q. V., Efficient Det: scalable and efficient object detection, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2020, Seattle, WA, USA, 10778–11078.
Google Scholar
46 Li X., Wang W., Wu L., Chen S., Hu X., Li J., Tang J., and Yang J., Generalized focal loss: learning qualified and distributed bounding boxes for dense object detection, 2020, https://arxiv.org/abs/2006.04388.
Google Scholar
47 Rezatofighi H., Tsoi N., Gwak J., Sadeghian A., Reid I., and Savarese S., Generalized intersection over union: a metric and a loss for bounding box regression, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2019, Long Beach, CA, USA, 658–666.
Google Scholar
48 Dai Y., Gieseke F., Oehmcke S., Wu Y., and Barnard K., Attentional feature fusion, Proc. IEEE Winter Conference on Applications of Computer Vision (WACV), 2021, Waikoloa, HI, USA, 3559–3568.
Google Scholar
49 Liu Z., Zheng T., Xu G., Yang Z., Liu H., and Cai D., Training-time-friendly network for real-time object detection, 2019, https://arxiv.org/abs/1909.00700.
Google Scholar

Citing Literature

All articles

Surface Quality Automatic Inspection for Pharmaceutical Capsules Using Deep Learning

Abstract

1. Introduction

2. Related Work

2.1. Traditional Detection Approaches

2.1.1. Image Processing-Based Approaches

2.1.2. Traditional Machine Learning-Based Approaches

2.2. Deep Learning-Based Detection Approaches

2.2.1. Image Classification-Based Approaches

2.2.2. Object Detection-Based Approaches

3. Methods

3.1. System Overview

3.2. Image Enhancement

3.3. Data Augmentation

3.4. Defect Detection

3.4.1. Network Architecture

3.4.2. Deformable Convolution Module

3.4.3. Attentional Fusion Module

4. Experiments

4.1. Dataset

4.2. Evaluation Metrics

4.2.1. Training Time

4.2.2. Model Size

4.2.3. Detection Accuracy

4.2.4. Inference Speed

4.3. Implementation Details

4.3.1. Computation Platform

4.3.2. Parameter Setting

4.4. Main Results

4.4.1. Comparison with Baseline

4.4.2. Comparison with Currently Popular Object Detectors

4.4.3. Comparison with the Classical FPN

4.5. Ablation Experiments

4.5.1. Ablation Studies of Data Augmentation

4.5.2. Ablation Studies of Model Components

4.5.3. Ablation Studies of DCN

4.6. Further Analysis and Discussion

5. Conclusions

Conflicts of Interest

Acknowledgments

Open Research

Data Availability

References

Citing Literature

Figures

References

Related

Information