Volume 2022, Issue 1, Article ID 3941995
Research Article
Open Access

An Improved VSLAM for Mobile Robot Localization in Corridor Environment

Gengyu Ge (Corresponding Author)
School of Information Engineering, Zunyi Normal University, Zunyi 563006, China
School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Zhong Qin
School of Information Engineering, Zunyi Normal University, Zunyi 563006, China

Lilve Fan
School of Information Engineering, Zunyi Normal University, Zunyi 563006, China
First published: 23 May 2022
Citations: 2
Academic Editor: Qiangyi Li

Abstract

Localization is a fundamental capability for an autonomous mobile robot, especially during navigation. The commonly used laser-based simultaneous localization and mapping (SLAM) method can build a grid map of an indoor environment and realize the localization task. However, when a robot enters a long corridor containing many geometrically symmetrical and similar structures, it often fails to position itself. Besides, the environment is not represented at a semantic level, so the robot cannot interact well with users. To solve these crucial issues, we propose an improved visual SLAM approach that realizes robust and precise global localization. The system is divided into two main steps. The first step constructs a topological semantic map using visual SLAM, text detection and recognition, and laser sensor data. The second step is localization, which repeats part of the work of the first step but makes the best use of the prebuilt semantic map. Experiments show that our approach performs well and localizes successfully almost everywhere in the corridor environment, where traditional methods fail.

1. Introduction

Nowadays, many commercial service robots are used to transport goods in restaurants, hotels, and hospitals, especially during the COVID-19 epidemic. Among the related research topics, localization for autonomous mobile robot navigation in man-made structured environments remains an ongoing challenge. In indoor scenes, the most common approach is to use a 2D laser rangefinder and laser-based SLAM to construct a 2D occupancy grid map [1, 2]. The mobile robot then performs the localization task with the AMCL algorithm, a particle filter solution [3]. However, when the robot is in a long corridor, the mapping result is always shorter than the real scene and the localization is inaccurate. More seriously, the localization process easily converges to a symmetrical or similar but wrong position [4]. The reason is that, for a laser sensor used in a symmetrical and similar long corridor, the data collected at different times look alike. Therefore, the mobile robot cannot obtain accurate pose information when it performs global localization, and it easily converges to a wrong unimodal distribution. Moreover, the number of particles grows with the size of the map, so the computing and memory costs also increase. The root cause of these shortcomings is the limited amount of information collected by a 2D laser sensor.

In comparison to a 2D laser sensor, cameras provide denser information such as point features, textures, lines, objects, and text. Cameras are therefore among the most promising sensors for perceiving the environment and estimating the pose of a mobile robot.

To address these problems, we propose a novel visual SLAM-based method for mobile robot localization, assisted by text information and laser data features extracted from the indoor scene, especially the long, symmetrical, and similar corridor environment. Firstly, the mobile robot initializes the system at the starting position. Secondly, it moves along the middle line of the corridor and constructs a feature-based visual map using the visual SLAM method. Thirdly, the robot stops when passing through door areas, rotates the camera toward the doorplate, and records the text content together with the current keyframe node. Lastly, the mobile robot navigates and localizes itself according to the previously built map. The whole framework of the mapping and localization system is depicted in Figure 1.

[Figure 1: Framework of the proposed mapping and localization system.]

This paper is organized as follows. Related work on visual SLAM and on text detection and recognition is discussed in Section 2. The proposed visual SLAM-based localization method is presented in Section 3. The experiments and discussion are described in Section 4, and Section 5 concludes the paper.

2. Related Work

2.1. Visual SLAM

In the field of visual SLAM, feature-based indirect approaches and photometric-based direct methods are currently the two mainstream techniques. The former extracts salient image features, such as points, lines, and planar features. By minimizing the reprojection errors of the matched feature pairs, the camera motion and the depths of map points can be computed. PTAM [5] was the first to use parallel threads to solve the visual SLAM problem; it estimated the pose in the tracking thread and refined the camera motion in the local mapping thread. The ORB-SLAM series [6-8] adopted the idea of parallel threads, added a loop closure thread, and used ORB features throughout the pipeline. It is a state-of-the-art solution in the research field and can be used directly in applications.

The direct methods solve pose estimation by minimizing pixel-level intensity errors between two adjacent images. LSD-SLAM [9] built a semidense map, compared with the sparse point maps of feature-based SLAM; however, it still needed feature extraction for loop closure. SVO [10] used a depth filter to estimate depth and remove outliers. It tracked sparse pixels using FAST corners and modeled the triangulated depth observations with a Gaussian-uniform mixture distribution. DSO [11] was a direct sparse odometry method based on a probabilistic model that did not compute keypoints or descriptors.

2.2. Text Detection and Recognition

Text signs in an indoor environment carry semantic content, which makes it easy to realize human-computer interaction. To obtain and understand scene text information, OCR techniques usually involve two phases: text detection and text recognition [12].

Text detection distinguishes text regions from the background of a captured image. Traditional methods based on manually designed features perform well when there is an obvious contrast between the text region and the background. The stroke width transform (SWT) employed a local image operator to compute the likely stroke width of textual pixels, searched for letter candidates, and grouped letters into text lines [13]. The maximally stable extremal region (MSER) method had real-time detection performance [14]; in addition, it was robust to blur, illumination, and color variation. The morphology-based method extracted high-contrast areas as text line candidates [15] and was also invariant to image changes such as lighting, translation, rotation, and complicated backgrounds. CTPN [16], a connectionist text proposal network, used a CNN to detect a text line as a sequence of fine-scale text proposals, which were then connected with an RNN. Other well-known works based on deep learning are EAST [17] and IncepText [18].

Many open-source OCR engines and software packages based on traditional text recognition algorithms are available, such as Tesseract, Google Docs OCR, and Transym [19]. These methods achieve relatively high accuracy when the text region contrasts strongly with the background and the text lines are simple; in addition, they do not need a GPU. If the scene text uses multiple fonts and colors against complicated backgrounds, then GPU-based deep learning approaches become essential. There are two mainstream solutions based on CNNs and RNNs. The first uses a CNN to extract image features and combines an RNN with connectionist temporal classification (CTC) to predict the sequence, as in CRNN [20]. The other employs a CNN together with a Seq2Seq encoder-decoder model and an attention mechanism [21], adopting ideas from machine translation.

3. Method

When the mobile robot enters a new environment for the first time, it needs to construct a map and then navigate in the environment according to the built map. We use a monocular camera to perform visual SLAM for both mapping and localization. Different from the traditional laser-based SLAM method, our method uses laser sensor data only to extract basic geometric features, such as door areas, the middle of the corridor, or the end of the corridor. In addition, text information extracted from the doorplate regions is used for semantic localization. The detailed descriptions are as follows.

3.1. Moving Strategy

In the map building phase, the mobile robot mainly runs in SLAM mode using the ORB-SLAM framework. It creates a sparse feature map of the environment from scratch or incrementally updates an existing map. However, most visual SLAM solutions are used in handheld or driverless scenarios. In these cases, the camera is always moved along roughly the same fixed route, so tracking against a prebuilt map is rarely lost. In the applications of autonomous mobile robots, however, a delivery robot moving in a corridor needs to avoid obstacles when encountering pedestrians, and changing the moving route can cause visual feature tracking to be lost. Consequently, the mobile robot needs to move within a fixed route range, preferably the middle line of the corridor, as shown in Figure 2, where a robot passes through a doorway area. Three routes indicated by red dotted lines and three positions indicated by red points are shown in Figure 2(a). There are only a few matched pairs between the visual features extracted from the image captured at position A and those captured at position C. Besides, cameras oriented in different directions give different matching results, even at the same position. Therefore, if the mobile robot maps the corridor with visual SLAM while moving along the left red dotted line, it will fail to localize itself when later moving along the right red dotted line. Ideally, the mobile robot moves along the middle red dotted line in Figure 2(a).

[Figure 2: (a) candidate moving routes (red dotted lines) and positions in the corridor; (b) laser distances to the left and right walls; (c) invalid laser scanning regions in the corridor.]
To ensure that the mobile robot moves along a relatively fixed route, a laser range finder is used to measure the ranges from the center of the sensor to the obstacles, thus obtaining a high-precision position relative to the walls on both sides. As shown in Figure 2(b), the left detected distance plus the right one equals the width of the corridor:
d_L + d_R = W,  (1)
where W denotes the width of the corridor, d_L is the shortest distance between the left wall and the center of the laser sensor, and d_R is that of the other side. When the robot passes through the doorway area, the following holds:
d_L + d_R > W.  (2)
Consequently, when the mobile robot determines that it is in the corridor area, it is easy for the robot to move along the middle line of the corridor. The robot can slightly adjust its position so that the data measured by the laser sensor satisfy the following equation:
d_L = d_R.  (3)
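As a rough illustration of how conditions (1)-(3) can be checked on a real scan (not code from the paper; the scan layout, the angular windows, and the thresholds are assumptions), the wall distances and the centering condition might be computed as follows:

    import numpy as np

    def wall_distances(ranges, angle_inc=np.deg2rad(1.0), window_deg=15.0):
        """Shortest valid range toward the left (+90 deg) and right (-90 deg) walls.

        Assumes `ranges` is a NumPy array starting at the forward direction (0 deg)
        with angular step `angle_inc`, and that invalid returns are inf/NaN.
        """
        angles = np.degrees(angle_inc * np.arange(len(ranges))) % 360.0
        valid = np.isfinite(ranges)
        left = valid & (np.abs(angles - 90.0) < window_deg)
        right = valid & (np.abs(angles - 270.0) < window_deg)
        d_l = ranges[left].min() if left.any() else np.inf
        d_r = ranges[right].min() if right.any() else np.inf
        return d_l, d_r

    def centering_check(d_l, d_r, corridor_width, eps=0.05):
        """Eq. (2): the sum exceeding the width signals a doorway; Eq. (3): d_l = d_r keeps the middle line."""
        at_doorway = (d_l + d_r) > corridor_width + eps
        lateral_error = d_l - d_r   # positive: robot is closer to the right wall
        return lateral_error, at_doorway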

The next problem is to determine whether the robot is in the corridor area. In most indoor scenes, the laser sensor obtains valid distance readings all around the robot; if not, the robot is most likely in a corridor, as shown in Figure 2(c), where the laser sensor gets two regions of invalid range data. The blue areas belong to the range covered by the scanning radius of the laser sensor.

Given a 2D laser range finder that has a measurement angle range of 360 degrees, the maximum measuring distance is defined as Dmax, the minimum measuring distance as Dmin, and the angular resolution as ϕmin. Then the raw data can be described by the following formula:
P = {p_i = (ρ_i, ϕ_i) | D_min ≤ ρ_i ≤ D_max, ϕ_i = i · ϕ_min, i = 1, 2, …, N},  (4)
where N equals 360/ϕ_min and represents the total number of scanning points P = {p_1, p_2, …, p_N}. An invalid scan area angle θ_null has a dynamically variable range. The maximum value θ_max is obtained when the laser sensor mounted on the robot is close to a wall, while the minimum value θ_min is obtained when the robot is on the middle line of the corridor. The two values are defined as
θ_max = arcsin(W / D_max),   θ_min = 2 arcsin(W / (2 D_max)).  (5)
If the mobile robot is moving in a corridor, then (6) holds.
θ_min ≤ θ_null ≤ θ_max.  (6)
We count the number of consecutive invalid return distances and compute the invalid area angle θ_null according to
θ_null = ϕ_j − ϕ_i = (j − i) ϕ_min,  (7)
where ϕ_i denotes the i-th scanning angle, whose return distance from the detected point p_i is invalid, and ϕ_j denotes the j-th; the whole angular range between the two also returns invalid distance values.

It is easy to find that the mobile robot is at the end of the corridor if only one invalid area satisfies (6). Similarly, if two invalid areas satisfy (6) and are distributed like Figure 2(c), then the robot is most likely located in the middle of the corridor.
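A possible sketch of this classification is given below (again illustrative rather than the paper's implementation: it assumes the scan is an array of ranges with step ϕ_min and does not handle wrap-around at the ends of the array):

    import numpy as np

    def invalid_region_angles(ranges, d_max, angle_inc=np.deg2rad(1.0)):
        """Angular width theta_null of each contiguous run of invalid returns (Eq. (7))."""
        invalid = ~np.isfinite(ranges) | (ranges >= d_max)
        widths, run = [], 0
        for bad in invalid:
            if bad:
                run += 1
            elif run:
                widths.append(run * angle_inc)
                run = 0
        if run:
            widths.append(run * angle_inc)
        return widths

    def coarse_place(widths, theta_min, theta_max):
        """One large invalid region -> end of corridor; two -> middle of the corridor (Eq. (6))."""
        corridor_like = [w for w in widths if theta_min <= w <= theta_max]
        if len(corridor_like) == 1:
            return "corridor_end"
        if len(corridor_like) == 2:
            return "corridor_middle"
        return "other"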

According to the above information extracted from the laser scanning data, the mobile robot can autonomously move to the free area along a preset fixed route. In addition, the robot can get a coarse position estimation relative to the corridor.

3.2. Build an Image Map

The next step is to use visual SLAM to construct a visual feature map along the fixed route obtained from the laser data described above.

The motion of a single camera can be solved by the principle of epipolar geometry, as shown in Figure 3. Let x_2 be the normalized-plane coordinate of pixel p_2 in the current image I_2, and x_1 the corresponding coordinate of pixel p_1 in the previous image I_1. If the two points are matched by visual features, then the following holds:
x_2^T E x_1 = 0,   E = t^∧ R,   p_2^T F p_1 = 0,   F = K^{−T} E K^{−1},  (8)
where R is the camera rotation, t is the translation, E is the essential matrix, and F is the fundamental matrix. If the essential matrix is known, the fundamental matrix is easy to obtain; similarly, the rotation and translation can be obtained by decomposing the essential matrix.
[Figure 3: Epipolar geometry between two camera views.]
The essential matrix E is a 3 × 3 matrix. Consider a pair of matching points whose normalized coordinates are x_1 = (u_1, v_1, 1)^T and x_2 = (u_2, v_2, 1)^T. According to the epipolar constraint, the following equation holds:
(u_2, v_2, 1) \begin{pmatrix} e_1 & e_2 & e_3 \\ e_4 & e_5 & e_6 \\ e_7 & e_8 & e_9 \end{pmatrix} \begin{pmatrix} u_1 \\ v_1 \\ 1 \end{pmatrix} = 0.  (9)
Expanding E and writing it as a vector e = (e_1, …, e_9)^T, all point pairs can be stacked into one system of linear equations, as (10) shows, where (u_1^i, v_1^i) and (u_2^i, v_2^i) denote the i-th pair of matched feature points. The coefficient matrix, built from the positions of the feature points, has size 8 × 9. If the matrix formed by eight pairs of matching points has rank 8, the elements of E can be solved from this equation.
\begin{pmatrix} u_2^1 u_1^1 & u_2^1 v_1^1 & u_2^1 & v_2^1 u_1^1 & v_2^1 v_1^1 & v_2^1 & u_1^1 & v_1^1 & 1 \\ \vdots & & & & & & & & \vdots \\ u_2^8 u_1^8 & u_2^8 v_1^8 & u_2^8 & v_2^8 u_1^8 & v_2^8 v_1^8 & v_2^8 & u_1^8 & v_1^8 & 1 \end{pmatrix} e = 0.  (10)
According to the estimated essential matrix E, the camera motion R, t can be recovered. This is obtained by singular value decomposition (SVD), as follows:
E = U Σ V^T,  (11)
where U and V are orthogonal matrices and Σ is the singular value matrix. The decomposition yields four possible solutions. A point P in the world coordinate system must have positive depth in both cameras, so the depths of the triangulated points can be used to select the correct solution. The decomposition result is
t_1^∧ = U R_Z(π/2) Σ U^T,  R_1 = U R_Z^T(π/2) V^T;   t_2^∧ = U R_Z(−π/2) Σ U^T,  R_2 = U R_Z^T(−π/2) V^T.  (12)
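As a minimal two-view sketch of this step (using OpenCV convenience functions rather than the paper's or ORB-SLAM's own implementation; pts1 and pts2 are assumed to be N x 2 arrays of matched pixel coordinates and K the camera intrinsic matrix):

    import cv2

    def relative_pose(pts1, pts2, K):
        """Estimate E from matched points (Eqs. (8)-(10)) and recover R, t (Eqs. (11)-(12))."""
        E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
        # recoverPose tests the four SVD candidates and keeps the one with positive depths
        _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
        return R, t, mask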
If all feature points in the scene fall on the same plane, then motion estimation can be carried out through homography and the following equation holds:
p_2 = H p_1,   H = K (R − t n^T / d) K^{−1}.  (13)
The homography matrix is related to the rotation, the translation, and the plane parameters. Solving for the motion is similar to the essential matrix case: according to the matched point pairs and (14) and (15), H is estimated and then decomposed to obtain the rotation and translation.
\begin{pmatrix} u_2 \\ v_2 \\ 1 \end{pmatrix} ≃ \begin{pmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{pmatrix} \begin{pmatrix} u_1 \\ v_1 \\ 1 \end{pmatrix}.  (14)
A pair of matching points provides three constraints (of which only two are linearly independent), so the homography matrix, which has 8 degrees of freedom, can be calculated from four pairs of matching feature points (no three of which may be collinear), that is, by solving the following linear equations (when h_9 = 0, the right-hand side is zero):
\begin{pmatrix} u_1^1 & v_1^1 & 1 & 0 & 0 & 0 & -u_1^1 u_2^1 & -v_1^1 u_2^1 \\ 0 & 0 & 0 & u_1^1 & v_1^1 & 1 & -u_1^1 v_2^1 & -v_1^1 v_2^1 \\ \vdots & & & & & & & \vdots \\ u_1^4 & v_1^4 & 1 & 0 & 0 & 0 & -u_1^4 u_2^4 & -v_1^4 u_2^4 \\ 0 & 0 & 0 & u_1^4 & v_1^4 & 1 & -u_1^4 v_2^4 & -v_1^4 v_2^4 \end{pmatrix} \begin{pmatrix} h_1 \\ h_2 \\ \vdots \\ h_8 \end{pmatrix} = \begin{pmatrix} u_2^1 \\ v_2^1 \\ \vdots \\ u_2^4 \\ v_2^4 \end{pmatrix}.  (15)
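The planar case can be sketched in the same way (same assumptions about pts1, pts2, and K as above; selecting the physically valid candidate from the decomposition is left out):

    import cv2

    def planar_motion(pts1, pts2, K):
        """Estimate H (Eqs. (13)-(15)) and decompose it into candidate rotations/translations."""
        H, mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, 3.0)
        n_solutions, Rs, ts, normals = cv2.decomposeHomographyMat(H, K)
        return Rs, ts, normals   # candidates still require a positive-depth check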
In monocular SLAM, the depth of a pixel cannot be obtained from a single image, and the depths of map points need to be estimated by triangulation. Triangulation refers to determining the distance of a point by observing it from two places and using the included angle. According to (8) and (15), we can calculate the world coordinates of map points. However, due to noise, the two rays often do not intersect exactly, so the problem is solved by the least-squares method:
s_2 x_2 = s_1 R x_1 + t,   s_1 x_2^∧ R x_1 + x_2^∧ t = 0.  (16)
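A short triangulation sketch under the same assumptions (the first camera is placed at the origin, and the OpenCV routine performs the least-squares solution corresponding to (16)):

    import cv2
    import numpy as np

    def triangulate(pts1, pts2, K, R, t):
        """Recover 3D map points from two views by least-squares triangulation (Eq. (16))."""
        P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # first camera: [I | 0]
        P2 = K @ np.hstack([R, t.reshape(3, 1)])            # second camera: [R | t]
        X_h = cv2.triangulatePoints(P1, P2, pts1.T.astype(float), pts2.T.astype(float))
        return (X_h[:3] / X_h[3]).T                         # N x 3 Euclidean points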
We utilize the ORB-SLAM open-source library, which extracts ORB features to match adjacent images. ORB features combine the oriented FAST detector with a rotated BRIEF descriptor and have low computational cost compared with SIFT and SURF features. To add an efficiently computed orientation to the FAST keypoint, an intensity centroid is used to define an orientation vector. The moments of a patch around the FAST keypoint are defined as
m_{pq} = Σ_{x, y} x^p y^q I(x, y),  (17)
where p and q take only the values 0 or 1. Then, the intensity centroid is computed from these moments:
C = ( m_{10} / m_{00},  m_{01} / m_{00} ).  (18)
The orientation vector is constructed by connecting the center of the keypoint corner with the centroid of the patch. The orientation is then computed as
θ = arctan( m_{01} / m_{10} ).  (19)
The step after keypoint detection is feature description, which enables feature matching. The original BRIEF descriptor is not invariant to in-plane rotation. It is a binary description of an image patch based on a binary intensity test τ, defined as
τ(p; x, y) = \begin{cases} 1, & p(x) < p(y) \\ 0, & p(x) ≥ p(y) \end{cases},  (20)
where p(x) and p(y) are the intensities at points x and y. Usually 256 point pairs are selected to describe a keypoint, and the descriptor is defined as a vector of n binary tests:
f_n(p) = Σ_{1 ≤ i ≤ n} 2^{i−1} τ(p; x_i, y_i),  (21)
where n equals 256. Then, a learning method based on PCA or other dimensionality reduction strategies is used to help obtain a rotation-invariant BRIEF descriptor. The combination of oFAST and rBRIEF is called the ORB feature; an example of ORB features and matching pairs is shown in Figure 4, where mismatched point pairs have been eliminated using the RANSAC algorithm [22].
[Figure 4: ORB features and matched pairs between two images, with mismatches removed by RANSAC.]
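The extraction and matching pipeline described above can be reproduced roughly with OpenCV (the file names and parameter values below are illustrative, and this stand-alone sketch replaces the ORB-SLAM internals):

    import cv2
    import numpy as np

    # two corridor frames; in the real system they come from the robot camera stream
    img1 = cv2.imread("frame_a.png", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("frame_b.png", cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(nfeatures=1000)          # oFAST keypoints + rBRIEF descriptors
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Hamming distance suits the binary descriptor of Eq. (21)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

    # reject mismatched pairs with RANSAC, as in Figure 4
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
    good_matches = [m for m, keep in zip(matches, inlier_mask.ravel()) if keep]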

Then the depths of map points in the world coordinate system can be computed by triangulation. The constructed map of a corridor, consisting of sparse points, is shown in Figure 5. The blue trapezoidal blocks are camera poses representing the keyframes of the captured images and are saved as part of the image map; the green one is the current frame.

[Figure 5: Sparse feature map of the corridor built by visual SLAM; blue blocks are keyframe camera poses, the green one is the current frame.]

3.3. Extend to a Semantic Map

When the mobile robot performs a cargo transportation task and needs to interact with the user, it is more convenient to use a semantic map that is close to human language expression and understanding. To realize this function, text detection and recognition techniques are used to obtain text-level information.

Text characters in an indoor environment usually appear on doorplates, room nameplates, signs, or billboards on the wall. The most common is the room number, which uniquely determines the location of a room. Consequently, the mobile robot only needs to detect and recognize the text in the door area and extract the room number; then its position can be determined.

However, the camera mounted on the robot platform captures images continuously in real time. Most of the images contain no useful text, and processing them wastes time and computing power. Besides, the room number may be only partially captured in a single image, which leads to misjudgment of the semantic result. To solve this problem, we use the laser scanning data to make a preliminary judgment about where the doorway areas are in the corridor. The robot then moves to a position facing the door and stops to capture an image from the best perspective. Finally, the text region is detected and the room number is recognized.

We tested two traditional approaches and one deep learning method for detecting text regions in the corridor environment. Figure 6 shows the detection results; the deep learning method clearly gives the most accurate bounding box. Figure 6(a) is the result of the morphological method, which produces wrong boxes including the door handle and other text-like signs. Figure 6(b) uses the MSER approach and includes texture areas affected by illumination. Figure 6(c) is the result of processing Figure 6(b) with the nonmaximum suppression (NMS) algorithm [23]. The red rectangle in Figure 6(d) is the ideal result, obtained with the deep learning approach.

[Figure 6: Text detection results: (a) morphological method; (b) MSER; (c) MSER + NMS; (d) deep learning method.]

The subsequent recognition phase is relatively easy once a correct text box has been detected. In addition, the text we need consists only of Arabic numerals. Many open-source OCR solutions can realize this function with high accuracy [24]. However, to obtain high overall detection and recognition accuracy, we utilize an online deep learning scheme called EasyDL from Baidu [25], https://ai.baidu.com/tech/ocr/general. Thus, we do not need a GPU as the deep learning processing module, which would have high power consumption and cost; only a Wi-Fi module is needed to access the Internet.
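The cloud OCR request itself is vendor-specific and is not detailed here. As an offline illustration of the recognition step (corresponding to the Tesseract baseline compared in Section 4.2, with an assumed image path), restricting the character set to digits keeps the room-number output clean:

    import cv2
    import pytesseract

    plate = cv2.imread("doorplate_crop.png")          # the detected text box (illustrative path)
    gray = cv2.cvtColor(plate, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # single text line, digits only (room numbers are Arabic numerals)
    config = "--psm 7 -c tessedit_char_whitelist=0123456789"
    room_number = pytesseract.image_to_string(binary, config=config).strip()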

After obtaining the text information, an extended semantic map can be built, as shown in Figure 7. A node represents a geo-tagged place containing doors or an intersection, and an edge means a passable route. The images in a node are selected according to the keyframe selection strategy of ORB-SLAM: three keyframes in each doorway or intersection area are chosen to generate a node. If the corridor is a straight line or a loop, the semantic map can be represented by a doubly linked list; other cases can be expressed as a multilayer quadtree.

[Figure 7: Topological semantic map: nodes are geo-tagged places, edges are passable routes.]
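A possible node layout for such a map is sketched below (field names are illustrative; the paper does not prescribe a concrete data structure beyond the linked-list/quadtree description):

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class PlaceNode:
        """A geo-tagged place (doorway or intersection) in the topological semantic map."""
        room_number: Optional[str]              # text recognized from the doorplate, None at intersections
        keyframe_ids: List[int]                 # the three ORB-SLAM keyframes chosen for this node
        laser_feature: str                      # coarse place type, e.g. "doorway" or "corridor_end"
        prev: Optional["PlaceNode"] = None      # doubly linked list for a straight or circular corridor
        next: Optional["PlaceNode"] = None
        branches: List["PlaceNode"] = field(default_factory=list)   # extra passable edges at intersections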

3.4. Path Following

When a semantic map has been constructed and the robot's initial pose is known, the navigation task becomes a path following problem. The path planning algorithm depends on the form of the map; the commonly used data structures are doubly linked lists, multiway trees, and graphs. Consequently, path planning reduces to a look-up problem of finding the shortest path.

Compared with nodes in a social network or an electronic map of a city, the geo-tagged nodes in an indoor environment used by a personal robot are usually very few. Therefore, search algorithms commonly used in data structures are sufficient.
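Because the place graph is small, a plain breadth-first search is enough to find the shortest route between two geo-tagged nodes; a minimal sketch (the adjacency dictionary and room labels are illustrative):

    from collections import deque

    def shortest_route(adjacency, start, goal):
        """Breadth-first search over the semantic place graph."""
        parent = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:                    # rebuild the path by walking back to the start
                path = []
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            for neighbor in adjacency.get(node, []):
                if neighbor not in parent:
                    parent[neighbor] = node
                    queue.append(neighbor)
        return None

    # e.g. shortest_route({"501": ["502"], "502": ["501", "503"], "503": ["502"]}, "501", "503")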

3.5. Localization Mode

In the localization or navigation mode using the visual SLAM method, the system first loads the previously built sparse feature map, then extracts features from a newly captured image and matches them with those in the image map. The difficulty is not the image matching itself, but how to capture images along a route and in a direction that the robot has traveled before. This is handled by using the laser data to obtain a coarse place judgment according to (1)-(7).

Then the localization problem for an autonomous mobile robot becomes one of vision-based global localization using a visual vocabulary [26]. For a geo-tagged place, the robot performs a text retrieval task. Equation (22) shows a mathematical form for expressing an image: the image is expressed as a vector v_A consisting of N words, and the weight η_i is computed as the product of the term frequency TF_i and the inverse document frequency IDF_i [27].
v_A = {(w_1, η_1), (w_2, η_2), …, (w_N, η_N)},   η_i = TF_i × IDF_i.  (22)
The similarity between a new image, converted into bag-of-words form, and the images in the database can be computed using (23). When the robot moves to a doorway area, laser data feature extraction, visual vocabulary matching, and text detection and recognition are combined to determine the localization result.
s(v_A, v_B) = 1 − (1/2) ‖ v_A/‖v_A‖ − v_B/‖v_B‖ ‖.  (23)
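A compact sketch of how (22) and (23) can be evaluated, assuming each image has already been quantized into visual-word indices against a vocabulary of size N (the DBoW-style L1 score shown here is one common choice, not necessarily the exact form used by the authors):

    import numpy as np

    def bow_vector(word_ids, idf):
        """TF-IDF weighted bag-of-words vector of one image (Eq. (22))."""
        v = np.zeros(len(idf))
        ids, counts = np.unique(np.asarray(word_ids), return_counts=True)
        v[ids] = (counts / counts.sum()) * idf[ids]      # eta_i = TF_i * IDF_i
        return v

    def bow_similarity(v_a, v_b):
        """L1 similarity between two normalized BoW vectors (one form of Eq. (23))."""
        v_a = v_a / np.linalg.norm(v_a, 1)
        v_b = v_b / np.linalg.norm(v_b, 1)
        return 1.0 - 0.5 * np.abs(v_a - v_b).sum()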

4. Experiments and Discussion

Our experimental platform is a two-wheel differential-drive mobile robot equipped with an RPLIDAR A2 and a single perspective camera. The laser sensor has a 360-degree scanning range and a measuring distance of 8-12 meters. The onboard CPU is an ARM Cortex-A72 with 8 GB of RAM. The experimental environment is a long, symmetrical, and similar corridor inside a multistory hotel; the corridor is 45 meters long and 1.85 meters wide. Figure 8 shows the corridor and the platform. The laser sensor is used for collision detection and basic structural feature extraction.

[Figure 8: The corridor environment and the mobile robot platform.]

4.1. Lateral Deviation of Fixed Route

The mobile robot follows a fixed route, the middle line of the corridor. According to the laser data computation and feature extraction, the robot adjusts its moving direction; a resulting trajectory is shown in Figure 9.

[Figure 9: Trajectory of the robot moving along the middle line of the corridor.]

From Figure 9, we find that the precision of laser data is very high, which is suitable for applications with accurate distance requirements.

4.2. Text Detection and Recognition

We collected 150 images from 30 doorway areas in a fifth-floor corridor of the hotel. Five images were collected in each area from different perspectives and positions, all roughly facing the door. Because no GPU is used in the onboard computer, offline deep learning models were not included in the comparison. First, different text detection methods were tested and the average number of detected text boxes per image was counted. Table 1 shows the results, which demonstrate that the traditional methods detect many wrong areas as text regions, whereas the online deep learning method performs better; many lines or outlines in the image are misidentified by the traditional methods as numerals or characters. Consequently, the Wi-Fi based cloud platform solution is the best choice.

Table 1. Comparison of text detection results.

Method                  Average detected boxes per image
Morphological method    7.4
MSER                    9.2
MSER + NMS              6.9
Ours                    1.1

The second experiment compares text recognition methods. The accuracy, or successful recognition rate, is calculated as the number of correct recognitions out of the total number of tests (150 recognition experiments from 30 doorway areas). We used the traditional Tesseract solution and the online deep learning method to recognize the previously detected text boxes. Table 2 shows the recognition results; compared with the traditional method, the online deep learning method achieves much higher accuracy.

Table 2. Recognition results.

Method      Total number of recognitions    Number of correct recognitions    Accuracy (%)
Tesseract   150                             83                                55.3
Ours        150                             148                               98.7

4.3. Global Localization

In order to evaluate the global localization performance quantitatively, we placed the robot at 20 different positions in the corridor and used four different methods to perform the global localization task. Because the methods use different moving strategies, a task ending condition was set for each positioning process. For AMCL, the task ended when the particles converged to a single cluster. The ORB-SLAM and text identification methods allowed the robot to rotate one full circle in place. In our proposed method, the robot was allowed to adjust its position toward the middle line of the corridor; on average it needed to move 0.24 meters and rotate 90 degrees, which is within the acceptable range and the requirement of our moving strategy. The results are shown in Table 3; our approach clearly achieves a higher localization success rate than the other methods.

Table 3. Localization results of different methods.

Method      Test positions    Success times    Success rate (%)
AMCL        20                9                45.0
ORB-SLAM    20                14               70.0
Text        20                4                20.0
Ours        20                19               95.0

5. Conclusions

This paper presented a novel mapping and localization approach for an autonomous mobile robot navigating in a long corridor environment. Laser data were used to keep the robot on a fixed moving route and to extract features for coarse place judgment. Visual SLAM provided visual localization, and text information raised the map to a semantic level.

Although the proposed approach solves most of the corridor localization problem and enables semantic interaction with users, the situation in which the mobile robot is inside a specific room was not considered. Therefore, in future work, we will pay attention to topological metric maps that can cover all indoor environments. In addition, 5G communication and cloud computing technology can be used to obtain richer semantic information, so the mobile robot does not need to be configured with a GPU or other large computing platforms.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported by the Natural Science Research Project of Guizhou Province Education Department (Grant no. KY[2017]023, Guizhou Mountain Intelligent Agricultural Engineering Research Center) and Doctoral Fund Research Project of Zunyi Normal University (Grant no. ZS BS[2016]01, Aerial Photography Test and Application of Karst Mountain Topography).

Data Availability

The labeled dataset used to support the findings of this study is available from the corresponding author upon request.
