Plane extraction for navigation of humanoid robot
J. Cent. South Univ. Technol. (2011) 18: 627-632
DOI: 10.1007/s11771-011-0740-4
ZHANG Tong(张彤)1, 2, XIAO Nan-feng(肖南峰)1
1. School of Computer Science and Engineering, South China University of Technology,
Guangzhou 510641, China;
2. Computer Department, Guangdong Police Officers College, Guangzhou 510230, China
© Central South University Press and Springer-Verlag Berlin Heidelberg 2011
Abstract: In order to make a humanoid robot walk freely in complicated environments, reliable capabilities for obtaining plane information from its surroundings are required. A system for extracting planes from data taken by stereo vision was presented. After the depth image was obtained, the pixels of each line were scanned and split into straight line segments. The neighbouring relations of the line segments were kept in a link structure. Groups of three line segments were selected as seed regions. A queue was maintained for storing seed regions, and then the plane region was expanded around each seed region. The process of region growing continued until the queue of seed regions was empty. After trimming, the edges of the planes became smooth. In the end, the extracted planes were obtained. In the experiment, two models were used: a pipe and stairs. Two planes in the pipe model and six planes in the stairs model were extracted exactly. The speed and precision of the algorithm can satisfy the demands of humanoid robot navigation.
Key words: humanoid robot; navigation; line segments splitting; region growing; plane extraction; depth image
1 Introduction
Humanoid robots were considered to have more advantages than traditional wheeled robots when moving in environments designed for humans. Humanoid robots had more degrees of freedom and could use more actions while walking, so their movement was more flexible: they could step on or over obstacles [1], climb up or down stairs [2], turn around and even crawl underneath objects [3]. In complicated environments, the foundational problem for a humanoid robot walking freely was navigation. At present, some famous humanoid robot platforms such as ASIMO of HONDA [4], WABIAN-2 of Waseda University, and HRP-3 of AIST have been developed for verifying all kinds of control and navigation methods. Although related technologies were reported, humanoid robots were still too immature for use in real life, and practical problems remained. One of them was stability while walking [5-6]: the inherent instability of humanoid robots should be considered first, as it was the main obstacle to practical applications.
The navigation system generated a path that a robot moved along from its current position to a target position based on the mapping information around the robot. Previous navigation techniques mostly studied wheeled robots [7-8]; there were few researches on navigation systems for humanoid robots. The movement of a humanoid robot could be split into a limited set of actions, and each action arranged different foot placements [9]. Different sequences of actions would lead to different paths. A humanoid robot was equipped with two cameras as its eyes. These actions required reliable perceptual capabilities for obtaining information about its surroundings. It was assumed that there were two kinds of objects in the circumstances where the humanoid robots moved [10]. One was the object having a planar surface that the robot's foot could be put on, such as the floor, low and flat objects, and stairs. The other was the obstacle which the robot could not step over and should try its best to avoid, such as walls and large objects. It was important to distinguish the objects with flat surfaces from obstacles for the safe navigation of a humanoid robot. To solve this problem, the planes must be extracted from the surrounding information. In this work, a system is presented for extracting planes from data taken by stereo vision.
OKADA et al [11] proposed that 3D Hough transformation be used for candidate extraction from depth map information, and the plane segment candidates were fitted into the depth map in order to detect the partial plane segments. CHEN et al [12] presented a new method of segmenting planar regions when an uncalibrated camera underwent pure translation. Both methods could segment the planes from the original images, but the information about the planes in the real world would not be obtained. Therefore, these methods were not suitable for humanoid robots. The method in this work would not only extract the planes from images but also obtain information about the planes such as position, angle and height.
2 System for extracting planes from data taken by stereo vision
2.1 Overall processing structure
A technique was adopted for rapidly extracting planes from a depth image. An overview of the algorithm is given in Fig.1. Since the depth image obtained from the original images always suffered from noise, a median filter was used to reduce the noise level before the actual extraction took place. It was supposed that the noise in the image was approximately stationary, so the noise variance of the whole image could be estimated by averaging the estimates of the noise variance at each pixel. The root-mean-square error of a local plane fit was adopted as the estimate at each pixel.
Fig.1 Processing structure of plane extraction
The next step of the algorithm was splitting the pixels of each line into straight line segments. A simple link-based data structure was developed to keep the neighbourhood relations of each line segment. The actual extraction was a region growing process. Some line segments were first selected as seed regions according to whether they satisfied an optimality criterion, and then the plane region was expanded around each seed region. Whether a line segment neighbouring the planar region should be added to the planar surface could be decided by a very simple test. The region growing continued until no new line segment was added.
In order to generate clean edges between regions, a step called trimming plane edges was applied after all seed regions finished growing. A list of first-order polynomials and a label image were the final result. The polynomials described the borders of all planar regions and in the label image each pixel contained the number of the region that it belonged to.
2.2 Obtaining depth image
Two cameras equipped on a humanoid robot's head as eyes and a high-performance computer formed the stereo vision system. In order to acquire the depth information, two images of the same object gathered by the cameras were processed. The processing mainly included video capture, camera calibration [13], image pre-processing, feature extraction, stereo matching [14] and three-dimensional reconstruction [15]. Camera calibration was the process of determining the interior and exterior parameters of the cameras. The interior parameters included principal points, principal distances, pixel size, distortion coefficients, etc. The exterior parameters were the position and orientation of the two cameras with respect to each other. Through stereo matching, the different positions of the same object could be found in the two images. Feature selection, criterion determination and algorithm implementation were three important procedures. The depth of the object could be computed from the disparity of the two images using the geometrical relationship. A depth image was the visualization of this depth information. The procedure of generating the depth image was not discussed in detail; only its result was adopted to verify the following algorithm.
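The paper does not write out the triangulation step explicitly. For a rectified stereo pair, the standard relation is Z = f·B/d, with f the focal length in pixels, B the baseline and d the disparity. A minimal sketch under that assumption, with function and parameter names of our own choosing:

```python
def depth_from_disparity(disparity, focal_px, baseline_m):
    """Triangulate depth Z = f*B/d for one matched pixel pair.

    disparity   -- horizontal pixel offset between left and right views
    focal_px    -- focal length expressed in pixels
    baseline_m  -- distance between the two camera centres, in metres
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity
```

Note that depth falls off as 1/d, so distant objects (small disparity) carry the largest depth uncertainty, which motivates the noise estimation of Section 2.3.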
2.3 Noise estimation
In order to make the planar function generated by this algorithm fit the input pixels better, it was necessary to model the noise present in the depth image. It was supposed that the noise in the image was stationary, so the noise variance of the whole image could be estimated by averaging the estimates of the noise variance at each pixel. To estimate the noise variance at a pixel p, a plane z = Ax + By + C was fitted to the 3×3 neighbourhood of p. If a pixel lay outside the preset maximal and minimal range values, it was discarded; otherwise, the error of the plane fit was assumed to be caused primarily by noise. The root-mean-square error σ_p of the plane fit over the 3×3 neighbourhood N(p) was

σ_p = [(1/9) Σ_{(x, y)∈N(p)} (Ax + By + C − z(x, y))²]^{1/2}  (1)

Then, the image noise variance could be expressed as

σ_img² = (1/N) Σ_p σ_p²  (2)

where N is the number of valid pixels.
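The per-pixel estimate and its image-wide average can be sketched as follows. The closed-form least-squares shortcut exploits the symmetry of the 3×3 window (the x, y and constant basis columns are mutually orthogonal over it); the function names are ours, and the paper's range-validity check is omitted for brevity:

```python
import numpy as np

def pixel_noise_rms(patch):
    """RMS residual of a least-squares plane fit z = Ax + By + C
    over a 3x3 depth patch centred at pixel p (Eq. (1))."""
    patch = np.asarray(patch, dtype=float)
    ys, xs = np.mgrid[-1:2, -1:2]          # local coordinates of the 3x3 window
    A = np.sum(xs * patch) / 6.0           # closed form: sum(x^2) over the window is 6
    B = np.sum(ys * patch) / 6.0
    C = patch.mean()
    residual = patch - (A * xs + B * ys + C)
    return float(np.sqrt(np.mean(residual ** 2)))

def image_noise_variance(sigmas):
    """Average the per-pixel variance estimates (Eq. (2))."""
    s = np.asarray(sigmas, dtype=float)
    return float(np.mean(s ** 2))
```

A patch that is exactly planar yields a zero residual, so the estimate responds only to deviations from local planarity, which is what the stationarity assumption attributes to noise.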
2.4 Splitting line segment
The classical splitting method, in which a curve is split into two parts at the point furthest away from the chord between its two end points, was improved. The process of splitting continued until the distance fell under a threshold. In the original algorithm this distance was the perpendicular distance from a point to the chord; here it was changed to the distance along the z-axis in order to simplify the computation. Let z(x, y_i) be the function of the straight line through the two end points (x_0, y_i, z_0) and (x_2, y_i, z_2); then the z-axis distance from a point (x_1, y_i, z_1) to this line was computed as |z_1 − z(x_1, y_i)|. After splitting, each line segment s was stored in a record containing the x values of its start and end points, x_s and x_e, the slope k and the intercept b. To support the following region growing, the neighbouring relations of the line segments were also saved in the record, including the left and right line segments in the same row and the upper and bottom ones in the previous and next rows. The left line segment was the one whose last pixel was (x_s−1, y_i), and the right line segment was the one whose first pixel was (x_e+1, y_i). The number of neighbours in the previous or next row was not constant: all line segments in the previous or next row that contained pixels between x_s and x_e were neighbours of s. Two pointers were needed to locate the first and the last neighbour in the previous or next row, so in the end six pointers were stored to find all the neighbours.
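The scanline splitting described above can be sketched as a short recursive routine. The names and the return format (index pairs into the row) are our own, and the neighbour bookkeeping and slope/intercept records are omitted:

```python
def split_scanline(z, threshold=1.0):
    """Split one row of depth values into straight segments (Sec. 2.4).

    Recursively splits at the point whose z value is furthest, measured
    along the z-axis rather than perpendicularly, from the chord joining
    the two end points.  Returns a list of (x_start, x_end) index pairs.
    """
    segments = []

    def recurse(lo, hi):
        if hi - lo < 2:                    # nothing between the end points
            segments.append((lo, hi))
            return
        k = (z[hi] - z[lo]) / (hi - lo)    # chord through the two end points
        worst_x, worst_d = lo, 0.0
        for x in range(lo + 1, hi):
            d = abs(z[x] - (z[lo] + k * (x - lo)))   # |z1 - z(x1, yi)|
            if d > worst_d:
                worst_x, worst_d = x, d
        if worst_d <= threshold:
            segments.append((lo, hi))
        else:
            recurse(lo, worst_x)
            recurse(worst_x, hi)

    recurse(0, len(z) - 1)
    return segments
```

On a row containing two linear pieces the recursion splits once at the kink and then terminates, since the z-axis residual inside each piece is zero.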
2.5 Seed region selection
A planar surface S could be described by a first-order polynomial:
z=Ax+By+C (3)
If the pixels in the i-th row were scanned, pixels belonging to S were
z = Ax + By_i + C = Ax + b_i,  b_i = By_i + C  (4)
It was obvious that they formed a straight line in the x-z plane. The straight lines of different scanned rows had the same slope but different intercepts. In other words, all pixels on one straight line in the x-z plane must belong to the same plane. Some line segments were now chosen as seed regions for the following growing process. In general, a plane is decided by at least two lines. For reliability, three line segments in three adjoining rows were chosen to decide the seed plane. Because short line segments might lie near an edge, the lengths of the three line segments were limited: each was demanded to be larger than a given threshold.
After checking the lengths, the optimal three line segments were chosen as the seed regions from the survivors. The optimality criterion was that the plane decided by the three line segments had the least error; in other words, the three line segments were closest to lying in the same plane. From the above, the pixels in the (i+1)-th and (i+2)-th rows belonging to S satisfied

z = Ax + By_{i+1} + C = Ax + b_{i+1}  (5)

z = Ax + By_{i+2} + C = Ax + b_{i+2}  (6)
If the three line segments were in the same plane, they had the same slope but different intercepts, and the difference between the intercepts of adjacent rows was always B. Therefore, it could be tested whether the slopes of the line segments and the differences of their intercepts were close, and the closest groups were selected as seed regions. Each line segment was expressed by its stored slope and intercept:

z = k_j x + b_j  (7)

For three coplanar segments, the slopes of the i-th, (i+1)-th and (i+2)-th line segments satisfied

k_i = k_{i+1} = k_{i+2} = A  (8)

and their intercepts were

b_j = By_j + C,  j = i, i+1, i+2  (9)

So, Δb_i was defined as the difference of intercepts between adjacent rows:

Δb_i = b_{i+1} − b_i = B(y_{i+1} − y_i) = B  (10)

A candidate group was inserted into the queue of seed regions, ordered by the following error, when the error did not exceed a given threshold ε:

E = |k_i − k_{i+1}| + |k_{i+1} − k_{i+2}| + |Δb_i − Δb_{i+1}| ≤ ε  (11)
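The coplanarity test on a group of three segments from adjacent rows (equal slopes, constant intercept step) can be sketched as below. The combined score and the threshold name are assumptions, since the paper does not spell out the exact form of its criterion:

```python
def seed_error(seg_i, seg_j, seg_k):
    """Coplanarity error of three segments from adjacent rows.

    Each segment is a (slope, intercept) pair in the x-z plane.  Coplanar
    segments share one slope (Eq. (8)), and the intercept step between
    adjacent rows is the constant B (Eq. (10)), so both terms vanish.
    """
    (ki, bi), (kj, bj), (kk, bk) = seg_i, seg_j, seg_k
    slope_term = abs(ki - kj) + abs(kj - kk)
    step_term = abs((bj - bi) - (bk - bj))   # difference of intercept steps
    return slope_term + step_term

def is_seed(segs, eps=0.1):
    """Accept a candidate seed group when its error is within eps."""
    return seed_error(*segs) <= eps
```

For the plane z = 2x + 3y + 1 sampled at rows y = 0, 1, 2, the segments are (2, 1), (2, 4), (2, 7): equal slopes and a constant intercept step of 3, so the error is zero.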
2.6 Region growing
After the queue of seed regions had been built, the j-th record was deleted from the queue as the seed region R_j. If R_j did not yet belong to any plane, the growing process started around it. All line segments of the region were stored in a list. At the beginning of the k-th iteration, all line segments in the current list L_k were considered. For each line segment in L_k, all of its neighbours were searched; each neighbour that was still unlabelled and was decided to belong to the current plane was added to the region, the processed segment was deleted from L_k, and the accepted neighbours were inserted into the list L_{k+1} for the next iteration. The plane fitted to all line segments of the region was adjusted before a new iteration began. At the k-th iteration, the plane fitting the region was described as

z = A_k x + B_k y + C_k  (12)

A_k, B_k and C_k were decided by keeping the error least, and the error was given by

E_k = Σ_{(x, y)∈region} (A_k x + B_k y + C_k − z(x, y))²  (13)

For every neighbouring line segment, whether it should be added to the current region was decided by

|A_k x + B_k y + C_k − z(x, y)| ≤ θ  (14)

where θ is the threshold. Since the maximal error occurred at the leftmost pixel (x_l, y_0) and the rightmost pixel (x_r, y_0) of the segment, it was sufficient to test

|A_k x_l + B_k y_0 + C_k − z(x_l, y_0)| ≤ θ and |A_k x_r + B_k y_0 + C_k − z(x_r, y_0)| ≤ θ  (15)
The iteration terminated when no new line segment was added to the plane region. The borders of the region were saved in the list. Then, the next seed region was deleted from the queue and tested to see whether it already belonged to some plane; if not, a new growing process began around it. When the seed queue was empty, the region growing finished.
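A simplified sketch of the growing loop, assuming segments are keyed by id with precomputed neighbour links and pixel lists; the end-point residual test follows the description above, while the data layout and refit schedule are our own simplification:

```python
import numpy as np
from collections import deque

def fit_plane(pts):
    """Least-squares plane z = Ax + By + C through (x, y, z) points (Eq. (13))."""
    pts = np.asarray(pts, dtype=float)
    M = np.column_stack([pts[:, 0], pts[:, 1], np.ones(len(pts))])
    (A, B, C), *_ = np.linalg.lstsq(M, pts[:, 2], rcond=None)
    return A, B, C

def grow_region(seed_ids, neighbors, pixels_of, theta, labels, plane_id):
    """Grow one planar region around a seed (Sec. 2.6).

    neighbors : dict segment id -> ids of neighbouring segments
    pixels_of : dict segment id -> (x, y, z) pixels, ordered left to right
    labels    : dict segment id -> plane id (mutated in place)
    Only the two end pixels of a candidate are tested (Eq. (15)),
    since the largest residual occurs at the leftmost/rightmost pixel.
    """
    region = list(seed_ids)
    frontier = deque(seed_ids)
    for s in seed_ids:
        labels[s] = plane_id
    while frontier:
        # refit the plane to the whole region before examining more neighbours
        A, B, C = fit_plane([p for s in region for p in pixels_of[s]])
        s = frontier.popleft()
        for n in neighbors.get(s, ()):
            if n in labels:                 # already assigned to some plane
                continue
            ends = (pixels_of[n][0], pixels_of[n][-1])
            if all(abs(A * x + B * y + C - z) <= theta for x, y, z in ends):
                labels[n] = plane_id
                region.append(n)
                frontier.append(n)
    return region
```

Seeding with three rows matters: a single scanline leaves the B coefficient unconstrained, so a one-row seed could not reject neighbours that drift in y.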
2.7 Trimming plane edge
The edges of the planes generated by the above algorithm were saved in lists of line segments. The lengths of the line segments in a list were not consistent, so the edges appeared irregular. In order to make the edges smooth, the line segments needed to be trimmed. For every line segment s in a list of edges, if its left neighbour was not labelled with the same plane number, it was checked whether the leftmost pixel of s fitted its left neighbour's plane better; if so, the pixel was deleted from s and added to its left neighbour, and the check continued with the next leftmost pixel until no pixel was moved. In the same way, if the right neighbour of s was not labelled with the same plane number, the rightmost pixel was tested to see whether it fitted the right neighbour's plane better; if so, the pixel was deleted from s and added to the right neighbour, and the check continued with the next rightmost pixel.
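The left-edge half of this trimming step can be sketched as follows; the residual comparison |Ax + By + C − z| as the "fits better" test is an assumption, and the function name and return format are ours (the right edge is symmetric):

```python
def trim_left_edge(pixels, own_plane, left_plane):
    """Reassign boundary pixels between two adjacent segments (Sec. 2.7).

    pixels     : list of (x, y, z) for segment s, ordered left to right
    own_plane  : (A, B, C) of the plane s belongs to
    left_plane : (A, B, C) of the left neighbour's plane
    Returns (kept, moved): pixels staying on s, and pixels handed over
    to the left neighbour because its plane fits them better.
    """
    def err(plane, p):
        A, B, C = plane
        x, y, z = p
        return abs(A * x + B * y + C - z)

    moved = []
    # walk inward from the leftmost pixel until s's own plane wins
    while pixels and err(left_plane, pixels[0]) < err(own_plane, pixels[0]):
        moved.append(pixels.pop(0))
    return pixels, moved
```

Because the scan stops at the first pixel that its own plane explains better, each segment boundary shifts by a contiguous run of pixels, which is what straightens the edge.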
3 Experiments
Planes were extracted to build the environment map for the navigation of humanoid robots. Therefore, only planes larger than the size of the robot's foot were considered, and the minimum length of a line segment in seed regions was set to 100 pixels in the experiment. Two types of model were chosen. One was the pipe model, which had two planes (see Fig.2), and the other was the stairs model, which had six planes (see Fig.3). The original images were acquired by two CCD cameras, and one image of each pair, with 640×480 pixels, is shown in Fig.2(a) and Fig.3(a). The whole algorithm was implemented on an Intel Core 3.3 GHz CPU. Through the stereo vision processing, the depth images were obtained (shown in Fig.2(b) and Fig.3(b)). After the seed regions were selected, the regions grew iteratively and the edges of the planes were trimmed. The ultimate edges of the planes are shown in Fig.2(c) and Fig.3(c). In Fig.2(d) and Fig.3(d), the edges of the extracted planes are drawn on the original images, which confirms that the extracted planes fit the original images.
Fig.2 Results of plane extraction from pipe model: (a) One of a pair of original images; (b) Depth image obtained; (c) Extracted planes; (d) Planes on original image
Fig.3 Results of plane extraction from stairs model: (a) One of a pair of original images; (b) Depth image obtained; (c) Extracted planes; (d) Planes on original image
The noise estimation of the depth image, σimg, the number of line segments after splitting and the time consumed by the algorithm are listed in Table 1. From the results, it can be seen that the speed of the plane extraction algorithm was acceptable. In addition, a large portion (70% or even more) of the running time was devoted to obtaining the depth image and splitting the line segments; therefore, the speed of the algorithm would be increased greatly if a quicker stereo matching algorithm were chosen. The efficiency of the algorithm was also largely affected by the result of the stereo matching algorithm: the accuracy of extracting all the planes was higher than 90% when more than 85% of the pixels in the original images were matched. In short, it was important to select a quick and efficient stereo matching algorithm to obtain the depth image as a part of the whole algorithm.
Table 1 Parameter and result of algorithm
4 Conclusions
1) In order to help humanoid robots build the environment map, a method of extracting planes was proposed. A pair of original images was acquired by two CCD cameras, and through stereo matching the depth image was obtained. A median filter was adopted to reduce the noise level of the depth image. The noise variance of the image was estimated by averaging the estimates of the noise variance at each pixel, for the sake of setting the threshold. The pixels of each line were scanned and iteratively split into two parts at the point furthest away, along the z-axis, from the chord between the two end points. The seed regions were selected from groups of three line segments. The queue storing seed regions was ordered by the optimality criterion, and then the plane region was expanded around each seed region. The process of region growing continued until the queue of seed regions was empty. After trimming, the borders of the planes became smooth. In the end, the edges of the extracted planes were obtained.
2) Two types of model, a pipe and stairs, were adopted in the experiment. Two planes in the pipe model and six planes in the stairs model were extracted exactly. The algorithm satisfied the demands of humanoid robot navigation in terms of speed and precision. A plane would be flagged as level if its normal direction was perpendicular to the robot's reference surface, and this information could then be used for building the surrounding map.
References
[1] GUTMANN J S, FUKUCHI M, FUJITA M. Stair climbing for humanoid robots using stereo vision [C]// Int Conf on Intelligent Robots and Systems (IROS). Sendai, Japan, 2004: 1407-1413.
[2] KANEHIRO F, HIRUKAWA H, KANEKO K, KAJITA S, FUJIWARA K, HARADA K, YOKOI K. Locomotion planning of humanoid robots to pass through narrow spaces [C]// Int Conf on Robotics and Automation (ICRA). New Orleans, 2004: 604-609.
[3] GUAN Y, YOKOI K, SIAN N E, TANIE K. Feasibility of humanoid robots stepping over obstacles [C]// Int Conf on Intelligent Robots and Systems (IROS). Sendai, Japan, 2004: 130-135.
[4] CHESTNUTT J, LAU M, CHEUNG G, KUFFNER J, HODGINS J, KANADE T. Footstep planning for the Honda ASIMO humanoid [C]// Proceedings of the IEEE International Conference on Robotics and Automation(ICRA’05). Piscataway, NJ, USA, 2005: 629-634.
[5] KUFFNER J J, KAGAMI S. Dynamically-stable motion planning for humanoid robots [J]. Autonomous Robots, 2002, 12(1): 105-118.
[6] ZONG Ying, SHI Wen, LI Xu. Optimal sagittal gait with ZMP stability during complete walking cycle for humanoid robots [J]. Journal of Control Theory and Applications, 2007, 5(2): 133-138.
[7] ZOU Xiao-bing, CAI Zi-xing, SUN Guo-rung. Non-smooth environment modeling and global path planning for mobile robots [J]. Journal of Central South University of Technology, 2003, 10(3): 248-254.
[8] CHEN Xi, TAN Guan-zheng, JIANG Bin. Real-time optimal path planning for mobile robots based on immune genetic algorithm [J]. Journal of Central South University of Technology: Science and Technology, 2008, 39(3): 577-583. (in Chinese)
[9] GUTMANN J S, FUKUCHI M, FUJITA M. Real-time path planning for humanoid robot navigation [C]// Int Joint Conference on Artificial Intelligence. San Francisco, CA, USA, 2005: 1232-1237.
[10] ZHANG Tong, XIAO Nan-feng. Real-time map building for path planning of a humanoid robot [C]// Asia-Pacific Conference on Information Processing, 2009: 211-214.
[11] OKADA K, KAGAMI S, INABA M, INOUE H. Plane segment finder: Algorithm, implementation and applications [C]// Proceedings of the 2001 IEEE International Conference on Robotics & Automation. Seoul, Korea, 2001: 2120-2125.
[12] CHEN Ze-zhi, PEARS N E, LIANG Bo-jian, McDERMID J. Plane segmentation from two views in reciprocal-polar image space [C]// International Conference on Image Analysis and Recognition. Porto, Portugal, 2004: 638-646.
[13] LUD C, SU L, ZHU F, SHI Z. A general method for omnidirectional stereo camera calibration based on neural network optimization [J]. Lecture Notes in Computer Science, 2006, 3972: 383-389.
[14] BANKS J, CORKE P. Quantitative evaluation of matching methods and validity measures for stereo vision [J]. The International Journal of Robotics Research, 2001, 18(3): 512-532.
[15] KAGAMI S, TAKAOKA Y, KIDA Y, NISHIWAKI K, KANADE T. Online dense local 3D world reconstruction from stereo image sequences [C]// Proc of the IEEE/RSJ Int Conf on Intelligent Robots and Systems (IROS’05). Edmonton, Canada, 2005: 2999-3004.
(Edited by YANG Bing)
Foundation item: Project(60776816) supported by the National Natural Science Foundation of China and Civil Aviation Administration of China; Project (8251064101000005) supported by the Natural Science Foundation of Guangdong Province, China
Received date: 2010-03-05; Accepted date: 2010-07-05
Corresponding author: ZHANG Tong, PhD; Tel: +86-13570128329; E-mail: susanzhangtong@126.com