Performance evaluation of 3D computer vision techniques

A. M. Kabayama (E-mail: alfred@mec.ita.cta.br); L. G. Trabasso

Instituto Tecnológico de Aeronáutica, Divisão de Engenharia Mecânica-Aeronáutica, 12.228-900 São José dos Campos, SP, Brazil. E-mail: gonzaga@mec.ita.cta.br

ABSTRACT

This work presents the implementation and comparison of three different techniques of three-dimensional computer vision as follows:

• Stereo vision - correlation between two 2D images;

• Sensorial fusion - use of different sensors: a 2D camera + an ultrasound sensor (1D);

• Structured light.

The computer vision techniques herein presented took into consideration the following characteristics:

• Computational effort (elapsed time to obtain the 3D information);

• Influence of environmental conditions (noise due to non-uniform lighting, overlighting and shadows);

• The cost of the infrastructure for each technique;

• Analysis of uncertainties, precision and accuracy.

The Matlab software, version 5.1, was chosen for the implementation of the algorithms of the three techniques due to the simplicity of its commands, programming and debugging. Besides, this software is well known and widely used by the academic community, allowing the results of this work to be reproduced and verified. Examples of three-dimensional vision applied to robotic assembly ("pick-and-place") tasks are presented.

Keywords: computer vision, range finders, robotics, mechatronics

Introduction

Because of the increasing use of robots in industry, robotics has become an engineering field with an identity of its own. The advance of sensor technologies and their decreasing prices have allowed the construction of robots with more feedback capabilities about their workspace. As a consequence, their positioning accuracy, operation speed and functional flexibility have increased. It is generally accepted that computer vision is the most powerful and flexible way to provide robots with feedback about the environment they interact with, and considerable efforts and resources have been spent on research and development, according to Ruocco (1987).

The determination of three-dimensional data from two-dimensional images is very important in the computer vision field, and one way to accomplish it is by using stereo vision techniques. To recover a three-dimensional scene from a pair of stereo images, two main problems must be solved. The first and more complex one is called 'stereo matching', that is, the establishment of a correlation between two images of the same object taken from two different points of view. A projected point in the first image must be matched to the point in the second image that is the projection of the same point of the three-dimensional world. The second problem is the geometric calculation of the three-dimensional co-ordinates from the pairs of correlated points in both images, using triangulation techniques, as shown in Fu (1987). Most of the current stereo vision techniques use area-based algorithms. These algorithms split both images into a number of subregions, and a degree of photometric similarity is used to establish the matching between the respective subregions.
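As a minimal illustration of the area-based idea, the sketch below compares a fixed window of the first image with windows of the second image shifted along the same row and keeps the shift with the smallest sum of squared differences (SSD). The window half-size, the search range and the SSD criterion are illustrative assumptions, not the actual routines used in this work, and image borders are not handled.

    function d = ssd_disparity(imL, imR, row, col, win, maxd)
    % Area-based matching sketch: for the pixel (row, col) of the first image,
    % find the horizontal shift d (in pixels) whose window in the second image
    % minimises the sum of squared differences with the reference window.
    % win and maxd are illustrative values (e.g. win = 5, maxd = 30).
    refWin = double(imL(row-win:row+win, col-win:col+win));
    best   = Inf;
    d      = 0;
    for cand = 0:maxd
        candWin = double(imR(row-win:row+win, col-win+cand:col+win+cand));
        ssd = sum(sum((refWin - candWin).^2));   % photometric similarity measure
        if ssd < best
            best = ssd;
            d    = cand;
        end
    end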

The sensorial fusion technique uses two types of sensors: a camera (two-dimensional sensor) and a range finder (one-dimensional sensor). This technique uses the computer vision capabilities to perform two-dimensional measurements such as the co-ordinates of the centre of area of an object, as well as its length, width and orientation. Then, the range finder completes the missing information by measuring the object's height at its centre of area. The length, width and orientation information is used to calculate the attitude of a robotic manipulator and allows it to grasp an object in a typical robotic pick-and-place task.
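A minimal sketch of these two-dimensional measurements is given below, assuming the object has already been segmented into a binary image; the actual segmentation and measurement routines of this work are not reproduced here.

    function [cx, cy, len, wid, theta] = blob_features(bw)
    % Sensorial fusion sketch (2D part): centre of area, extents and orientation
    % of a segmented object.  bw is a binary image (1 = object pixel); the
    % outputs are in pixels and radians.
    [r, c] = find(bw);               % co-ordinates of the object pixels
    cx  = mean(c);                   % centre of area (column)
    cy  = mean(r);                   % centre of area (row)
    len = max(c) - min(c) + 1;       % extent along the image x direction
    wid = max(r) - min(r) + 1;       % extent along the image y direction
    mu20  = mean((c - cx).^2);       % second-order central moments
    mu02  = mean((r - cy).^2);
    mu11  = mean((c - cx).*(r - cy));
    theta = 0.5 * atan2(2*mu11, mu20 - mu02);   % orientation of the principal axis

The range finder then supplies the missing height at the centre of area (cx, cy), completing the 3D information used by the manipulator.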

Structured lighting is used mainly in inspection and quality control applications. This method is based on the extraction of geometric information encoded in a two-dimensional image. The object height information is extracted by measuring the deformation of a pattern of light projected onto its surface. As the model of this deformation is known, it is possible to recover the object's height. All three 3D computer vision techniques are described in detail in the following sections.

Stereo Vision

The stereo vision system herein presented was designed for a robotic 'pick-and-place' application, as shown schematically in Fig. 1.


The stereo image grabbing process takes place in two steps, with the camera filming the scene from a top view. After the first image is grabbed, the camera is moved away (within a 0.5 to 1.5 cm range) by a displacement mechanism driven by a step motor. After this displacement, the second image is grabbed.

Stereo Vision Implementation

The development and testing of the algorithm were carried out in four steps.

The first step was the development of the image correlation routines using MatLab® version 5.1 and synthetic images generated by 3D Studio, version 4.0. At the beginning, the work was conducted with lower resolution images (200x300 pixels) to make the routine development process faster, due to the huge computational effort involved in this technique. As soon as the correlation algorithm parameters were optimised and settled to provide a good performance, the second step took place, in which real images, grabbed by the camera with 512x512 pixel resolution and 64 grey levels, replaced the synthetic ones.

The third step was the development of a process for calibrating the intrinsic camera lens parameters. The most important parameter is the focal distance, which is used to recover the object's height information.

The fourth step was the recovery of three-dimensional data about the scene from the 2D images. From the first image, which is always grabbed at the initial position, information about the objects in the scene, such as length and width, is obtained in pixels. To recover metric information about the objects, it is necessary to determine the relationship between the metric and pixel scales.

The object height information is calculated through a simple triangulation technique based on the geometric optics model of the stereo image configuration, as shown in Fu (1987).

Image Correlation Process Using Relaxation Labelling

'Relaxation labelling processes are iterative procedures heuristically developed to solve certain constraint satisfaction problems, which have become a standard technique in computer vision and pattern recognition', according to Pelillo (1994). The search for the result is a coarse-to-fine process, which ends when the iterative procedure reaches the required error. These algorithms are associated with an 'energy' function, quantifying the degree of violation of the constraints, which is minimised as the process evolves.
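For illustration, one version of the generic relaxation labelling update (in the classical Rosenfeld-Hummel-Zucker form) is sketched below; it is not the exact formulation of Feris (1998), and the compatibility coefficients r are assumed to be given.

    function p = relax_label(p, r, niter)
    % Generic relaxation labelling sketch.
    % p : n-by-m matrix, p(i,k) = probability of label k for object i
    % r : n-by-m-by-n-by-m compatibility coefficients r(i,k,j,l) in [-1,1]
    % niter : number of iterations (a tolerance test would be used in practice)
    [n, m] = size(p);
    for it = 1:niter
        q = zeros(n, m);
        for i = 1:n
            for k = 1:m
                s = 0;
                for j = 1:n
                    for l = 1:m
                        s = s + r(i,k,j,l) * p(j,l);   % support from the other objects
                    end
                end
                q(i,k) = s / n;
            end
        end
        p = p .* (1 + q);                    % reinforce well-supported labels
        p = p ./ (sum(p, 2) * ones(1, m));   % renormalise each row
    end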

The stereo matching algorithm implemented in this work was proposed by Feris (1998).

Camera Model and Camera Calibration

The camera model adopted is the 'pinhole' model, as shown in Fu (1987) and in Nalwa (1993): the projection of a three-dimensional object onto a two-dimensional surface is achieved through straight lines that pass through a single point called the 'projection centre'. Camera calibration is the determination of all its inner geometric and optical features; these inner features are called intrinsic parameters. Camera calibration also means the determination of its position and orientation relative to the world co-ordinate system; these features are called extrinsic parameters. Laudares (1997) presents in detail an extrinsic camera calibration process, which is quite suitable for the robotic application proposed in this work. The most important intrinsic camera parameter is the focal distance λ, which is the distance between the centre of the lens and the image sensor plane.
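For reference, the pinhole projection itself can be sketched as follows, assuming the camera frame has its origin at the projection centre and the optical axis along Z; this is the textbook form of the model, not a reproduction of the calibration procedures cited above.

    function [x, y] = pinhole_project(X, Y, Z, lambda)
    % Pinhole model sketch: a point (X, Y, Z) expressed in the camera frame
    % maps onto the image plane placed at the focal distance lambda.
    x = lambda * X ./ Z;
    y = lambda * Y ./ Z;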

3D Object Height Recovery (Triangulation)

The following conditions must be met for the model shown in Fig. 2:


• The cameras are identical;

• Both image co-ordinate systems are perfectly aligned, differing only in the origin location;

• The Z co-ordinate is the same for both images.

According to Fu (1987), the depth information (Z co-ordinate) is recovered from the disparity between the corresponding image points through triangulation (a code sketch is given after the list below), where:

• λ is the focal distance, estimated by an experimental calibration process;

• x2 and x1 are co-ordinates in pixels, which must be converted to metric scale by multiplying them by a scale factor that relates the sensor size in the x direction, in meters, to the sensor resolution, in pixels, in the same direction;

• B is the baseline, that is, the displacement between the two optical centres.
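A minimal sketch of this depth calculation is given below; it assumes the usual similar-triangles relation Z = λB/d, with d the disparity in metric units, whereas the exact expression and sign conventions of the original depend on the adopted co-ordinate frames.

    function Z = stereo_depth(x1_px, x2_px, lambda, B, sx)
    % Triangulation sketch for the stereo configuration described above.
    % x1_px, x2_px : corresponding x co-ordinates in pixels
    % lambda       : focal distance, from the calibration process (m)
    % B            : baseline, displacement between the optical centres (m)
    % sx           : metric size of one pixel in the x direction (m/pixel)
    % The simple relation Z = lambda*B/d is assumed here; offsets and signs
    % depend on where the co-ordinate origins are placed.
    d = abs(x2_px - x1_px) * sx;   % disparity converted to metric scale
    Z = lambda * B / d;            % depth of the correlated point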

Some improvements to the Feris (1998) technique were included in order to increase the algorithm performance and ease the correlation process, as shown in Kabayama (1999).

Further information about the focal distance and scale factor calibration procedures and their results can be found in Kabayama (1999).

Table 1 shows the results of some object height measurements using a 30 mm baseline displacement.

Disparity is the difference between the respective x co-ordinates in both images, and 'matches established' is the number of correlated points.

Sensorial Fusion

The conception of the sensorial fusion technique for the 3D machine vision system is shown in Fig. 3.


The sensor used was the Honeywell® 946-A4V-2D-2CO-175E ultrasound sensor, which has the following features:

• Minimum range: 200 mm (programmable);

• Maximum range: 2000 mm (programmable);

• Weight: 0.210 kg.

The sensor provides an analogue voltage output proportional to the distance to be measured. This proportionality can be direct or inverse, depending on how the sensor is programmed (rising or falling mode). The sensor curves relating the output voltage variation to the distance were determined using both proportional modes for different range programs. The results showed that the ultrasound sensor has a linear behaviour in all modes, which is an important and desirable feature. The respective static calibration coefficients of each curve were calculated; they are necessary to establish the relationship between the output voltage and the measured distance, and to evaluate the sensitivity of the programmed mode to noise as well as its resolution.
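As a sketch of how such a static calibration can be applied, the fragment below fits a straight line to (voltage, distance) pairs and converts a new reading into a distance; the numeric values are hypothetical placeholders, not the coefficients obtained in this work.

    % Hypothetical calibration pairs for one programmed mode (rising):
    v_meas = [1.0 2.0 3.0 4.0 5.0];        % sensor output voltage (V)
    d_meas = [400 800 1200 1600 2000];     % reference distances (mm)
    coef   = polyfit(v_meas, d_meas, 1);   % coef(1) = sensitivity, coef(2) = offset
    v_new  = 2.7;                          % a new voltage reading (V)
    d_new  = polyval(coef, v_new);         % estimated distance (mm)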

As for the ultrasound beam characteristics, shown in Fig. 4, it was necessary to study the radial profile; the results are shown in Table 2.


The distances shown in Table 2 refer to the object top.

The determination of the ultrasound beam diameter at a given level was performed experimentally: a sheet of grid paper was fixed on the surface of the testing table and the sensor was positioned at a certain range from it, aimed towards it. An object was moved on this surface towards the spot the sensor was pointing at. As soon as the object was detected, the position where that happened was marked. This procedure was repeated until a complete ultrasound profile at that level was determined. The entire process was repeated for other levels, as shown in Table 2.

From the knowledge of the sensor features, it is possible to estimate the minimum size of an object that can be manipulated using this technique. For example, at a 40 cm range, the object size must be at least 16 cm, since the objects cannot be smaller than the diameter of the ultrasound beam at the object's top. Besides, the object material should not absorb the ultrasound waves, and the object's top must be perpendicular to the direction along which the ultrasound beam reaches it.

Structured Lighting

Two different lighting patterns were studied to evaluate the accuracy of this technique and check whether it is suitable for a pick-and-place robotic application.

The first pattern studied was a source of laser light from a presentation pointer device. An external DC power source was adapted to the device in order to avoid the decrease in light intensity caused by battery discharge. Fig. 5 shows a general view of the experiment using the laser light source and its respective model [Galbiati (1990)] used to recover the object height h.


The solid line and the dotted line shown in Fig. 5 represent two distinct situations in the data acquisition process for recovering the object height h.

The scene is filmed twice. In the first shot, represented by the dotted line, the object is out of the scene and P1 is the position of the centre of area of the laser spot where it touches the ground. In the second shot, represented by the solid line, the object is in the scene and P2 is the position of the centre of area of the laser spot where it touches the top of the object. The centre of area of the laser spot is determined by the computer vision system in both situations. P3 is the projection of P2 onto the horizontal plane. The laser beam reaches the ground at an angle θ, and d is the distance, in pixels, between the centres of area P1 and P2. The object height h is determined by a simple trigonometric relation (see Fig. 5 and the sketch after the list below), where:

d ∈ Z+;

s ∈ R+ is an experimental conversion factor between centimeters and pixels;

θ ∈ R+ is the angle between the P1P2 and P1P3 line segments.
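A minimal sketch of the height recovery implied by this geometry is given below; it assumes the relation h = s·d·tan(θ) for the triangle P1-P3-P2 described above, with the angle expressed in degrees.

    function h = laser_height(d_px, s, theta_deg)
    % Structured lighting height sketch: s*d_px is the horizontal displacement
    % of the laser spot (cm) and theta_deg is the angle between the beam and
    % the ground, so h = s*d*tan(theta).  This form is assumed from the
    % geometry described in the text.
    h = s * d_px * tan(theta_deg * pi / 180);   % object height (cm)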

Implementation

The first step was the determination of the conversion factor s. An A4 sheet of paper was used as a standard object and shot five times. Then, the object size in pixels was measured in each image. The conversion factor s results from the average ratio between the standard object size measured in centimeters and in pixels, as sketched below. It is important to ensure that this conversion factor is determined along the same direction as the distance variation, because the camera pixels are not square.

The s factor determined was:

s = 21.1 / 387 (cm / pixel)
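The fragment below sketches this averaging step; the pixel measurements are hypothetical placeholders around the value reported above.

    size_cm = 21.1;                      % measured size of the standard object (cm)
    size_px = [386 388 387 385 389];     % sizes in pixels in the five shots (placeholders)
    s = mean(size_cm ./ size_px);        % conversion factor (cm/pixel)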

The second step was the θ angle calibration procedure. Five different objects with known heights h were used as calibration standards and shot five times. Then, using Eq. (3), the respective angles θ were calculated for each measured distance d.
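A sketch of this calibration step is shown below, assuming Eq. (3) is the inversion of the height relation, θ = arctan(h / (s·d)); the standards' values are hypothetical placeholders.

    s = 21.1 / 387;                       % conversion factor (cm/pixel), from above
    h_std = [2.0 4.0 6.0 8.0 10.0];       % known heights of the standards (cm), placeholders
    d_std = [55 110 166 221 276];         % measured distances d (pixels), placeholders
    theta = atan(h_std ./ (s * d_std));   % one angle estimate per standard (rad)
    theta_deg = mean(theta) * 180 / pi;   % average calibration angle (degrees)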

The θ angle calibration results are shown in Table 3.

After completing the s and θ calibration processes, the 3D structured lighting computer vision system using the laser light source was ready to be operated. Different object heights were measured and the results are shown in Table 4.

The second pattern used in this experiment was a single stripe projected by an overhead projector. Fig. 6 shows a general view of the experiment and Fig. 7 shows a detail of the projected pattern.



In this case, the recovery of the object height information is similar to the laser case, using the same principles and equations. The difference is that the stripe pattern leads the computer vision system to recognise three objects, as shown in Fig. 7. Due to digitalisation errors, the alignment of objects O2 and O3 cannot always be obtained. Because of this, the distance d is taken as the average of the distance between O2 and O1 and the distance between O3 and O1.
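The fragment below sketches this averaging with hypothetical centre-of-area co-ordinates for O1, O2 and O3.

    c1 = [120 240];                  % centre of area of O1 (pixels), placeholder
    c2 = [180 238];                  % centre of area of O2 (pixels), placeholder
    c3 = [182 243];                  % centre of area of O3 (pixels), placeholder
    d12 = sqrt(sum((c2 - c1).^2));   % distance between O2 and O1
    d13 = sqrt(sum((c3 - c1).^2));   % distance between O3 and O1
    d = (d12 + d13) / 2;             % average distance used in the height relation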

The conversion factor s used is the same as in the previous experiment. The θ angle calibration process was repeated using five standard objects. The results are shown in Table 5.

After completing the s and θ calibration processes, the 3D structured lighting computer vision system using the single stripe pattern was ready to use. Different object heights were measured using Eq. (2). Some results are shown in Table 6.

Analysis and Conclusions

Computer vision is a field with great potential to be exploited, and this work shows that there are still many subjects to be tested and improved. Despite all the literature available about the techniques covered in this work, only the practical part shows the researcher the difficulties involved in their implementation and allows an evaluation of which assumptions and measurements can be made.

The errors obtained in the object measurements in the stereo vision and structured lighting experiments are acceptable for typical pick-and-place applications because of the compliance compensation of the robot end effector, even considering the reported worst case, which was a 1 cm error.

Table 7 presents the analysis and conclusions compiled from the experimental data and from the difficulties faced during the implementation of each technique.

References

  • Feris, R.S. & Lages, W.F., 1998, Casamento de Imagens utilizando Correlação e Rotulação Relaxada, Anais do IV ENCITA - Encontro de Iniciação Científica e Pós-Graduação do ITA.
  • Fu, K.S., Gonzalez, R.C. & Lee, C.S.G., 1987, Robotics, McGraw-Hill.
  • Galbiati, L.J., 1990, Machine Vision and Digital Image Processing Fundamentals, Prentice Hall.
  • Kabayama, A.M., 1999, Implementação e Análise de Técnicas de Fusão Sensorial Aplicadas à Robótica, Tese de Mestrado, ITA.
  • Laudares, D., 1997, Procedimento Automático para Calibração de Sistemas de Visão Robótica para Operações Pick-and-Place, Tese de Mestrado, ITA.
  • Nalwa, V.S., 1993, A Guided Tour of Computer Vision, Addison-Wesley Publishing Company.
  • Pelillo, M., 1994, On Dynamics of Relaxation Labeling Processes, IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Ruocco, S.R., 1987, Robot Sensors and Transducers, Open University Press Robotics Series, edited by P.G. Davey.