
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 17, NO. 9, SEPTEMBER 1995

Active/Dynamic Stereo Vision

Enrico Grosso and Massimo Tistarelli

Abstract—Visual navigation is a challenging issue in automated robot control. In many robot applications, like object manipulation in hazardous environments or autonomous locomotion, it is necessary to automatically detect and avoid obstacles while planning a safe trajectory. In this context the detection of corridors of free space along the robot trajectory is a very important capability which requires nontrivial visual processing. In most cases it is possible to take advantage of the active control of the cameras.

In this paper we propose a cooperative schema in which motion and stereo vision are used to infer scene structure and determine free space areas. Binocular disparity, computed on several stereo images over time, is combined with optical flow from the same sequence to obtain a relative-depth map of the scene. Both the time-to-impact and depth scaled by the distance of the camera from the fixation point in space are considered as good relative measurements which are based on the viewer, but centered on the environment.

The need for calibrated parameters is considerably reduced by using an active control strategy. The cameras track a point in space independently of the robot motion, and the full rotation of the head, which includes the unknown robot motion, is derived from binocular image data.

The feasibility of the approach in real robotic applications is demonstrated by several experiments performed on real image data acquired from an autonomous vehicle and a prototype camera head.

Index Terms—Active vision, dynamic vision, time-to-impact, stereo vision, motion analysis, navigation.

Manuscript received Apr. 13, 1993; revised May 22, 1995. The authors are with the University of Genoa, Department of Communication, Computer, and Systems Science, Integrated Laboratory for Advanced Robotics (LIRA-Lab), Via Opera Pia 13, 16145 Genoa, Italy; e-mail: {grenri,tista}@dist.unige.it. IEEECS Log Number P95123.

I. INTRODUCTION

A robot with intelligent behavior must be capable of coping with unprecise situations. However, this capability does not always imply very sophisticated high-level reasoning capabilities. For example, collecting soda cans in an unpredictable indoor environment and putting them in a predefined place is a kind of intelligent task which has been demonstrated to be solvable by means of basic sensory capabilities coordinated by reflex-type (or insect-like) behaviors [1], [2]. Even though more sophisticated visual and reasoning processes can be envisaged, still many other, even complex, operations can be performed relying on reflexes to visual stimuli [3], [4].

One of the most interesting and useful aspects of the active vision paradigm is the use of motion (and in general the ability of the observer to act and interact with the environment) to guide continuous image data acquisition and to enrich the measurements of the scene. Overall, this implies an increase in the amount of incoming data which, in turn, may require an appropriate data-reduction strategy (for example limiting the


frequency content of the image sequence with appropriate transformations like the log-polar mapping [5]). On the other hand, this strategy can enormously simplify the computational schema underlying a given visual task or measurement. In the last years several examples have been presented, where active movements can induce more information and allow simpler computational schemata to be adopted [6], [7], [8], [9], [10]. This is very important when designing working systems, where simpler processes allow the implementation of real-time systems which can perform several measurements over time.¹

In this paper we face the problem of visual navigation. The main goal is to perform task-driven measurements of the scene, detecting corridors of free space along which the robot can safely navigate. The proposed schema combines optical flow and binocular disparity, computed on several image pairs over time.

In the past the problem of fusing motion and stereo in a mutually useful way has been faced by different researchers. Some of them integrated the measurements obtained from stereo techniques repeated over time according to the relative uncertainty, building a consistent representation of the environment [11], [12], [13], [14]. Ahuja and Abbott [15] used exploratory fixations to apply several modalities like stereo, focus, and vergence to recover the object structure and disambiguate occlusions. With the same purpose, Grosso et al. [16] integrated depth measurements derived from both stereo disparity and optical flow. In [17] a simple stereo technique, applied to a single image scan-line, is used to determine the focus of expansion relative to the horizon and control the heading of a robot vehicle. It is interesting to note that the low complexity of this technique allows the measurements to be continuously repeated over time, improving the motion control. In general, it is possible to make a distinction between the approaches where the results of stereo and motion analysis are considered separately and the rather different approach based upon more integrated relations. Within this group falls the work reported in [18], [19], where the temporal derivative of disparity is exploited, and the dynamic stereo approach [20], [21] considered in this paper.

Developing a sensory system for autonomous navigation (considered as a possible application of the approach presented in this paper), the integration of multiple visual modalities can be the key to overcome some common problems:

Calibration. Many vision algorithms rely on the knowledge of some well-calibrated parameters relative to the camera-robot system [22]. In this paper stereo disparity and optical flow are used to avoid the explicit calibration of external parameters. In particular, this method prevents the drawbacks of earlier techniques like the use of the baseline length and the need for the knowledge of the


robot's motion. Moreover, by exploiting active behaviors it is possible to rely on self-calibration techniques which reduce the number of required parameters [23], [24]. This methodology is applied to determine the focal length of the cameras, which is required by the stereo algorithm.

Accuracy. Even though accuracy can be improved by a statistical integration of independent measurements [25], [12], it has been shown [26] that accurate estimates are not necessary for navigation purposes. For this reason this topic is not explicitly addressed in this paper, but the analysis is focused on the robustness of the measurements, which is crucial for safe vehicle control.

Robustness. Mainly for safety reasons the vision system must be very robust with respect to noise, either electronic, due to the sensors, or dynamic, due to inaccurate motion of the vehicle. Moreover, the result of the visual processing must be error-free, even at the cost of false alarms. Algorithmic robustness can certainly be improved by providing independent estimates of the same quantity, like those obtained from different visual modalities, and a method to combine them. Moreover, robustness and numerical stability can also be achieved by adopting a cooperative schema in which different visual cues contribute to the estimation of scene structure. In this paper we show that, computing the time-to-impact from the temporal evolution of disparity, the error in the final estimate does not depend on differential measurements (which are notoriously not very robust), which is the case when computing the time-to-impact directly from optical flow.

Metrics. It is still not well understood which is the best metric to use in representing the environment. Standard metrics, like inches or centimeters, are not well suited because they require an additional calibration to relate them to the image-derived data and to the motion control. On the other hand, if an active interaction with the environment is engaged [7], [6], metrics which are intrinsic to the observer are certainly best suited to guide the robot's behavior [10], [9], [27]. In this paper the time-to-impact is computed, a viewer-based metric among the most useful for navigation. Other viewer-based metrics are also addressed, like the depth scaled by the inter-ocular baseline, or by the distance from a reference point in space [10], [9].

In this paper, binocular disparity, computed for each image pair over time, and monocular optical flow are combined, via simple relations, to obtain a 2½D representation of the scene, suitable for visual navigation, which is expressed either in terms of time-to-impact or of relative depth. The knowledge of the baseline length and of the robot motion is not needed. The time-to-impact is computed avoiding the estimation of the Focus of Expansion from optical flow, while stereo reconstruction is used to compute the full rotation of the head-eye system in space. Image-derived quantities are used, except for the focal length and the vergence angles of the cameras, which are actively controlled during the robot motion and are measured directly on the motor axes by optical encoders.²

1. Defining simple computational schemes is rather important to develop a real-time vision system. This is also true if you wish to repeat the same measurements over time to exploit temporal consistency.

2. Measuring the vergence angles by optical encoders requires calibrating the angle offsets. This operation is performed off-line.

II. STEREO VISION AND DYNAMIC VISUAL PROCESSING

The dynamic stereo approach is based on two fundamental building blocks: stereo and motion analysis. The algorithms and notations relative to these modalities are briefly presented in the following sections.

A. Stereo Vision

The stereo vision algorithm uses a coarse-to-fine approach to perform a regional correlation. The resulting disparity map states the correspondence of the points in the left and right image. We skip over a detailed description of the stereo algorithm (see, for instance, [16]); instead, starting from the knowledge of the related disparity, we will concentrate on the computation of relative depth.
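The paper does not detail its correlation-based matcher; as a rough illustration of regional correlation only, the sketch below computes a disparity map by single-scale SSD block matching along rectified scan-lines (function and parameter names are ours, not the authors'). A coarse-to-fine version would run the same matcher on an image pyramid, using each coarser level to bound the search at the next.

```python
import numpy as np

def disparity_map(left, right, max_disp=32, win=5):
    """Single-scale SSD block matching on rectified grayscale images.
    Returns, for each left-image pixel, the horizontal offset of the best
    matching right-image patch (a stand-in for regional correlation)."""
    h, w = left.shape
    r = win // 2
    disp = np.zeros((h, w), dtype=np.float32)
    for y in range(r, h - r):
        for x in range(r, w - r):
            patch = left[y - r:y + r + 1, x - r:x + r + 1]
            best_cost, best_d = np.inf, 0
            for d in range(0, min(max_disp, x - r) + 1):
                cand = right[y - r:y + r + 1, x - d - r:x - d + r + 1]
                cost = np.sum((patch - cand) ** 2)
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```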

Fig. 1. Schematic representation of the stereo coordinate system. On the left the configuration in space is shown: the y axes of the cameras and of the stereo coordinate system are parallel and orthogonal to the plane defined by the two optical axes. On the right, the projection of the point P_s = (X, Y, Z) on the plane Y = 0 (the point P) is shown.

Fig. 1 shows the considered configuration: let P be the projection on the stereo plane (the plane defined by the two optical axes) of a point P_s in space. We define the K function K(α, β, γ, δ) as in (1),

where α and β are the vergence angles, γ = arctan(x_l / F_l) and δ = arctan(x_r / F_r) define the position of two corresponding points on the image planes, x_r = x_l + D (D is the known disparity), and F_l and F_r are the focal lengths of the left and right camera measured in pixels. It is easy to prove that the depth Z referred to the stereo coordinate system is

Z = B · K(α, β, γ, δ),   (2)

where B is the baseline length.


From the knowledge of the vergence angles α and β and the angular disparities γ and δ, (2) provides a depth measure with respect to the inter-ocular baseline. As expected, the distance from the fixation point depends only on the vergence angles α and β, as expressed by (3).
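As an illustration of why the fixation distance involves only the vergence angles (up to the baseline scale), consider the standard verging geometry with the two optical centers a distance B apart and the vergence angles measured from the baseline; this is a sketch under our own conventions, which may differ in detail from the paper's (1)-(3).

```latex
% Intersection of the two optical axes (left center at the origin,
% right center at (B, 0)); Z_F is the distance of the fixation point
% from the baseline.
Z_F \;=\; \frac{B\,\sin\alpha\,\sin\beta}{\sin(\alpha+\beta)}
    \;=\; \frac{B\,\tan\alpha\,\tan\beta}{\tan\alpha+\tan\beta}.
```

Scaling by B, the normalized distance Z_F / B depends on α and β alone, which is the kind of relative, viewer-centered measure exploited in the paper.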

A.1. Calibration

The computation of the K(α, β, γ, δ) function poses two relevant issues: the former is related to the calibration of the intrinsic parameters of the cameras (in our case limited to the focal lengths), the latter concerns the estimation of the angular offsets for the vergence angles α and β measured on the encoders of the motors.

By exploiting "active behaviors" it is possible to rely on self-calibration techniques which reduce the number of required parameters [23], [24]. Moreover, qualitative estimates of relative depth, like the time-to-impact, do not require the precise calibration of the intrinsic parameters of the cameras. In fact, for navigation purposes it is not necessary to define a metric representation of the environment: an approximate measurement of the dangerous and safe areas within the visual field always suffices.

In order to estimate the focal length of the cameras, we use a simple, active process: the cameras fixate on a first point in space, then they are moved to fixate a second point. The rotation of the camera, measured on the optical encoder, required to change the fixation point is related to the displacement of the projections of the points on the image plane. Referring to Fig. 2, we obtain

tan θ = x / F,   (4)

where θ is the rotation of one of the two cameras measured on the motor encoder and x is the displacement of the considered point on the image plane.
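In code, (4) inverts directly to give the focal length in pixels; the sketch below is our own helper, with made-up example numbers, and assumes the point displacement has been measured between the two fixations.

```python
import numpy as np

def focal_length_from_rotation(x_displacement_px, rotation_rad):
    """Invert (4), tan(theta) = x / F: theta is the camera rotation read from
    the motor encoder, x the induced image displacement of the tracked point."""
    return x_displacement_px / np.tan(rotation_rad)

# Hypothetical example: a 2-degree rotation that shifts the point by 18 pixels.
F = focal_length_from_rotation(18.0, np.deg2rad(2.0))   # ~515 pixels
```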

Concerning the calibration of the angle offsets, it can be performed off-line by applying standard calibration techniques [28], [29]. In this paper we have not addressed this topic explicitly, but we have compensated the angular offsets of the encoders by measuring manually the vergence angles off-line and then resetting the origin of the encoder counts.

Fig. 2. Moving the fixation from point P1 to P2 results in two camera rotations, measured by the motor encoders.

B. Motion, Optical Flow, and Time-to-Impact

The ability to quickly detect obstacles and evaluate the time to be elapsed before a collision (time-to-impact) is of vital importance for animates. This fact has been demonstrated by several studies on the behavior of animals [30] or humans performing specific tasks [31].

The time-to-impact can be computed from the optical flow extracted from monocular image sequences acquired during ego-motion. The optical flow is computed by solving an over-determined system of linear equations (5) in the unknown terms (u, v) = v̄ [32], [33], [34], [35], [36], [37]; the system is built from the image intensity I of the point (x, y) at time t and from its spatial and temporal derivatives.

The instantaneous velocity for each image point can be computed by solving the linear system (5) [37]. The image velocity can be described as a function of the camera parameters and split into two terms depending on the rotational and translational components of the camera velocity, respectively. The rotational part of the flow field can be computed from proprioceptive data (e.g., the camera rotation) and the focal length. Once the global optic flow v̄ is computed, the translational flow v̄_t is determined by subtracting the rotational flow v̄_r from v̄.
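The paper relies on the multi-constraint differential formulation of [37]; as a simpler stand-in with the same structure (an over-determined linear system in (u, v) solved by least squares), the sketch below gathers brightness-constancy equations over a small neighborhood, Lucas-Kanade style. Function and parameter names are ours.

```python
import numpy as np

def flow_at_point(Ix, Iy, It, y, x, r=3):
    """Least-squares solution of the over-determined system
    Ix*u + Iy*v + It = 0 collected over a (2r+1)x(2r+1) neighborhood.
    Ix, Iy, It: spatial and temporal image derivatives (2D arrays)."""
    A = np.stack([Ix[y - r:y + r + 1, x - r:x + r + 1].ravel(),
                  Iy[y - r:y + r + 1, x - r:x + r + 1].ravel()], axis=1)
    b = -It[y - r:y + r + 1, x - r:x + r + 1].ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```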

From the translational optical flow, the time-to-impact can be computed:

T_i = Δ_i / |v̄_t(x_i, y_i)|,   (6)

where Δ_i is the distance of the considered point (x_i, y_i), on the image plane, from the Focus of Expansion (FOE). The position of the FOE on the image plane can be determined by computing the pseudo-intersection of the set of straight lines obtained by elongating the optical flow vectors.

In general, the estimation of the FOE is critical. In fact, the spread error in the least squares solution for the FOE increases as the distance of the intersection from the image center increases, because small angular errors in the velocity vectors shift the position of the computed intersection. This is the case when the camera moves along a direction considerably different from the direction of the optical axis. We will show how the estimation of the FOE can be avoided by using stereo disparity.
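A minimal sketch of the FOE as the least-squares pseudo-intersection of the flow lines (the construction described above); this is our own implementation and assumes the rotational component has already been removed from the flow.

```python
import numpy as np

def estimate_foe(points, flow):
    """Least-squares pseudo-intersection of the lines through each image point
    along its translational flow vector. points, flow: (N, 2) arrays."""
    # Each line contributes the constraint n_i . (foe - p_i) = 0,
    # with n_i a unit normal to the flow direction at p_i.
    n = np.stack([-flow[:, 1], flow[:, 0]], axis=1)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)
    b = np.sum(n * points, axis=1)
    foe, *_ = np.linalg.lstsq(n, b, rcond=None)
    return foe  # (x, y) of the focus of expansion
```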

III. ACTIVE STEREO AND MOTION CONTROL

The measurement of the time-to-impact from stereo sequences can be faced by analyzing the temporal evolution of the image stream. We consider the stereo system as sketched in Fig. 3. In this case, even though estimates from stereo and motion are expressed using the same metric, they are not homogeneous, because they are related to different reference frames. In the case of stereo, depth is referred to an axis orthogonal to the baseline (it defines the stereo camera geometry), while motion depth is measured along a direction parallel to the optical axis of one (left or right) camera. The relation between the two reference frames is given by (7),


where α is the vergence angle of the left camera, F_l is the focal length of the left camera measured in pixels, and x is the horizontal coordinate of the considered point on the image plane (see Fig. 3). We choose to adopt the stereo reference frame, because it is symmetric with respect to the cameras. In the remainder of the paper all symbols referring to the motion reference frame will be denoted by the superscript m (like ᵐZ and ᵐT), while those referred to the symmetric stereo reference frame will be left without any further lettering (like Z and T).

Fig. 3. Schematic representation of the stereo reference frames with their orientation.

From the results presented in the previous sections we can write, for a generic point P_i = (x_i, y_i) on the image plane, the relations (8),

where B is the inter-ocular baseline, W_s is the velocity of the camera along the Z axis in the stereo reference frame, ᵐT_i represents the time-to-impact measured in the motion reference frame, and T_i is the time-to-impact referred to the symmetric stereo reference frame. In the general case, to compute the absolute distance Z_i, either from the time-to-impact or from disparity, it is necessary to determine W_s or B: the first parameter requires measuring the translational velocity of the cameras, while the inter-ocular baseline should be calibrated. In order to obtain a common relative-depth estimate, we first consider two different expressions, (9), derived from (8). The first equation represents the time-to-impact with respect to the motion reference frame, while the second equation represents a generic relative measure of the depth of a point (x_i, y_i) with respect to a second point (x_1, y_1). The first expression in (9) can be applied to a generic image point (x_i, y_i) relative to the left camera to compute the ratio W_s/B, as in (10).

Because of ᵐT_i at the denominator, this expression applies only if the time-to-impact is not null. It is possible to obtain a better estimate of W_s/B by averaging the expression (10) over all the image points.

Substituting now (10) in (9) we obtain the two relations (11).

These two equations are the first important result: in particular, the second equation directly relates the relative depth to the time-to-impact and stereo disparity (i.e., the K function). In general, pointwise ratios like (11) are not robust; therefore, to reduce the effects of measurement errors, it is necessary to integrate the measurements over a neighborhood of the considered point.
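The paper does not specify the integration window; a minimal sketch of this smoothing step, assuming a simple box average over an N x N neighborhood (the window size and any masking of unreliable pixels are our own choices):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def integrate_over_neighborhood(ratio_map, size=7):
    """Box-average a pointwise ratio map (e.g., the relative-depth values
    of (11)) over a size x size neighborhood to damp measurement errors."""
    return uniform_filter(ratio_map.astype(np.float64), size=size)
```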

Fig. 4. Diagram showing the rotation of the stereo system during motion (from time t1 to time t2).

As already noted [20], [21], the critical factor in (10) and (11) is the computation of the time-to-impact, which usually requires the estimation of the FOE position. To avoid this measurement it is necessary to exploit also the temporal evolution of disparity. The problem is not trivial because, considering different instants of time, the stereo reference frame changes its position and orientation, while the rotational component of motion deeply affects the depth measurements performed by means of (11). Theoretically, it is possible to recover the rotational motion of the stereo cameras from visual information only, but the formulation is far too complex to allow a closed form solution [38], [28], [39]. In other words, even knowing the pan and tilt angles of the cameras, it is generally impossible to compute the effective rotation of the stereo system in space.

A first attempt to solve this problem has been presented in [21], where a solution was given based on two main assumptions:

1) the cameras were actively controlled to track a point in space during the robot motion;
2) the stereo rig was kept parallel to the ground plane during the motion of the vehicle.

In the remainder of the paper a more general solution is presented, where the cameras are allowed to rotate about the x axis, parallel to the ground plane, and about the y axis, perpendicular to the ground plane. This added capability is very important because it allows the binocular head to direct the gaze everywhere in space.


A. Estimation of Camera Rotation

The motion of the stereo system between two different time instants t1 and t2 can be expressed as a generic roto-translation:

²P̄ = R · ¹P̄ + ²T̄,   (12)

where ¹P̄ = (¹X, ¹Y, ¹Z) and ²P̄ = (²X, ²Y, ²Z) are the coordinates of the same point in space measured at time t1 and t2, R is the rotation matrix, and ²T̄ = (²T_x, ²T_y, ²T_z) is the translation vector. Therefore, to compute the 3D motion in the general case, it is necessary to solve a linear system with 12 unknowns (which can be reduced to six, including three rotational angles and three translational displacements), and at least three different points are needed [28], [40].

As we are interested in recovering the rotational motion only, it is possible to avoid the computation of the translational vector ²T̄ by subtracting the expressions for four different points in space. Considering four points P_a, P_b, P_c, and P_d at two time instants t1 and t2, we obtain:

²P̄_a − ²P̄_i = R · (¹P̄_a − ¹P̄_i),   i = b, c, d,   (13)

which is a linear system of nine equations in nine unknowns.

Let us consider the general case, defined by (12), and suppose that the stereo reference frame rotates during motion around two different axes: the absolute vertical axis, perpendicular to the ground plane, and the x axis, parallel to the ground plane.³ The rotation matrix R can be written as the composition of three elementary rotations:

²P̄ = R_x(φ₁) · R_y(θ) · R_x(φ₀) · ¹P̄ + ²T̄,   (14)

where φ₀ is the rotation angle between the y axis of the stereo system and the absolute vertical, and φ₁ is the effective rotation around the x axis. The angle θ is the only real unknown in our case: φ₀ and φ₁ are measured from the motor encoders, but θ cannot be measured because it is related to the rotation of the robot carrying the stereo system and not of the cameras. Computing explicitly the matrix R we obtain:

R = R_x(φ₁) · R_y(θ) · R_x(φ₀),   (15)

where R_x(·) and R_y(·) denote the elementary rotations about the x and y axes, and the explicit entries of R follow from the product of the three matrices.
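A numerical sketch of the composition in (14)-(15); the sign conventions of the elementary rotations below are our assumption and may differ from the paper's.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,   c,  -s],
                     [0.0,   s,   c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[  c, 0.0,   s],
                     [0.0, 1.0, 0.0],
                     [ -s, 0.0,   c]])

def head_rotation(phi1, theta, phi0):
    """R = Rx(phi1) Ry(theta) Rx(phi0): measured tilts phi0, phi1 about x,
    unknown rotation theta about the vertical axis."""
    return rot_x(phi1) @ rot_y(theta) @ rot_x(phi0)
```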

Let us consider two points ¹P̄_h and ¹P̄_k:

²P̄_h = R · ¹P̄_h + ²T̄,   ²P̄_k = R · ¹P̄_k + ²T̄.   (16)

By applying (13) and denoting Δ¹P̄ = (Δ¹X, Δ¹Y, Δ¹Z) = ¹P̄_h − ¹P̄_k and Δ²P̄ = (Δ²X, Δ²Y, Δ²Z) = ²P̄_h − ²P̄_k, we obtain, for the Z component,

Δ²Z = R₃₁ Δ¹X + R₃₂ Δ¹Y + R₃₃ Δ¹Z,   (17)

where R₃₁, R₃₂, and R₃₃ are the entries of the third row of the rotation matrix (15).

Solving for the unknown rotation yields an expression (18) for tan θ in terms of the measured differences Δ¹P̄ and Δ²P̄ and of coefficients a, b, c, d which depend only on the tilt angles φ₀ and φ₁. The coordinates appearing in (18) are obtained from the stereo measurements as in (19): both X and Z are given by the baseline B times a function of the vergence angles and of the angular disparities, with Z = B · K(α, β, γ, δ) as in (2). But, as B is only a proportionality factor for the coordinates (X, Z), (18) does not depend on the stereo baseline B.

If the optical axes of the cameras are maintained parallel to the ground plane during the robot motion (as in [21]), we simply obtain a = c = 0 and b = d = 1. In this case:

tan θ = (Δ¹X Δ²Z − Δ²X Δ¹Z) / (Δ¹X Δ²X + Δ¹Z Δ²Z).   (20)

As a consistency verification we can note that, if the stereo system does not rotate during vehicle motion, Δ¹X = Δ²X, Δ¹Z = Δ²Z, and θ = 0.

The rotation angle θ can be used to eliminate the rotational effect from the depth estimate. From (14), applied to a generic point P,

Z_r = ²Z − ²T_z = −d ¹X sin θ − (bc + ad cos θ) ¹Y − (ac − bd cos θ) ¹Z,   (21)

where Z_r represents the distance ¹Z of the considered point projected along the direction of ²Z. Dividing both sides of (21) by B and applying the first expression in (8),

K_r = −d (¹X/B) sin θ − (bc + ad cos θ) (¹Y/B) − (ac − bd cos θ) (¹Z/B),   (22)

which is an expression of the K function projected on the translational path. Also in this case, if the cameras do not tilt, we obtain the simpler expressions (23) and (24). If the baseline of the cameras does not rotate, Z_r = ¹Z and K_r = ¹K. In the remainder of the paper we will generically denote as Z and K the distance Z_r and the function K_r projected along the translational path.

3. This is equivalent to considering a moving vehicle equipped with a stereo camera system where the baseline is kept parallel to the ground plane, while the stereo reference frame can freely perform pan and tilt rotations. Only roll is not included in the allowed movements.


B. Using the Temporal Evolution of Disparity

Considering a general motion of the stereo system with uniform velocity, we obtain:

Z_i(t − Δt) − Z_i(t) = W_s Δt,   K_i(t − Δt) − K_i(t) = (W_s / B) Δt,   (25)

where Z_i and K_i are the corresponding measurements, for a generic point (x_i, y_i), projected along the translational path. This is a very interesting expression, showing that K is a linear function of time, but it can also be used to estimate the unknown parameters W_s and B. Moreover, given the factor W_s/B, it is possible to make a prediction of the values of disparity over time to facilitate the matching process. If the optical flow and the disparity map are computed at time t, the disparity relative to the same point in space at the successive time instant can be obtained by searching for a matching around the predicted disparity, which must be shifted by the velocity vector to take into account the motion.

As W_s/B is a constant factor for a given stereo image pair, it is possible to compute a robust estimate by taking the average over a neighborhood [20]:

ΔK = (W_s / B) Δt = (1/N²) Σ_i [K_i(t − Δt) − K_i(t)].   (26)

Given the optical flow v̄ = (u, v) and the map of the values of the K function at time t − Δt, the value of K_i(t) is obtained by considering the image point (x_i + u_i, y_i + v_i) on the map at time t. This expression recalls the temporal evolution of disparity formulated in [18]. The basic difference between the two approaches is the fact that Waxman and Duncan analyzed a stereo set-up with parallel optical axes, while in this paper the cameras have convergent optical axes. On the other hand, the expression developed in [18] is related to binocular image flows, which are used to establish stereo correspondence, while in our approach a monocular optical flow is used, together with stereo disparity, to apply (26) and then compute the time-to-impact.
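A minimal sketch of the estimate (26), assuming nearest-neighbor lookup of the flow-displaced position and a plain average over all pixels (masking of unreliable flow or disparity values is left out):

```python
import numpy as np

def delta_k(K_prev, K_curr, flow_u, flow_v):
    """Estimate Delta K = (Ws/B)*Dt as in (26): K_i(t) is read from the map at
    time t at the flow-displaced position (x+u, y+v), and the differences
    K_i(t-Dt) - K_i(t) are averaged over the image."""
    h, w = K_prev.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x2 = np.clip(np.rint(xs + flow_u).astype(int), 0, w - 1)
    y2 = np.clip(np.rint(ys + flow_v).astype(int), 0, h - 1)
    return float(np.mean(K_prev - K_curr[y2, x2]))
```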

Combining (9) and (26) we obtain the expression (27) for the time-to-impact. The estimate (27) of the time-to-impact is very robust and does not require the computation of the FOE. The value of the remaining factor in (28) can be easily computed, as in (29), from the pixel position, the focal length of the camera, and the vergence angle of the camera on which the optical flow has been computed.

In summary, in order to compute the time-to-impact or the relative depth, the following quantities need to be computed or measured:

• the stereo disparity field at two time instants;
• the monocular optical flow field from an image sequence acquired from the left camera;
• the vergence angles of the cameras (measured from the optical encoders of the motors);
• the focal length of the cameras, or the conversion factor between linear and angular displacements on the image plane.

C. Sensitivity Analysis in the Computation of the Time-to-Impact

It is beyond the aim of this paper to perform an exhaustive error analysis for all the equations presented, but it is interesting to analyze the sensitivity of (27) with respect to noise. The variation of T in relation to K can be expressed by differentiating (27), which gives (30). From (26) it is possible to observe that ΔK is a function of K_i(t) only. Therefore, computing explicitly the total derivative of ΔK with respect to K_i and the partial derivatives of T_i, and substituting in (30), we obtain (31);

dividing both sides by T_i and taking the differential δT_i of T_i, we obtain the relative-error expression (32).

If the value of ΔK is obtained using a sufficient number N of image points, then the corresponding term in (32) is negligible. Therefore, the relative error on the time-to-impact is equal to the relative error on the disparity function K. This result can be compared with the error in computing the time-to-impact from optical flow. In this case the relative error can be computed from (6), as in (33),

where V_i = |v̄_t(x_i, y_i)| is the amplitude of the translational component of the velocity vector at (x_i, y_i) and Δ_i is the distance of the


considered point (x_i, y_i) from the FOE. By developing this expression and taking the differential δT_i of T_i, we obtain (34), where δ_i represents the spread error of the least squares solution for the location of the FOE, which is obtained as the pseudo-intersection of the directions of the optical flow vectors. Dividing both sides by T and substituting (6) yields the error bound (35).

Fig. 5. Graphs showing the relative variation of K with respect to the pixel coordinates and image disparity. a) Variation of the K function in relation to the error δx of the image x coordinate. b) Variation of the K function in relation to the error δF in the focal length F. c) Variation of the K function in relation to the relative error δD/D in the computed image disparity D.

  • 7/30/2019 Grosso E Articolo 1995 Active

    8/12

    GROSS0 AND TISTARELLI: ACTIVEVDYNAMIC STEREO VISION 875

This error function is clearly bounded, because each of the three addenda is bounded, and it also varies smoothly with the image coordinates and the disparity. Therefore, the measurement of K, and from (32) also the time-to-impact, degrades gracefully with increasing errors in the image variables. The behavior of the relative error in the measurement of the image displacement, as well as of the relative error in the image disparity, depends on the computational schema applied. On the other hand, the computation of the translational flow requires computing and differentiating the rotational component of the optical flow. In this paper a differential technique has been applied for the computation of the optical flow, because it allows a dense and accurate flow field to be estimated almost everywhere on the image plane, assuming the motion to be small. However, as the image velocity becomes close to zero, the error term grows quite quickly; consequently, the measurements of the time-to-impact from the optical flow only can be unstable and sensitive to high-frequency noise. This is not the case for the error term on the disparity, because rather large disparities are computed by applying a correlation technique.

The first addendum in (35) represents the relative error in the computation of the FOE. This measurement is generally more robust than the local computation of the image velocity itself, because it is obtained by integrating several velocity estimates on the image plane. However, due to the sensitivity to noise of the velocity estimation, the accuracy in the localization of the FOE is poor; in particular, close to the FOE the spread error δ_i becomes very large, of the order of tens of pixels. Also the relative error on Δ_i becomes very large when computing the time-to-impact close to the FOE. In conclusion, the time-to-impact computed from the optical flow only can be unstable and very sensitive to noise wherever the optical flow is small and, in particular, presents a singularity in the FOE. The accuracy in the computation of the time-to-impact degrades quite quickly as the FOE departs from the image plane. On the contrary, the absolute error in computing the time-to-impact from the dynamic stereo approach can be larger than using the optical flow only, but it is more robust and stable with respect to noise.


IV. EXPERIMENTAL RESULTS

A. Planar Tracking

In the first experiment a computer-controlled mobile platform (a TRC Labmate) with two stereo cameras has been used. The cameras were arranged so as to verge toward a point in space. A sequence of stereo images has been captured during a tracking motion of the cameras, keeping the stereo rig parallel to the ground plane. In Fig. 6 the first and last stereo pair, from a sequence of 11, are shown. The disparity map computed from the sixth stereo pair is shown in Fig. 7. The vehicle was moving forward about 100 mm per frame. The sequence has been taken inside the LIRA lab, with many objects in the scene at different depths. The vehicle was undergo-

ing an almost straight trajectory with a very small steering toward the left, while the cameras were fixating a stick which can be seen on the desk, in the foreground. The value of ΔK for the sixth stereo pair has been computed by applying (26) at each image point. By taking the average of the values of ΔK over all the image points, a value of W_s Δt / B equal to 0.23 has been obtained. This value must be compared to the ground truth equal to 0.29, computed from the velocity of the vehicle, which was about 100 millimeters per frame along the Z axis, and the inter-ocular baseline, which was about 335 millimeters. Due to the motion drift of the vehicle and the fact that the baseline has been measured by hand, it is most likely that, also in this case, the given values of the velocity and baseline are slightly wrong.
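For reference, the quoted ground truth follows directly from the stated (approximate) velocity and baseline; the small discrepancy with 0.29 is consistent with the rounding of those values:

```latex
\frac{W_s\,\Delta t}{B} \approx \frac{100\ \text{mm/frame}}{335\ \text{mm}} \approx 0.30
```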

Fig. 6. First (a) and last (b) stereo image pair of the sequence.

Fig. 7. Map of the values of the K function obtained from the sixth stereo pair of the sequence.


Fig. 8. Optical flow relative to the sixth left image of the sequence.

Fig. 9. Rough and smoothed histograms of the angles computed from frames 5-6 and frames 7-8, respectively. The abscissa scale goes from −0.16 radians to 0.16 radians. The maxima computed in the smoothed histograms correspond to 0.00625 and 0.0075 radians, respectively.

In this experiment the cameras were moved keeping the optical axes on a plane parallel to the ground plane, therefore only one rotation angle is involved during tracking. The rotation angle of the stereo baseline between two successive time instants has been computed by applying (20) to all the image points. In the noiseless case all the image points would produce identical estimates. In a real scene, like in this experiment, it is necessary to discard all the wrong angular values coming from errors and loss of accuracy in the computation. This is obtained by ranking the angular values on a histogram and considering as correct the angular value corresponding to the peak in the histogram. In the case of objects moving within the field of view it should be easy to separate the different peaks corresponding to moving and still objects. In Fig. 9 two histograms, related to frames 5-6 and 7-8, respectively, are shown. The computed camera rotation is 0.00625 radians, corresponding to about 0.36 degrees.

In Fig. 8 the optical flow of the sixth left image of the sequence is shown. From the optical flow and the values of the K function, the map of the time-to-impact has been computed by applying (27). The map is shown in Fig. 10; the values of the time-to-impact are coded as gray levels, darker meaning lower time-to-impact.
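A sketch of this voting scheme for the planar case, assuming the per-point coordinate differences (Δ¹X, Δ¹Z) and (Δ²X, Δ²Z) have already been obtained from the stereo measurements; the bin count and smoothing are our own choices.

```python
import numpy as np

def rotation_by_histogram(d1, d2, bins=160, angle_range=(-0.16, 0.16)):
    """Per-point rotation from (20), theta_i = atan2(D1X*D2Z - D2X*D1Z,
    D1X*D2X + D1Z*D2Z), followed by histogram voting: the peak of the
    smoothed histogram is taken as the camera rotation.
    d1, d2: (N, 2) arrays of (Delta X, Delta Z) at times t1 and t2."""
    num = d1[:, 0] * d2[:, 1] - d2[:, 0] * d1[:, 1]
    den = d1[:, 0] * d2[:, 0] + d1[:, 1] * d2[:, 1]
    theta = np.arctan2(num, den)
    hist, edges = np.histogram(theta, bins=bins, range=angle_range)
    hist = np.convolve(hist, np.ones(5) / 5.0, mode="same")  # crude smoothing
    k = int(np.argmax(hist))
    return 0.5 * (edges[k] + edges[k + 1])
```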

B. General Tracking

In this experiment the cameras were kept fixed, while an object was rotating on a turntable at a speed of two degrees per frame. The cameras were arranged so as to verge toward the center of the turntable; in this way a tracking motion was simulated. In Fig. 11 the 10th and 20th stereo pair, from a sequence of 24, are shown. The images are 256 x 256 pixels with 8 bits of resolution in intensity.

Fig. 10. Time-to-impact computed using (27) for the sixth pair of the sequence; darker regions correspond to closer objects.

Fig. 11. Tenth (a) and 20th (b) stereo image pair of the sequence.

    The captured sequence has been used to perform two ex-periments: In the former we took the position of 10 relevant


points (like corners) directly from the original images (by hand, using a pointing device); in the latter we computed the optical flow and disparity maps from two successive image pairs, and used the output to compute the rotational angle and the time-to-impact.

For the first experiment the 10th and 20th image pairs of the sequence have been used, which correspond to a rotation of the object of 20° circa.⁴ The images are shown in Fig. 11. The coordinates of the points, taken by hand, were fed to (18). Having more data than equations, we simply computed the average of the angles resulting from all the data points; of course a least squares method would have been more appropriate to find a better solution. The angular value obtained is 18.6°, which must be compared to the measured rotation of 20°. It is worth noting that, by locating the position of the points by hand, we introduced an error of at least one pixel in the localization of the points.

In the second experiment two successive image pairs were used, corresponding to a rotation of two degrees circa. The disparity maps were computed for both image pairs and the optical flow for the left image of the first frame. The computed camera rotation (around the vertical axis) is 0.0427 radians, corresponding to about 2.4°. The rotation recovered from the optical flow and disparity with (18) has been used to correct the K values as in (22) and to compute the time-to-impact by applying (27). The map in Fig. 12 codes the values of the time-to-impact as gray levels; darker means smaller time-to-impact (or an object closer to the cameras).

Fig. 12. Time-to-impact computed using (27) for the 15th pair of the sequence; darker regions correspond to closer objects.

A different experiment was performed by processing a sequence of stereo images acquired from a prototype robotic head, built at the LIRA-Lab.

A picture and a schematic diagram of the head with the cameras are shown in Fig. 13. The head is composed of four independent degrees of freedom: the "neck" rotation and the common tilt of the two cameras are controlled by two DC torque motors, while the independent vergence of each camera is controlled by two stepper motors.

4. This value has been measured approximately during the image acquisition phase.

Fig. 13. Picture and diagram of the robotic head used to acquire the images used in one experiment.

Fig. 14. First (a) and last (b) stereo image pair of the sequence acquired from the head.

The head was placed on a wheeled cabinet. The gaze and vergence of the head were automatically adjusted to compensate for the motion of the cabinet, keeping the fixation on the same point in space. The forward motion was a translation of about 7 cm per step. At each step a pair of stereo images was captured, at a resolution of 256 x 256 pixels with 8 bits per pixel. In Fig. 14 the first and last image pair from a sequence of 16 are shown. The vergence, pan, and tilt angles of the cameras were recorded from the optical encoders of the head motors. It is worth noting that only the degrees of freedom under active control were effectively recorded, while the rotation of the cabinet was completely unknown and without precise control. The recorded rotations were used to compute the effective rotation of the camera system from one frame to the following. The procedure adopted to compute the effective rotation and the time-to-impact is the same followed in the previous experiment. The disparity maps were computed for


the sixth and seventh image pairs, and the optical flow for the left image of the sixth frame. In Figs. 15 and 16 the computed map of the K function and the optical flow relative to the sixth frame are shown. As in the previous experiment, we computed the average of all the resulting angles. The obtained camera rotation (around the vertical axis) is 0.014 radians, corresponding to about 0.8 degrees.

Fig. 15. Map of the values of the K function obtained from the sixth and seventh stereo pair of the sequence.

Fig. 16. Optical flow relative to the sixth left image of the sequence.

Fig. 17. Time-to-impact computed using (27) for the sixth pair of the sequence; darker regions correspond to closer objects.

The rotation recovered from the optical flow and disparity has been used to correct the K values and compute the time-to-impact. The map in Fig. 17 codes the values of the time-to-impact as gray levels. As can be noticed, the time-to-impact map closely reflects the structure of the environment: the lamp on the desk, which can hardly be seen in the original images, appears clearly closer than the background, while the desk in the foreground has the lowest time-to-impact values.

V. CONCLUSION

In this paper we have addressed the problem of the extraction of relevant visual information for robot operations. Whenever a spatial goal has to be reached, either by animals or by robots, it is important to be able to decide the direction of motion. This decision is crucial in visual navigation because it can be taken on the basis of visual information only. It is necessary to identify free space areas in the scene which can be safely crossed by the robot from its current position in space. Stereo vision and motion parallax have been considered as

cues to identify corridors of free space. Redundancy is enforced by defining a cooperative schema in which both stereo and motion provide information to estimate a relative-depth map of the observed scene. In the cooperation process, binocular disparity, computed on several image pairs over time, is merged with optical flow to cope with the need for critical parameters relative to the cameras and/or the robot. An important aspect is certainly the possibility of computing dynamic quantities, like the time-to-impact, directly from stereo disparity, using the optical flow to determine the temporal evolution of disparity in the image sequence.

One of the advantages of the proposed approach is the possibility of computing the effective rotation of the stereo camera system, even in the presence of unknown rotations of the moving vehicle. In fact, the only angular displacements which have to be measured are the rotations performed by the motors of the head-eye system. Therefore, the system's behavior is closely related to the capability of visual stabilization of a target. This is consistent with the need to stabilize an image in order to compute meaningful data relative to the environment and/or the observer. In this sense the system's activity helps motion and structure recovery.

The measured parameters are closely coupled together and strongly depend on the behaviour of the system. In fact, not only is the recovered rotation necessary to compute the time-to-impact map, but it is also naturally referred to the imaging system and not to the motor system. As long as the system keeps tracking the same point in space, or does not perform saccadic movements, successive measurements are perfectly coherent and can be integrated over time to enforce the robustness of the recovered rotation and time-to-impact.

ACKNOWLEDGMENTS

This work has been partially funded by the ESPRIT Basic Research Action projects P3274 FIRST and P6769 SECOND and by grants of the Italian National Research Council. We thank Prof. G. Sandini for his helpful comments and insights during the development of this work.


REFERENCES

[1] R.D. Beer, Intelligence as Adaptive Behavior. Academic Press, 1990.
[2] R.A. Brooks, "A robust layered control system for a mobile robot," IEEE Trans. Robotics and Automation, vol. 2, pp. 14-23, Apr. 1986.
[3] F. Ferrari, E. Grosso, M. Magrassi, and G. Sandini, "A stereo vision system for real time obstacle avoidance in unknown environment," Proc. Int'l Workshop Intelligent Robots and Systems, Tokyo, July 1990.
[4] G. Sandini and M. Tistarelli, "Robust obstacle detection using optical flow," Proc. IEEE Int'l Workshop Robust Computer Vision, Seattle, Oct. 1-3, 1990.
[5] M. Tistarelli and G. Sandini, "Dynamic aspects in active vision," CVGIP: Image Understanding, special issue on Purposive and Qualitative Active Vision, Y. Aloimonos, ed., vol. 56, no. 1, pp. 108-129, July 1992.
[6] J. Aloimonos, I. Weiss, and A. Bandyopadhyay, "Active vision," Int'l J. Computer Vision, vol. 1, no. 4, pp. 333-356, 1988.
[7] G. Sandini and M. Tistarelli, "Active tracking strategy for monocular depth inference over multiple frames," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 12, no. 1, pp. 13-27, Jan. 1990.
[8] R.K. Bajcsy, "Active perception vs. passive perception," Proc. Third IEEE CS Workshop Computer Vision: Representation and Control, pp. 13-16, Bellaire, Mich., 1985.
[9] D.H. Ballard, "Animate vision," Artificial Intelligence, vol. 48, pp. 57-86, 1991.
[10] D.H. Ballard and C.M. Brown, "Principles of animate vision," CVGIP: Image Understanding, special issue on Purposive and Qualitative Active Vision, Y. Aloimonos, ed., vol. 56, no. 1, pp. 3-21, July 1992.
[11] N.J. Bridwell and T.S. Huang, "A discrete spatial representation for lateral motion stereo," CVGIP, vol. 21, pp. 33-47, 1983.
[12] N. Ayache and O.D. Faugeras, "Maintaining representations of the environment of a mobile robot," IEEE Trans. Robotics and Automation, vol. 5, no. 6, pp. 804-819, Dec. 1989.
[13] D.J. Kriegman, E. Triendl, and T.O. Binford, "Stereo vision and navigation in buildings for mobile robots," IEEE Trans. Robotics and Automation, vol. 5, no. 6, pp. 792-803, Dec. 1989.
[14] L. Matthies, T. Kanade, and R. Szeliski, "Kalman filter-based algorithms for estimating depth from image sequences," Int'l J. Computer Vision, vol. 3, no. 3, pp. 209-238, 1989.
[15] N. Ahuja and L. Abbott, "Surfaces from dynamic stereo: Integrating camera vergence, focus, and calibration with stereo surface reconstruction," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, no. 10, pp. 1,007-1,029, Oct. 1993.
[16] E. Grosso, G. Sandini, and M. Tistarelli, "3D object reconstruction using stereo and motion," IEEE Trans. Systems, Man, and Cybernetics, vol. 19, no. 6, Nov./Dec. 1989.
[17] R.A. Brooks, A.M. Flynn, and T. Marill, "Self calibration of motion and stereo vision for mobile robot navigation," Proc. DARPA Image Understanding Workshop, pp. 398-410, Morgan Kaufmann, 1988.
[18] A.M. Waxman and J.H. Duncan, "Binocular image flows: Steps toward stereo-motion fusion," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 8, no. 6, pp. 715-729, 1986.
[19] L. Li and J.H. Duncan, "3D translational motion and structure from binocular image flows," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, no. 7, pp. 657-667, July 1993.
[20] M. Tistarelli, E. Grosso, and G. Sandini, "Dynamic stereo in visual navigation," Proc. Int'l Conf. Computer Vision and Pattern Recognition, pp. 186-193, Lahaina, Hawaii, June 1991.
[21] E. Grosso, M. Tistarelli, and G. Sandini, "Active/dynamic stereo for navigation," Proc. Second European Conf. Computer Vision, pp. 516-525, S. Margherita Ligure, Italy, May 1992.
[22] A. Izaguirre, P. Pu, and J. Summers, "A new development in camera calibration: Calibrating a pair of mobile cameras," Int'l J. Robotics Research, pp. 104-116, 1988.
[23] Y.L. Chang, P. Liang, and S. Hackwood, "Adaptive self-calibration of vision-based robot systems," IEEE Trans. Systems, Man, and Cybernetics, vol. 19, no. 4, July/Aug. 1989.
[24] A.P. Tirumalai, B.G. Schunck, and R.C. Jain, "Dynamic stereo with self-calibration," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 14, no. 12, pp. 1,184-1,189, Dec. 1992.
[25] L. Matthies and T. Kanade, "Using uncertainty models in visual motion and depth estimation," Proc. Fourth Int'l Symp. Robotics Research, pp. 120-138, Santa Cruz, Calif., Aug. 9-14, 1987.
[26] R.C. Nelson and J. Aloimonos, "Using flow field divergence for obstacle avoidance in visual navigation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, no. 10, pp. 1,102-1,106, Oct. 1989.
[27] D.H. Ballard, R.C. Nelson, and B. Yamauchi, "Animate vision," Optics News, vol. 15, no. 5, pp. 17-25, 1989.
[28] B.K.P. Horn, Robot Vision. Cambridge, Mass.: MIT Press, 1986.
[29] P. Puget and T. Skordas, "Calibrating a mobile camera," Image and Vision Computing, vol. 8, pp. 341-348, 1990.
[30] D.N. Lee and P.E. Reddish, "Plummeting gannets: A paradigm of ecological optics," Nature, vol. 293, pp. 293-294, 1981.
[31] J. Cutting, Perception with an Eye for Motion. Cambridge, Mass.: MIT Press, 1988.
[32] B.K.P. Horn and B.G. Schunck, "Determining optical flow," Artificial Intelligence, vol. 17, no. 1-3, pp. 185-204, 1981.
[33] H.H. Nagel, "Direct estimation of optical flow and of its derivatives," Artificial and Biological Vision Systems, G.A. Orban and H.H. Nagel, eds., pp. 193-224, Springer-Verlag, 1992.
[34] H.H. Nagel, "On the estimation of optical flow: Relations between different approaches and some new results," Artificial Intelligence, vol. 33, pp. 299-324, 1987.
[35] S. Uras, F. Girosi, A. Verri, and V. Torre, "A computational approach to motion perception," Biological Cybernetics, vol. 60, pp. 79-87, 1988.
[36] M. Tistarelli and G. Sandini, "Estimation of depth from motion using an anthropomorphic visual sensor," Image and Vision Computing, vol. 8, no. 4, pp. 271-278, 1990.
[37] M. Tistarelli, "Multiple constraints for optical flow," Proc. Third European Conf. Computer Vision, pp. 61-70, Stockholm, Sweden, May 1994.
[38] B.K.P. Horn, "Relative orientation," Int'l J. Computer Vision, vol. 4, pp. 59-78, 1990.
[39] B. Kamgar-Parsi, "Practical computation of pan and tilt angles in stereo," Tech. Rep. CS-TR-1640, Univ. of Maryland, College Park, Md., Mar. 1986.
[40] B. Sabata and J.K. Aggarwal, "Estimation of motion from a pair of range images: A review," CVGIP: Image Understanding, vol. 54, no. 3, pp. 309-324, Nov. 1991.

Enrico Grosso received a degree in electronic engineering in 1988 and the PhD in computer science and electronic engineering in 1993 from the University of Genoa. Since 1988, Dr. Grosso has been involved as project investigator and task manager in various ESPRIT projects funded by the European Community. During 1990 he visited the National Institute of Scientific Research LIFIA of Grenoble, France. In 1992, he was a visiting scientist at the Department of Computer Science of the University of Rochester, New York. Dr. Grosso is currently an assistant professor in the Department of Communication, Computer, and Systems Science at the University of Genoa. His main research interests cover biological and artificial vision, visuo-motor coordination, and robotics.

Massimo Tistarelli received a degree in electronic engineering and the PhD in computer science and electronic engineering (in 1992) from the University of Genoa. Since 1984, Dr. Tistarelli has been working on image processing and computer vision at the Integrated Laboratory for Advanced Robotics of the Department of Communication, Computer, and Systems Science of the University of Genoa, where he is currently an assistant professor. In 1986, he was a research assistant at the Department of Computer Science of Trinity College, Dublin, Ireland, where he was developing a system for the analysis of image data primarily aimed at the investigation of low-level visual processes. In 1989, he was a visiting scientist at Thinking Machines Co., developing parallel algorithms for dynamic image processing on the Connection Machine. Dr. Tistarelli's research interests include robotics, computer vision (particularly in the area of three-dimensional and dynamic scene analysis), image processing, and artificial intelligence.