Vision Sensor Planning for 3D Model Acquisition

Citation: Chen, S. Y. and Li, Y. F., "Vision Sensor Planning for 3-D Model Acquisition", IEEE Transactions on Systems, Man and Cybernetics, Part B, Vol. 35, No. 5, Oct. 2005, pp. 894-904.

* This file is only for search engines. Please obtain the formally published article from the respective publisher or databases.

S. Y. Chen, Member, IEEE, and Y. F. Li, Senior Member, IEEE

Abstract—A novel method is proposed in this paper for automatic acquisition of three-dimensional models of unknown objects by an active vision system, in which the vision sensor is to be moved from one viewpoint to the next around the target to obtain its complete model. In each step, sensing parameters are determined automatically for incrementally building the 3D target models. The method is developed by analyzing the target’s trend surface, which is the regional feature of a surface for describing the global tendency of change. Whilst previous approaches to trend analysis are usually focused on generating polynomial equations for interpreting regression surfaces in three dimensions, this paper proposes a new mathematical model for predicting the unknown area of the object surface. A uniform surface model is established by analyzing the surface curvatures. Furthermore, a criterion is defined to determine the exploration direction and an algorithm is developed for determining the parameters of the next view. Implementation of the method is carried out to validate the proposed method.

Index Terms—vision sensor, viewpoint planning, sensor placement, 3D modeling, trend surface, surface prediction, model acquisition.


SING machine vision to obtain the 3D model of an object is very important to many practical applications. In such tasks, when there is no prior information about the objects or environments, it is desired to automatically generate a multi-viewpoint plan at run-time. The goal of viewpoint decision or sensor placement is to gain knowledge about the unseen portions of the object while satisfying the placement constraints such as focus, field-of-view, occlusion, and collision avoidance. Such a strategy determines each subsequent vantage point and offers the benefit of reducing or eliminating the manual work that otherwise would be required for acquiring an object's surface features. A system without viewpoint planning typically has to utilize as many as tens of range images for a typical task, with significant overlap between them.

The research on actively moving a vision sensor for modeling objects [1] has been active for more than ten years.

Manuscript received November 10, 2004. This work was supported by the Research Grants Council of Hong Kong [Project no CityU1130/03E] and the National Natural Science Foundation of China [NSFC-60405009] and [ZJNSF-Y104185]. This paper was presented in part at the IEEE International Conference on Robotics and Automation, New Orleans, USA, April 2004.

S. Y. Chen is with Zhejiang University of Technology and National Laboratory of Pattern Recognition of CASIA, China (e-mail:

Y. F. Li (*corresponding author) is with the Department of Manufacturing Engineering and Engineering Management, City University of Hong Kong, Hong Kong (e-mail:

Since 1988 when an early attempt was made in this aspect by Cowan and Kovesi [2], many works have been published on sensor placement or viewpoint planning. In the early stage, the work focused mainly on sensor modeling, analysis of sensor's optical and geometrical parameters, and sensor placement constraints. Later, relevant applications were investigated in vision-based inspection and object recognition using CAD models. The "generate-and-test" method and the synthesis method were the major topics explored at that stage.

While optimization is desired for model-based sensor placement [3], it is a problem for planning viewpoints for unknown objects or environment, which is important in many tasks using machine vision. The reported results of research work in this aspect were still far from practical applications. The majority of work carried out in this problem is mainly to find the best views to digitize an object without missing zones, and with a minimum number of views. For example, Papdopoulos-Orfanos and Schmitt [4] utilized a volumetric representation and worked on a solution to the next best view problem for a range scanner with narrow and short viewing field. Their work focused on collision avoidance as the small field of view makes the sensors navigate closely to the object.

Among the previous approaches to the modeling problem, "occlusion" and uncertainty have been strongly associated with viewpoint planning for some time. Kutulakos et al. [5] utilized the changes in the boundary between sensed surface and occlusion with respect to sensor position to recover a shape. A similar histogram-based technique was used by Maver and Bajcsy [6] to find the viewing vector that would illuminate the most edge features derived from occluded regions. Whaite and Ferrie [7] used a sensor model to evaluate the efficacy of the imaging process over a set of discrete orientations by ray-casting: the sensor orientation that would hypothetically best improve the model is selected for the next view. The work by Pito [8] removed the need to ray-cast from every possible sensor location by determining a subset of positions that would improve the current model. An uncertainty driven approach was investigated [9], to maximize the information gain for the next view.

On the next best view (NBV) problem for incremental object modeling, Zha et al. [10] addressed two main issues to determine the next-best-viewpoint: 1) a uniform tessellation of the spherical space and its mapping onto the 2D array; 2) incremental updating computations from evaluating viewpoints as the NBV. Yu et al. [11] proposed to determine the next pose of the range sensor by analyzing the intersections of planar surfaces obtained from the previous images, for the unseen parts of the scene that can be observed most. They set up a sphere around the scene and the next view is selected that gives the largest unseen area. Pito et al. [8], [12] presented a solution for planning a range camera in the process of digitizing unknown parts. Arbel et al. [13] showed how entropy maps could be used to guide an active observer along an optimal trajectory and how a gaze-planning strategy could be formulated by using entropy minimization as a basis for choosing a next best view. In [14], Banta and Abidi described a system in which, ideally, a NBV will reveal the greatest quantity of previously unknown scene information. Reed et al. [15] determined the visibility volume, which is the volume of space within which a sensor has an unobstructed view of a particular target. In [16], [17], the scene was assumed to be composed only of polyhedral objects and cylinders. The technique proposed to solve the NBV problem was a depth-first search algorithm and the strategy ensured the completeness of the scene reconstruction.

Another factor to be considered in viewpoint planning is the constraints that must be satisfied. To this end, two distinct methods have been widely used: the weighted function method and tessellated space approach. The former [15], [18]-

[22] employs a function that combines several components representing the placement constraints. This method is usually used in model-based planning tasks [23]. The latter method [8]-[10], [24], [25] is mainly for object modeling tasks. It tessellates a sphere or cylinder around the object to be modeled, producing a viewpoint space or look-up array [24]. The object surface is partitioned as void surface, seen surface, unseen surface, and uncertain surface. The working space is also partitioned into void volume and viewing volume. An algorithm is then developed to plan a sequence of viewpoints so that the whole object can be sampled. This method is effective for dealing with some small and simple objects, but it is difficult to model a large and complex object, e.g. an environment with many concave areas, as it cannot solve the problems of occlusion constraint.

Therefore, previous efforts were often made on finding the best next views by volumetric analysis or occlusion as a guide. However, since there does not exist any information about the unknown target, it is actually impossible to give the true best next view. Although sensor placement has been studied for more than 15 years and there are a number of relevant publications available, a large part of the bibliography deals with model-based tasks such as for inspection or recognition. The existing work on sensor placement for object modeling is usually implemented with restriction in a specific kind of applications. Further investigation is very necessary for this issue, especially for modeling of large objects or environments.

This paper proposes a new idea of sensor placement in sense of target driven and shape prediction. It involves decision of exploration direction and determination of the next view. The trend surface is used as a cue to predict the unknown portion of an object and the next best viewpoint is determined by the expected surface.

II. INCREMENTAL MODELING AND VIEWPOINT PLANNING In 3-D modeling tasks, an incremental method is normally used for exploring the unknown portion of an object or environment. This results in a sequence of viewing actions for local model acquisition. Each successive sensing gives some new information that is registered and integrated in the global model. During such a process, the first view can be set at an arbitrary pose. A 3-D depth image M0 is then generated and registered as the initial object model O0. Next, a new viewing pose Pi+1 (i=0, 1, …, n) has to be determined so that the modeling process will continue until the whole object is

reconstructed. Let the global model reconstructed by previous i steps (from step 1 to i) be

Mi = Mi(u, v) = (x(u, v), y(u, v), z(u, v)), (1)

where u and v are the coordinates on the digital image. Using a coordinate system whose origin is at the center of the vision sensor with its Z-axis pointing along the viewing direction, the 3-D depth image is represented as:

z = Miw( x, y ), x, y ∈R. (2)

2 P3

1 P4


Figure 1. Multiple viewpoints for incremental modeling

Fig.1 illustrates an example in which five viewpoints are used for acquisition of the structural model, from P1 to P5. The difficulty here is that there is no prior knowledge about the object. Therefore, each successive viewpoint (from P2 to P5) must be determined during the run-time. The key problem of sensor placement in such a task is to determine a most feasible viewpoint called the Next-Best-View or Next-Best-Pose [10], by utilizing the partially known global model that has already been built in the previous steps. Algorithms of sensor placement or viewpoint planning have to be developed in such vision systems for determining the sensor’s position, orientation, and optical settings dynamically. In this paper, the next viewpoint is determined according to the surface trend of the known partial model.


A. Trend Analysis Surface trend describes the global shape of a surface. Trend surface analysis is a global method for processing spatial data. Mathematically, a mapped surface can be separated into two components -that of the trend and the residuals from the trend. The trend is the regional feature of a surface, and the residuals are the local fluctuations of high frequent features (Fig. 2). Trend surface analysis is often used for fitting and interpolating regression surfaces in three dimensions as smoothed representation of area data. It is assumed that the spatial distribution of a particular phenomenon can be represented by some form of continuous surface, usually a defined geometric function. The observed spatial pattern can be regarded as the sum of such a surface and a "random", or

local, term. The surface is a function of two orthogonal coordinate axes which can be represented by

z = f (x , y) + e, (3)

in which the variable z at the point (x, y) is a function of the coordinate axes, plus the error term e. This expression is the generalized form of the General Linear Model (GLM), which is the basis of most trend methods.

The function f(x, y) is usually expanded or approximated by various terms to generate polynomial equations. The principles of trend surface analysis were set out initially by Agocs [26]. To develop complex, smoothed equations for geophysical data by expanding the summation term of the General Linear Model, Krumbein [27] defined the relationship between standard multivariate regression analyses and trend methods. This expansion was performed by incorporating power terms and cross-products of the x and y coordinates. For an n-order three-dimensional surface, the form of the power series is given by

n i

j ij , (4)

f (v u ) = ∑∑v u b


i=0 j=0

where u and v are the coordinates on an arbitrary orthogonal reference system, bij is the constant coefficient of the surface (b00 is the surface base).

The trend part is useful for predicting the unseen part of an object or environment and is thus used for determining the next viewpoint in this research. The residuals (local features)


+ a) an arbitrary surface b) the trend surface c) local residuals

Figure 2. The trend is the regional feature of a surface

do not affect viewpoint planning much, but they should be filtered out during the image processing.

Let a single surface M be split into two parts, M1 and M2,

M = M1 M2.

If the surface M changes smoothly, both the trends of M1 and M2 should be approximately equal to the trend of M, i.e.

Trend(M) ≈Trend(M1) ≈Trend(M2). (5)

Suppose the vision agent has already captured a part of the surface, say M1, but M2 remains unknown. Then by computing the surface trend of M1, the surface shape of M2 can be predicted. In this paper, we will not use (3) or (4) directly as the trend model for surface prediction, since it relies on interpreting regression of the known area. Instead, we will develop a new mathematical model for describing the surface trend, thus emphasizing on the prediction of the unknown area.

With the partially known model, the sensor pose for a next view can be decided according to the known information. Here two steps are used to make this decision. The first step is to determine the exploration direction and the second is to determine the sensor pose according to the surface trend.

B. Exploration Direction

At each successive action in the 3D surface acquisition, only one direction, called exploration direction, can be chosen for planning the next viewpoint. This direction needs to be determined in three steps. The first step is to detect boundary points of the object. It segments the target from the scene and finds the points that are near to unknown area but on the object surface. In the second step, these points are ranked by a rating function. The third step is to select a point with the highest rating to give the exploration direction. The first and third steps can be solved in a straightforward way. For the second step, the rating function is described as

E C Ce


) ( = r , (6) (n +)1 Cn

order where norder is the surface order at point P, E is the expected volume of the unknown area near P. Cr, Cn, and Ce are adjustable coefficients and can be chosen according to application requirements. Cn reflects the importance of new object information and Ce reflects the importance of

C. Surface Prediction

Except for surface edges and object boundaries, since the curvature of a trend surface changes smoothly, the unknown part of object surface can be predicted by analyzing the curvature tendency of the known surface.

For a surface point, there are different curvatures along different directions, although the principal curvatures and Gaussian curvature are the most frequently used ones. To reduce the computation complexity, we may just compute the exploration reliability. Cr is a global scale factor or can be set to be 1. The result R(P) is the rated value for a boundary point


The surface order (i.e. the smoothness of the surface) is used as the main criterion. The reason is that the trend surface can predict the unknown area accurately where the surface has a low order. Fig. 3 illustrates the selection criterion of the exploration direction.

curve curvatures along the exploration direction. Without loss of generality, we can describe the mathematical formula along the horizontal direction (otherwise a coordinate transformation need be performed). Using a vertical sectional plane which is parallel to x-axis, at y = yv, to cut through the 3-D surface, we obtain a surface curve (see Fig.4),


z = f ) (. (8)

v yv

The curvature of this curve is

k(x, yv) = zv" / [1+(zv')2]3/2 . (9)

Let Xk = [x1, x2] be the domain of the known part of the surface curve. To predict the unseen surface, we use a linear regression of x on k and obtain a fitted curve cv for approximating the curvature tendency on the curve zv. Hence,

cv(x) = a x + b, x∈[x1, x3], (10)

where [x1, x3] is the whole domain including both the known and unknown area, i.e. [x1, x3] = [x1, x2] ∪ [x2, x3]. The two parameters a and b are determined by fitting the known part of surface curve, i.e.,

Figure 3. Exploration of object surface

The surface order is defined according to (4) with the same fitting error. To avoid computation of surface fitting, we can approximately compute the integral value of the curvatures in a small area, i.e.

order ( u, v ) ≈ ∫∫kmin (y x )dxdy , (7)



y x S (v u )

where S(u, v) is the neighborhood area of point (u, v). The area size of domain S is dependent on the sensor’s Field-of-View (FOV), e.g. with 20×20 mm2 or 30×30 pixels. kmin(x, y) is the curvature at point (x, y) along a specific direction.



It is only necessary to compute the surface orders in the










k(y x )

y x xk )



x + x




, (11)

areas near to the boundary of the known surface. The surface

a =


(x x2)


order in the center area of the known model does not affect the

exploration direction. After the minimum surface order is and


obtained, i.e. nmin = min{norder (u, v)}, the exploration direction


2 x12 /( ] ) x x1 . (12)





k(y x )









is set to be along the outer direction to the unknown area.



unseen surface

z = Mw (x, yv)

The curvature in the unseen area is expected to be:

known surface



ax +b, k(x, y ) < kmax , x[x2, x4],

v (13)

c =


k , k(x, y ) k




where x2< x4 < x3 is a domain for satisfying the constraint that the object surface will be in the FOV of the sensor. kmax is a given a threshold to limit the maximum curvature. It can be set a value in the range of [1, 20]/Sensing_Distance (1/mm). Then the surface curve in the unseen part of the object will be a solution of the following equation:

x2x4 x3 Figure 4. A curve on the sectional plane

||z"|| / [1+(z')2]3/2 -cv =0. (14) Calculating the centroid, the position of the reference point The solution of this differential equation is: (i.e. the new scene center) is obtained:


i+1, z

i+1 = (x

i+1), (20)


+ 2bx + 2C1)

i+1, y


< (k b)/ , (15)


± =

, x

dx C





o Li + 1 o Li + 1

xi+ 1 = , yi+ 1 = ,





Li 1+

Li + 1




± = k 2 − (x C3) + C , x ≥ (k b)/ a , (16)

max max





where Ci (i = 1, 2, 3, 4) are differential constants which can be zoi + 1 = Li + 1



Li + 1

Now the sensor position and the viewing direction can be determined. To achieve the maximum viewing angle (i.e. the

angle between the viewing direction and the surface tangent) for minimizing the reconstruction uncertainty, the viewing direction is chosen to be the inverse of the average normal on the predicted surface, that is,


N (y x )dxdy ijk)(y x )dxdy

i+1 = -= -dxdy dxdy

∫ ∫∫

= (-µ i+1i, -ν i+1j, -κ i+1k), (21)

where µ (x, y) =f , ν (y x ) =f , κ (y x ) − = 1 , and N(x, y)

∂ x , y , is the surface normal on point (x, y, z). The sensor position Pi+1 = (xpi+1, ypi+1, zpi+1) for the next viewpoint is planned as a solution of the following set of equations:

determined according to the boundary conditions, such as z(x2)

= z2 and z'(x2) = z2'. The sign "+" or "-" can also be determined by the known part of the surface curve (convex or concave). Since the predicted curve is based on the analysis of the tendency of known area, it is called the trend curve.


A. Placement Parameters

In a sensor placement problem, we need to specify the sensor’s placement parameters [3]. The placement parameters here include the sensor's position (xp, yp, zp) and orientation (α , β , γ ). The placement constraints can include visibility, focus, field of view, viewing angle, resolution, overlap, occlusion, and some operational constraints such as kinematic reachability of the sensor pose and robot-environment collision. Let the resolution constraint be

2(r = 2(xp x2) + [zp x z 2, y )]2 tan( )/ N < rmax , (17)



where N is the pixel number on a scanning line of the digital

image and Ω

is the angle of view of the sensor. Equation (17)

describes one pixel on the image corresponding to how much

size on the actual object surface.

To satisfy the constraints of sensor placement on resolution










+ 1





i+ 1 Pi+ 1 ||2 || Vi+ 1 ||2

op , (22)

i+ 1 yi+ 1 = −ν i+ 1

i+ 1 Pi+ 1 ||2 || Vi+ 1 ||2


|| and FOV, the parameter x4 in (13) is determined by a

searching algorithm. Then the mid-point of a trend curve is:



i+ || −= c , c





+ 1





tan 2


Qyv = (xv , yv , zmv ), zmv = f (x , y ), x = x2 + x4 . (18)

vv v


By moving the sectional plane to different positions, in the where ccmp is a positive constant for compensating the depth value range, is the sensor's Angle-of-View,


, and ||Oi+1 -Pi+1||2 is the

+ν i+ 12 +κ i+ 1

domain of Y fov < y < Yfov , we get a series of surface trend || V 1 ||




+ 1


curves. Connecting the mid-point of each such curve, we distance between Oi+1 and Pi+1. obtain a new curve:

B. Placement Algorithm

i+1 = L(Qyv), Yfov < y < Yfov . (19)

v The algorithm for the active sensor placement is summarized as below:

where L is the curve function and (i+1) denotes the next view pose.

Algorithm 1 The algorithm for determining placement parameters

Step 1. According the 3D surface or partial model obtained from the first view or previous views, find the boundary positions for determining the exploration directions;

Step 2. For each candidate position, compute its surface order and the expected unknown volume. Then select an exploration direction which is determined by the rating function (7).

Step 3. With the exploration direction, compute the surface

trend or curve trend at the selected position. Step 4. Determine the constants a and b by (11) and (12). Step 5. Determine Ci and the sign of z in (15) and (16). Step 6. Determine x4 by Algorithm 1; Step 7. Determine the mid-point for each trend curve by (18); Step 8. Determine the scene center by (20); Step 9. Determine the looking direction by (21); Step 10. Determine the eye position by (22);

Using this method, the sensor orientation is determined in such a way that it views perpendicularly downward the predicted object surface. The distance to the object is determined so that it enables acquisition of the maximum volume of unknown surface while satisfying some constraints such as resolution and FOV. Finally, the placement parameters of the vision sensor is given as a vector:



i +1 i+1

i+1 =(xiP +1, yi+1, zi +1, κ ν µ i+1) , (23) This placement vector is based on the local coordinate system. It needs to be converted to the world coordinate system by multiplying a transformation matrix.

C. The Modeling Process

Figure 5. The modeling process

Finally the iterative modeling process is illustrated in Fig. 5, in which the symbols Si (i=1, 2, ..., 6) represent the states of:

  1. S : Acquisition of a view

  2. S : Reconstruction of the 3-D local model

  3. S : Registration and fusion with the global model

  4. S : Model analysis and completion condition checking

  5. S : Ranking for selection of exploration directions

  6. S : Computing trend surface and determining next view

  7. S : Moving the robot to the new viewpoint.

Since surface data sets (triangular meshes or point clouds) are from different viewpoints, they are necessary to register and fuse with the global model in S3. A split and merge algorithm is used to perform such a registration and fusion process. The termination condition in S4 is determined according to a model-completion criterion. In implementation, the known partial model surface is tessellated into a set of small regions. The unknown hole is defined which is the area without data but is neither in known empty volumes nor pointing to unreachable space. An algorithm then detects the unknown holes (uncertain areas) near a small region and estimates their sizes. If the diameter of an unknown hole is larger than certain value dh (e.g. 10 mm) which is application dependent, a new

viewpoint should be planned. Otherwise, the reconstruction process will terminate.


It should be noted that the surface predicted by the trend is only the possible surface shape. The resulting error is analyzed here under the assumption that the actual surface is composed of typical 1st or 2nd order curves. We first define a prediction error. The absolute prediction error is defined as the difference between the expected surface and the actual object surface, i.e.,

(, ,

E = ∫∫|| y x z ) −zexp(y x || ) dxdy , (24)

Mi +1

where Mi+1 is the domain for the scene to be observed by the next view. A relative error is defined as:

x z , y) − zexp(x, y)


Er = || || dxdy ×%100 . (25)


x z , y)

Mi +1

Since the shape of a surface curve is computed on the sectional plane, we only need to analyze errors on a 2dimentional coordinate system for the 1st and 2nd order curves, i.e., straight lines, circular curves, parabolas, elliptical curves, and hyperbolas. These curves are widely used in practical industrial design and most object surfaces can be decomposed to these curves. It is not necessary to consider the original position of a curve because it does not affect the curvature of the object surface. Therefore we always let the original point of a curve be (0, 0).

For a straight line, e.g. z = c1x + c2, we have a=0 and b=0 according to (11) and (12). Referring to (15), the predicted curve function zp(x) is found to be the same as the actual one. Therefore the error to predict a straight line is

line = xx 24|| zz || dx = 0.

exp 2

Similarly, for a circular curve z = R x2 , we can also get that Ecircle 0. This means that there is no error between the predicted surface and the actual surface if the object surface is composed of circular curves or linear curves. Therefore we can accurately place the vision sensor for the next view pose to perceive maximum information. Example objects include cuboids and balls at arbitrary poses, and cylinders and cones at restricted poses.

The typical 2nd order curves are parabola: z = A0x2, elliptic: z = 11A02 x2, B0


and hyperbola: 1 1 + A0 x2.

z =


Since (11) is non-integratable when a≠ 0 and b ≠ 0 , for these three curves, a 2nd order Taylor polynomial function is used to approximate it. That is:

' '2 3


z (x = ∆ + zp ) ( + z ' p ) ( + ∆ 1 zp ) ( + ∆ R3



2 where,

(ax 2 + 2bx + 2C1) , and


z =



4 (ax + 2bx + 2C1)2 '' (8 ax + b).

z =

p 22

4 [ (ax + 2bx + 2C1 ] ) 3/2


In practical implementation, the term R3 is neglected as


<< ∆1 and R3 ≈ ∆ 0 . Therefore the predicted curve can be calculated numerically. The variables a and b are determined according to (11) and (12). The constant C1 is also determined with the boundary condition zp' (x2)=z'(x2), i.e.,


C1 = z (x2) 1 ax22 bx2 . (26)


[z (x2)]2 + 12

Then the absolute errors and relative errors of these curves are determined by averaging the difference between the actual z values and predicted z values. Simulation has been performed to observe the results of the absolute and relative errors of a parabola, elliptic, hyperbola, and a circle for a, b, C1, assuming A0=1 and B0=0.5. As a verification of the algorithm, the results show that the prediction error is always zero if the actual curve is a part of a circle, with the results of a = 0, b = 1, C1 = 0, and Ecircle = 0. The results also showed that the errors were very small (relative error was at the order of 10-7) for predicting other 2nd order curves. Therefore we can accurately place the vision sensor for the next view pose to observe objects composed of such surfaces, when using the expected curve predicted by the uniform surface trend model (15). However, for each different curve, the constants (a, b, and C1) are different and they should be dynamically determined according to the known part of the curves.


A. Implementation Considerations

To implement the proposed method in a practical system, we have to consider many other factors and conditions. In fact, this method has to be integrated with other techniques and algorithms for automating the modeling process, as illustrated in Fig. 5. This condition makes difficult in evaluation of the proposed method and currently we can only provide some simple examples with a little human interference during the modeling process.

One problem has to be noted is the noise filtering. Since the curvature on the object surface is sensitive to noise and the local features of a 3D image do not affect the surface trend much, a low-pass filter is applied to the image so that we can obtain a smoothed surface. Therefore, for the trend which describes the shape of a considerable large area, it is actually not sensitive to noise. We are also considering some image processing methods to filter noise and compute the shape in a more stable way.

Equations (11) and (12) are described as continuous functions. When applied to digital image processing, they can be written as,

m mm

mk x i ki

i xi

= a 1= i 1= i 1= i , (11')
m m
22 )(∑∑ ii xxm
1= i 1= i
m m
mxakb ii /)( −= . (12')
i 1= i 1=

where m is the number of total points on the known part of the object surface.

Another problem is on calculation of curvatures. The curvature of each point on the known part of object surface should be calculated so that the surface trend can be determined for predicting the unknown part. Equation (6) is not suitable for computing the curvature on a digital image because the errors in computing z' and z" will be significant. In this research, we determine the curvature of a point P(x(i), z(i)) by using three adjacent points (triplet), i.e. P(x(i-1), z(i-1)), P(x(i), z(i)), and P(x(i+1), z(i+1)). Each triplet defines a circle and the curvature k(i) is the inverse of its radius.

B. Practical Examples

Several experiments were carried in our laboratory for construction of object models. The range data are obtained by a structured light system set up in this research, which mainly consists of a projector and a camera. The projector is a palm-sized digital projector, PLUS U3-1080. It is connected to a computer and is controlled to generate some gray-encoded stripe-light patterns for 3D reconstruction [28]. The CCD camera (PULNIX TMC-9700) has a 1-inch sensor and a 25 mm lens. A fixture is designed for mounting the structured light system on the end-effector of a 6DOF robot (STAUBLI RX-90B) with ±0.02 mm repeatability. This enables the 3D sensor to be freely moved to an arbitrary position in the workspace.

Figure 6. The objects to be reconstructed

Fig. 6 illustrates two objects for the model construction in the experiment. In both cases, we set the resolution constraint to be rmax=0.85mm. The first one is demonstrated with the procedure in more details here. It was incrementally built by four views. The first view is assumed to be taken from the top view. To determine a next view for acquiring some unseen information of the object, we used trend surface method and developed a program to compute the expected surface curves. Then the trend is computed and the next viewpoint is determined.

The experimental results in the incremental construction of the first object are shown below to illustrate the computation at each step. A new surface was acquired at each view and it was integrated with the existing ones to form a partial model. The exploration direction and sensor placement were determined by the proposed method. The placement parameters of each view are set with a viewpoint vector in the format of [xyz ai bj ck], representing the six parameters of the 3D position and orientation. Also a registration vector contains six parameters for the surface integration so that the range images are transformed and registered to a common coordinate system. The first three of the six parameters define the surface's reference point and the other three define the surface orientation by 3-axis rotation. In this way, the model is improved by eliminating the overlapped/redundant points and stitching non-data areas.

Figure 7. The 3D surface obtained from the first view

The corresponding placement parameters of the first view (Fig. 7) are

Viewpoint1 = (0, 0, 457.9333, 0, 0,-1)

Reg1 = Oglobal = (0, 0, 0, 0, 0, 0) where the viewpoint vector has a format of [xyz ai bj ck], representing the six parameters of the 3D position and orientation. Regi contains some registration parameters for the surface integration so that the range images are transformed and registered to a common coordinate system. The first three of the six parameters define the surface's reference point and the other three define the surface orientation by 3-axis rotation. In this way, the model is improved by eliminating the overlapped/redundant points and stitching non-data areas.

Figure 8. The points were detected as the candidates of exploration directions. The decision was made by their rating values.

Known surface curve



Predictive trend





Ground plane



-20 -80-60-40-20 0 20 40 60 80

Figure 9. An predictive trend curve

Along the exploration direction, a trend curve (the dashed red curve in Fig. 9) is computed from the known 3D surface. The space under the curve and ground plane is marked as unreachable.

The sensor position and looking direction are planned according the predicted curves. The corresponding placement parameters are:

Set_scene_center = [63.9950, -23.0346, 40.4942]. Viewpoint2 = (502.9636, -26.2364, 158.2319, -0.965838, 0.00704474, -0.259052).

Reg2 = (-72.06,0.00,40.10, 32.01,-89.89,-30.80).

Similar to the situation in the first step, candidate points are selected to find an exploration direction. Along this direction, several trend curves are computed to predict the unknown part of the surface.









-70 -60 -50 -40 -30 -20 -10 0 10 20

Figure 14. One of the trend curve as in Fig. 13. Since the original curve is circular, the prediction is very accurate.

The sensor pose is also determined according the predicted

curves. The corresponding placement parameters are: Set_scene_center = [-8.4859 60.0199 33.7807] Viewpoint3 = (21.7862, 366.7046, -228.8904, -0.0747589,

-0.757378, 0.648683) Reg3 = (30.21,-116.07,40.02,89.83,31.10,-89.92)

With the model in Fig. 16, it is necessary to take further viewpoint decision and surface acquisition. These were done similarly to the previous steps. The sensor parameters for the fourth view are determined as (Fig. 17):

Set_scene_center = [16.3405, -10.1124, 35.0603]

Viewpoint4 = (15.8901, 347.64, -251.11, 0.0009831, -0.780902, 0.624653) Reg4 = (-116.12,-120.01,43.03,-90.00,22.60,89.31)

Finally the complete model was obtained by integrating all the four views. Their relative distribution to the object model in 3D space is given in Fig. 18. Also Fig. 19 shows the other object with five viewpoints. 3D Animation of the results is available on our web: In our experiments, the implementation was performed on a PC with 750 MHz CPU. The 3D surface acquisition was achieved by a unique color encoding method. The computation time for planning a next viewpoint was about three to five seconds. It should be noted that the planning results are dependent on the first view. By different initial viewpoints, the results are also different and there may be one or two more views planned for each object.

C. Discussion

For surface edges and object boundaries, as the curvature around these areas changes abruptly, there will be significant errors in computing the surface trend. For example, if there is surface edge located in the seen area, the domain of known part for determining constants a and b should be restricted to a smaller area that does not contain any edges. The exploration direction may also need to be changed by analyzing the seen object surface.

When modeling a small-sized object (compared with the sensor’s field-of-view), the trend surface method may not work very well since the whole object will be contained in a single view and the surface trend has a high order. That makes it unreliable to predict the unknown area. In such cases, a sensor based solution [8] instead of target-driven method may be used for the modeling task.

The basic idea of this research is to find any possible cues of object shape for predicting unknown areas. The surface trend is an option that is suitable for many objects. Here, the trend may not be computed by all area of the known part. We only choose a suitable surface part usually without object boundary or edges inside, so that the trend is reasonable and predictable. For example, for a polyhedral object, the surface trend will be a single surface plane. Only when the image boundary is on an edge of polygonal plane, the next view can not be reasonably planned. This will certainly cause a not good viewpoint, but rarely often happen in practice. Many cases are avoided by our selection method of exploration direction. Of course, we don't expect only trend method is enough for completion of the whole modeling task. We have to integrate many other methods together so that the whole modeling process is executed in an "adaptive" way. That is also why we currently have difficulties to provide some complicated examples that are fully automatically performed.

VII. CONCLUSION In this paper, we proposed a trend surface model which was used to predict the unseen part of objects or environments. In this way, the next viewpoint can be determined for on-line acquisition of the range data until the whole structure of the object or environment is reconstructed. The trend surfaces and trend curves were computed from the curvature tendency. While determining the next viewpoint, multiple sensor placement constraints were considered in this paper, such as resolution, field-of-view, and viewing angle. Such a trend model can accurately predict the unknown surface of the object if the surface is composed of 1st-order and 2nd-order curves and surfaces. Experiments were carried out to demonstrate the method presented in this paper. Our future work in the research will deal with reliable detection of the boundary positions on a complex partial model, so that they will be evaluated for selecting a best

candidate of exploration direction. Other uncertain conditions will also be considered to make to the reconstruction process more reliable so that an autonomous robot system can work without any human interference.


[1] P. K. Allen, I. Stamos, etc., “3D Modeling of Historic Sites using Range and Image Data”, ICRA 2003, Sep. 14-19, 2003 Tapei, pp. 145-150.

[2] C. K. Cowan and P. D. Kovesi, "Automatic sensor placement from vision task requirements", IEEE Trans. on PAMI, Vol. 10, No. 3, May 1988, pp. 407-416.

[3] S. Y. Chen and Y. F. Li, "Automatic Sensor Placement for Model-Based Robot Vision", IEEE Trans. on Systems, Man and Cybernetics, Part B, Vol. 33, No. 1, 2004, pp. 393-408.

[4] D. Papadopoulos-Orfanos and F. Schmitt, "Automatic 3-D digitization using a laser rangefinder with a small field of view", Int. Conf. on Recent Advances in 3-D Digital Imaging and Modeling, Ottawa, Ontario, Canada, May 1997, pp. 60-67.

[5] K. N. Kutulakos and C.R. Dyer, "Global surface reconstruction by purposive control of observer motion", IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, Seattle, 1994, pp. 331-338.

[6] J. Maver and R. Bajcsy, "Occlusions as a guide for planning the next view", IEEE Trans. on PAMI, Vol. 15, No. 5, May 1993, pp. 417 -433.

[7] P. Whaite and F.P. Ferrie, "Autonomous exploration: driven by uncertainty", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 19, No.3, March 1997, pp. 193-205.

[8] R. Pito, "A solution to the next best view problem for automated surface acquisition", IEEE Trans. on PAMI, Vol. 21 (10), 1999, pp. 1016–30.

[9] Y. F. Li and Z. Liu, Uncertainty-Driven Viewpoint Planning for 3D Object Measurements, Proc. IEEE International Conference on Robotics and Automation, Taipei, Taiwan, Sept. 2003, pp.127-132.

[10] H. Zha, K. Morooka, T. Hasegawa, T. Nagata, "Active modeling of 3-D objects: planning on the next best pose (NBP) for acquiring range images", Int. Conf. on Recent Advances in 3-D Digital Imaging and Modeling, Ottawa, Ontario, Canada, May 1997, pp. 68 –75.

[11] Z. Yu, K. Wang and R. G. Yang, "Next best view of range sensor", IEEE 22nd Int. Conf. on Industrial Electronics, Control, and Instrumentation, vol. 1, Taipei, Taiwan, 1996, pp. 185-188.

[12] R. Pito, "A sensor-based solution to the "next best view" problem", 13th Int. Conf. on Pattern Recognition, vol. 1, Vienna, Austria, Aug. 1996, pp. 941-945.

[13] T. Arbel and F. P. Ferrie, "Viewpoint selection by navigation through entropy maps", Seventh IEEE Int. Conf. on Computer Vision, vol. 1, Kerkyra, Greece, 1999, pp. 248-254.

[14] J. E. Banta and M.A. Abidi, "Autonomous placement of a range sensor for acquisition of optimal 3-D models", 22nd IEEE Int. Conf. on Industrial Electronics, Control, and Instrumentation, Taipei, Taiwan, Aug. 1996, vol. 3, pp. 1583-1588.

[15] M. Reed, P. Allen, "Constraint-based sensor planning for scene modeling", IEEE Int. Symposium on Computational Intelligence in Robotics and Automation, Monterey, California, 1999, pp. 131-136.

[16] E. Marchand and F. Chaumette, "Active sensor placement for complete scene reconstruction and exploration", IEEE Int. Conf. Robotics and Automation, Albuquerque, USA, April 1997, pp. 743-750.

[17] E. Marchand and F. Chaumette, "Active vision for complete scene reconstruction and exploration", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 21, no. 1, Jan. 1999, pp. 65-72.

[18] K. A. Tarabanis, R. Y. Tsai, P. K. Allen, "The MVP sensor planning system for robotic vision tasks", IEEE Trans. on Robotics and Automation, Vol. 11, No. 1, 1995, pp. 72-85.

[19] B. Triggs, C. Laugier, "Automatic camera placement for robot vision tasks", IEEE Int. Conf. on Robotics and Automation, Nagoya, 1995, pp. 1732-1737.

[20] X. G. Gu, M. M. Marefat, F. W. Ciarallo, "A robust approach for sensor placement in automated vision dimensional inspection", IEEE Int. Conf. on Robotics and Automation, Detroit, May 1999, pp. 2602-2606.

[21] C. K. Cowan, A. Bergman, "Determining the camera and light source location for a visual task", IEEE Int. Conf. on RA, vol. 1, Scottsdale, 1989, pp. 509-514.

[22] K. A. Tarabanis, P. K. Allen, R. Y. Tsai; "A survey of sensor planning in computer vision", IEEE Trans. on Robotics and Automation, Vol. 11, No.1, 1995, pp. 86-104.

[23] E. Trucco, M. Umasuthan, A. M. Wallace, V. Roberto, "Model-based planning of optimal sensor placements for inspection", IEEE Trans. on RA, Vol. 13, No.2, 1997, pp. 182-194.

[24] K. Morooka, H. Zha, T. Hasegawa, "Computations on a spherical view space for efficient planning of viewpoints in 3-D object modeling", 2nd Int. Conf. on 3-D Digital Imaging and Modeling, Ottawa, Canada, 1999, pp. 138-147.

[25] L. M. Wong, C. Dumont, M. A. Abidi, "Next best view system in a 3-D object modeling task", IEEE Int. Symposium on Computational Intelligence in Robotics and Automation, Monterey, California, 1999, pp. 306 -311.

[26] W. B. Agocs, "Least squares residual anomaly determination", Geophysics, Vol. 16, 1951, pp. 686-696.

[27] W. C. Krumbein, "The 'sorting out' of geological variables illustrated by regression analysis of factors controlling beach firmness", J. of Sedimentary Petrology, Vol. 29, 1959, pp. 575-587.

[28] Li, Y. F. and Chen, S. Y., "Automatic Recalibration of an Active Structured Light Vision System", IEEE Trans. on Robotics and Automation, Vol. 19, No.2, 2003, pp. 259-268.

[29] S. Y. Chen and Y. F. Li, "Active Viewpoint Planning for Model Construction", IEEE International Conference on Robotics and Automation, New Orleans, LA, USA, April, 2004, pp. 4411-4416.

[30] M. K. Reed, P. K. Allen, I. Stamos, "3-D modeling from range imagery: an incremental method with a planning component", Int. Conf. on Recent Advances in 3-D Digital Imaging and Modeling, Ottawa, Ontario, Canada, May 1997, pp. 76-83.

[31] M. Lozano, M. Devy, J.M. Sanchiz, "Perception Planning for an Exploration Task of a 3D Environment", 16th Int. Conf. on Pattern Recognition (ICPR 2002), Vol. III, Québec City, Canada, August 2002, pp. 704-707.

[32] W.R Scott, G. Roth, and J.F. Rivest, View planning for automated three-dimensional object reconstruction and inspection. ACM Computing Surveys, Vol. 35, 2003, pp.64-96.

A. Georgiev and P. K. Allen, “Localization Methods for a Mobile Robot in Urban Environments”, IEEE Trans. on Robotics and Automation, V. 20, N. 5, Oct. 2004, pp. 851-864.

Y. Chen (M’01) received the Ph.D. degree from the Department of Manufacturing Engineering and Engineering Management, City University of Hong Kong, Hong Kong, in 2003.

He joined Zhejiang University of Technology, China, in Jan. 2004 where he is currently an Associate Professor in the Department of Information Engineering. Since July 2004, he has been invited as a guest researcher in the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. His research interests include computer vision, robotics, 3-D object modeling, and medical image analysis.

Y. F. Li (M’91–SM’01) received the Ph.D. degree from the Robotics Research Group, Department of Engineering Science, University of Oxford, Oxford, U.K., in 1993.

From 1993 to 1995, he was a Postdoctoral Research Associate in the Artificial Intelligence and Robotics Research Group, Department of Computer Science, University of Wales, Aberystwyth, U.K. He joined City University of Hong Kong, Hong Kong, in 1995 where he is currently an Associate Professor in the Department of Manufacturing Engineering and Engineering Management. His research interests include robot vision, sensing, and sensor-based control for robotics.