Center for Mapping




Performance Of The Automated Image Sequence Processing

 

To assess the feasibility of automated line extraction with 3D positioning, and consequently of its real-time realization, a rich set of candidate image processing functions was developed in a standard C++ programming environment. Figure 1 shows the overall dataflow and processing steps, which are described in more detail below. In short, real-time image processing is feasible because of the simple sensor geometry and the limited complexity of the imagery collected.

Figure 1. Real-time image processing and post-processing workflow

 

First, a single down-looking camera acquires consecutive images with about 50% overlap; the images cover only the road surface to the side of the vehicle. Then centerlines are extracted from the images, followed by feature point extraction around the centerline area. Finally, the feature points are matched to build a strip from the images. This matching process is greatly facilitated by the simultaneous availability of navigation data: the change in position and attitude between two image captures (the relative orientation) dramatically shortens the search for conjugate entities in image pairs, since the usually two-dimensional search space is reduced to one dimension.

 

Color Space Transformation

 

Traffic signs, centerlines and the like have distinct colors to draw the attention of drivers in an unambiguous way. Therefore, color images are preferred over monochromatic ones. Figure 2 illustrates various cases, including an extreme situation where the yellow solid lines are hardly visible in the B/W image.

 

 


 


Figure 2. Centerlines of various qualities.

 

 

Although there are many image processing algorithms that work with color data (multi-channel gray-scale imagery), the great majority of the core functions work only on simple monochrome image data. Therefore, a color space conversion is desirable: a projection from the 3D color space onto one dimension, a color direction, that gives the best possible separation of the objects to be distinguished. For example, moving from RGB to IHS (Intensity, Hue, Saturation) space effectively decouples intensity and color information. After some experiments, it was decided to use an RGB-to-S transformation, which is illustrated in Figure 3 under various road conditions. Working with a single channel is also a major benefit for the real-time implementation.
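The formula behind the RGB-to-S step is not given in the text. As a minimal sketch, assuming the common HSI definition of saturation, S = 1 - 3 min(R, G, B) / (R + G + B), the conversion to a single channel could look as follows. The names saturation and rgbToS and the interleaved buffer layout are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Saturation of one RGB pixel under the common HSI definition
// S = 1 - 3*min(R,G,B)/(R+G+B). This definition is an assumption;
// the source only states that an RGB-to-S transformation was used.
// Returns a value in [0, 1]; pure black is mapped to 0 by convention.
inline double saturation(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    const int sum = int(r) + int(g) + int(b);
    if (sum == 0) return 0.0;
    const int lo = std::min({int(r), int(g), int(b)});
    return 1.0 - 3.0 * lo / sum;
}

// Converts an interleaved RGB buffer (R,G,B,R,G,B,...) into a
// single-channel 8-bit saturation image of one third the size.
std::vector<std::uint8_t> rgbToS(const std::vector<std::uint8_t>& rgb) {
    assert(rgb.size() % 3 == 0);
    std::vector<std::uint8_t> s(rgb.size() / 3);
    for (std::size_t i = 0; i < s.size(); ++i) {
        const double v = saturation(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]);
        s[i] = static_cast<std::uint8_t>(v * 255.0 + 0.5);  // round to 0..255
    }
    return s;
}
```

Under this definition, gray pavement (R, G and B nearly equal) maps close to 0 and saturated yellow paint maps close to 1 regardless of brightness, which is consistent with the separation described above.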

 


 


Figure 3. Test images and their RGB-to-S transformed representations.

 

 

Centerline Extraction

 

After the RGB-to-S transformation and filtering, the geometry of the centerlines is extracted from the binary images. Given the dimensions of the mapping vehicle and the geometry of the camera, the centerlines (in the transportation sense) show up in the images as multi-pixel-width lines, 20-30 pixels wide, usually referred to as raster lines in the vision community. For a raster line, its centerline is of primary interest. In the literature, similar terms such as skeleton or medial line (medial axis) are used. A skeleton is usually a one-pixel-width raster line, while the term centerline can denote a one-pixel-width line in either raster or vector form. The mathematical definition of the centerline of a raster line varies with the algorithm used to generate it. In thinning algorithms, the skeleton is the set of pixels that have more than one nearest neighbor on the boundary of the raster line. The skeleton is then extracted by shrinking the raster line from its boundary in all directions until only a one-pixel-width, eight-connected line remains. In the medial axis transformation method, the discrete medial axis pixels are the local maxima of a transformation value.

 

For centerline extraction, we selected a scan line-oriented method, which is simple and executes faster than most other algorithms. In this one-pass process, the computation is linear in the number of pixels on a raster line. Because the centerline direction is known, the optimal scan line direction, perpendicular to the centerline, can easily be set up in most situations. During scanning, the pixels along a scan line, which is a small segment of an image column, are processed from top to bottom. A robust recursive filtering technique eliminates noise such as gaps (although most gaps and gray-scale irregularities have already been removed during the color space transformation) and segments multiple centerlines such as double solid lines. Figure 4 depicts the results of this processing step.

 

Once the boundary points are extracted, a line-following routine generates the boundary lines, which are then cleaned further, for example by applying geometrical constraints to remove irregularities. In the final step, the midpoints are computed and the centerline is extracted, as shown in Figure 5.
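The scan-line step and the midpoint computation can be sketched as follows. This is a simplified illustration and not the recursive filter used in the project: a plain run detector with gap bridging stands in for it, and Run, scanLineRuns, maxGap and minWidth are hypothetical names and tuning parameters.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One marking crossed by a scan line: its two boundary rows.
struct Run {
    int first;  // upper boundary point (first foreground row)
    int last;   // lower boundary point (last foreground row)
    double mid() const { return 0.5 * (first + last); }  // centerline point
};

// Walks one scan line (a segment of a binary image column, top to bottom)
// and returns the foreground runs found on it. Background gaps of at most
// maxGap pixels are bridged, and runs narrower than minWidth are dropped
// as noise. Several runs on one scan line correspond to multiple markings,
// for example a double solid line.
std::vector<Run> scanLineRuns(const std::vector<std::uint8_t>& column,
                              int maxGap, int minWidth) {
    assert(maxGap >= 0 && minWidth >= 1);
    std::vector<Run> runs;
    int start = -1;   // first row of the open run, -1 if no run is open
    int lastOn = -1;  // most recent foreground row of the open run
    const int n = static_cast<int>(column.size());
    for (int row = 0; row <= n; ++row) {  // row == n flushes the last run
        const bool on = row < n && column[row] != 0;
        if (on) {
            if (start < 0) start = row;
            lastOn = row;
        } else if (start >= 0 && (row == n || row - lastOn > maxGap)) {
            // Gap too long, or end of the scan line: close the run.
            if (lastOn - start + 1 >= minWidth) runs.push_back({start, lastOn});
            start = -1;
        }
    }
    return runs;
}
```

Each pixel is visited once, so the cost stays linear in the scan line length. Collecting mid() for consecutive scan lines gives the centerline points that a line-following step would then connect.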

Figure 4. Centerline boundary points extracted.

Figure 5. Automatically extracted centerlines.

 

 

Feature Point Extraction

 

To achieve the highest possible accuracy, the 3-dimensional centerline positions must be obtained from stereo imagery. Given the camera orientation, both interior and exterior, and the matching (identical) entities between the 2-dimensional centerlines, the 3-dimensional centerline position can be easily computed. Since the exterior orientation is provided by the navigation data and the interior orientation can be determined by a priori calibration, the primary task reduces to finding conjugate points or features in overlapping images. Because a centerline looks the same when shifted along its own direction, it cannot be used directly for matching. There are a number of methods for image matching, including feature-based and area-based techniques. Because of the special condition of the object space (near-parallel planes), a simple correlation-based area method appears adequate for this purpose. Feature points are used as the image primitives for matching.

 


Feature points correspond to points of high curvature or high gray-level variation. For simplicity, a corner detector was selected to extract the feature points; it is based on the following operator:

 


I denotes the smoothing operation on the gray level image I(x, y). Ix and Iy indicate the x and y directional derivatives respectively. Figure 6 depicts feature points extracted around the centerline region from overlapping images.
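The operator itself is not reproduced in this text. A widely used corner operator that fits the description (smoothed products of the directional derivatives) is the Harris-type response R = det(C) - k trace(C)^2, where C collects the smoothed values of Ix*Ix, Ix*Iy and Iy*Iy. The sketch below assumes that form; the Gray helper type, the central differences, the 3x3 box smoothing and the value k = 0.04 are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal row-major gray-level image (hypothetical helper type).
struct Gray {
    int w, h;
    std::vector<double> px;
    Gray(int w_, int h_) : w(w_), h(h_), px(static_cast<std::size_t>(w_) * h_, 0.0) {}
    double& at(int x, int y) { return px[static_cast<std::size_t>(y) * w + x]; }
    double at(int x, int y) const { return px[static_cast<std::size_t>(y) * w + x]; }
};

// Harris-type corner response R = det(C) - k * trace(C)^2.
// Central differences approximate Ix and Iy, and a 3x3 box filter stands in
// for the smoothing operator. A two-pixel border of the result is left at 0.
Gray cornerResponse(const Gray& img, double k = 0.04) {
    assert(img.w >= 5 && img.h >= 5);
    Gray ixx(img.w, img.h), ixy(img.w, img.h), iyy(img.w, img.h), r(img.w, img.h);
    // Products of the directional derivatives at every interior pixel.
    for (int y = 1; y < img.h - 1; ++y)
        for (int x = 1; x < img.w - 1; ++x) {
            const double ix = 0.5 * (img.at(x + 1, y) - img.at(x - 1, y));
            const double iy = 0.5 * (img.at(x, y + 1) - img.at(x, y - 1));
            ixx.at(x, y) = ix * ix;
            ixy.at(x, y) = ix * iy;
            iyy.at(x, y) = iy * iy;
        }
    // Smooth the products and evaluate the response.
    for (int y = 2; y < img.h - 2; ++y)
        for (int x = 2; x < img.w - 2; ++x) {
            double a = 0.0, b = 0.0, c = 0.0;  // smoothed Ix^2, Ix*Iy, Iy^2
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    a += ixx.at(x + dx, y + dy);
                    b += ixy.at(x + dx, y + dy);
                    c += iyy.at(x + dx, y + dy);
                }
            a /= 9.0; b /= 9.0; c /= 9.0;
            r.at(x, y) = (a * c - b * b) - k * (a + c) * (a + c);
        }
    return r;
}
```

With such a response, corners give positive values, straight edges negative values and flat regions zero, so feature points can be picked as local maxima above a threshold inside the band around the extracted centerline.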

Figure 6. Feature points extracted.

 

 

Matching Feature Points

 


The feature points are matched through correlation. The search space is constrained by the epipolar geometry. For a given feature point s, a correlation window of size n×m is centered at its location in the first image, point s1. Then a search window is selected around the approximate location of the same object point in the second image, point s2, and the correlation operation is performed along the epipolar line. The search window size and location are determined from the navigation data. The correlation score is defined as

 


where Ik(uk, vk) is the average at point (uk, vk) of Ik (k=1, 2), and λ is a normalizing factor chosen so that the score ranges from -1, for two correlation windows that are not similar at all, to 1, for two correlation windows that are identical. A point in the first image may be paired with several points in the second image. Several techniques exist for resolving such matching ambiguities. Because the object surface is nearly planar, a 6-parameter affine transformation provides an adequate geometrical relation between two images. Therefore, once the affine transformation parameters have been estimated from conjugate points, straightforward blunder detection can be used effectively to disambiguate matches and remove outliers.
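A score with the stated properties (window averages subtracted, normalized to the range -1 to 1) is the zero-mean normalized cross-correlation, and the sketch below assumes that form. It further assumes windows of (2n+1) by (2m+1) pixels and an epipolar line that coincides with an image row; the Gray type and the functions correlationScore and bestMatchOnRow are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal row-major gray-level image (hypothetical helper type).
struct Gray {
    int w, h;
    std::vector<double> px;
    Gray(int w_, int h_) : w(w_), h(h_), px(static_cast<std::size_t>(w_) * h_, 0.0) {}
    double& at(int x, int y) { return px[static_cast<std::size_t>(y) * w + x]; }
    double at(int x, int y) const { return px[static_cast<std::size_t>(y) * w + x]; }
};

// Zero-mean normalized cross-correlation of two (2n+1) x (2m+1) windows
// centered at (u1, v1) in i1 and (u2, v2) in i2. Returns a value in [-1, 1],
// or 0 when either window is flat (the score is undefined in that case).
double correlationScore(const Gray& i1, int u1, int v1,
                        const Gray& i2, int u2, int v2, int n, int m) {
    assert(u1 - n >= 0 && u1 + n < i1.w && v1 - m >= 0 && v1 + m < i1.h);
    assert(u2 - n >= 0 && u2 + n < i2.w && v2 - m >= 0 && v2 + m < i2.h);
    const double count = (2.0 * n + 1.0) * (2.0 * m + 1.0);
    double mean1 = 0.0, mean2 = 0.0;
    for (int j = -m; j <= m; ++j)
        for (int i = -n; i <= n; ++i) {
            mean1 += i1.at(u1 + i, v1 + j);
            mean2 += i2.at(u2 + i, v2 + j);
        }
    mean1 /= count;
    mean2 /= count;
    double num = 0.0, var1 = 0.0, var2 = 0.0;
    for (int j = -m; j <= m; ++j)
        for (int i = -n; i <= n; ++i) {
            const double d1 = i1.at(u1 + i, v1 + j) - mean1;
            const double d2 = i2.at(u2 + i, v2 + j) - mean2;
            num += d1 * d2;
            var1 += d1 * d1;
            var2 += d2 * d2;
        }
    const double denom = std::sqrt(var1 * var2);
    return denom > 0.0 ? num / denom : 0.0;
}

// One-dimensional search: evaluates the score for every column between uMin
// and uMax on row v2 of the second image and returns the best column.
int bestMatchOnRow(const Gray& i1, int u1, int v1, const Gray& i2, int v2,
                   int uMin, int uMax, int n, int m) {
    int best = uMin;
    double bestScore = -2.0;  // below the smallest possible score
    for (int u2 = uMin; u2 <= uMax; ++u2) {
        const double s = correlationScore(i1, u1, v1, i2, u2, v2, n, m);
        if (s > bestScore) { bestScore = s; best = u2; }
    }
    return best;
}
```

In practice uMin and uMax would come from the position predicted by the navigation data, and the retained matches would then be checked against the fitted affine transformation to reject blunders.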

 

Strip Formation

 

After the transformation parameters between consecutive images have been determined, the centerline segments are connected and an approximate centerline is formed incrementally. However, the final coordinates of the centerline can be computed only in post-processing mode, once the final navigation data have become available. To illustrate the fit between images, an image strip was built by transforming five consecutive images into the same frame, as shown in Figure 7.
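Bringing consecutive images into one frame amounts to chaining the pairwise affine transformations. The sketch below illustrates this under the assumption that each pairwise transformation maps coordinates of image k+1 into image k; the types Point and Affine and the function chainToFirst are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// 6-parameter affine map: x' = a*x + b*y + c, y' = d*x + e*y + f.
struct Affine {
    double a, b, c, d, e, f;
    Point apply(Point p) const {
        return {a * p.x + b * p.y + c, d * p.x + e * p.y + f};
    }
    // Returns the composed map "apply g first, then this map".
    Affine after(const Affine& g) const {
        return {a * g.a + b * g.d, a * g.b + b * g.e, a * g.c + b * g.f + c,
                d * g.a + e * g.d, d * g.b + e * g.e, d * g.c + e * g.f + f};
    }
};

// toPrev[k] maps coordinates of image k+1 into image k. The result holds,
// for every image k, the map from image k into the frame of image 0
// (the identity for k = 0).
std::vector<Affine> chainToFirst(const std::vector<Affine>& toPrev) {
    std::vector<Affine> toFirst;
    toFirst.push_back({1.0, 0.0, 0.0, 0.0, 1.0, 0.0});
    for (const Affine& t : toPrev)
        toFirst.push_back(toFirst.back().after(t));
    return toFirst;
}
```

Applying toFirst[k] to the centerline points of image k places all segments in the frame of the first image, where they can be connected into an approximate centerline.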

Figure 7. Automatically formed image strip.

 

 

 

 
