Carlos Eduardo Scheidegger

cscheid@cs.utah.edu

UID: 00454219

Project 2: Image Mosaicing


In this project, I implemented image mosaicing, a technique for compositing pictures from a single scene taken from different points of view.

Background

Image mosaicing arises often in practice. The simplest case is where acquisition devices can't capture one entire scene, and so several shots are done. These shots have slightly differing points of view, and one wants to compose them into a single coherent image. In this project we don't treat the actual correspondence problem: we assume a set of pairs of fiducials, which will allow us to derive our transformation. Since an image is inherently a two-dimensional object, in order for us to be able to transform these things around we have to assume some things about the changes in the viewpoint. Basically, we need to assume that the object is either far enough from the camera or the object is flat enough that perspective changes are close to irrelevant.

The idea behing image mosaicing is to constrain the possible set of transformations to a family of transformations whose members are easily computable. In this case, we're working with a 3x3 perspective transform matrix of the form

( a0 a1 a2 )
( a3 a4 a5 ) = T
( a6 a7  1 )

Each point is represented by a three-vector (px, py, 1), the standard representation for a 2D point in homogenous coordinates. Since we have the pairs of fiducials, we can write a system of equations of the form

T(p) = p', for all fiducial pairs p and p'

Once we solve this system of equations, we have the transformation from the coordinate system of one image to the other. By cascading transformations, we are able to mosaic any set of images connected by sets of fiducial pairs.

Implementation

Code organization

I've organized my program into a class called CorrespondenceManager that lives in cmanager.{hxx, cxx} and two main programs mosaic (living in mosaic.cxx) and mosvis vis.cxx. Mosaic is the actual program as per project specfications, and mosvis is an interactive visualization tool using OpenGL I coded to be able to quickly see the results of my geometrical matching. mosvis simply uses the OpenGL texture machinery to display the results so that I could make sure I had that part working before moving on to more complex things.

The class CorrespondenceManager exposes methods to access transformation matrices and to traverse the correspondence graph. In order to modularize the code properly , I've written a templated method called dfs that takes an object that simply responds to visit, pre and post. With these, I was able to implement all the traversing I needed, both inside CorrespondenceManager and in mosvis. This is a technique close to the Strategy pattern [2]. CorrespondenceManager also exposes methods to access the actual fiducial pairs and images.

Techniques implemented

Coordinate Transformations: To get the actual transformation between different coordinate systems, I used SVD and the pseudo-inverse. Since I assumed that the fiducial points didn't represent a degenerate transformation (one with NullSpace(T) != { 0 }), I didn't do the numerical trick of zeroing out small singular values in the matrix. I simply inverted the singular value matrix by taking the reciprocals of the numbers and inverted the other two matrices by taking their transposes (since they're guaranteed to be rotation matrices). I used SVD because it allows me to overconstrain the system and then solve it in a least-squares sense. As we'll see, overconstraining is necessary to get decent results.

Bounds: Once I have the transformations, I can figure out the ideal bounds for the whole mosaic by starting with the target image and transforming the corners of the images to the correct space. It's sufficient to transform the corners because T is guaranteed to be affine (line-preserving). I used CorrespondenceManager::dfs to quickly implement this.

Backwards interpolation: With the bounds of the final mosaic image, I can map each pixel in the mosaic space to the individual image spaces to see where I land. This backward mapping eliminates the need for supersampling or other strategies for avoiding holes in the interpolation. In order to speed this up, instead of checking for all the images for each mosaic-space pixel, I get an AABB in the mosaic spacearound each image, and treat them separately. Assuming each individual image is similar in size and smaller than the mosaiced image, this drops tje interpolating complexity from O(W * H * n), where W and H are the mosaiced image width and height, and n is the number of images to O(w * h * n), where w and h are the individual images width and height.

Feathering: To get feathering, I implemented a somewhat non-standard weighting scheme. The weights of the pixels are given by their distance to the edge of the image, in their original space (which is worse than doing it in the mosaic space, but worked fine in practice). The main difference of my scheme to simple weighted averaging is that I tried to be careful to avoid some artifacts that can arise where two different images don't overlap exactly and their pixels have the same weight, as can be seen here:

To avoid this, instead of doing a simple average, I composite the pixels using the following formula:

new_alpha = 1 - (1-old_alpha)(1-alpha)
new_value = (1-alpha)*old_value + alpha*value
The stored alpha indicates the "total" weight of what's stored in the image. This alpha will be used to normalize the image later. Notice that this scheme imposes an ordering of the images. This is actually a good thing, because this means that the borders of the images will get blended, but the "middle" of them won't. Any artifacts of the transformation will then only be visible in the border. Since this border is feathered, it gets hard to spot the seams, which is exactly the effect we are trying to achieve:

Contrast Matching Surprisingly enough, this was the part of the project that gave me the most trouble. It was surprising for me because I didn't expect it to be such a tricky problem to solve. Maybe I missed some simple solution, but I actually tried several of them.

The first idea was to take advantage of the coordinate system transformation machinery and use it to do intensity transformations. I implemented this using policy classes [3], that simply switched between traversing the graph to geometric transformations and traversing it to compute intensity ones. My idea was that I could look at the neighborhood of the fiducial pairs and derive an intensity transformation based solely on their relative intensities. I implemented both a linear transformation of the form i' = a * i + b and a simple translation of the intensity spaces. Both only seemed to work on very simple cases like this one:

No intensity matching

Intensity matching

(Here the dynamic range is compressed to [0,255], but the original intensity-matched .tif has the correct bigger dynamic range)

The main problem with these transformations is that they seem to be too sensitive to point placement. I actually used an average of 30x30 pixels around the fiducial pairs to get the intensity, but this didn't help much. Notice the halos:

Failure to properly match contrast

The technique I ended up using is much more general, and is based on working on the gradient domain [1]. The gist of it is that instead of storing the values of the image, we store computed image gradients, and reintegrate them. Since not every vector field is integrable, we solve this in a least squares sense through the use of a Poisson equation. Unfortunately, this is slow. I had previously implemented a multigrid Poisson solver, but I couldn't get it to work on the arbitrarily shaped domains of the mosaic. So my final implementation uses red-black SOR with gamma=1.8, which was experimentally found to give reasonably good results.

Poisson mosaicing

One issue I found with this technique is that it actually smeared the gradients more than I wanted, and created some high-brightness areas close to the domain boundaries. It's simple enough to manually apply some intensity cropping and gamma correction to the resulting image, but I would have expected it not to happen. This is the final processed image:

In the program I submitted, if you want to run contrast matching on the datasets, pass an additional command line parameter "poisson" after the parameter file name. I decided not to use this by default because it takes too long, and the lack of contrast matching didn't seem too horrible with proper feathering. I should have gotten multigrid to work. I'm actually submitting the code along, but it's not integrated with the rest of the software (class Multigrid_Solver living in mgclass.{cxx,hxx}).

Experiments

The main problem with this algorithm seems to be the quality of the matching. For example, the whole point of not using averaging on the feathering part of the algorithm was to properly masquerade "bad" matches.

The theoretical minimum number of fiducial pairs is 4: each fiducial pair gives two less degrees of freedom. Since the transformation is defined by eight matrix elements (we set the scaling of the z axis to be zero since it is irrelevant for most homogenous coordinate things we're doing here. It also makes the linear system inhomogenous, which is a good thing if we're trying to get away from the trivial solution).

However, using the algorithm with only four pairs is really not practical. Here is one example of what happens with only four points, on a relatively planar image set:

Notice that the feathering helps a lot in hiding the bad parts. If I use my program mosvis to visualize the transformation, where there's no feathering of any kind, we can see much better the bad matching. You can see serious seams on the eastern african coast.

Notice that the "antarctica" label is feathered and still doesn't look too bad. If we were to use simple averaging, all those high-frequency components like the map meridians and parallels would get duplicated, resulting in an ugly mosaic.

Basically, the problem with the minimum number of pairs is that as though there is a unique affine transformation with these 4 pairs, there is very little information about the other points. Not only that, but the further away you are from the convex hull of the fiducial points, the more likely your transformation will give unexpected results, because it is essentially "extrapolating" from missing information. The solution seems to be, in practice, to not only put as many fiducials as possible, but to space them as far apart from each other as possible. This next image uses 11 fiducial pairs, which I experimentally found to be a good number. It seemed that new pairs didn't make much difference after that. This mosaicing is pretty good. I find it hard to spot the seams (you can see some of it around the "framing" for the "The World" lettering).

This is the same transformation visualized with mosvis. You can still see some problems around the Madagascar island, but the meridians match much better, and so does the persian gulf, without compromising the overall match for the parts west of Africa (as we can see in the actual mosaiced image)

The accuracy of the control points only seem to make much of a difference in fiducial sets that are too small. For all my examples, I picked up the fiducials with a resolution of around 5 pixels (basically I needed to zoom out a lot because of my tiny 1024x768 screen), and I expected this to be an issue, but it wasn't the case. I think this is happening because when we're overconstraining the linear system, the matchings are already approximate, so if the error in the measurement in independent, it gets "squelched" by the least-squares approximation. The next two pictures show the same previous mosaics where each coordinate of each fiducial point got perturbed by a uniformly random number between -10 and 10. As expected, the mosaic with many fiducials got less perturbed than the one with few.

11 perturbed fiducials: little difference to original (notice some artifacts on the "antarctica" label)

4 perturbed fiducials: more difference to original
The final image of my report is the complete mosaiced world map, a planar set, in contrast to the panoramic yosemite set:

Conclusion

This project was more challenging than the first one, and I think the ratio of "report work" to "project work" was much better. In the first one, it seemed that most of the time was spent actually doing the writeup. In this one, the task was more challenging and interesting, and required more time. I had quite a bit of fun making these mosaics look good :)

Acknowledgements

The Yosemite set was kindly provided by Steven Callahan, taken at a hiking trail at the Yosemite National Park. The world map set was kindly provided by Adriana Scheidegger, taken at the SCI Institute halls.

References

[1] Patrick Perez, Michel Gangnet, Andrew Blake. Poisson Image Editing. Proceedings of SIGGRAPH 2003.

[2] Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides Design Patterns - Elements of reusable object-oriented software. Addison-Wesley, 1995.

[3] Andrei Alexandrescu. Modern C++ Design - Generic Programming and Design Patterns Applied. Addison-Wesley, 2001.