Image mosaicing arises often in practice. The simplest case is where acquisition devices can't capture one entire scene, and so several shots are done. These shots have slightly differing points of view, and one wants to compose them into a single coherent image. In this project we don't treat the actual correspondence problem: we assume a set of pairs of fiducials, which will allow us to derive our transformation. Since an image is inherently a two-dimensional object, in order for us to be able to transform these things around we have to assume some things about the changes in the viewpoint. Basically, we need to assume that the object is either far enough from the camera or the object is flat enough that perspective changes are close to irrelevant.
The idea behing image mosaicing is to constrain the possible set of transformations to a family of transformations whose members are easily computable. In this case, we're working with a 3x3 perspective transform matrix of the form
( a0 a1 a2 ) ( a3 a4 a5 ) = T ( a6 a7 1 )
Each point is represented by a three-vector (px, py, 1), the standard representation for a 2D point in homogenous coordinates. Since we have the pairs of fiducials, we can write a system of equations of the form
T(p) = p', for all fiducial pairs p and p'
Once we solve this system of equations, we have the transformation from the coordinate system of one image to the other. By cascading transformations, we are able to mosaic any set of images connected by sets of fiducial pairs.
I've organized my program into a class called CorrespondenceManager that lives in cmanager.{hxx, cxx} and two main programs mosaic (living in mosaic.cxx) and mosvis vis.cxx. Mosaic is the actual program as per project specfications, and mosvis is an interactive visualization tool using OpenGL I coded to be able to quickly see the results of my geometrical matching. mosvis simply uses the OpenGL texture machinery to display the results so that I could make sure I had that part working before moving on to more complex things.
The class CorrespondenceManager exposes methods to access transformation matrices and to traverse the correspondence graph. In order to modularize the code properly , I've written a templated method called dfs that takes an object that simply responds to visit, pre and post. With these, I was able to implement all the traversing I needed, both inside CorrespondenceManager and in mosvis. This is a technique close to the Strategy pattern [2]. CorrespondenceManager also exposes methods to access the actual fiducial pairs and images.
Coordinate Transformations: To get the actual transformation between different coordinate systems, I used SVD and the pseudo-inverse. Since I assumed that the fiducial points didn't represent a degenerate transformation (one with NullSpace(T) != { 0 }), I didn't do the numerical trick of zeroing out small singular values in the matrix. I simply inverted the singular value matrix by taking the reciprocals of the numbers and inverted the other two matrices by taking their transposes (since they're guaranteed to be rotation matrices). I used SVD because it allows me to overconstrain the system and then solve it in a least-squares sense. As we'll see, overconstraining is necessary to get decent results.
Bounds: Once I have the transformations, I can figure out the ideal bounds for the whole mosaic by starting with the target image and transforming the corners of the images to the correct space. It's sufficient to transform the corners because T is guaranteed to be affine (line-preserving). I used CorrespondenceManager::dfs to quickly implement this.
Backwards interpolation: With the bounds of the final mosaic image, I can map each pixel in the mosaic space to the individual image spaces to see where I land. This backward mapping eliminates the need for supersampling or other strategies for avoiding holes in the interpolation. In order to speed this up, instead of checking for all the images for each mosaic-space pixel, I get an AABB in the mosaic spacearound each image, and treat them separately. Assuming each individual image is similar in size and smaller than the mosaiced image, this drops tje interpolating complexity from O(W * H * n), where W and H are the mosaiced image width and height, and n is the number of images to O(w * h * n), where w and h are the individual images width and height.
Feathering: To get feathering, I implemented a somewhat non-standard weighting scheme. The weights of the pixels are given by their distance to the edge of the image, in their original space (which is worse than doing it in the mosaic space, but worked fine in practice). The main difference of my scheme to simple weighted averaging is that I tried to be careful to avoid some artifacts that can arise where two different images don't overlap exactly and their pixels have the same weight, as can be seen here:
To avoid this, instead of doing a simple average, I composite the pixels using the following formula:
new_alpha = 1 - (1-old_alpha)(1-alpha) new_value = (1-alpha)*old_value + alpha*valueThe stored alpha indicates the "total" weight of what's stored in the image. This alpha will be used to normalize the image later. Notice that this scheme imposes an ordering of the images. This is actually a good thing, because this means that the borders of the images will get blended, but the "middle" of them won't. Any artifacts of the transformation will then only be visible in the border. Since this border is feathered, it gets hard to spot the seams, which is exactly the effect we are trying to achieve:
Contrast Matching Surprisingly enough, this was the part of the project that gave me the most trouble. It was surprising for me because I didn't expect it to be such a tricky problem to solve. Maybe I missed some simple solution, but I actually tried several of them.
The first idea was to take advantage of the coordinate system transformation machinery and use it to do intensity transformations. I implemented this using policy classes [3], that simply switched between traversing the graph to geometric transformations and traversing it to compute intensity ones. My idea was that I could look at the neighborhood of the fiducial pairs and derive an intensity transformation based solely on their relative intensities. I implemented both a linear transformation of the form i' = a * i + b and a simple translation of the intensity spaces. Both only seemed to work on very simple cases like this one:
(Here the dynamic range is compressed to [0,255], but the original intensity-matched .tif has the correct bigger dynamic range)
The main problem with these transformations is that they seem to be too sensitive to point placement. I actually used an average of 30x30 pixels around the fiducial pairs to get the intensity, but this didn't help much. Notice the halos:
The technique I ended up using is much more general, and is based on working on the gradient domain [1]. The gist of it is that instead of storing the values of the image, we store computed image gradients, and reintegrate them. Since not every vector field is integrable, we solve this in a least squares sense through the use of a Poisson equation. Unfortunately, this is slow. I had previously implemented a multigrid Poisson solver, but I couldn't get it to work on the arbitrarily shaped domains of the mosaic. So my final implementation uses red-black SOR with gamma=1.8, which was experimentally found to give reasonably good results.
One issue I found with this technique is that it actually smeared the gradients more than I wanted, and created some high-brightness areas close to the domain boundaries. It's simple enough to manually apply some intensity cropping and gamma correction to the resulting image, but I would have expected it not to happen. This is the final processed image: