Project 1 (Colorization)
Kelly Yeh (SID: 3037233460)
Introduction
This project aims to produce a color image from a digitized Prokudin-Gorskii glass plate image.
This is done by extracting, preprocessing, stacking, and aligning the three
provided B, G, and R channel images so that their pixels line up. To extract the
channels individually, I split the plate image evenly into thirds along its height and then crop
all sides of each third to remove the borders. After extraction, there are
different approaches to layering and aligning these channels on top of each other.
[Figure: the extracted B channel, G channel, and R channel]
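The extraction step can be sketched as follows. This is a minimal illustration, not the code used for the project: the function name `split_and_crop`, the 10% crop, and the top-to-bottom B, G, R stacking order are assumptions made for the example.

```python
import numpy as np


def split_and_crop(plate, crop_pct=10):
    """Split a stacked plate scan into B, G, R and trim each channel's border.

    Assumptions for this sketch: `plate` is a 2-D grayscale array with the
    channels stacked top to bottom in B, G, R order, and `crop_pct` percent
    is trimmed from every side (the write-up does not give the crop amount).
    """
    height = plate.shape[0] // 3  # leftover rows at the bottom are ignored
    width = plate.shape[1]
    dy = height * crop_pct // 100
    dx = width * crop_pct // 100
    channels = []
    for i in range(3):
        channel = plate[i * height:(i + 1) * height]  # i-th third of the scan
        channels.append(channel[dy:height - dy, dx:width - dx])
    return channels  # [b, g, r]


# Synthetic stand-in for a scanned plate: 300 rows by 80 columns.
plate = np.arange(300 * 80, dtype=np.float64).reshape(300, 80)
b, g, r = split_and_crop(plate)
print(b.shape, g.shape, r.shape)  # (80, 64) (80, 64) (80, 64)
```

Cropping by a percentage rather than a fixed pixel count keeps the same call usable on both the small and the large scans.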
Exhaustive Search
One way is exhaustive search. This method naively searches over a window of
possible displacements (I used [-15, 15] pixels in each direction) and chooses the best (x, y)
displacement. The quality of a candidate alignment can be scored by either
minimizing the Euclidean distance (L2 norm) or maximizing the normalized cross-correlation
(NCC) between the two channels. Both metrics performed well, but L2 was slightly better, so I
used the L2 norm. This aligned the BGR channels reasonably well for the smaller
JPEG images.
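A minimal sketch of such a search is shown below, under a few assumptions that are not stated in the write-up: shifts are applied with `np.roll` (a circular shift), x is the column shift and y is the row shift, and NCC is computed on mean-subtracted images. The names `l2_distance`, `ncc`, and `exhaustive_align` are hypothetical.

```python
import numpy as np


def l2_distance(a, b):
    """Euclidean distance between two images (lower is better)."""
    return np.sqrt(np.sum((a - b) ** 2))


def ncc(a, b):
    """Normalized cross-correlation of mean-subtracted images (higher is better)."""
    a = a - a.mean()
    b = b - b.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))


def exhaustive_align(channel, reference, window=15, metric="l2"):
    """Return the (x, y) shift of `channel` that best matches `reference`.

    Assumption: shifting is circular (np.roll), so pixels that leave one
    edge re-enter on the opposite edge.
    """
    best_cost, best_shift = np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            if metric == "l2":
                cost = l2_distance(shifted, reference)
            else:
                cost = -ncc(shifted, reference)  # negate so lower is better
            if cost < best_cost:
                best_cost, best_shift = cost, (dx, dy)
    return best_shift


# Synthetic check: misalign a random image by a known amount and recover it.
rng = np.random.default_rng(0)
reference = rng.random((64, 64))
channel = np.roll(reference, (-4, 3), axis=(0, 1))  # 4 rows up, 3 columns right
print(exhaustive_align(channel, reference))  # (-3, 4)
```

The search evaluates (2 * 15 + 1)^2 = 961 candidate shifts, each costing one full pass over the image, which is why it only stays practical on small images.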
Cathedral: Green (2, 5), Red (3, 12)
Monastery: Green (2, -3), Red (2, 3)
Tobolsk: Green (3, 3), Red (3, 6)
Image Pyramid
Exhaustive search, however, is too slow on the larger, higher-resolution TIFF images, so
I implemented an image pyramid. To build the pyramid, I start with the
full-resolution image and repeatedly downscale it by a factor of 2 until either its height
or its width is smaller than 256 pixels; this is the coarsest, blurriest
layer. There, I compute the best displacement using the L2 norm, as in exhaustive search,
again with a displacement window of [-15, 15] pixels.
One difference is that I take this displacement and multiply its x
and y values by 2^scale, where scale is the number of times the image has been downscaled. This
scales the displacement back up so that it is expressed in the pixel dimensions
of the original image, where the displacement is then applied. This happens at every level
on the way up. As I work my way back up the pyramid, each layer is first realigned by the
displacement computed at the previous layer, multiplied by 2 to account for the upscaling.
In short, I iteratively refine the alignment of the BGR channels with every displacement
I find until I reach and align the original image.
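The coarse-to-fine procedure can be sketched recursively, as below. Doubling the estimate once per level is equivalent to multiplying the coarsest estimate by 2^scale. Several details here are assumptions for illustration only: the 2x2 block-average downscaling, the circular shift via `np.roll`, the small `refine_window` used at the finer levels (the write-up only gives the window for the coarsest level), and the function names.

```python
import numpy as np


def downscale2(img):
    """Halve the resolution by averaging 2x2 blocks (an assumed, simple filter)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0


def search(channel, reference, window):
    """Exhaustive L2 search; returns the best (x, y) shift within the window."""
    best_cost, best_shift = np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            cost = np.sum((shifted - reference) ** 2)
            if cost < best_cost:
                best_cost, best_shift = cost, (dx, dy)
    return best_shift


def pyramid_align(channel, reference, min_size=256, window=15, refine_window=2):
    """Coarse-to-fine alignment; returns the (x, y) shift at full resolution."""
    if min(channel.shape) < min_size:
        # Coarsest layer: full search over the displacement window.
        return search(channel, reference, window)
    # Estimate the shift one level down, then double it for this level.
    cx, cy = pyramid_align(downscale2(channel), downscale2(reference),
                           min_size, window, refine_window)
    cx, cy = 2 * cx, 2 * cy
    # Pre-align with the coarse estimate, then search a small residual window.
    pre_aligned = np.roll(channel, (cy, cx), axis=(0, 1))
    rx, ry = search(pre_aligned, reference, refine_window)
    return cx + rx, cy + ry


# Synthetic check. The shift is a multiple of 8 so that each of the three
# downscaled levels sees an exact shift of the downscaled reference.
rng = np.random.default_rng(1)
reference = rng.random((256, 256))
channel = np.roll(reference, (16, -24), axis=(0, 1))  # 16 rows down, 24 columns left
print(pyramid_align(channel, reference, min_size=64, window=4))  # (24, -16)
```

In this example the coarsest layer is 32 by 32 pixels, so a window of only 4 pixels there covers a 32-pixel range at full resolution; that is the saving the pyramid provides over a single full-resolution search.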
This strategy is used for all TIFF images and does a decent job of aligning them.
For the images that needed extra tweaking, I cropped a little more of the border and used
SSIM (structural similarity) instead of the L2 norm as the matching metric.
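To illustrate how SSIM can replace L2 in the search, here is a simplified sketch. It is not the standard windowed SSIM that image libraries compute: it evaluates the SSIM formula once over the whole image, which is an assumption made to keep the example dependency-free. The interior `margin`, the circular shift, and the names `global_ssim` and `align_ssim` are likewise hypothetical.

```python
import numpy as np


def global_ssim(a, b, data_range=1.0):
    """Single-window SSIM over the whole image (a simplification of SSIM)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))


def align_ssim(channel, reference, window=15, margin=8):
    """Return the (x, y) shift that maximizes SSIM on the image interior.

    `margin` (must be > 0) drops that many pixels from every side before
    scoring, mimicking extra cropping so border content does not affect
    the score.
    """
    inner = (slice(margin, -margin), slice(margin, -margin))
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = global_ssim(shifted[inner], reference[inner])
            if score > best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift


# Synthetic check with a known misalignment.
rng = np.random.default_rng(2)
reference = rng.random((64, 64))
channel = np.roll(reference, (3, -1), axis=(0, 1))  # 3 rows down, 1 column left
print(align_ssim(channel, reference, window=5))  # (1, -3)
```

Because SSIM compares local means, variances, and covariance rather than raw pixel differences, it is less sensitive to the brightness differences between channels that can mislead the L2 norm.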
Lady: Green (8, 53), Red (10, 117)
Church: Green (0, 25), Red (-4, 58)
Onion Church: Green (26, 51), Red (36, 108)
Harvesters: Green (16, 60), Red (13, 124)
Icon: Green (17, 40), Red (23, 89)
Sculpture: Green (11, 33), Red (-26, 140)
Three Generations: Green (11, 54), Red (9, 112)
Train: Green (5, 43), Red (31, 87)
Self Portrait: Green (28, 78), Red (29, 153)
Melons: Green (8, 82), Red (3, 153)
Emir: Green (24, 49), Red (37, 60)