Project 1 (Colorization)

Kelly Yeh (SID: 3037233460)

Introduction

This project aims to color an image from digitized Prokudin-Gorskii glass plate images. This is done through extracting, preprocessing, stacking, and properly aligning the three provided BGR images atop each other such that its pixels line up perfectly. To extract the BGR channels individually, evenly split up the image into thirds, vertically, and then crop all sides of each image to remove any unnecessary borders. After extraction, there are different approaches to laying and aligning these channels on top of each other.
B Channel
b
G Channel
g
R Channel
r

Exhaustive Search

One way is through exhaustive search. This method naively searches over a window of possible displacements, in my case I used [-15, 15], and chooses the best (x, y) displacement. Evaluating the accuracy of the image matching can be done by either minimizing Euclidean distance (L2 norm) or maximizing normalized cross-correlation (NCC) between pixels. Both methods performed well but L2 was slightly better so I just used the L2 norm. This allowed me to decently align BGR channels for smaller, jpg images.
Cathedral
Cathedral
Green (2, 5)
Red (3, 12)
Monastery
Monastery
Green (2, -3)
Red (2, 3)
Tobolsk
Tobolsk
Green (3, 3)
Red (3, 6)

Image Pyramid

Exhaustive search, however, is too slow on larger, higher resolution tif images. Hence, I had to implement an image pyramid. To build my image pyramid, I started with a high resolution image and recursively downscaled it (by a factor of 2) until the image was smaller than 256 pixels either in height or width; this is the coarsest, blurriest layer. Here, I compute the ideal displacement using L2 norm similarly to exhaustive search, again using a displacement window of [-15, 15] pixels.

A minor difference though, is that I will take this displacement value and multiply its x and y values by 2^scale, in which scale is the number of times we have downscaled. This essentially scales my displacement value back up such that it is relative to the dimensions of the original image, where the displacement in then applied. This will occur on every level up. Moreover, as I work my way back up the the pyramid, each layer will be realigned by the computed displacement value from the previous layer multipled by 2 to account for upscaling. Essentially, I iteratively update the alignment of my BGR channels with every displacement value I discover until I finally reached and aligned the original image. This strategy is employed for all tif images, and does a decent job at aligning all of them. Those that required extra tweaking I improved by cropping a little more as well as leveraging SSIM for the displacement algorithm instead of L2 norm.
Lady
Lady
Green (8, 53)
Red (10, 117)
Church
Church
Green (0, 25)
Red (-4, 58)
Onion Church
Tobolsk
Green (26, 51)
Red (36, 108)
Harvesters
Harvesters
Green (16, 60)
Red (13, 124)
Icon
Icon
Green (17, 40)
Red (23, 89)
Sculpture
Sculpture
Green (11, 33)
Red (-26, 140)
Three Generations
Three Generations
Green (11, 54)
Red (9, 112)
Train
Train
Green (5, 43)
Red (31, 87)
Self Portrait
Self Portrait
Green (28, 78)
Red (29, 153)
Melons
Melons
Green (8, 82)
Red (3, 153)
Emir
Emir
Green (24, 49)
Red (37, 60)