
Image analysis

CS/CME/BioE/Biophys/BMI 279 Oct. 31 and Nov. 2, 2017 Ron Dror

1 Outline

• Images in molecular and cellular biology
• Reducing image noise
  – Mean and Gaussian filters
  – Frequency domain interpretation
  – Median filter
• Sharpening images
• Image description and classification
• Practical image processing tips

2 Images in molecular and cellular biology

Most of what we know about the structure of cells comes from imaging
• Light microscopy, including fluorescence microscopy. In fluorescence microscopy you label particular proteins or other targets with a molecule that glows, so you can distinguish that specific target from everything else in the image. (https://www.microscopyu.com/articles/livecellimaging/livecellmaintenance.html)

• Electron microscopy (http://blog.library.gsu.edu/wp-content/uploads/2010/11/mtdna.jpg)

In fact, some techniques we’ll discuss later on require you to do lots of computation BEFORE you can even see the picture.

Imaging is pervasive in structural biology

• The experimental techniques used to determine macromolecular (e.g., protein) structure also depend on imaging

X-ray crystallography Single-particle cryo-electron microscopy

(https://askabiologist.asu.edu/sites/default/files/resources/articles/crystal_clear/CuZn_SOD_C2_x-ray_diffraction_250.jpg, http://debkellylab.org/?page_id=94)

Computation plays an essential role in these imaging-based techniques

• We will start with analysis of microscopy data, because it’s closest to our everyday experience with images • In fact, the basic image analysis techniques we’ll cover initially also apply to normal photographs

Representations of an image
• Recall that we can think of a grayscale image as:
  – A function of two variables (x and y)
  – A two-dimensional array of brightness values
  – A matrix (of brightness values)
• A color image can be treated as:
  – Three separate images, one for each color channel (red, green, blue)
  – A function that returns three values (red, green, blue) for each (x, y) pair
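These representations map directly onto arrays. A minimal sketch in NumPy (the tiny 2x2 image here is my own illustration, not from the slides):

```python
import numpy as np

# A grayscale image: a 2D array (matrix) of brightness values.
gray = np.array([[0.0, 0.5],
                 [0.5, 1.0]])       # a tiny 2x2 image, values in [0, 1]

# A color image: three channels (red, green, blue), i.e. an H x W x 3 array.
color = np.stack([gray, gray, gray], axis=-1)

# Viewing the image as a function of (x, y): row index is y, column index is x.
brightness = gray[0, 1]             # the pixel at y=0, x=1
```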

By the way, why do we use three color channels? Because we only have three kinds of cones in our eyes that detect color, so three channels are sufficient to describe the colors we perceive.

Whereas Einstein proposed the concept of mass-energy equivalence, Professor Mihail Nazarov, 66, proposed monkey-Einstein equivalence. (From http://www.nydailynews.com/news/world/baby-monkey-albert-einstein-article-1.1192365)

Reducing image noise

Experimentally determined images are always corrupted by noise
• “Noise” is any deviation from what the image would ideally look like

Image noise
There are many different kinds of noise; which kind you get depends on how you’re making the measurement. Two common kinds are “Gaussian noise” and “salt and pepper noise”.
• “Gaussian noise”: normally distributed noise added to each pixel value
• “Salt and pepper noise”: random pixels are replaced by very bright or dark values
How would you de-noise images like this? For each pixel, maybe average the values of nearby pixels.
(Noisy images from http://www.siox.org/pics/pferd-noisy.jpg)

Reducing image noise: Mean and Gaussian filters

How can we reduce the noise in an image?
• The simplest way is to use a “mean filter”
  – Replace each pixel by the average of a set of pixels surrounding it
  – For example, a 3x3 mean filter replaces each pixel with the average of a 3x3 square of pixels
  – This is equivalent to convolving the image with a 3x3 matrix:

    [ 1/9 1/9 1/9 ]           [ 1 1 1 ]
    [ 1/9 1/9 1/9 ]  =  (1/9) [ 1 1 1 ]
    [ 1/9 1/9 1/9 ]           [ 1 1 1 ]

Why 1/9? The values should sum to 1, so that the overall brightness of the image remains constant.
You could also use a 5x5 matrix, or larger. A 4x4 is trickier because then you can’t center it around the target pixel. For corner/edge pixels, just average the existing surrounding pixels and ignore the missing ones.
After the mean filter, noise is lower but images are blurrier. The blurring comes from averaging at the boundaries between two colors. As the filter size becomes bigger, you reduce the noise more because you’re averaging over more pixels, but you also blur the image more. If you know that your image will consist of regions of solid colors, you can do fancier things that detect the edge locations. But you can’t do this for many images, such as images of cells.

Mean filter
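As a sketch, the mean filter with the edge handling described here (missing neighbors at corners and edges are simply ignored) might look like this in NumPy; the function name is my own:

```python
import numpy as np

def mean_filter(image, size=3):
    """Replace each pixel by the average of the size x size square around it.
    At corners and edges, only the pixels that exist are averaged; the
    missing neighbors are ignored."""
    h, w = image.shape
    r = size // 2
    out = np.empty((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            patch = image[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            out[i, j] = patch.mean()
    return out
```

Because the weights sum to 1, a constant image passes through unchanged, which is the brightness-preservation property the 1/9 factor guarantees.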

Original images

Result of 3x3 mean filter

13

A larger filter (e.g., 5x5, 7x7) would further reduce noise, but would blur the image more

A better choice: use a smoother filter

• We can achieve a better tradeoff between noise reduction and blurring of the noise-free image by convolving the image with a smoother function
• One common choice is a (two-dimensional) Gaussian

Rather than choosing an exact filter size (3x3, 5x5, etc.), choose the Gaussian standard deviation σ (e.g. σ = 1 pixel, 2 pixels, etc.)

Gaussian filter
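A sketch of building such a filter from σ. Truncating the kernel at about 3σ is a common rule of thumb, not something specified in the lecture:

```python
import numpy as np

def gaussian_kernel(sigma):
    """Build a normalized 2D Gaussian filter with standard deviation sigma,
    truncated at about 3*sigma."""
    r = max(1, int(round(3 * sigma)))
    ax = np.arange(-r, r + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()      # weights sum to 1, preserving overall brightness
```

Convolving an image with this kernel smooths it like the mean filter does, but nearby pixels are weighted more heavily than distant ones.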

Original images

Result of Gaussian filter ( σ = 1 pixel)

15

Using a larger Gaussian would further reduce noise, but would blur the image more

Mean filter (for comparison)

Original images

Filtered images

16 Gaussian filter (for comparison)

Original images

Filtered images

17 Reducing image noise Frequency domain interpretation

18 Low-pass filtering

• Because the mean and Gaussian filters are convolutions, we can express them as multiplications in the frequency domain
• Both types of filters reduce high frequencies while preserving low frequencies. They are thus known as low-pass filters
• These filters work because real images have mostly low-frequency content, while noise tends to have a lot of high-frequency content
In other words, most of the low-frequency information in the noisy image comes from the real image, while most of the high-frequency information comes from the noise. So if you reduce high frequencies in the image, you preferentially reduce the noise. This is like re-weighting the frequency components so that the low-frequency signal stays strong but the high-frequency signal is weakened. This is best done in Fourier space.

Low-pass filtering: Gaussian filter vs. mean filter

Filter in real domain

Use a Fourier transform to go from the filter in the real domain to its magnitude profile in the frequency domain (low frequencies are near the center of the plots). Note: the Fourier transform of a Gaussian is another Gaussian.

The Gaussian filter eliminates high frequencies more effectively than the 20 mean filter, making the Gaussian filter better by most measures. Low-pass filtering

• As a filter becomes larger (wider), its Fourier-domain representation becomes narrower
• In other words, making a mean or Gaussian filter larger will make it more low-pass (i.e., narrow the range of frequencies it passes)
  – Thus it will eliminate noise better, but blur the original image more
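A quick numerical check of the low-pass behavior (a 1D toy example of my own, not from the slides): a mean filter barely attenuates a slowly varying sinusoid but strongly suppresses a rapidly varying one.

```python
import numpy as np

n = 256
t = np.arange(n)
low = np.cos(2 * np.pi * 2 * t / n)     # 2 cycles across the signal
high = np.cos(2 * np.pi * 60 * t / n)   # 60 cycles across the signal

def mean_filter_1d(x, size=5):
    # circular (wrap-around) mean filter, so the sinusoids stay periodic
    r = size // 2
    return sum(np.roll(x, k) for k in range(-r, r + 1)) / size

gain_low = np.abs(mean_filter_1d(low)).max()    # stays near 1
gain_high = np.abs(mean_filter_1d(high)).max()  # strongly attenuated
```

Each sinusoid passes through the filter with its amplitude scaled by the filter's frequency response, so comparing the output amplitudes shows the low-pass shape directly.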

21 Reducing image noise

22 Median filter

• A median filter ranks the pixels in a square surrounding the pixel of interest, and picks the middle value
• This is particularly effective at eliminating noise that corrupts only a few pixel values (e.g., salt-and-pepper noise)
• This filter is not a convolution
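A sketch of the median filter, structured like the mean filter above (edge pixels use only the neighbors that exist; the function name is my own):

```python
import numpy as np

def median_filter(image, size=3):
    """Replace each pixel by the median of the size x size square around it
    (edge pixels use only the neighbors that exist)."""
    h, w = image.shape
    r = size // 2
    out = np.empty((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            patch = image[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            out[i, j] = np.median(patch)
    return out
```

A single "salt" pixel is an outlier in every neighborhood it appears in, so the median removes it completely, whereas a mean filter would only spread it around.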

23 Median filter

Original images

Result of 3x3 median filter

24

Using a larger window would further reduce noise, but would distort the image more

Sharpening images

25 High-pass filter

• A high-pass filter removes (or reduces) low-frequency components of an image, but not high-frequency ones
• The simplest way to create a high-pass filter is to subtract a low-pass filtered image from the original image
  – This removes the low frequencies but preserves the high ones
  – The filter matrix itself can be computed by subtracting a low-pass filter matrix (that sums to 1) from an “identity” filter matrix (all zeros except for a 1 in the central pixel)
Note: the grey color now corresponds to 0. Black/white values show up at changes (edges) in the original image.
Original image – Low-pass filtered image = High-pass filtered image
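The "identity minus low-pass" construction, sketched for a 3x3 mean filter:

```python
import numpy as np

identity = np.zeros((3, 3))
identity[1, 1] = 1.0                 # passes the image through unchanged

low_pass = np.full((3, 3), 1.0 / 9)  # 3x3 mean filter (weights sum to 1)

high_pass = identity - low_pass      # subtract low-pass from identity
```

The resulting weights sum to 0, so flat regions of the image map to 0, which is why the high-pass filtered image is grey (zero) everywhere except at edges.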

How might one use a high-pass filter?
• To highlight edges in the image
• To remove any “background” brightness that varies smoothly across the image
• Image sharpening
  – To sharpen the image, one can add a high-pass filtered version of the image (multiplied by a fractional scaling factor) to the original image
  – This increases the high-frequency content relative to low-frequency content
  – In photography, this is called “unsharp masking”
You don’t want to actually filter out the low-frequency content, because it contains the most important information.
Original image | Mild sharpening (small scale factor) | Stronger sharpening (larger scale factor)
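Unsharp masking can be sketched as follows, using a 3x3 mean filter as the low-pass step (function names and the scale factor alpha are my own choices):

```python
import numpy as np

def mean_blur(image, size=3):
    # simple low-pass: mean filter, ignoring missing pixels at the edges
    h, w = image.shape
    r = size // 2
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = image[max(0, i - r):i + r + 1,
                              max(0, j - r):j + r + 1].mean()
    return out

def sharpen(image, alpha=0.7):
    """Unsharp masking: add a scaled high-pass version back to the original."""
    high_pass = image - mean_blur(image)   # original minus low-pass
    return image + alpha * high_pass       # boost the high frequencies
```

On a step edge, sharpening overshoots on both sides of the edge, which is what makes edges look crisper.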

27

(https://upload.wikimedia.org/wikipedia/commons/0/0b/Accutance_example.png)

Image sharpening — another example

Original image Sharpened image

28

(http://www.exelisvis.com/docs/html/images/unsharp_mask_ex1.gif)

Image description and classification

29 Describing images concisely

• The space of all possible images is very large. To fully describe an N-by-N pixel grayscale image, we need to specify N^2 pixel values.
• We can thus think of a single image as a point in an N^2-dimensional space.
• Classifying and analyzing images becomes easier if we can describe them (even approximately) with fewer values.
• For many classes of images, we can capture most of the variation from image to image using a small number of values
  – This allows us to think of the images as points in a lower-dimensional space
  – We’ll examine one common approach: Principal Component Analysis (PCA)
For example, consider pictures of your face from different angles and with different lighting: their actual pixel values can be dramatically different, but they’re visually similar in many ways to our eyes. How can we capture this similarity by representing images in a lower-dimensional space?

Principal component analysis (PCA)
• Basic idea: given a set of points in a multi-dimensional space, we wish to find the linear subspace (line, plane, etc.) that best fits those points.

The idea of PCA is to find the linear subspaces that best fit your high-dimensional data and then project your data onto those subspaces. These subspaces are the “principal components” — the directions in which your data varies the most. This gives you a way to represent your data in far fewer dimensions while keeping the distinct points as separate as possible. Keeping the points as separate as possible means that your representation still captures as much of the original information as possible.
(Figure: a 2D point cloud with its first and second principal components drawn as axes.)

• If we want to specify a point x with just one number (instead of two), we can specify the closest point to x on the line described by the first principal component (i.e., project x onto that line).
• In a higher-dimensional space, we might specify the point lying closest to x on a plane specified by the first two principal components.

Principal component analysis (PCA)
• How do we pick the principal components?
  – First subtract off the mean value of the points, so that the points are centered around the origin.
  – The first principal component is chosen to minimize the sum of squared distances of the points to the line it specifies. This is equivalent to picking the line that maximizes the variance of the full set of points after projection onto that line. (Maximizing variance after projection = keeping the points as spread out as possible after projecting onto the line.)
  – The kth principal component is calculated the same way, but is required to be orthogonal to the previous principal components.
PCA is often computed via the SVD (singular value decomposition).
How many principal components should you use to summarize a dataset? It’s a bit of a judgement call. The analysis will also tell you how much of the data is explained by each principal component — by looking at how much variance is left unaccounted for — and often you will see that 80-90% of the variance is captured by the first 3 to 8 principal components.

Example: face recognition
• A popular face recognition algorithm relies on PCA
  – Take a large set of face images (centered, frontal views)
  – Calculate the first few principal components
  – Approximate each new face image as a sum of these first few principal components (each multiplied by some coefficient).
  – Classify faces by comparing these coefficients to those of the original face images
When face pictures are represented as points in the principal component space, pictures of the same person’s face will tend to cluster together, allowing you to identify people’s faces. Each principal component is a vector in the same high-dimensional image space, so it can be viewed as an image too.
Original images | Principal components (viewed as images)
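The center-then-SVD recipe above can be sketched as follows (function and variable names are my own):

```python
import numpy as np

def pca(X, k):
    """Rows of X are data points. Returns the k-dimensional coordinates,
    the principal component directions, and the fraction of variance
    explained by each component."""
    Xc = X - X.mean(axis=0)              # center the points at the origin
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                  # directions of maximal variance
    coords = Xc @ components.T           # project the points onto them
    explained = S ** 2 / np.sum(S ** 2)  # variance fraction per component
    return coords, components, explained
```

For points lying (nearly) on a line in 2D, the first component captures essentially all the variance, so a single coordinate per point is enough, which is exactly the dimensionality reduction the slides describe.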

33

(http://www.pages.drexel.edu/~sis26/Eigenface%20Tutorial.htm)

Practical image processing tips

34 Practical image processing tips (Thanks to Leo Kesselman)

• When viewing an image, you might need to scale all the intensity values up or down so that it doesn’t appear all black or all white • You might also need to adjust the “gamma correction” to get the image to look right – This applies a nonlinear (but monotonically increasing) function to each pixel value This comes from the fact that the same grey value will look different on different monitors or TV screens, so there are standard ways to adjust the gamma value to correct for this.
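A sketch of gamma correction for intensities in [0, 1]; gamma = 2.2 is a common display value, used here as an assumption rather than something specified in the lecture:

```python
import numpy as np

def adjust_gamma(image, gamma=2.2):
    """Apply gamma correction to an image with intensities in [0, 1].
    The mapping is nonlinear but monotonically increasing."""
    return np.power(image, 1.0 / gamma)
```

The endpoints are fixed (0 maps to 0, 1 maps to 1) while midtones brighten; for example, 0.5 maps to about 0.73 for gamma = 2.2.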

You’re not responsible for this material, but you may find it useful.

Next quarter: CS/CME/Biophys/BMI 371 “Computational biology in four dimensions”

• I’m teaching a course next quarter that complements this one • Similar topic area, but with a focus on current cutting-edge research – Focus is on reading, presentation, discussion, and critique of published papers

36