TEXT LINE EXTRACTION USING SEAM CARVING

A Thesis

Presented to

The Graduate Faculty of The University of Akron

In Partial Fulfillment

of the Requirement for the Degree

Master of Science

Christopher Stoll

May, 2015

TEXT LINE EXTRACTION USING SEAM CARVING

Christopher Stoll

Thesis

Approved:

______________________________
Advisor
Dr. Zhong-Hui Duan

______________________________
Faculty Reader
Dr. Chien-Chung Chan

______________________________
Faculty Reader
Dr. Yingcai Xiao

______________________________
Department Chair
Dr. Timothy Norfolk

Accepted:

______________________________
Dean of the College
Dr. Chand Midha

______________________________
Interim Dean of the Graduate School
Dr. Rex Ramsier

______________________________
Date

ABSTRACT

Optical character recognition (OCR) is a well researched area of computer science; presently there are numerous commercial and open source applications which can perform OCR, albeit with varying levels of accuracy. Since the process of performing OCR requires extracting text from images, it should follow that text line extraction is also a well researched area. Indeed, there are many methods to extract text from images of scanned documents, a process known in that field as document analysis and recognition. However, the existing text extraction techniques were largely devised to feed existing character recognition techniques. Since this work was originally conceived from the perspective of computer vision and pattern recognition, a new approach seemed necessary to meet the new objectives.

Out of that need an apparently novel approach to text extraction was devised which relies upon the central idea behind seam carving. Text images are examined for seams, but rather than removing the lowest energy seams, they are evaluated to determine where text is located within the image. The approach can be run iteratively, alternating the direction of seam evaluation, to recognize increasingly specific areas of the image. The ultimate goal is to create an algorithm which can provide data to a new type of recognition algorithm, one that can understand information at higher levels of the document structure and possibly use information gained from those higher levels (words or phrases) to increase the accuracy of identifying data in the lower levels (characters).

This paper explores the existing methods used for text extraction, touching upon existing OCR techniques, and then describes a novel technique for information extraction based upon seam carving. Modifications needed to adapt the seam carving process to the new problem domain are explained. Then, two output methods, direct area detection and information masking, are described. Finally, potential modifications to the technique, which could make it suitable for use in other domains such as general computer vision or bioinformatics, are discussed.

ACKNOWLEDGEMENTS

First, I would like to thank Dr. Zhong-Hui Duan for making this work possible. Her guidance on this project was invaluable and, perhaps more importantly, I would have never even seen the possibilities without the knowledge I gained from Dr. Duan’s instruction. I would also like to thank my committee members, Dr. Chien-Chung Chan and Dr. Yingcai Xiao, for the time they took reviewing my work and providing valuable feedback.

Additionally, I would like to thank Dr. Kathy J. Liszka, who helped an unlikely candidate get a chance to succeed in a Computer Science program. I would also like to acknowledge the contributions of Dr. Michael L. Collard and Dr. Timothy W. O’Neil, who influenced the methodologies and technical implementations used in this project.

Finally, I would like to thank my wife Heather, my boys, my family, and my friends. Without their support, occasional diversions, and encouragement I would have never been able to see this project through to completion.

Namaste.


TABLE OF CONTENTS

LIST OF FIGURES

LIST OF EQUATIONS

CHAPTER

I. INTRODUCTION

II. EXISTING APPROACHES
Text Extraction Methods
Run-length Smearing
X-Y Cut
Docstrum
Whitespace Analysis
Voronoi
Text Extraction Software
Cuneiform
OCRopus
Tesseract
OCRFeeder

III. PROPOSED APPROACH
Seam Carving
Preprocessing Functions
Energy Functions
Simple Gradient
Sobel Edge Detection
Laplacian of Gaussians
Difference of Gaussians
Difference of Gaussians with Sobel
Seam Traversal
Text Extraction Steps
Direct Area Detection
Information Masking
Method Comparison
Overcoming Limitations
Improved Skew Handling
Complicated Layouts
Extracting Finer Details
Asymptotic Analysis
Related Work
Future Work

IV. CONCLUSIONS

REFERENCES

APPENDICES
APPENDIX A: PROGRAM MAKEFILE
APPENDIX B: PROGRAM MAIN — SC.C
APPENDIX C: SEAM CARVING FUNCTIONS — LIBSEAMCARVE.C
APPENDIX D: PNG IMPORT FUNCTIONS — LIBPNGHELPER.C
APPENDIX E: IMAGE RESIZE FUNCTIONS — LIBRESIZE.C
APPENDIX F: IMAGE BINARIZATION — LIBBINARIZATION.C
APPENDIX G: IMAGE ENERGY FUNCTIONS — LIBENERGIES.C
APPENDIX H: PIXEL DATA STRUCTURE — PIXEL.H
APPENDIX I: WINDOW DATA STRUCTURE — WINDOW.H
APPENDIX J: UTILITIES — LIBMINMAX.C

LIST OF FIGURES

Figure 1.1: Run-length Smearing Example
Figure 1.2: X-Y Cut Example 1
Figure 1.3: X-Y Cut Example 2
Figure 1.4: Docstrum Example
Figure 1.5: Whitespace Analysis Example
Figure 2.1: Seam Deviation Example
Figure 2.2: Edge Profiles
Figure 2.3: Example Gaussian Filter Kernel
Figure 2.4: Energy Function Examples
Figure 2.5: Seam “Shadow” Example
Figure 2.6: Seam Path Example
Figure 2.7: Direct Area Detection
Figure 2.8: Information Masking
Figure 2.9: Comparing Direct Area Detection and Information Masking
Figure 2.10: Skewed Text Comparison
Figure 2.11: Seams Identified in Previously Identified Areas
Figure 2.12: Asymptotic Analysis
Figure 2.13: Image Overview Extraction
Figure 2.14: Experimental Application Toward Protein Differentiation

LIST OF EQUATIONS

Equation 2.1: Formal Seam Definition and Example
Equation 2.2: Simple Gradient Formula
Equation 2.3: Formal Seam Pixel Value Definition
Equation 2.4: Formal Seam Pixel Value Definition New
Equation 2.5: Formal Definition of Net Deviation

CHAPTER I

INTRODUCTION

While researching the possible application of the eigenface technique [4, 5] towards optical character recognition (OCR), it was discovered that the technique only worked well when the candidate characters (the characters to be recognized) were scaled and centered exactly like the characters used for training.

The original eigenface technique works well due to the fact that candidate images (faces of unknown people) can be easily scaled and rotated by using features which every face has — eyes, noses, and mouths. Unlike faces, characters do not share fixed points of reference that can be used to scale and rotate them into a standard form; a robust text line extraction method is required.

When simple text extraction approaches, such as X-Y Cut or run-length smearing, are used for segmentation, additional post-processing steps and heuristics are required to get a clean, standardized version of each character.

When more sophisticated text extraction approaches are used, fewer post-processing steps are required, but the approaches are inherently more complex and still require properly processed inputs. With the ultimate goal of principal component analysis based character recognition in mind, it seems worthwhile to consider novel approaches to text line extraction which are uniquely suited to that purpose.

Since the ostensible benefits of this proposed text line extraction approach are largely premised upon the end goal of developing a novel approach to character recognition, it seems appropriate to briefly discuss the possible merits of investigating a novel approach to character recognition. Optical character recognition is a well researched area, and commercial applications which can accurately provide this functionality are widely available. However, the existing applications are focused on a single task; their usual goal is to take a scanned document and turn it into more useful textual data. The goal of principal component analysis based character recognition, which will hopefully be enabled by the novel text extraction methods described here, is to provide higher levels of understanding. Rather than identifying characters and then forming words based upon statistics, it will identify known words directly and only delve into character recognition when a word is not recognized. Instead of “sounding out” every word (the technique taught to children who are learning to read English whereby each letter is pronounced in hopes of identifying the word based upon its sound), only unrecognized words will be examined for individual characters.

These objectives further support the necessity of a new approach to text extraction, one which can extract lines, phrases, and words as easily as it can individual characters.

Given the desired outcomes, characteristics of an ideal algorithm were considered. In general terms, the ideal text extraction approach would be robust, yet not overly complicated; it would be an elegant solution. Ideally, a single algorithm could be implemented which could extract information from an image at different levels (lines, words, or characters). However, given the potential diversity of the source images, finding a unified approach seemed unlikely, at least until a seam carving based approach [6] was considered.

It is thus proposed that a seam carving approach be used to extract text from scanned documents. With an iterative seam carving approach it should be possible to segment the image, split out the lines of text, and split out the characters from the detected lines. The early iterations identify the text blocks; the tops and bottoms of the characters are identified during the horizontal line splitting step, and the left and right sides of the characters are identified during the vertical character splitting step. Given the precise boundaries of the candidate characters, they can be projected onto a vector with a preset number of dimensions, and that vector can be used for image recognition. If this approach were to be viable it could further eliminate the need for specific deskewing, connected component analysis, and other preprocessing algorithms. The orientation of the relevant seams could be determined either at the line level or at the character level.

This paper will begin by exploring existing text extraction methods. In addition to examining published text extraction techniques, methods used in open source OCR programs will also be reviewed. Then, after providing an explanation of seam carving which pays special attention to the energy functions, a novel approach to text line extraction based upon seam carving will be described. Modifications to the seam carving technique, which are necessary to apply seam carving to text line recognition, will be discussed. Then, two variations on the approach, direct area detection and information masking, will be described. Direct area detection attempts to directly locate the boundaries of text lines, while information masking provides a mask under which information is located. Finally, future work and possible alternative applications will be explored.

CHAPTER II

EXISTING APPROACHES

Before we go into more details regarding the proposed implementation, existing approaches to the problem should first be discussed. Popular text extraction and image segmentation algorithms will first be examined, then the segmentation portions of popular OCR programs will be examined. In order to find the most successful techniques for extracting text from images, the most successful OCR programs will be considered; they must, necessarily, use an effective method for extracting text and isolating characters. Since commercial OCR programs are closed-source and cannot be examined, only open-source projects will be considered.

Text Extraction Methods

There are five image segmentation algorithms that appear frequently in academic papers: run-length smearing (sometimes referred to as the Run-Length Smoothing Algorithm or RLSA), X-Y Cut (sometimes called XY-Cut), Docstrum, whitespace analysis, and Voronoi. A brief explanation of each will be given.

Run-length Smearing

The run-length smearing approach [7] is a relatively old and simplistic approach to image segmentation for text recognition. The algorithm works by smearing a binarized version of the image vertically and horizontally so that individual components (characters, words, lines, paragraphs) are merged into black blobs (Figure 1.1). The original paper by Wong, Casey, and Wahl describes a smoothing method whereby gaps between black pixels are closed on a row-by-row and column-by-column basis when the gap is less than a tolerance level (which is relative to the length of the row or column). The vertical and horizontal smearing are AND-ed together and then either smoothed or blurred. The resulting black areas are analyzed for height, rectangular eccentricity, percentage of coverage, and the average width of the black portions of the area. The data for each block is entered into a table and heuristics are applied to identify areas which likely contain text. The method used by Wong et al. to extract text characters from the areas involves splitting the likely text lines and performing pattern matching; an alphabet is built based upon which shapes are seen.
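As an illustrative sketch of the row-wise gap closing just described (binary pixels, with 1 for black; the names and threshold handling are assumptions, not Wong et al.’s exact code):

#include <stdint.h>

/* Close white gaps shorter than the threshold between black pixels (1s),
   merging nearby components into blobs, one row at a time. */
static void smearRow(uint8_t *row, int width, int threshold)
{
    int lastBlack = -1;
    for (int x = 0; x < width; x++) {
        if (row[x] == 1) {
            if (lastBlack >= 0 && (x - lastBlack - 1) < threshold) {
                for (int i = lastBlack + 1; i < x; i++) {
                    row[i] = 1;    /* fill the gap with black */
                }
            }
            lastBlack = x;
        }
    }
}

The same routine run over columns, ANDed with the row result, produces the blobs described above.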

The run-length smearing algorithm requires that the image first be binarized and deskewed, but due to the text mask analysis it does not need to have non-text items removed. The method for isolating individual characters is based upon matching other patterns within the document and using a trial-and-error approach to character splitting. The pattern matching technique described by Wong et al. is very similar to the concept of using the eigenface technique; it even feeds back into the segmentation algorithm. Their technique can also be used to remove non-text items from the images.

Figure 1.1: Run-length Smearing Example (Kevin Laven, University of Toronto)

Figure 1.2: X-Y Cut Example 1 (Joost van Beusekom, Technische Universitat Kaiserslautern)

X-Y Cut

The X-Y Cut algorithm [8, 16] takes vertical and horizontal histograms of pixels in the scanned text image; the outputs of this process are sometimes referred to as projection profile cuts. The method described by Nagy and Seth [8] creates histograms which simply show the number of pixels in a given row or column within the image, while the method described by Ha, Haralick, and Phillips [16] creates histograms which show the number of bounding areas in a given row or column. Minimums in the histogram, which represent white areas in the document, are then identified (Figure 1.2). The approach of Nagy and Seth just requires a binarized and deskewed image, but the approach of Ha et al. also requires that a connected component analysis be performed to identify the bounding areas (Figure 1.3). For simple one-column documents the image can be directly split from here, but in real-world situations where the layout is more complicated the document must be traversed in kD-tree fashion (specifically, as an X-Y tree). This algorithm also requires that binarization and deskewing be performed in the preprocessing stages.
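A sketch of the pixel-counting variant of those projection profiles (row-major binary image; the names are illustrative):

#include <stdint.h>
#include <string.h>

/* Count black pixels (1s) in every row and column of a binarized image;
   minima in these histograms mark the whitespace where cuts can be made. */
static void projectionProfiles(const uint8_t *img, int width, int height,
                               int *rowHist, int *colHist)
{
    memset(rowHist, 0, (size_t)height * sizeof(int));
    memset(colHist, 0, (size_t)width * sizeof(int));
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            if (img[y * width + x] == 1) {
                rowHist[y]++;
                colHist[x]++;
            }
        }
    }
}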

Figure 1.3: X-Y Cut Example 2. (a) the original image (b) placement of cuts (c) zones subdivided by the recursive X-Y cut (d) X-Y tree of the page layout structure of the document image shown in (a) (Ha)

Docstrum

The Docstrum [9] algorithm is more interesting than either run-length smearing or X-Y Cut. The name Docstrum is derived from Document Spectrum; its author, O’Gorman, was inspired by ideas from the area of spectrum analysis.

The algorithm performs a k-nearest neighbor analysis of the components found during a bottom-up connected component analysis. This yields a graph of connected ‘connected components’; the center points of the ‘connected components’ are themselves connected to the k nearest center points of other ‘connected components’ (Figure 1.4). These connections define the boundaries of low-level areas (words or lines) which contain text.

O’Gorman describes how the connections which are found can be used to correct for document defects such as skew. For skew, a histogram of the connection angles is used. A similar approach, using a histogram of connection lengths, is used to determine character spacing.

This approach does not require a deskewed image, but it does require binarization and a bottom-up connected component analysis. Noise reduction by the application of the kFill filter [17], which removes small speckles, is also recommended. Due to the nature of the algorithm, sections of very different text sizes should be processed separately; this is usually done in preprocessing.

Docstrum also has difficulties identifying document items that are above the line level (blocks or areas). The biggest drawback to this approach is that k-nearest neighbor must be performed on each possible character, which takes O(n²) time (where n is the number of candidate characters). In practice, computation time is reduced by limiting the range in which to search and only increasing it when k connections are not found.

Figure 1.4: Docstrum Example. (a) original image (b) k-nearest neighbor of connected components (c) final boundaries (O’Gorman)

Whitespace Analysis

The insight behind whitespace analysis is that “white space is a generic layout delimiter” [10]. This means that a document’s background, since it is normally uniformly white, can more readily be identified than its foreground. These approaches focus on the geometry of documents which conform to a Manhattan or Taxi-Cab page layout, which is essentially when a page contains isolated text blocks (like city blocks in Manhattan, New York).

There are various whitespace analysis techniques. A modern method was described by Baird [10], which was influenced by the work of Nagy et al. [18], but most subsequent work cites enhanced versions created by Breuel [11, 12].

Like Docstrum, whitespace analysis requires connected component analysis. The algorithm attempts to identify maximum areas of whitespace, called “covers” (Figure 1.5). These areas are sorted and merged into larger areas which can be used to define the boundaries of the text containing areas.

Figure 1.5: Whitespace Analysis Example (a) searching the connected components for “covers” (b) identified whitespace areas (c) identified whitespace areas grouped, showing search grid (Breuel)

Voronoi

The Voronoi algorithm is a very interesting approach based upon the mathematical concept of Voronoi tessellations [13]. The algorithm was conceived to deal with the problem of non-Manhattan layouts, where documents may have regions which are not laid out with perfectly horizontal and vertical edges. Pages that have a non-Manhattan layout cannot be deskewed with normal methods, so this approach is more flexible. The algorithm requires that points be found within connected components by using an edge following algorithm. These Voronoi points cannot be used to directly create the Voronoi diagram; steps must be taken to remove superfluous edges and create Voronoi areas that contain interesting portions of the document.

Text Extraction Software

Algorithms which give optimal quality results may not be asymptotically optimal, may simply not perform fast enough on currently available hardware, or may have other problems which make them impractical in real world applications. So, the known methods need to be evaluated for applicability. Rather than implementing each algorithm and performing extensive analysis, present OCR systems will be examined to see which image segmentation algorithms are presently in use. Since proprietary OCR programs cannot be examined, the study was necessarily limited to open-source software offerings.

Currently the most predominant open-source OCR programs are GOCR, Ocrad, Cuneiform, OCRopus, Tesseract, and OCRFeeder. GOCR performs very poorly on anything but the most regular text, so it was excluded. Ocrad has better recognition abilities than GOCR, but its performance is still very poor; the main developer even states that GOCR uses rudimentary “ad hoc” algorithms, so it was excluded. Cuneiform, which was once a commercial product, has trouble recognizing the type-written characters in the example files, but it successfully identifies areas of text which should be examined, so it will be evaluated.

OCRopus is a set of Python scripts which perform various OCR functions, but it is well documented and is developed by Professor Breuel, so it will be considered. Tesseract appears to be the most widely used open-source OCR program. Finally, there is OCRFeeder. This is not really an OCR program; rather, it is a front-end for all of the previously mentioned programs, but it does perform text extraction, so it will also be considered.

Cuneiform

Cuneiform was developed prior to 1993 by the Russian company Cognitive Technologies. The application was made freeware in 2007, and the main algorithm kernel was released as open-source in 2008. However, the codebase for Cuneiform was large and the few comments which it contained were often in Russian. Due to these difficulties it was not clear precisely how its text extraction was completed.

OCRopus

OCRopus is a modular system, and there are different options available for segmentation. The default method is ocropus-gpageseg, so that is the one which will be considered. The optional method ocropus-prast performs a more complex geometric layout analysis, and ocropus-ridg performs a more experimental ridge based algorithm. The default segmentation method begins its image segmentation process by removing any horizontal black lines that may otherwise be confused for text in later stages. It then attempts to identify columns by finding vertical white or black lines. Next it calculates the max norm of the Gaussian filtered image in order to find the tops and bottoms of text lines; it is essentially looking at vertical gradients. The tops and bottoms are used to establish baselines and heights, and these “seeds” are used to mark text lines.

This process inherently requires that images be deskewed prior to being passed to the algorithm.

Tesseract

Tesseract is an OCR application which was developed by Hewlett Packard between 1985 and 1998; it was open-sourced in 2005 and has been sponsored by Google since 2006. The image segmentation portion of the program uses an algorithm developed in 1994 by Ray Smith of Hewlett Packard. The algorithm, which was designed to work even with a skewed document, starts with a connected component analysis. The outlines from the connected component analysis are joined into blobs, the blobs are filtered by height, and the relevant blobs that remain are organized into lines. Through this process the skew of each line can be determined.

OCRFeeder

OCRFeeder was developed in 2008 by Joaquim Rocha for his Computer Science Master’s thesis. This program does not actually perform OCR itself; rather, it operates as a front-end for programs such as Tesseract and Cuneiform. However, OCRFeeder has its own segmentation engine, which often improves the OCR results given by the recognition program, perhaps because it has a superior segmentation implementation. OCRFeeder appears to use a variation of the whitespace algorithms.

CHAPTER III

PROPOSED APPROACH

Having examined existing approaches described in academic works and used in real-world applications, a new approach based upon seam carving will be described. Though originally conceived independently, the seam carving approach to text extraction has a lot in common with whitespace analysis. The main goal of the seam carving approach is to use the whitespace in a document to identify where the interesting information lies; rather than looking for rectangular covers which are joined into larger areas, the seam carving approach attempts to find whitespace seams directly.

Consider a blank document, or a white image. With a properly constructed seam tracing function, one that attempts to maintain straight lines, all horizontal seams would go straight across the image. If a single letter or word is added to the center of the document, as seams are examined, from the top down or from left to right, they will begin to deviate as they get closer to the letter (Figure 2.1). The deviation should reach its maximum around the center of the word, then switch direction, and begin to deviate less.

Figure 2.1: Seam Deviation Example. (a) seams in a blank document (b) seams in a document which contains a word (c) seam deviation amounts in a document which contains a word

Seen in this way, just as physical matter in space causes distortions in space-time, information in a document causes distortions in the document space.

With the seam carving approach to text segmentation, the distortions made by the information are used to identify the boundaries of information within the document. The seam carving technique needs only slight modification to support this new use.

Seam Carving

Seam Carving was originally described by Avidan and Shamir as a technique for content aware image resizing [6]. Their insight was that it could be possible to resize images through the removal of unimportant information from the image rather than through cropping or scaling. The use of seams, which travel a jagged path completely through the image, allows each row or column to remain the same width as the less important pixels are removed.

Let I be an n ✕ m image:

s^x = \{s^x_i\}_{i=1}^{n} = \{(x(i), i)\}_{i=1}^{n}, \text{ s.t. } \forall i, |x(i) - x(i-1)| \le 1, \text{ where } x : [1, \ldots, n] \to [1, \ldots, m] \quad (2.1a)

s^y = \{s^y_j\}_{j=1}^{m} = \{(j, y(j))\}_{j=1}^{m}, \text{ s.t. } \forall j, |y(j) - y(j-1)| \le 1, \text{ where } y : [1, \ldots, m] \to [1, \ldots, n] \quad (2.1b)

Equation 2.1: Formal Seam Definition and Example
The formal definition of a vertical seam (2.1a), a horizontal seam (2.1b), and a visual example of one vertical and one horizontal seam (Avidan and Shamir)

Preprocessing Functions

The original seam carving paper does not mention explicit preprocessing steps, and they may not be appropriate for image retargeting, but they are needed when the approach is applied to text extraction. First, color images are converted to greyscale; color information is less valuable for text extraction. The conversion of color images to greyscale may at first seem a simple task, but in fact there are a few different ways to accomplish it. Before settling upon the use of average pixel intensity, where the red, green, and blue color channels are averaged, many other approaches were examined. Experiments were run to test the performance of using: the HSV hexcone model, where the maximum of the color channels is selected; different luma and luminance conversion ratios, such as the popular (0.299 ✕ Red + 0.587 ✕ Green + 0.114 ✕ Blue); and even Euclidean distance.
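As a sketch (the pixel structure here is hypothetical; the project’s actual layout is defined in PIXEL.H, Appendix H), the two conversions compare as follows:

#include <stdint.h>

/* Hypothetical RGB pixel; the project's real structure is in pixel.h. */
struct rgbPixel { uint8_t r, g, b; };

/* Average pixel intensity, the conversion settled upon for this project. */
static uint8_t greyAverage(struct rgbPixel p)
{
    return (uint8_t)(((unsigned)p.r + p.g + p.b) / 3);
}

/* Luma-weighted conversion, one of the alternatives that was examined. */
static uint8_t greyLuma(struct rgbPixel p)
{
    return (uint8_t)(0.299 * p.r + 0.587 * p.g + 0.114 * p.b + 0.5);
}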

In addition to converting color images to greyscale, different contrast enhancement techniques were experimented with. Otsu binarization [19] was used, but it did not always yield the best results. In many situations passing each pixel through a simple cosine function yielded the best results. In common photo editing applications this is known as curve adjustment; here a cosine function was used for the shape of the curve.
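A minimal sketch of such a cosine-shaped curve for 8-bit greyscale values (the exact curve used by the project may differ):

#include <math.h>
#include <stdint.h>

#define CURVE_PI 3.14159265358979323846

/* Contrast curve: maps 0 to 0 and 255 to 255 while pushing mid-tones
   apart, the "curve adjustment" familiar from photo editors. */
static uint8_t cosineCurve(uint8_t value)
{
    double x = value / 255.0;                    /* normalize to [0, 1] */
    double y = (1.0 - cos(x * CURVE_PI)) / 2.0;  /* S-shaped curve */
    return (uint8_t)(y * 255.0 + 0.5);
}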

Energy Functions

Since the goal of seam carving is to remove pixels which will not be noticed, Avidan and Shamir considered the images’ “energies.” Each pixel in an image has an energy value which represents how similar it is to the pixels around it. There are various methods for calculating energies; since text is normally contrasted against its background, almost any method should work.

Edge detectors should also work well for high contrast text, so they will also be explored. The energy functions are defined as filter kernels and convolved against the target image.
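A sketch of applying one such kernel at a single pixel (row-major greyscale image, border pixels clamped; the names are illustrative rather than the project’s actual API):

#include <stdint.h>

/* Convolve a square filter kernel of size k (odd) against the pixel at
   (x, y); coordinates outside the image are clamped to the border. */
static double convolveAt(const uint8_t *img, int width, int height,
                         int x, int y, const double *kernel, int k)
{
    int r = k / 2;
    double sum = 0.0;
    for (int ky = -r; ky <= r; ky++) {
        for (int kx = -r; kx <= r; kx++) {
            int px = x + kx;
            int py = y + ky;
            if (px < 0) px = 0;
            if (px >= width) px = width - 1;
            if (py < 0) py = 0;
            if (py >= height) py = height - 1;
            sum += img[py * width + px] * kernel[(ky + r) * k + (kx + r)];
        }
    }
    return sum;
}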

Simple Gradient

The main energy function used by Avidan and Shamir was a simple gradient magnitude function. They calculate the absolute value of the change in the x direction and the y direction. Those two values are summed for the final energy value (Equation 2.2). This is a very simple approximation of the first-order derivative of the pixel in two dimensions.

e_1(I) = \left| \frac{\partial}{\partial x} I \right| + \left| \frac{\partial}{\partial y} I \right|

Equation 2.2: Simple Gradient Formula (Avidan and Shamir)
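A sketch of Equation 2.2 using neighbor differences, clamped at the image border (illustrative names; the project’s energy functions live in LIBENERGIES.C, Appendix G):

#include <stdint.h>
#include <stdlib.h>

/* e1 energy: |dI/dx| + |dI/dy| approximated with neighboring pixels. */
static int simpleGradient(const uint8_t *img, int width, int height,
                          int x, int y)
{
    int xr = (x + 1 < width) ? x + 1 : x;    /* clamp at right edge */
    int yd = (y + 1 < height) ? y + 1 : y;   /* clamp at bottom edge */
    int dx = abs((int)img[y * width + xr] - (int)img[y * width + x]);
    int dy = abs((int)img[yd * width + x] - (int)img[y * width + x]);
    return dx + dy;
}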

Sobel Edge Detection

The Sobel operator [20] provides another way to approximate the two partial derivatives of the image. Defined in 1968 by Irwin Sobel, it provides the norm of the gradient vector for each pixel, and it works well for detecting edges in images which contain text.
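The two 3✕3 Sobel kernels, and the gradient norm computed from them, can be sketched as follows (interior pixels only, for brevity; names are illustrative):

#include <math.h>
#include <stdint.h>

static const int SOBEL_X[3][3] = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };
static const int SOBEL_Y[3][3] = { {-1, -2, -1}, { 0, 0, 0}, { 1, 2, 1} };

/* Norm of the gradient vector at an interior pixel (x, y). */
static double sobelEnergy(const uint8_t *img, int width, int x, int y)
{
    int gx = 0;
    int gy = 0;
    for (int j = -1; j <= 1; j++) {
        for (int i = -1; i <= 1; i++) {
            int v = img[(y + j) * width + (x + i)];
            gx += SOBEL_X[j + 1][i + 1] * v;
            gy += SOBEL_Y[j + 1][i + 1] * v;
        }
    }
    return sqrt((double)gx * gx + (double)gy * gy);
}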

Figure 2.2: Edge Profiles. The intensity profile of one line in an image, the same profile after noise is removed, the first derivative, and the second derivative of the image slice (Concise Computer Vision)

Laplacian of Gaussians

A Laplacian operator (∇·∇, ∇², ∆) approximates the second-order derivative value of a pixel, and it can find areas of rapid change, which signify edges in an image (Figure 2.2). Second derivatives are even more susceptible to noise than first derivatives. To overcome this, a Gaussian filter is first applied to smooth out noise in the image. A Gaussian filter is a local convolution with a filter kernel (Figure 2.3) which is derived from a 2D Gauss function; a 2D Gauss function is the product of two 1D Gauss functions. The Laplacian of Gaussians, sometimes called the Mexican hat function, is the result of applying the Laplacian operator to a Gaussian blurred image.

1   4   7   4   1
4  16  26  16   4
7  26  41  26   7
4  16  26  16   4
1   4   7   4   1

Figure 2.3: Example Gaussian Filter Kernel
This filter kernel, for a Gauss function with a standard deviation of 1, would be applied to each pixel and then divided by 273 (Klette).
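A sketch of building such a kernel directly from the 2D Gauss function (normalizing by the summed weights plays the role of the divide-by-273 above; names are illustrative):

#include <math.h>

/* Fill a (2r+1) x (2r+1) Gaussian kernel and normalize its weights. */
static void buildGaussKernel(double *kernel, int r, double sigma)
{
    int size = 2 * r + 1;
    double sum = 0.0;
    for (int y = -r; y <= r; y++) {
        for (int x = -r; x <= r; x++) {
            double g = exp(-(x * x + y * y) / (2.0 * sigma * sigma));
            kernel[(y + r) * size + (x + r)] = g;
            sum += g;
        }
    }
    for (int i = 0; i < size * size; i++) {
        kernel[i] /= sum;    /* weights now sum to one */
    }
}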

Difference of Gaussians

The difference of Gaussians operator finds the difference between two blurred copies of the original image, each blurred by a different amount. More precisely, two Gaussian kernels with differing standard deviations are applied to the source image and the difference of the two results is recorded. Difference of Gaussians is sometimes substituted as an approximation to the Laplacian of Gaussians.
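As a sketch, the operator reduces to two blurs and a subtraction; gaussianBlur() below is an assumed helper standing in for a convolution with kernels like the one in Figure 2.3:

#include <stdint.h>
#include <stdlib.h>

/* Assumed helper: writes a Gaussian-blurred copy of src into dst. */
void gaussianBlur(const uint8_t *src, uint8_t *dst,
                  int width, int height, double sigma);

/* Difference of Gaussians: subtract a widely blurred copy of the image
   from a narrowly blurred one. */
static void differenceOfGaussians(const uint8_t *src, int *dst,
                                  int width, int height,
                                  double sigmaNarrow, double sigmaWide)
{
    uint8_t *a = malloc((size_t)width * height);
    uint8_t *b = malloc((size_t)width * height);
    gaussianBlur(src, a, width, height, sigmaNarrow);
    gaussianBlur(src, b, width, height, sigmaWide);
    for (int i = 0; i < width * height; i++) {
        dst[i] = (int)a[i] - (int)b[i];
    }
    free(a);
    free(b);
}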

Difference of Gaussians with Sobel

Due to the nature of Gaussian kernels, the difference of Gaussians method suppresses high-frequency information and thus operates like a band-pass filter. This means that random noise and finer image patterns are less likely to be reported as edges. When information obtained from a difference of Gaussians operator is combined with information obtained from a Sobel filter, high-frequency areas can be suppressed while strong edges are preserved. A new filter kernel could be mathematically derived to perform this combined approach in one step, but that was beyond the scope of this project.

Figure 2.4: Energy Function Examples (a) Difference of Gaussians (b) Sobel edge detection (c) Simple Gradient (d) Difference of Gaussians and Sobel (e) Laplacian of Gaussians (f) original

Seam Traversal

Given the results of an energy function, seams must next be calculated for the image. For image retargeting the image can be treated as a graph and seams can be removed iteratively, but Avidan and Shamir chose to take a dynamic programming approach.

For vertical seams the dynamic programming matrix is filled in a top-to-bottom, left-to-right manner; a seam value is calculated for each pixel by adding the minimum of the three pixels above it in its 8-connected neighborhood (above left, directly above, above right) to the current pixel’s energy value.

M(i, j) = e(i, j) + min( M(i−1, j−1), M(i−1, j), M(i−1, j+1) )

Equation 2.3: Formal Seam Pixel Value Definition (Avidan and Shamir)
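A sketch of that fill for vertical seams over a row-major energy map (illustrative names; the project’s implementation is in LIBSEAMCARVE.C, Appendix C):

/* Fill the seam matrix M per Equation 2.3, row by row from the top. */
static void fillSeamMatrix(const int *e, int *M, int width, int height)
{
    for (int x = 0; x < width; x++) {
        M[x] = e[x];                           /* first row: energy only */
    }
    for (int y = 1; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int best = M[(y - 1) * width + x];          /* directly above */
            if (x > 0 && M[(y - 1) * width + x - 1] < best) {
                best = M[(y - 1) * width + x - 1];      /* above left */
            }
            if (x + 1 < width && M[(y - 1) * width + x + 1] < best) {
                best = M[(y - 1) * width + x + 1];      /* above right */
            }
            M[y * width + x] = e[y * width + x] + best;
        }
    }
}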

For image retargeting, when analyzing vertical seams, the final row of pixels can be checked for minimums once the dynamic programming matrix is filled. The minimum values represent the starting points for seams which can be removed. From these points backtracking is used to remove pixels.

Figure 2.5: Seam “Shadow” Example. Original (left), seam values (center), seam values with decrement (right)

To apply seam carving towards text extraction the seam traversal procedure is modified. Firstly, with text images, seam values create “shadows” which can obscure information following large features (Figure 2.5). This effect could be due to the dynamic programming approach’s tendency to propagate errors; it may be possible to replace dynamic programming with a greedy approach, but that has been left as a topic for future research. For now a decrement will be added to reduce the effect of shadows.

The dynamic programming matrix is filled as before, except that each pixel’s resulting seam value is reduced by some value k if it is greater than zero (Equation 2.4). This is a key modification; without this decrement seams would never travel between text lines, due to how seam values are carried across the image. A k value of 1 was experimentally determined to yield satisfactory results.

M(i, j) = e(i, j) + min( M(i−1, j−1), M(i−1, j), M(i−1, j+1) ) − k

Equation 2.4: Formal Seam Pixel Value Definition New (Stoll)
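Relative to the plain fill, the change is a single step applied to each freshly computed seam value; a minimal helper (with k = 1, per the experiments above) might read:

/* Equation 2.4: reduce a seam value by k when it is above zero, so that
   "shadows" cast by large features decay across blank document space. */
static int decayedSeamValue(int energy, int minNeighbor, int k)
{
    int value = energy + minNeighbor;
    if (value > 0) {
        value -= k;
    }
    return value;
}

In the fill loop sketched earlier, this helper would replace the plain sum of energy and minimum neighbor.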

Text Extraction Steps

Additional, unique steps are required to use seam carving for text line extraction. Rather than examining the last column or row for minimum values, each pixel in the terminal row or column is examined. In fact, finding the seams of minimum value does not generally work with text images. For most images containing text the last row or column will contain a large number of minimum values since the background of most documents is some uniform color. So, backtracking must be performed for each of the pixels in the terminal row or column.

The process of backtracking must also be modified to support the new problem domain. For image retargeting the actual seam path matters little, whereas for text line extraction it is expected to proceed in a straight line since text is normally written in lines. So, in addition to considering the values of the next step when backtracking, the overall deviation from straight is also considered.

Figure 2.6: Seam Path Example. Possible (orange) and probable (red) seam paths are shown for a standard seam carving approach (left); top (red) and bottom (blue) seam paths are shown for backtracking with added deviation constraints (right).

If a seam has deviated from vertical or horizontal then the algorithm will force the seam back in that direction. When there is more than one possible backtracking move with the same value, the algorithm has discretion to choose which path to take. For the purposes of this text extraction approach, the algorithm must take the center path whenever possible, unless there is a net deviation. When there is a deviation, the implementation must take the choice that minimizes the deviation. This results in seams which minimize both the deviation and the energy of the path. From here there are two distinct approaches to identifying the distortions in the image seams.

Let I be an n ✕ m image and s be a seam:

t^x = \sum_{i=1}^{n} ( s(i) - s(1) ) \quad (2.5a)

t^y = \sum_{j=1}^{m} ( s(j) - s(1) ) \quad (2.5b)

Equation 2.5: Formal Definition of Net Deviation
Net deviation (t) for a vertical seam (2.5a) and for a horizontal seam (2.5b)
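Sketches of the two mechanisms just described, for horizontal seams (illustrative names; the project’s traversal code is in LIBSEAMCARVE.C, Appendix C):

/* One backtracking move: up, straight, and down are the three candidate
   seam values; drift is the current row minus the seam's starting row.
   Returns -1 (up), 0 (straight), or +1 (down). */
static int chooseStep(int up, int straight, int down, int drift)
{
    int min = straight;
    if (up < min) min = up;
    if (down < min) min = down;
    if (straight == min) {
        return 0;                     /* take the center path if possible */
    }
    if (up == min && down == min) {
        return (drift > 0) ? -1 : 1;  /* tie: move back toward straight */
    }
    return (up == min) ? -1 : 1;
}

/* Equation 2.5: net deviation t of a traced horizontal seam, where
   path[j] is the row the seam occupies at column j and length is the
   number of columns the seam crosses. */
static long netDeviation(const int *path, int length)
{
    long t = 0;
    for (int j = 0; j < length; j++) {
        t += path[j] - path[0];
    }
    return t;
}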

Direct Area Detection

The first way to identify distortions in the seams is to check for seams which have the most net deviation from straight. As a horizontal seam traverses across the image, each column’s deviation from the starting row is recorded (Equation 2.5). The sum of these deviations is the net deviation. When a seam’s net deviation changes sign it represents the bottom of an area, and the seam prior to it represents the top of an area. Long straight runs at the beginning and end are trimmed off.
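A sketch of that sign-change scan over consecutive seams (netDev[i] is the net deviation, per Equation 2.5, of the seam starting at row i; names are illustrative):

#include <stdio.h>

/* A sign change between consecutive seams marks an area: the prior seam
   is its top boundary and the current seam is its bottom boundary. */
static void findAreas(const long *netDev, int seams)
{
    for (int i = 1; i < seams; i++) {
        if ((netDev[i - 1] < 0 && netDev[i] > 0) ||
            (netDev[i - 1] > 0 && netDev[i] < 0)) {
            printf("area found: top seam %d, bottom seam %d\n", i - 1, i);
        }
    }
}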

Figure 2.7: Direct Area Detection. The tops of found areas are marked with red lines and the bottoms of found areas are marked with blue lines.

Information Masking

Another approach to identifying distortions in the seams is to look for pixels where seams overlap. As seams deviate around information they will tend to go through the pixels just outside the information’s upper and lower boundaries. If a matrix is created to keep track of how many seams pass through each pixel, then the matrix cells nearest to information will have higher values and the matrix cells where information resides will have zero values.
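A sketch of that counting matrix for horizontal seams (path[j] again gives the row a traced seam occupies at column j; names are illustrative):

/* Record one traced seam in the mask: every pixel the seam passes
   through has its counter incremented. Cells still at zero after all
   seams are traced are where the information resides. */
static void accumulateSeam(unsigned *mask, int width, const int *path)
{
    for (int j = 0; j < width; j++) {
        mask[path[j] * width + j]++;
    }
}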

Figure 2.8: Information Masking. Information is black since no seams went through those areas; the area outside of information is lightest since many seams deviated through those areas.

Method Comparison

Direct area detection and information masking both work reasonably well when the document is simple, but information masking works much better on more complicated document layouts. In the example (Figure 2.9) there are three columns of irregularly laid out words; the difference in character size between the columns prevents lines from being properly identified. In the cases where lines are identified correctly (such as “Charles Dewitt - H. E. McClure”), they are actually in different columns and should not be treated as single lines. Another issue with direct area detection is that words near the edge are sometimes not properly detected. The problem appears to be that the leading and trailing lines are trimmed prematurely due to the lack of seam run length. Finally, direct area detection, without further enhancements, begins to fail faster on documents which are skewed; this is especially problematic since handling skew is a strength of the seam carving approach.

Figure 2.9: Comparing Direct Area Detection and Information Masking. Direct area detection (above) and information masking (below) on multiple columns

Information masking is superior to direct area detection when the target text is skewed. However, when text skew reaches a certain point, even information masking, under the current implementation, begins to degrade in performance. As can be seen from the example (Figure 2.10), direct area detection clearly begins to break down at two degrees of skew, and it stops working completely by the time the text is skewed by eight degrees. Information masking, since it does not rely upon heuristics, continues to work at eight degrees of skew (Figure 2.10 d), but the seam markers at the beginning and ending of lines become stretched.

Figure 2.10: Skewed Text Comparison. Two degree skew processed by direct area detection (a) and by information masking (b); eight degree skew by direct area detection (c) and by information masking (d)

Overcoming Limitations

In order to be a viable solution the shortcomings mentioned in the above sections should be overcome; otherwise there is little reason to use this technique over the established techniques. The problem of skew is minor; it has been shown, through the information masking output, that the raw seam carving technique can identify text that is skewed. It is only the heuristics for direct area detection that need to be improved. And, if all else fails, a dedicated deskewing algorithm can be implemented as a preprocessing step.

The problem of handling more complicated text layouts, however, is more significant. Again, the raw seam carving technique, as shown through the information masking approach, can identify text regardless of its position, but keeping the text logically grouped is, as it always has been, more problematic.

Improved Skew Handling

In order to improve the handling of skewed text the approach could be modified so that it does not attempt to force seams to travel at precisely 90 or 180 degrees. Currently the algorithm encourages seams to end in the same row or column in which they started. A better method would be to identify deviation trends; if the seam is steadily deviating more, then it could be assumed that the information it is tracing is skewed. Rather than attempting to force seams to travel at exactly 90 or 180 degrees, the algorithm could attempt to force the seam to travel along the trend angle.

Since direct area detection performs so poorly at detecting starting and ending seams when the text is skewed, that portion should also be improved. Currently it simply evaluates the magnitude of the net deviation; once it changes sign, the top and bottom seams are presumed to have been found, but this is not necessarily true with skewed text. Checking for a sign change in magnitude should account for the overall angle of seams within a certain range. This was not implemented as a part of this project, but it is a goal of future research.

In all cases the heuristics used should be improved through more thorough data analysis, which in turn requires improved parameterization of the algorithm. The values used for the heuristics are the result of manually examining the algorithm’s performance over a very small data set. The use of a larger and more varied set of example images should yield better heuristic values.

Complicated Layouts

There are various possible techniques to overcome the problems of locating text within columns and other more complicated layouts. Most of these possible techniques are based upon methods described in other text extraction approaches.

The first option is to run the seam carving algorithm both vertically and horizontally, then use a heuristic to determine which direction is more appropriate to split in first. One possible heuristic is total resistance. The idea is that, as each of the seams is explored, when they are forced to deviate from a straight path they are experiencing resistance. For direct area detection the net deviation for each seam is calculated, so it would be trivial to sum these quantities. The summed net deviations for horizontal and vertical seam traversal could then be compared. The direction with less resistance, or a lower sum of net deviations, is the direction which should be explored first. This technique helps when identifying an information mask, but it is still very crude. It tends to favor moving horizontally (for left-to-right or right-to-left texts) and does not help improve direct area detection.

Another approach to higher level segmentation would be to take a kD-tree or X-Y cut type of approach. Instead of initially tracing all the seams, the minimum seams would be found, as is done with typical seam carving. The difference comes in how “minimum seam” is defined. Rather than checking the terminal value of the seam, this approach would need to find the seam with the minimum net deviation. Since there are likely to be multiple seams with the same minimum net deviation, this approach would need to find the center seam of the largest grouping of minimum net deviation seams. The identified seam would be used to split the image and the process would be run again on each of the halves. This was not implemented as a part of this project, but it is a goal of future research.

Extracting Finer Details

Setting aside this approach’s current limitations, more of its benefits will be described. Given a non-skewed document which contains simple content with a Manhattan layout, it is possible to iteratively run the same seam carving process on identified sub-areas. If the first pass of the algorithm identified sentences, then the next pass of the algorithm will identify phrases, words, or letters. The technique is simply run against the areas identified in the first step.

Figure 2.11: Seams Identified in Previously Identified Areas. The first line (left) and second line (right) identified in the document from Figure 2.7

Asymptotic Analysis

The first step in the seam carving process is to read in an image and create an appropriate data structure. The example program performs brightness, contrast, and energy calculations as distinct steps, but this is done to allow for flexibility during testing; a production program would combine these into a single step. The Difference of Gaussians could be performed in one pass given the appropriate data structure. So, assuming that the image has a width of n and a height of m, this process will take O(nm).

The next step is to fill the seam matrix, which requires iterating over the image again, taking O(nm). Finally, all the seams must be traversed. Considering horizontal seam evaluation, each of the m starting pixels is considered as a starting point, and the seam must cross the entire width n of the image, so this process takes another O(nm). The runtime performance for a single pass is 3 ✕ O(nm), and 5 ✕ O(nm) when both horizontal and vertical seams are evaluated. So, this approach performs in O(nm) time, where n is image width and m is image height. This conclusion is supported by empirical analysis (Figure 2.12).

Figure 2.12: Asymptotic Analysis. Actual run-time performance data; both directions, average of 20 runs

It should be noted that although vertical and horizontal seam analysis both perform in essentially O(n²) time, real world performance can be drastically different due to algorithm implementation and hardware limitations. The image data structure uses row-major order, and due to how memory is cached and moved to the processor, cutting horizontal seams can take twice as long to accomplish on some hardware.

Related Work

Seam carving has previously been used in the area of handwriting recognition; however, this approach takes a slightly different perspective on the problem. Saabni and El-Sana describe a method for applying seam carving to handwritten text line extraction [14]. When they surveyed the research on text line extraction for handwriting recognition they also found that most of the techniques relied upon some sort of connected component analysis, so they devised a novel technique based upon seam carving.

The algorithm described by Saabni and El-Sana calculates the image’s or document’s energy in such a way (using a signed distance transformation) that the selected seam paths go through the lines of handwritten text rather than attempting to identify the whitespace. Once a text line’s center line is found, it is expanded vertically to identify the full height of the line.

One major advantage of their approach is that small components, such as the dot above an “i,” can be included in the row based upon what the algorithm learns about row heights. The new technique described in this paper can be susceptible to leaving out such small components, especially when processing images which have not first been subdivided. In practice, for single lines of text, the technique described here automatically includes small pieces of information due to how the heuristics are set to find maximum deviation.

Another benefit of Saabni and El-Sana’s technique is that, since it is tracking lines of text, it can handle multi-skew and lines that actually touch each other. For the technique described in this paper, even if a moving average were used to determine line skew, it would still have difficulty handling multi-skew. The technique described here can, however, cope with touching lines at least as well as the method described by Saabni and El-Sana.

Asi, Saabni, and El-Sana subsequently demonstrated that an enhancement to Saabni and El-Sana’s approach could identify the precise boundaries between text lines [25]. Their method’s ability to do this is entirely dependent upon its ability to find the text lines’ medial paths, which in turn relies upon single columns of handwritten information. Whereas their work is likely to perform better at identifying lines of free-form handwritten text, the method described in this paper is likely to excel on typed or printed documents. The method described here is intended to run recursively, so there is some expectation that the information it finds will be subsequently subdivided, whereas their approach is intended to identify information using a single pass.

Future Work

The existing implementation of seam carving for text line extraction performs as well as other notable text extraction methods. Like them, it has trouble with skewed text and more complicated layouts, traditional problems in this area of research. Fortunately, there are some techniques to overcome these shortcomings. The first improvement should be in the area of handling skewed text. The technique should consider some sort of moving average of the seam angles, but more research will need to be done in this area.

Another area of future work is the implementation of a kD-tree or X-Y cut approach to higher level document segmentation. This will help define the structure of the document and improve actual text extraction. Since a dynamic programming approach has been taken, and due to the nature of seams within text documents, it should be possible to reuse seam data as the document is subdivided, thus reducing computation time. A check may be required to ensure that the border pixels of the sub-area all contain zero weights. Also, a more precise base condition would need to be defined to limit the number of subdivision attempts.

An alternative approach to higher level document segmentation using seam carving would be to use the idea of “zooming,” or otherwise considering the level of detail (LOD). This idea is based upon how people process information within their view; people can get an overview of what they see or they can understand fine details within it, but those are generally two distinct steps. To simulate this process the candidate image could have its size reduced and its contents blurred to decrease the definition of finer details. Then, when seam carving is run, it will not find sentences, for example, but rather it will find paragraphs or columns.

Figure 2.13: Image Overview Extraction. Experimental results of running the seam carving approach on a reduced size image; the lines found (grey/black bands) are higher level structures such as paragraphs

The approach of using LOD could bring multiple benefits. First, since the seam carving approach runs in polynomial time, a reduction in image size would greatly increase run-time performance. However, the dynamic programming matrix would have to be recreated at each zoom level, which could negate any realized gains. If this modification were combined with the kD-tree modification, it would make identification of higher level divisions easier. The idea of “zooming out” also presents opportunities for early recognition, or at least for setting a context for subsequent recognition steps. It may be possible to run the low resolution areas of interest through an algorithm which quickly detects basic shapes, or one that uses textons [15] to detect textures. If this approach were viable it would have implications outside of the realm of OCR; it could be used in the field of bioinformatics or for general computer vision.

Existing implementations of this approach have been experimentally applied to bioinformatics. There are techniques being developed to identify carcinomas based upon examining differences in protein profiles; presently the differentiation of the protein profiles is done by manual inspection [21]. Enhanced versions of the seam carving approach may allow for algorithmic differentiation of protein profiles.

Figure 2.14: Experimental Application Toward Protein Differentiation. Identification of interesting areas within two protein samples; storing the identified regions in a kD-tree structure may allow for automated comparison

Successful application to bioinformatics may further lead to applications toward general computer vision. To see how this approach could be used for general computer vision, consider an image which contains a stop sign. If the algorithm identifies the sign as being interesting in one of the initial iterations, and if the octagonal shape could be detected, then it may not even be necessary to continue processing the area — if a person sees a stop sign out of the corner of their eye they do not need to read the word “STOP” to understand the meaning of it. For most adults, seeing a red octagon is enough to understand that the symbol means stop. Also, if “STOP” is being recognized but some of the letters are illegible, the fact that the word being recognized is within an octagon would induce the algorithm to respond that the word is “STOP.”

CHAPTER IV

CONCLUSIONS

A promising new approach to text extraction, based upon seam carving, has been described. The seam carving approach brings many of the same benefits as whitespace analysis techniques: it operates largely parameter free, it does not make assumptions about document layout, and it works regardless of text direction or page orientation. And though it also has many of the same limitations as whitespace analysis, there are some promising methods unique to this approach which can help overcome those shortcomings and allow it to be used outside of the original problem domain.

Applying the seam carving approach to text line extraction reinforced that the original technique was designed for images and not text documents. When seam values are calculated, the original technique causes “shadows” to be cast which can block out smaller pieces of information. The shadows represent errors which, unlike in images, easily get propagated through the mostly blank space of a text document. The shadows were reduced by adding decay to the seam value function. Their presence also suggests that it might be wise to switch from dynamic programming to a greedy approach when applying seam carving towards text line extraction.

The approach described by Avidan and Shamir also relies upon the ability to find minimal terminal seam values from which to backtrack. For text documents the terminal value of all seams is likely to be zero. This is due to the amount of whitespace, space without information, in text documents. Images have a higher information density and are not normally padded with headspace. Modifying the technique so that every seam terminus is examined is necessary to apply seam carving to text line extraction.

The original seam carving technique further allowed for indeterminate seam paths (determined by the implementation). For resizing images it is probably beneficial to have some randomness in the seam path, but that works against text line extraction. Text normally flows in straight lines, and it makes sense to consider that when looking to extract text lines. To apply seam carving to text extraction the concept of net deviation was introduced; in addition to minimizing the seam path cost the net deviation of the seam is also minimized.

These minor innovations have enabled the seam carving algorithm to be applied to the task of text line extraction; the result builds upon existing techniques, yet represents a truly novel approach. The potential benefits of perfecting this approach could reach beyond optical character recognition, and it is thus worthy of further research.


REFERENCES

1. S. Prince, “Computer Vision: Models, Learning, and Inference,” 2014.

2. R. Klette, “Concise Computer Vision: An Introduction into Theory and Algorithms,” 2014.

3. S. Russell and P. Norvig, “Artificial Intelligence: A Modern Approach,” 2010.

4. L. Sirovich and M. Kirby, “Low-dimensional procedure for the characterization of human faces,” Journal of the Optical Society of America A, Volume 4, Issue 3, pp. 519-524, 1987.

5. M. Turk and A. Pentland, “Face recognition using eigenfaces,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 586–591, 1991.

6. S. Avidan and A. Shamir, “Seam carving for content-aware image resizing,” ACM Transactions on Graphics (TOG), Volume 26, Number 3, 2007.

7. K. Y. Wong, R. G. Casey, and F. M. Wahl, “Document Analysis System,” IBM Journal of Research and Development, Volume 26, Issue 6, pp. 647-656, 1982.

8. G. Nagy and S. Seth, “Hierarchical representation of optically scanned documents,” International Conference on Pattern Recognition - ICPR, 1984.

9. L. O’Gorman, “The Document Spectrum for Page Layout Analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 15, Issue 11, pp. 1162-1173, 1993.

10. H. S. Baird, “Background Structure In Document Images,” In Advances in Structural and Syntactic Pattern Recognition, pp. 17-34, 1992.

11. T. M. Breuel, “Two Geometric Algorithms for Layout Analysis,” Document Analysis Systems, 2002.

12. T. M. Breuel, “Robust least square baseline finding using a branch and bound algorithm,” Document Recognition and Retrieval, SPIE, pp. 20-27, 2002.

13. K. Kise, A. Sato, and M. Iwata, “Segmentation of Page Images Using the Area Voronoi Diagram,” Computer Vision and Image Understanding, Volume 70, Number 3, pp. 370-382, 1998.

14. R. Saabni and J. El-Sana, “Language-Independent Text Lines Extraction Using Seam Carving,” 2011 International Conference on Document Analysis and Recognition (ICDAR), pp. 563-568, 2011.

15. B. Julesz, “Textons, the Elements of Texture Perception, and their Interactions,” Nature, Volume 290, pp. 91–97, 1981.

16. J. Ha, R. M. Haralick, and I. T. Phillips, “Recursive X-Y Cut using Bounding Boxes of Connected Components,” Third International Conference on Document Analysis and Recognition, pp. 952-955, 1995.

17. L. O’Gorman, “Image and document processing techniques for the RightPages electronic library system,” International Conference on Pattern Recognition, pp. 260-263, 1992.

18. G. Nagy, J. Kanai, M. Krishnamoorthy, M. Thomas, and M. Viswanathan, “Two Complementary Techniques for Digitized Document Analysis,” ACM Conference on Document Processing Systems, 1988.

19. N. Otsu, “A threshold selection method from gray-level histograms.” IEEE Transactions on Systems, Man, and Cybernetics, Volume 9, Issue 1, pp. 62–66, 1979.

20. I. Sobel, “An Isotropic 3×3 Image Gradient Operator,” 1969.

21. T. Shi, F. Dong, L. S. Liou, Z. H. Duan, A. C. Novick, and J. A. DiDonato, “Differential Protein Profiling in Renal-Cell Carcinoma,” Molecular Carcinogenesis, Volume 40, pp. 47-61, 2004.

22. B. Kernighan and D. Ritchie, “The C Programming Language,” 2nd Edition, 1988.

23. Medical Article Records Groundtruth (MARG), National Library of Medicine, http://marg.nlm.nih.gov, 2005.

24. S. Ferilli, F. Leuzzi, F. Rotella, and F. Esposito, “A Run Length Smoothing-Based Algorithm for Non-Manhattan Document Segmentation,” Italy, 2012.

25. A. Asi, R. Saabni and J. El-Sana, “Text Line Segmentation for Gray Scale Historical Document Images,” International Workshop on Historical Document Imaging and Processing (HIP), 2011.

APPENDICES

APPENDIX A

PROGRAM MAKEFILE

# note: recipe lines are tab-indented
CC = cc
CFLAGS = -I/usr/local/include/libpng16 -L/usr/local/lib -lpng16
CFLAGS_FULL = -Weverything -I/usr/local/include/libpng16 -L/usr/local/lib -lpng16

default: all

.PHONY: all sc clean

all: sc test

sc:
	mkdir -p ./bin/
	${CC} ${CFLAGS} -o ./bin/sc ./src/sc.c

test:
	./bin/sc -b 0 -c 0 -d 6 -e 0 ./tst/RightsOfManB-001degree.png ./tst/out_ROMB-001.png
	./bin/sc -b 0 -c 0 -d 6 -e 0 ./tst/RightsOfManB-002degree.png ./tst/out_ROMB-002.png

clean:
	-rm ./bin/*

APPENDIX B

PROGRAM MAIN — SC.C

/**
 * sc.c
 * Masters Thesis Work
 * Christopher Stoll, 2014
 */

/* The angle-bracket header names were lost from the original listing; the
   headers below are assumed, since the code uses printf, exit, and getopt. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include "libpngHelper.c"
#include "libSeamCarve.c"
#include "libResize.c"

#define PROGRAM_NAME "Experiments with Seam Carving"
#define PROGRAM_VERS "0.4"
#define PROGRAM_COPY "Copyright 2014-2015, Christopher Stoll"

static void carve(char *sourceFile, char *resultFile, int forceBrt, int forceClr,
        int forceDir, int forceEdge, int forceGauss, int verbose)
{
    int *imageVector;
    int imageWidth = 0;
    int imageHeight = 0;
    int imageDepth = 0;

    imageVector = readPNGFile(sourceFile, &imageWidth, &imageHeight, &imageDepth, verbose);
    if (!imageVector || !imageWidth || !imageHeight) {
        fprintf(stderr, "Error loading PNG image.\n");
        exit(1);
    }

    int *newImageVector;
    newImageVector = seamCarve(imageVector, imageWidth, imageHeight, imageDepth,
            forceBrt, forceClr, forceDir, forceEdge, forceGauss);
    write_png_file(newImageVector, imageWidth, imageHeight, resultFile);
}

int main(int argc, char const *argv[])
{
    char **argumentVector = (char**)argv;

    char *bvalue = 0;
    char *cvalue = 0;
    char *dvalue = 0;
    char *evalue = 0;
    char *gvalue = 0;
    int forceBrt = 0;
    int forceClr = 0;
    int forceDir = 0;
    int forceEdge = 0;
    int forceGauss = 0;
    int verboseFlag = 0;
    char *sourceFile = 0;
    char *resultFile = 0;

    int c;
    opterr = 0;
    while ((c = getopt(argc, argumentVector, "b:c:d:e:g:v")) != -1) {
        switch (c) {
        case 'b':
            bvalue = optarg;
            forceBrt = (int)bvalue[0] - 48;
            break;
        case 'c':
            cvalue = optarg;
            forceClr = (int)cvalue[0] - 48;
            break;
        case 'd':
            dvalue = optarg;
            forceDir = (int)dvalue[0] - 48;
            break;
        case 'e':
            evalue = optarg;
            forceEdge = (int)evalue[0] - 48;
            break;
        case 'g':
            gvalue = optarg;
            forceGauss = (int)gvalue[0] - 48;
            break;
        case 'v':
            verboseFlag = 1;
            break;
        case '?':
            printf(PROGRAM_NAME " v" PROGRAM_VERS "\n");
            printf(PROGRAM_COPY "\n\n");
            printf("usage: sc [-b 0-8] [-c 0-5] [-d 0-9] [-e 0-8] [-g 0-3] [-v] source_PNG_file result_PNG_file \n");
            printf(" \n");
            printf(" Brightness Calculation Method \n");
            printf(" '-b 0' Average Intensity / Brightness (default) \n");
            printf(" '-b 1' HSV hexcone (Max Channel) \n");
            printf(" '-b 2' Luma luminance - sRGB / BT.709 \n");
            printf(" '-b 3' Luma luminance - NTSC / BT.601 \n");
            printf(" '-b 4' Relative luminance \n");
            printf(" '-b 5' HSP? \n");
            printf(" '-b 6' Euclidian distance (generally poor results) \n");
            printf(" '-b 7' Estimated relative luminance \n");
            printf(" '-b 8' Estimated luma luminance - NTSC / BT.601 \n");
            printf(" \n");
            printf(" Contrast Adjustments \n");
            printf(" '-c 0' none (default) \n");
            printf(" '-c 1' use cosine adjusted brightness \n");
            printf(" '-c 2' use double-pass cosine adjusted brightness \n");
            printf(" '-c 3' use triple-pass cosine adjusted brightness \n");
            printf(" '-c 4' use quadruple-pass cosine adjusted brightness \n");
            printf(" '-c 5' use Otsu binarization (before Gaussian blurring) \n");
            printf(" \n");
            printf(" Force Seam Direction (or other output) \n");
            printf(" '-d 0' automatically selected (default) \n");
            printf(" '-d 1' force horizontal direction seams \n");
            printf(" '-d 2' force vertical direction seams \n");
            printf(" '-d 3' force both direction seams \n");
            printf(" '-d 4' output brightness values \n");
            printf(" '-d 5' output energy values \n");
            printf(" '-d 6' output seam values (horizontal) \n");
            printf(" '-d 7' output seam values (vertical) \n");
            printf(" '-d 8' output seams (horizontal) \n");
            printf(" '-d 9' output seams (vertical) \n");
            printf(" '-d a' output areas (horizontal) \n");
            printf(" '-d b' output areas (vertical) \n");
            printf(" \n");
            printf(" Energy Calculation Method \n");
            printf(" '-e 0' use Difference of Gaussian (default) \n");
            printf(" '-e 1' use Laplacian of Gaussian (sigma=8) \n");
            printf(" '-e 2' use Laplacian of Gaussian (sigma=4) \n");
            printf(" '-e 3' use Laplacian of Gaussian (sigma=2) \n");
            printf(" '-e 4' use Sobel \n");
            printf(" '-e 5' use LoG Simple \n");
            printf(" '-e 6' use Simple Gradient \n");
            printf(" '-e 7' use DoG + Sobel \n");
            printf(" '-e 8' use LoG (sigma=8) AND Sobel \n");
            printf(" \n");
            printf(" Pre-processing Options \n");
            printf(" '-g 0' none (default) \n");
            printf(" '-g 1' pre-Gaussian blur (sigma=2) \n");
            printf(" '-g 2' pre-Gaussian blur (sigma=4) \n");
            printf(" '-g 3' pre-Gaussian blur (sigma=8) \n");
            return 1;
        default:
            fprintf(stderr, "Unexpected argument character code: %c (0x%04x)\n", (char)c, c);
        }
    }

    int index;
    // Look at unnamed arguments to get source and result file names
    for (index = optind; index < argc; index++) {
        if (!sourceFile) {
            sourceFile = (char*)argv[index];
        } else if (!resultFile) {
            resultFile = (char*)argv[index];
        } else {
            fprintf(stderr, "Argument ignored: %s\n", argv[index]);
        }
    }

    // Make sure we have source and result files
    if (!sourceFile) {
        fprintf(stderr, "Required argument missing: source_file\n");
        return 1;
    } else if (!resultFile) {
        fprintf(stderr, "Required argument missing: result_file\n");
        return 1;
    }

    // Go ahead if the source file exists
    if (access(sourceFile, R_OK) != -1) {
        carve(sourceFile, resultFile, forceBrt, forceClr, forceDir, forceEdge,
                forceGauss, verboseFlag);
    } else {
        fprintf(stderr, "Error reading file %s\n", sourceFile);
        return 1;
    }
    return 0;
}

APPENDIX C

SEAM CARVING FUNCTIONS — LIBSEAMCARVE.C

/**
 * libSeamCarve.c
 * Masters Thesis Work
 * Christopher Stoll, 2015 -- v3, major refactoring
 * FD0D 1A69 8AD9 F05A 3052 707A 0D01 AA8F 51B2 E3EA
 */

#ifndef LIBSEAMCARVE_C
#define LIBSEAMCARVE_C

/* The angle-bracket header names were lost from the original listing; the
   headers below are assumed, since the code uses INT_MAX, sqrt, and printf. */
#include <limits.h>
#include <math.h>
#include <stdio.h>

#include "pixel.h"
#include "window.h"
#include "libWrappers.c"
#include "libBinarization.c"
#include "libEnergies.c"
#include "libMinMax.c"

#define SEAM_TRACE_INCREMENT 16
#define THRESHHOLD_SOBEL 96
#define THRESHHOLD_USECOUNT 64
#define PI 3.14159265359
#define DEFAULT_CLIP_AREA_BOUND 1

/*
 * Trace all the seams
 * The least significant pixels will be traced multiple times and have a higher value (whiter)
 * The most significant pixels will not be traced at all and have a value of zero (black)
 */
static void findSeams(struct pixel *imageVector, struct window *imageWindow,
        int direction, int findAreas)
{
    int directionVertical = 0;
    int directionHorizontal = 1;
    if ((direction != directionVertical) && (direction != directionHorizontal)) {
        return;
    }

    int loopBeg = 0; // where the outer loop begins
    int loopEnd = 0; // where the outer loop ends
    int loopInc = 0; // the increment of the outer loop
    int loopInBeg = 0;
    int loopInEnd = 0;
    int loopInInc = 0;
    int seamLength = 0;
    int nextPixelR = 0; // next pixel to the right
    int nextPixelC = 0; // next pixel to the center
    int nextPixelL = 0; // next pixel to the left
    int currentMin = 0; // the minimum of nextPixelR, nextPixelC, and nextPixelL
    int countGoR = 0; // how many times the seam diverged upward
    int countGoL = 0; // how many times the seam diverged downward
    int nextPixelDistR = 0; // memory distance to the next pixel to the right
    int nextPixelDistC = 0; // memory distance to the next pixel to the center
    int nextPixelDistL = 0; // memory distance to the next pixel to the left

    // loop conditions depend upon the direction
    if (direction == directionVertical) {
        loopBeg = imageWindow->lastPixel - 1;
        loopEnd = imageWindow->lastPixel - 1 - imageWindow->xLength;
        loopInc = imageWindow->xStep * -1;
        // also set the next pixel distances
        nextPixelDistC = imageWindow->fullWidth;
        nextPixelDistR = nextPixelDistC - 1;
        nextPixelDistL = nextPixelDistC + 1;
        loopInBeg = imageWindow->yTerminus - 1;
        loopInEnd = imageWindow->yOrigin;
        loopInInc = imageWindow->xStep;
        seamLength = imageWindow->yLength;
    } else {
        loopBeg = imageWindow->firstPixel + imageWindow->xLength - 1;
        loopEnd = imageWindow->lastPixel;
        loopInc = imageWindow->yStep;
        // also set the next pixel distances
        nextPixelDistC = imageWindow->xStep;
        nextPixelDistR = imageWindow->fullWidth + nextPixelDistC;
        nextPixelDistL = (imageWindow->fullWidth - nextPixelDistC) * -1;
        loopInBeg = imageWindow->xTerminus;
        loopInEnd = imageWindow->xOrigin;
        loopInInc = imageWindow->xStep;
        seamLength = imageWindow->xLength;
    }

    // v5 experiments (based upon v2)
    int totalDeviation = 0;
    int totalDeviationL = 0;
    int totalDeviationR = 0;
    int lastTotalDeviationL = 0;
    int lastTotalDeviationR = 0;
    int seamPointer = 0;
    int seamBegan = 0;
    int *lastSeam = (int*)xmalloc((unsigned long)seamLength * sizeof(int));
    int *currentSeam = (int*)xmalloc((unsigned long)seamLength * sizeof(int));
    int deviationMin = imageWindow->fullWidth / 25;
    int deviationTol = imageWindow->fullWidth / 200;
    int clipAreaBound = DEFAULT_CLIP_AREA_BOUND;
    int straightDone = 0;
    int straightStart = 0;

    int k = loopBeg;
    int loopFinished = 0;
    int minValueLocation = 0;
    // for every pixel in the right-most or bottom-most column of the image
    while (!loopFinished) {
        // process seams with the lowest weights
        // start from the left-most column
        minValueLocation = k;
        countGoR = 0;
        countGoL = 0;

        // v5 experiments (based upon v2)
        if (findAreas) {
            totalDeviation = 0;
            totalDeviationL = 0;
            totalDeviationR = 0;
            seamPointer = 0;
            currentSeam[seamPointer] = minValueLocation;
        }

        // move right-to-left or bottom-to-top across/up the image
        for (int j = loopInBeg; j > loopInEnd; j -= loopInInc) {
            // THIS IS THE CRUCIAL PART
            if (direction == directionVertical) {
                if (imageVector[minValueLocation].usecountV < (255 - SEAM_TRACE_INCREMENT)) {
                    imageVector[minValueLocation].usecountV += SEAM_TRACE_INCREMENT;
                }
            } else {
                if (imageVector[minValueLocation].usecountH < (255 - SEAM_TRACE_INCREMENT)) {
                    imageVector[minValueLocation].usecountH += SEAM_TRACE_INCREMENT;
                }
            }

            // get the possible next pixels
            if ((minValueLocation - nextPixelDistR) > 0) {
                if (direction == directionVertical) {
                    nextPixelR = imageVector[minValueLocation - nextPixelDistR].seamvalV;
                } else {
                    nextPixelR = imageVector[minValueLocation - nextPixelDistR].seamvalH;
                }
            } else {
                nextPixelR = INT_MAX;
            }

            if (direction == directionVertical) {
                nextPixelC = imageVector[minValueLocation - nextPixelDistC].seamvalV;
            } else {
                nextPixelC = imageVector[minValueLocation - nextPixelDistC].seamvalH;
            }

            if ((minValueLocation - nextPixelDistL) < loopEnd) {
                if (direction == directionVertical) {
                    nextPixelL = imageVector[minValueLocation - nextPixelDistL].seamvalV;
                } else {
                    nextPixelL = imageVector[minValueLocation - nextPixelDistL].seamvalH;
                }
            } else {
                nextPixelL = INT_MAX;
            }

            // use the minimum of the possible pixels
            currentMin = min3(nextPixelR, nextPixelC, nextPixelL);

            // attempt to make the seam go back down if it was forced up and vice versa
            // the goal is to end on the same line which the seam started on, this
            // minimizes crazy diagonal seams which cut out important information
            if (countGoR == countGoL) {
                if (currentMin == nextPixelC) {
                    minValueLocation -= nextPixelDistC;
                } else if (currentMin == nextPixelR) {
                    minValueLocation -= nextPixelDistR;
                    ++countGoR;
                    ++totalDeviation;
                } else if (currentMin == nextPixelL) {
                    minValueLocation -= nextPixelDistL;
                    ++countGoL;
                    --totalDeviation;
                }
            } else if (countGoR > countGoL) {
                if (currentMin == nextPixelL) {
                    minValueLocation -= nextPixelDistL;
                    ++countGoL;
                    --totalDeviation;
                } else if (currentMin == nextPixelC) {
                    minValueLocation -= nextPixelDistC;
                } else if (currentMin == nextPixelR) {
                    minValueLocation -= nextPixelDistR;
                    ++countGoR;
                    ++totalDeviation;
                }
            } else if (countGoR < countGoL) {
                if (currentMin == nextPixelR) {
                    minValueLocation -= nextPixelDistR;
                    ++countGoR;
                    ++totalDeviation;
                } else if (currentMin == nextPixelC) {
                    minValueLocation -= nextPixelDistC;
                } else if (currentMin == nextPixelL) {
                    minValueLocation -= nextPixelDistL;
                    ++countGoL;
                    --totalDeviation;
                }
            }

            // v5 experiments (based upon v2)
            if (findAreas) {
                if (totalDeviation > 0) {
                    ++totalDeviationR;
                } else if (totalDeviation < 0) {
                    ++totalDeviationL;
                }

                ++seamPointer;
                currentSeam[seamPointer] = minValueLocation;
            }
        }

        // v5 experiments (based upon v2)
        if (findAreas) {
            // only consider seams with persistent deviations
            if (totalDeviationL || totalDeviationR) {
                // persistently going left (bottom of an area)
                if (totalDeviationL > totalDeviationR) {
                    // we already have the top of an area
                    if (seamBegan) {
                        // present deviation (plus tolerance) is less than last deviation amount
                        // and the last deviation is greater than the minimum required deviation
                        if (((totalDeviationL + deviationTol) < lastTotalDeviationL) &&
                                (lastTotalDeviationL > deviationMin)) {
                            seamBegan = 0;

                            straightDone = 0;
                            for (int i = 0; i < seamLength; ++i) {
                                if (direction == directionVertical) {
                                    if (clipAreaBound) {
                                        if ((i > 0) &&
                                                ((currentSeam[i-1] - currentSeam[i]) != imageWindow->yStep)) {
                                            straightDone = 1;
                                            straightStart = 0;
                                        } else {
                                            if (straightDone && !straightStart) {
                                                straightStart = i;
                                            }
                                        }
                                    } else {
                                        straightDone = 1;
                                    }

                                    if (straightDone) {
                                        imageVector[currentSeam[i]].areaBoundaryV = 3;
                                    }
                                } else {
                                    if (clipAreaBound) {
                                        if ((i > 0) &&
                                                ((currentSeam[i-1] - currentSeam[i]) != imageWindow->xStep)) {
                                            straightDone = 1;
                                            straightStart = 0;
                                        } else {
                                            if (straightDone && !straightStart) {
                                                straightStart = i;
                                            }
                                        }
                                    } else {
                                        straightDone = 1;
                                    }

                                    if (straightDone) {
                                        imageVector[currentSeam[i]].areaBoundaryH = 3;
                                    }
                                }
                            }

                            // remove final straight edge
                            if (clipAreaBound && straightStart) {
                                for (int i = straightStart; i < seamLength; ++i) {
                                    if (direction == directionVertical) {
                                        imageVector[currentSeam[i]].areaBoundaryV = 0;
                                    } else {
                                        imageVector[currentSeam[i]].areaBoundaryH = 0;
                                    }
                                }
                            }
                        }
                    // we don't have the top of an area yet
                    } else {
                        // present deviation (plus tolerance) is less than last deviation amount
                        // and the last deviation is greater than the minimum required deviation
                        if (((totalDeviationR + deviationTol) < lastTotalDeviationR) &&
                                (lastTotalDeviationR > deviationMin)) {
                            seamBegan = 1;

                            straightDone = 0;
                            for (int i = 0; i < seamLength; ++i) {
                                if (direction == directionVertical) {
                                    if (clipAreaBound) {
                                        if ((i > 0) &&
                                                ((currentSeam[i-1] - currentSeam[i]) != imageWindow->yStep)) {
                                            straightDone = 1;
                                            straightStart = 0;
                                        } else {
                                            if (straightDone && !straightStart) {
                                                straightStart = i;
                                            }
                                        }
                                    } else {
                                        straightDone = 1;
                                    }

                                    if (straightDone) {
                                        imageVector[lastSeam[i]].areaBoundaryV = 1;
                                    }
                                } else {
                                    if (clipAreaBound) {
                                        if ((i > 0) &&
                                                ((currentSeam[i-1] - currentSeam[i]) != imageWindow->xStep)) {
                                            straightDone = 1;
                                            straightStart = 0;
                                        } else {
                                            if (straightDone && !straightStart) {
                                                straightStart = i;
                                            }
                                        }
                                    } else {
                                        straightDone = 1;
                                    }

                                    if (straightDone) {
                                        imageVector[lastSeam[i]].areaBoundaryH = 1;
                                    }
                                }
                            }

                            // remove final straight edge
                            if (clipAreaBound && straightStart) {
                                for (int i = straightStart; i < seamLength; ++i) {
                                    if (direction == directionVertical) {
                                        imageVector[lastSeam[i]].areaBoundaryV = 0;
                                    } else {
                                        imageVector[lastSeam[i]].areaBoundaryH = 0;
                                    }
                                }
                            }
                        }
                    }
                // persistently going right (top of an area)
                // totalDeviationL <= totalDeviationR
                } else {
                    // only if a top has not yet been found (without a matching bottom)
                    if (!seamBegan) {
                        // present deviation (plus tolerance) is less than last deviation amount
                        // and the last deviation is greater than the minimum required deviation
                        if (((totalDeviationR + deviationTol) < lastTotalDeviationR) &&
                                (lastTotalDeviationR > deviationMin)) {
                            seamBegan = 1;

                            straightDone = 0;
                            for (int i = 0; i < seamLength; ++i) {
                                if (direction == directionVertical) {
                                    if (clipAreaBound) {
                                        if ((i > 0) &&
                                                ((currentSeam[i-1] - currentSeam[i]) != imageWindow->yStep)) {
                                            straightDone = 1;
                                            straightStart = 0;
                                        } else {
                                            if (straightDone && !straightStart) {
                                                straightStart = i;
                                            }
                                        }
                                    } else {
                                        straightDone = 1;
                                    }

                                    if (straightDone) {
                                        imageVector[lastSeam[i]].areaBoundaryV = 1;
                                    }
                                } else {
                                    if (clipAreaBound) {
                                        if ((i > 0) &&
                                                ((currentSeam[i-1] - currentSeam[i]) != imageWindow->xStep)) {
                                            straightDone = 1;
                                            straightStart = 0;
                                        } else {
                                            if (straightDone && !straightStart) {
                                                straightStart = i;
                                            }
                                        }
                                    } else {
                                        straightDone = 1;
                                    }

                                    if (straightDone) {
                                        imageVector[lastSeam[i]].areaBoundaryH = 1;
                                    }
                                }
                            }

                            // remove final straight edge
                            if (clipAreaBound && straightStart) {
                                for (int i = straightStart; i < seamLength; ++i) {
                                    if (direction == directionVertical) {
                                        imageVector[lastSeam[i]].areaBoundaryV = 0;
                                    } else {
                                        imageVector[lastSeam[i]].areaBoundaryH = 0;
                                    }
                                }
                            }
                        }
                    }
                }
            }
            // and you thought LISP was bad

            lastTotalDeviationL = totalDeviationL;
            lastTotalDeviationR = totalDeviationR;
            for (int i = 0; i < seamLength; ++i) {
                lastSeam[i] = currentSeam[i];
            }
        }

        k += loopInc;
        if (direction == directionVertical) {
            if (k <= loopEnd) {
                loopFinished = 1;
            }
        } else {
            if (k >= loopEnd) {
                loopFinished = 1;
            }
        }
    }
    free(lastSeam);
    free(currentSeam);
}

static void setPixelPathVertical(struct pixel *imageVector, struct window *imageWindow,
        int currentPixel, int currentCol)
{
    int pixelAbove = 0;
    int aboveL = 0;
    int aboveC = 0;
    int aboveR = 0;
    int newValue = 0;

    pixelAbove = currentPixel - imageWindow->yStep;
    // avoid falling off the left end
    if (currentCol > 0) {
        // avoid falling off the right end
        if (currentCol < imageWindow->xLength) {
            aboveL = imageVector[pixelAbove - imageWindow->xStep].seamvalV;
            aboveC = imageVector[pixelAbove].seamvalV;
            aboveR = imageVector[pixelAbove + imageWindow->xStep].seamvalV;
            newValue = min3(aboveL, aboveC, aboveR);
        } else {
            aboveL = imageVector[pixelAbove - imageWindow->xStep].seamvalV;
            aboveC = imageVector[pixelAbove].seamvalV;
            aboveR = INT_MAX;
            newValue = min(aboveL, aboveC);
        }
    } else {
        aboveL = INT_MAX;
        aboveC = imageVector[pixelAbove].seamvalV;
        aboveR = imageVector[pixelAbove + imageWindow->xStep].seamvalV;
        newValue = min(aboveC, aboveR);
    }
    imageVector[currentPixel].seamvalV += newValue;
    //
    // This (below) is kinda a big deal
    //
    if (imageVector[currentPixel].seamvalV > 0) {
        imageVector[currentPixel].seamvalV -= 1;
    }
}

static int fillSeamMatrixVertical(struct pixel *imageVector, struct window *imageWindow)
{
    int result = 0;
    int currentPixel = 0;
    // do not process the first row, start with j=1
    for (int y = (imageWindow->yOrigin + 1);
            y < imageWindow->yTerminus; y += imageWindow->xStep) {
        for (int x = imageWindow->xOrigin;
                x < imageWindow->xTerminus; x += imageWindow->xStep) {
            currentPixel = (y * imageWindow->fullWidth) + x;
            setPixelPathVertical(imageVector, imageWindow, currentPixel, x);

            if (imageVector[currentPixel].seamvalV != 0) {
                ++result;
            }
        }
    }
    return result;
}

static void findSeamsVertical(struct pixel *imageVector,
        struct window *imageWindow, int findAreas)
{
    findSeams(imageVector, imageWindow, 0, findAreas);
}

static void setPixelPathHorizontal(struct pixel *imageVector,
        struct window *imageWindow, int currentPixel, int currentCol)
{
    // avoid falling off the right
    if (currentCol < imageWindow->xLength) {
        int pixelLeft = 0;
        int leftT = 0;
        int leftM = 0;
        int leftB = 0;
        int newValue = 0;

        pixelLeft = currentPixel - imageWindow->xStep;
        // avoid falling off the top
        if (currentPixel > imageWindow->xLength) {
            // avoid falling off the bottom
            if (currentPixel < (imageWindow->pixelCount - imageWindow->xLength)) {
                leftT = imageVector[pixelLeft - imageWindow->yStep].seamvalH;
                leftM = imageVector[pixelLeft].seamvalH;
                leftB = imageVector[pixelLeft + imageWindow->yStep].seamvalH;
                newValue = min3(leftT, leftM, leftB);
            } else {
                leftT = imageVector[pixelLeft - imageWindow->yStep].seamvalH;
                leftM = imageVector[pixelLeft].seamvalH;
                leftB = INT_MAX;
                newValue = min(leftT, leftM);
            }
        } else {
            leftT = INT_MAX;
            leftM = imageVector[pixelLeft].seamvalH;
            leftB = imageVector[pixelLeft + imageWindow->yStep].seamvalH;
            newValue = min(leftM, leftB);
        }
        imageVector[currentPixel].seamvalH += newValue;
        //
        // This (below) is kinda a big deal
        //
        if (imageVector[currentPixel].seamvalH > 0) {
            imageVector[currentPixel].seamvalH -= 1;
        }
    }
}

static int fillSeamMatrixHorizontal(struct pixel *imageVector,
        struct window *imageWindow)
{
    int result = 0;
    int currentPixel = 0;
    // do not process the first row, start with j=1
    // must be in reverse order from vertical seam,
    // calculate columns as we move across (top down, left to right)
    for (int x = imageWindow->xOrigin;
            x < imageWindow->xTerminus; x += imageWindow->xStep) {
        for (int y = (imageWindow->yOrigin + 1); y < imageWindow->yTerminus; y += 1) {
            currentPixel = (y * imageWindow->fullWidth) + x;
            setPixelPathHorizontal(imageVector, imageWindow, currentPixel, x);

            if (imageVector[currentPixel].seamvalH != 0) {
                ++result;
            }
        }
    }
    return result;
}

static void findSeamsHorizontal(struct pixel *imageVector, struct window *imageWindow,
        int findAreas)
{
    findSeams(imageVector, imageWindow, 1, findAreas);
}

/*
 * The main function
 */
static int *seamCarve(int *imageVector, int imageWidth, int imageHeight, int imageDepth,
        int brightnessMode, int contrastMode, int forceDirection, int forceEdge, int preGauss)
{
    struct pixel *workingImage =
        (struct pixel*)xmalloc((unsigned long)imageWidth *
        (unsigned long)imageHeight * sizeof(struct pixel));
    int *resultImage =
        (int*)xmalloc((unsigned long)imageWidth *
        (unsigned long)imageHeight * (unsigned long)imageDepth * sizeof(int));

    int invertOutput = 0;

    int inputPixel = 0;
    int outputPixel = 0;
    int currentPixel = 0;
    int currentBrightness = 0;
    // fill initial data structures
    for (int j = 0; j < imageHeight; ++j) {
        for (int i = 0; i < imageWidth; ++i) {
            currentPixel = (j * imageWidth) + i;
            inputPixel = currentPixel * imageDepth;

            struct pixel newPixel;
            newPixel.r = imageVector[inputPixel];
            newPixel.g = imageVector[inputPixel+1];
            newPixel.b = imageVector[inputPixel+2];
            newPixel.a = imageVector[inputPixel+3];

            if (brightnessMode == 0) {
                // Average Intensity / Brightness
                newPixel.bright =
                    ((imageVector[inputPixel] + imageVector[inputPixel+1] +
                    imageVector[inputPixel+2]) / 3);
            } else if (brightnessMode == 1) {
                // HSV hexcone
                newPixel.bright =
                    max3(imageVector[inputPixel], imageVector[inputPixel+1],
                    imageVector[inputPixel+2]);
            } else if (brightnessMode == 2) {
                // Luma luminance -- sRGB / BT.709
                newPixel.bright =
                    (imageVector[inputPixel] * 0.21) +
                    (imageVector[inputPixel+1] * 0.72) +
                    (imageVector[inputPixel+2] * 0.07);
            } else if (brightnessMode == 3) {
                // Luma luminance -- NTSC / BT.601 (Digital CCIR601)
                newPixel.bright =
                    (imageVector[inputPixel] * 0.299) +
                    (imageVector[inputPixel+1] * 0.587) +
                    (imageVector[inputPixel+2] * 0.114);
            } else if (brightnessMode == 4) {
                // Relative luminance (Photometric/digital ITU-R)
                newPixel.bright =
                    (imageVector[inputPixel] * 0.2126) +
                    (imageVector[inputPixel+1] * 0.7152) +
                    (imageVector[inputPixel+2] * 0.0722);
            } else if (brightnessMode == 5) {
                // HSP?
                newPixel.bright = sqrt(
                    (imageVector[inputPixel] * imageVector[inputPixel] * 0.299) +
                    (imageVector[inputPixel+1] * imageVector[inputPixel+1] * 0.587) +
                    (imageVector[inputPixel+2] * imageVector[inputPixel+2] * 0.114));
            } else if (brightnessMode == 6) {
                // Euclidean distance
                newPixel.bright = pow(
                    (imageVector[inputPixel] * imageVector[inputPixel]) +
                    (imageVector[inputPixel+1] * imageVector[inputPixel+1]) +
                    (imageVector[inputPixel+2] * imageVector[inputPixel+2]), 0.33333);
            } else if (brightnessMode == 7) {
                // Fast ITU-R
                newPixel.bright =
                    (imageVector[inputPixel] * 0.33) +
                    (imageVector[inputPixel+1] * 0.5) +
                    (imageVector[inputPixel+2] * 0.16);
            } else if (brightnessMode == 8) {
                // Fast BT.601
                newPixel.bright =
                    (imageVector[inputPixel] * 0.375) +
                    (imageVector[inputPixel+1] * 0.5) +
                    (imageVector[inputPixel+2] * 0.125);
            }

            newPixel.energy = 0;
            newPixel.seamvalH = 0;
            newPixel.seamvalV = 0;
            newPixel.usecountH = 0;
            newPixel.usecountV = 0;
            newPixel.areaBoundaryH = 0;
            newPixel.areaBoundaryV = 0;
            workingImage[currentPixel] = newPixel;

            resultImage[inputPixel] = 0;
        }
    }

    // binarize the image as/if requested
    if (contrastMode == 1) {
        // not really binarized, but brightness passed through a cosine function
        // this increases differentiation between light and dark pixels
        int currentBrightness = 0;
        double currentRadians = 0;
        for (int j = 0; j < imageHeight; ++j) {
            for (int i = 0; i < imageWidth; ++i) {
                currentPixel = (j * imageWidth) + i;

                currentBrightness = workingImage[currentPixel].bright;
                currentRadians = ((double)currentBrightness / 255.0) * PI;
                currentBrightness = (int)(((1.0 - cos(currentRadians)) / 2.0) * 255.0);
                workingImage[currentPixel].bright = currentBrightness;
            }
        }
    } else if (contrastMode == 2) {
        int currentBrightness = 0;
        double currentRadians = 0;
        for (int j = 0; j < imageHeight; ++j) {
            for (int i = 0; i < imageWidth; ++i) {
                currentPixel = (j * imageWidth) + i;

                currentBrightness = workingImage[currentPixel].bright;
                currentRadians = ((double)currentBrightness / 255.0) * PI;
                currentBrightness = (int)(((1.0 - cos(currentRadians)) / 2.0) * 255.0);

                currentRadians = ((double)currentBrightness / 255.0) * PI;
                currentBrightness = (int)(((1.0 - cos(currentRadians)) / 2.0) * 255.0);

                workingImage[currentPixel].bright = currentBrightness;
            }
        }
    } else if (contrastMode == 3) {
        int currentBrightness = 0;
        double currentRadians = 0;
        for (int j = 0; j < imageHeight; ++j) {
            for (int i = 0; i < imageWidth; ++i) {
                currentPixel = (j * imageWidth) + i;

                currentBrightness = workingImage[currentPixel].bright;
                currentRadians = ((double)currentBrightness / 255.0) * PI;
                currentBrightness = (int)(((1.0 - cos(currentRadians)) / 2.0) * 255.0);

                currentRadians = ((double)currentBrightness / 255.0) * PI;
                currentBrightness = (int)(((1.0 - cos(currentRadians)) / 2.0) * 255.0);

                currentRadians = ((double)currentBrightness / 255.0) * PI;
                currentBrightness = (int)(((1.0 - cos(currentRadians)) / 2.0) * 255.0);

                workingImage[currentPixel].bright = currentBrightness;
            }
        }
    } else if (contrastMode == 4) {
        int currentBrightness = 0;
        double currentRadians = 0;
        for (int j = 0; j < imageHeight; ++j) {
            for (int i = 0; i < imageWidth; ++i) {
                currentPixel = (j * imageWidth) + i;

                currentBrightness = workingImage[currentPixel].bright;
                currentRadians = ((double)currentBrightness / 255.0) * PI;
                currentBrightness = (int)(((1.0 - cos(currentRadians)) / 2.0) * 255.0);

                currentRadians = ((double)currentBrightness / 255.0) * PI;
                currentBrightness = (int)(((1.0 - cos(currentRadians)) / 2.0) * 255.0);

                currentRadians = ((double)currentBrightness / 255.0) * PI;
                currentBrightness = (int)(((1.0 - cos(currentRadians)) / 2.0) * 255.0);

                currentRadians = ((double)currentBrightness / 255.0) * PI;
                currentBrightness = (int)(((1.0 - cos(currentRadians)) / 2.0) * 255.0);

                workingImage[currentPixel].bright = currentBrightness;
            }
        }
    } else if (contrastMode == 5) {
        // Do you think anyone would ever read code in the appendix of a thesis?
        int bins[256] = {0}; // histogram bins must start at zero
        int currentBrightness = 0;

        // get histogram
        for (int j = 0; j < imageHeight; ++j) {
            for (int i = 0; i < imageWidth; ++i) {
                currentPixel = (j * imageWidth) + i;
                currentBrightness = workingImage[currentPixel].bright;
                bins[currentBrightness] += 1;
            }
        }

        int threshold = otsuBinarization(bins, (imageWidth * imageHeight));

        // apply threshold
        for (int j = 0; j < imageHeight; ++j) {
            for (int i = 0; i < imageWidth; ++i) {
                currentPixel = (j * imageWidth) + i;
                currentBrightness = workingImage[currentPixel].bright;
                if (currentBrightness > threshold) {
                    workingImage[currentPixel].bright = 255;
                } else {
                    workingImage[currentPixel].bright = 0;
                }
            }
        }
    }

    int gaussA = 0;
    int gaussB = 0;
    int tmpDoG = 0;
    int tmpSobel = 0;

    // get energy values using the prescribed method
    for (int j = 0; j < imageHeight; ++j) {
        for (int i = 0; i < imageWidth; ++i) {
            currentPixel = (j * imageWidth) + i;

            if (preGauss == 1) {
                workingImage[currentPixel].bright =
                    getPixelGaussian(workingImage, imageWidth, imageHeight, 1,
                    currentPixel, 20);
            } else if (preGauss == 2) {
                workingImage[currentPixel].bright =
                    getPixelGaussian(workingImage, imageWidth, imageHeight, 1,
                    currentPixel, 40);
            } else if (preGauss == 3) {
                workingImage[currentPixel].bright =
                    getPixelGaussian(workingImage, imageWidth, imageHeight, 1,
                    currentPixel, 80);
            }

            if (forceEdge == 0) {
                gaussA = getPixelGaussian(workingImage, imageWidth, imageHeight, 1,
                    currentPixel, 14);
                gaussB = getPixelGaussian(workingImage, imageWidth, imageHeight, 1,
                    currentPixel, 16);
                workingImage[currentPixel].energy = getPixelEnergyDoG(gaussA, gaussB);
            } else if (forceEdge == 1) {
                workingImage[currentPixel].bright =
                    getPixelGaussian(workingImage, imageWidth, imageHeight, 1,
                    currentPixel, 80);
                workingImage[currentPixel].energy = sqrt(
                    getPixelEnergyLaplacian(workingImage, imageWidth, imageHeight,
                    currentPixel));
            } else if (forceEdge == 2) {
                workingImage[currentPixel].bright =
                    getPixelGaussian(workingImage, imageWidth, imageHeight, 1,
                    currentPixel, 40);
                workingImage[currentPixel].energy = sqrt(
                    getPixelEnergyLaplacian(workingImage,
                    imageWidth, imageHeight, currentPixel));
            } else if (forceEdge == 3) {
                workingImage[currentPixel].bright =
                    getPixelGaussian(workingImage, imageWidth, imageHeight, 1,
                    currentPixel, 20);
                workingImage[currentPixel].energy = sqrt(
                    getPixelEnergyLaplacian(workingImage,
                    imageWidth, imageHeight, currentPixel));
            } else if (forceEdge == 4) {
                workingImage[currentPixel].energy =
                    getPixelEnergySobel(workingImage, imageWidth, imageHeight, currentPixel);
            } else if (forceEdge == 5) {
                workingImage[currentPixel].energy =
                    getPixelGaussian(workingImage, imageWidth, imageHeight, 1,
                    currentPixel, 9999);
            } else if (forceEdge == 6) {
                workingImage[currentPixel].energy =
                    getPixelEnergySimple(workingImage, imageWidth, imageHeight,
                    currentPixel, 1);
            } else if (forceEdge == 7) {
                gaussA = getPixelGaussian(workingImage, imageWidth, imageHeight, 1,
                    currentPixel, 14);
                gaussB = getPixelGaussian(workingImage, imageWidth, imageHeight, 1,
                    currentPixel, 16);

                tmpDoG = getPixelEnergyDoG(gaussA, gaussB);
                tmpSobel = getPixelEnergySobel(workingImage, imageWidth,
                    imageHeight, currentPixel);

                workingImage[currentPixel].energy = (tmpDoG + ((tmpSobel / 80) * 20)) / 2;
            } else if (forceEdge == 8) {
                workingImage[currentPixel].bright =
                    getPixelGaussian(workingImage, imageWidth, imageHeight, 1,
                    currentPixel, 80);
                tmpDoG = sqrt(
                    getPixelEnergyLaplacian(workingImage, imageWidth,
                    imageHeight, currentPixel));
                tmpSobel = getPixelEnergySobel(workingImage, imageWidth,
                    imageHeight, currentPixel);

                if (tmpDoG && tmpSobel) {
                    workingImage[currentPixel].energy = (tmpDoG + (tmpSobel / 24));
                } else {
                    workingImage[currentPixel].energy = 0;
                }
            }
            workingImage[currentPixel].seamvalH = workingImage[currentPixel].energy;
            workingImage[currentPixel].seamvalV = workingImage[currentPixel].energy;
        }
    }

    /**
     * Any Artificial Intelligence (AI) should know Kant and the Buddha, at least
     * intuitively, at a minimum. AI should have a moral philosophy or ethic
     */
    struct window *currentWindow =
        newWindow(0, 0, imageWidth, imageHeight, imageWidth, imageHeight);

    // find seams in the prescribed direction
    int resultDirection = forceDirection;
    if (forceDirection == 0) {
        int horizontalSeamCost = fillSeamMatrixHorizontal(workingImage, currentWindow);
        int verticalSeamCost = fillSeamMatrixVertical(workingImage, currentWindow);

        findSeamsHorizontal(workingImage, currentWindow, 0);
        findSeamsVertical(workingImage, currentWindow, 0);

        if (horizontalSeamCost < verticalSeamCost) {
            printf("Horizontal \n");
            resultDirection = 1;
        } else {
            printf("Vertical \n");
            resultDirection = 2;
        }
    } else if ((forceDirection == 1) || (forceDirection == 6) || (forceDirection == 8) ||
            (forceDirection == 49)) {
        fillSeamMatrixHorizontal(workingImage, currentWindow);
        if (forceDirection == 49) {
            findSeamsHorizontal(workingImage, currentWindow, 1);
        } else {
            findSeamsHorizontal(workingImage, currentWindow, 0);
        }
    } else if ((forceDirection == 2) || (forceDirection == 7) || (forceDirection == 9) ||
            (forceDirection == 50)) {
        fillSeamMatrixVertical(workingImage, currentWindow);
        if (forceDirection == 50) {
            findSeamsVertical(workingImage, currentWindow, 1);
        } else {
            findSeamsVertical(workingImage, currentWindow, 0);
        }
    } else if ((forceDirection == 4) || (forceDirection == 5)) {
        // pass
    } else {
        fillSeamMatrixHorizontal(workingImage, currentWindow);
        fillSeamMatrixVertical(workingImage, currentWindow);
        findSeamsHorizontal(workingImage, currentWindow, 0);
        findSeamsVertical(workingImage, currentWindow, 0);
    }

    // prepare results for output
    if (resultDirection == 1) {
        for (int j = 0; j < imageHeight; ++j) {
            for (int i = 0; i < imageWidth; ++i) {
                currentPixel = (j * imageWidth) + i;
                outputPixel = currentPixel * imageDepth;

                if (workingImage[currentPixel].usecountH > THRESHHOLD_USECOUNT) {
                    resultImage[outputPixel] = 255;
                    resultImage[outputPixel+1] = 0;
                    resultImage[outputPixel+2] = 0;
                    resultImage[outputPixel+3] = 255;
                } else {
                    resultImage[outputPixel] = workingImage[currentPixel].r / 2;
                    resultImage[outputPixel+1] = workingImage[currentPixel].g / 2;
                    resultImage[outputPixel+2] = workingImage[currentPixel].b / 2;
                    resultImage[outputPixel+3] = 255;
                }
            }
        }
    } else if (resultDirection == 2) {
        for (int j = 0; j < imageHeight; ++j) {
            for (int i = 0; i < imageWidth; ++i) {
                currentPixel = (j * imageWidth) + i;
                outputPixel = currentPixel * imageDepth;

                if (workingImage[currentPixel].usecountV > THRESHHOLD_USECOUNT) {
                    resultImage[outputPixel] = 255;
                    resultImage[outputPixel+1] = 0;
                    resultImage[outputPixel+2] = 0;
                    resultImage[outputPixel+3] = 255;
                } else {
                    resultImage[outputPixel] = workingImage[currentPixel].r / 2;
                    resultImage[outputPixel+1] = workingImage[currentPixel].g / 2;
                    resultImage[outputPixel+2] = workingImage[currentPixel].b / 2;
                    resultImage[outputPixel+3] = 255;
                }
            }
        }
    } else if (resultDirection == 3) {
        int currentUseCount = 0;
        for (int j = 0; j < imageHeight; ++j) {
            for (int i = 0; i < imageWidth; ++i) {
                currentPixel = (j * imageWidth) + i;
                outputPixel = currentPixel * imageDepth;

                currentUseCount =
                    workingImage[currentPixel].usecountH +
                    workingImage[currentPixel].usecountV;
                if (currentUseCount > THRESHHOLD_USECOUNT) {
                    resultImage[outputPixel] = 255;
                    resultImage[outputPixel+1] = 0;
                    resultImage[outputPixel+2] = 0;
                    resultImage[outputPixel+3] = 255;
                } else {
                    resultImage[outputPixel] = workingImage[currentPixel].r / 2;
                    resultImage[outputPixel+1] = workingImage[currentPixel].g / 2;
                    resultImage[outputPixel+2] = workingImage[currentPixel].b / 2;
                    resultImage[outputPixel+3] = 255;
                }
            }
        }
    } else if (resultDirection == 4) {
        int currentUseCount = 0;
        for (int j = 0; j < imageHeight; ++j) {
            for (int i = 0; i < imageWidth; ++i) {
                currentPixel = (j * imageWidth) + i;
                outputPixel = currentPixel * imageDepth;

                resultImage[outputPixel] = min(max(workingImage[currentPixel].bright, 0), 255);
                resultImage[outputPixel+1] = min(max(workingImage[currentPixel].bright, 0), 255);
                resultImage[outputPixel+2] = min(max(workingImage[currentPixel].bright, 0), 255);
                resultImage[outputPixel+3] = 255;
            }
        }
    } else if (resultDirection == 5) {
        int energyScale = 16;
        int currentUseCount = 0;
        for (int j = 0; j < imageHeight; ++j) {
            for (int i = 0; i < imageWidth; ++i) {
                currentPixel = (j * imageWidth) + i;
                outputPixel = currentPixel * imageDepth;

                resultImage[outputPixel] =
                    min(max((workingImage[currentPixel].energy * energyScale), 0), 255);
                resultImage[outputPixel+1] =
                    min(max((workingImage[currentPixel].energy * energyScale), 0), 255);
                resultImage[outputPixel+2] =
                    min(max((workingImage[currentPixel].energy * energyScale), 0), 255);
                resultImage[outputPixel+3] = 255;
            }
        }
    } else if (resultDirection == 6) {
        int seamValueScale = 16;
        int currentUseCount = 0;
        for (int j = 0; j < imageHeight; ++j) {
            for (int i = 0; i < imageWidth; ++i) {
                currentPixel = (j * imageWidth) + i;
                outputPixel = currentPixel * imageDepth;

                currentUseCount = workingImage[currentPixel].usecountH;
                if (currentUseCount > 256) {
                    resultImage[outputPixel] = 255;
                    resultImage[outputPixel+1] = 0;
                    resultImage[outputPixel+2] = 0;
                    resultImage[outputPixel+3] = 255;
                } else {
                    resultImage[outputPixel] =
                        min(max((workingImage[currentPixel].seamvalH * seamValueScale), 0), 255);
                    resultImage[outputPixel+1] =
                        min(max((workingImage[currentPixel].seamvalH * seamValueScale), 0), 255);
                    resultImage[outputPixel+2] =
                        min(max((workingImage[currentPixel].seamvalH * seamValueScale), 0), 255);
                    resultImage[outputPixel+3] = 255;
                }
            }
        }
    } else if (resultDirection == 7) {
        int seamValueScale = 16;
        int currentUseCount = 0;
        for (int j = 0; j < imageHeight; ++j) {
            for (int i = 0; i < imageWidth; ++i) {
                currentPixel = (j * imageWidth) + i;
                outputPixel = currentPixel * imageDepth;

                currentUseCount = workingImage[currentPixel].usecountV;
                if (currentUseCount > 256) {
                    resultImage[outputPixel] = 255;
                    resultImage[outputPixel+1] = 0;
                    resultImage[outputPixel+2] = 0;
                    resultImage[outputPixel+3] = 255;
                } else {
                    resultImage[outputPixel] =
                        min(max((workingImage[currentPixel].seamvalV * seamValueScale), 0), 255);
                    resultImage[outputPixel+1] =
                        min(max((workingImage[currentPixel].seamvalV * seamValueScale), 0), 255);
                    resultImage[outputPixel+2] =
                        min(max((workingImage[currentPixel].seamvalV * seamValueScale), 0), 255);
                    resultImage[outputPixel+3] = 255;
                }
            }
        }
    } else if (resultDirection == 8) {
        int seamValueScale = 4;
        int currentUseCount = 0;

        for (int j = 0; j < imageHeight; ++j) {
            for (int i = 0; i < imageWidth; ++i) {
                currentPixel = (j * imageWidth) + i;
                outputPixel = currentPixel * imageDepth;

                currentUseCount = workingImage[currentPixel].usecountH;
                if (currentUseCount > 256) {
                    resultImage[outputPixel] = 255;
                    resultImage[outputPixel+1] = 0;
                    resultImage[outputPixel+2] = 0;
                    resultImage[outputPixel+3] = 255;
                } else {
                    if (invertOutput) {
                        resultImage[outputPixel] =
                            255 - min(max((workingImage[currentPixel].usecountH), 0), 255);
                        resultImage[outputPixel+1] =
                            255 - min(max((workingImage[currentPixel].usecountH), 0), 255);
                        resultImage[outputPixel+2] =
                            255 - min(max((workingImage[currentPixel].usecountH), 0), 255);
                        resultImage[outputPixel+3] = 255;
                    } else {
                        resultImage[outputPixel] =
                            min(max((workingImage[currentPixel].usecountH), 0), 255);
                        resultImage[outputPixel+1] =
                            min(max((workingImage[currentPixel].usecountH), 0), 255);
                        resultImage[outputPixel+2] =
                            min(max((workingImage[currentPixel].usecountH), 0), 255);
                        resultImage[outputPixel+3] = 255;
                    }
                }
            }
        }
    } else if (resultDirection == 9) {
        int seamValueScale = 4;
        int currentUseCount = 0;

        for (int j = 0; j < imageHeight; ++j) {
            for (int i = 0; i < imageWidth; ++i) {
                currentPixel = (j * imageWidth) + i;
                outputPixel = currentPixel * imageDepth;

                currentUseCount = workingImage[currentPixel].usecountV;
                if (currentUseCount > 256) {
                    resultImage[outputPixel] = 255;
                    resultImage[outputPixel+1] = 0;
                    resultImage[outputPixel+2] = 0;
                    resultImage[outputPixel+3] = 255;
                } else {
                    if (invertOutput) {
                        resultImage[outputPixel] =
                            255 - min(max((workingImage[currentPixel].usecountV), 0), 255);
                        resultImage[outputPixel+1] =
                            255 - min(max((workingImage[currentPixel].usecountV), 0), 255);
                        resultImage[outputPixel+2] =
                            255 - min(max((workingImage[currentPixel].usecountV), 0), 255);
                        resultImage[outputPixel+3] = 255;
                    } else {
                        resultImage[outputPixel] =
                            min(max((workingImage[currentPixel].usecountV), 0), 255);
                        resultImage[outputPixel+1] =
                            min(max((workingImage[currentPixel].usecountV), 0), 255);
                        resultImage[outputPixel+2] =
                            min(max((workingImage[currentPixel].usecountV), 0), 255);
                        resultImage[outputPixel+3] = 255;
                    }
                }
            }
        }
    } else if (resultDirection == 49) {
        int seamValueScale = 4;
        int currentUseCount = 0;

        for (int j = 0; j < imageHeight; ++j) {
            for (int i = 0; i < imageWidth; ++i) {
                currentPixel = (j * imageWidth) + i;
                outputPixel = currentPixel * imageDepth;

                if (workingImage[currentPixel].areaBoundaryH == 1) {
                    resultImage[outputPixel] = 255;
                    resultImage[outputPixel+1] = 0;
                    resultImage[outputPixel+2] = 0;
                    resultImage[outputPixel+3] = 255;
                } else if (workingImage[currentPixel].areaBoundaryH == 2) {
                    resultImage[outputPixel] = 0;
                    resultImage[outputPixel+1] = 255;
                    resultImage[outputPixel+2] = 0;
                    resultImage[outputPixel+3] = 255;
                } else if (workingImage[currentPixel].areaBoundaryH == 3) {
                    resultImage[outputPixel] = 0;
                    resultImage[outputPixel+1] = 0;
                    resultImage[outputPixel+2] = 255;
                    resultImage[outputPixel+3] = 255;
                } else {
                    resultImage[outputPixel] = workingImage[currentPixel].r;
                    resultImage[outputPixel+1] = workingImage[currentPixel].g;
                    resultImage[outputPixel+2] = workingImage[currentPixel].b;
                    resultImage[outputPixel+3] = 255;
                }
            }
        }
    } else if (resultDirection == 50) {
        int seamValueScale = 4;
        int currentUseCount = 0;

        for (int j = 0; j < imageHeight; ++j) {
            for (int i = 0; i < imageWidth; ++i) {
                currentPixel = (j * imageWidth) + i;
                outputPixel = currentPixel * imageDepth;

                if (workingImage[currentPixel].areaBoundaryV == 1) {
                    resultImage[outputPixel] = 255;
                    resultImage[outputPixel+1] = 0;
                    resultImage[outputPixel+2] = 0;
                    resultImage[outputPixel+3] = 255;
                } else if (workingImage[currentPixel].areaBoundaryV == 2) {
                    resultImage[outputPixel] = 0;
                    resultImage[outputPixel+1] = 255;
                    resultImage[outputPixel+2] = 0;
                    resultImage[outputPixel+3] = 255;
                } else if (workingImage[currentPixel].areaBoundaryV == 3) {
                    resultImage[outputPixel] = 0;
                    resultImage[outputPixel+1] = 0;
                    resultImage[outputPixel+2] = 255;
                    resultImage[outputPixel+3] = 255;
                } else {
                    resultImage[outputPixel] = workingImage[currentPixel].r;
                    resultImage[outputPixel+1] = workingImage[currentPixel].g;
                    resultImage[outputPixel+2] = workingImage[currentPixel].b;
                    resultImage[outputPixel+3] = 255;
                }
            }
        }
    }
    return resultImage;
}

#endif

APPENDIX D

PNG IMPORT FUNCTIONS — LIBPNGHELPER.C

/**
 * libpngHelper.c
 * Masters Thesis Work
 * Christopher Stoll, 2014
 */
#ifndef LIBPNGHELPER_C
#define LIBPNGHELPER_C

/* The angle-bracket header names were lost from the original listing; the
   headers below are assumed from what the code requires (stdio, stdlib,
   setjmp, and libpng). */
#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>
#include <png.h>

#define PNG_BYTES_TO_CHECK 4

/**
 * Uses libpng to read in a PNG file
 * @param filename The name of the PNG file to read in
 * @param imageWidth Reference to an integer which will store the image's width
 * @param imageHeight Reference to an integer which will store the image's height
 * @param imageDepth Reference to an integer which will store the image's pixel depth
 * @param verbosity Whether or not to print error messages to stderr
 * @return Returns an array of integers representing the image pixels
 */
static int *readPNGFile(char *filename, int *imageWidth, int *imageHeight, int *imageDepth, int verbosity)
{
    FILE *pngfile = fopen(filename, "rb");
    if (!pngfile) {
        if (verbosity > 0) {
            fprintf(stderr, "Error opening file: %s\n", filename);
        }
        return NULL;
    }

    char header[PNG_BYTES_TO_CHECK];
    fread(header, 1, PNG_BYTES_TO_CHECK, pngfile);
    int is_png = !png_sig_cmp((png_bytep)header, 0, PNG_BYTES_TO_CHECK);
    if (!is_png) {
        if (verbosity > 0) {
            fprintf(stderr, "Invalid PNG file: %s\n", filename);
        }
        return NULL;
    }

    png_structp png = png_create_read_struct(PNG_LIBPNG_VER_STRING, NULL, NULL, NULL);
    if (!png) {
        return NULL;
    }

    png_infop info = png_create_info_struct(png);
    if (!info) {
        return NULL;
    }

    if (setjmp(png_jmpbuf(png))) {
        return NULL;
    }

    png_init_io(png, pngfile);
    png_set_sig_bytes(png, PNG_BYTES_TO_CHECK);
    png_read_info(png, info);

    int width = (int)png_get_image_width(png, info);
    int height = (int)png_get_image_height(png, info);
    png_byte color_type = png_get_color_type(png, info);
    png_byte bit_depth = png_get_bit_depth(png, info);

    // Read any color_type into 8bit depth, RGBA format.
    // See http://www.libpng.org/pub/png/libpng-manual.txt
    if (bit_depth == 16)
        png_set_strip_16(png);
    if (color_type == PNG_COLOR_TYPE_PALETTE)
        png_set_palette_to_rgb(png);

    // PNG_COLOR_TYPE_GRAY_ALPHA is always 8 or 16bit depth.
    if (color_type == PNG_COLOR_TYPE_GRAY && bit_depth < 8)
        png_set_expand_gray_1_2_4_to_8(png);
    if (png_get_valid(png, info, PNG_INFO_tRNS))
        png_set_tRNS_to_alpha(png);

    // These color_type don't have an alpha channel then fill it with 0xff.
    if (color_type == PNG_COLOR_TYPE_RGB ||
            color_type == PNG_COLOR_TYPE_GRAY ||
            color_type == PNG_COLOR_TYPE_PALETTE)
        png_set_filler(png, 0xFF, PNG_FILLER_AFTER);
    if (color_type == PNG_COLOR_TYPE_GRAY ||
            color_type == PNG_COLOR_TYPE_GRAY_ALPHA)
        png_set_gray_to_rgb(png);
    png_read_update_info(png, info);

    png_bytep *row_pointers;
    row_pointers = (png_bytep*)malloc((unsigned long)height * sizeof(png_bytep));
    for (int y = 0; y < height; y++) {
        row_pointers[y] = (png_byte*)malloc(png_get_rowbytes(png, info));
    }
    png_read_image(png, row_pointers);
    fclose(pngfile);

    int pixelDepth = 4;
    int *imagePixels =
        (int*)malloc((unsigned long)height *
        (unsigned long)width * (unsigned long)pixelDepth * sizeof(int));
    int n = 0;
    int rPixel = 0;
    int bPixel = 0;
    int gPixel = 0;
    int aPixel = 0;
    double greyPixel = 0.0;
    double radianShift = 3.14159265;
    double radianRange = 3.14159265;
    double radianPixel = 0;
    double scaledPixel = 0;
    for (int y = 0; y < height; ++y) {
        png_bytep row = row_pointers[y];
        for (int x = 0; x < width; ++x) {
            png_bytep pixel = &(row[x * 4]);
            rPixel = (int)pixel[0];
            gPixel = (int)pixel[1];
            bPixel = (int)pixel[2];
            aPixel = (int)pixel[3];

            n = ((y * width) + x) * pixelDepth;

            imagePixels[n] = rPixel;
            imagePixels[n+1] = gPixel;
            imagePixels[n+2] = bPixel;
            imagePixels[n+3] = aPixel;
        }
    }

    // release png memory
    for (int y = 0; y < height; y++) {
        free(row_pointers[y]);
    }
    free(row_pointers);

    *imageWidth = width;
    *imageHeight = height;
    *imageDepth = pixelDepth;
    return imagePixels;
}

/**
 * Three marks of existence, characteristics shared by all sentient beings
 * 1. sabbe saṅkhāra aniccā — all saṅkhāras (conditioned things) are impermanent
 * 2. sabbe saṅkhāra dukkhā — all saṅkhāras are unsatisfactory (suffering)
 * 3. sabbe dhammā anattā — all dhammas (conditioned/unconditioned things) are not self
 * ☸
 */
static void write_png_file(int *imageVector, int width, int height, char *filename)
{
    FILE *fp = fopen(filename, "wb");
    if (!fp) abort();

    png_structp png = png_create_write_struct(PNG_LIBPNG_VER_STRING, NULL, NULL, NULL);
    if (!png) abort();

    png_infop info = png_create_info_struct(png);
    if (!info) abort();

    if (setjmp(png_jmpbuf(png))) abort();

    png_init_io(png, fp);

    // Output is 8bit depth, RGBA format.
    png_set_IHDR(
        png,
        info,
        (png_uint_32)width,
        (png_uint_32)height,
        8,
        PNG_COLOR_TYPE_RGBA,
        PNG_INTERLACE_NONE,
        PNG_COMPRESSION_TYPE_DEFAULT,
        PNG_FILTER_TYPE_DEFAULT
    );
    png_write_info(png, info);

    png_bytep *row_pointers;
    row_pointers = (png_bytep*)malloc((unsigned long)height * sizeof(png_bytep));
    for (int y = 0; y < height; y++) {
        row_pointers[y] = (png_byte*)malloc(png_get_rowbytes(png, info));
    }

    int i = 0;
    for (int y = 0; y < height; y++) {
        for (int z = 0; z < width; z++) {
            row_pointers[y][z*4+0] = (png_byte)imageVector[i];
            row_pointers[y][z*4+1] = (png_byte)imageVector[i+1];
            row_pointers[y][z*4+2] = (png_byte)imageVector[i+2];
            row_pointers[y][z*4+3] = (png_byte)imageVector[i+3];
            i += 4;
        }
    }

    png_write_image(png, row_pointers);
    png_write_end(png, NULL);

    for (int y = 0; y < height; y++) {
        free(row_pointers[y]);
    }
    free(row_pointers);

    fclose(fp);
}

#endif

APPENDIX E

IMAGE RESIZE FUNCTIONS — LIBRESIZE.C

/**
 * libResize.c
 * Masters Thesis Work
 * Christopher Stoll, 2014
 */
#ifndef LIBRESIZE_C
#define LIBRESIZE_C

double linearInterpolation(double s, double e, double t)
{
    return s + (e - s) * t;
}

double bilinearInterpolation(double c00, double c10, double c01, double c11,
        double tx, double ty)
{
    return linearInterpolation(linearInterpolation(c00, c10, tx),
        linearInterpolation(c01, c11, tx), ty);
}

static void scaleBilinearBW(int *srcImgVector, int srcImgWidth, int srcImgHeight,
        int *dstImgVector, int dstImgWidth, int dstImgHeight)
{
    const double dblSrcW = (double)srcImgWidth;
    const double dblSrcH = (double)srcImgHeight;
    const double dblDstW = (double)dstImgWidth;
    const double dblDstH = (double)dstImgHeight;
    const double scaleX = dblSrcW / dblDstW;
    const double scaleY = dblSrcH / dblDstH;

    double dblX = 0;
    double dblY = 0;

    double gx = 0;
    double gy = 0;
    int gxi = 0;
    int gyi = 0;

    int location00 = 0;
    int location10 = 0;
    int location01 = 0;
    int location11 = 0;

    double pixel00 = 0;
    double pixel10 = 0;
    double pixel01 = 0;
    double pixel11 = 0;

    double result = 0;
    int currentPixel = 0;

    for (int y = 0; y < dstImgHeight; ++y) {
        for (int x = 0; x < dstImgWidth; ++x) {
            dblX = (double)x;
            dblY = (double)y;

            gx = dblX * scaleX;
            gy = dblY * scaleY;
            gxi = (int)gx;
            gyi = (int)gy;

            location00 = (gyi * srcImgWidth) + gxi;
            location10 = (gyi * srcImgWidth) + gxi + 1;
            location01 = ((gyi + 1) * srcImgWidth) + gxi;
            location11 = ((gyi + 1) * srcImgWidth) + gxi + 1;

            pixel00 = srcImgVector[location00];
            pixel10 = srcImgVector[location10];
            pixel01 = srcImgVector[location01];
            pixel11 = srcImgVector[location11];

            result = bilinearInterpolation(pixel00, pixel10, pixel01, pixel11,
                (gx - gxi), (gy - gyi));

            currentPixel = (y * dstImgWidth) + x;
            dstImgVector[currentPixel] = result;
        }
    }
}

static void resize(int *srcImgVector, int srcImgWidth, int srcImgHeight,
        int *dstImgVector, int dstImgWidth, int dstImgHeight)
{
    scaleBilinearBW(srcImgVector, srcImgWidth, srcImgHeight,
        dstImgVector, dstImgWidth, dstImgHeight);
}

static int getScaledSize(int srcSize, double scalePercentage)
{
    double srcSizeDbl = (double)srcSize;
    double resultSizeDbl = srcSizeDbl * scalePercentage;
    return (int)resultSizeDbl;
}

static void scale(int *srcImgVector, int srcImgWidth, int srcImgHeight,
        double scalePercentage, int *dstImgVector)
{
    int dstImgWidth = getScaledSize(srcImgWidth, scalePercentage);
    int dstImgHeight = getScaledSize(srcImgHeight, scalePercentage);
    scaleBilinearBW(srcImgVector, srcImgWidth, srcImgHeight,
        dstImgVector, dstImgWidth, dstImgHeight);
}

#endif

APPENDIX F

IMAGE BINARIZATION — LIBBINARIZATION.C

/**
 * libBinarization.c
 * Masters Thesis Work
 * Christopher Stoll, 2014
 */
#ifndef LIBBINARIZATION_C
#define LIBBINARIZATION_C

static int otsuBinarization(int *histogram, int pixelCount)
{
    int sum = 0;
    for (int i = 1; i < 256; ++i) {
        sum += i * histogram[i];
    }

    /**
     * Immanuel Kant, The Categorical Imperative
     * 1. Act only according to that maxim whereby you can at the same time will
     *    that it should become a universal law.
     * 2. Act in such a way that you treat humanity, whether in your own person or
     *    in the person of any other, never merely as a means to an end, but always
     *    at the same time as an end.
     * 3. Therefore, every rational being must so act as if he were through his
     *    maxim always a legislating member in the universal kingdom of ends.
     */

    int sumB = 0;
    int wB = 0;
    int wF = 0;
    int mB;
    int mF;
    double max = 0.0;
    double between = 0.0;
    double threshold1 = 0.0;
    double threshold2 = 0.0;

    for (int i = 0; i < 256; ++i) {
        wB += histogram[i];

        if (wB) {
            wF = pixelCount - wB;

            if (wF == 0) {
                break;
            }

            sumB += i * histogram[i];

            mB = sumB / wB;
            mF = (sum - sumB) / wF;
            between = wB * wF * pow(mB - mF, 2);

            if (between >= max) {
                threshold1 = i;
                if (between > max) {
                    threshold2 = i;
                }
                max = between;
            }
        }
    }
    return (threshold1 + threshold2) / 2.0;
}

#endif

APPENDIX G

IMAGE ENERGY FUNCTIONS — LIBENERGIES.C

/** * libEnergies.c * Masters Thesis Work * Christopher Stoll, 2014 */ #ifndef LIBENERGIES_C #define LIBENERGIES_C

#include "pixel.h" #include "libMinMax.c"

// Simple energy function, basically a gradient magnitude calculation static int getPixelEnergySimple(struct pixel *imageVector, int imageWidth, \ int imageHeight, int currentPixel, int gradientSize) { // We can pull from two pixels above instead of summing one above and one below int pixelAbove = 0; if (currentPixel > (imageWidth * gradientSize)) { pixelAbove = currentPixel - (imageWidth * gradientSize); }

int yDif = 0; if (imageVector[pixelAbove].bright > imageVector[currentPixel].bright) { yDif = imageVector[pixelAbove].bright - imageVector[currentPixel].bright; } else { yDif = imageVector[currentPixel].bright - imageVector[pixelAbove].bright; }

int pixelLeft = 0; pixelLeft = currentPixel - gradientSize; if (pixelLeft < 0) { pixelLeft = 0; }

int pixelCol = currentPixel % imageWidth; int xDif = 0; if (pixelCol > 0) { if (imageVector[pixelLeft].bright > imageVector[currentPixel].bright) { xDif = imageVector[pixelLeft].bright - imageVector[currentPixel].bright; } else { xDif = imageVector[currentPixel].bright - imageVector[pixelLeft].bright; } } return min((yDif + xDif), 255); } static int getPixelEnergySobel(struct pixel *imageVector, int imageWidth, \ int imageHeight, int currentPixel) { int pixelDepth = 1; int imageByteWidth = imageWidth * pixelDepth; int currentCol = currentPixel % imageByteWidth;

82 int p1, p2, p3, p4, p5, p6, p7, p8, p9;

// get pixel locations within the image array // image border pixels have undefined (zero) energy if ((currentPixel > imageByteWidth) && (currentPixel < (imageByteWidth * (imageHeight - 1))) && (currentCol > 0) && (currentCol < (imageByteWidth - pixelDepth))) { p1 = currentPixel - imageByteWidth - pixelDepth; p2 = currentPixel - imageByteWidth; p3 = currentPixel - imageByteWidth + pixelDepth; p4 = currentPixel - pixelDepth; p5 = currentPixel; p6 = currentPixel + pixelDepth; p7 = currentPixel + imageByteWidth - pixelDepth; p8 = currentPixel + imageByteWidth; p9 = currentPixel + imageByteWidth + pixelDepth; } else { // TODO: consider attempting to evaluate border pixels return 0;//33; // zero and INT_MAX are significant, so return 1 }

//
// Declaration of the Rights of Man and of the Citizen
// (Déclaration des droits de l'homme et du citoyen)
//
// Article I - Men (and women) are born and remain free and equal in rights.
//     Social distinctions can be founded only on the common good.
// Article II - The goal of any political association is the conservation of
//     the natural and imprescriptible rights of man. These rights are
//     liberty, property, safety and resistance against oppression.
// Article III - The principle of any sovereignty resides essentially in the Nation.
//     No body, no individual can exert authority which does not emanate
//     expressly from it.
// Article IV - Liberty consists of doing anything which does not harm others:
//     thus, the exercise of the natural rights of each man has only those
//     borders which assure other members of the society the enjoyment of
//     these same rights. These borders can be determined only by the law.
// Article V - The law has the right to forbid only actions harmful to society.
//     Anything which is not forbidden by the law cannot be impeded, and
//     no one can be constrained to do what it does not order.
// Article VI - The law is the expression of the general will. All the citizens have
//     the right of contributing personally or through their representatives
//     to its formation. It must be the same for all, either that it
//     protects, or that it punishes. All the citizens, being equal in its
//     eyes, are equally admissible to all public dignities, places and
//     employments, according to their capacity and without distinction
//     other than that of their virtues and of their talents.
// Article VII - No man can be accused, arrested nor detained but in the cases
//     determined by the law, and according to the forms which it has
//     prescribed. Those who solicit, dispatch, carry out or cause to be
//     carried out arbitrary orders, must be punished; but any citizen
//     called or seized under the terms of the law must obey at once; he
//     renders himself culpable by resistance.
// Article VIII - The law should establish only penalties that are strictly and
//     evidently necessary, and no one can be punished but under a law
//     established and promulgated before the offense and legally applied.
//…

    // get the pixel values from the image array
    int p1val = imageVector[p1].bright;
    int p2val = imageVector[p2].bright;
    int p3val = imageVector[p3].bright;
    int p4val = imageVector[p4].bright;
    int p5val = imageVector[p5].bright;
    int p6val = imageVector[p6].bright;
    int p7val = imageVector[p7].bright;
    int p8val = imageVector[p8].bright;
    int p9val = imageVector[p9].bright;

    // apply the Sobel filter
    int sobelX = (p3val + (p6val + p6val) + p9val -
                  p1val - (p4val + p4val) - p7val);
    int sobelY = (p1val + (p2val + p2val) + p3val -
                  p7val - (p8val + p8val) - p9val);

    // bounded gradient magnitude
    return min(max((int)(sqrt((sobelX * sobelX) + (sobelY * sobelY)) / 2), 0), 255);
}

static int getPixelEnergyLaplacian(struct pixel *imageVector, int imageWidth, \
    int imageHeight, int currentPixel) {
    int pixelDepth = 1;
    int imageByteWidth = imageWidth * pixelDepth;
    int currentCol = currentPixel % imageByteWidth;
    int p1, p2, p3, p4, p5, p6, p7, p8, p9;

    // get pixel locations within the image array
    // image border pixels have undefined (zero) energy
    if ((currentPixel > imageByteWidth) &&
        (currentPixel < (imageByteWidth * (imageHeight - 1))) &&
        (currentCol > 0) &&
        (currentCol < (imageByteWidth - pixelDepth))) {
        p1 = currentPixel - imageByteWidth - pixelDepth;
        p2 = currentPixel - imageByteWidth;
        p3 = currentPixel - imageByteWidth + pixelDepth;
        p4 = currentPixel - pixelDepth;
        p5 = currentPixel;
        p6 = currentPixel + pixelDepth;
        p7 = currentPixel + imageByteWidth - pixelDepth;
        p8 = currentPixel + imageByteWidth;
        p9 = currentPixel + imageByteWidth + pixelDepth;
    } else {
        // TODO: consider attempting to evaluate border pixels
        return 0;
    }

//…
// Article IX - Any man being presumed innocent until he is declared culpable, if it
//     is judged indispensable to arrest him, any rigor which would not be
//     necessary for the securing of his person must be severely reprimanded
//     by the law.
// Article X - No one may be disturbed for his opinions, even religious ones,
//     provided that their manifestation does not trouble the public order
//     established by the law.
// Article XI - The free communication of thoughts and of opinions is one of the most
//     precious rights of man: any citizen thus may speak, write, print
//     freely, except to respond to the abuse of this liberty, in the cases
//     determined by the law.
// Article XII - The guarantee of the rights of man and of the citizen necessitates a
//     public force: this force is thus instituted for the advantage of all
//     and not for the particular utility of those in whom it is trusted.
// Article XIII - For the maintenance of the public force and for the expenditures of
//     administration, a common contribution is indispensable; it must be
//     equally distributed between all the citizens, according to their
//     ability to pay.
//…

    // get the pixel values from the image array
    int p1val = imageVector[p1].bright;
    int p2val = imageVector[p2].bright;
    int p3val = imageVector[p3].bright;
    int p4val = imageVector[p4].bright;
    int p5val = imageVector[p5].bright;
    int p6val = imageVector[p6].bright;
    int p7val = imageVector[p7].bright;
    int p8val = imageVector[p8].bright;
    int p9val = imageVector[p9].bright;

    // apply the Laplacian filter (4-connected kernel)
    int laplace = (4 * p5val) - p1val - p4val - p6val - p8val;
    return min(max(laplace, 0), 255);
}

static int getPixelGaussian(struct pixel *imageVector, int imageWidth, \
    int imageHeight, int pixelDepth, int currentPixel, int sigma) {
    int imageByteWidth = imageWidth * pixelDepth;
    int points[25];
    double pointValues[25];

    // locations of the 5x5 neighborhood, row by row
    points[0] = currentPixel - imageByteWidth - imageByteWidth - pixelDepth - pixelDepth;
    points[1] = currentPixel - imageByteWidth - imageByteWidth - pixelDepth;
    points[2] = currentPixel - imageByteWidth - imageByteWidth;
    points[3] = currentPixel - imageByteWidth - imageByteWidth + pixelDepth;
    points[4] = currentPixel - imageByteWidth - imageByteWidth + pixelDepth + pixelDepth;
    points[5] = currentPixel - imageByteWidth - pixelDepth - pixelDepth;
    points[6] = currentPixel - imageByteWidth - pixelDepth;
    points[7] = currentPixel - imageByteWidth;
    points[8] = currentPixel - imageByteWidth + pixelDepth;
    points[9] = currentPixel - imageByteWidth + pixelDepth + pixelDepth;
    points[10] = currentPixel - pixelDepth - pixelDepth;
    points[11] = currentPixel - pixelDepth;
    points[12] = currentPixel;
    points[13] = currentPixel + pixelDepth;
    points[14] = currentPixel + pixelDepth + pixelDepth;
    points[15] = currentPixel + imageByteWidth - pixelDepth - pixelDepth;
    points[16] = currentPixel + imageByteWidth - pixelDepth;
    points[17] = currentPixel + imageByteWidth;
    points[18] = currentPixel + imageByteWidth + pixelDepth;
    points[19] = currentPixel + imageByteWidth + pixelDepth + pixelDepth;
    points[20] = currentPixel + imageByteWidth + imageByteWidth - pixelDepth - pixelDepth;
    points[21] = currentPixel + imageByteWidth + imageByteWidth - pixelDepth;
    points[22] = currentPixel + imageByteWidth + imageByteWidth;
    points[23] = currentPixel + imageByteWidth + imageByteWidth + pixelDepth;
    points[24] = currentPixel + imageByteWidth + imageByteWidth + pixelDepth + pixelDepth;

    // clamp out-of-range neighbor locations to the image bounds
    // (border pixels reuse in-bounds neighbors rather than being skipped)
    for (int i = 0; i < 25; ++i) {
        if (points[i] < 0) {
            points[i] = 0;
        } else if (points[i] >= (imageHeight * imageWidth * pixelDepth)) {
            points[i] = (imageHeight * imageWidth * pixelDepth) - 1;
        }
    }

//…
// Article XIV - Each citizen has the right to ascertain, by himself or through his
//     representatives, the need for a public tax, to consent to it freely,
//     to know the uses to which it is put, and of determining the
//     proportion, basis, collection, and duration.
// Article XV - The society has the right of requesting account from any public agent
//     of its administration.
// Article XVI - Any society in which the guarantee of rights is not assured, nor the
//     separation of powers determined, has no Constitution.
// Article XVII - Property being an inviolable and sacred right, no one can be deprived
//     of private usage, if it is not when the public necessity, legally
//     noted, evidently requires it, and under the condition of a just and
//     prior indemnity.
//

    // get the pixel values from the image array
    pointValues[0] = (double)imageVector[points[0]].bright;
    pointValues[1] = (double)imageVector[points[1]].bright;
    pointValues[2] = (double)imageVector[points[2]].bright;
    pointValues[3] = (double)imageVector[points[3]].bright;
    pointValues[4] = (double)imageVector[points[4]].bright;
    pointValues[5] = (double)imageVector[points[5]].bright;
    pointValues[6] = (double)imageVector[points[6]].bright;
    pointValues[7] = (double)imageVector[points[7]].bright;
    pointValues[8] = (double)imageVector[points[8]].bright;
    pointValues[9] = (double)imageVector[points[9]].bright;
    pointValues[10] = (double)imageVector[points[10]].bright;
    pointValues[11] = (double)imageVector[points[11]].bright;
    pointValues[12] = (double)imageVector[points[12]].bright;
    pointValues[13] = (double)imageVector[points[13]].bright;
    pointValues[14] = (double)imageVector[points[14]].bright;
    pointValues[15] = (double)imageVector[points[15]].bright;
    pointValues[16] = (double)imageVector[points[16]].bright;
    pointValues[17] = (double)imageVector[points[17]].bright;
    pointValues[18] = (double)imageVector[points[18]].bright;
    pointValues[19] = (double)imageVector[points[19]].bright;
    pointValues[20] = (double)imageVector[points[20]].bright;
    pointValues[21] = (double)imageVector[points[21]].bright;
    pointValues[22] = (double)imageVector[points[22]].bright;
    pointValues[23] = (double)imageVector[points[23]].bright;
    pointValues[24] = (double)imageVector[points[24]].bright;

    double gaussL1 = 0.0;
    double gaussL2 = 0.0;
    double gaussL3 = 0.0;
    double gaussL4 = 0.0;
    double gaussL5 = 0.0;
    double gaussAll = 0.0;
    double gaussDvsr = 1.0;
    double weights[25];

    // scale courtesy: http://dev.theomader.com/gaussian-kernel-calculator/
    // only the six unique values of the symmetric 5x5 kernel are set here;
    // the remaining entries are filled in by symmetry below
    if (sigma == 9999) {
        // LoG -- Laplacian of Gaussian
        weights[0] = 0;
        weights[1] = 0;
        weights[2] = -1;
        weights[6] = -1;
        weights[7] = -2;
        weights[12] = 16;
    } else if (sigma == 80) {
        // scaling factor / standard deviation / sigma = 8.0
        weights[0] = 0.038764;
        weights[1] = 0.039682;
        weights[2] = 0.039993;
        weights[6] = 0.040622;
        weights[7] = 0.040940;
        weights[12] = 0.041261;
    } else if (sigma == 40) {
        // scaling factor / standard deviation / sigma = 4.0
        weights[0] = 0.035228;
        weights[1] = 0.038671;
        weights[2] = 0.039892;
        weights[6] = 0.042452;
        weights[7] = 0.043792;
        weights[12] = 0.045175;
    } else if (sigma == 20) {
        // scaling factor / standard deviation / sigma = 2.0
        weights[0] = 0.023528;
        weights[1] = 0.033969;
        weights[2] = 0.038393;
        weights[6] = 0.049045;
        weights[7] = 0.055432;
        weights[12] = 0.062651;
    } else if (sigma == 16) {
        // scaling factor / standard deviation / sigma = 1.6
        weights[0] = 0.017056;
        weights[1] = 0.030076;
        weights[2] = 0.036334;
        weights[6] = 0.053035;
        weights[7] = 0.064071;
        weights[12] = 0.077404;
    } else if (sigma == 14) {
        // scaling factor / standard deviation / sigma = 1.4
        gaussDvsr = 159;
        weights[0] = 2;
        weights[1] = 4;
        weights[2] = 5;
        weights[6] = 9;
        weights[7] = 12;
        weights[12] = 15;
    } else if (sigma == 12) {
        // scaling factor / standard deviation / sigma = 1.2
        weights[0] = 0.008173;
        weights[1] = 0.021861;
        weights[2] = 0.030337;
        weights[6] = 0.058473;
        weights[7] = 0.081144;
        weights[12] = 0.112606;
    } else if (sigma == 10) {
        // scaling factor / standard deviation / sigma = 1
        gaussDvsr = 273;
        weights[0] = 1;
        weights[1] = 4;
        weights[2] = 7;
        weights[6] = 16;
        weights[7] = 26;
        weights[12] = 41;
    } else {
        weights[0] = 1;
        weights[1] = 2;
        weights[2] = 4;
        weights[6] = 8;
        weights[7] = 16;
        weights[12] = 32;
    }
    // line 1 has 2 duplicated values
    weights[3] = weights[1];
    weights[4] = weights[0];
    // line 2 has 3 duplicated values
    weights[5] = weights[1];
    weights[8] = weights[6];
    weights[9] = weights[5];
    // line 3 has 4 duplicated values
    weights[10] = weights[2];
    weights[11] = weights[7];
    weights[13] = weights[11];
    weights[14] = weights[10];
    // line 4 is the same as line 2
    weights[15] = weights[5];
    weights[16] = weights[6];
    weights[17] = weights[7];
    weights[18] = weights[8];
    weights[19] = weights[9];
    // line 5 is the same as line 1
    weights[20] = weights[0];
    weights[21] = weights[1];
    weights[22] = weights[2];
    weights[23] = weights[3];
    weights[24] = weights[4];

    // weighted sums, one kernel row at a time
    gaussL1 = \
        (weights[0] * pointValues[0]) + \
        (weights[1] * pointValues[1]) + \
        (weights[2] * pointValues[2]) + \
        (weights[3] * pointValues[3]) + \
        (weights[4] * pointValues[4]);
    gaussL2 = \
        (weights[5] * pointValues[5]) + \
        (weights[6] * pointValues[6]) + \
        (weights[7] * pointValues[7]) + \
        (weights[8] * pointValues[8]) + \
        (weights[9] * pointValues[9]);
    gaussL3 = \
        (weights[10] * pointValues[10]) + \
        (weights[11] * pointValues[11]) + \
        (weights[12] * pointValues[12]) + \
        (weights[13] * pointValues[13]) + \
        (weights[14] * pointValues[14]);
    gaussL4 = \
        (weights[15] * pointValues[15]) + \
        (weights[16] * pointValues[16]) + \
        (weights[17] * pointValues[17]) + \
        (weights[18] * pointValues[18]) + \
        (weights[19] * pointValues[19]);

    gaussL5 = \
        (weights[20] * pointValues[20]) + \
        (weights[21] * pointValues[21]) + \
        (weights[22] * pointValues[22]) + \
        (weights[23] * pointValues[23]) + \
        (weights[24] * pointValues[24]);

    gaussAll = (gaussL1 + gaussL2 + gaussL3 + gaussL4 + gaussL5) / gaussDvsr;
    return min(max((int)gaussAll, 0), 255);
}

// Difference of Gaussians: absolute difference of two Gaussian responses
static int getPixelEnergyDoG(int gaussianValue1, int gaussianValue2) {
    int greyPixel = 0;
    if (gaussianValue1 > gaussianValue2) {
        greyPixel = (gaussianValue1 - gaussianValue2);
    } else {
        greyPixel = (gaussianValue2 - gaussianValue1);
    }
    return min(max(greyPixel, 0), 255);
}

#endif
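As a usage illustration, the sketch below fills each pixel's energy field with the Sobel energy. The driver loop and the choice of the Sobel function over the other kernels are assumptions made for this example, not something the library above prescribes.

#include "pixel.h"
#include "libEnergies.c"

// Sketch: compute an energy map for the whole image with the Sobel
// function (the loop and the function choice are illustrative only)
static void fillEnergyMap(struct pixel *imageVector, int imageWidth, int imageHeight) {
    for (int p = 0; p < (imageWidth * imageHeight); ++p) {
        imageVector[p].energy = getPixelEnergySobel(imageVector, imageWidth, imageHeight, p);
    }
}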

APPENDIX H

PIXEL DATA STRUCTURE — PIXEL.H

/**
 * pixel.h
 * Masters Thesis Work
 * Christopher Stoll, 2014
 */
#ifndef PIXEL_H
#define PIXEL_H

struct pixel {
    // color channels
    int r;
    int g;
    int b;
    int a;
    // brightness and computed energy
    int bright;
    int energy;
    // cumulative seam values, horizontal and vertical
    int seamvalH;
    int seamvalV;
    // number of times the pixel was used by a seam
    int usecountH;
    int usecountV;
    // flags marking detected area boundaries
    int areaBoundaryH;
    int areaBoundaryV;
};

#endif
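A minimal sketch of how a pixel might be initialized from RGBA components follows; deriving bright as a plain channel average is an assumed convention for this example, not necessarily the one used elsewhere in the thesis code.

#include "pixel.h"

// Sketch: initialize a pixel from RGBA components; the brightness
// here is a simple channel average (an assumed convention)
static struct pixel makePixel(int r, int g, int b, int a) {
    struct pixel p = {0};
    p.r = r;
    p.g = g;
    p.b = b;
    p.a = a;
    p.bright = (r + g + b) / 3;
    return p;
}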

APPENDIX I

WINDOW DATA STRUCTURE — WINDOW.H

/**
 * window.h
 * Masters Thesis Work: OCR
 * Christopher Stoll, 2015
 */
#ifndef WINDOW_H
#define WINDOW_H

#include "libWrappers.c" struct window { int fullWidth; int fullHeight; int xOrigin; int yOrigin; int xLength; int yLength; int xTerminus; int yTerminus; int xStep; int yStep; int firstPixel; int lastPixel; int pixelCount; }; struct window *newWindow(int x, int y, int width, int height, \ int fullWidth, int fullHeight) { struct window *newWindow = (struct window*)xmalloc(sizeof(struct window));

    newWindow->fullWidth = fullWidth;
    newWindow->fullHeight = fullHeight;
    newWindow->xOrigin = x;
    newWindow->yOrigin = y;
    newWindow->xLength = width;
    newWindow->yLength = height;
    newWindow->xTerminus = x + width;
    newWindow->yTerminus = y + height;
    newWindow->xStep = 1;
    newWindow->yStep = fullWidth;
    newWindow->firstPixel = (newWindow->yOrigin * fullWidth) + newWindow->xOrigin;
    newWindow->lastPixel = (newWindow->yTerminus * fullWidth) + newWindow->xTerminus;
    newWindow->pixelCount = width * height;

    if (newWindow->xTerminus > newWindow->fullWidth) {
        printf("TODO: Handle this error -- xTerminus > fullWidth\n");
    }
    if (newWindow->yTerminus > newWindow->fullHeight) {
        printf("TODO: Handle this error -- yTerminus > fullHeight\n");
    }

    return newWindow;
}

void freeWindow(struct window *thisWindow) {
    if (thisWindow) {
        free(thisWindow);
    }
}

#endif
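The sketch below illustrates the intended traversal pattern: a window is opened over a subregion of a 640 by 480 image and the pixels it covers are visited row by row. The dimensions and the loop body are placeholders chosen for this example.

#include "pixel.h"
#include "window.h"

// Sketch: open a 100x50 window at (10, 20) in a 640x480 image and
// visit each pixel it covers (dimensions are placeholders)
static void walkWindow(struct pixel *imageVector) {
    struct window *w = newWindow(10, 20, 100, 50, 640, 480);
    for (int y = w->yOrigin; y < w->yTerminus; ++y) {
        for (int x = w->xOrigin; x < w->xTerminus; ++x) {
            int pixelIndex = (y * w->fullWidth) + x;
            imageVector[pixelIndex].energy = 0; // placeholder work on the windowed pixel
        }
    }
    freeWindow(w);
}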

APPENDIX J

UTILITIES — LIBMINMAX.C

/**
 * libMinMax.c
 * Masters Thesis Work
 * Christopher Stoll, 2014
 */
#ifndef LIBMINMAX_C
#define LIBMINMAX_C

static inline int max(int a, int b) {
    if (a > b) {
        return a;
    } else {
        return b;
    }
}

static inline int max3(int a, int b, int c) {
    if (a > b) {
        if (a > c) {
            return a;
        } else {
            return c;
        }
    } else {
        if (b > c) {
            return b;
        } else {
            return c;
        }
    }
}

static inline int min(int a, int b) {
    if (a < b) {
        return a;
    } else {
        return b;
    }
}

static inline int min3(int a, int b, int c) {
    if (a < b) {
        if (a < c) {
            return a;
        } else {
            return c;
        }
    } else {
        if (b < c) {
            return b;
        } else {
            return c;
        }
    }
}

#endif
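These helpers exist chiefly to serve the dynamic-programming step of seam carving, where each pixel extends the cheapest of its three upper neighbors. A one-line illustration of that relaxation step follows; the function and its argument names are hypothetical, not part of the thesis code.

#include "libMinMax.c"

// Hypothetical sketch of the seam-carving relaxation step min3 serves:
// a pixel's cumulative seam value extends its cheapest upper neighbor
static int seamStep(int pixelEnergy, int upLeft, int up, int upRight) {
    return pixelEnergy + min3(upLeft, up, upRight);
}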

/* https://keybase.io/stollcri/key.asc -----BEGIN PGP PUBLIC KEY BLOCK----- Version: Keybase OpenPGP v1.0.5 Comment: https://keybase.io/stollcri xsFNBFQjA+EBEACVzyTTITuDWvfeyb9+/KGHSC5cmmgjXYqWGpJf7046k7Hy/8Dv BHslcOeaaHQfDc5+qBB3R1J6NmX/C2+eRg6+jZ/SUSe1Hb+tknHuYtuK94m8/JTd Mx5et5bIFRbCHM0oW9XUHgP4s0GQKtVTL9kDkbMF1Z7SVlbY7ImIvmtIdmIOt+Ua ReWmnqwGEC9jY8Yppp+FZe9HZ0TYN5cGwaXuqVO44YtZEm1Pd0ApXqOoklc7tm+B iNlazFhffZv6X1/9tuhJs6UPGZG74zD3hc5e67u3pN0NuIJ/kcwy5iFchuuumEoF 4wv42JMa0jjkHc5Py0RYLwM056hQaRWItniUfX/a9ahDNmr2kHlToKXTXuO2pTNX 7qnIgkIy89dqVJMoN7KJLZ9THvwbxHq2SSrfejk6E9IvwrqG1278B5viu7aIYvNA 43d3fcqtsMTe3bvczKcSRdNfuHNg/a3VFoW3LEK6+OhCW2Q/U5EcoIxE5oFdgdWt N20VgVYlDYEmbYDVujD3gGLrMaYBWf1rR2QnWAgp9nvyZgrNYXejvv1oxe0jeoen keVsiUQYEEwUIdNB1DMaz6SgJoEeh12Bx9/eTKQzOoZMb0xAIc72oclCIKg+XeOd WsAuPGCLWC+nBO22H+4NAf9yPxbBsT88GW+l7na9JBp9jENcWsQNwuWhLQARAQAB zSlrZXliYXNlLmlvL3N0b2xsY3JpIDxzdG9sbGNyaUBrZXliYXNlLmlvPsLBbQQT AQoAFwUCVCMD4QIbLwMLCQcDFQoIAh4BAheAAAoJEA0Bqo9RsuPq3DUQAIJvfm6o 7arweMSr+dIv0TwQDU9ZsTGCsJ14+B75cROFWN5O4yuD+QxTVZdzcWOKDhENrfMx jgl04yMJU0vuwXMDml+ZTbmm18ihGcq0O5JXoDLWfYx0SJjJXz4NHEXuylYOaj7s DnoyQxT99A0vNTlG0DdnVv3WRYhYtYm4Dxe1TXT4jvgKR/QmsC51Ib0ogkxT9L8E rKLY6+KDytY1L1/eQ78W0fRz/6ojTKJ1ic9QPdtO/a/fQlUKG+FwIAuWBM25gPSI yYomnTFRfV9QvuXlaHocrLbXmTE9rU4v4V12C5zGjWw/uA5pQWCYGWKK9iMbAlql n+aSqMGH7yNHJ2wYC0jnfNSABRNtj5SqfPgxlDxlJdK0552n5e1h90wH3LFmexwE X8KMaXQLbujOH2+HhYWfB+zdFDcmw/rmks5VLX+yU619N9od7PILM+VfG4ptmyG4 KchqXhoyiDjZ1YlW+qU4w1FAM2d7bpb2M9vddWRm6s3BvAcJZBIs0DrRU7QNOcYv ePRsiYfpFIfK7VpPgMIS1gAwMxUmiBlpURIS7NfhAQJXZcXoIyQYG18+YpSkw4wN Pm3D3UPTf35ZtErFyqtSUsi7Z8lciFKIbXDvg5ImCYcSdBujTtZzB7obfJQc6L9v TWKW5nbfSeL68WQi0aS4hvSD7g1C5I5hFARVzsBNBFQjA+EBCAClpHdaE2TMMpqC HrjmYWAUD9wiklNzUtIElz4MduF3b0y5Z1rxL1Dg/+89UVZvpjYZqM+1d/jUPfjD 659Ln/2MoEBIv3hs7wCo7yBe75XbTsci2qhSPKjoTMOmTp18Lh3H+eOyanQqClPh rT3sbkIuEc1zKYtuYjQnL5dgG01TeorKFDH7vr/0E5XLxaV604n7ewKiEiqQnvQE pBnGGTBvsUDpoHWbXQrdG1QAYP5wug3NA020vHwCdicDWO8YzO5nMLgz1UkG9yCr uD6CyqY08aaXukZq+MsxY4kWepkR+bT9v37BqNPrZEopbBR7Yc4P6gPOdUzS9bD4 LVgN0B2dABEBAAHCwoQEGAEKAA8FAlQjA+EFCQ8JnAACGwIBKQkQDQGqj1Gy4+rA XSAEGQEKAAYFAlQjA+EACgkQnatmXo9ruwtehgf/YKj+EWHLIuCW8D7AerCqtQAz NBuXb2OgWs+O+k0WC+BaGXgztuOkPpcuvzBuOIv15cMdcRdOjd1pBqbuxeMxSYro 9XeYicBxyt419Ugmtk9GLfKkbX3gqi5/g+u/XHn2xtt39j9yTDlr2UgEJVBZE3zs VSwbFXz48b+MOok9rx7yZ83umkPWpSdzOgiP9LOxAgUHOA8+XJ1J61Cbrklbtd3H q3ceCZT2rsh1EKNApsFrxoIYRFfevlMh82rf4NADTyCfMNrzdvKXdys6JVzrXzqR OXueww3zDI45vB83EIxA2WDhH/5jJiNzLxGpfRE30O9aFeGQCYXtc+hsBlQ69rJ1 EACPeXzJxqRJUei7VXdcrICI8fGyVglJb0roA+PPWSbnJKsudgb/wYvbYNqJ48z6 xiv+Gn4W02b8k+jm3E+gYwhQtPy5eR6IdlWL+mjGYynwPgzp2Xz4hoY2/H87GVlu rSdx+L+Hvz2wi5tSlkThsF6RJZGTYyH9d5J2jCc9cfjb4FMW7K+xLM1QKGCTXX2r f5/JsL+YLIue0kL/7nNh7sOq4KwUJCEbd+veMyRX1zSR8fIbyjC1UvLYLh5mXU5a FkJuCAtj1m5LK2eohrBCxy1c69z7skf6jhm4gK64cipAI6ZKvMVMrFoinHW4qsLF Gnt86ffpBrS0wlwKDhZhCe3EOC5kUhA1OYKbBOwmRf2CbRWH6onfZjRQvf4llGq5 Ec8ybrKVXVCTvtxq+hrZw3m2uZ80JU/MH9a2HlLK8EY/9BrMe0hDmdqrrsE8g0Jz s2nULSIPjabLbIP7gacILvDKeR+iSWg+6EY+Ie4d3D8XeIeQxggDnpwlhT/0/NQJ RJMrSzpnetnHiudqmCT60/sWbNcx6uU38PcbaMIG0rP1uck6DE0S5lOb2m642Bjl icfxNXksD0OCx9NRkiM3X5tIEeaNipuYnp0cLSzDdFMhrj/tm77wV/ZHNO5rCLSR CPZWlWuv963RYr7mWvsHSWfVjeuCXqhAqA+FTSWtre7kB87ATQRUIwPhAQgA2uUr JucGJT15trmPAjC6HyD5KCqD2VZWEXFNNKiyPHksl/b1n65z/GgxNsxzHcTJmiEt sAGs8Ry1wLZ+hIhA1/MabsK/lSVSR6evsMCXT9cuBqU+MLYBhKrMEAT7FPo8+bX/ ApVsMFJmpkr+iaAIaAk9m56y1nIxMDk+q0/OinmM4mYf5ILRLlvQC4zBwsjMT4pM RopXJH+4flbphVYn1qZ8gBpFb25nKDE+s+DoxY27AKWd+sE+xv8Y2Glg8FHRl/os oZpTZnwS/DkHK5WDva7qNksqkVacYv4ZQ9ydRfOzSp9NObeT50zdlVMxrF7//2jN 
+Qrc6Af0vc8cEPE4nwARAQABwsKEBBgBCgAPBQJUIwPhBQkPCZwAAhsMASkJEA0B qo9RsuPqwF0gBBkBCgAGBQJUIwPhAAoJEMuYorwYXBG1jCMH/jeS8ir/qFngn1Jt EgKRd2KYdKEoYnJDKRjWLk1KilwpnQC/t2D4blBwSs0bKBtUjc40zQVtKiTwXjlA pYPQv+CSIGI+qizk27zkevEmRfK7+dFMWM5bZteSeKBA94VLB76QC28bg4Z0Fs2Y NCCedNUfr7MHK6I0LkBGNONEOaDPnXXVtnLPw/DhSLcpFRH3vvzPQs7h94cUv/qs z5AJG/xtMkURt02c9taZW+Qi7H9BpyMQu6UqB/IfoY3L+nojdCkoKJw4+6yK7/4X OjrJmXZuYYCbOkKvpSlS1KSrhVCK+2rOhsdXglRIkBdmA4dnMrPqa7W0ac7fSKOv gM/w5njD/A//X8WxcipDkWnAR+nLMB/v2lWtt57b8vwy1ExCqDof/QCHRdT5QPTL Do550uPhLYAbFmaPybF6/IyvMPaZfdNcKQbsH6nLvRVVNjpl6t6xIErV7sQImSFg RRt2A7sLX3hP66x+ZsnVlGwuKRCSCnydpL4jIiVUgTYozG4HmwhorfFKdw2g9gqX bUGQfQJoRmVjuDwJm5QyMGTh0s7x78u0aOrr1lBZZJIsjjF4c0JdrnR9xnKJy0sm C78znmYq1ljKUuU7LN9GAuIKHyMaW0UVwAADn7zxKCk8lfVGMlyYnhQqfZf7Skeh jA1Mw7U0A6B0b1fPKUxHJh3sAywJNu5qlEugHZJ7Ykj2x5ikFKOI5Q0qCMoiwQkR 4SLW0CDfOhH+ZoFQS2aI55JRyPpe3qeX8/uHJQHn3VM0R3VcKjHBvcI6H9RnYkgp KPFQfHD529evb6QUMh7gA9+rt901LBzOaiL98oXMW0q2h423HTiZJ9YBHZS6kme4 KxgqhaeI5k6Q7Gost5RZ7soZ/AIKpAXFA/TjYecv9+u+VGHE/E2W/dG51LaG+wPy iWDGG/VWbeqdDirPuWr9QuwFB6x2bsaY6wd216SUR7cl93xvPSG84RIAX4tjGKKB vhGjP7FHm8LGGzZc8AliVEjNe8Aw5Ev8kWmI0dJec0HAxzycTgNom/w= =Cz7t -----END PGP PUBLIC KEY BLOCK----- */
