Efficient Binary and Run Length Morphology and Its Application To
Total Page:16
File Type:pdf, Size:1020Kb
Efficient Binary and Run Length Morphology and its Application to Document Image Processing Thomas M. Breuel DFKI and U. Kaiserslautern [email protected] Abstract This paper describes the implementation and evaluation of an open source library for mathematical morphology based on packed binary and run-length compressed images for document imaging applications. Ab- stractions and patterns useful in the implementation of the interval oper- ations are described. A number of benchmarks and comparisons to bit-blit based implementations on standard document images are provided. 1 Introduction Binary morphology is an important and widely used method in document im- age analysis, useful for tasks like image cleaning and noise removal, [24] layout analysis, [23] skew correction, [15] and text line finding. [6] Real-world doc- ument analysis systems currently primarily rely on bit blit-based implementa- tions. Practical implementations take advantage of separability and logarithmic decomposition of rectangular structuring elements [19, 4, 15]. This technical report describes a binary morphology library containing both a run-length and a packed binary implementation of morphological operations. A number of the methods described in this paper are very similar to methods described in the literature [13, 19], although the library was developed inde- pently of that literature. The paper will not provide a detailed discussion of the arXiv:0712.0121v1 [cs.GR] 2 Dec 2007 similarities and differences of the algorithms described in this memo to those in the literature1 This memo does provide a number of benchmarks that should help practitioners choose good algorithms for their particular applications. We note that, in addition to run length and packed binary methods, a num- ber of other methods have been described in the literature. Binary mathematical morphology with convex structuring elements can be computed by propaga- tion of distances on the pixel grid using a dynamic programming algorithm [22] (brushfire algorithms). Another class of algorithms is based on contours [21] and loop and chain methods [22]. The van Herk/Gil-Werman algorithms [20, 10, 9] 1Comments and additional references to prior work would be appreciated, however. 1 have constant per-pixel size overhead for grayscale morphology, and binary mor- phology can be viewed as a special case. Another class of algorithms is taking advantage of anchors, [8]. Although some of these algorithms are competitive for gray scale morphology, they have not been demonstrated to be competi- tive with high quality bit blit-based implementations for binary morphology on packed binary representations [4]. Some authors have looked again at grayscale morphology, using more com- plex intermediate representations [7]. It remains to be seen how such algorithms compare to the algorithms in this paper, both in performance and storage; we will not be addressing that question here. Bit blit-based implementations at their lowest level take advantage of opera- tions that are highly efficient on current hardware because they are used as part of many different algorithms and display operations: their running time grows quadratically in the resolution of the input image; they do not take advantage of coherence in the input image{an almost blank image takes the same amount of time to process as a highly detailed image; and operations that need to take into account the coordinates of individual pixels (e.g., connected component la- beling) often need to decompress (at least on the fly) or use costly pixel access functions. We will mostly limit ourselves in this paper to the development of morpho- logical operations involving rectangular structuring elements. These are by far the most common operations in document image analysis. However, the run- length method can also be used for implementing morphological operations for arbitrary masks; algorithms and performance will be given in a separate paper. Converting between run length and non-run length representations can be carried out fairly quickly, so we also have the option of mixing run-length and bitmap representations. However, many binary image processing algorithms can be implemented directly on run length images. In fact, prior work in image processing on the line adjacency graph and algorithms operating on it are di- rectly transferable. We therefore briefly discuss a number of these algorithms. Taken together with the binary morphology operations in this paper, they allow complete binary image processing pipelines to be built on run length images, with no conversion costs. Finally, we give benchmarks and comparisons with the Leptonica library, an open source library for morphological image processing. It has compara- tively good performance, uses well-documented algorithms, and is used in sev- eral large-scale document analysis systems. 2 Run Length Image Coding Run-length image representations have a long history in image processing and analysis. They have been used, for example, for efficient storage of binary and color images and for skeletonization of large images. Consider a 1D boolean array a containing pixel values 0 and 1 at each location ai. The run length representation r is an array of intervals r1 = 2 [s1; e1]; : : : ; rn = [sn; en] such that ai = 1 iff i 2 rj for some j and ei < si+1. The 2D run-length representation we are using in this paper is a straight- forward, extension to 2D that treats the two coordinates asymmetricaly; in 2 particular, the binary image aij is represented as a sequence of one-dimensional run-length representations rij, such that for any fixed i0, the 1D array aj = ai0;j is represented by the 1D runlength representation rj = ri0j. Even algorithms that are not explicitly using run length representations are often still implicitly manipulating runs of pixels internally; for example, the usual connected component labeling algorithm internally considers neighbor- hood relations between runs of pixels. An extended version of 2D run-length representations has been used as the line adjacency graph (LAG); it adds a graph structure encoding neighborhood relations between runs to the basic run length encoding; in our algorithms, these neighborhood relations are simply inferred dynamically. In practice, since all the algorithms described in this paper access runs se- quentially, both linked lists or extensible arrays with exponential doubling can be used to represent the runs of each line; our implementation uses extensible arrays with exponential doubling, which results in fewer calls to the memory al- locator, less average memory usage, and better locality of reference than linked list representations. In our current implementation, each run is represented as a pair of 16 bit integers, allowing images up to approximately 65535 × 65535 to be represented; other, more efficient coding schemes are possible (e.g., using a Unicode-like variable length encoding). Note that, in the worst case, that of alternating black and white pixels, the run-length representation may be up to 16 times bigger than a packed bit representation, or a factor of two compared to a one-byte-per-pixel representation. On the other hand, run-length encoded images scale linearly with image resolution, rather than quadratically. That is, a 1200 dpi binary image takes approximately 4 times as much space than a 300 dpi binary image using run- length encoding, while a packed bit image would take 16 times as much space. 3 Morphological Operations Because of the asymmetry in the two dimensions of the 2D run-length represen- tation we are using, morphological operations behave differently in the x and y direction in run-length representations. An analogous asymmetry is found in bit-blit operations, in which the bits making up image lines are packed into words, and a list of lines represents the entire image. There are multiple pos- sible approaches for dealing with this issue. First, we can implement separate operations for horizontal and vertical operations. Second, we can implement only the within-line operations and then transform the between-line operations into within-line operations through transposition. For separable operations, the 2 This paper and our library uses PostScript/mathematical conventions, with a0;0 repre- senting the bottom left pixel of the image. 3 second approach is often the easier one. Therefore, an erosion with a rectangular structuring element of size u × v can be written as:3 function erode2d(image,u,v) erode1d(image,u) transpose(image) erode1d(image,v) transpose(image) end 4 Within-Line Morphological Operations There are four basic morphological operations we consider: erosion, dilation, opening, and closing. One-dimensional opening and closing are the easiest to understand. Essentially, a one-dimensional opening with size u simply deletes all runs of pixels that are less than size u large, and leaves all others untouched:4 void open1d(image,u) { for i in 1,length(image.lines) do line = image.lines[i] filtered = [] for j in 1,length(line.runs) do if runs[j].width() >= u then filtered.append(line.runs[j]) end image.lines[i] = filtered end end A one-dimensional closing with size u deletes all gaps that are smaller than size u, joining the neighboring intervals together. It can either be implemented directly, or it can be implemented in terms of complementation and opening5 function complement(image) for i in 1,length(image.lines) do line = image.lines[i] filtered = [] last = 0 for j in 1,length(line.runs) do run = line.runs[j] newrun = make_run(last,run.start) filtered.append(newrun) 3 Our convention is output arguments before input arguments, and the various procedures modify the image in place. 4We are using 1-based arrays in the pseudo-code. 5 To simplify boundary conditions, we are using the notation exp1 or exp2 to mean use the value of exp1 if it is defined, otherwise use exp2. 4 last = run.end end filtered.append(make_run(last,maxint)) image.lines[i] = filtered end end function close1d(image,u) complement(image) open1d(image) complement(image) end Note that openings and closing are not separable, so we cannot use these imple- mentations directly for implementing true 2D openings and closings; for that, we have to combine erosions and dilations.