Analyzing Microtomography Data with Python and the Scikit-Image Library

Gouillart et al. Adv Struct Chem Imag (2016) 2:18 DOI 10.1186/s40679-016-0031-0 RESEARCH Open Access Analyzing microtomography data with Python and the scikit‑image library Emmanuelle Gouillart1*, Juan Nunez‑Iglesias2 and Stéfan van der Walt3 Abstract The exploration and processing of images is a vital aspect of the scientific workflows of many- X ray imaging modali‑ ties. Users require tools that combine interactivity, versatility, and performance. scikit-image is an open-source image processing toolkit for the Python language that supports a large variety of file formats and is compatible with 2D and 3D images. The toolkit exposes a simple programming interface, with thematic modules grouping functions according to their purpose, such as image restoration, segmentation, and measurements. scikit-image users benefit from a rich scientific Python ecosystem that contains many powerful libraries for tasks such as visualization or machine learning. scikit-image combines a gentle learning curve, versatile image processing capabilities, and the scalable performance required for the high-throughput analysis of X-ray imaging data. Keywords: Scikit-image, Python, Image processing library, 3D image Background Several software applications and libraries are avail- The acquisition time of synchrotron tomography images able to synchrotron users to process their images. has decreased dramatically over the last decade, from ImageJ [1, 48] and its distribution Fiji [45] is a popu- hours to seconds [29]. New modalities such as single- lar general-purpose tool for 2D and 3D images, thanks bunch imaging provide a time resolution down to the to its intuitive menus and graphical tools, and the nanosecond for radiography [41]. However, the time wealth of plugins contributed by a vivid community subsequently spent in processing the images has not [46]. Software specialized in analyzing synchrotron decreased as much, so that the outcome of a success- data is available as well, such as XRDUA [17] for dif- ful synchrotron imaging run often takes weeks or even fraction images obtained in powder diffraction analysis, months to be transformed into scientific results. or for 3D images, commercial tools such as Avizo 3D Transforming billions of pixels and voxels to a few software (TM), or ToolIP/MAVIkit [19] are appreci- meaningful figures represents a tremendous data reduc- ated for an intuitive graphical pipeline and advanced tion. Often, the sequence of operations needed to pro- 3D visualization. Some synchrotrons have even devel- duce these data is not known beforehand, or might be oped their own tools for volume processing, such as altered due to artifacts [31], or to an unforeseen evolu- Pore3D [9] at the Elettra facility. Alternatively, the use tion of the sample. Image processing necessarily involves of a programming language gives finer control, better trial and error phases to choose the processing work- reproducibility, and more complex analysis possibili- flow. Therefore, image processing tools need to offer at ties, provided classical processing algorithms can be the same time enough flexibility of use, a variety of algo- called from libraries—thereby limiting the complexity rithms, and efficient implementations to allow for fast of the programming task and the risk of bugs. Matlab iterations while adjusting the workflow. (TM) and its image processing toolbox are popular in the academic community of computer vision and image processing. The Python language is widely used in the *Correspondence: [email protected] scientific world and in synchrotron facilities. As a gen- 1 Surface du Verre et Interfaces, UMR 125 CNRS/Saint-Gobain, eral-purpose language, Python is used in synchrotrons 93303 Aubervilliers, France Full list of author information is available at the end of the article to control device servers [8, 15, 51], to access raw data © The Author(s) 2016. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. Gouillart et al. Adv Struct Chem Imag (2016) 2:18 Page 2 of 11 of X-ray detectors [27], to reconstruct tomography vol- A stack of 2D images, such as tomography slices gener- umes from radiographs [23, 32], and in data process- ated by a reconstruction algorithm, can be opened as an ing packages for macromolecular crystallography [3], image collection or a 3D array: azimuthal integration of diffraction data [4], or fluores- cence analysis [50, 52]. scikit-image [54] is a general-purpose image processing library for the Python language, and a com- ponent of the ecosystem of Python scientific modules commonly known as Scientific Python [34]. Like the rest of the ecosystem, scikit-image is released under a permissive open-source license and is available free Raw data formats can be opened using the NumPy functions (to load the array into memory) or of charge. Most of scikit-image is compatible with fromfile both 2D and 3D images, so that it can be used for a large memmap (to keep the array on disk). The following code number of imaging modalities, such as microscopy, creates an array from a raw image file of unsigned 16-bit radiography, or tomography. In this article, we explain integers with a header of 1024 bytes: how scikit-image can be used for processing data acquired in X-ray imaging experiments, with a focus on microtomography 3D images. This article does not intend to be a pedagogical tutorial on scikit-image For every raw data specification, it is thus very easy for X-ray imaging, but rather to explain the rationale to write a reader using np.memmap (see for example behind the package, and provide various examples of its https://github.com/jni/python-redshirt). hdf5 files are capabilities. accessed using modules such as h5py, pytables. Overview and first steps scikit-image has a simple Application Program- ming Interface (API), based almost exclusively on In this section, we provide a short overview of the typi- functions. Most functions take an image (i.e., a multi- cal use patterns of scikit-image, illustrated by short dimensional array) as an input parameter: snippets of code. Since Python is a programming language, the user interacts with data objects and images through code, which is either entered and executed in an interactive interpreter, or written in text files (so-called scripts) that are executed. Images are manipulated as numerical arrays, each with Optional parameters can be passed as Python keyword a single, uniform data type. This common format guaran- arguments, in addition to the image parameter. tees interoperability with other libraries and straightfor- ward access to and interpretation of computer memory. The N-dimensional (2D, 3D, ...) numerical array object is provided by the NumPy module [53]. In image processing in Python, one of the first tasks A few functions require several arrays to be passed, then is to generate NumPy arrays, which is often achieved such as the watershed segmentation algorithm that takes by reading data from files. We read one 2-dimensional as parameters the image to be segmented, and an image image from a file and display it as follows: of markers from which labels are propagated: Therefore, the image processing workflow can be seen as a directed graph (a richer structure than a lin- skimage is the name under which scikit-image ear pipeline), where nodes are image-shaped arrays, and is imported in Python code. Note that functions (such edges are functions of scikit-image transforming the as imread that reads an image file, orimshow that arrays (see Fig. 1). displays an image) are found in thematic submodules of Most functions transparently handle 2D, 3D, or even skimage, such as io for Input/Output. higher-dimensional images as arguments, so the same Gouillart et al. Adv Struct Chem Imag (2016) 2:18 Page 3 of 11 Fig. 1 scikit-image and the Scientific Python ecosystem. Images are opened from files asNumPy arrays. Functions of scikit-image transform image arrays into other arrays with the same dimensions, or into arrays of numbers corresponding to features of the image. The output of scikit‑image functions can be passed to other Python modules relying on NumPy arrays, such as SciPy or scikit-learn. Image‑shaped arrays are transformed into visualizations with matplotlib (2D) or Mayavi (3D). A variety of environments is available for code development and execution, from classical IDEs to Jupyter notebooks functions can be used to process tomography, micros- Python [34, 38]. Figure 1 illustrates how several com- copy, or natural images. The rest raise an error when ponents of this ecosystem combine into a sophisticated passed a 3D argument: image processing workflow. NumPy arrays are the cornerstone of the Scientific Python ecosystem, and of scikit-image operations in particular. NumPy “one-liners” include cropping or down- sampling an image, or retrieving pixels corresponding to a given label in a segmentation. To illustrate the com- However, the proportion of functions supporting 3D pactness of NumPy code, consider modifying pixel values images is always increasing, thanks to the many contribu- below a threshold. This operation can be written as tors to the library. While the majority of functions return processed images, returns can also be numerical value(s) such as pixel coordinates of objects of interest or statistical infor- exploiting the ability to index arrays with boolean arrays, mation about the image: also called masking. NumPy uses memory sparingly and avoids making new copies of arrays whenever possible, an important requirement when dealing with the gigabyte-sized images of tomography. For example, cropping a subvolume as follows does not create a copy of the original array The Python ecosystem The benefits ofscikit-image for image processing come not only from the features of the package alone, but but instead refers to the correct memory offsets in the also from the rich environment surrounding scientific original.

Load more