
Daurer∗ et al.
RESEARCH
arXiv:1609.02831v1 [physics.ins-det] 9 Sep 2016

Nanosurveyor: a framework for real-time data processing

Benedikt J. Daurer∗1, Hari Krishnan∗2, Talita Perciano2, Filipe R.N.C. Maia1, David A. Shapiro2, James A. Sethian2,3 and Stefano Marchesini2*

* Correspondence: [email protected], [email protected], [email protected]. 2 Lawrence Berkeley National Laboratory, Berkeley, CA, USA. Full list of author information is available at the end of the article.

Abstract
Scientists are drawn to synchrotrons and accelerator based light sources because of their brightness, coherence and flux. The rate of improvement in brightness and detector technology has outpaced Moore's law growth seen for computers, networks, and storage, and is enabling novel observations and discoveries with faster frame rates, larger fields of view, higher resolution, and higher dimensionality. Here we present an integrated software/algorithmic framework designed to capitalize on high throughput experiments, and describe the streamlined processing pipeline of ptychography data analysis. The pipeline provides throughput, compression, and resolution as well as rapid feedback to the microscope operators.

Keywords: streaming; ptychography

1 Introduction
When new drugs are synthesized [1], dust particles are brought back from space [2], or new superconductors are discovered [3], a variety of sophisticated X-ray microscopes, spectrometers and scattering instruments are often summoned to characterize their structure and properties. High resolution and hyperspectral X-ray imaging, scattering and tomography instruments at modern synchrotrons are among the workhorses of modern discovery to study nano-materials and characterize chemical interactions or electronic properties at their interfaces.

A new generation of microscopes is being pioneered, commissioned and planned at several U.S. Department of Energy (DOE) user facilities [4,5,6] and elsewhere to achieve superior resolution and contrast in three dimensions, encompassing a macroscopic field of view and chemical or magnetic sensitivity, by coupling together the brightest sources of tunable X-rays, nanometer positioning, nanofocusing lenses and faster detectors. Existing soft X-ray detector technology in use at the Advanced Light Source (ALS), for example, generates 350 MBytes/second per instrument [7], commercial detectors for hard X-rays can record 6 GB/second of raw data per detector [8,9], and a synchrotron light source can support 40 or more experiments simultaneously 24 hours a day. Furthermore, accelerator technology such as the multi-bend achromat [10] will increase brightness by two orders of magnitude around the globe [11, 12].

Transforming massive amounts of data into the sharpest images ever recorded will help mankind understand ever more complex nano-materials and self-assembled devices, and study the different length-scales involved in life, from macro-molecular machines to bones, where observing the whole picture is as important as recovering the local arrangement of the components. To do so, raw data must be reduced into meaningful images as rapidly as possible, using the fewest possible computational resources to sustain ever increasing data rates.

Modern synchrotron experiments often have quite complex processing pipelines, iterating through many different steps until reaching the final output. One example of such an experiment is ptychography [15, 16, 17], which enables one to build up very large images by combining the large field of view of a high precision scanning microscope system with the resolution provided by diffraction measurements.

Ptychography uses a small step size relative to the size of the illuminating beam when scanning the sample, continuously generating large redundant datasets that can be reduced into a high resolution image. Resolution of a ptychography image does not depend on the size or shape of the illumination. X-ray wavelengths can probe atomic and subatomic scales, although resolution in scattering experiments is limited to a few nanometers by other factors such as radiation damage, exposure and brightness of the source, except in special cases (such as periodic crystals). To reconstruct an image of the object from a series of X-ray scattering experiments, one needs to solve a difficult phase retrieval problem because at short wavelengths it is only possible to measure the intensity of the photons on a detector. The phase retrieval problem is made tractable in ptychography by recording multiple diffraction patterns from overlapping regions of the object, providing redundant datasets to compensate for the lack of phase information. The problem is made even more challenging in the presence of noise, experimental uncertainties, optical aberrations, and perturbations of the experimental geometry, which require specialized solvers and software [18, 19, 20].

In addition to its complex reconstruction pipeline, a ptychography experiment involves additional I/O operations such as calibrating the detector, filtering raw data, and communicating parameters (such as X-ray wavelength, scan positions, detector distance and flux or exposure times) to the analysis infrastructure.

Large community driven projects have developed frameworks optimized for distributed data stream processing. Map-Reduce based solutions such as Hadoop [21, 22] and Spark [23] provide distributed I/O, a unified environment, and hooks for running map and reduce operations over a cloud-based network. Other frameworks such as Flink [24], Samza [25], and Storm [26] are more tailored for real-time stream processing of tasks executing a directed acyclic graph (DAG) [27] of operations as fast as possible. Workflow graphs such as Luigi [28] and Dask Distributed [29, 30] provide an iterative component, but are either optimized for batch processing or treat workers as a singular entity able to execute the DAG in its entirety.

Such frameworks target operations as a unit of tasks and generalize the notion of resources; however, the ecosystem is harder to decentralize. These paradigms are not easily mappable to a production beamline environment, where data from a detector might be processed on a field-programmable gate array (FPGA), the motion control system might run on a real-time MCU, the acquisition control on a Windows operating system, and the scientist's client on a macOS laptop. The rest of the pipeline tasks might hop to several different architectures, including CPUs for latency bound tasks and GPUs for high throughput image processing and visualization. While frameworks such as Flink along with Kafka [31] (high throughput distributed message system) and ZooKeeper [32] (distributed coordination and management) can be adapted to fit the described processing environment, our solution at a lower level accomplishes the same task with fewer computational and human resources.

Nanosurveyor is a modular framework to support distributed [...] of data. The framework makes use of a modular infrastructure similar to Hummingbird [33], developed to monitor flash X-ray imaging experiments at free electron lasers (FELs) with high data rates in real time over multiple cores and nodes.

Within this framework, we developed a streamlined processing pipeline for ptychography which unifies all components involved and allows users to monitor and quickly act upon changes along the experimental and computational pipeline.

2 Real-time Streaming Framework
Nanosurveyor was developed to provide real-time feedback through analysis and visualization for experiments performed at synchrotron facilities, and to execute a complex set of operations within a production environment. Its design is such that it can be effectively adapted to different beamline environments. It is built around a client-server infrastructure allowing users to use facility resources while located at a beamline or remotely, operating on live data streamed from the beamline. Additionally, one can use the Nanosurveyor user interface for off-line processing of experimental data saved on disk. In this section we describe the resources and capabilities provided by the modular streaming infrastructure.

2.1 Modular Framework
As described above, Nanosurveyor is designed to be adaptable and modular. Therefore, we designed it with a client-server infrastructure (1) enabling users to run their experiment while at the beamline or remotely from their institution. This strategy also allows the client to be very light and flexible while the server can be scaled according to the resources needed.

The Nanosurveyor infrastructure equips each module with two fundamental capabilities. First, a description format language of key-value pairs allows every module to describe its input and output. Second, it provides the ability to describe the connection between the modules, including the front-end.

The capability to connect the communication path between modules allows the end-to-end pipeline to be constructed and described seamlessly. This is done through a proxy communication layer allowing the modules to run either closely together or on completely separate machines. This strategy is transparent to the beamline user and accommodates both environments with centralized resources and those where resources are spread across a network.

Additionally, as each module in the pipeline can be executed in its own environment, Nanosurveyor provides dynamic parallelism by allowing the user to [...]
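
The two module capabilities named in Section 2.1 (a key-value description of each module's input and output, and a description of the connections between modules) can be illustrated with a minimal sketch. Everything below is hypothetical: the module names, the key and type strings, the endpoint strings and the helper functions are assumptions made for illustration and are not taken from the Nanosurveyor code base. The endpoint strings only stand in for "same machine" versus "separate machine" placement and are never opened as network connections.

```python
from collections import deque

# Hypothetical key-value descriptions: each module lists its inputs and outputs.
MODULES = {
    "detector":    {"inputs": {}, "outputs": {"frame": "uint16[2d]"}},
    "preprocess":  {"inputs": {"frame": "uint16[2d]"},
                    "outputs": {"clean": "float32[2d]"}},
    "reconstruct": {"inputs": {"clean": "float32[2d]"},
                    "outputs": {"image": "complex64[2d]"}},
    "frontend":    {"inputs": {"image": "complex64[2d]"}, "outputs": {}},
}

# Hypothetical connection list:
# (source module, output key, destination module, input key, endpoint).
# The endpoint is a placeholder label for where the link would be routed.
CONNECTIONS = [
    ("detector", "frame", "preprocess", "frame", "local"),
    ("preprocess", "clean", "reconstruct", "clean", "remote:gpu-node"),
    ("reconstruct", "image", "frontend", "image", "remote:laptop"),
]


def validate(modules, connections):
    """Return a list of problems; an empty list means every link is consistent."""
    problems = []
    for src, out_key, dst, in_key, _endpoint in connections:
        out_type = modules.get(src, {}).get("outputs", {}).get(out_key)
        in_type = modules.get(dst, {}).get("inputs", {}).get(in_key)
        if out_type is None:
            problems.append(f"{src} has no output '{out_key}'")
        elif in_type is None:
            problems.append(f"{dst} has no input '{in_key}'")
        elif out_type != in_type:
            problems.append(
                f"{src}.{out_key} ({out_type}) does not match "
                f"{dst}.{in_key} ({in_type})"
            )
    return problems


def execution_order(modules, connections):
    """Order modules so each runs after its upstream modules (Kahn's algorithm)."""
    indegree = {name: 0 for name in modules}
    downstream = {name: [] for name in modules}
    for src, _out, dst, _in, _endpoint in connections:
        indegree[dst] += 1
        downstream[src].append(dst)
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for nxt in downstream[name]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(modules):
        raise ValueError("connections contain a cycle")
    return order


if __name__ == "__main__":
    print(validate(MODULES, CONNECTIONS))
    print(execution_order(MODULES, CONNECTIONS))
```

Keeping the descriptions as plain data, separate from the code that moves frames around, is one way a communication layer could place modules on the same machine or on different ones without the modules themselves changing. Whether Nanosurveyor structures its descriptions this way is not stated in the text.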