
RESEARCH

Nanosurveyor: a framework for real-time data processing

Benedikt J. Daurer∗1, Hari Krishnan∗2, Talita Perciano2, Filipe R.N.C. Maia1, David A. Shapiro2, James A. Sethian2,3 and Stefano Marchesini2*

*Correspondence: [email protected], [email protected], [email protected]. 2 Lawrence Berkeley National Laboratory, Berkeley, CA, USA. Full list of author information is available at the end of the article.

Abstract
Scientists are drawn to synchrotrons and accelerator based light sources because of their brightness, coherence and flux. The rate of improvement in brightness and detector technology has outpaced Moore's law growth seen for computers, networks, and storage, and is enabling novel observations and discoveries with faster frame rates, larger fields of view, higher resolution, and higher dimensionality. Here we present an integrated software/algorithmic framework designed to capitalize on high throughput experiments, and describe the streamlined processing pipeline of ptychography data analysis. The pipeline provides throughput, compression, and resolution as well as rapid feedback to the microscope operators.

Keywords: streaming; ptychography

1 Introduction
When new drugs are synthesized [1], dust particles are brought back from space [2], or new superconductors are discovered [3], a variety of sophisticated X-ray microscopes, spectrometers and scattering instruments are often summoned to characterize their structure and properties. High resolution and hyperspectral X-ray imaging, scattering and tomography instruments at modern synchrotrons are among the workhorses of modern discovery, used to study nano-materials and characterize chemical interactions or electronic properties at their interfaces.

A new generation of microscopes is being pioneered, commissioned and planned at several U.S. Department of Energy (DOE) user facilities [4, 5, 6] and elsewhere to achieve superior resolution and contrast in three dimensions, encompassing a macroscopic field of view and chemical or magnetic sensitivity, by coupling together the brightest sources of tunable X-rays, nanometer positioning, nanofocusing lenses and faster detectors. Existing soft X-ray detector technology in use at the Advanced Light Source (ALS), for example, generates 350 MBytes/second per instrument [7], commercial detectors for hard X-rays can record 6 GB/second of raw data per detector [8, 9], and a synchrotron light source can support 40 or more experiments simultaneously, 24 hours a day. Furthermore, accelerator technology such as the multi-bend achromat [10] will increase brightness by two orders of magnitude around the globe [11, 12].

Transforming massive amounts of data into the sharpest images ever recorded will help mankind understand ever more complex nano-materials and self-assembled devices, and study the different length-scales involved in life - from macro-molecular machines to bones - where observing the whole picture is as important as recovering the local arrangement of the components. In order to do so, raw data needs to be reduced into meaningful images as rapidly as possible, using the fewest possible computational resources, to sustain ever increasing data rates.

Modern synchrotron experiments often have quite complex processing pipelines, iterating through many different steps until reaching the final output. One example of such an experiment is ptychography [15, 16, 17], which enables one to build up very large images by combining the large field of view of a high-precision scanning microscope system with the resolution provided by diffraction measurements.

Ptychography uses a small step size relative to the size of the illuminating beam when scanning the sample, continuously generating large redundant datasets that can be reduced into a high resolution image. The resolution of a ptychographic image does not depend on the size or shape of the illumination. X-ray wavelengths can probe atomic and subatomic scales, although resolution in scattering experiments is limited by other factors, such as radiation damage, exposure and brightness of the source, to a few nanometers, except in special cases (such as periodic crystals).

To reconstruct an image of the object from a series of X-ray scattering experiments, one needs to solve a difficult phase retrieval problem, because at short wavelengths it is only possible to measure the intensity of the photons on a detector. The phase retrieval problem is made tractable in ptychography by recording multiple diffraction patterns from overlapping regions of the object, providing redundant datasets to compensate for the lack of phase information. The problem is made even more challenging in the presence of noise, experimental uncertainties, optical aberrations, and perturbations of the experimental geometry, which require specialized solvers and software [18, 19, 20].

In addition to its complex reconstruction pipeline, a ptychography experiment involves additional I/O operations such as calibrating the detector, filtering raw data, and communicating parameters (such as X-ray wavelength, scan positions, detector distance and flux or exposure times) to the analysis infrastructure.

Large community driven projects have developed frameworks optimized for distributed data stream processing. Map-Reduce based solutions such as Hadoop [21, 22] and Spark [23] provide distributed I/O, a unified environment, and hooks for running map and reduce operations over a cloud-based network. Other frameworks such as Flink [24], Samza [25], and Storm [26] are tailored more towards real-time processing of tasks, executing a directed acyclic graph (DAG) [27] of operations as fast as possible. Workflow graphs such as Luigi [28] and Dask Distributed [29, 30] provide an iterative component, but are either optimized for batch processing or treat workers as a singular entity able to execute the DAG in its entirety. Such frameworks target operations as a unit of tasks and generalize the notion of resources; however, the resulting ecosystem is harder to decentralize. These paradigms are not easily mappable to a production beamline environment, where data from a detector might be running on a field-programmable gate array (FPGA), the motion control system on a real-time MCU, the acquisition control on Windows, and the scientist on a macOS laptop. The rest of the pipeline tasks might hop between several different architectures, including CPUs for latency bound tasks and GPUs for high throughput image processing and visualization. While frameworks such as Flink, along with Kafka [31] (a high throughput distributed message system) and ZooKeeper [32] (distributed coordination and management), can be adopted to fit the described processing environment, our lower-level solution accomplishes the same task with fewer computational and human resources.

Nanosurveyor is a modular framework to support distributed real-time analysis and visualization of data. The framework makes use of a modular infrastructure similar to Hummingbird [33], developed to monitor flash X-ray imaging experiments at free electron lasers (FELs) with high data rates in real time over multiple cores and nodes.

Within this framework, we developed a streamlined processing pipeline for ptychography which unifies all components involved and allows users to monitor and quickly act upon changes along the experimental and computational pipeline.

2 Real-time Streaming Framework
Nanosurveyor was developed to provide real-time feedback through analysis and visualization for experiments performed at synchrotron facilities, and to execute a complex set of operations within a production environment. Its design is such that it can be effectively adapted to different beamline environments. It is built around a client-server infrastructure allowing users to use facility resources while located at a beamline or remotely, operating on live data streamed from the beamline. Additionally, one can use the Nanosurveyor user interface for off-line processing of experimental data saved on disk. In this section we describe the resources and capabilities provided by the modular streaming infrastructure.

2.1 Modular Framework
As described above, Nanosurveyor is designed to be adaptable and modular. Therefore, we designed it with a client-server infrastructure (Figure 1) enabling users to run their experiment while at the beamline or remotely from their institution. This strategy also allows the client to be very light and flexible while the server can be scaled according to the resources needed.

The Nanosurveyor infrastructure equips each module with two fundamental capabilities. First, a key-value description format allows every module to describe its input and output. Second, it provides the ability to describe the connections between the modules, including the front-end.

The capability to connect the communication path between modules allows the end-to-end pipeline to be constructed and described seamlessly. This is done through a proxy communication layer allowing the modules to run either closely together or on completely separate machines. This strategy is transparent to the beamline user and accommodates both environments with centralized resources and those where resources are spread across a network.

Additionally, as each module in the pipeline can be executed in its own environment, Nanosurveyor provides dynamic parallelism by allowing the user to scale the number of resources available to each step: this is done by treating each stage as a worker pool that can be scaled up or down to address bottlenecks or performance issues.
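As an illustration of these two capabilities, a module and the pipeline connecting it could be declared with key-value pairs along the following lines. This is only a sketch: the field names, values and pipeline layout are invented for illustration and do not reproduce the actual Nanosurveyor description format.

# Hypothetical key-value description of one module and of the pipeline wiring.
# Names and values are illustrative only, not the real Nanosurveyor schema.
frame_worker = {
    "name": "frame_worker",
    "input": {"type": "raw_frame", "dtype": "uint16"},
    "output": {"type": "clean_frame", "dtype": "float32"},
    "resources": {"instances": 4, "gpu": False},
}

pipeline = {
    "modules": ["data_collector", "dark_worker", "frame_worker",
                "image_worker", "frontend"],
    # Connections between modules (including the front-end) are listed
    # explicitly, so the proxy layer can place them on one or many machines.
    "connections": [
        ("data_collector", "dark_worker"),
        ("data_collector", "frame_worker"),
        ("frame_worker", "image_worker"),
        ("image_worker", "frontend"),
    ],
}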

Figure 1 Overview of the real-time streaming framework of Nanosurveyor. The modular server-client infrastructure is divided into a back-end (running the data processing unit) and a front-end (running the visualization and control unit). Once an experiment has started, the data collection unit continuously receives new data packets from a detector and sends raw data frames to the data processing unit. Depending on the specific needs of the experiment, data is corrected, reduced and reconstructed, and various outputs are written to file. At all times, there is an active connection (asynchronous socket communication) between all components (including the visualization interface), allowing the user to monitor progress while data is still being acquired and processed.

2.2 Software Stack
The core components of the Nanosurveyor streaming software are written in Python using ZeroMQ, a high performance messaging library [34], for network communication, PyQt4 [35] and PyQtGraph [36] for the graphical user interface (GUI) and visualization, and NumPy [37] together with SciPy [38] for manipulation of data arrays. For some components, we used C extensions in order to boost the performance to meet the demands of producing a real-time interactive tool running at the beamline.

Python is a language with a robust and active community, with libraries that are well tested, supported, and maintained. Additionally, the choice of Python allows our infrastructure to be flexible to the varying requirements of different processing pipelines. The ptychography pipeline (discussed in detail later in the paper) contains GPU optimized code, and Python binding support easily allows the Nanosurveyor infrastructure to accommodate these types of hybrid architectures. The framework currently runs on Mac, Linux and Linux-based cluster environments, and can be extended to Windows platforms depending on support for module dependencies. The core components that Nanosurveyor depends on are available on all major platforms.

2.3 Communication
A critical component in generating usable real-time pipelines is the communication infrastructure. It enables a clear and concise separation of the inputs and outputs at the module level. Furthermore, it defines how modules communicate from beginning to end, and ensures that tasks are load-balanced to achieve the appropriate performance characteristics of the pipeline.

The communication in Nanosurveyor uses JavaScript Object Notation (JSON) [39], an industry standard way of conveying metadata between modules as well as between the front-end and back-end. The metadata provides a human readable component.

ZeroMQ provides the communication backbone of the Nanosurveyor infrastructure. Using the publisher-subscriber model for the core components enables Nanosurveyor to provide a load-balancing scheme, which uses a backlog queue to avoid losing data when sufficient resources are not available. The execution pipeline creates a command port and a data port. The command port allows metadata to reach and update parameters, as well as return responses to keep status requests alive and provide feedback on the current state of the running module. The data port moves data through the pipeline, running the actionable item within each module and moving the result to the output queue to be processed by the next stage of the pipeline.

Two types of configuration are required: front-end and back-end. The front-end configuration sets up the variables necessary for each module to function, while the back-end configuration is responsible for allocating resources, balancing the load of workers, scheduling activities, and communicating between modules while providing feedback to the front-end. These two configurations provide the Nanosurveyor infrastructure with the information it needs to establish the relevant connections, receive and send parameters to ensure proper configuration, and introspect the state of parameters and data to provide visual feedback to the user when running through the processing pipeline.

3 Client-Server architecture
The Nanosurveyor framework consists of an assortment of core components that ensure that the front-end provides an easy to use and adaptable interface while the back-end is efficient, resilient, and responsive. The individual processing modules are all based on the same structure: an event loop runs, routing data from the control and data sockets, waiting for tasks, asking the handler for configuration parameters (a JSON string), and processing data (receiving/sending through the data socket).
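The following Python sketch illustrates such a module event loop using PyZMQ. It is a simplified stand-in, not the actual Nanosurveyor code: it assumes a SUB socket for control messages and PULL/PUSH sockets for the data path, and the addresses, message layout and quit command are invented for illustration.

import json
import zmq

def run_module(process_frame,
               control_addr="tcp://localhost:5550",
               in_addr="tcp://localhost:5551",
               out_addr="tcp://localhost:5552"):
    # Minimal worker event loop with one control socket and two data sockets.
    ctx = zmq.Context()

    control = ctx.socket(zmq.SUB)            # configuration and commands
    control.connect(control_addr)
    control.setsockopt_string(zmq.SUBSCRIBE, "")

    source = ctx.socket(zmq.PULL)            # frames pushed by the previous stage
    source.connect(in_addr)

    sink = ctx.socket(zmq.PUSH)              # results pulled by the next stage
    sink.connect(out_addr)

    poller = zmq.Poller()
    poller.register(control, zmq.POLLIN)
    poller.register(source, zmq.POLLIN)

    params = {}
    while True:
        for socket, _ in poller.poll():
            if socket is control:
                msg = json.loads(control.recv_string())    # JSON metadata
                if msg.get("command") == "quit":
                    return
                params.update(msg.get("parameters", {}))
            else:
                meta, payload = source.recv_multipart()    # [metadata, raw bytes]
                result = process_frame(json.loads(meta), payload, params)
                sink.send_multipart(result)                # list of byte frames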

3.1 Back-end
The main back-end handler runs a single large ZeroMQ event loop. The main task of the handler is to register the modules that run on the back-end and to ensure that data and control paths are appropriately connected and running. It also does the following:
• Launches all the processing modules as separate processes (single-core or MPI) and keeps track of the jobs started. This can be done with a batch processing system such as SLURM (or any other queuing system) or by launching separate Python processes;
• Creates the sockets for the streaming pipeline, that is, a list of control and data sockets communicating between the handler and all the processing modules as well as the data collector and the interface;
• Runs the event loop, takes commands, deals out data packets and handles everything in the back-end, including user interruption and other control and configuration commands.

3.2 Data Tracking
Tracking and ensuring the correctness of data is an important part of the execution pipeline. The Nanosurveyor framework provides a module called nscxwrite which allows customized writing of files at different stages of the data acquisition pipeline (raw, filtered, and reconstructed). This capability provides several benefits, such as assurances to users that data moves correctly from module to module and is not corrupted along the way, as well as an ability to debug an algorithm that is executed within a complex sequence of events.

Furthermore, the ability to save intermediate data can be enabled or disabled (for performance reasons or to reduce storage) as well as customized. The framework also comes with a standalone script called nsraw2cxi, which translates raw detector data to processed CXI files, and a script to stream simulated FCCD data through the pipeline for testing. The data format of the output files follows the CXI file format [40] (a minimal example of such a file is sketched at the end of this section).

3.3 Logging
Nanosurveyor also provides a way to debug a complex pipeline through logging of both the output and error channels, which includes communication between modules as well as output and errors that arise from within modules. The output of all modules is piped to STDOUT and STDERR files within the file system running each process ($HOME/.nanosurveyor/streaming/log/). A helper tool invokes tail -f on the piped out/err files, making it possible to monitor what is going on within the individual processing modules.

3.4 Graphical User Interface
For the front-end, the framework provides a versatile GUI based on PyQt4 and PyQtGraph for monitoring, visualizing and controlling the data processed live or post-processed through the pipeline. PyQt4 (built on Qt) provides the ability to construct and modify the user interface to easily add and remove functionality, while PyQtGraph provides access to advanced visualization functionality for data that can be represented as images or volumes. Several common operations provided through the framework include:
• View the content of already processed files: inspect reconstructions from collected data and provide other useful utilities (histograms, error line plots, correlation plots, and others);
• Control and monitor the streaming: configure streaming, inspect the live reconstruction, monitor performance (upload/download rates, status updates of the streaming components);
• Simulate an experiment starting from an SEM image or similar;
• Process and inspect, through a provided interface, data from custom modules processed on the back-end (e.g. data from a ptychography or tomography reconstruction).

Generally speaking, the design facilitates adding new modules to the GUI, e.g. a viewer for tomograms or similar. This flexibility allows the front-end to be customized for different beamline processing environments.

Finally, the architecture aims to be modular in the front and back-end of the client-server architecture, meaning that there is a common structure for the basic features of a processing module. Additionally, in principle, any given processing module can be hooked into this network (e.g. tomography, spectral analysis, or any other image analysis).
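As an indication of what such an output file can look like, the following sketch writes a stack of frames and scan positions into a CXI-style HDF5 file with h5py. The group layout is a minimal subset of the CXI conventions [40] chosen for illustration; the actual nscxwrite module records more datasets and metadata.

import h5py
import numpy as np

def write_cxi(filename, frames, positions, wavelength, detector_distance):
    # Minimal CXI-style output (illustrative subset of the format).
    with h5py.File(filename, "w") as f:
        f.create_dataset("cxi_version", data=140)
        entry = f.create_group("entry_1")

        instrument = entry.create_group("instrument_1")
        source = instrument.create_group("source_1")
        source.create_dataset("wavelength", data=wavelength)         # metres
        detector = instrument.create_group("detector_1")
        detector.create_dataset("distance", data=detector_distance)  # metres
        detector.create_dataset("data", data=np.asarray(frames),
                                compression="gzip")                  # frame stack

        sample = entry.create_group("sample_1")
        geometry = sample.create_group("geometry_1")
        geometry.create_dataset("translation", data=np.asarray(positions))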

4 Streaming ptychography
We adapted the streaming framework outlined above to the specific needs of ptychography, and are currently implementing this ptychography streaming pipeline at the beamline for scanning transmission X-ray microscopy (STXM) at the Advanced Light Source (ALS). The main motivation for this project is to make high-resolution ptychographic reconstructions available to the user in real time. To achieve this goal, we streamlined all relevant processing components of ptychography into a single unit. A detailed outline of our pipeline is sketched in Figure 2.

Figure 2 Streaming pipeline implemented at the ALS for ptychographic imaging. The software structure follows the same logic as sketched in Figure 1. Once a new scan has been triggered by the experimental control, a frame-grabber continuously receives raw data packets from the camera, assembles them into frames and sends raw frames to the back-end. Incoming frames are processed by different (and independent) workers of the back-end, and reduced data is sent back to the front-end and visualized in a graphical user interface (GUI). A handler coordinates the data and communication workflow.

Figure 3 Graphical User Interface (GUI) for the ptychographic streaming pipeline implemented at the ALS. The interface provides (a) a real-time view of the ptychographic reconstruction (high resolution), (b) a real-time view of the STXM analysis (low resolution), (c) the current guess of the illumination function, (d) the current processed data frame, (e) logging and error messages and (f) error metrics of the iterative reconstruction process, along with other control and monitoring elements.

As described in the previous sections, we follow the idea of a modular streaming network using a client-server architecture, with a back-end for the ptychographic processing pipeline and a front-end for configuration, control and visualization purposes.

On the back-end side, the streaming infrastructure is composed of a communication handler and four different kinds of workers addressing dark frames, diffraction frames, reduced and downsampled images, and the ptychographic reconstruction using a software package for scalable heterogeneous adaptive real-time ptychography (SHARP [20]). The handler bridges the back-end with the front-end and controls the communication and data flow among the different back-end workers. The dark worker accumulates dark frames and provides statistical maps (mean and variance) of the noise structure on the detector. The frame workers transform raw frames into clean (pre-processed) diffraction frames. This involves a subtraction of the average dark, filtering, photon counting and downsampling. Depending on the computing capacities of the back-end, it is possible to run as many frame workers simultaneously as needed. The image worker reduces a collection of clean diffraction frames, producing low-resolution image reconstructions and an initial estimate of the illumination function which, together with the clean diffraction frames, is then fed as input to the high-resolution ptychographic reconstruction worker (SHARP).

The front-end consists of a worker that reads raw data frames from a fast charge-coupled device (FCCD) [41], coordinating with a separately developed interface for controlling the experiment (such as motors and shutters), and a graphical user interface (GUI) which is used both for visualizing and controlling the ongoing reconstruction. An example view of the GUI for streaming ptychography is shown in Figure 3.

Following the data flow along the streaming pipeline, the starting trigger comes from the control interface, which initiates a new ptychographic scan, providing information about the scan (step size, scan pattern, number of scan points) and other relevant information (e.g. wavelength) to the back-end handler. Simultaneously, the control sends triggers to the scanning motors and the FCCD. A typical ptychographic scan combines the accumulation of a given number of dark frames with scanning the sample over a region of interest.
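The low-resolution STXM view produced by the image worker can be illustrated with a few lines of NumPy: the total transmitted intensity of each clean diffraction frame is mapped back onto the scan grid, and a common heuristic initializes the illumination from the square root of the average diffraction pattern. This is only a sketch of the idea, not the actual image-worker implementation.

import numpy as np

def stxm_image(frames, scan_shape):
    # Sum each diffraction frame and reshape onto the (rows, cols) scan grid.
    totals = np.array([frame.sum() for frame in frames])
    return totals.reshape(scan_shape)

def initial_probe(frames):
    # Heuristic illumination guess: inverse transform of the square root of
    # the average (centered) diffraction intensity, with zero starting phase.
    mean_intensity = np.mean(np.asarray(frames, dtype=np.float64), axis=0)
    return np.fft.fftshift(
        np.fft.ifft2(np.fft.ifftshift(np.sqrt(mean_intensity))))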
The frame-grabber, already waiting for raw data packets to arrive, assembles the data and sends it frame-by-frame to the back-end handler. When dealing with an acquisition control system that runs independently, the handler can distinguish between dark and data frames using counters. Dark and data frames are then distributed to the corresponding workers.
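The statistical maps provided by the dark worker can be accumulated frame by frame, so that no dark frames need to be kept in memory. The following NumPy sketch shows the idea; the class name and interface are invented for illustration and do not correspond to the actual worker code.

import numpy as np

class DarkAccumulator:
    # Running per-pixel mean and variance of the accumulated dark frames.
    def __init__(self):
        self.count = 0
        self.total = None
        self.total_sq = None

    def add(self, frame):
        frame = frame.astype(np.float64)
        if self.total is None:
            self.total = np.zeros_like(frame)
            self.total_sq = np.zeros_like(frame)
        self.count += 1
        self.total += frame
        self.total_sq += frame * frame

    def mean(self):
        return self.total / self.count

    def variance(self):
        # Pixels with unusually large (or exactly zero) variance can later be
        # flagged as bad, as in step 2 of the pre-processing scheme below.
        return self.total_sq / self.count - self.mean() ** 2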

Having clean diffraction frames and an initial guess for the illumination ready, the SHARP worker is able to start the iterative reconstruction process. SHARP initializes and allocates space to hold all frames in a scan, computes a decomposition scheme, initializes the image and starts the reconstruction process. Unmeasured frames are either set to a bright-field frame (measured by removing the sample) or their weight is set to 0 until real data is received.

Depending on the configuration, data at different states within the streaming flow can be displayed in the GUI and/or saved to a CXI file via the nscxiwrite worker module.

All components of the streaming interface run independent event loops and use asynchronous (non-blocking) socket communication. To maximize performance, the front-end operates very close to the actual experiment, while the back-end runs remotely on a powerful GPU/CPU cluster.

4.1 Pre-processing of FCCD data
We developed the following processing scheme for denoising and cleaning the raw data from the FCCD and preparing frames for the ptychographic reconstruction (a condensed NumPy sketch of some of these steps follows after the list):
1. Define the center (acquire some diffraction frames and compute the center of mass if needed). This is needed for cropping, and to deal with beamstop transmission;
2. Average dark frames: we first acquire a sequence of frames when no light is present, and compute the average and standard deviation of each pixel and readout block. We set a binary threshold to define bad (noisy) pixels or bad (noisy) ADC channels, when the standard deviation is above a threshold or if the standard deviation is equal to 0;
3. Remove the offset using the overscan (linear or quadratic): stretch out the read-out sequence in time and fit a second order polynomial over the overscan;
4. Identify the background by thresholding;
5. Perform a Fourier transform of the readout sequence of the background for each channel, remove high frequency spikes by thresholding, and subtract from the data;
6. Threshold signal below 1 photon;
7. Divide by the beamstop transmission;
8. Crop the image around the center;
9. Downsample: take a fast Fourier transform (FFT), crop or multiply by a kernel (e.g. Gaussian), and take the inverse fast Fourier transform (IFFT).
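To make the scheme above concrete, the following NumPy sketch strings together a few of these steps (dark subtraction and bad-pixel masking, single-photon thresholding, cropping and Fourier-space downsampling). It is a condensed illustration rather than the actual frame-worker code: the overscan fit, background filtering and beamstop correction are omitted, and the photon gain and frame sizes are placeholder values.

import numpy as np

def clean_frame(raw, dark_mean, bad_pixels, center, crop=512, out=256,
                adu_per_photon=30.0):
    # Condensed sketch of steps 2, 6, 8 and 9 of the scheme above.
    # Step 2: subtract the average dark and zero out flagged bad pixels.
    frame = raw.astype(np.float64) - dark_mean
    frame[bad_pixels] = 0.0

    # Step 6: discard signal below one photon, then convert ADU to photons.
    frame[frame < adu_per_photon] = 0.0
    frame /= adu_per_photon

    # Step 8: crop a square region around the diffraction center.
    cy, cx = center
    half = crop // 2
    frame = frame[cy - half:cy + half, cx - half:cx + half]

    # Step 9: downsample by cropping in Fourier space (FFT, crop, inverse FFT).
    spectrum = np.fft.fftshift(np.fft.fft2(frame))
    h = out // 2
    spectrum = spectrum[half - h:half + h, half - h:half + h]
    return np.abs(np.fft.ifft2(np.fft.ifftshift(spectrum)))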

4.2 Simulation
For testing the functionality and performance of the streaming ptychography pipeline, as well as exploring different configurations, we developed a protocol that simulates an entire ptychography scan. Using a simulated illumination from a Fresnel zone plate (FZP) and basic scan parameters (number of scan points, scanning step size, scanning pattern), diffraction patterns from a well-known test sample are calculated in the same raw data format as those generated by the FCCD. As a last step, Poisson noise and a real background are added to the data. These raw data packets, together with the simulated metadata, are introduced into the end-to-end streaming pipeline and produce outputs as shown in Figure 3.

One major benefit of this feature is the ability to scale and test the pipeline at different acquisition rates and therefore to provide performance metrics on the behavior of a sequence of algorithms, enabling developers to further improve their execution pipeline. In a simple performance test, we simulated a 40x40 scan producing 1600 raw data frames which were sent by a virtual FCCD at a rate of 10 Hz. At the end of the pipeline, we observed a complete reconstructed image after around 5 minutes. This translates into a streaming pipeline rate of about 2 Hz, with most of the time spent on filtering and cleaning the individual frames. A significant portion of the pre-processing time is unique to the FCCD pipeline. While this rate is still far from ideal, it can easily be sped up and scaled by using parallel execution, load-balancing strategies, and eventually high throughput GPU optimizations. With further improvements in the performance of the individual components, as well as optimization of the network communication, we expect a substantial increase in the processing rate.

4.3 Experimental Data
Experimental data produced by the FCCD can involve missing frames, corrupted frames, and timing issues between different hardware and software components. In addition, the correct choice of parameter values for the ptychographic reconstruction might be inherent to the data itself and can thus vary from experiment to experiment. To make Nanosurveyor more robust for such cases, it is desirable to expose configuration parameters as a runtime or heuristic feature rather than determining them at execution time, and to take a more data-based approach where options are set based on feature detection.

5 Considerations
Performance considerations and additional limitations must be understood and taken into account when integrating such an execution pipeline into a production environment. While the following list is not comprehensive, in building this environment we have considered the following:
• Limits (performance, algorithm, memory, disk) of software and hardware need to be considered. The Nanosurveyor infrastructure provides logging support, while the ZeroMQ publisher-subscriber model allows a stuck or crashed process to be replaced with another. The current solution can be made more robust, and this is active, ongoing work;
• Hardware failures are inevitable in a production environment involving machinery. Recovery from these types of issues requires customization for each beamline environment. Within Nanosurveyor, there is a heartbeat for each module and a base mechanism within the framework to inform the user that a failure (or multiple failures) might have occurred;
• Interrupting experiments should be a core use case of any real-time feedback loop when trying to understand the data as quickly as possible. Once information about the material is flowing through the computational pipeline, it is valuable to be able to determine if an experiment is, in fact, failing or uninteresting. This can occur in many ways, such as a wrong setup, wrong material or wrong region of scanning. For these scenarios it is prudent for a working pipeline to be able to abort, clear out the pipeline, and reset itself;
• Expensive operations and algorithms executed in a beamline operating environment may have varying degrees of performance (see the bullet point on limits above). These characteristics can often slow down the overall pipeline if any one of the operations is inefficient. Nanosurveyor attempts to get around this issue in two ways: first, it allows for a load-balancing approach where more workers can be added to the expensive stages of the pipeline; second, using the ZeroMQ queue, the beamline can still operate with the slowdown and backlog while ensuring that the pipeline can continue to function, at least until hardware memory runs out. This issue can also be mitigated by evaluating the performance of the module and, if possible, optimizing the algorithm as well.

6 Conclusions & Future Work
This work introduced Nanosurveyor - a framework for real-time processing at synchrotron facilities. The infrastructure provides a modular framework, support for load-balancing operations, the ability to run in a distributed client-server mode, and feedback on each stage of a complex pipeline.

The framework was adapted to support streamlined pipelines for ptychography. In this case, expensive stages such as pre-processing are load-balanced with multiple workers, and the image reconstruction is parallelized over MPI to compute efficiently in a distributed manner. Results from every stage of the pipeline are then transmitted to the front-end, providing users at the beamline with comprehensive knowledge of the experiment and of how the data is transformed from the start of acquisition to the end output. Although the Nanosurveyor framework provides several core capabilities that are necessary for operating at typical beamlines, there are several key advances that we are currently working on to make the computational pipeline complete. A couple of highlights include:

Iterative execution, instrument control: Adding support for controlling the beamline itself will complete the current pipeline and provide an iterative execution loop, enabling future pipelines to adaptively acquire and analyze data from the operating beamline and automatically request more data when necessary. For example, if the reconstruction detects bad frames, or that the sample has drifted, then more frames can be automatically requested on the fly without interrupting the overall experiment. If the reconstruction determines that part of the image being acquired is empty or uninteresting, it could request fewer frames and focus on the relevant part of the sample.

Optimizing pipeline execution: Currently communication occurs over ZeroMQ, providing many benefits, including dealing with backlog, automated load-balancing, and the ability to interleave work running different stages of the execution pipeline. We are also investigating ways to fuse modules to optimize execution times. Making communication agnostic by using handles enables efficient use of memory optimization strategies, socket communication, or savings on data movement costs, e.g., transferring data between GPU-based modules by moving a pointer rather than copying data.

In conclusion, we have presented a framework that is built to run at modern beamlines, can handle the geographic separation between users and experiments running at synchrotron facilities, and supports real-time feedback. These features, along with the modular design, provide a foundation that can be extended and readily deployed on many of the beamlines in use today. Further information about Nanosurveyor is available at http://www.camera.lbl.gov/software or upon request to [email protected].

Competing interests
The authors declare that they have no competing interests.

Author's contributions
BJD, HK, TP, FM and SM designed and implemented the real-time streaming framework. BJD, HK, TP, JAS and SM wrote the manuscript with contributions from all. DAS translated the preprocessing code from MATLAB to Python and helped us test the streaming ptychography framework at the ALS.

Acknowledgements
This work was partially funded by the Center for Applied Mathematics for Energy Research Applications, a joint ASCR-BES funded project within the Office of Science, US Department of Energy, under contract number DOE-DE-AC03-76SF00098, by the Swedish Research Council and by the Swedish Foundation for Strategic Research. The Advanced Light Source is supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Author details
1 Laboratory of Molecular Biophysics, Department of Cell and Molecular Biology, Uppsala University, Uppsala, SE. 2 Lawrence Berkeley National Laboratory, Berkeley, CA, USA. 3 Department of Mathematics, University of California, Berkeley, Berkeley, CA, USA.

References
1. Cavalier, M.C., Pierce, A.D., Wilder, P.T., Alasady, M.J., Hartman, K.G., Neau, D.B., Foley, T.L., Jadhav, A., Maloney, D.J., Simeonov, A., et al.: Covalent small molecule inhibitors of Ca2+-bound S100B. Biochemistry 53(42), 6628–6640 (2014)
2. Westphal, A.J., Stroud, R.M., Bechtel, H.A., Brenker, F.E., Butterworth, A.L., Flynn, G.J., Frank, D.R., Gainsforth, Z., Hillier, J.K., Postberg, F., et al.: Evidence for interstellar origin of seven dust particles collected by the Stardust spacecraft. Science 345(6198), 786–791 (2014)
3. Uchiyama, H., Shen, K., Lee, S., Damascelli, A., Lu, D., Feng, D., Shen, Z.-X., Tajima, S.: Electronic structure of MgB2 from angle-resolved photoemission spectroscopy. Physical Review Letters 88(15), 157002 (2002)
4. Nazaretski, E., Huang, X., Yan, H., Lauer, K., Conley, R., Bouet, N., Zhou, J., Xu, W., Eom, D., Legnini, D., Harder, R., Lin, C.-H., Chen, Y.-S., Hwu, Y., Chu, Y.S.: Design and performance of a scanning ptychography microscope. Review of Scientific Instruments 85(3) (2014). doi:10.1063/1.4868968
5. Winarski, R.P., Holt, M.V., Rose, V., Fuesz, P., Carbaugh, D., Benson, C., Shu, D., Kline, D., Stephenson, G.B., McNulty, I., et al.: A hard x-ray nanoprobe beamline for nanoscale microscopy. Journal of Synchrotron Radiation 19(6), 1056–1060 (2012)
6. Shapiro, D., Roy, S., Celestre, R., Chao, W., Doering, D., Howells, M., Kevan, S., Kilcoyne, D., Kirz, J., Marchesini, S., et al.: Development of coherent scattering and diffractive imaging and the COSMIC facility at the Advanced Light Source. In: Journal of Physics: Conference Series, vol. 425, p. 192011 (2013). IOP Publishing
7. Doering, D., Chuang, Y.-D., Andresen, N., Chow, K., Contarato, D., Cummings, C., Domning, E., Joseph, J., Pepper, J.S., Smith, B., Zizka, G., Ford, C., Lee, W.S., Weaver, M., Patthey, L., Weizeorick, J., Hussain, Z., Denes, P.: Development of a compact fast CCD camera and resonant soft x-ray scattering endstation for time-resolved pump-probe experiments. Review of Scientific Instruments 82(7), 073303 (2011). doi:10.1063/1.3609862
8. Broennimann, C., Eikenberry, E.F., Henrich, B., Horisberger, R., Huelsen, G., Pohl, E., Schmitt, B., Schulze-Briese, C., Suzuki, M., Tomizaki, T., Toyokawa, H., Wagner, A.: The PILATUS 1M detector. Journal of Synchrotron Radiation 13(2), 120–130 (2006). doi:10.1107/S0909049505038665
9. Dinapoli, R., Bergamaschi, A., Henrich, B., Horisberger, R., Johnson, I., Mozzanica, A., Schmid, E., Schmitt, B., Schreiber, A., Shi, X., et al.: EIGER: Next generation single photon counting detector for x-ray applications. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 650(1), 79–83 (2011)
10. Eriksson, M., Al-dmour, E., Ahlbäck, J., Andersson, Å., Bocchetta, C., Johansson, M., Kumbaro, D., Leemann, S., Lilja, P., Lindau, F., et al.: The MAX IV facility. In: Journal of Physics: Conference Series, vol. 425, p. 072008 (2013). IOP Publishing
11. Almer, J., Chupas, P., Stephenson, B., Tiede, D., Vogt, S., Young, L., Evans, P., Parise, J., Suter, B.: Emerging opportunities in high-energy x-ray science: The diffraction-limited storage ring frontier. Synchrotron Radiation News 29(1), 12–13 (2016). doi:10.1080/08940886.2016.1124675
12. Reich, E.S., et al.: Ultimate upgrade for US synchrotron. Nature 501(7466), 148–149 (2013)
13. Tarawneh, H., Steier, C., Falcone, R., Robin, D., Nishimura, H., Sun, C., Wan, W.: ALS-II, a potential soft x-ray, diffraction limited upgrade of the Advanced Light Source. In: Journal of Physics: Conference Series, vol. 493, p. 012020 (2014). IOP Publishing
14. Borland, M., Sajaev, V., Sun, Y.: A seven-bend-achromat lattice as a potential upgrade for the Advanced Photon Source. In: Proc. of NA-PAC2013, MOPHO07, Pasadena, California, USA (2013)
15. Rodenburg, J.M.: Ptychography and related diffractive imaging methods. Advances in Imaging and Electron Physics 150 (2008)
16. Rodenburg, J.M., Hurst, A.C., Cullis, A.G., Dobson, B.R., Pfeiffer, F., Bunk, O., David, C., Jefimovs, K., Johnson, I.: Hard-x-ray lensless imaging of extended objects. Phys. Rev. Lett. 98, 034801 (2007). doi:10.1103/PhysRevLett.98.034801
17. Thibault, P., Dierolf, M., Menzel, A., Bunk, O., David, C., Pfeiffer, F.: High-resolution scanning x-ray diffraction microscopy. Science 321(5887), 379–382 (2008). doi:10.1126/science.1158573
18. Nashed, Y.S.G., Vine, D.J., Peterka, T., Deng, J., Ross, R., Jacobsen, C.: Parallel ptychographic reconstruction. Opt. Express 22(26), 32082–32097 (2014). doi:10.1364/OE.22.032082
19. ptypy. http://ptycho.github.io/ptypy/
20. Marchesini, S., Krishnan, H., Daurer, B.J., Shapiro, D.A., Perciano, T., Sethian, J.A., Maia, F.R.N.C.: SHARP: a distributed GPU-based ptychographic solver. Journal of Applied Crystallography 49(4), 1245–1252 (2016). doi:10.1107/S1600576716008074
21. Apache Hadoop. http://hadoop.apache.org/
22. White, T.: Hadoop: The Definitive Guide, 1st edn. O'Reilly Media, Inc., Sebastopol, CA (2009)
23. Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: Cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud'10, pp. 10–10. USENIX Association, Berkeley, CA, USA (2010). http://dl.acm.org/citation.cfm?id=1863103.1863113
24. Apache Flink. http://flink.apache.org/
25. Apache Samza. http://samza.apache.org/
26. Apache Storm. http://storm.apache.org/
27. Jensen, F.V.: An Introduction to Bayesian Networks, vol. 210. UCL Press, London (1996)
28. Luigi: A workflow engine in Python. https://luigi.readthedocs.io/en/stable
29. Dask Development Team: Dask: Library for Dynamic Task Scheduling (2016). http://dask.pydata.org
30. Rocklin, M.: Dask: Parallel computation with blocked algorithms and task scheduling. In: Huff, K., Bergstra, J. (eds.) Proceedings of the 14th Python in Science Conference, pp. 130–136 (2015)
31. Apache Kafka. http://kafka.apache.org/
32. Apache ZooKeeper. http://zookeeper.apache.org/
33. Daurer, B.J., Hantke, M.F., Nettelblad, C., Maia, F.R.: Hummingbird: monitoring and analyzing flash x-ray imaging experiments in real time. Journal of Applied Crystallography 49(3) (2016)
34. Hintjens, P.: ZeroMQ: Messaging for Many Applications. O'Reilly Media, Inc. (2013)
35. Riverbank Computing: PyQt4 (2016). http://www.riverbankcomputing.com/software/pyqt
36. Campagnola, L.: PyQtGraph (2016). http://pyqtgraph.org
37. van der Walt, S., Colbert, S.C., Varoquaux, G.: The NumPy array: A structure for efficient numerical computation. Computing in Science & Engineering 13(2), 22–30 (2011). doi:10.1109/MCSE.2011.37
38. Jones, E., Oliphant, T., Peterson, P., et al.: SciPy: Open source scientific tools for Python (2001–). http://www.scipy.org/ [Online; accessed 2016-04-07]
39. ECMA International: Standard ECMA-404 (2016). http://www.ecma-international.org/publications/standards/Ecma-404.htm
40. Maia, F.R.N.C.: The Coherent X-ray Imaging Data Bank. Nature Methods 9(9), 854–855 (2012). doi:10.1038/nmeth.2110
41. Denes, P., Doering, D., Padmore, H., Walder, J.-P., Weizeorick, J.: A fast, direct x-ray detection charge-coupled device. Review of Scientific Instruments 80(8), 083302 (2009)