A visual programming approach for intuitive handling of bioprocess data

Robert Söldner*,1, Jonas Austerjost1, Simon Stumm1, David J. Pollard1
1 Sartorius AG, Corporate Research, August-Spindler-Straße 11, 37097
* Corresponding author: [email protected]

Introduction

Over the last decade, advances in robotics, automation and sensor technology have driven the expansion of high-throughput biology and automated bioprocessing workflows. As a result, datasets have grown dramatically in size, number and complexity. To process one typical experiment, an upstream bioprocess engineer or scientist may have to manipulate and wrangle datasets from several different sources, such as online bioreactor profiles and offline sample analysis. This data wrangling burden on end users often cancels out the efficiency gained from high-throughput tools. It has also created a need for data-aware scientists to step in and handle the wrangling and processing of complex datasets. This is inefficient: a streamlined workflow is needed so that repetitive data processing tasks no longer require experts. Visual programming is an innovative way to simplify and democratize basic data processing tasks.

Visual programming is a paradigm in which programs or scripts are created by connecting graphical elements and illustrations instead of writing complex computations as text. It allows an end user with no prior coding experience to describe a process using an intuitive drag-and-drop approach. Lab scientists can thus perform rapid data processing with functions that are already implemented as graphical building blocks. In addition, data scientists are freed to focus on challenges more complex than general data wrangling.

Approach

The visual programming application is based on the free and open-source KNIME Analytics Platform [1]. To build the components, several recurrent data wrangling tasks within an R&D bioprocessing laboratory were identified. For batch data from multi-parallel bioreactor systems, these tasks included the removal of undefined data points (NaN, Not a Number), forward filling of data points, data visualization and more. Based on these tasks, specific building components were implemented to cover each of them. Subsequently, the usability and intuitiveness of the custom components were tested by lab scientists with no prior coding experience.

Results

The implemented components are based on KNIME standard nodes, which were customized to fit the structure and content of the bioprocessing raw data. These standard nodes were then condensed into single components to reduce the complexity of the workflow (see Figure 1).

[Figure 1: panels A), B) and C); label: "Condensation". Table shown in the figure:]

Date&Time                 ID   Dissolved Oxygen conc.
2019-06-13 17:35:59,040   1    91,4165724839181
2019-06-13 17:36:10,677   1    91,4977221972207
2019-06-13 17:36:22,710   1    91,4828750193322
2019-06-13 17:36:34,827   1    91,4680310948919
2019-06-13 17:36:35,296   3    94,5678994962959
2019-06-13 17:36:47,292   1    91,4531904229532
2019-06-13 17:36:47,760   3    94,6518133572467
2019-06-13 17:36:59,290   1    91,4531904229532
2019-06-13 17:36:59,813   3    94,5834377966272
2019-06-13 17:37:00,287   4    95,6246640379906
2019-06-13 17:37:11,483   1    91,3572556621537
2019-06-13 17:37:11,952   3    94,6673653803266
2019-06-13 17:37:12,421   4    95,370168263591

Figure 1: Connected nodes (A), which depict the data wrangling process that turns multi-parallel bioreactor system batch raw data into a structured table with the desired features (B). The data wrangling workflow was simplified by condensing the processing nodes of flow A) into a single component (flow B).

This simplification makes it easy to combine workflow steps into “macro” elements, which improves the clarity of the application. Several of these overarching macro elements were implemented. They included the transfer of specific measured parameters (DO, pH, current volume) of a specific fermentation vessel into designated tables, as well as string-to-number conversion combined with a forward fill of the imported raw data.

Furthermore, data visualization features were introduced as part of the data processing “assembly line”. Again, the basic KNIME components were modified to fit the required visualization application (see Figure 2).

Figure 2: Connected nodes, which depict a data wrangling and visualization process for multi-parallel bioreactor system batch raw data. Plottable features of the preprocessed raw data file were selected and plotted as line plots.

All generated visual programming building blocks were evaluated by laboratory staff. The technique was identified as a fast and simple approach to basic data wrangling and visualization tasks.

Conclusion & Outlook

The presented visual programming approach was identified as a valuable tool for democratizing and accelerating the data wrangling and processing routine in a bioprocessing laboratory. Work that was previously performed by subject matter experts (SMEs) can now be carried out by all parties in a short amount of time. Although the concept was assessed on bioprocessing data, the approach is applicable to various kinds of structured or unstructured data. A unified and broadly agreed standard for analytical and process data could, however, make custom building blocks redundant. Future implementations aim to integrate data analysis components such as SIMCA-Q [2].

The concept of visual programming is not limited to data processing. Integrating existing visual wiring platforms for hardware, services and software into the laboratory and manufacturing space could facilitate and democratize equipment integration tasks and the setup of process automation pipelines.

References

[1] KNIME Analytics Platform - https://www.knime.com/knime-analytics-platform
[2] SIMCA-Q Data Analysis Software - https://umetrics.com/products/simca-q
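
Supplementary sketch

The wrangling steps named in the Approach and Results sections (string-to-number conversion, forward fill, removal of undefined data points, and one table per vessel) can also be illustrated outside KNIME. The following is a minimal plain-Python sketch of those steps, not the KNIME components described in this poster. It assumes that the second column of the Figure 1 table is a vessel ID and that values use a decimal comma. The function names `to_number`, `forward_fill` and `wrangle` are hypothetical, and the blank value in the third row is a made-up gap added only to show the forward fill.

```python
import math
from collections import defaultdict
from datetime import datetime

# Three rows are taken from the Figure 1 table; the blank value in the
# third row is a hypothetical gap (not in the source) to show forward fill.
RAW_ROWS = [
    ("2019-06-13 17:36:34,827", "1", "91,4680310948919"),
    ("2019-06-13 17:36:35,296", "3", "94,5678994962959"),
    ("2019-06-13 17:36:47,292", "1", ""),
    ("2019-06-13 17:36:47,760", "3", "94,6518133572467"),
]


def to_number(text):
    """Convert a decimal-comma string to float; blank or unparsable gives NaN."""
    try:
        return float(text.strip().replace(",", "."))
    except ValueError:
        return math.nan


def forward_fill(values):
    """Replace each NaN with the most recent non-NaN value before it."""
    filled, last = [], math.nan
    for value in values:
        if not math.isnan(value):
            last = value
        filled.append(last)
    return filled


def wrangle(rows):
    """Return {vessel_id: [(timestamp, value), ...]} with gaps forward filled."""
    per_vessel = defaultdict(list)
    for stamp, vessel, value in rows:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
        per_vessel[int(vessel)].append((when, to_number(value)))

    tables = {}
    for vessel, series in per_vessel.items():
        series.sort(key=lambda pair: pair[0])
        values = forward_fill([value for _, value in series])
        # Leading NaNs have nothing to fill from, so those rows are dropped.
        tables[vessel] = [
            (when, value)
            for (when, _), value in zip(series, values)
            if not math.isnan(value)
        ]
    return tables


if __name__ == "__main__":
    for vessel, table in sorted(wrangle(RAW_ROWS).items()):
        for when, value in table:
            print(vessel, when.isoformat(sep=" "), value)
```

Forward filling within each vessel, rather than across the whole file, keeps a gap in one vessel from being filled with a reading from another. Whether the KNIME components order the steps this way is not stated in the poster.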