FishInspector KNIME Workflows Guide

Copyright © 2016-2018 Elisabet Teixido, Stefan Scholz - Helmholtz Centre for Environmental Research - UFZ

(www.ufz.de)

Contents
1 KNIME installation
1.1 Install Extensions and Integrations
1.2 Additional KNIME Image Processing Plugins
2 Install R packages
2.1 Installing R packages in RStudio
2.2 Required R packages
2.2.1 Momocs package installation
2.3 Path to R Home in KNIME
3 FishInspector features workflow
3.1 Instructions
3.2 Description of the output data
4 Control variability workflow
4.1 Instructions
5 Concentration-response analysis, FishInspector endpoints with threshold values
5.1 Instructions

1 KNIME installation

1. Go to the download page to start downloading KNIME Analytics Platform.
2. The download page shows three tabs, which can be opened individually:
   o Register for Help and Updates: here you can optionally provide some personal information and sign up to the mailing list to receive the latest KNIME news
   o Download KNIME: this is where you can download the software
   o Getting Started: this tab gives you information and links about what you can do after you have installed KNIME Analytics Platform
3. Now open the Download KNIME tab and click the installation option that fits your operating system. Notes on the different options for Windows:
   o The Windows installer extracts the compressed installation folder, adds an icon to your desktop, and suggests suitable memory settings.
   o The self-extracting archive simply creates a folder containing the KNIME installation files. You don’t need any archiving software.
   o The zip archive can be downloaded, saved, and extracted to your preferred location on a system to which you have full access rights.
4. Read and accept the privacy policy and the terms and conditions. Then click Download.
5. Once the download has finished, install KNIME Analytics Platform.

1.1 Install Extensions and Integrations

The following extensions are used by the KNIME workflows:
- Interactive R Statistics Integration
- Quick Forms
- Image Processing and ImageJ Integration (beta)
- JFreeChart
- Vernalis KNIME Nodes

Install the extensions as follows:

● Click "File" on the menu bar and then "Install KNIME Extensions…"

Figure 1. Installing Extensions and Integrations

● Type the name of the extension in the filter text box and/or select the extensions you want to install
● Click "Next" and follow the instructions
● Restart KNIME Analytics Platform


1.2 Additional KNIME Image Processing Plugins

Follow the steps below to install additional plugins, such as ImageJ, which are required by the workflow that rotates and crops the images and adds a virtual capillary to them.

● Start KNIME
● Click on “Help” -> “Install New Software”
● Click on “Manage...”

● Activate the Stable Community Contributions Update-Site


● Click on “Apply and Close” to confirm your settings
● In KNIME, click on “File” -> “Install KNIME Extensions”
● Select KNIME Community Contributions -> Imaging -> KNIME Image Processing - ImageJ Integration (Beta)
● Click "Next" to install the plugin

2 Install R packages

If you don’t have R installed, download and install R.

Packages can be installed with the install.packages() function in R. To install a single package, pass its name to install.packages() as the first argument. For example, the following command installs the ggplot2 package from CRAN:

    install.packages("ggplot2")

This command downloads the ggplot2 package from CRAN and installs it on your computer. Any packages on which this package depends will also be downloaded and installed.

You can install multiple R packages at once with a single call to install.packages() by placing the package names in a character vector. You may simply copy and paste the command below, which installs Rserve, ggplot2 and devtools:

    install.packages(c("Rserve", "ggplot2", "devtools"))

2.1 Installing R packages in RStudio

● Select “Tools → Install Packages”
● Provide the name of the package and the path to the R library

2.2 Required R packages

Required for KNIME:
- Rserve
For plots:
- ggplot2 [1]
Morphometric analysis:
- Momocs [2]
- features [3]
- sp [4]
Concentration-response analysis:
- drc [5]
- qpcR [6]
- DescTools [7]
Correlation analysis:
- corrplot [8]
- Hmisc [9]
- PerformanceAnalytics [10]

[1] H. Wickham (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
[2] Vincent Bonhomme, Sandrine Picq, Cedric Gaucherel, Julien Claude (2014). Momocs: Outline Analysis Using R. Journal of Statistical Software, 56(13), 1-24.
[3] Ravi Varadhan, Johns Hopkins University, MKG Subramaniam and AT&T Research Labs (2015). features: Feature Extraction for Discretely-Sampled Functional Data. R package version 2015.12-1.
[4] E.J. Pebesma, R.S. Bivand (2005). Classes and methods for spatial data in R. R News 5(2).
[5] C. Ritz, F. Baty, J.C. Streibig, D. Gerhard (2015). Dose-Response Analysis Using R. PLOS ONE, 10(12), e0146021.
[6] Andrej-Nikolai Spiess (2018). qpcR: Modelling and Analysis of Real-Time PCR Data. R package version 1.4-1.
[7] Andri Signorell et mult. al. (2018). DescTools: Tools for descriptive statistics. R package version 0.99.25.
[8] Taiyun Wei and Viliam Simko (2017). R package "corrplot": Visualization of a Correlation Matrix (Version 0.84).
[9] Frank E Harrell Jr, with contributions from Charles Dupont and many others (2018). Hmisc: Harrell Miscellaneous.
[10] Brian G. Peterson and Peter Carl (2018). PerformanceAnalytics: Econometric Tools for Performance and Risk Analysis.

2.2.1 Momocs package installation

In R, type:

    install.packages("devtools")
    devtools::install_github("vbonhomme/Momocs")

2.3 Path to R Home in KNIME

Make sure that the path to R home in KNIME is correct:
● Open KNIME and go to “File → Preferences”
● In the Preferences window, go to KNIME → R
● Check the path, or browse to enter the correct path to R. This should be your R home directory, e.g. C:\Program Files\R\R-3.4.0.

3 FishInspector features workflow

This workflow extracts the data from the JSON files generated by the FishInspector software and analyses the following endpoints:
- Yolk sac size
- Eye size
- Pericard size
- Otolith-eye distance (for 96 hpf) or head-trunk angle (for 48 hpf)
- Maximum tail curvature and three tail angles (at equidistant points along the notochord)
- Body length
- Pigmentation
- Head size
- Lower jaw distance and mandibular jaw distance (for 96 hpf)

3.1 Instructions

The nodes that must be modified are located in the yellow “INPUT” box.

1. Double-click the List Files node and browse to the folder that contains all the JSON files.

2. Double-click the Single Selection node and select the embryo stage under Default Value: 48hpf or 96hpf.

NOTE: You may also analyse other embryo stages. This selection affects the analysis of the jaw features and the swim bladder, which are only analysed if 96hpf is selected. Head size is also analysed differently for 48 h old embryos. See the supplementary information in our manuscript [11].

[11] Teixidó E, Kießling TR, Krupp E, Quevedo C, Muriana A, Scholz S (2019). Automated Morphological Feature Assessment for Zebrafish Embryo Developmental Toxicity Screens. Toxicol Sci 167(2), 438-449.

Important: If you select 96hpf, you must have marked the lower jaw tip (manual point) in FishInspector for all your embryos; otherwise the workflow will fail.

3. Configure your plate layout and enter the test concentrations, or load a plate layout with the XLS Reader node.

The plate layout must contain at least the following four columns:

Column          Cell type   Content
Treatment       String      treatment level
Concentration   Double      tested concentration
Units           String      units of concentration
Well            String      well/name of the image

4. Double-click the Scale selection wrapped node and select the type of input pictures (VAST, LAS or MANUAL). This determines the scale used to convert pixels to mm.

Note: VAST and LAS are predefined scales that we use. Select MANUAL to enter your own scale in the Manual scale box (the number of pixels that correspond to 1 mm in your pictures). The default is 1, i.e. no scale conversion: the results will be in pixels.

5. Optionally, enter metadata, e.g. the tested compound, the experiment number or the CAS number.

6. Define your output file by entering the file name in the XLS Writer node with the yellow border. Save it as .xls so that the images are displayed correctly. The file name should end with a suffix indicating the stage used for the analysis, either _48hpf.xls or _96hpf.xls.

7. Execute the workflow.
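Steps 4 and 6 can be illustrated with a short Python sketch that is not part of the KNIME workflow. It assumes that the scale is expressed as pixels per mm, as in the Manual scale box, so lengths are divided by the scale and areas (such as the yolk sac area in pixels) by its square; with the default scale of 1 the values stay in pixels. The function names are hypothetical, and the suffix check simply mirrors the naming rule from step 6.

```python
import re

# Hypothetical helpers; px_per_mm is the number of pixels that correspond to 1 mm.
def length_px_to_mm(length_px: float, px_per_mm: float = 1.0) -> float:
    """Convert a length (e.g. body length) from pixels to mm."""
    return length_px / px_per_mm

def area_px_to_mm2(area_px: float, px_per_mm: float = 1.0) -> float:
    """Convert an area (e.g. yolk sac size) from pixels to mm^2."""
    # One mm^2 covers px_per_mm * px_per_mm pixels.
    return area_px / px_per_mm ** 2

# Output files should end in _48hpf.xls or _96hpf.xls (step 6).
_STAGE_SUFFIX = re.compile(r"_(48|96)hpf\.xls$", re.IGNORECASE)

def stage_from_filename(filename: str) -> str:
    """Return '48hpf' or '96hpf' from an output file name."""
    match = _STAGE_SUFFIX.search(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not end in _48hpf.xls or _96hpf.xls")
    return match.group(1) + "hpf"

if __name__ == "__main__":
    scale = 200.0  # hypothetical manual scale: 200 pixels = 1 mm
    print(length_px_to_mm(800, scale))    # 4.0
    print(area_px_to_mm2(20000, scale))   # 0.5
    print(stage_from_filename("compoundX_exp1_96hpf.xls"))  # 96hpf
```

Dividing areas by the square of the scale matters: with 200 pixels per mm, 20000 pixels of yolk sac correspond to 0.5 mm², not 100 mm².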

3.2 Description of the output data

The output .xls file contains the following sheets:

Metadata – contains the metadata and the box-and-whisker plots of all features.

Trunk – contains the mean, median, 25th and 75th percentiles of the body length, tail length and summed area of pigment cells for each treatment, and the embryo count (Well(count)).

OE and HTA – contains the mean, median, 25th and 75th percentiles of the otolith-eye distance and head-trunk angle for each treatment, and the embryo count (Well(count)).

Pericard and head – contains the mean, median, 25th and 75th percentiles of the pericard and head size for each treatment, and the embryo count (Well(count)).

Eye and yolk sac – contains the mean, median, 25th and 75th percentiles of the eye area and yolk sac size for each treatment, and the embryo count (Well(count)).

Raw_grouped – contains the “raw data” converted from pixels to mm. This sheet is used as input for the next workflows (sections 4 and 5).

Jaw and swim bladder – contains the mean, median, 25th and 75th percentiles of the swim bladder size and of the jaw-eye distance and angle for each treatment, and the embryo count (Well(count)).

Raw data – contains the features extracted from the JSON files. Each row holds the data and plots extracted from one fish embryo image:
● URL – full path to the JSON file analysed with the workflow. The URL contains the well number/file number used to associate the data with the plate layout.
● Notochord plot (rotated) – plot of the notochord (the middle line between the two notochord lines identified with the FishInspector). The notochord is rotated so that its start and end lie on a horizontal line.
● Pts_spline – x coordinates (px) of the points along the notochord where a curvature was identified with the features function in R.
● Curvature – maximum curvature value along the notochord.
● Tail malformation analysis (plot) – plot obtained with the features function in R. The two top plots show the smoothed function of the line; the bottom-left plot displays its first derivative and the bottom-right plot its second derivative.
● Chordal tail distance – chordal distance of the tail in pixels.
● fishOrientation.horizontally.flipped – indicates the orientation of the fish in the horizontal plane (see section 6.5.2 of the FishInspector User guide).
● fishOrientation.vertically.flipped – indicates the orientation of the fish in the vertical plane (see section 6.5.2 of the FishInspector User guide).
● Angle A / B / C – angles of three equidistant points along the tail of the fish.
● Notochord plot with 4 equidistant points – plot of the equidistant points obtained along the fish tail.
● NumPixY – surface area of the yolk sac in pixels.
● NumPixEd – surface area of the pericard in pixels.
● NumPixE – surface area of the eye in pixels.
● Yolk sac elongation – yolk sac elongation (shape descriptor).
● Shape plot – shape of the yolk sac, eye and pericard obtained with the R package Momocs.
● Yolk min / max – minimum and maximum x coordinate of the yolk outline.
● Contour min / max – minimum and maximum x coordinate of the contour outline.
● Bitmask Head region – black-and-white image of the selected head region.
● Head size – surface area of the head region in pixels.
● Fish Head plot – outline of the head, displaying the fish contour, the eye contour and the lines that delimit the selected head region.
● Centroid X/Y – centroid of the eye shape.
● Otolith.point.x/y – coordinates of the largest otolith.
● Otolith-eye distance – distance in pixels between the otolith and the eye centroid.
● Area (mean, median, max, min, sum) – surface area of the pigment cells in pixels: mean, median, minimum, maximum and total sum of the surface area of the detected pigment cells.
● Contrast (mean, median) – mean and median contrast of the detected pigment cells.
● ID2 (count) – total number of detected pigment cells.
● Bladder size – surface area of the swim bladder in pixels.
● Swim bladder (plot) – shape of the detected swim bladder.
● Head-trunk angle – angle between the head (eye as reference) and the trunk, calculated as described by Kimmel et al. (1995) [12].
● Mandibular arch distance – distance between the eye and the lower mandibular arch (taking into account the contour coordinates at the eye position).
● Manual.point.x/y – coordinates of the manual point inserted with the manual selection in FishInspector (this point is used to calculate the jaw descriptors).
● Angle jaw-eye-otolith – angle formed by the jaw, the eye centroid and the otolith point.
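Two of the raw-data descriptors listed above, the otolith-eye distance and the jaw-eye-otolith angle, are simple geometric quantities. The Python sketch below shows one way to compute them from the listed coordinates (Centroid X/Y, Otolith.point.x/y and Manual.point.x/y). It assumes plain Euclidean geometry on pixel coordinates and that the angle is measured at the eye centroid; it is not the R code used inside the workflow, and the function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in pixels

def distance(a: Point, b: Point) -> float:
    """Euclidean distance, e.g. between the otolith point and the eye centroid."""
    return math.hypot(b[0] - a[0], b[1] - a[1])

def angle_at(vertex: Point, p: Point, q: Point) -> float:
    """Angle p-vertex-q in degrees (0 to 180), e.g. jaw / eye centroid / otolith."""
    v1 = (p[0] - vertex[0], p[1] - vertex[1])
    v2 = (q[0] - vertex[0], q[1] - vertex[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    # atan2(|cross|, dot) stays accurate for angles close to 0 and 180 degrees
    return math.degrees(math.atan2(abs(cross), dot))

if __name__ == "__main__":
    eye = (100.0, 100.0)      # hypothetical Centroid X/Y
    otolith = (130.0, 140.0)  # hypothetical Otolith.point.x/y
    jaw = (60.0, 100.0)       # hypothetical Manual.point.x/y
    print(distance(otolith, eye))                 # 50.0
    print(round(angle_at(eye, jaw, otolith), 2))  # 126.87
```

Both results are in pixel units (distance) or degrees (angle); the distance would still need the scale conversion from step 4 to be expressed in mm.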

[12] Kimmel, C. B., Ballard, W. W., Kimmel, S. R. et al. (1995). Stages of embryonic development of the zebrafish. Dev Dyn 203, 253–310. https://doi.org/10.1002/aja.1002030302.
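The summary sheets described above (Trunk, OE and HTA, Pericard and head, Eye and yolk sac, Jaw and swim bladder) aggregate each feature per treatment. A minimal Python sketch of that aggregation with the standard library follows. The row layout and the function name are hypothetical, and statistics.quantiles with method="inclusive" is only one of several percentile definitions, so its quartiles may differ slightly from those in the KNIME output.

```python
from collections import defaultdict
from statistics import mean, median, quantiles

def summarise(rows, feature):
    """Per-treatment mean, median, 25th/75th percentiles and count for one feature.

    rows: iterable of dicts with a 'Treatment' key and a numeric feature key;
    rows where the feature is missing (None) are skipped.
    """
    groups = defaultdict(list)
    for row in rows:
        value = row.get(feature)
        if value is not None:
            groups[row["Treatment"]].append(value)
    summary = {}
    for treatment, values in groups.items():
        if len(values) >= 2:
            q25, _, q75 = quantiles(values, n=4, method="inclusive")
        else:  # quantiles() may reject a single value, so handle it directly
            q25 = q75 = values[0]
        summary[treatment] = {
            "mean": mean(values),
            "median": median(values),
            "q25": q25,
            "q75": q75,
            "count": len(values),  # corresponds to Well(count)
        }
    return summary

if __name__ == "__main__":
    rows = [{"Treatment": "Control", "Body length": v} for v in (3.1, 3.3, 3.2, 3.4)]
    rows.append({"Treatment": "T1", "Body length": 2.9})
    for name, stats in summarise(rows, "Body length").items():
        print(name, stats)
```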

4 Control variability workflow

This workflow derives threshold values for each feature from the variability of the control embryos (mean and standard deviation of the control data) and creates a histogram for each feature. These threshold values are used to calculate the fraction of affected embryos for each endpoint. The control variability analysis does not have to be conducted for each experiment/replicate; you may instead analyse a set of experiments and use the same thresholds for a series of experiments.

4.1 Instructions

1. Double-click the List Files node and browse to the folder that contains the FishInspector .xls files generated by the previous workflow (section 3). The file names should indicate the stage (48h or 96h).

2. Define the name of the output .xls file (XLS Writer node inside the red boxes).

3. Execute the workflow
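The threshold idea described at the start of this section can be sketched in Python. The sketch assumes that an embryo counts as affected for a feature when its value falls outside mean ± k·SD of the control embryos, where k is the multiplier set in the Threshold metanode of the concentration-response workflow. The use of the sample standard deviation, the two-sided rule and the function names are assumptions, not taken from the workflow.

```python
from statistics import mean, stdev

def control_thresholds(control_values, k=1.0):
    """Return (lower, upper) = mean - k*SD and mean + k*SD of the controls.

    Assumes the sample standard deviation; k is the threshold multiplier
    (1, 1.5 or 2 are suggested for the concentration-response workflow).
    """
    m = mean(control_values)
    sd = stdev(control_values)  # needs at least two control embryos
    return m - k * sd, m + k * sd

def fraction_affected(values, lower, upper):
    """Fraction of embryos whose feature value lies outside [lower, upper]."""
    if not values:
        raise ValueError("no embryos in this group")
    outside = sum(1 for v in values if v < lower or v > upper)
    return outside / len(values)

if __name__ == "__main__":
    controls = [3.0, 3.2, 2.8, 3.1, 2.9]  # hypothetical control body lengths (mm)
    lower, upper = control_thresholds(controls, k=2.0)
    treated = [3.0, 2.5, 3.5, 2.9]        # hypothetical values at one concentration
    print(fraction_affected(treated, lower, upper))  # 0.5
```

Because the thresholds depend only on the controls, they can be computed once for a series of experiments and reused, as suggested above.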

5 Concentration-response analysis, FishInspector endpoints with threshold values

This workflow derives concentration-response curves from the features analysed with the FishInspector features workflow (section 3). The upper part of the workflow analyses the data from 48 hpf embryos and the lower part the data from 96 hpf embryos.

5.1 Instructions

1. Double-click the List Files node and browse to the folder that contains the FishInspector .xls files (the file names should indicate whether the stage is 48h or 96h). See section 3.1 for instructions on how to generate these files.

2. Load the threshold values obtained from the control variability workflow (section 4) in the XLS Reader nodes.

3. To set the threshold, double-click the Threshold wrapped metanode (the default is 1). We recommend a threshold of 1, 1.5 or 2 times the control standard deviation. Lower thresholds increase variability but provide higher sensitivity; higher thresholds result in a more robust concentration-response analysis.

4. Double-click the Hill model wrapped metanode to adjust how the control value is displayed. Because a concentration of 0 cannot be placed on a logarithmic axis, this setting should be adjusted so that the controls are displayed correctly in the log plots.

In this node it is also possible to modify the Hill model constraints (minimum and maximum). For the type of analysis provided here, the minimum may be set to “0”; in that case, the fitted curves are constrained to “0” at low tested concentrations.

The image on the left does not display the control values, whereas the image on the right does.

5. Define the output file in the XLS Writer node. Save it as .xls so that the images are displayed correctly.

The upper part of the workflow analyses 48 h old embryos and the lower part 96 h old embryos. The workflow can be adapted to your needs; the difference between the upper and lower branches is that concentration-response curves for the swim bladder and jaw features are generated only for 96 hpf embryos.

6. Execute the workflow
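For orientation, the Hill model fitted in the workflow describes the fraction of affected embryos f as a function of concentration c. A common parameterisation is f(c) = bottom + (top - bottom) / (1 + (EC50 / c)^h), with h the Hill slope; whether the wrapped metanode uses exactly this form is an assumption. The Python sketch below only evaluates the curve and inverts it for an ECx value, with bottom fixed at 0 as suggested in step 4; it does not fit the model, and the function names are hypothetical.

```python
def hill(c, ec50, slope, bottom=0.0, top=1.0):
    """Fraction of affected embryos at concentration c (Hill model, slope > 0)."""
    if c <= 0:
        return bottom  # control: the curve approaches `bottom` as c -> 0
    return bottom + (top - bottom) / (1.0 + (ec50 / c) ** slope)

def ec_x(x, ec50, slope):
    """Concentration at which x % of the bottom-to-top response range is reached."""
    if not 0 < x < 100:
        raise ValueError("x must lie strictly between 0 and 100")
    p = x / 100.0
    # Solve p = 1 / (1 + (ec50 / c) ** slope) for c.
    return ec50 * (p / (1.0 - p)) ** (1.0 / slope)

if __name__ == "__main__":
    ec50, slope = 10.0, 2.0  # hypothetical fitted parameters
    print(hill(10.0, ec50, slope))          # 0.5
    print(ec_x(50, ec50, slope))            # 10.0
    print(round(ec_x(10, ec50, slope), 3))  # 3.333
```

Here ECx is defined relative to the bottom-to-top range; it equals the concentration at which x % of the embryos are affected only when bottom = 0 and top = 1.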

The output consists of six xls sheets:
● 96 hpf DR: summary of the effect concentration values. The data are filtered by the Cochran-Armitage trend test at a p-value of 0.01 (retaining only the concentration-response curves that show a significant trend, i.e. an increase in abnormal embryos) and by the maximum tested concentration. These filters can be adjusted in the Clean data metanode next to the XLS Writer output. The pigmentation endpoint is not filtered and should be checked, because it is based on discrete data and not on frequencies.
● DR graphs 96hpf: concentration-response curves for all endpoints.
● Raw data 96hpf: raw data with the frequency of affected embryos for each endpoint.
● Corr 96hpf: Pearson correlation coefficients among the analysed endpoints.
● Corr graph 96hpf: summary graph of the correlations between endpoints.
● Z score 96hpf: heatmap and z-scores of selected endpoints.
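The trend test used to filter the 96 hpf DR sheet is the Cochran-Armitage test for trend in proportions. A standard-library Python sketch of its asymptotic form is shown below. It is a sketch under stated assumptions, not the workflow's implementation: the group scores default to 0, 1, 2, … (the tested concentrations could be passed instead), the test is one-sided for an increase in affected embryos, and no continuity correction is applied, so p-values may differ from those computed in the workflow.

```python
import math

def cochran_armitage_trend(affected, totals, scores=None):
    """One-sided Cochran-Armitage test for an increasing trend in proportions.

    affected[i]: number of affected embryos in group i (control first)
    totals[i]:   number of embryos in group i
    scores[i]:   score of group i (default: 0, 1, 2, ...)
    Returns (z, p_one_sided).
    """
    if len(affected) != len(totals):
        raise ValueError("affected and totals must have the same length")
    if scores is None:
        scores = list(range(len(totals)))
    n_total = sum(totals)
    p_bar = sum(affected) / n_total
    # T = sum of s_i * (r_i - n_i * p_bar)
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, affected, totals))
    s1 = sum(n * s for s, n in zip(scores, totals))
    s2 = sum(n * s * s for s, n in zip(scores, totals))
    variance = p_bar * (1.0 - p_bar) * (s2 - s1 * s1 / n_total)
    if variance <= 0:
        raise ValueError("no variability: all or no embryos affected, or one score")
    z = t_stat / math.sqrt(variance)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of N(0, 1)
    return z, p_one_sided

if __name__ == "__main__":
    # Hypothetical plate: control plus three concentrations, 10 embryos each
    z, p = cochran_armitage_trend(affected=[0, 2, 5, 9], totals=[10, 10, 10, 10])
    print(round(z, 3), p < 0.01)  # 4.33 True
```

With a cut-off of 0.01, a curve like the one in the example would be kept, whereas a flat or decreasing response would be filtered out.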
