Having Your Cake and Eating It Too: Scripted Workflows for Image Manipulation

Paul A. Thompson, Ph.D
Sanford Research & Health
University of South Dakota-Vermillion
[email protected]

Norm Matloff, Ph.D.
University of California-Davis
[email protected]

Alex Fu
Princeton University
[email protected]

Ariel Shin
University of California-Davis
[email protected]

arXiv:1709.07406v1 [eess.IV] 29 Aug 2017

Footnote 1. The authors would like to acknowledge the contributions of Douglas Cromey, MS, University of Arizona, to the ideas in the SWIIM project.

ABSTRACT

The reproducibility issue in science has come under increased scrutiny. One consistent suggestion lies in the use of scripted methods or workflows for data analysis. Image analysis is one area in science in which little can be done in scripted methods. The SWIIM Project (Scripted Workflows to Improve Image Manipulation) is designed to generate workflows from popular image manipulation tools. In the project, 2 approaches are being taken to construct workflows in the image analysis area. First, the open-source tool GIMP is being enhanced to produce an active log (which can be run on a stand-alone basis to perform the same manipulation). Second, the R system Shiny tool is being used to construct a graphical user interface (GUI) which works with EBImage code to modify images, and to produce an active log which can perform the same operations. This process has been successful to date, but is not complete. The basic method for each component is discussed, and example code is shown.

ACM Reference format:
Paul A. Thompson, Ph.D, Norm Matloff, Ph.D., Alex Fu, and Ariel Shin. 2016. Having your cake and eating it too: Scripted workflows for image manipulation. In Proceedings of ACM Conference, Washington, DC, USA, July 2017 (Conference’17), 7 pages.
DOI: 10.1145/nnnnnnn.nnnnnnn

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Conference’17, Washington, DC, USA
© 2016 ACM. 978-x-xxxx-xxxx-x/YY/MM...$15.00
DOI: 10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION

1.1 Workflows in science

A scientific workflow is a tool to structure and regularize a process ([12] [21] [15]). A good description is as follows:

“Scientific workflows attempt to automate repetitive computation and analysis by chaining together related processes. Automating repetitive time-consuming tasks allows scientists to keep pace with ever-growing volumes of data. Furthermore, workflows can aid in the reproducibility of scientific computations by providing a formal declaration of an analysis. Reproducibility is central to the scientific method, and detailed workflow provenance information ensures an analysis can be reproduced and extended.” [21]

Workflows have been the subject of much investigation. They take many different forms. Some workflows are defined by an interactive process, while others are defined by scripts. In using a scripted workflow, the process which the workflow performs becomes public, transparent, and reproducible.

1.2 Reproducibility

Reproducible research methods are increasingly important in science ([11][18][19][23][32][31][39][42]). “Reproducibility of research” is defined by the different issues which result in problems obtaining the same results from a study. Are the results the same from a second processing of the same data? “Getting the same result” can mean different things. The most specific is seemingly the simplest: When reanalysing a given dataset using the same methods, can identical outcome values (test statistics, p values, summary statistics) be obtained? This may be termed “data reproducibility”. A somewhat different form of reproducibility may involve running the study again with different subjects to examine the “scientific reproducibility” of the study. Each type of reproducibility examines a different aspect of the degree to which the results of a study are repeatable. Science involves determining a process which can be repeated and produce the same results, and thus reproducibility is the essence of science.

Obtaining the same result from a given set of data sounds obvious and trivial, but there are a number of reasons why this can be problematic. First, certain types of analyses are not closed-form but rather are iterative and approximating (with a loss function and convergence criteria). Unless the same convergence criteria, start values, and step sizes are used, it is entirely possible to get different outcomes. This is particularly true in cases in which the outcome surface is relatively flat. Second, analyses may be done in an interactive manner, and thus the tracking of the exact processes involved can sometimes be difficult. When interactive methods are used, it is possible that steps are forgotten, or that steps are done in different orders, or that the specific details in a step are not correctly noted. Third, the version of software used for analysis may change from one use to the next. Newer versions can include different convergence criteria or even different methods for estimation. Fourth, the persons using the software can be different, and use the software in different ways. In the well-known Potti et al. case, the original data were analyzed by a physician who was not well trained in proper data analysis, proper data storage, or proper use of training and validation samples ([1][33]). Later analysis by better-trained bioinformatics scientists found many errors, including changes in the version of the main data analysis tool ([1]).

In producing scientific articles, the data must be structured for the analysis first. This is the “data management” process, and is often a key step in the process. Values are corrected. Occasionally data are removed. The statistical analysis which examines the data is next performed. Again, this must be carefully documented to produce valid outcomes ([31][5][38][43]). Scripted methods (i.e., analysis performed using programs of computer code) are necessary for reproducible results ([32][5][38][30]). The code can be inspected, transferred to others, used on more than one project, and modified easily. It also functions as the memory of the project ([41]).

The use of analysis code is also “transparent”, that is, able to be inspected by others. Transparent, scripted code ensures that the author of a scientific document can produce the same results later, and can demonstrate to others (e.g., journal editors, colleagues) exactly how the published information was created from source materials. In science, repeating an analysis must produce the same result.

1.3 Image manipulation

Scientific image manipulation is a key part of many areas, particularly basic biology and chemistry ([25][4][8][24][34][36][35]). It is the process of preparing images for publication. Scientific journals have clear and well-defined requirements for proper preparation of images ([25][8][36]). Such scientific image manipulation follows general guidelines:

(1) specific features may not be changed or modified;
(2) adjustments to the full image (brightness, contrast

Image processing is primarily done using interactive tools such as Adobe Photoshop ([16]), ImageJ ([9]), and GIMP ([40]). These programs can read in images, modify them in many ways, and save the results. It is sometimes difficult to reproduce the interactive process of producing an image for a publication from a source image. This is due in part to the use of the computer mouse, and in part to the difficulty of remembering operations.

When images are prepared for scientific presentation, reproducibility problems are common. The difficulties in reproducibility, due to the interactive nature of the process, partly arise from the “semi-continuous” nature of the process. When cropping (selecting a small part of the picture for presentation), a selection is made using the mouse. Although this is done using positions which are numbers, the scale is large and the position is difficult to remember exactly. When increasing brightness-contrast, the increases are done using a scale which emphasizes relative amounts; the exact value is a number, but the number is likely not remembered exactly. While a person could remember such values, the exact numbers are quite difficult to remember, and the process is not conducive to simple recollection.

Image fraud is a serious and pressing issue in science ([4][24][35][27]). Image fraud includes a number of processes (e.g., image reuse, improper preparation, improper combination of images). The “Retraction Watch” blog provides a contemporaneous record of research fraud [22]. In examining this blog, it is clear that a large proportion of retractions involve image fraud. As of 2017/03/31, 512 of the entries in the blog are related to image fraud. Improper image preparation occurs commonly; some reports suggest that 25% of all submissions to journals have improper image preparation [35]. Twenty years of discussing the problem have not reduced its incidence. Different approaches are needed.

1.4 Journaling

Writing code for analysis is a difficult skill. Interactive methods for data analysis are preferred by some as being simpler and more intuitive. When an analysis is performed by a graphical user interface (GUI; a window with buttons and controls), this is termed an “interactive approach”. The reproducibility of interactive approaches is questioned by many [38]. That is because interactive methods often involve important but small decisions, which are often hard to remember and write down.
