NBDiff Documentation Release 1

Shurouq Abusalah, Tavish Armstrong, Marwa Malti, Lina Nouh, Borislav Pipev, Selena Sachdeva, Richard Tang

August 26, 2014

Contents

1 Proposal
  1.1 Preamble
  1.2 Proposal
  1.3 Challenges
  1.4 Conclusion
  1.5 References

2 Vision Document
  2.1 Introduction
  2.2 User Description
  2.3 Product Overview
  2.4 Feature Attributes
  2.5 Product Features
  2.6 Exemplary Use Cases
  2.7 Quality Ranges
  2.8 Precedence and Priority
  2.9 Other Product Requirements
  2.10 Documentation Requirements
  2.11 Glossary

3 Project Management Plan
  3.1 Identification
  3.2 Team and Responsibilities
  3.3 Work Breakdown Structure, Tasks and Planning
  3.4 Resource Identification
  3.5 Relationships with project stakeholders
  3.6 Communication
  3.7 System Requirements and Project Input Data
  3.8 Configuration Management
  3.9 Software Configuration Management
  3.10 Documentation Configuration Management
  3.11 Software Development Management
  3.12 Software Development Process
  3.13 Software Development Tools
  3.14 Software Development Rules and Standards
  3.15 Test Phases Management
  3.16 Problem Resolution

4 Activity Plan

5 Hour Tracking
  5.1 1. Milestone 1
  5.2 2. Milestone 2
  5.3 3. Milestone 3
  5.4 4. Milestone 4
  5.5 5. Milestone 5
  5.6 6. Milestone 6

6 Earned Value Management

7 Data

8 Graphs

9 User Interface Specification
  9.1 List of Figures
  9.2 Introduction
  9.3 User Centered Design
  9.4 Overall User Interface Architecture
  9.5 Navigation
  9.6 Feedback
  9.7 Screen Layout
  9.8 Business Rules
  9.9 Future Enhancements

10 Software Requirements Specification
  10.1 List of Figures
  10.2 Introduction
  10.3 Overall Description
  10.4 Specific Requirements
  10.5 Non-Functional Requirements
  10.6 Design Constraints
  10.7 Documentation Requirements
  10.8 Purchased Components
  10.9 Licensing Requirements
  10.10 Legal, Copyright and Other Notices
  10.11 Analysis Models
  10.12 Information on Version Control Systems

11 Diffing-Related Use Cases
  11.1 ID: DUC1 - Diff Notebooks Locally from Version Control
  11.2 ID: DUC2 - Diff Notebooks Locally from Files
  11.3 ID: DUC3 - Diff from Notebooks on A Remote Server

12 Merging-Related Use Cases
  12.1 ID: MUC1 - Locally Resolve Merge Conflicts in Notebooks in Version Control
  12.2 ID: MUC2 - Locally Resolve Merge Conflicts in Notebooks from the Filesystem
  12.3 ID: MUC3 - Merge Notebooks Located on A Remote Server
  12.4 ID: MUC4 - Resolve Merge Conflicts Remotely in Version Control Pull Request

13 Scoped-out Use Cases
  13.1 ID: DUC4 - Selective Staging

14 Risk Management Plan
  14.1 Introduction
  14.2 Roles and Responsibilities
  14.3 Risk Assessment
  14.4 Risk Analysis
  14.5 Risk Control

15 Software Architecture Document
  15.1 List of Figures
  15.2 Introduction
  15.3 Architectural Representation
  15.4 Architectural Requirements: Goals and Constraints
  15.5 Scenarios
  15.6 Logical View
  15.7 Development View
  15.8 Process View
  15.9 Deployment View
  15.10 Size and Performance
  15.11 Quality
  15.12 References

16 Test Plan Document
  16.1 List of Figures
  16.2 Introduction
  16.3 Compatibility Testing
  16.4 Regression Testing
  16.5 Acceptance Testing
  16.6 Alpha Testing
  16.7 Beta Testing
  16.8 Performance Testing
  16.9 Usability Testing
  16.10 Defects/Bugs Management
  16.11 Maintenance Testing

17 Test Summary Report
  17.1 List of Figures
  17.2 Introduction
  17.3 Test Summary
  17.4 Test Assessment
  17.5 Test Results

18 Comparison of Existing Diff/Merge Tools
  18.1 Overview

19 New Requirements for Capstone Projects
  19.1 Impact of engineering on society
  19.2 Ethics and equity
  19.3 Professionalism
  19.4 Economics
  19.5 Lifelong Learning

20 Glossary

21 Authors

22 Indices and tables

CHAPTER 1

Proposal

1.1 Preamble

The IPython Notebook is a “web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document.”[1] For scientists, this means a new way of doing the work they have always done: combining observations, results, and analysis in a single document. Historically, the analogue notebook would be the basis for a scientific paper. Today, the electronic IPython Notebook is the scientific paper.

Scientists are moving more of their work onto computers and applying software solutions, and so they are adopting more of the tools of the software trade. One important tool they have adopted, one that software engineers have been using for decades, is version control. For scientists, this means knowing how their scientific paper has changed since its last publication, having a precise understanding of who contributed what to each paper revision, and more. It also enables scientists to share their work with other scientists in their field, improving upon peer review methods.

However, sharing requires a clear and precise understanding of what their peers have changed. This becomes more complex when several people change similar parts of their scientific code simultaneously. Displaying these changes in a sensible fashion is challenging, and even modern version control tools like git are not up to the challenge.

1.2 Proposal

We propose a diffing and merging tool for the IPython Notebook. Modern version control systems operate on lines of code, a reasonable unit to work with for software. However, we would like version control systems to work with formats that are more complicated than text-based source code. The IPython Notebook is one example. Any Notebook might include source code, prose, LaTeX, images, raw data from experiments, and more. Comparing elements of these types can be challenging.

1.3 Challenges

A tool like this is complex. It builds on existing, well-tested diffing algorithms available in packages like git or Subversion, but these packages operate on lines of code, whereas the IPython Notebook is organized into cells. In most cases, finding the difference between two comparable cells is as simple as performing a normal line-based diff. The task becomes more difficult when determining whether or not two cells are comparable in the first place. Similarity matching that looks within cells and scores how close their contents are during cell alignment could be a project in itself.
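To make the cell-alignment problem concrete, here is a minimal sketch in Python. It assumes each cell is reduced to its source text, scores pairs with the standard library's `difflib.SequenceMatcher`, and pairs cells greedily. The function names and the 0.6 threshold are illustrative assumptions, not part of the NBDiff design.

```python
from difflib import SequenceMatcher


def cell_similarity(a, b):
    """Return a similarity score in [0, 1] for two cell source strings."""
    return SequenceMatcher(None, a, b).ratio()


def align_cells(old, new, threshold=0.6):
    """Greedily pair each old cell with the most similar unused new cell.

    Returns a list of (old_index, new_index) pairs. Cells that score below
    `threshold` (an assumed cut-off) against every candidate stay unpaired
    and would be reported as deleted or added.
    """
    pairs = []
    used = set()
    for i, old_src in enumerate(old):
        best_j, best_score = None, threshold
        for j, new_src in enumerate(new):
            if j in used:
                continue
            score = cell_similarity(old_src, new_src)
            if score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            used.add(best_j)
            pairs.append((i, best_j))
    return pairs


old_cells = ["import numpy as np", "x = np.arange(10)\nprint(x.mean())"]
new_cells = ["# Load data", "import numpy as np", "x = np.arange(20)\nprint(x.mean())"]
print(align_cells(old_cells, new_cells))
```

Once two cells are paired, a normal line-based diff can be run on their contents. A greedy pairing like this one does not guarantee that cell order is preserved, which is one reason the alignment step is harder than it first appears.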


The contents of the IPython Notebook are varied, and a diffing algorithm will need to be created for each kind of content. This includes image- or plot-diffing to determine whether or not a plot of numerical data has changed since the last snapshot, finding the difference between two tables of data, and more. New kinds of content are constantly being added to the Notebook, and our tool will need to be able to handle them.

Diffing and merging tools have been used by software engineers for decades, but introducing them to scientists is novel. Since the scientists using this tool might not be experts in software development, building a usable tool is important. Thorough usability testing (ideally in a controlled lab environment) and UI design will be needed to verify that the target users can use it with ease and accuracy.

This diffing and merging tool will need to be extensible by external developers, both before and after our project completes. Architectural clarity and documentation will be necessary for people outside our project group to develop plugins that work well with our architecture. Consulting these plugin writers about their needs will be important to the success of this part of the project, as reusability will be an important feature.

Testing this project will be the most difficult part. We will need to verify the correctness of the diffing algorithms; ensure good performance in the face of complex merges (in some cases through the use of parallelization); fuzz test invalid and potentially dangerous inputs from any source (including a malicious source); and test a dynamic graphical UI. We expect the project to gain many users quickly, and performing this testing is crucial to earning their trust.

1.4 Conclusion

We would like to build the IPython Notebook diffing and merging tool as our capstone project. We believe it is challenging and sizable, worthy of a year of hard work.

1.5 References

[1] “The IPython Notebook — IPython”. http://ipython.org/notebook.html

CHAPTER 2

Vision Document

Table of Contents

• Vision Document
  – Introduction
    * Purpose
  – User Description
    * User/Market Demographics
    * User Profiles
    * User Environment
    * Key stakeholder and user needs
    * Alternatives and Competition
  – Product Overview
    * Product Perspective
    * Product Position Statement
    * Summary of Capabilities
    * Assumptions and Dependencies
    * Cost and Pricing
  – Feature Attributes
  – Product Features
  – Exemplary Use Cases
  – Quality Ranges
  – Precedence and Priority
  – Other Product Requirements
    * Applicable Standards
    * System Requirements
    * Performance Requirements
    * Environmental Requirements
  – Documentation Requirements
    * User Manual
    * Online Help
    * Installation Guides, Configuration and Readme Files
    * Labelling and Packaging
  – Glossary


2.1 Introduction

This document describes the overall goals of the project, the project’s main stakeholders, the target users and demographics of the software, as well as the main features that the software will contain.

2.1.1 Purpose

The purpose of this document is to describe the high-level needs and features of the NbDiff tool. It focuses on the functionality needed by the stakeholders and the end users, as well as why these needs exist. The document is structured as a standard Vision document, containing the user description, stakeholder needs, alternatives, and the product features. The intended audience of this document is the professors grading the SOEN 490 capstone documents, Greg Wilson, and other stakeholders working on the IPython Notebook. This document is intended to align the stakeholders’ and the development team’s understanding of the software to be built.

2.2 User Description

The target users of NbDiff are scientists who use the IPython Notebook in order to produce reproducible research.

2.2.1 User/Market Demographics

The target users of NbDiff are scientists who are, or will be, using the IPython Notebook to perform reproducible computational research in science, technology, engineering and medicine (STEM). Our target users are scientists with a wide range of computational ability, including beginner and expert programmers. The NbDiff project seeks to make version control of Notebooks easier, including collaboration among multiple scientists (for instance, in a lab). Most of our users will be graduate students, but some will be faculty members of universities, and others will be researchers working in publicly or privately funded labs. In most cases, the users will have at least a bachelor’s degree in a technical field, though some universities are using Notebooks as part of their undergraduate curriculum.

2.2.2 User Profiles

The target audience of our stakeholder’s education program can be found here: http://software-carpentry.org/audience.html

We hope to cater NbDiff to users of the following types: academic researcher (beginner programmer), academic researcher (expert programmer), peer reviewer, study reproducer, undergraduate student, data analyst, blogger. The most important user types are documented in detail below.

Representative Scientist / Beginner Programmer (User-SBP)

Description
    This user is typically a graduate-level student in a university who performs research in the sciences, engineering, or medicine. While their research relies on performing computations on data they have collected through experiments, or on writing simulations of theoretical phenomena, they have received no formal training in programming. This user is the most important user to our primary stakeholder, as our stakeholder trains these scientists to become better programmers.
Level
    Beginner
Responsibilities
    Analyzing data by creating models, calculating statistical results, or creating simulations. Producing plots and tables of data for exploratory purposes or for inclusion in published research. Collaborating on IPython Notebooks with other researchers.
Success criteria
    Accepting changes from other researchers to their Notebooks and successfully producing correct, merged Notebooks.

Representative Scientist / Expert Programmer (User-SEP)

Description
    This user is typically a graduate-level student or faculty member in a university who performs research in the sciences, engineering, or medicine. Their research is computationally intensive, including complex calculations and algorithms for producing exploratory and publishable results. They are comfortable with various programming/scripting languages and employ a wide variety of tools, including but not limited to testing frameworks, makefiles, and version control.
Level
    Expert
Responsibilities
    The User-SEP wishes to produce reproducible research in the form of IPython Notebooks that other researchers can run. In some cases they will publish the Notebooks alongside a paper; in others, they will submit the Notebook as the paper itself. In addition to authoring the Notebooks themselves, they also collaborate with other researchers on them. They need to be able to effectively perform code reviews on changes before they are merged into the master Notebook. Currently, typical diffing tools do not show a meaningful changeset that this reviewing researcher can understand.
Success criteria
    Accepting changes from other researchers to their Notebooks and successfully producing correct, merged Notebooks. Reviewing changes to Notebooks. Finding bugs or code smells in new or modified code. Understanding the changes between Notebooks to determine when scientific results change, for provenance.

Representative Data Analyst (User-DA)

Description
    This user is typically a well-educated statistician or computer scientist employed by a for-profit company interested in analyzing a data set for strategic gain. They are well-versed in data analysis toolkits like the R programming language, Weka, or Pandas. They know how to quickly get a sense of the data they are working with, and how to extract meaningful results from that data. They may share preliminary results of their analysis with other members of their company in the form of an IPython Notebook. They are well-versed in version control tools.
Level
    Expert
Responsibilities
    Creating research results and proofs of concept of data analysis algorithms for analysing large data sets. Tracking changes to their work and results as they explore data sets.
Success criteria
    Accepting changes from other analysts to their Notebooks. Determining what has changed between different versions of the Notebook.

2.2.3 User Environment

The users will install our software on their laptops or PCs running Linux, Windows, or Mac OS X (see SRS). In some cases, the users may work in a lab or in an office. In general, they will have access to a reliable internet connection.

2.2.4 Key stakeholder and user needs

Existing solutions lack the ability to properly display diffs of IPython notebooks and to merge them. Since notebooks are stored as JSON, existing merge tools are inadequate for this format: they show a massive amount of incomprehensible text. Not all scientists are able to read JSON, and they would prefer a user interface similar to the one in the IPython Notebook.
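The following Python sketch illustrates the problem under simplifying assumptions. The notebook structure built by `make_notebook` is a hypothetical, stripped-down stand-in for a real .ipynb file, and the two helper functions are illustrative only. It compares a line diff of the serialized JSON with a line diff of the cell source alone.

```python
import difflib
import json


def make_notebook(source_lines):
    # Hypothetical, simplified notebook structure used only for illustration.
    return {
        "metadata": {"name": "example"},
        "cells": [
            {"cell_type": "code", "source": source_lines, "outputs": []},
        ],
    }


def raw_json_diff(a, b):
    """Line diff of the serialized JSON, as a generic text diff tool sees it."""
    a_text = json.dumps(a, indent=1, sort_keys=True).splitlines()
    b_text = json.dumps(b, indent=1, sort_keys=True).splitlines()
    return list(difflib.unified_diff(a_text, b_text, "old.ipynb", "new.ipynb", lineterm=""))


def cell_source_diff(a, b):
    """Line diff of the first cell's source only, as a notebook-aware tool could show it."""
    a_src = "".join(a["cells"][0]["source"]).splitlines()
    b_src = "".join(b["cells"][0]["source"]).splitlines()
    return list(difflib.unified_diff(a_src, b_src, "old cell", "new cell", lineterm=""))


old = make_notebook(["x = 1\n", "print(x)\n"])
new = make_notebook(["x = 2\n", "print(x)\n"])

print("\n".join(raw_json_diff(old, new)))
print()
print("\n".join(cell_source_diff(old, new)))
```

Even for a one-character change, the raw diff surrounds the edit with JSON brackets, quotes, and escaped newlines, while the cell-level diff shows only the code the user wrote. Real notebooks also embed outputs such as base64-encoded images, which makes the raw diff far noisier than this small case.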


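=======  ========  =========  ================  =================
Need     Priority  Concerns   Current solution  Proposed solution
=======  ========  =========  ================  =================
Diffing  High      Algorithm  Manual diffing    Automatic diffing
Merging  High      Algorithm  Manual merging    Automatic merging
=======  ========  =========  ================  =================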

2.2.5 Alternatives and Competition

See Competitive Analysis document.

2.3 Product Overview

The IPython Notebook is a “web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document.”[1] For scientists, this means a new way of doing the work they have always done: combining observations, results, and analysis in a single document. Historically, the analogue notebook would be the basis for a scientific paper. Today, the electronic IPython Notebook is the scientific paper.

Scientists are moving more of their work onto computers and applying software solutions, and so they are adopting more of the tools of the software trade. One important tool they have adopted, one that software engineers have been using for decades, is version control. For scientists, this means knowing how their scientific paper has changed since its last publication, having a precise understanding of who contributed what to each paper revision, and more. It also enables scientists to share their work with other scientists in their field, improving upon peer review methods.

However, sharing requires a clear and precise understanding of what their peers have changed. This becomes more complex when several people change similar parts of their scientific code simultaneously. Displaying these changes in a sensible fashion is challenging, and even modern version control tools like git are not up to the challenge.

NbDiff is a diffing and merging tool for the IPython Notebook. Modern version control systems operate on lines of code, a reasonable unit to work with for most software. However, we would like version control systems to work with formats that are more complicated than text-based source code. The IPython Notebook is one example. Any Notebook might include source code, prose, LaTeX, images, raw tabular data from experiments, and more. The goal of the product is to show these differences to users in a human-readable fashion, and to allow them to merge two (possibly conflicting) notebooks.

2.3.1 Product Perspective

Please see the SRS.

2.3.2 Product Position Statement

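==============================  ===========================================
The problem of                  Diffing
Affects                         User-SBP, User-SEP, User-DA
The impact of which is          Manually diffing two IPython notebooks
A successful solution would be  Automatically diffing two IPython notebooks
==============================  ===========================================

==============================  ===========================================
The problem of                  Merging
Affects                         User-SBP, User-SEP, User-DA
The impact of which is          Manually merging two IPython notebooks
A successful solution would be  Automatically merging two IPython notebooks
==============================  ===========================================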

2.3.3 Summary of Capabilities

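==============================================  ========================
Customer benefit                                Supporting features
==============================================  ========================
Users can easily diff their IPython notebooks   IPython notebook diffing
Users can easily merge their IPython notebooks  IPython notebook merging
==============================================  ========================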

2.3.4 Assumptions and Dependencies

Please see the SRS.

2.3.5 Cost and Pricing

This product will be open source and thus the product will be free.

2.4 Feature Attributes

All the features outlined below have been proposed for NbDiff 1.0.

2.5 Product Features

1. Diffing and displaying modifications made to an IPython notebook
2. Generating a diff of a notebook from the previous version, provided by version control
3. Generating a diff of a notebook from two local files, provided by the user
4. Generating a diff of a notebook stored remotely
5. Allowing for selective staging of changes made to an IPython Notebook
6. Handling merge conflicts between two notebooks by the version control system
7. Allowing manual changes to be made to the notebook when resolving a merge conflict
8. Notebook file format validation
9. Generating line based diffs for text based cells
10. Generating line based diffs for code based cells
11. Supporting line based diffing of LaTeX markdown in code based cells
12. Generating cell based diffs for graphs
13. Generating cell based diffs for tables
14. Integrating notebook diffing with BitBucket by Atlassian
15. Integrating notebook merging with BitBucket by Atlassian
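As an illustration of what feature 8 (notebook file format validation) might involve, the Python sketch below performs a few structural checks before any diffing is attempted. It assumes a simplified layout with a top-level `cells` list whose entries each carry a `cell_type` key. That layout and the function name are assumptions made for this example. They are not the official notebook schema or NBDiff's actual validator.

```python
import json


def validate_notebook_text(text):
    """Return a list of problems found; an empty list means these checks passed.

    Assumes a simplified notebook layout (top-level "cells" list, each cell a
    JSON object with a "cell_type" key). A real validator would check the
    notebook format version and the full schema.
    """
    try:
        nb = json.loads(text)
    except ValueError as exc:
        return ["not valid JSON: %s" % exc]
    if not isinstance(nb, dict):
        return ["top-level value is not a JSON object"]
    cells = nb.get("cells")
    if not isinstance(cells, list):
        return ["missing 'cells' list"]
    problems = []
    for i, cell in enumerate(cells):
        if not isinstance(cell, dict) or "cell_type" not in cell:
            problems.append("cell %d has no 'cell_type'" % i)
    return problems


print(validate_notebook_text('{"cells": [{"cell_type": "code", "source": []}]}'))
print(validate_notebook_text('{"cells": [{}]}'))
print(validate_notebook_text("{not json"))
```

Returning a list of problems rather than raising on the first one lets a user interface report every issue at once, which may suit users who are not comfortable reading JSON.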


2.6 Exemplary Use Cases

Please see the UCM.

2.7 Quality Ranges

In terms of quality of performance, our system will be fault tolerant. Because the system will primarily generate diffs and conduct merges of IPython notebooks, it will not produce a diff or complete a merge if it encounters a fault. This will protect the integrity as well as the quality of the system.

2.8 Precedence and Priority

All of the features outlined in section 2.5, Product Features, are ordered by priority, from highest to lowest. Features 1 through 6 are considered critical, features 7 and 8 are considered important, and the remaining features, 9 through 15, are classified as useful.

2.9 Other Product Requirements

Please see the SRS.

2.9.1 Applicable Standards

N/A.

2.9.2 System Requirements

At a minimum, the user’s system must be able to open and run .ipynb files through a web browser. Licensing, security and installation: this product will be licensed as open source software. The installation details will be explained in the Installation Manual.

2.9.3 Performance Requirements

A performance requirement of the NbDiff tool is a response time of 2 seconds.
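A requirement like this can be checked mechanically. The Python sketch below times a call and compares the elapsed time with the 2-second budget. `fake_diff` is a hypothetical stand-in for the real diff operation, and treating the budget as an upper bound on a single call is our reading of the requirement.

```python
import time

RESPONSE_BUDGET_SECONDS = 2.0  # taken from the performance requirement above


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def within_budget(elapsed, budget=RESPONSE_BUDGET_SECONDS):
    """True if the measured time meets the response-time budget."""
    return elapsed <= budget


def fake_diff(a, b):
    # Hypothetical stand-in for the real diff; returns lines only present in b.
    return [line for line in b if line not in a]


result, elapsed = timed(fake_diff, ["x = 1"], ["x = 1", "y = 2"])
print(result, within_budget(elapsed))
```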

2.9.4 Environmental Requirements

N/A

2.10 Documentation Requirements

The following documents will be developed and provided to the user: User Manual, Installation Guide, as well as a Readme.


2.10.1 User Manual

The purpose of the User Manual is to be a resource for the users to inform them of the functionalities of the NbDiff tool. The User Manual will explain how to use each feature by providing detailed step by step instructions.

2.10.2 Online Help

Please visit the online version of the Installation Manual, as well as the User Manual.

2.10.3 Installation Guides, Configuration and Readme Files

Please see the Installation Guide.

2.10.4 Labelling and Packaging

NbDiff will be distributed online only; we will not need any labelling or packaging.

2.11 Glossary

Please see the Glossary.

CHAPTER 3

Project Management Plan

Table of Contents

• Project Management Plan
  – Identification
    * Document Overview
    * Abbreviations and Glossary
    * References
    * Project Management
  – Team and Responsibilities
  – Work Breakdown Structure, Tasks and Planning
    * Work Breakdown Structure and Task Estimation
    * Activity Planning
    * Planned Effort and Earned Value
  – Resource Identification
  – Relationships with project stakeholders
    * End-User Involvement
  – Communication
    * Meetings
    * Reviews
    * Training
  – System Requirements and Project Input Data
  – Configuration Management
  – Software Configuration Management
  – Documentation Configuration Management
  – Software Development Management
  – Software Development Process
  – Software Development Tools
  – Software Development Rules and Standards
  – Test Phases Management
    * Unit tests
    * Integration Tests
    * System Tests
    * Verification Tests
    * Usability Tests
  – Problem Resolution


3.1 Identification

3.1.1 Document Overview

This document outlines the manner in which the project will be planned and executed. It describes the project’s development process: how and when different activities will be undertaken, when stakeholders will be consulted, how and when the software will be released, and how changes to the software will be tracked. It also contains information on the development tools that the team will use. As requirements analysis and further project planning are performed, this document will describe the work performed (and to be performed) on the project.

3.1.2 Abbreviations and Glossary

PO Product owner; main stakeholder of the project.

3.1.3 References

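===========  =================================================================  ====================================================================
Document ID  Document Title                                                     URL
===========  =================================================================  ====================================================================
[pep8]       PEP-8: Python Enhancement Proposal 8: Style Guide for Python Code  http://www.python.org/dev/peps/pep-0008/
[ipy]        The IPython Notebook — IPython                                     http://ipython.org/notebook.html
[tpope]      A Note About Git Commit Messages                                   http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html
===========  =================================================================  ====================================================================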

3.1.4 Project Management

3.2 Team and Responsibilities

Seven members make up the NBDiff team. We share most analysis, development, and testing tasks among ourselves. Our fictional pay rates and real responsibilities are listed in detail below. In addition to the project team, we have an external stakeholder, Dr Greg Wilson. The organization he represents is Software Carpentry (http://software-carpentry.org/), a “volunteer organization whose members teach basic software skills to researchers in science, engineering, and medicine.”


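==============================  ================  ===========================================================  ===========
Title                           Name              Responsibilities                                             Pay rate/hr
==============================  ================  ===========================================================  ===========
Project Manager (Group Leader)  Tavish Armstrong  Project Management, Maintaining Client Relationship,         $40/hr
                                                  Risk analysis, Software Development, Documentation, Testing
Developer                       Marwa Malti       Software Development, Documentation, Testing                 $40/hr
Developer                       Borislav Pipev    Software Development, Documentation, Testing                 $40/hr
Developer                       Selena Sachdeva   Software Development, Documentation, Testing                 $40/hr
Developer                       Shurouq Abusalah  Software Development, Documentation, Testing                 $40/hr
Developer                       Richard Tang      Software Development, Documentation, Testing                 $40/hr
Developer                       Lina Nouh         Software Development, Documentation, Testing                 $40/hr
==============================  ================  ===========================================================  ===========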

Figure 3.1: Team organization

3.3 Work Breakdown Structure, Tasks and Planning

3.3.1 Work Breakdown Structure and Task Estimation

Table 3.1: WBS

========  ====================================================  ======================
#         Work                                                  Estimated effort (hrs)
========  ====================================================  ======================
1.        PROJECT INITIALIZATION
1.1.      Project Proposal                                      5
1.2.      Design logo                                           3
2.        PROJECT PROPOSAL
2.1.      Project Management Plan                               16
2.2.      Vision Document                                       20
2.2.1.    Perform competitive analysis                          16.5
2.2.2.    Research IPython Notebook                             20
2.3.      Risk Management Plan                                  12
2.4.      Activity Plan                                         2
3.        REQUIREMENT GATHERING
3.1.      Software Requirements Specification                   25
3.2.      Data Gathering
3.2.1.    Research existing diff tools                          6
3.2.2.    Research version control systems                      10
3.2.3.    Research IPython Notebook html design                 7
3.2.4.    Research IPython Notebook image rendering from JSON   4
3.2.5.    Research IPython Notebook js rendering of notebooks   4
3.2.6.    Research IPython Notebook use of CodeMirror           4
3.3.      Interview Stakeholders                                4
3.4.      Research Similar Products                             10
4.        DESIGN
4.1.      Software Architecture Design                          150
4.1.1.    Design adapter to support git                         3
4.1.2.    Design adapter to support Mercurial                   3
4.1.3.    Design cell based diff algorithm                      6
4.1.4.    Design cell based merge algorithm                     6
4.1.5.    Design header based diffs                             3
4.1.6.    Design html for diff prototype                        2
4.1.7.    Design html for merge prototype                       2
4.1.8.    Design image based diffs                              4
4.1.9.    Design for diff prototype                             5
4.1.10.   Design javascript for merge prototype                 5
4.1.11.   Design line based diff algorithm                      10
4.1.12.   Design line based merge algorithm                     10
4.1.13.   Design local server                                   4
4.1.14.   Design markdown based diffs                           4
4.1.15.   Design remote server                                  10
4.1.16.   Design nbdiff.org                                     10
4.1.17.   Design selective staging                              3
4.1.18.   Design to support multiple notebooks                  2
4.1.19.   Design LaTeX diffs                                    5
4.1.20.   Design graph diffs                                    5
4.1.21.   Design table diffs                                    4
4.1.22.   Design Bitbucket integration                          20
4.2.      User Interface Design
4.2.1.    Create UI mockups                                     10
4.2.2.    Design UI for cell based diffs                        3
4.2.3.    Design UI for cell based merge diffs                  3
4.2.4.    Design UI for line based diffs                        3
4.2.5.    Design UI for line based merge                        3
4.2.6.    Design UI for nbdiff.org                              5
5.        PROTOTYPE
5.1.1.    Design prototype version of diff algorithm            5
5.1.2.    Design prototype version of merge algorithm           5
5.1.3.    Design UI for diff prototype                          3
5.1.4.    Design UI for merge prototype                         3
5.1.5.    Design tests for js for diff prototype                4
5.1.6.    Design tests for js for merge prototype               4
5.1.7.    Implement prototype version of diff algorithm         15
5.1.8.    Implement prototype version of merge algorithm        15
5.1.9.    Implement html for merge prototype                    2
5.1.10.   Implement js for prototype                            8
5.1.11.   Implement tests for js for merge prototype            2
5.1.12.   Implement tests for js for prototype                  2
5.1.13.   Perform usability testing of prototype                5
6.        SOFTWARE DEVELOPMENT
6.1.      Development
6.1.1.    Implement nbdiff.org                                  20
6.1.2.    Implement adapter to support git                      30
6.1.3.    Implement adapter to support Mercurial                30
6.1.4.    Implement header based diffs                          6
6.1.5.    Implement html for cell based diffs                   5
6.1.6.    Implement html for cell based merge                   5
6.1.7.    Implement html for diff prototype                     10
6.1.8.    Implement html for line based diffs                   10
6.1.9.    Implement html for line based merge                   10
6.1.10.   Implement image based diffs                           35
6.1.11.   Implement js for cell based diffs                     35
6.1.12.   Implement js for cell based merge                     40
6.1.13.   Implement js for line based diffs                     40
6.1.14.   Implement js for line based merge                     40
6.1.15.   Implement local server                                55
6.1.16.   Implement markdown based diffs                        20
6.1.17.   Implement remote server                               100
6.1.18.   Implement selective staging                           15
6.1.19.   Implement to support multiple notebooks               10
6.1.20.   Implement LaTeX diffs                                 20
6.1.21.   Implement graph diffs                                 30
6.1.22.   Implement table diffs                                 20
6.1.23.   Implement Bitbucket integration                       75
7.        TESTING AND QUALITY ASSURANCE
7.1.      Test Plan
7.1.1.    Design tests for adapter to support git               5
7.1.2.    Design tests for adapter to support Mercurial         5
7.1.3.    Design tests for cell based diffs                     25
7.1.4.    Design tests for cell based merge                     25
7.1.5.    Design tests for header based diffs                   4
7.1.6.    Design tests for image based diffs                    4
7.1.7.    Design tests for js for merge prototype               5
7.1.8.    Design tests for line based diffs                     4
7.1.9.    Design tests for line based merge                     4
7.1.10.   Design tests for local server                         15
7.1.11.   Design tests for markdown based diffs                 5
7.1.12.   Design tests for remote server                        20
7.1.13.   Design tests for selective staging                    5
7.1.14.   Design tests for to support multiple notebooks        5
7.1.15.   Design tests for LaTeX diffs                          10
7.1.16.   Design tests for graph diffs                          10
7.1.17.   Design tests for table diffs                          7
7.1.18.   Design tests for Bitbucket integration                20
7.2.      Unit Testing
7.2.1.    Implement tests for adapter to support git            12
7.2.2.    Implement tests for adapter to support Mercurial      12
7.2.3.    Implement tests for cell based diffs                  8
7.2.4.    Implement tests for cell based merge                  8
7.2.5.    Implement tests for header based diffs                4
7.2.6.    Implement tests for image based diffs                 16
7.2.7.    Implement tests for markdown based diffs              4
7.2.8.    Implement tests for line based diffs                  10
7.2.9.    Implement tests for line based merge                  15
7.2.10.   Implement tests for local server                      20
7.2.11.   Implement tests for remote server                     30
7.2.12.   Create notebooks for testing                          15
7.2.13.   Create scripts to generate diff/merge conflicts       10
7.2.14.   Implement tests for selective staging                 6
7.2.15.   Implement tests for to support multiple notebooks     4
7.2.16.   Implement tests for LaTeX diffs                       15
7.2.17.   Implement tests for graph diffs                       15
7.2.18.   Implement tests for table diffs                       15
7.2.19.   Implement tests for Bitbucket integration             20
7.3.      User Interface Testing
7.3.1.    Perform usability testing                             20
8.        INTEGRATION
8.1.      Integration Testing                                   25
9.        DEPLOYMENT/ROLLOUT
9.1.      Define Configuration and Readme Files                 4
9.2.      Define Online Help
9.2.1.    Documentation for nbdff-docs.readthedocs.org          25
9.3.      Installation and User Guide
9.3.1.    Document installation instructions                    12
9.3.2.    Document user guide                                   10
9.4.      Maintain and Update Documentation                     69
10.       PROJECT PLANNING
10.1.     Team Meetings                                         196
10.2.     Stakeholder Meetings                                  98
========  ====================================================  ======================

3.3.2 Activity Planning

At the beginning of each release cycle (see “Software Development Process” below) we will work with our stakeholder to determine the features that will be developed in that cycle. They will be chosen based on stakeholder opinion, and their relative value and risk; high-risk/high-value features will be developed before low-risk/low-value features. We will incorporate feedback from each release of our software into the planning for our next release, adjusting the project requirements accordingly.

The general approach to activity planning is described in the following diagram; it should not be taken as an outline of our specific project.

Activity planning and development model example (image from http://upload.wikimedia.org/wikipedia/commons/0/05/Development-iterative.gif)

See the Activity Plan.


3.3.3 Planned Effort and Earned Value

Table 3.2: WBS

                      M1     M2   M3   M4   M5   M6
Planned effort (hrs)  196.5  378  217  358  245  215
Actual effort (hrs)   295    337  247  358  452  396
Earned value (hrs)    196.5  378  192  202  359  282

See Hour Tracking for detailed breakdown of Actual Effort.
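The figures in Table 3.2 can be turned into the usual earned value indicators. The sketch below is illustrative only: it applies the standard textbook definitions (schedule variance = earned − planned, cost variance = earned − actual, and the corresponding SPI and CPI ratios), which this plan does not itself spell out, and the names `evm_metrics` and `REPORT` are hypothetical.

```python
# Hours per milestone, copied from Table 3.2.
MILESTONES = ["M1", "M2", "M3", "M4", "M5", "M6"]
PLANNED = [196.5, 378, 217, 358, 245, 215]
ACTUAL = [295, 337, 247, 358, 452, 396]
EARNED = [196.5, 378, 192, 202, 359, 282]


def evm_metrics(planned, actual, earned):
    """Return standard earned value indicators for one milestone.

    These are the textbook EVM formulas, assumed here for illustration.
    """
    return {
        "schedule_variance": earned - planned,  # negative: behind schedule
        "cost_variance": earned - actual,       # negative: over budget
        "spi": earned / planned,                # schedule performance index
        "cpi": earned / actual,                 # cost performance index
    }


REPORT = {
    name: evm_metrics(p, a, e)
    for name, p, a, e in zip(MILESTONES, PLANNED, ACTUAL, EARNED)
}

for name in MILESTONES:
    m = REPORT[name]
    print("%s SV=%+.1f CV=%+.1f SPI=%.2f CPI=%.2f" % (
        name, m["schedule_variance"], m["cost_variance"], m["spi"], m["cpi"]))
```

An index below 1.0 (or a negative variance) indicates that the milestone earned less value than was planned or spent.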

3.4 Resource Identification

No additional resources beyond the project team’s effort and the resources granted to us by the capstone course are needed.

3.5 Relationships with project stakeholders

3.5.1 End-User Involvement

As our project will be open source, many end-users will choose to give feedback on the GitHub issue tracker and mailing list, before and after releases. We will also solicit feedback from the IPython community, through the IPython mailing list, while establishing our requirements and throughout the development process.


However, not all users are connected to the online IPython community, particularly the ones that our stakeholder Greg Wilson would like to target (scientists with little skill in software engineering). We will involve these users once we have a release of the software. In particular, we will involve them in a usability test, which we will describe in our test plan document. Greg Wilson also uses the IPython Notebook himself, so the information he provides us with will be similar to that of other end-users.

3.6 Communication

3.6.1 Meetings

• Initial PO meeting: We will meet our PO in person and discuss project requirements and goals.
• Weekly PO meeting: We will discuss the project’s progress weekly with our PO in a remote meeting. We will discuss the features in progress; our progress towards the next release; and perform requirements analysis.
• Post-release meeting: We will discuss a release of the software after it is published.

3.6.2 Reviews

• Code Review: Code review will be done on every pull request (i.e., code change).
  – At least one developer other than the author will review the code change.
  – The reviewer(s) will annotate the code with their comments.
  – The developer will revise their pull request to satisfy the reviewer.
  – The reviewer will merge the code change into the main repository.
• Design Review: New features will be discussed in the GitHub issue tracker. Feedback will be solicited from interested stakeholders.
• Release Candidates (RCs): before each release, a release candidate version will be provided to the public for review. This will provoke feedback of various kinds.

3.6.3 Training

We started training during the summer to learn both Python and JavaScript by assigning two to three chapters on each programming language to be read by set deadlines. We held meetings to review the topics covered in the readings and to discuss any difficulties we had. We intend to continue this training throughout the semester so that we keep learning both programming languages and produce high-quality code.

3.7 System Requirements and Project Input Data

3.8 Configuration Management

3.9 Software Configuration Management

We will use Git for software configuration management. Each change to the software will be captured in a commit on the developer’s computer. These changes will then be uploaded to GitHub for review and merging into the master branch. Each commit contains a description of the change. We will follow the recommendations found in Tim Pope’s blog post on the subject [tpope] and enforce the rules during code review.
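As an illustration of how such rules could be checked mechanically during review, here is a minimal sketch. It assumes the commonly cited conventions from that post (a subject line of about 50 characters or fewer, a blank second line, and body lines wrapped at about 72 characters); the function name `check_commit_message` and its parameters are hypothetical and not part of any NBDiff tooling.

```python
def check_commit_message(message, subject_limit=50, body_width=72):
    """Return a list of style problems found in a commit message.

    The limits are assumptions based on widely used Git conventions.
    """
    problems = []
    lines = message.rstrip("\n").split("\n")
    subject = lines[0]
    if not subject.strip():
        problems.append("subject line is empty")
    if len(subject) > subject_limit:
        problems.append(
            "subject line is longer than %d characters" % subject_limit)
    # The subject must be separated from the body by one blank line.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    # Body lines start at line 3 of the message.
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > body_width:
            problems.append(
                "line %d is longer than %d characters" % (number, body_width))
    return problems
```

An empty list means the message passes these checks; a reviewer could run something like this on each commit in a pull request before reading the change itself.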

3.10 Documentation Configuration Management

We will use Git and GitHub (https://github.com/tarmstrong/nbdiff-docs) to track our documents as we produce and receive them. This will also track changes to the documents.

3.11 Software Development Management

3.12 Software Development Process

Our development process will be based on an iterative and incremental model. The rationale for this choice is:

• We wish to release functioning subsets of the final system to stakeholders early in the project.
• We wish to gather feedback from stakeholders in order to adjust our requirements and design.
• We wish to improve project quality by revisiting previously released artifacts including source code and documentation.
• We wish to reduce project risk by implementing high-risk, high-value requirements first or based on the order our stakeholder prefers.

We have split the project into six major milestones spaced 5 weeks apart. These will have equal portions of the budget allocated to them. Each milestone will consist of a (public) release of the functioning software and a release of updated documents to the course coordinator. Minor milestones will be one week before each major milestone and the output will be a release candidate of the software.

Milestone  Milestone Date
M1         2013-10-21
M2         2013-11-25
M3         2013-12-23
M4         2014-02-03
M5         2014-03-03
M6         2014-03-31
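Since each minor milestone falls one week before its major milestone, the release candidate dates follow directly from the table above. The short sketch below derives them; the names `MAJOR_MILESTONES` and `release_candidate_dates` are hypothetical and used only for illustration.

```python
from datetime import date, timedelta

# Major milestone dates, copied from the table above.
MAJOR_MILESTONES = {
    "M1": date(2013, 10, 21),
    "M2": date(2013, 11, 25),
    "M3": date(2013, 12, 23),
    "M4": date(2014, 2, 3),
    "M5": date(2014, 3, 3),
    "M6": date(2014, 3, 31),
}


def release_candidate_dates(milestones, lead=timedelta(weeks=1)):
    """Return the minor-milestone (release candidate) date for each milestone."""
    return {name: due - lead for name, due in milestones.items()}


RC_DATES = release_candidate_dates(MAJOR_MILESTONES)

for name in sorted(RC_DATES):
    print(name, "release candidate due", RC_DATES[name].isoformat())
```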

3.13 Software Development Tools

The following is a list of the main tools we will use while developing this project. We will add tools to this document as we discover which are effective for our process.

• Git: a distributed version control system for source code.
• GitHub: a hosting service for Git that provides a web-based interface to various Git features, and includes issue trackers and release hosting.
• Python: the programming language that the IPython notebook is written in. In order to be compatible with the Notebook’s development process, we will also adopt Python for our tool.
  – Nosetests: a unit testing tool for Python.
  – PyFlakes: a tool for automatically checking our Python code against the PEP-8 standard [pep8].
  – Mock: a library for mocking objects in unit tests for Python.
• JavaScript: the programming language supported by all major web browsers. Since our interface will likely be web-based, we will need to use this language to provide an interactive UI.
  – Chrome Developer Tools: provide a JavaScript debugger and a log.
  – PhantomJS: provides a headless testing environment that mimics a web browser.
  – Selenium: will be used to test the web-based UI.
  – QUnit: a unit-testing framework for JavaScript.
  – JSLint: for JS quality control (http://www.jslint.com/).
• Chrome: Our web-based UI will be targeted towards Chrome.
• Documentation:
  – Epydoc: a tool to automatically generate API documentation from Python source code.
  – Sphinx: a widely-used documentation system for Python. This will be useful for manually written documentation (including installation instructions, tutorials, etc.)
• TravisCI (https://travis-ci.org/): a free, online continuous integration service that runs automated tests, checks code coverage, and checks code quality every time a patch is submitted to a project. This will be used to provide automatic verification of pull requests to aid reviewers.
• GitHub: a free, online service for code hosting, code review, issue/bug tracking, and release management.

3.14 Software Development Rules and Standards

For our source code (both functional code and test code), we will adhere to the following standards. Where possible, we will use a tool to automatically verify that our code adheres to the standard. We will also verify this through our code reviews.

• Coding standard for Python: PEP-8 [pep8]
  – Enforced by PyFlakes: https://pypi.python.org/pypi/pyflakes
• JavaScript JSLint coding standard
  – Enforced by the JSLint tool: http://www.jslint.com/

For architectural documentation, we will use the Unified Modeling Language (UML).

3.15 Test Phases Management

3.15.1 Unit tests

New patches to the system will be required to include unit tests where appropriate. Patches related to bugs will be required to include regression tests where appropriate. Our coverage goals are:

• Python: statement coverage of at least 60%
• JavaScript: code coverage tools for JavaScript are immature. Thus we will not track our JavaScript code coverage numerically. We will instead use our judgement when reviewing additions to the code base and request additional tests when necessary.

3.15.2 Integration Tests

To test multiple components of the software, we will use the unit testing frameworks listed above when the integration is between components in a shared language. In the case of testing integration between JavaScript and Python components, we will use Selenium, a browser automation tool.

3.15.3 System Tests

Before each release of our software, we will perform manual testing of the full system on the target platforms. This will be described in our test plan document. Where possible, system tests will be scripted with Selenium to ensure reproducible results.

3.15.4 Verification Tests

A week before each release of our software, we will release a “Release Candidate” (RC) version of our release in order to solicit early feedback before publishing the final release. This will provide users a chance to test the tool in their own environments.

3.15.5 Usability Tests

We will perform usability tests according to our test plan document.

3.16 Problem Resolution

We will use GitHub’s issue tracking to handle all feature requests, change requests, inquiries, questions as well as to report bugs. Using GitHub’s tracking feature, issues will be opened when a matter is raised. GitHub allows us to create custom categories to easily classify our issues. This will allow us to filter through the different requests, inquiries and/or bugs. We will also be able to assign issues to different individuals based on who is more qualified to handle the given issue. Comments can be left on issues, allowing for discussion and problem solving among other team members, as well as status updates on the given issue. Finally, once an issue is resolved, the issue can be closed, allowing us to easily track which issues remain.


CHAPTER 4

Activity Plan

Activity        Start Date  Duration  End Date
Specification   01-Sep      50 days   21-Oct
Analysis        15-Sep      57 days   11-Nov
Prototype       07-Oct      55 days   25-Nov
Design          25-Nov      42 days   06-Jan
Implementation  09-Dec      92 days   03-Mar
Testing         24-Feb      32 days   28-Mar

Note: We used iterative development for this project. The above dates and durations reflect only the general time period during which these activities primarily took place.

Figure 4.1: Gantt Chart


CHAPTER 5

Hour Tracking

Table of Contents
• Hour Tracking
  – 1. Milestone 1
  – 2. Milestone 2
  – 3. Milestone 3
  – 4. Milestone 4
  – 5. Milestone 5
  – 6. Milestone 6

5.1 1. Milestone 1

Figure 5.1: Milestone 1

5.2 2. Milestone 2

Figure 5.2: Milestone 2


5.3 3. Milestone 3

Figure 5.3: Milestone 3

5.4 4. Milestone 4

Figure 5.4: Milestone 4

5.5 5. Milestone 5

Figure 5.5: Milestone 5

5.6 6. Milestone 6


Figure 5.6: Milestone 6


CHAPTER 6

Earned Value Management

Table of Contents
• Earned Value Management
• Data
• Graphs


CHAPTER 7

Data


CHAPTER 8

Graphs


CHAPTER 9

User Interface Specification

Table of Contents
• User Interface Specification
  – List of Figures
  – Introduction
  – User Centered Design
    * User Characteristics
    * Set of Tasks Performed
    * Context of Use
    * Stakeholder Objectives
  – Overall User Interface Architecture
    * Pages/Screens
    * Options
    * Structure of the UI Rationale
  – Navigation
    * Visibility
    * Navigation
    * Navigation Rationale
  – Feedback
    * Application
    * Task Completion
    * Error Recovery
  – Screen Layout
    * Space Management
    * Items Location
  – Business Rules
  – Future Enhancements

9.1 List of Figures

• Figure 1
• Figure 2
• Figure 3
• Figure 4


9.2 Introduction

This document provides a detailed description of the user interface of the NBDiff system. The user interface is designed around the principles of visibility, feedback, and affordance, so that it is accessible and lets users accomplish the required tasks easily and quickly. The document also describes the general background of the system’s intended users, who include scientists using the IPython Notebook to produce reproducible research. Finally, it covers the user interface architecture, navigation, feedback, and screen layout.

9.3 User Centered Design

9.3.1 User Characteristics

The target audience of our stakeholder’s education program can be found here: http://software-carpentry.org/audience.html

We hope to cater NBDiff to users of the following types: academic researcher (beginner programmer), academic researcher (expert programmer), peer reviewer, study reproducer, undergraduate student, data analyst, blogger. The most important user types are documented in detail below:

Representative Scientist / Beginner Programmer (User-SBP)

Description: This user is typically a graduate-level student in a university who performs research in the sciences, engineering, or medicine fields. While their research relies on performing computations on data they have collected through experiments, or writing simulations of theoretical phenomena, they have received no formal training in programming. This user is the most important user to our primary stakeholder, as our stakeholder trains these scientists to become better programmers.

Level: Beginner

Responsibilities: Analyzing data by creating models, calculating statistical results, or creating simulations. Producing plots and tables of data for exploratory purposes or for inclusion in published research. Collaborating on IPython Notebooks with other researchers.

Success criteria: Accepting changes from other researchers to their Notebooks and successfully producing correct, merged Notebooks.

Representative Data Analyst (User-DA)

Description: This user is typically a well-educated statistician or computer scientist employed by a for-profit company interested in analyzing a data set for strategic gain. They are well-versed in data analysis toolkits like the R programming language, Weka, or Pandas. They know how to quickly get a sense of the data they are working with, and how to extract meaningful results from that data. They may share preliminary results of their analysis with other members of their company in the form of an IPython Notebook. They are well-versed in version control tools.

Level: Expert

Responsibilities: Creating research results and proofs-of-concept of data analysis algorithms for analysing large data sets. Tracking changes to their work and results as they explore data sets.

Success criteria: Accepting changes from other analysts to their notebooks. Determining what has changed between different versions of the Notebook.


9.3.2 Set of Tasks Performed

1. Merge task

• For the beginner scientist programmer: when they create a model and calculate statistical results or create simulations, they use the resulting data to produce plots and tables for exploratory purposes or for inclusion in published research. This system allows them to accept changes from other researchers to their Notebooks and successfully produce correct, merged Notebooks.
• For the expert scientist programmer: when they want to produce reproducible research in the form of IPython Notebooks that other researchers can run, they will publish the Notebooks alongside a paper, or in some cases submit the Notebooks as the paper itself. In addition to authoring the Notebooks themselves, they also collaborate with other researchers on them. They need to be able to effectively perform code reviews on the changes before they are merged into the master Notebook. This system allows them to accept changes from other researchers to their Notebooks and successfully produce correct, merged Notebooks. It also allows them to review changes to Notebooks, find bugs or code smells in new or modified code, and understand the changes between Notebooks to determine when scientific results change, for provenance.

2. Diff task

• For the expert scientist programmer: when they want to create research results and proofs of concept of data analysis algorithms for analysing large data sets, and when they want to track changes to their work and results as they explore data sets, this system allows them to accept changes from other analysts to their Notebooks and to determine what has changed between different versions of the Notebooks.
• For the beginner scientist programmer: when they create a model and calculate statistical results or create simulations, they use the resulting data to produce plots and tables for exploratory purposes or for inclusion in published research. This system allows them to accept changes from other analysts to their Notebooks and to determine what has changed between different versions of the Notebooks.

9.3.3 Context of Use

The target users of NBDiff are scientists who are, or will be, using the IPython Notebook to perform reproducible computational research in science, technology, engineering and medicine (STEM). Our target users are scientists with a wide range of computational ability, including beginner and expert programmers. The NBDiff project seeks to make version control of Notebooks easier, including collaboration among multiple scientists (for instance, in a lab). Most of our users will be graduate students, but some will be faculty members of universities, and others will be researchers working in publicly or privately funded labs. In most cases, the users will have at least a bachelor’s degree in a technical field, though some universities are using Notebooks as part of their undergraduate curriculum.

9.3.4 Stakeholder Objectives

Existing solutions lack the ability to properly display diffs of, and merge, IPython Notebooks. Since the format of the notebooks is JSON, existing merge tools are inadequate in dealing with this format, as they will show a massive amount of incomprehensible text. Not all scientists are able to read JSON, and they would prefer a user interface similar to the one present in the IPython Notebook.

Need     Priority  Concerns   Current Solution  Proposed Solution
Diffing  High      Algorithm  Manual diffing    Automatic diffing
Merging  High      Algorithm  Manual merging    Automatic merging


9.4 Overall User Interface Architecture

9.4.1 Pages/Screens

The system contains two main pages: the diff page and the merge page.

The diff page displays the two IPython Notebooks being diffed side by side, one on each side of the page. Each cell of the left Notebook is compared with the corresponding cell of the right Notebook. The differences between the two Notebooks’ cells are then highlighted in color: red when something has been deleted and green when something has been added.

The merge page shows three columns: the two Notebooks are on either side of the page, and the middle column shows which version of each element will be added to the final Notebook. As on the diff page, the elements in the cells are highlighted in different colors.
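The cell-by-cell comparison described above can be sketched with Python’s standard `difflib` module. This is only an illustration under stated assumptions: each cell is represented by its source text as a string, `diff_cells` is a hypothetical name, and the actual NBDiff algorithm may differ.

```python
from difflib import SequenceMatcher


def diff_cells(left_cells, right_cells):
    """Label each cell as 'unchanged', 'deleted' (red) or 'added' (green).

    Cells are plain strings here; this representation is an assumption.
    """
    matcher = SequenceMatcher(None, left_cells, right_cells, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(("unchanged", cell) for cell in left_cells[i1:i2])
        else:
            # 'delete', 'insert' and 'replace' reduce to removals from the
            # left Notebook followed by additions from the right Notebook.
            result.extend(("deleted", cell) for cell in left_cells[i1:i2])
            result.extend(("added", cell) for cell in right_cells[j1:j2])
    return result
```

A modified cell shows up as a deleted cell followed by an added one, which matches the red and green highlighting described for the diff page.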

9.4.2 Options

On the diff page, a check mark can be enabled or disabled between each pair of elements being compared in the notebooks. When a check mark is enabled, the element next to it on the diff page will be staged. On the merge page, arrows are placed between the columns so that the user can easily pick which version of an element goes into the middle column, that is, which element to keep in the merged notebook.

9.4.3 Structure of the UI Rationale

The approach chosen to make diffing and merging easier is to compare each cell in the different notebooks and place the cells side by side in different columns. This approach enables users to notice the differences between the cells faster and therefore to choose easily which cell to keep. The structure of the UI is designed to help the user complete his or her task efficiently.

9.5 Navigation

9.5.1 Visibility

In our system, the user interface is kept simple so that its functions are clearly visible. The user opens one of the two pages, uploads two Notebook files in the designated places, and starts either the diffing process or the merging process. The result of both processes is made visible by the coloring system described above, which makes it easy for the user to understand what happened in each cell compared to the one horizontally parallel to it.

9.5.2 Navigation

Navigation between the two main pages (the diff page and the merge page) has not yet been designed or implemented. In the finished system, each page will have a button for navigating between screens: on the diff page, the button will navigate to the merge page, and on the merge page, the button will navigate to the diff page.

9.5.3 Navigation Rationale

N/A


9.6 Feedback

9.6.1 Application

No user has yet given feedback on the final application’s user experience. However, we performed usability testing of our user interface and took the resulting feedback into account; refer to the usability testing document.

9.6.2 Task Completion

Diff Tool

After the user selects both Notebook files and starts the diffing process, he/she will be notified that the task was completed by seeing the colors added to both files: red for the removed cells or subcells and green for the added cells or subcells.

Merge Tool

After the user selects both files and starts the merging process, he/she will be notified of the completed task by seeing the middle bar filled in and being able to press the arrow buttons to select left or right cells. In addition, the user will be able to save the resulting merged file that appears in the middle bar. Colors will also be added to the original Notebook files placed in the left and right bars: red for the removed cells or subcells and green for the added cells or subcells.
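The arrow-button selection in the merge tool can be pictured as choosing, for each row, either the left or the right cell. The sketch below is a simplified illustration, not NBDiff’s actual code: it assumes the cells are already aligned row by row, that `None` marks a cell missing on one side, and that `merge_cells` and the `"left"`/`"right"` labels are hypothetical names.

```python
def merge_cells(left_cells, right_cells, choices):
    """Build the middle (merged) column from per-row choices.

    Each entry of `choices` is 'left' or 'right', mirroring the arrow
    button the user pressed for that row.
    """
    merged = []
    for left, right, choice in zip(left_cells, right_cells, choices):
        if choice == "left":
            picked = left
        elif choice == "right":
            picked = right
        else:
            raise ValueError(
                "choice must be 'left' or 'right', got %r" % (choice,))
        if picked is not None:  # None marks a cell absent on that side
            merged.append(picked)
    return merged
```

The resulting list corresponds to the content of the middle bar that the user can then save as the merged Notebook.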

9.6.3 Error Recovery

If an error ever occurs during a diffing or a merging process, the error will be displayed to the user along with how to resolve the error. Error messages will be displayed on top of the current page and the user will have to click on a button to remove the message.

9.7 Screen Layout

9.7.1 Space Management

In our system, neither the merge page nor the diff page has much empty space. At the top, we made use of the empty space by adding the NBDiff logo as a header that stays in view as the page scrolls up and down. The question mark button is placed in the empty space at the top right of the page and stays in view in the same way. In addition, we added a save button to the merge page so that the user can save the result of merging the two Notebook files; it also stays in view while scrolling, like the logo and the question mark button.

9.7.2 Items Location

Bars

In the diff tool, we have two vertical bars that divide the page into two equivalent columns, each containing a Notebook file.

Diff Bars

In the merge tool, we have three vertical bars. The left and the right ones will contain the two Notebook files. The middle one will contain the merged result of the two existing Notebook files.

Merge Bars


Figure 9.1: Figure 1


Figure 9.2: Figure 2

Arrow buttons

In order to have an organized, visible merge page, we designed the user interface to have arrow buttons placed between the left bar and the middle bar, as well as between the right bar and the middle bar. These arrow buttons are located there to allow the user to choose the required cell and place it in the middle bar (merged result bar).

NBDiff Logo

The system logo is placed on the top left of the page. It is visible at all times, even when the user scrolls up or down.

Figure 9.3: Figure 3

NBDiff Logo

Question mark button

This button is used as a help menu for the user. Whenever the user clicks on it, some notes that help in accomplishing the current task will be displayed. We placed it at the top so that it is always visible to the user whenever he/she needs help.

Save button in merge

This button is accessible only after generating a merged file in the merge tool. It is placed at the top left of the page, close to the logo, and grouped with other important buttons so that it is highly visible to the user.

Next/Previous Notebook button

This button is to the right of the Save button. The page will change to the next or previous unmerged notebook if there are such Notebooks.

Finish button

This button is also part of the menu bar and will shut down the local server when pressed. The menu bar buttons are grouped together because they represent important actions the user can do and have very high visibility.

Figure 9.4: Figure 4

Menu Bar

9.8 Business Rules

N/A

9.9 Future Enhancements

N/A

CHAPTER 10

Software Requirements Specification


Table of Contents
• Software Requirements Specification
  – List of Figures
  – Introduction
    * Purpose
    * Scope
    * Definitions, Acronyms and Abbreviations
    * References
  – Overall Description
    * Product Perspective
    * Product Functions
    * User Characteristics
    * Constraints
    * Assumptions and Dependencies
  – Specific Requirements
    * External Interfaces
    * Functional Requirements
      · Actor Goal List
      · Use Case View
  – Non-Functional Requirements
    * Reliability
    * Usability
    * Efficiency
    * Maintainability
    * Portability
  – Design Constraints
  – Documentation Requirements
    * Offline Help
    * Online Help
    * User’s Installation Guide and Developer’s Guide
  – Purchased Components
  – Licensing Requirements
  – Legal, Copyright and Other Notices
  – Analysis Models
    * System Sequence Diagrams:
      · SSD-DUC1 – Diff notebooks locally from VCS.
      · SSD
      · CONTRACTS
      · SSD-DUC2 – Diff notebooks locally from filesystem
      · SSD
      · CONTRACTS
      · SSD-DUC3 – Diff remote notebooks
      · SSD
      · CONTRACTS
      · SSD-MUC1 – Locally Resolve a Merge Conflict from VCS
      · SSD
      · CONTRACTS
      · SSD-MUC2 – Locally Merge Notebooks from Filesystem
      · SSD
      · CONTRACTS
    * Activity Diagrams:
    * Domain Model:
  – Information on Version Control Systems


10.1 List of Figures

• Figure 1
• Figure 2
• Figure 3
• Figure 4
• Figure 5
• Figure 6
• Figure 7
• Figure 8
• Figure 9
• Figure 10
• Figure 11
• Figure 12
• Figure 13

10.2 Introduction

The following Software Requirement Specification document defines all of the functional, non-functional and special requirements for the NBDiff project. Additional requirements can be found in the Use Case Model (UCM). Rationale and background on the requirements specified in this document are included in the Vision Document. In addition to textual specifications of requirements, this document includes or references visually represented domain models (class and sequence).

10.2.1 Purpose

The purpose of this document is to clearly identify and define all requirements provided by the stakeholder. This Software Requirement Specification document will serve as a reference for the project team when architecting and constructing the software. Project stakeholders may review the document to provide clarifications on the requirements.

10.2.2 Scope

This document describes all the functions that the NBDiff software will include and specifies how the software will perform them.

10.2.3 Definitions, Acronyms and Abbreviations

The definitions of all terms, acronyms, and abbreviations required to properly interpret the Software Requirement Specifications have been provided by reference in the Glossary.


10.2.4 References

• Bradner, S., Network Working Group, Internet Engineering Task Force. “RFC-2119: Key words for use in RFCs to Indicate Requirement Levels.” https://www.ietf.org/rfc/rfc2119.txt

10.3 Overall Description

The following section provides overall background for the requirements. Both the product perspective and the product functions are explained. Any constraints, assumptions and dependencies are also defined.

10.3.1 Product Perspective

NBDiff will serve as a tool for users who work with IPython Notebooks. The IPython Notebook software currently allows users to create their own computational lab notebooks. The structure of these notebooks provides an interactive component that allows others to reproduce the notebook’s computations, and even continue or add to their work.

However, currently there is a limitation to this collaboration. The IPython Notebook file format is pretty-printed JSON data. While it is possible for experts to understand the file format itself and diffs performed on this type of file, the results are often noisy (unimportant changes are highlighted, e.g., base64-encoded images stored in thousands of lines of JSON) and confusing.

Furthermore, combining changes made by multiple authors is difficult because it requires manually editing the notebook JSON data with a text editor. While possible, many users find this tedious and error-prone.

NBDiff provides a means of viewing changes made to IPython Notebook files and resolving merge conflicts, a necessary task for working asynchronously with other authors.
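To illustrate the noise problem, the sketch below compares two versions of a toy notebook whose code is identical but whose stored image output differs. It is a hedged illustration only: the dictionary layout built by `make_notebook` is a simplified, hypothetical stand-in for the real notebook format, and the helper names are invented for this example.

```python
import difflib
import json


def make_notebook(source, image_data):
    # Simplified, hypothetical structure; real .ipynb files carry more fields.
    return {"cells": [{"cell_type": "code", "source": source,
                       "outputs": [{"image/png": image_data}]}]}


def changed_json_lines(a, b):
    """Count added and removed lines in a text diff of the pretty-printed JSON."""
    a_lines = json.dumps(a, indent=1, sort_keys=True).splitlines()
    b_lines = json.dumps(b, indent=1, sort_keys=True).splitlines()
    return sum(1 for line in difflib.ndiff(a_lines, b_lines)
               if line.startswith(("- ", "+ ")))


def changed_sources(a, b):
    """Compare only the cell sources, ignoring outputs such as images."""
    return [(x["source"], y["source"])
            for x, y in zip(a["cells"], b["cells"])
            if x["source"] != y["source"]]


old = make_notebook("plot(x)", "iVBORw0KGgoAAAANSUhEUg==")
new = make_notebook("plot(x)", "R0lGODlhAQABAIAAAAUEBA==")

print("JSON lines changed:", changed_json_lines(old, new))
print("cell sources changed:", changed_sources(old, new))
```

The text diff reports changed lines even though no code was edited, while the cell-level comparison reports nothing; with real base64-encoded images the text diff can span thousands of lines.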

10.3.2 Product Functions

FR-1. Diff and display modifications made in an IPython notebook
This functionality allows the user to see the differences between two versions of the same IPython notebook. A comparison will be made between the versions of the notebook, allowing the user to view the added, removed or modified elements.

FR-2. Generate a diff of a notebook from the previous version, provided by version control
This functionality will use the version control system to acquire the current head of the notebook and will generate a diff based on the comparison of the elements in both versions. This functionality is described in DUC1.

FR-3. Generate a diff of a notebook from two local files, provided by the user
This functionality allows the user to provide NBDiff with two locally stored IPYNB files, from which a diff will be generated. This functionality is described in DUC2.

FR-4. Generate a diff of a notebook stored remotely
This functionality will generate a diff of notebooks that have been stored remotely and have been provided using remote URLs. This functionality is described in DUC3.

FR-5. Allow for selective staging of changes made to an IPython Notebook (Scoped Out)
This functionality will use the diff provided by the comparison of a notebook to the current head (FR-2) and allow the user to selectively stage certain modifications to be committed using the version control system.

FR-6. Handle merge conflict between two notebooks by the version control system
When the version control system encounters a merge conflict between the notebook version the user is trying to commit and the remote head, this functionality allows the user to see the differences between the two notebooks. The user is able to resolve this merge conflict by using the visual representation to select which modifications to accept or discard. The system will handle this conflict by creating a merged notebook for the version control system to commit. This functionality is described in MUC1 and MUC4.

FR-7. Allow manual changes to be made in the notebook when resolving a merge conflict
This functionality allows the user to modify the merged notebook in-line and make manual changes to the notebook before NBDiff creates a merged notebook for the version control system to commit.

FR-8. Validate notebook file format
This functionality verifies that the input notebooks are of the correct file format (.ipynb).

FR-9. Generate line based diffs for text based cells
This functionality generates a diff of a notebook on a line based level for text based cells, indicating to the user not only when a cell has been modified, but also which lines were added or deleted.

FR-10. Generate line based diffs for code based cells
This functionality generates a diff of a notebook on a line based level for code based cells, indicating to the user not only when a cell has been modified, but also which lines were added or deleted.

FR-11. Support line based diffing of LaTeX markdown in code based cells (Scoped Out)
This functionality will generate a diff of any LaTeX markdown used in a code based cell.

FR-12. Generate cell based diffs for graphs (Scoped Out)
This functionality will display to the user a diff of a graph based cell.

FR-13. Generate cell based diffs for tables (Scoped Out)
This functionality generates a diff of any tabular data included in a notebook.

FR-14. Generate word based diffs for headers
This functionality generates a diff of notebook headers on a word based level, indicating to the user not only when a header has been modified, but also which words were added or deleted.

FR-15. Integration of notebook diffing with BitBucket by Atlassian (Scoped Out)
This functionality would allow notebook diffing to be integrated into their free code hosting website.

FR-16. Integration of nbdiff with existing version control systems
Existing merge tools implement a standard command-line interface that can be invoked by a version control system upon a merge conflict. NBDiff should implement this standard interface.

FR-17. Handle merge conflict between two local notebooks, provided by the user
This functionality allows the user to merge two different versions of a notebook saved in the user’s file system. The user is able to see the differences between the two versions of the notebook. The user is able to merge the two versions by using the visual representation to select which modifications to accept or discard in the resulting notebook. This functionality is described in MUC2.

FR-18. Handle merge conflict between two notebooks stored remotely
This functionality allows the user to merge two notebooks that have been stored remotely and have been provided using remote URLs. The user is able to see the differences between the two versions of the notebook. The user is able to merge the two versions by using the visual representation to select which modifications to accept or discard in the resulting notebook. This functionality is described in MUC3.
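The line based and word based diffs in FR-9, FR-10 and FR-14 could be approached as in the following minimal sketch. It assumes Python 3 and the standard library's difflib module; the function names line_diff and word_diff are hypothetical and are not taken from the NBDiff code base.

```python
import difflib


def line_diff(before, after):
    """Return (tag, line) pairs, where tag is '+', '-' or ' ' (unchanged)."""
    result = []
    for entry in difflib.ndiff(before.splitlines(), after.splitlines()):
        tag = entry[0]
        if tag == "?":
            # ndiff emits '?' hint lines that belong to neither input; skip them.
            continue
        result.append((tag, entry[2:]))
    return result


def word_diff(before, after):
    """Word-level diff for short texts such as headers (hypothetical helper)."""
    # Put one word on each line so the line-based diff works on words.
    return line_diff("\n".join(before.split()), "\n".join(after.split()))


if __name__ == "__main__":
    for tag, line in line_diff("x = 1\ny = 2", "x = 1\ny = 2\nz = 3"):
        print(tag, line)
```

Treating each word as its own line keeps the two requirements on one code path; a real implementation would probably also need to preserve whitespace when rendering headers.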


10.3.3 User Characteristics

See Vision Document.

10.3.4 Constraints

The system must use the IPython Notebook file format (.ipynb) both as input and output file format. The system must work with the version control system to acquire the remote head when diffing and solving merge conflicts for notebooks managed by a version control system. The NBDiff source code must not contain any proprietary components that the team cannot release under our open source license (see Licensing Requirements). Future maintainers of the NBDiff project must be able to maintain NBDiff without proprietary tools.

10.3.5 Assumptions and Dependencies

Assumptions:
• The IPython Notebook’s file format is plain text and is not understandable to users

Software Dependencies:
• Python 2.7
• Windows 7, Windows 8, Mac OS X, or Linux
• Mozilla (most recent)

10.4 Specific Requirements

10.4.1 External Interfaces

Inputs:
• valid/invalid .ipynb file(s) created by the IPython Notebook

Outputs:
• valid .ipynb file(s) created by the IPython Notebook
• Commands to a version control system (VCS), viz.:
  – git
  – mercurial
  – subversion

10.4.2 Functional Requirements

The functional requirements capture the intended behavior of the system and are specified in section 10.3.2, Product Functions.


Actor Goal List

Actor: User
Goal: The user’s main goal in using this system is to understand the differences between two IPython notebook files and to resolve any conflicts between these changes. See Vision Document for more information.

Use Case View

The use case view has been provided in the Use Case Model.

10.5 Non-Functional Requirements

10.5.1 Reliability

• NFR-Rel-1 The system shall detect corrupt IPython Notebook files and display a message to the user indicating that the file is corrupt.
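As an illustration of NFR-Rel-1 (and FR-8), the following sketch rejects input that is not parseable JSON or that lacks the top-level nbformat field found in notebook files. This check is only a heuristic chosen for the example, assuming Python 3; load_notebook and InvalidNotebookError are hypothetical names, and a complete validator would need to inspect far more of the file.

```python
import json


class InvalidNotebookError(ValueError):
    """Raised when text cannot be read as an IPython Notebook (hypothetical)."""


def load_notebook(text):
    """Parse notebook text, raising InvalidNotebookError if it looks corrupt."""
    try:
        data = json.loads(text)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        raise InvalidNotebookError("not valid JSON: %s" % exc)
    # Assumption for this sketch: a usable notebook is a JSON object that
    # declares its format version in a top-level "nbformat" field.
    if not isinstance(data, dict) or "nbformat" not in data:
        raise InvalidNotebookError("missing top-level 'nbformat' field")
    return data


if __name__ == "__main__":
    try:
        load_notebook("{ this is not JSON")
    except InvalidNotebookError as error:
        print("The file is corrupt:", error)
```

Raising a dedicated exception lets the caller show a clear message to the user instead of a stack trace.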

10.5.2 Usability

• NFR-Use-1 The system should show differences and conflicts in a fashion similar to the Notebook itself, so that the user does not need to understand the underlying Notebook file format to perform merges.
• NFR-Use-2 The system should resemble existing merging tools so that users familiar with those tools will know how to use the system.
• NFR-Use-3 The system should allow the user to undo merges that the user did not intend to perform.

10.5.3 Efficiency

• NFR-Eff-1 A diff of two IPython Notebooks of 100 cells each should take no longer than 3 seconds.
• NFR-Eff-2 When requesting notebooks from remote servers (e.g., DUC3, MUC3), multiple actors should cause no more than one request to the same notebook to be made at the same time. That is, multiple parallel requests to the same notebook file should not be made. An open request to a given URI should be treated as a shared resource among multiple threads (i.e., concurrency).
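One way to treat an open request as a shared resource, as NFR-Eff-2 asks, is to keep a table of in-flight requests keyed by URI. The sketch below assumes Python 3; the SharedFetcher class is hypothetical, and the fetch callable passed to it stands in for whatever HTTP client the server would actually use.

```python
import threading
from concurrent.futures import Future


class SharedFetcher(object):
    """Allows at most one in-flight request per URI (hypothetical sketch)."""

    def __init__(self, fetch):
        self._fetch = fetch            # callable: uri -> content
        self._lock = threading.Lock()
        self._in_flight = {}           # uri -> Future shared by all callers

    def get(self, uri):
        with self._lock:
            future = self._in_flight.get(uri)
            owner = future is None
            if owner:
                # The first caller for this URI performs the request.
                future = Future()
                self._in_flight[uri] = future
        if owner:
            try:
                future.set_result(self._fetch(uri))
            except Exception as exc:
                future.set_exception(exc)
            finally:
                with self._lock:
                    del self._in_flight[uri]
        # Every other caller blocks here until the owner has finished.
        return future.result()
```

Completed requests are removed from the table, so this sketch shares only concurrent requests and does not cache results; whether to cache is a separate design decision.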

10.5.4 Maintainability

• NFR-Maint-1 The system should be documented so that future developers of the IPython notebook may be able to improve the system.
• NFR-Maint-2 NBDiff should include developer’s documentation so that future maintainers know how to submit patches and run tests against the code base. Quality guidelines for new contributors should be publicly available.
• NFR-Maint-3 Changes to the IPython Notebook file format should be anticipated by the design.
• NFR-Maint-4 NBDiff should contain a plugin system so that other developers can add diffing features to NBDiff for specific kinds of IPython cells.


• NFR-Maint-5 NBDiff source code should adhere to PEP-8 (Python) and JSHint (Javascript). JSHint options must be documented/configured in the source code repository.
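The plugin system in NFR-Maint-4 could take the form of a registry that maps a cell type to a diffing function. The following is a minimal sketch under that assumption, written for Python 3; register_differ, diff_cell and the differ functions are hypothetical names, while cell_type is the field notebook cells use to declare their kind.

```python
_DIFFERS = {}  # cell type -> diffing function


def register_differ(cell_type):
    """Decorator that registers a diffing function for one cell type."""
    def decorator(func):
        _DIFFERS[cell_type] = func
        return func
    return decorator


def default_differ(before, after):
    """Fallback used when no plugin handles the cell type."""
    return "unchanged" if before == after else "modified"


def diff_cell(before, after):
    """Dispatch to the plugin registered for the cell's type, if any."""
    differ = _DIFFERS.get(after.get("cell_type"), default_differ)
    return differ(before, after)


@register_differ("heading")
def heading_differ(before, after):
    # Example plugin: report which words were added to a header.
    old_words = before.get("source", "").split()
    new_words = after.get("source", "").split()
    return {"added": [word for word in new_words if word not in old_words]}
```

A third-party developer would add support for a new kind of cell by decorating one function, without touching the dispatch code.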

10.5.5 Portability

• NFR-Port-1 The system shall run on Windows 7 and 8, Ubuntu Linux 13.04, and Mac OS X.

10.6 Design Constraints

• The system shall contain only components that can be distributed by the team under the MIT license (see Licensing Requirements).
• The system shall use the IPython Notebook file format (.ipynb) for input and output.
• The system should render the IPython Notebook elements similarly to how the IPython Notebook renders them.

10.7 Documentation Requirements

10.7.1 Offline Help

For users of Unix-based systems (i.e., Linux and Mac OS X) we will provide help messages for use in a terminal-based environment. The help messages will describe how to run the tool and all options available on the command-line.

10.7.2 Online Help

We will provide a user’s manual online. The online user manual will:
• describe how to install the tool,
• describe the high-level concepts necessary to understand the tool (diff, merge, merge conflict),
• describe the actions that the user needs to perform to follow the use cases as described in the use case model.

10.7.3 User’s Installation Guide and Developer’s Guide

We will provide a user’s installation guide (i.e., a README file) on our software’s GitHub page. The installation guide will describe how to install and configure NBDiff on all supported platforms, including instructions for acquiring dependencies. In addition to the user’s installation guide, we will provide a developer’s guide for use by future maintainers of NBDiff. The developer’s guide will provide instructions for running and building NBDiff from source, instructions for running the automated test suites, and guidelines for contributing changes to the project.

10.8 Purchased Components

N/A


10.9 Licensing Requirements

• The system shall be released under the MIT License or equivalent as defined by the Open Source Initiative. See http://opensource.org/licenses/MIT

10.10 Legal, Copyright and Other Notices

No special legal requirements exist. Copyright for source code and documentation must be retained by the members of the team (this is granted by the university’s Intellectual Property Policy) in order for the code to be distributed under the MIT license.

10.11 Analysis Models

10.11.1 System Sequence Diagrams:

SSD-DUC1 – Diff notebooks locally from VCS.

SSD

SSD-DUC1 – Diff notebooks locally from VCS

CONTRACTS

Table 10.1: CON-BEGIN_DIFF_SESSION: begin_diff_session()

Contract: begin_diff_session()
Pre-conditions:
1. The user is in a repository.
2. There is at least one notebook with uncommitted changes.
Post-conditions:
1. An instance of DiffSession was created.

Table 10.2: CON-SET_MODE: set_mode(mode)

Contract: set_mode(mode)
Description: Indicates source of notebook files.
Pre-conditions:
1. A DiffSession or MergeSession (session for the purposes of this contract) is under way.
2. mode is either VCS or Filesystem.
Post-conditions:
1. session.mode was set to mode.
2. If session.mode = VCS then the current repository was associated with the session.


Figure 10.1: Figure 1


Table 10.3: CON-GENERATE_DIFF: generate_diff()

Contract: generate_diff()
Pre-conditions:
1. A DiffSession is under way.
2. There is at least one notebook in the current repository with uncommitted changes.
3. The notebooks with uncommitted changes are valid notebook files.
Post-conditions:
1. A diff was presented to the user for each notebook with uncommitted changes.
2. Invalid notebooks were indicated to the user.

SSD-DUC2 – Diff notebooks locally from filesystem

SSD

SSD-DUC2 – Diff notebooks locally from filesystem

CONTRACTS

See CON-BEGIN_DIFF_SESSION and CON-SET_MODE.

Table 10.4: CON-SET_BEFORE_FILE: set_before_file(file)

Contract: set_before_file(file)
Pre-conditions:
1. A DiffSession session is under way.
2. session.mode is set to filesystem.
3. file is a path to a valid IPython Notebook file.
4. The user has permission to read file.
Post-conditions:
1. file was associated with session as the before-file.

Table 10.5: CON-SET_AFTER_FILE: set_after_file(file)

Contract: set_after_file(file)
Pre-conditions:
1. A DiffSession session is under way.
2. session.mode is set to filesystem.
3. file is a path to a valid IPython Notebook file.
4. The user has permission to read file.
Post-conditions:
1. file was associated with session as the after-file.


Figure 10.2: Figure 2


Table 10.6: CON-GENERATE_DIFF: generate_diff()

Contract: generate_diff()
Pre-conditions:
1. A DiffSession session is under way.
2. session.mode is set to filesystem.
3. session.before-file and session.after-file are set.
Post-conditions:
1. The changes applied to before-file to create after-file are shown to the user.
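The filesystem-mode contracts above can be read as pre-condition checks on a session object. The sketch below illustrates that reading, assuming Python 3; the DiffSession class body, ContractError and can_generate_diff are hypothetical, and the check that a file is a valid notebook (pre-condition 3 of the set_*_file contracts) is left out for brevity.

```python
import os


class ContractError(Exception):
    """Raised when a pre-condition from the contracts is not met."""


class DiffSession(object):
    # VCS and Filesystem come from CON-SET_MODE; URI from CON-SET_URIs.
    MODES = ("VCS", "Filesystem", "URI")

    def __init__(self):
        self.mode = None
        self.before_file = None
        self.after_file = None

    def set_mode(self, mode):
        if mode not in self.MODES:
            raise ContractError("unknown mode: %r" % (mode,))
        self.mode = mode

    def _require_readable(self, path):
        # Pre-conditions 2 and 4 of CON-SET_BEFORE_FILE / CON-SET_AFTER_FILE.
        if self.mode != "Filesystem":
            raise ContractError("session.mode must be Filesystem")
        if not os.access(path, os.R_OK):
            raise ContractError("cannot read %s" % path)

    def set_before_file(self, path):
        self._require_readable(path)
        self.before_file = path

    def set_after_file(self, path):
        self._require_readable(path)
        self.after_file = path

    def can_generate_diff(self):
        """True when the pre-conditions of generate_diff() (Table 10.6) hold."""
        return (self.mode == "Filesystem"
                and self.before_file is not None
                and self.after_file is not None)
```

Checking the pre-conditions at each call keeps the session in a state where the post-conditions listed in the tables can be relied upon.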

SSD-DUC3 – Diff remote notebooks

SSD

SSD-DUC3 – Diff remote notebooks.

CONTRACTS

See CON-BEGIN_DIFF_SESSION, CON-SET_MODE, and CON-GENERATE_DIFF.

Table 10.7: CON-SET_URIs: set_URIs(uri1, uri2)

Contract: set_URIs(uri1, uri2)
Pre-conditions:
1. A DiffSession session is under way.
2. session.mode is set to URI.
Post-conditions:
1. session.before-file and session.after-file are set to the files reached by uri1 and uri2 respectively.

SSD-MUC1 – Locally Resolve a Merge Conflict from VCS

SSD

SSD-MUC1 – Locally Resolve a Merge Conflict from VCS.

CONTRACTS

See CON-SET_MODE.


Figure 10.3: Figure 3


Figure 10.4: Figure 4

Table 10.8: CON-BEGIN_MERGE_SESSION: begin_merge_session()

Contract: begin_merge_session()
Pre-conditions:
1. The user is in a repository.
2. The repository is in a merge conflict state.
Post-conditions:
1. An instance of MergeSession was created.

Table 10.9: CON-VIEW_MERGE_CONFLICT: view_merge_conflict()

Contract: view_merge_conflict()
Pre-conditions:
1. The user is in a repository.
2. The repository is in a merge conflict state.
Post-conditions:
1. An instance of MergeConflict was created and displayed.

Table 10.10: CON-GET_LOCAL: get_local()

Contract: get_local()
Pre-conditions:
1. A MergeSession session is under way.
2. session.mode is set to VCS.
3. localfile is the local IPython Notebook file.
4. The user has permission to read localfile.
Post-conditions:
1. localfile was associated with session as the local-file.

Table 10.11: CON-GET_BASE: get_base()

Contract: get_base()
Pre-conditions:
1. A MergeSession session is under way.
2. session.mode is set to VCS.
3. basefile is the base IPython Notebook file.
4. The user has permission to read basefile.
Post-conditions:
1. basefile was associated with session as the base-file.


Table 10.12: CON-GET_REMOTE: get_remote()

Contract: get_remote()
Pre-conditions:
1. A MergeSession session is under way.
2. session.mode is set to VCS.
3. remotefile is the remote IPython Notebook file.
4. The user has permission to read remotefile.
Post-conditions:
1. remotefile was associated with session as the remote-file.

SSD-MUC2 – Locally Merge Notebooks from Filesystem

SSD

SSD-MUC2 – Locally Merge Notebooks from Filesystem

CONTRACTS

See CON-BEGIN_MERGE_SESSION, CON-SET_MODE, CON-VIEW_MERGE_CONFLICT, CON-GET_LOCAL, CON-GET_BASE, and CON-GET_REMOTE.

10.11.2 Activity Diagrams:

Activity Diagram – Generate diff from a notebook in Version Control
Activity Diagram – Generate diff from local notebooks
Activity Diagram – Generate diff from notebooks located on a remote server
Activity Diagram – Perform selective staging
Activity Diagram – Merge Notebooks

10.11.3 Domain Model:

Domain Model

10.12 Information on Version Control Systems

The following state machines help explain the function of the version control systems that NBDiff interfaces with. They focus on Git, our main target, but the concepts are similar for Mercurial and Subversion.

Repository State Diagram
File State Diagram


Figure 10.5: Figure 5

Figure 10.6: Figure 6


Figure 10.7: Figure 7


Figure 10.8: Figure 8


Figure 10.9: Figure 9


Figure 10.10: Figure 10


Figure 10.11: Figure 11


Figure 10.12: Figure 12

Figure 10.13: Figure 13


CHAPTER 11

Diffing-Related Use Cases

11.1 ID: DUC1 - Diff Notebooks Locally from Version Control

Table 11.1: DUC1

Priority: Essential
Level: User Goal
Description: The system indicates which notebook files in the current repository were modified, and which parts of the files were modified.
Primary Actor: User
Secondary Actor: VCS
Stakeholders: User
Interests:
• The user wishes to see which parts of their notebook files have been modified before making a commit in order to avoid committing irrelevant or incorrect changes.
• The user wishes to see which parts of their notebook files have been modified by another author, in order to perform reviews of code or other content.
• The user wishes to perform the diff without access to the internet.

Preconditions:
• The user’s current directory is tracked by the version control system.
• One or more IPython Notebook files are tracked by the current directory’s repository.
• One or more of the IPython Notebook files tracked by the current repository have uncommitted changes.
• The current user has permissions to read the IPython Notebook files tracked by the current repository.
• The IPython Notebooks in the repository are valid IPython Notebook files.

Trigger: The user requests a diff of all currently modified notebook files in the current repository.
Success End Condition: The system presents the changes made to the IPython Notebook files to the user.
Failure End Condition: The system has not generated a diff. An error message was displayed.
Minimal Guarantee:
• The IPython Notebook files have not been modified.
• The system has exited without a stack trace or other unintended error message.

Main Success Scenario:
1. The user requests a diff of all currently modified notebooks in the current repository.
2. The system requests a list of currently modified files from the version control system.
3. The system requests HEAD and working-directory versions of each modified file that is a valid Notebook file.
4. The system detects changes between HEAD and working-directory versions of each modified file and presents these changes to the user.
5. The user indicates they have understood the changes.
6. The system shuts down.

Extensions:
• 2a. The user is not currently in a version control repository.
  – The system exits and notifies the user that they are not in a repository.
• 2b. There are no modified files in the repository.
  – The system exits and notifies the user that there are no modified files.
• 3a. None of the modified files are IPython Notebooks.
  – The system exits and notifies the user that there are no modified notebooks to diff.
• 3b. One or more notebooks has a version (HEAD or working-directory) that is an invalid notebook file.
  – The system notifies the user of which notebooks had an invalid version.
  – If there are no modified notebooks for which both HEAD and working-directory versions are valid, the system exits.

Special Requirements: Refer to Supplementary Requirements Specification document.
Related Use Cases: N/A
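Steps 2 and 3 of the main success scenario could be carried out against Git roughly as follows. This is a sketch only, assuming Python 3, a git executable on the PATH, and a current directory at the repository root; the helper names are hypothetical. It relies on two standard Git commands: git diff --name-only HEAD to list files that differ from HEAD, and git show HEAD:<path> to read the committed version of a file.

```python
import subprocess


def git(*args):
    """Run a git command and return its standard output as text."""
    return subprocess.check_output(("git",) + args).decode("utf-8")


def modified_notebooks(name_only_output):
    """Keep only notebook paths from `git diff --name-only HEAD` output."""
    return [line for line in name_only_output.splitlines()
            if line.endswith(".ipynb")]


def head_and_working_versions(path):
    """Return (HEAD text, working-directory text) for one tracked file."""
    head_text = git("show", "HEAD:" + path)
    with open(path, encoding="utf-8") as handle:
        return head_text, handle.read()


def notebooks_to_diff():
    """Steps 2 and 3: list modified notebooks and fetch both versions."""
    paths = modified_notebooks(git("diff", "--name-only", "HEAD"))
    return {path: head_and_working_versions(path) for path in paths}
```

When the current directory is not inside a repository, git exits with an error and subprocess.check_output raises CalledProcessError, which a caller could translate into the message described in extension 2a.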


11.2 ID: DUC2 - Diff Notebooks Locally from Files

Table 11.2: DUC2

Priority: Essential
Level: User Goal
Description: The system indicates which parts of two separate notebook files (_before-file_ and _after-file_) are different.
Primary Actor: User
Secondary Actor: N/A
Stakeholders: User
Interests:
• Outside of version control, the user wishes to see the difference between two notebook files.
• The user wishes to see which parts of their notebook files have been modified by another author, in order to perform reviews of code or other content.
• The user wishes to perform the diff without access to the internet.

Preconditions:
• The current user has permissions to read the two IPython Notebook files on the user’s filesystem.
• The two notebooks are valid IPython Notebook files.

Trigger: The user requests a diff of the IPython Notebook files.
Success End Condition: The system presents the set of changes that would have been necessary to transform the before-file into the after-file.
Failure End Condition: The system has not generated a diff. An error message was displayed.
Minimal Guarantee:
• The IPython Notebook files have not been modified.
• The system has exited without a stack trace or other unintended error message.

Main Success Scenario:
1. The user requests a diff of two IPython Notebook files.
2. The system requests the paths to the before-file and the after-file.
3. The system detects changes between the before-file and the after-file and presents these changes to the user.
4. The user indicates they have understood the changes.
5. The system shuts down.

Extensions:
• 3a. One or more of the specified files is not a valid IPython Notebook file.
  – The system indicates which file(s) were invalid.
  – The system exits.

Special Requirements: Refer to Supplementary Requirements Specification document.
Related Use Cases: N/A


11.3 ID: DUC3 - Diff from Notebooks on A Remote Server

Table 11.3: DUC3

Priority: Essential
Level: User Goal
Description: The system indicates which parts of two separate notebook files, located on one or two HTTP servers, are different.
Primary Actor: User
Secondary Actor:
• 1-2 Remote HTTP Servers (referred to as the RHS in this UC)
• Remote user

Stakeholders: User
Interests:
• The user wishes to diff two notebooks without installing nbdiff on their own computer.
• The user wishes to share a diff with other users via the internet for discussion.
• The remote user wishes to see the same diff and discuss the change with the primary actor.

Preconditions:
• The user is connected to the Internet and is running a web browser.
• NBDiff is deployed on an Internet-accessible server that the user’s computer can access.
• The RHSs are connected to the Internet and are accessible by the NBDiff server.
• The RHSs contain valid notebook files at URIs known to the user.

Trigger:
• The user accesses the NBDiff server.
• The user requests a diff of the IPython Notebook files at two URIs; the first URI represents the before-file and the second URI represents the after-file.

Success End Condition: The system presents a remotely-accessible view of the set of changes that would have been necessary to transform the before-file into the after-file.
Failure End Condition: The system has not generated a diff. An error message was displayed.
Minimal Guarantee:
• The NBDiff remote server has not shut down.
• The status of the operation was indicated to the user.

Main Success Scenario:
1. The user accesses the NBDiff remote server.
2. The user requests a diff of two IPython Notebook files.
3. The system requests that the user specify the URIs to the before-file and the after-file.
4. The system requests the files from the specified URIs from the RHSs.
5. The system detects changes between the before-file and the after-file and presents these changes to the user at a publicly-accessible URI.
6. The user views the URI and the system presents the user with the changes.
7. (Optional) The remote user also views the URI and the system presents the remote user with the changes.

Extensions:
• 3a. One or more of the specified files is not a valid IPython Notebook file.
  – The system indicates which file(s) were invalid and presents the user the opportunity to begin again.

Special Requirements: Refer to Supplementary Requirements Specification document.
Related Use Cases: N/A

CHAPTER 12

Merging-Related Use Cases

12.1 ID: MUC1 - Locally Resolve Merge Conflicts in Notebooks in Version Control

Table 12.1: MUC1

Priority: Medium
Level: User Goal
Description: In the process of merging a remote branch into the user’s local branch, the user uses the system to resolve merge conflicts in one or more IPython Notebook files.
Primary Actor: User
Secondary Actor: Version Control System (VCS)
Stakeholders: User
Interests:
• The user wishes to see what changes were applied to the base notebook on the local and remote branches.
• The user wishes to produce a valid notebook from the disparate versions on the local and remote branches by choosing which changes they would like to keep; the current status quo is to resolve the merge conflict “by hand” by editing the JSON files using a text editor.
• The user wishes to confirm that the result of the merge conflict resolution is a valid notebook.
• The user wishes to perform the merge without access to the internet.

Preconditions:
• The user’s current directory is tracked by the version control system.
• One or more IPython Notebook files are tracked by the current directory’s repository.
• The current repository is in the process of merging two branches but the process has been halted due to a conflict.
• One or more of the IPython Notebook files tracked by the current repository have unresolved merge conflicts.
• The current user has permissions to read the IPython Notebook files tracked by the current repository.
• The IPython Notebooks in the repository are valid IPython Notebook files.

Trigger: The user indicates that they would like to resolve all merge conflicts in the unmerged IPython Notebook files.
Success End Condition: The user has resolved all merge conflicts in IPython Notebook files.
Failure End Condition: The system has not generated a diff. An error message was displayed.
Minimal Guarantee:
• The system has exited without a stack trace or other unintended error message.

Main Success Scenario:
1. The user indicates that they would like to resolve merge conflicts in the unmerged IPython Notebook files in the current repository.
2. The system requests a list of currently unmerged files from the version control system.
3. The system requests (base, remote, local) versions of each unmerged file that is a valid Notebook file.
4. The system detects changes between (base, remote) and (base, local) versions of each Notebook file and presents the changes to the user.
5. The user indicates which changes from each branch they would like applied to the resulting merged notebooks.
6. The user indicates that they have completed the merge.
7. The system saves the resulting merged notebooks to the filesystem in the location of the local version of the file in the working directory.

Extensions:
• 2a. The user is not currently in a version control repository.
  – The system exits and notifies the user that they are not in a repository.
• 2b. There are no unmerged files in the repository.
  – The system exits and notifies the user that there are no unmerged files.
• 3a. None of the unmerged files are IPython Notebooks.
  – The system exits and notifies the user that there are no unmerged notebooks to diff.
• 3b. One or more notebooks has a version (local, base, or remote) that is an invalid notebook file.
  – The system notifies the user of which notebooks had an invalid version.
  – If there are no unmerged notebooks for which all (local, base, and remote) versions are valid, the system exits.
• 5a. The user elects not to resolve a notebook’s merge conflicts.
  – The system does not modify the notebook on the filesystem.

Special Requirements: Refer to Supplementary Requirements Specification document.
Related Use Cases: DUC1: Generate Diff from a Notebook in Version Control
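Step 4 of the main success scenario compares (base, local) and (base, remote). The sketch below shows one way to classify the outcome for each cell and to find the cells that need a decision from the user in step 5. It assumes Python 3 and makes a strong simplification that is not part of the use case: the three notebooks are given as equally long lists of cells that are already aligned by position. The function names are hypothetical.

```python
def classify(base, local, remote):
    """Classify one aligned cell across the three versions."""
    if local == remote:
        return "unchanged" if local == base else "same-change"
    if local == base:
        return "remote-only"   # only the remote branch changed this cell
    if remote == base:
        return "local-only"    # only the local branch changed this cell
    return "conflict"          # both branches changed it differently


def merge_cells(base, local, remote):
    """Return (merged cells, indexes of cells the user must resolve)."""
    merged, conflicts = [], []
    for index, (b, l, r) in enumerate(zip(base, local, remote)):
        kind = classify(b, l, r)
        if kind == "remote-only":
            merged.append(r)
        else:
            # Keep the local cell; for a conflict this is only provisional
            # until the user chooses which change to accept.
            merged.append(l)
            if kind == "conflict":
                conflicts.append(index)
    return merged, conflicts
```

Applying one-sided changes automatically is the usual three-way merge convention; the use case itself leaves the final choice with the user, so a real tool would still present every change for review.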


12.2 ID: MUC2 - Locally Resolve Merge Conflicts in Notebooks from the Filesystem

Table 12.2: MUC2 Priority: Medium Level: User Goal Description: The user specifies the base, remote, and local versions of an IPython Notebook file. The system detects and presents the changes between (base, local) and (base, remote) versions of the file and allows the user to select which changes to apply to construct the final result. The final resulting Notebook file is saved to the file system. (Note: This use case ensures that the system implements the standard interface used by version control systems and other systems to talk to merge tools such as NBD- iff. Implementation details relating to that interface are therefore discussed in this use case.) Primary Actor: • User • or User, indirectly, through a separate program.

Secondary Actor: n/a
Stakeholders: User
Interests:
• The user wishes to see what changes were applied to the base notebook to arrive at the local and remote versions.
• The user wishes to resolve notebook conflicts in the same way that they resolve other merge conflicts, by integrating NBDiff with their version control system.
• The user wishes to produce a valid notebook from the disparate files by choosing which changes they would like to keep.
• The user wishes to confirm that the result of the merge conflict resolution is a valid notebook.
• The user wishes to perform the merge without access to the internet.

Preconditions:
• The user has permissions to read the files to be merged.
• The files to be merged are valid IPython Notebook files.

Trigger: The user indicates that they would like to merge files from the local filesystem.
Success End Condition: The user has produced a single valid notebook file.
Failure End Condition: The system has not generated a merged IPython Notebook file. An error message was displayed.
Minimal Guarantee:
• The system has exited without a stack trace or other unintended error message.

Main Success Scenario:
1. The user indicates that they would like to merge notebooks from the local filesystem.
2. The user specifies the paths to the base, remote, and local versions of the notebook.
3. The system detects changes between (base, remote) and (base, local) files and presents the changes to the user.
4. The user indicates which changes from each pair of files they would like applied to the resulting merged notebook.
5. The user indicates that they have completed the merge.
6. The system saves the resulting merged notebook to the filesystem.
   • If the user has specified four file paths, the result is saved to the fourth path given.
   • If the user has specified three file paths, the result is saved to the first file path given (the local file path).

Extensions:
• 2a. One or more of the files specified by the user are not valid notebook files.
   – The system indicates which notebook files are invalid.
   – The system exits.
• 5a. The user elects not to save the merge result.
   – The system does not modify the notebook on the filesystem.
   – The system exits.

Special Requirements:
• Mercurial requires a command-line invocation of the format: $MERGETOOL $LOCAL $BASE $REMOTE
• git-mergetool requires a command-line invocation of the format: $MERGETOOL $LOCAL $BASE $REMOTE $RESULT

Related Use Cases: DUC2: Locally Diff Files from the Filesystem.
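The two invocation formats in the Special Requirements, together with step 6 of the main scenario, suggest how positional arguments could be mapped to file roles. The following is a minimal sketch under that reading; the function name parse_merge_args and the dictionary keys are hypothetical and are not taken from the NBDiff code base.

```python
def parse_merge_args(args):
    """Map positional arguments to the local, base, remote and result paths.

    Three arguments follow the Mercurial format ($LOCAL $BASE $REMOTE);
    four follow the git-mergetool format ($LOCAL $BASE $REMOTE $RESULT).
    """
    if len(args) == 3:
        local, base, remote = args
        # With three paths, the result overwrites the first (local) path.
        result = local
    elif len(args) == 4:
        local, base, remote, result = args
    else:
        raise ValueError("expected 3 or 4 file paths, got %d" % len(args))
    return {"local": local, "base": base, "remote": remote, "result": result}
```

A command-line entry point could call this with sys.argv[1:] and report the ValueError as a plain error message, which would keep the minimal guarantee of exiting without a stack trace.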


12.3 ID: MUC3 - Merge Notebooks Located on A Remote Server

Table 12.3: MUC3
Priority: Medium
Level: User Goal
Description: The user specifies the URIs of base, remote, and local versions of an IPython Notebook file. The system fetches the three versions of the file using the specified URIs from the HTTP server(s) where they are stored. Then, the system detects and presents the changes between (base, local) and (base, remote) versions of the file and allows the user to select which changes to apply to construct the final result. The system allows the user to save the final resulting Notebook file.
Primary Actor: User
Secondary Actor: 1-2 Remote HTTP Servers (referred to as the RHS in this UC)
Stakeholders: User
Interests:
• The user wishes to merge two notebooks without installing nbdiff on their own computer.
• The user wishes to see what changes were applied to the base notebook on the local and remote branches.
• The user wishes to produce a valid notebook from the disparate versions on the local and remote branches by choosing which changes they would like to keep; the current status quo is to resolve the merge conflict “by hand” by editing the JSON files using a text editor.
• The user wishes to confirm that the result of the merge conflict resolution is a valid notebook.

Preconditions:
• The user is connected to the Internet and is running a web browser.
• NBDiff is deployed on an Internet-accessible server that the user’s computer can access.
• The RHSs are connected to the Internet and are accessible by the NBDiff server.
• The RHSs contain valid notebook files at URIs known to the user.

Trigger:
• The user accesses the NBDiff server.
• The user indicates that they would like to merge IPython Notebook files saved on remote HTTP server(s).

Success End Condition: The user has produced a single valid notebook file.
Failure End Condition: The system has not generated a merged IPython Notebook file. An error message was displayed.
Minimal Guarantee:
• The NBDiff remote server has not shut down.
• The system has exited without a stack trace or other unintended error message.

Main Success Scenario:
1. The user accesses the NBDiff remote server.
2. The user indicates that they would like to merge remote notebooks from URIs.
3. The system requests that the user specify the three URIs.
4. The user specifies the URIs to the base, remote, and local versions of the notebook.
5. The system requests the files from the specified URIs from the RHSs.
6. The system detects changes between (base, remote) and (base, local) files and presents the changes to the user.
7. The user indicates which changes from each pair of files they would like applied to the resulting merged notebook.
8. The user indicates that they have completed the merge.
9. The user requests to save the resulting merged notebook.
10. The system prompts the user to enter where the resulting merged notebook should be saved.
11. The user specifies the file path where the resulting notebook will be saved.
12. The system saves the resulting merged notebook to the filesystem.

Extensions:
• 2a. One or more of the files specified by the user are not valid notebook files.
   – The system indicates which notebook file(s) are invalid.
   – The system exits.
• 6a. The user elects not to save the merge result.
   – The system does not save the notebook on the filesystem.
   – The system exits.

Special Requirements: Refer to Supplementary Requirements Specification document.
Related Use Cases: DUC3: Diff from Notebooks Located On A Remote Server
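The extension for invalid notebook files requires the system to report which of the fetched versions is invalid. A rough sketch of such a check follows. It assumes that a "valid" notebook is, at minimum, a JSON object carrying an integer nbformat field; this is a deliberately simplified stand-in for real notebook validation, and the function name find_invalid_notebooks is hypothetical.

```python
import json


def find_invalid_notebooks(sources):
    """Return the names of the versions whose text is not a plausible notebook.

    `sources` maps a version name ("base", "local", "remote") to the raw
    text fetched for it.
    """
    invalid = []
    for name, text in sources.items():
        try:
            data = json.loads(text)
        except ValueError:
            # Not JSON at all.
            invalid.append(name)
            continue
        # Simplified assumption: a notebook is a JSON object with an
        # integer "nbformat" version field.
        if not isinstance(data, dict) or not isinstance(data.get("nbformat"), int):
            invalid.append(name)
    return invalid
```

Collecting every invalid name before exiting lets the system indicate all invalid files at once instead of stopping at the first one.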


12.4 ID: MUC4 - Resolve Merge Conflicts Remotely in Version Control Pull Request

Table 12.4: MUC4
Priority: Medium
Level: User Goal
Description: In the process of making a pull request in VCS, the user uses the system to resolve merge conflicts in one or more IPython Notebook files.
Primary Actor: User
Secondary Actor: VCS
Stakeholders: User
Interests:
• The user wishes to see what changes were applied to the base notebook on the local and remote branches.
• The user wishes to produce a valid notebook from the disparate versions on the local and remote branches by choosing which changes they would like to keep; the current status quo is to resolve the merge conflict “by hand” by editing the JSON files using a text editor.
• The user wishes to confirm that the result of the merge conflict resolution is a valid notebook.

Preconditions:
• The user is connected to the Internet and is running a web browser.
• NBDiff is deployed on an Internet-accessible server that the user’s computer can access.
• The user’s current directory is tracked by the version control system.
• One or more IPython Notebook files are tracked by the current directory’s repository.
• The current repository is in the process of a pull request but the process has been halted due to a conflict.
• One or more of the IPython Notebook files tracked by the current repository have unresolved merge conflicts.
• The current user has permissions to read the IPython Notebook files tracked by the current repository.
• The IPython Notebooks in the repository are valid IPython Notebook files.

Trigger: The user indicates that they would like to resolve all merge conflicts in the unmerged IPython Notebook files.
Success End Condition: The user has resolved all conflicts in IPython Notebook files.
Failure End Condition: The system has not generated a merged IPython Notebook file. An error message was displayed.
Minimal Guarantee:
• The NBDiff remote server has not shut down.
• The system has exited without a stack trace or other unintended error message.

Main Success Scenario:
1. The user accesses the NBDiff remote server.
2. The user indicates that they would like to resolve conflicts in the unmerged IPython Notebook files in the current repository.
3. The system requests a list of currently unmerged files from the version control system.
4. The system requests (base, remote, local) versions of each unmerged file that is a valid Notebook file.
5. The system detects changes between (base, remote) and (base, local) versions of each Notebook file and presents the changes to the user.
6. The user indicates which changes from each branch they would like applied to the resulting merged notebooks.
7. The user indicates that they have completed the merge.
8. The system saves the resulting merged notebooks in the location of the head file.

Extensions:
• 3a. The user is not currently in a version control repository.
   – The system exits and notifies the user that they are not in a repository.
• 3b. There are no unmerged files in the repository.
   – The system exits and notifies the user that there are no unmerged files.
• 4a. None of the unmerged files are IPython Notebooks.
   – The system exits and notifies the user that there are no unmerged notebooks to diff.
• 4b. One or more notebooks has a version (local, base, or remote) that is an invalid notebook file.
   – The system notifies the user of which notebooks had an invalid version.
   – If there are no unmerged notebooks for all (local, base, and remote) versions that are valid, the system exits.
• 6a. The user elects not to resolve a notebook’s merge conflict.
   – The system does not modify the notebook.

Special Requirements: Refer to Supplementary Requirements Specification document.
Related Use Cases: DUC3: Diff from Notebooks Located On A Remote Server

CHAPTER 13

Scoped-out Use Cases

The following use cases were scoped out due to schedule constraints and low priority.


13.1 ID: DUC4 - Selective Staging

Priority: Medium
Level: User Goal
Description: The user selects the changes that he wishes to stage. The system stages the indicated changes.
Primary Actor: User
Secondary Actor: N/A
Stakeholders:
• User

Interests:
• The user wants to stage changes in a file.

Preconditions:
• The user is authenticated to the Version Control System.
• There have been multiple modifications made to the head on the Version Control System.

Trigger: The user decides to stage the changes made to the file.
Success End Condition: The system has staged the changes and the staged modifications are ready to be committed.
Failure End Condition: The system has not staged the changes.
Minimal Guarantee: The IPython notebook in the Version Control System remains unchanged.
Main Success Scenario:
1. The user chooses the changes on the notebook that he desires to be staged and confirms.
2. The system indicates that the changes are staged and ready to be committed.

Extensions:
*a. At any time if the system fails:
   • The user restarts the system.
   • The system returns to the state it was in before the system failed.
*b. At any time if the user cancels the request:
   • The user requests to cancel.
   • The system asks for confirmation.
   • The user confirms.
   • The system terminates.

Special Requirements: Refer to Supplementary Requirements Specification document.
Related Use Cases: N/A

CHAPTER 14

Risk Management Plan

Table of Contents • Risk Management Plan – Introduction * Purpose – Roles and Responsibilities * Team Leader * Project Team * Project Stakeholders – Risk Assessment * Risk Identification – Risk Analysis * Risk Prioritization – Risk Control * Risk Response Planning * Resolution * Risk Monitoring and Reporting

14.1 Introduction

This document identifies risks affecting the NBDiff project and the team’s plan for coping with these risks. It also assigns responsibility to team members and project stakeholders.

14.1.1 Purpose

This plan is intended to help the project team avoid problems that could result in a project failure or otherwise unpleasant outcomes. It provides stakeholders with assurance that the team is aware of all major project risks and has a plan for addressing them.

14.2 Roles and Responsibilities

Project stakeholders are defined in the Vision Document and the Project Management Plan. Stakeholders relevant to risk management are listed here, with their risk management roles described.


14.2.1 Team Leader

The team leader will take responsibility for identifying and addressing risks. The team leader is responsible for ensuring each risk has people assigned to it and for monitoring the risk to ensure successful avoidance/mitigation.

14.2.2 Project Team

The project team assists in identifying and addressing risks.

14.2.3 Project Stakeholders

Project stakeholders will monitor and evaluate risk management success. They will identify risks originating from outside the project.

14.3 Risk Assessment

14.3.1 Risk Identification

The following is a list of the risks that affect the overall project. Each one has an identifier beginning with R-, which is used to reference the risk in the rest of the document. We use this format instead of numerical identifiers to ease the revision process for our documentation.
• R-license: The school’s intellectual property policy allows us to license our work as we choose. However, the NBDiff project has little ability to assess legal risks (e.g., by hiring a lawyer) involved with releasing the project under an open source license. See licensing information in the Supplementary Requirements Specification.
• R-changing-notebook: The NBDiff project is a tool for use with the IPython Notebook, which may change during the project. The NBDiff team will need to understand changes to the Notebook and adapt the NBDiff tool to them. These changes might include a new file format or new user interface formatting in the Notebook.
• R-user-engage: Our target users are primarily busy academics who have no time for shenanigans. Although some user feedback will be trivial to get from the internet, engaging with less-connected users could be difficult.
• R-tech-learn: The project team is inexperienced with the technologies we will use. This presents a risk for productivity and achieving the quality desired by external stakeholders.
• R-security: Since the NBDiff tool will be storing users’ IPython notebooks on its server, security becomes a concern. NBDiff is potentially at risk of intrusion because it saves files from unknown users.
• R-scheduling: The project team is in their last year of university and has a very hectic schedule. A potential risk is scheduling conflicts among team members.
• R-time-management: The project team members are full-time university students, and without proper time management the project could experience delays.

14.4 Risk Analysis

We lack the resources to assess risks with the precision required for a quantitative model. In many cases it would not be possible to do so even with unlimited resources. Thus, we present a coarse-grained qualitative model in a likelihood vs. impact matrix. We give additional weight in cases where the impact or likelihood is difficult to estimate.


Likelihood scale: Low, Moderate, Significant, High
Impact High: R-scheduling, R-license, R-security
Impact Significant: R-time-management
Impact Moderate: R-user-engage, R-tech-learn
Impact Low: R-changing-notebook

14.4.1 Risk Prioritization

Using the above risk analysis matrix, we rank our risks in the following order.
1. R-license
2. R-security
3. R-time-management
4. R-tech-learn
5. R-scheduling
6. R-user-engage
7. R-changing-notebook

14.5 Risk Control

14.5.1 Risk Response Planning

The following table describes the approaches we will take for addressing each identified risk.

Risk                   Strategy
R-license              Mitigation: We will seek legal clarification from the course coordinator and get an explicit disclaimer of intellectual property.
R-changing-notebook    Acceptance: We cannot stop the Notebook from changing. Contingency: In order to reduce the impact of the changing Notebook, we will follow IPython Notebook development closely so that we can anticipate format and UI changes and adjust our plans accordingly.
R-user-engage          Mitigation: We will solicit feedback from the public via the mailing list and other venues; present users with prototypes at our stakeholder’s workshops; etc. Contingency: We will cater to the users we are able to reach and accept the decreased utility to the target users.
R-tech-learn           Mitigation: We will form a reading club for learning the prerequisite skills. We will explicitly plan for training in our scheduling. Contingency: We will adjust our activity planning and scheduling accordingly to account for the productivity loss.
R-scheduling           Reduction: We cannot change team members’ schedules. We will try to work around everyone’s schedules, and when this is not possible we will hold multiple smaller meetings with those members who can make it.
R-security             Mitigation: We will perform code reviews to ensure there are no gaps and our server is secure.
R-time-management      Mitigation: We will create a group calendar with all relevant course dates to properly plan course work and project work.

14.5.2 Resolution

The following table assigns people and deadlines to each risk. In cases where the risk will need to be managed throughout the project, the deadline is left blank.


Risk                   Assigned To         Resolution Deadline
R-license              Team Leader         M2 (2nd Milestone)
R-changing-notebook    Project team        –
R-user-engage          PO, Project team    –
R-tech-learn           Project team        –
R-scheduling           Project team        –
R-security             Project team        M6 (6th Milestone)

14.5.3 Risk Monitoring and Reporting

We will reassess the project risks (at least) at every project milestone (see PMP) and document the progress in this table. Where necessary, other updates will be documented as they occur.

Table 14.1: Risk Monitoring

Update        Description
2013-9-11     R-license: Discussion about licensing issue started with coordinator.
2013-9-24     R-license: IP release forms provided on coordinator’s website.
2013-11-18    R-user-engage: Initial usability testing performed with prospective users in person.
2013-11-25    R-license: Signed IP Opt-out form delivered to coordinator.
2013-12-18    R-scheduling: End of the semester for team members. Due to studying for final exams, meeting had to be cancelled.
2013-12-28    R-scheduling: Many team members are out of the country for Christmas. Skype meeting was held instead.
2014-02-01    R-security: Code review performed for local-server.
2014-02-16    R-security: Code review performed for remote-server.
2014-03-03    R-time-management: For the last milestone all remaining tasks were scheduled in the group calendar. Team members’ work schedules and interview dates were also added.

CHAPTER 15

Software Architecture Document

Table of Contents • Software Architecture Document – List of Figures – Introduction * Purpose * Scope * Definitions, Acronyms, and Abbreviations – Architectural Representation – Architectural Requirements: Goals and Constraints * Functional Requirements * Non-Functional Requirements – Scenarios – Logical View * Layers * Architecturally Significant Design Packages · External Libraries * Use Case Realizations · MUC1 - Resolve Conflicts Locally from Version Control · DUC3 - Diff Notebooks From Remote Server – Development View * Reuse of Components and Frameworks – Process View – Deployment View – Size and Performance – Quality – References

15.1 List of Figures

• Figure 1 • Figure 2 • Figure 3 • Figure 4


• Figure 5 • Figure 6 • Figure 7 • Figure 8 • Figure 9 • Figure 10 • Figure 11 • Figure 12 • Figure 13 • Figure 14 • Figure 15 • Figure 16 • Figure 17 • Figure 18 • Figure 19

15.2 Introduction

This software architecture document provides an overview of the software architecture used in the NBDiff project. It defines the program’s subsystems, the interactions between its components, and the architecture pattern used for the system. The system is designed to improve file comparison techniques for the IPython Notebook file format (.ipynb). It also aims to be as maintainable as possible so that the IPython team may be able to reuse or maintain the code in the future. The use cases and functional requirements play a role in the architectural design decisions and are traceable in this document.

15.2.1 Purpose

The purpose of the software architecture document is to describe the design of the NBDiff project and aid the programmer as they are implementing the system. It provides insight on the architectural choices and includes an outline of the system. The document is intended for the project evaluators, the stakeholder Greg Wilson, the members of the NBDiff team, and future developers wishing to work on the software so that they may use this document as technical support in order to interact with the system and understand its components.

15.2.2 Scope

The information provided in this documentation covers the components of the system that were developed and implemented based on the needs of the project’s product owner, interactions with the IPython team, and the team’s judgment of which features are commonly used in diffing tools, formed after a series of competitive analyses of existing diffing tools.


15.2.3 Definitions, Acronyms, and Abbreviations

The definitions of all terms, acronyms, and abbreviations required to properly interpret this document can be found in the Supplementary Specifications document.

15.3 Architectural Representation

NBDiff is implemented as a web application; thus, by necessity, it uses a client-server architecture. This client-server architecture is somewhat modified due to specific requirements for implementation: namely, NBDiff reuses components of the IPython Notebook front-end (which are built using Javascript, HTML, and CSS) but also presents a command-line interface for invocation in keeping with other standard diff/merge tools. Thus, the client-server architecture used by NBDiff has two clients: the client which interfaces with version control or the filesystem and launches the HTTP server; and the web browser, which interfaces with the HTTP server to provide a rich GUI to the user.

NBDiff may also be deployed as a public-facing web application, which requires a slightly different architecture. The command-line interface cannot be used to launch the server or send data to it, so the web application needs to provide a UI for beginning a diff/merge session to the web application user. Further, data must be persisted across multiple requests on the remote server without using an in-memory data store. This requires use of a database system.

In this document, we use the 4+1 model to explain the architectural elements of NBDiff.

Figure 15.1: Figure 1

The 4+1 view model. Each view will be represented using UML model elements specified as follows:


4+1 View                                   Architecture Model Elements
Use Cases                                  Use Case Diagram (in the “Use Case Model” document.)
Logical View                               Package Diagram, Class Diagram
Process View                               Activity Diagram
Implementation View || Development View    Package Diagram
Deployment View || Physical View           Deployment Diagram

15.4 Architectural Requirements: Goals and Constraints

15.4.1 Functional Requirements

These FRs and UCs are not necessarily the most important functional requirements, but they are the functional requirements that had the most impact on architectural decisions. This list is not exhaustive.

Table 15.1: Key Functional Requirements

Source: DUC1, MUC1
Name: Diffing/Merging from VCS
Architectural Relevance: Requires integration with version control systems.
Addressed in: Logical view

Source: DUC3, MUC3
Name: Diffing/Merging remotely
Architectural Relevance:
• Requires a separate deployment strategy.
• Requires persistence layer to handle multiple requests.
• Requires concurrency handling.
Addressed in: Deployment view, logical view.

Source: FR-16
Name: Integration of nbdiff with existing version control systems
Architectural Relevance: Requires command-line invocation.
Addressed in: Logical view, Use Case Realizations

15.4.2 Non-Functional Requirements

These non-functional requirements had the greatest impact on architectural decisions. This is not an exhaustive list.


Table 15.2: Key Non-Functional Requirements

Source: NFR-Maint-3
Name: Changes to the IPython Notebook file format should be anticipated by the design.
Architectural Relevance:
• Requires use of IPython libraries for parsing and rendering notebooks.
• The IPython libraries require use of a browser for the GUI.
• Encourages relinquishing control to IPython front-end code (inversion of control pattern) to ensure compatibility with IPython.
Addressed in:
• Logical view
• Process view
• Use Case Realization
• Deployment view

Source: NFR-Port-1
Name: The system shall run on [multiple operating systems. see NFR-Port-1].
Architectural Relevance:
• Encourages use of web technologies for UI.
Addressed in: Logical view, deployment view

Source: NFR-Use-1
Name: The system should mimic the Notebook’s UI.
Architectural Relevance:
• Encourages use of web technologies for UI.
Addressed in: Logical view, deployment view

15.5 Scenarios

The significant use cases of the system are presented in the following use case diagram. See the use case documents for more information. Use Case Diagram

15.6 Logical View

The logical view captures the functionality provided by the system; it illustrates the collaborations between system components in order to realize the system’s use cases.

15.6.1 Layers

We present two layered views of the system. The first is a layered view of the system as it is deployed locally. The most important characteristic of the local system is that it has two user interfaces that are used throughout the course of execution: the web-based graphical user interface used to view and manipulate diffs and merges, and the terminal-based command line interface used to invoke the program.

Layers used in a local NBDiff installation

The second layered view is the system as it is deployed on a remote server for use as a web application. This view of the system is less complicated due to the single UI. However, additional systems are needed for data persistence.


Figure 15.2: Figure 2


Figure 15.3: Figure 3

Figure 15.4: Figure 4


Layers used in a remote NBDiff installation

15.6.2 Architecturally Significant Design Packages

The following views focus on the classes and functions used to implement local diffing and merging. The following diagram shows a high-level view of the most significant classes used in executing MUC1.

Figure 15.5: Figure 5

High-level view of most important classes for local merging

commands
    The commands module contains the entry points to the command-line interface. Our system follows the Python Package Index (PyPI) format which allows packages to specify Python objects to execute when an executable is run. In our case, we map the command nbmerge to the function commands.merge and the command nbdiff to the function commands.diff.

vcs_adapter
    The vcs_adapter package is used for interfacing with different version control systems using the same API. The GitAdapter is an example of a VCSAdapter.

local_server
    The server.local_server module is used to serve web content to the browser in order for the UI to be rendered for the user.

merge
    The merge module is responsible for the business logic of detecting merge conflicts and unifying three conflicting notebooks into a single valid IPython Notebook file that can be loaded by the IPython front-end with additional merge metadata added. This module relies heavily on the notebook_diff and diff modules.


notebook_diff
    The notebook_diff module is responsible for the business logic of diffing the data structures in an IPython Notebook file. The function notebook_diff is used to unify two notebooks in a single valid IPython Notebook file that can be loaded by the IPython Notebook front-end (with additional diff metadata added).

diff
    The diff module contains the abstract, high-level diff function, which contains the implementation of the longest common subsequence (LCS) algorithm. This module does not know about Notebooks or any data structure in particular.
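To illustrate the kind of data-structure-agnostic diff described for the diff module, here is a small LCS-based sketch. It is not NBDiff's own implementation: the function name lcs_diff and the state labels "unchanged", "deleted" and "added" are assumptions made for this example.

```python
def lcs_diff(before, after):
    """Diff two sequences of comparable items using an LCS table.

    Returns a list of (state, item) pairs, where state is one of
    "unchanged", "deleted" or "added".
    """
    rows, cols = len(before), len(after)
    # lengths[i][j] holds the LCS length of before[i:] and after[j:].
    lengths = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if before[i] == after[j]:
                lengths[i][j] = lengths[i + 1][j + 1] + 1
            else:
                lengths[i][j] = max(lengths[i + 1][j], lengths[i][j + 1])

    # Walk the table from the start, emitting one entry per item.
    result = []
    i = j = 0
    while i < rows and j < cols:
        if before[i] == after[j]:
            result.append(("unchanged", before[i]))
            i += 1
            j += 1
        elif lengths[i + 1][j] >= lengths[i][j + 1]:
            result.append(("deleted", before[i]))
            i += 1
        else:
            result.append(("added", after[j]))
            j += 1
    # Whatever remains on either side is a pure deletion or addition.
    result.extend(("deleted", item) for item in before[i:])
    result.extend(("added", item) for item in after[j:])
    return result
```

Because the function only compares items for equality, the same routine could in principle be applied to lists of cells or to the lines within a cell.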

A more detailed view of the terminal-based component of NBDiff follows. This diagram shows a subset of the important functions and classes used to implement MUC1.

Figure 15.6: Figure 6

Main Python controller for local nbdiff/nbmerge

merge.MergeConflict
    MergeConflict is not a Python class but a data structure implemented as a dictionary. Conceptually, however, it represents a notebook with MergeChangedCell objects. This is an IPython-compatible data structure with extra information for the purposes of nbdiff.js.

merge.MergeChangedCell
    MergeChangedCell is also a dictionary-based pseudoclass. This is compatible with IPython – that is, interchangeable with the Cell data structure found in IPython Notebooks. However, it contains extra metadata about which branch and what kind of change it represents: added, removed, unchanged, or modified. This pseudoclass may contain extra-diff-data that contains line diff data, etc.
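A dictionary-based pseudoclass of this kind could be built roughly as follows. This is only a sketch: the metadata key names "state" and "side" and the helper name make_merge_cell are hypothetical; only the four change kinds and the extra-diff-data field come from the description above.

```python
import copy

VALID_STATES = ("added", "removed", "unchanged", "modified")


def make_merge_cell(cell, side, state, extra_diff_data=None):
    """Return a copy of a notebook cell annotated with merge metadata.

    The result is still an ordinary cell dictionary, so code that only
    understands plain cells can ignore the extra metadata.
    """
    if state not in VALID_STATES:
        raise ValueError("unknown state: %r" % (state,))
    annotated = copy.deepcopy(cell)  # never mutate the caller's cell
    metadata = annotated.setdefault("metadata", {})
    metadata["state"] = state  # hypothetical key: kind of change
    metadata["side"] = side    # hypothetical key: branch the cell came from
    if extra_diff_data is not None:
        metadata["extra-diff-data"] = extra_diff_data
    return annotated
```

Keeping the annotations under the cell's metadata leaves the rest of the cell untouched, which is one way to stay interchangeable with the plain Cell structure.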

The NBFlask component of the above diagram can be examined in greater detail in the following diagram. Many of the components shown are not classes in the traditional sense but other forms of objects and assets used to render a page.

Figure 15.7: Figure 7

Local server


Once the UI is loaded, NBDiff’s Javascript runs. This code is organized as follows. The class NBDiff is the main controller and initiates NBDiff’s special rendering of changes in the Notebook UI.

Figure 15.8: Figure 8

Client-side Javascript classes

nbdiff.js:NBDiff
    NBDiff is the main controller for NBDiff in the front-end. Its init method is called after the IPython Notebook front-end and data is loaded.

nbdiff.js:Merge
    Merge is the controller responsible for rendering a merge conflict and setting up the UI for the user to resolve the conflict.

nbdiff.js:Diff
    Diff is the controller responsible for rendering a diff.

nbdiff.js:Invoker
    Invoker contains a history of actions performed by the user; it is used to implement undo/redo.

nbdiff.js:DragDrop
    DragDrop is responsible for handling mouse events related to dragging and dropping cells into the base cell.

External Libraries

See Reuse of components and frameworks.


15.6.3 Use Case Realizations

To clearly describe the important architectural elements of NBDiff, we will provide interaction diagrams for two use cases: MUC1 and DUC3. These are chosen to explain the remote and local versions of NBDiff. MUC1 is a local operation and DUC3 is a remote operation.

MUC1 - Resolve Conflicts Locally from Version Control

We document the realization of MUC1 in multiple parts. The first sequence diagram below shows the portion of execution that occurs in the terminal.
1. The user begins by executing the command.
2. The merge entry point interfaces with the VCS and executes the diff algorithm.
3. The merge entry point sends the merge conflict data to the NBFlask server.
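Step 2 relies on a uniform interface to the version control system. The sketch below shows one way such an adapter could look for Git. The class names VCSAdapter and GitAdapter appear in this document, but the method names, the injectable run parameter, and the choice of Git commands are assumptions for illustration, not NBDiff's actual API.

```python
import subprocess
from abc import ABC, abstractmethod


class VCSAdapter(ABC):
    """Common interface for version control back ends (hypothetical API)."""

    @abstractmethod
    def unmerged_notebooks(self):
        """Return the paths of notebook files with unresolved conflicts."""

    @abstractmethod
    def versions(self, path):
        """Return the (local, base, remote) contents of a conflicted file."""


class GitAdapter(VCSAdapter):
    def __init__(self, run=None):
        # `run` takes a list of git arguments and returns stdout as text.
        # It is injectable so the class can be used without a repository.
        self._run = run or self._run_git

    @staticmethod
    def _run_git(args):
        return subprocess.check_output(["git"] + list(args)).decode("utf-8")

    def unmerged_notebooks(self):
        # --diff-filter=U restricts the listing to unmerged paths.
        output = self._run(["diff", "--name-only", "--diff-filter=U"])
        return [path for path in output.splitlines() if path.endswith(".ipynb")]

    def versions(self, path):
        # During a conflict the index holds three stages of the file:
        # 1 = common base, 2 = local ("ours"), 3 = remote ("theirs").
        base = self._run(["show", ":1:" + path])
        local = self._run(["show", ":2:" + path])
        remote = self._run(["show", ":3:" + path])
        return local, base, remote
```

An adapter for another system such as Mercurial would implement the same two methods, so the merge entry point would not need to know which VCS is in use.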

Figure 15.9: Figure 9

Sequence Diagram – Locally resolve merge conflicts from version control

Once the NBFlask server is started, the terminal component of NBFlask starts an instance of a web browser which then makes an HTTP request to the NBFlask server. Major Javascript, CSS, HTML, and other UI components are sent to the browser. Once this setup is complete, the IPython Notebook Javascript executes – we relinquish control to their Javascript to request and unmarshall notebook data from the NBFlask server. (This uses the Inversion of Control pattern.)

Sequence Diagram – Browser/NBFlask interaction

Once this is complete, the IPython Notebook Javascript emits an event indicating that the notebook has been loaded. The NBDiff Javascript code listens for this event and then executes.

Sequence Diagram – Inversion of Control interaction

The NBDiff Javascript code executes after the IPython Notebook Javascript executes; it works by requesting data from the IPython Notebook data structures and then re-rendering the page with changes indicated.


Figure 15.10: Figure 10

Figure 15.11: Figure 11


The main object is the NBDiff controller object. This creates an instance of the Merge controller, which knows how to render merges (as opposed to diffs). Event handlers for user actions are bound and when they are fired, the Merge controller handles the event. Interaction with the NBFlask server is handled by the IPython Notebook front-end code.

Figure 15.12: Figure 12

Sequence Diagram – Front End Flow

DUC3 - Diff Notebooks From Remote Server

We split DUC3 into three interaction diagrams.
1. The first diagram shows the interaction between the browser and the web application when loading the front page of the web application. The web application presents a form to the user in which the user can input URIs. They submit the form and the diffing process is started. (Sequence Diagram – Client Accessing NBDiff Web Application)
2. The URIs are requested from the remote server(s). This notebook JSON data is parsed and passed to the notebook_diff function. The resulting annotated notebook is passed to the persistence layer. Finally, the front-end assets for the diff UI are returned to the NBFlask object, along with the persisted notebook’s ID. (Sequence Diagram – NBFlask processing request)
3. The browser loads the front-end assets for the diff UI. It uses the notebook_id to retrieve the annotated notebook from the Flask server. After loading and rendering this notebook, the browser initializes the NBDiff Javascript code.
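The hand-off in steps 2 and 3, where an annotated notebook is persisted and later retrieved by its ID, can be sketched as below. To stay dependency-free this example uses the standard library's sqlite3 module directly rather than SQLAlchemy; the table layout and the function names are assumptions made for illustration.

```python
import json
import sqlite3


def create_store(path):
    """Open (or create) a database with a single table of notebooks."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notebooks ("
        "id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def save_notebook(conn, notebook):
    """Persist an annotated notebook and return its generated ID."""
    cursor = conn.execute(
        "INSERT INTO notebooks (body) VALUES (?)", (json.dumps(notebook),)
    )
    conn.commit()
    return cursor.lastrowid


def load_notebook(conn, notebook_id):
    """Return the notebook stored under notebook_id, or None if absent."""
    row = conn.execute(
        "SELECT body FROM notebooks WHERE id = ?", (notebook_id,)
    ).fetchone()
    return None if row is None else json.loads(row[0])
```

On a deployed server, create_store would be given a file path so that the data survives across requests and processes; the ID returned by save_notebook plays the role of the notebook_id sent to the browser.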


Figure 15.13: Figure 13

Figure 15.14: Figure 14


Figure 15.15: Figure 15

Sequence Diagram – Browser Loading Diff UI

15.7 Development View

Package Diagram

15.7.1 Reuse of Components and Frameworks

IPython
IPython’s core libraries and interfaces will be reused to build NBDiff.

Flask
Flask is a framework for Python web applications. It refers to itself as a ‘microframework’ in that it keeps its core simple and extensible while leaving design decisions up to the developer. It is based on Werkzeug and Jinja2. Werkzeug is a Web Server Gateway Interface (WSGI) utility library for Python. Jinja2 is a templating language for Python. Using Flask allows us to build simple, effective web applications with Python as the controller and model of the system.

SQLAlchemy
SQLAlchemy is an Object Relational Mapper for Python that supports multiple databases (e.g., SQLite, PostgreSQL). This will be used to support the server-side version of NBDiff.


Figure 15.16: Figure 16


15.8 Process View

Please refer to Activity Diagrams in the SRS document. The following diagram illustrates the processes used to implement local NBDiff. Most significant about this view is that it shows the necessity for parallel execution of the browser and the NBFlask local server.
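The need for parallel execution can be illustrated with the standard library alone. In the sketch below, a local HTTP server runs in a background thread while the main thread plays the role of the browser and requests a page from it. This is an assumption-laden stand-in written for Python 3: the real tool uses a Flask server and a real web browser, and the handler, the response body, and the function name `serve_and_fetch` are invented for the example.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"diff UI placeholder"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def serve_and_fetch():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), Handler)
    port = server.server_address[1]
    # The server must run concurrently, otherwise the request below would
    # never be answered.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # Bypass any configured proxy so the request stays on localhost.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open("http://127.0.0.1:%d/" % port, timeout=5) as resp:
            return resp.status, resp.read()
    finally:
        server.shutdown()
        server.server_close()
        thread.join()


status, body = serve_and_fetch()
```

If the server were started in the same thread as the client, the call that serves requests would block and the client code would never run, which is why the two must execute in parallel.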

Figure 15.17: Figure 17

Activity Diagram - Local NBDiff

15.9 Deployment View

The deployment (or physical) view illustrates the physical components of the architecture, their connectors and their topology. Use a UML deployment diagram to capture this view.


Name            Type                            Description
Client Machine  Device                          Laptop/PC with a minimum of 512 MB of RAM and a 1.8 GHz CPU
Web Browser     Mozilla Firefox, Google Chrome  Latest versions of either
Server Machine  Device                          Laptop/PC/Dedicated Server, 8 GB RAM, 3 GHz 6-core CPU
Web Server      Component                       Flask, Python server

Figure 15.18: Figure 18

Deployment Diagram

Deployment Diagram - Remote NBDiff server

15.10 Size and Performance

The size of the system may be as small as a single laptop or as large as hundreds of computers connected to a single server. The software should scale well across this range. It should also perform well when hundreds of diffing/merging requests arrive. The client-server architecture is well-suited to these requirements because it is a one-to-many architecture, and improving the server’s response time improves the overall performance of the system.


Figure 15.19: Figure 19

15.11 Quality

Table 15.3: The Impact of NBDiff’s Architecture on Quality

Usability
• The architecture re-uses components from the IPython Notebook in order to maintain a look consistent with the IPython Notebook.
• The command-line interface mimics interfaces used by standard Unix diff tools.

Maintainability
• The architecture re-uses components from the IPython Notebook. Thus, changes to IPython should minimally affect NBDiff.
• The local and remote servers re-use UI components, thus reducing maintenance load.
• The interface is built using open web technologies that are known to many developers.

Portability
• The interface is built using open web technologies that are available on all major platforms.

Reliability
N/A

Efficiency
N/A


15.12 References

The following diff algorithms and implementations were used as a reference when implementing NBDiff:
• The original diff paper presenting the Hunt–McIlroy algorithm: James W. Hunt and M. Douglas McIlroy (June 1976). “An Algorithm for Differential File Comparison”. Computing Science Technical Report, Bell Laboratories.


CHAPTER 16

Test Plan Document


Table of Contents
• Test Plan Document
  – List of Figures
  – Introduction
  – Compatibility Testing
    * Test Risks / Issues
    * Items to be Tested
      · Functional Testing
      · Business Cycle Testing
      · Compatibility Testing
      · Data and Database Integrity
      · User Interface Testing
    * Items Not to be Tested
      · Security Testing
      · Conformance Testing
    * Test Approach
      · Unit Testing
      · Integrity Testing
      · User Interface Testing
    * Test Pass / Fail Criteria
    * Test Entry / Exit Criteria
      · Test Plan
      · Test Cycle
    * Test Deliverables
    * Test Suspension / Resumption Criteria
    * Test Environmental / Staffing / Training Needs
      · Test Environment
      · Roles
      · Staffing & Training Needs
  – Regression Testing
  – Acceptance Testing
  – Alpha Testing
  – Beta Testing
  – Performance Testing
  – Usability Testing
  – Defects/Bugs Management
  – Maintenance Testing
    * flake8
    * JsHint

16.1 List of Figures

No table of figures entries found.

16.2 Introduction

The purpose of this document is to provide all the information needed to plan and control the test effort for the development of this project. It covers the test plan for the NBDiff software during its development and provides the rationale behind the necessity of these tests. This document outlines the tests that were implemented and the items targeted by those tests, along with the testing approach that was used.

16.3 Compatibility Testing

16.3.1 Test Risks / Issues

The test risks related to the NBDiff project are the following:
• Lack of availability of personnel resources when the testing phase starts
• Lack of availability of required hardware or software tools
• Changes in application requirements
• Complexities involved with the testing of the application

16.3.2 Items to be Tested

The following tests will be done on the system:

Functional Testing

1. Test if diffing two empty Notebooks results in an empty diff without the system terminating unexpectedly
2. Test if diffing an empty Notebook with a non-empty Notebook shows a result of added cells
3. Test if diffing a Notebook with an added cell does not cause the system to terminate unexpectedly
4. Test if diffing a Notebook with an added cell results in a diff with the cell displayed as added
5. Test if diffing a Notebook with a deleted cell does not cause the system to terminate unexpectedly
6. Test if diffing a Notebook with a deleted cell results in a diff with the cell displayed as deleted
7. Test if diffing two similar Notebooks shows a result of unchanged cells
8. Test, when diffing two modified cells, if two similar lines are shown as unchanged
9. Test, when diffing two modified cells, if two different lines are shown as a deleted line in the before Notebook and an added line in the after Notebook
10. Test, when diffing two modified cells with two modified lines, if two similar words are shown as unchanged
11. Test, when diffing two modified cells with two modified lines, if two different words are shown as a deleted word on the local side and an added word on the remote side
12. Test while merging if choosing to add a cell from the remote Notebook to the base Notebook and saving saves the result in a local Notebook with the added cell
13. Test while merging if choosing to add a cell from the local Notebook to the base Notebook and saving saves the result in a local Notebook with the added cell
14. Test if not choosing any changes doesn’t modify the Notebooks
15. Test while merging if choosing to apply the deletion of a cell as in the local Notebook to the base Notebook and saving results in a local Notebook without the deleted cell
16. Test while merging if choosing to apply the deletion of a cell as in the remote Notebook to the base Notebook and saving results in a local Notebook without the deleted cell
17. Test while merging if choosing to apply the modification of a cell as in the local Notebook (that has a line-based diff) to the base Notebook and saving saves the result in a local Notebook with the modified cell
18. Test while merging if choosing to apply the modification of a cell as in the remote Notebook (that has a line-based diff) to the base Notebook and saving saves the result in a local Notebook with the modified cell
19. Test if diffing a valid Notebook with an invalid Notebook will indicate an error
20. Test if diffing a valid Notebook with an invalid Notebook will indicate which Notebook is invalid
21. Test, when diffing two headers with two modified lines, if a word-based diff is shown (added, deleted, or unchanged for each word)
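To make the expected outcomes of the cell-level tests concrete (empty diff, added cells, deleted cells, unchanged cells), here is a small sketch that classifies cells using Python's standard `difflib.SequenceMatcher`. It is not NBDiff's own diff implementation; the function name `diff_cells` and the representation of a notebook as a list of cell source strings are assumptions made for the example.

```python
from difflib import SequenceMatcher


def diff_cells(before, after):
    """Return (state, cell) pairs with state in added/deleted/unchanged."""
    result = []
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(("unchanged", cell) for cell in before[i1:i2])
        else:
            # "delete", "insert" and "replace" are all reported as
            # deletions from `before` followed by additions from `after`.
            result.extend(("deleted", cell) for cell in before[i1:i2])
            result.extend(("added", cell) for cell in after[j1:j2])
    return result


empty_diff = diff_cells([], [])
added_diff = diff_cells([], ["import os", "print(os.name)"])
deleted_diff = diff_cells(["a = 1", "b = 2"], ["a = 1"])
same_diff = diff_cells(["a = 1"], ["a = 1"])
```

Under these assumptions, the four calls correspond to functional tests 1, 2, 6, and 7 respectively.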

Business Cycle Testing

• Perform end-to-end testing of real-life-like scenarios using the system. This applies to all of the Functional Testing items listed above.

Compatibility Testing

• Test if the system is compatible with Google Chrome and Mozilla Firefox

Data and Database Integrity

• For each form that submits a merge or a diff request, there is a similar test that provides a valid request to the server and verifies the result in the database

User Interface Testing

• The user interface must fit on any laptop screen
• There must be no spelling mistakes
• The NBDiff logo must load properly
• The screen in the diff tool must be divided into two equal vertical bars
• The screen in the merge tool must be divided into three equal vertical bars (remote bar, base bar, local bar), with an arrow bar between each pair (to allow choosing the needed cell)
• Verify that the help button is accessible
• Verify that the save button is accessible on the merge page
• Colours should not annoy the user
• Colours for the added, deleted, and modified states should be consistent

16.3.3 Items Not to be Tested

The following tests will not be done on the system since they are out of the scope of this project:


Security Testing

These kinds of tests will not be performed since there is no critical information stored in the database.

Conformance Testing

These kinds of tests are required to verify that the system meets certain business or domain standards. But these tests fall outside the project’s scope.

16.3.4 Test Approach

Unit Testing

Technique Objective: Guarantee that the internal functionality of each subsystem of the software behaves as expected.

Technique: Test the different NBDiff units independently to make sure that each unit works correctly by itself.

Oracles: Unit testing can be done manually using JUnit tests in order to test all the units of this system.

Required Tool: The tools required for unit testing:
• JUnit

Success Criteria: The success criteria will be tested for:
• Use cases of the system
• Features

Special Considerations: Unit testing tests individual functionality only and cannot catch errors at the integrity level, so unit tests should be backed up with other types of testing.


Integrity Testing

Technique Objective: Guarantee that the internal functionality behaves as expected for the whole system when it is tested all together.

Technique: The technique used for integrity testing:
• Check that all business rules are applied properly
• Perform tests for all functionality with valid Notebooks and check that the system displays the expected result

Oracles: Unit testing can be done manually using JUnit tests in order to run all the unit tests for this system.

Required Tool: The tools required for this testing:
• JUnit

Success Criteria: The success criteria will be tested for:
• Use cases of the system
• Features

Special Considerations: N/A

User Interface Testing

Technique Objective: Guarantee that the user interface agrees with the user needs and allows the user to perform better actions.

Technique: Have tests for all the actions the user can do in the system via the UI.

Oracles: SeleniumHQ will be used to document all the criteria of the test cases when the UI test operation is performed.

Required Tools: Internet Explorer will be used to achieve the goal of this test.

Success Criteria: This technique is going to be used for all the screens included in this system.

Special Considerations: This test will access as many properties of the NBDiff system as possible, but it may not be possible to test all of them.


16.3.5 Test Pass / Fail Criteria

Each test case below lists its test ID, description, steps to test, and expected results.

TC1
Description: Test if diffing two empty Notebooks results in an empty diff without the system terminating unexpectedly
Steps to test:
• Create two empty Notebooks
• Run NBDiff on these two Notebooks
Expected results: The system should produce an empty diff without resulting in errors or terminating unexpectedly

TC2
Description: Test if diffing an empty Notebook with a non-empty Notebook shows a result of added cells
Steps to test:
• Create an empty Notebook as the before Notebook
• Create another Notebook as the after Notebook and add some cells to it
• Run NBDiff on these two Notebooks
Expected results: The system should produce a diff showing the new cells as added

TC3
Description: Test if diffing a Notebook with an added cell does not cause the system to terminate unexpectedly
Steps to test:
• Create one Notebook
• Create a copy of the Notebook and add a new cell to the copy
• Run NBDiff on these two Notebooks
Expected results: The system should produce a diff without terminating unexpectedly

TC4
Description: Test if diffing a Notebook with an added cell results in a diff with the cell displayed as added
Steps to test:
• Create one Notebook
• Create a copy of the Notebook and add a new cell to the copy
• Run NBDiff on these two Notebooks
Expected results: The system should produce a diff with the cell displayed as added

TC5
Description: Test if diffing a Notebook with a deleted cell does not cause the system to terminate unexpectedly
Steps to test:
• Create one Notebook
• Create a copy of the Notebook and delete one cell from the copy
• Run NBDiff on these two Notebooks
Expected results: The system should produce a diff without terminating unexpectedly

TC6
Description: Test if diffing a Notebook with a deleted cell results in a diff with the cell displayed as deleted
Steps to test:
• Create one Notebook
• Create a copy of the Notebook and delete one cell from the copy
• Run NBDiff on these two Notebooks
Expected results: The system should produce a diff with the cell displayed as deleted

TC7
Description: Test if diffing two similar Notebooks shows a result of unchanged cells
Steps to test:
• Create a non-empty Notebook
• Create a copy of the Notebook
• Run NBDiff on these two Notebooks
Expected results: An unchanged status for cells in both Notebooks should appear

TC8
Description: Test, when diffing two modified cells, if two similar lines are shown as unchanged
Steps to test:
• Create a non-empty Notebook
• Create a copy of the Notebook and change a line in a cell
• Run NBDiff on these two Notebooks
Expected results: The system should display a diff with all the non-modified lines displayed as unchanged lines

TC9
Description: Test, when diffing two modified cells, if two different lines are shown as a deleted line in the before Notebook and an added line in the after Notebook
Steps to test:
• Create a non-empty Notebook
• Create a copy of the Notebook and change one line in a cell in the copy Notebook
• Run NBDiff on these two Notebooks
Expected results: The system should display a diff showing the cell with the changed line as modified. The line should be displayed as deleted in the before Notebook and the new line displayed as added in the after Notebook.

TC10
Description: Test, when diffing two modified cells with two modified lines, if two similar words are shown as unchanged
Steps to test:
• Create a non-empty Notebook
• Create a copy of the Notebook and change a few words in a few lines of a cell in the copy Notebook
• Run NBDiff on these two Notebooks
Expected results: The system should display a diff showing the modified cell with the unchanged lines as unchanged lines

TC11
Description: Test, when diffing two modified cells with two modified lines, if two different words are shown as a deleted word on the local side and an added word on the remote side
Steps to test:
• Create a non-empty Notebook
• Create a copy of the Notebook and change a few words in a few lines of a cell in the copy Notebook
• Run NBDiff on these two Notebooks
Expected results: The system should display a diff showing the modified cell with the modified lines as added and deleted words

TC12
Description: Test while merging if choosing to add a cell from the remote Notebook to the base Notebook and saving saves the result in a local Notebook with the added cell
Steps to test:
• Create a non-empty Notebook as the base Notebook
• Create a copy of the Notebook as the remote Notebook and make some modifications
• Create another copy of the Notebook as the local Notebook and make some modifications
• Run nbmerge on these Notebooks
• Choose to add a cell from the remote Notebook to the base Notebook
• Save the result
Expected results: The system should produce a local Notebook with the added cell

TC13
Description: Test while merging if choosing to add a cell from the local Notebook to the base Notebook and saving saves the result in a local Notebook with the added cell
Steps to test:
• Create a non-empty Notebook as the base Notebook
• Create a copy of the Notebook as the remote Notebook and make some modifications
• Create another copy of the Notebook as the local Notebook and make some modifications
• Run nbmerge on these Notebooks
• Choose to add a cell from the local Notebook to the base Notebook
• Save the result
Expected results: The system should produce a local Notebook with the added cell

TC14
Description: Test if not choosing any changes doesn’t modify the Notebooks
Steps to test:
• Create a non-empty Notebook as the base Notebook
• Create a copy of the Notebook as the remote Notebook and make some modifications
• Create another copy of the Notebook as the local Notebook and make some modifications
• Run nbmerge on these Notebooks
Expected results: The system should make no changes to the Notebooks

TC15
Description: Test while merging if choosing to apply the deletion of a cell as in the local Notebook to the base Notebook and saving results in a local Notebook without the deleted cell
Steps to test:
• Create a non-empty Notebook as the base Notebook
• Create a copy of the Notebook as the remote Notebook and make some modifications
• Create another copy of the Notebook as the local Notebook and delete a cell
• Run nbmerge on these Notebooks
• Choose to apply the deletion of a cell as in the local Notebook to the base Notebook
• Save the result
Expected results: The system should produce a local Notebook without the deleted cell

TC16
Description: Test while merging if choosing to apply the deletion of a cell as in the remote Notebook to the base Notebook and saving results in a local Notebook without the deleted cell
Steps to test:
• Create a non-empty Notebook as the base Notebook
• Create a copy of the Notebook as the remote Notebook and delete a cell
• Create another copy of the Notebook as the local Notebook and make some modifications
• Run nbmerge on these Notebooks
• Choose to apply the deletion of a cell as in the remote Notebook to the base Notebook
• Save the result
Expected results: The system should produce a local Notebook without the deleted cell

TC17
Description: Test while merging if choosing to apply the modification of a cell as in the local Notebook (that has a line-based diff) to the base Notebook and saving saves the result in a local Notebook with the modified cell
Steps to test:
• Create a non-empty Notebook as the base Notebook
• Create a copy of the Notebook as the remote Notebook and modify a cell
• Create another copy of the Notebook as the local Notebook and modify a cell
• Run nbmerge on these Notebooks
• Choose to apply the modification of a cell as in the local Notebook to the base Notebook
• Save the result
Expected results: The system should produce a local Notebook with the modified cell as in the local Notebook

TC18
Description: Test while merging if choosing to apply the modification of a cell as in the remote Notebook (which has a line-based diff) to the base Notebook and saving saves the result in a local Notebook with the modified cell
Steps to test:
• Create a non-empty Notebook as the base Notebook
• Create a copy of the Notebook as the remote Notebook and modify a cell
• Create another copy of the Notebook as the local Notebook and modify a cell
• Run nbmerge on these Notebooks
• Choose to apply the modification of a cell as in the remote Notebook to the base Notebook
• Save the result
Expected results: The system should produce a local Notebook with the modified cell as in the remote Notebook

TC19
Description: Test if diffing a valid Notebook with an invalid Notebook will indicate an error
Steps to test:
• Create a valid Notebook as the before Notebook
• Create an invalid Notebook as the after Notebook
• Run nbdiff on these Notebooks
Expected results: The system should display an error message

TC20
Description: Test if diffing a valid Notebook with an invalid Notebook will indicate which Notebook is invalid
Steps to test:
• Create a valid Notebook as the before Notebook
• Create an invalid Notebook as the after Notebook
• Run nbdiff on these Notebooks
Expected results: The system should indicate which Notebook is invalid

TC21
Description: Test, when diffing two headers with two modified lines, if a word-based diff is shown (added, deleted, or unchanged for each word)
Steps to test:
• Create a non-empty Notebook as the before Notebook that has a header
• Create a copy of the Notebook as the after Notebook that has a header and modify a few lines in that header
• Run nbdiff on these Notebooks
Expected results: The system should display a word-based diff for the modified header
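The word-level expectations in TC10, TC11, and TC21 can be illustrated with a short sketch that splits a modified line into words and classifies each word. As with any example here, this is an assumption-based illustration built on Python's standard `difflib`, not the algorithm NBDiff itself uses, and `diff_words` is a hypothetical name.

```python
from difflib import SequenceMatcher


def diff_words(before_line, after_line):
    """Word-level diff of one modified line.

    Returns (state, word) pairs, where state is one of
    "unchanged", "deleted" or "added".
    """
    before_words = before_line.split()
    after_words = after_line.split()
    matcher = SequenceMatcher(a=before_words, b=after_words, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(("unchanged", w) for w in before_words[i1:i2])
        else:
            # Words only in the old line are deleted; words only in the
            # new line are added.
            result.extend(("deleted", w) for w in before_words[i1:i2])
            result.extend(("added", w) for w in after_words[j1:j2])
    return result


word_diff = diff_words("x = 1 + 2", "x = 1 + 3")
```

In this example the words `x`, `=`, `1`, and `+` are reported as unchanged, `2` as deleted, and `3` as added, which is the shape of result the word-based test cases check for.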

16.3.6 Test Entry / Exit Criteria

This section covers the criteria needed to move the test from one state to another during the development of the NBDiff project.

Test Plan

• Test plan entry criteria
The entry of the test plan for the NBDiff system is based on the use cases documented in “Use Case Model”. Once each use case is mapped to its respective group of well-defined test cases, the system tests will use these test cases to generate the different types of tests (unit testing, integrity testing) that evaluate the tests described in this document.
• Test plan exit criteria
The exit of the test plan is achieved by reaching the coverage goal for all our planned tests, which are covered in the “Test Summary plan” document that specifies all the test cases needed for this project. At the end of the development process, the system needs to be tested with a coverage of 100%, and the unit tests should have a coverage of more than 50%.

Test Cycle

• Test cycle entry criteria
The entry of the test cycle of the test plan is based on the test cases. Once the test cases have been defined for the test cycle, the test cycle can begin.
• Test cycle exit criteria
The test cycle can be ended once the test coverage goal has been achieved for all specified test cases.

16.3.7 Test Deliverables

• Test Plan Document
• Test Summary Report

16.3.8 Test Suspension / Resumption Criteria

This section specifies the criteria required to authorize testing to move from the suspended state to the resumed state. Testing should be suspended when a significant number of test cases fail to meet the expected results or when there are not enough planned test cases for the expected coverage. When these issues have been handled properly, testing will be resumed. In general, when the testing is out of scope, the test plan should end.

16.3.9 Test Environmental / Staffing / Training Needs

Test Environment

The test environment used for unit testing the modules written in the Python language is nose. Client-side unit tests are done using the Blanket.js JavaScript code coverage library and the QUnit testing framework, with PhantomJS used to run the QUnit framework.
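For readers unfamiliar with nose, the sketch below shows the general shape of a test module it can collect: plain functions whose names start with `test_` and that use bare `assert` statements. The helper `count_cells` and the simplified notebook dictionary are hypothetical and exist only to give the tests something to check; they are not taken from the NBDiff code base.

```python
def count_cells(notebook):
    """Hypothetical helper under test: number of cells in a notebook dict."""
    return len(notebook.get("cells", []))


def test_empty_notebook_has_no_cells():
    assert count_cells({"cells": []}) == 0


def test_missing_cells_key_is_treated_as_empty():
    assert count_cells({}) == 0


def test_counts_every_cell():
    notebook = {"cells": [{"source": "a"}, {"source": "b"}]}
    assert count_cells(notebook) == 2
```

Saved in a file whose name also starts with `test_`, a module like this would typically be picked up by running `nosetests` from the project directory.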


Roles

Table 16.1: Testing Roles

Test Manager
Description: Managing the overall testing process
Responsibilities include:
• Define the test plan
• Document the management report
• Document the interest of the tests
• Evaluate effectiveness of the test effort

Test Analyst
Description: Analyze and define the tests that need to be done
Responsibilities include:
• Define tests needed
• Document the expected result
• Get the actual result from the tester
• Compare the actual result with the expected one
• Document the tests needed
• Document the comparison result
• Evaluate product quality

Test Designer
Description: Specify the test approach and the technique used in it
Responsibilities include:
• Identify test technique
• Identify test approach

Tester
Description: Run the specified tests
Responsibilities include:
• Execute the specified tests
• Document the test’s results

Designer
Description: Define the test requirements
Responsibilities include:
• Check the test team needs
• Define the test attributes, operations, and association
• Document the test requirements

Implementer
Description: Implement the test cases
Responsibilities include:
• Implement the tests based on the specification received from the designer

Staffing & Training Needs

This section helps ensure that time and resources are used without waste. Developers working on the NBDiff project need to meet the following requirements:
• All group members spent their summer break training on the IPython Notebook and the required testing, so that no group member is interrupted during the development process.
• All members of the team need to know how to use, open, and edit Notebooks in the IPython Notebook.
• The team members should be split into subgroups based on their expertise and knowledge to get the maximum amount of work done in the minimum time.
• The most knowledgeable team member will be responsible for helping other members during training and development.
• Team members are assigned responsibilities such as test manager, analyst, designer, etc. (specified in section 2.9.2 “Roles”).
• Members need to be comfortable with the Python and JavaScript programming languages as well as HTML5.
• It is preferable for all members of the team to have previous experience with testing.

16.4 Regression Testing

This test is to be done whenever the functional or non-functional requirements change during the development process, to ensure that these changes do not break existing behaviour or introduce new faults or bugs. For this project, the requirements and the specification were very well specified from the beginning; therefore, there was no need for this type of testing (out of scope).

16.5 Acceptance Testing

For acceptance testing, we ran a test every week by holding a conference call with our stakeholder (Greg Wilson), showing him our product and what we had done that week. The performance testing was recorded for the major features implemented during the development process. The major part and final version of this test was done once development of the NBDiff software was completely finished. At that stage, we tested the system by showing it to the product owner (Greg Wilson) and some of his teammates who were involved in giving feedback on this project, in order to verify whether the requirements and specifications were met as specified or whether anything needed to be added before the final submission of the product. This was done so that any requested changes could still be made, provided there was enough time to finish them. The detailed acceptance tests can be found in the Test Summary Report document.

16.6 Alpha Testing

Alpha testing was done at the end of the development process, in the latest stages of developing the NBDiff system. This was accomplished by showing the software to users/customers who may use it later on. The test started by allowing the users to carry out some of the tasks that actual users might perform; most importantly for this project, the diffing operation and the merge operation. A black-box technique was used in this test. The result of this test was compared to the expected result to find any bugs that had not appeared in the previous tests. This test was done with the same people who did the usability testing.

16.7 Beta Testing

This test is supposed to take place after alpha testing is finished and the software features are complete. Due to the lack of time at the end of the development process, we were not able to do this testing and make our software available to the public outside our team. However, we did alpha testing and usability testing for the prototype, and another round of usability testing for the actual software, but with a small number of people.


16.8 Performance Testing

These types of tests were performed on the NBDiff software to determine the system’s stability and responsiveness when it is put under a heavy workload.

16.9 Usability Testing

The prototype usability testing was done at an early stage of the development process. Another round of usability testing was done on the actual software to get final feedback on whether any changes needed to be made to improve the software for its users. (See usabilitytesting/20131118 in the documents repository for more detail.)

16.10 Defects/Bugs Management

Defects and bugs found are filed as issues on the team’s GitHub repository and marked with a “bug” tag. Once an issue with the bug tag has been created, it is assigned to a team member who is responsible for finding a way to fix the defect/bug. Any defects or bugs found during the development phase of the software will be fixed before the next version release. If defects or bugs are found after the software has been released, they will be fixed in the next release of the software.

16.11 Maintenance Testing

16.11.1 flake8

Python Flake8 Lint is a Sublime Text plugin that checks Python files against some of the style conventions. It was easy to use: running flake8 on the file that needed to be checked automatically checks for any style errors. The result is a list of style errors, with the line number specified for each error.

16.11.2 JsHint

For JavaScript code we used the JSHint tool to check whether the code complies with coding rules. We used the command-line version of this tool. It detects errors and potential problems in our code.

The system should show in the remote side deleted cells for the ones in the base and added cells for the ones in the remote.

CHAPTER 17

Test Summary Report

Table of Contents
• Test Summary Report
  – List of Figures
  – Introduction
  – Test Summary
  – Test Assessment
  – Test Results
    * List of Defects Found
    * List of Defects Fixed
    * System Testing
    * User Acceptance Testing
    * Regression Testing
    * Performance Testing
      · Benchmark Results
    * Usability Testing

17.1 List of Figures

• Figure 1

17.2 Introduction

The purpose of the Test Summary Report is to provide a summary of the results of all testing activities performed in this project. This document covers a detailed summary of all the tests done in this project, the test assessment, and the test results for system testing, user acceptance testing, regression testing, performance testing, and usability testing.

17.3 Test Summary

Tests performed on NBDiff include system tests, user acceptance tests, regression tests, performance tests, and usability tests. Based on the observed test results, the NBDiff project was modified to meet the testing criteria and pass the tests.


17.4 Test Assessment

Most of the core functionality of the NBDiff software was tested as the code was written. Regression testing and performance testing were performed on the system once its development was completed. Usability testing of the NBDiff project was done by the stakeholder four times. After trying to merge and diff IPython notebooks using the NBDiff tool, the stakeholder gave comments about the tool and what needed to be added, improved, or removed.

17.5 Test Results

17.5.1 List of Defects Found

1. Localhost:5000 does not work with command line nbmerge with 3 parameters (Refers to issue #152)
2. NBDiff-type not removed from notebook metadata on save (Refers to issue #144)
3. If no unmerged notebooks are present, nbmerge crashes (Refers to issue #130)
4. If no cells are equal, NBDiff fails (Refers to issue #128)
5. NBDiff crashes on merge conflict (Refers to issue #121)
6. ImportError appears when running nbmerge (Refers to issue #118)
7. Save click handler broken on Firefox (Refers to issue #116)
8. A deleted cell being moved to the middle column while in merge state is not deleted from notebook (Refers to issue #107)
9. A save button appears on the diff notebook page (Refers to issue #106)
10. Placeholder cells are saved but should be ignored (Refers to issue #101)
11. Images shrink to thumbnail size in Firefox (Refers to issue #154)
12. IPython development version compatibility issues (Refers to issue #126)
13. Server stopping on refresh (Refers to issue #156)
14. During merge, adding a header to the middle column (different formatting) (Refers to issue #163)
15. Compatibility issue between the IPython development version and the NBDiff’s merge (Refers to issue #126)
16. Merging empty notebooks crashes (Refers to issue #196)
17. ‘command.js’ not found on server (case sensitivity) (Refers to issue #186)
18. Support nbdiff --check in file mode (Refers to issue #181)
19. Trying to merge with the same remote and base Notebooks but different local Notebook, remote and base are displayed as empty (Refers to issue #191)
20. Automatic notebook save at random times (Refers to issue #206)
21. Diffing two empty lists crashes (Refers to issue #183)

17.5.2 List of Defects Fixed

1. Localhost:5000 does not work with command line nbmerge with 3 parameters (Refers to issue #152)
2. NBDiff-type not removed from notebook metadata on save (Refers to issue #144)
3. If no unmerged notebooks are present, nbmerge crashes (Refers to issue #130)
4. If no cells are equal, NBDiff fails (Refers to issue #128)
5. NBDiff crashes on merge conflict (Refers to issue #121)
6. ImportError appears when running nbmerge (Refers to issue #118)
7. Save click handler broken on Firefox (Refers to issue #116)
8. A deleted cell being moved to the middle column while in merge state is not deleted from notebook (Refers to issue #107)
9. A save button appears on the diff notebook page (Refers to issue #106)
10. Placeholder cells are saved but should be ignored (Refers to issue #101)
11. Server stopping on refresh (Refers to issue #156)
12. During merge, adding a header to the middle column (different formatting) (Refers to issue #163)
13. Compatibility issue between the IPython development version and the NBDiff’s merge (Refers to issue #126)
14. Merging empty notebooks crashes (Refers to issue #196)
15. ‘command.js’ not found on server (case sensitivity) (Refers to issue #186)
16. Support nbdiff --check in file mode (Refers to issue #181)
17. Trying to merge with the same remote and base Notebooks but different local Notebook, remote and base are displayed as empty (Refers to issue #191)
18. Automatic notebook save at random times (Refers to issue #206)
19. Diffing two empty lists crashes (Refers to issue #183)


17.5.3 System Testing

Table 17.1: Test Cases

ID | Description | Status
TC1 | Test if diffing two empty Notebooks results in an empty diff without the system terminating unexpectedly | Pass
TC2 | Test if diffing an empty Notebook with a non-empty Notebook shows a result of added cells | Pass
TC3 | Test if diffing a notebook with an added cell does not cause the system to terminate unexpectedly | Pass
TC4 | Test if diffing a notebook with an added cell shows a result of cell added | Pass
TC5 | Test if diffing a notebook with a deleted cell does not cause the system to terminate unexpectedly | Pass
TC6 | Test if diffing a notebook with a deleted cell shows a result of deleted cell | Pass
TC7 | Test if diffing two similar cells shows a result of unchanged cells | Pass
TC8 | Test in diffing two modified cells if diffing two similar lines shows a result of unchanged | Pass
TC9 | Test in diffing two modified cells if diffing two different lines shows a result of deleted line - added line in the before Notebook - the after Notebook respectively | Pass
TC10 | Test in diffing two modified cells while having two lines are modified if diffing two similar words shows a result of unchanged | Scoped out
TC11 | Test in diffing two modified cells while having two modified lines if diffing two different words shows a result of deleted word - added word to the local side - remote side respectively | Scoped out
TC12 | Test while merging if choosing to add a cell from the remote Notebook to the base Notebook and saving saves the result in a local Notebook with the added cell | Pass
TC13 | Test while merging if choosing to add a cell from the local Notebook to the base Notebook and saving saves the result in a local Notebook with the added cell | Pass
TC14 | Test if not choosing any changes doesn’t modify the Notebooks | Pass
TC15 | Test while merging if choosing to apply the deletion of a cell as in the local Notebook to the base Notebook and saving results in a local Notebook without the deleted cell | Pass
TC16 | Test while merging if choosing to apply the deletion of a cell as in the remote Notebook to the base Notebook and saving results in a local Notebook without the deleted cell | Pass
TC17 | Test while merging if choosing to apply the modification of a cell as in the local Notebook (that has a line based diff) to the base Notebook and saving saves the result in a local Notebook with the modified cell | Pass
TC18 | Test while merging if choosing to apply the modification of a cell as in the remote Notebook (that has a line based diff) to the base Notebook and saving saves the result in a local Notebook with the modified cell | Pass
TC19 | Test if diffing a valid notebook with an invalid notebook will indicate an error | Pass
TC20 | Test if diffing a valid notebook with an invalid notebook will indicate which notebook is invalid | Pass
TC21 | Test in diffing two headers if there are two modified lines shows a word line based (added, deleted, or unchanged for each word) | Pass
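
Several of the system test cases (TC1, TC2, TC4, TC6, TC7) describe cell-level diff behaviour. As a rough illustration only, the sketch below labels cells as unchanged, deleted, or added using Python's standard difflib module. It is not NBDiff's implementation: the function name diff_cells and the representation of a notebook as a plain list of cell source strings are assumptions made for this example.

```python
from difflib import SequenceMatcher


def diff_cells(before, after):
    """Label each cell as 'unchanged', 'deleted', or 'added'.

    `before` and `after` are lists of cell source strings. This is a
    simplification for illustration: real notebook cells carry more
    fields than their source text.
    """
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(("unchanged", cell) for cell in before[i1:i2])
        else:
            # 'replace' is reported as deletions followed by additions;
            # for 'delete' and 'insert' one of the two slices is empty.
            result.extend(("deleted", cell) for cell in before[i1:i2])
            result.extend(("added", cell) for cell in after[j1:j2])
    return result


if __name__ == "__main__":
    # Two empty notebooks (compare TC1) and an added cell (compare TC4).
    print(diff_cells([], []))
    print(diff_cells(["import numpy"], ["import numpy", "x = 1"]))
```

Working on whole cells first keeps the empty-notebook cases trivial: with no cells on either side there are no opcodes, so the result is simply an empty list rather than an error.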

17.5.4 User Acceptance Testing

User acceptance testing was done during the meetings with NBDiff’s stakeholder, Dr. Greg Wilson. All comments made by the stakeholder were taken into consideration, and new requirements for the project were created from them. The tests are summarized below:


Table 17.2: User Acceptance Test 1

Description: Get the stakeholder to test the merge of two notebooks
Date: February 17th 2014
Tester: Dr. Greg Wilson
Status: Pass
Severity of defect: N/A
Summary of defect: N/A
Comments: Suggestions made by the stakeholder
• Remove the save button
• Add handling of multiple notebook files

Table 17.3: User Acceptance Test 2

Description: Get the stakeholder to try new features of the merge function. Functions include:
• Cell drag and drop
• Undo/redo cell moving
• Line based diffs working
Date: February 28th 2014
Tester: Dr. Greg Wilson
Status: Pass
Severity of defect: No defect has been
Summary of defect: N/A
Comments: Suggestions made by the stakeholder
• Need way of unmerging a cell
• Need some indicator of which side a merged cell originally merged from


Table 17.4: User Acceptance Test 3

Description: Get the stakeholder to try new features of the NBDiff & Nbmerge functions. Functions include:
• NBDiff/Nbmerge functions support multiple notebooks
• Test notebooks use fancier notebooks (many different types of cells to be diffed)
• Diffs of header cells
• Line based merge UI
Date: March 6th 2014
Tester: Dr. Greg Wilson & Fernando Perez (creator of the IPython Notebook)
Status: Pass
Severity of defect: N/A
Summary of defect: N/A
Comments: Suggestions made by the stakeholder
• Use Tornado instead of Flask (post-capstone development)

Table 17.5: User Acceptance Test 4 (Final Acceptance Testing)

Description: Get the stakeholder to try all features of the NBDiff & Nbmerge functions. Functions include:
• NBDiff/Nbmerge main functions
• Test notebooks use fancier notebooks (many different types of cells to be diffed)
• Test the UI of the system
• Dragging and dropping cells
Date: March 21st 2014
Tester: Dr. Greg Wilson
Status: Pass
Severity of defect: N/A
Summary of defect: N/A
Comments: Suggestions made by the stakeholder
• The stakeholder likes what we ended up having


17.5.5 Regression Testing

Table 17.6: Regression Test 1

Test ID: RT1
Description: Test if fixing the placeholder cell being saved issue has any impact on the diffing, merging and save options of the notebook. (Fix related to issue #101)
Date: March 9th 2014
Tester: NBdiff team
Status: Pass
Severity of defect: Medium
Summary of defect: No resulting defect has been noticed
Comments: N/A

Table 17.7: Regression Test 2

Test ID: RT2
Description: Test if removal of save button on the diffing notebook page affected the diffing of the notebook. (Fix related to issue #106)
Date: March 9th 2014
Tester: NBdiff team
Status: Pass
Severity of defect: Medium
Summary of defect: No resulting defect has been noticed
Comments: N/A

Table 17.8: Regression Test 3

Test ID: RT3
Description: Test if the fix that makes moving a deleted cell in a merged notebook delete that cell in the resulting notebook affected the merge notebook functionality. (Fix related to issue #107)
Date: March 9th 2014
Tester: NBdiff team
Status: Pass
Severity of defect: High
Summary of defect: No resulting defect has been noticed
Comments: N/A


Table 17.9: Regression Test 4

Test ID: RT4
Description: Test if fixing the crash that occurred when nbmerge found no unmerged notebooks had any effect on the nbmerge functionality. (Fix related to issue #130)
Date: March 9th 2014
Tester: NBdiff team
Status: Pass
Severity of defect: Medium
Summary of defect: No resulting defect has been noticed.
Comments: N/A

Table 17.10: Regression Test 5

Test ID: RT5
Description: Test if fixing the nbmerge opening two browser tabs instead of just one had any effect on the nbmerge functionality. (Fix related to issue #113)
Date: March 9th 2014
Tester: NBdiff team
Status: Pass
Severity of defect: Medium
Summary of defect: No resulting defect has been noticed
Comments: N/A

Table 17.11: Regression Test 6

Test ID: RT6
Description: Test if fixing the save click that was broken on Firefox has had an impact on the nbmerge functionality. (Fix related to issue #116)
Date: March 13th 2014
Tester: NBdiff team
Status: Pass
Severity of defect: High
Summary of defect: No resulting defect has been noticed
Comments: N/A


Table 17.12: Regression Test 7

Test ID: RT7
Description: Test if fixing the crash of nbdiff on merge conflict affected the nbdiff and nbmerge functionality. (Fix related to issue #121)
Date: February 26th 2014
Tester: NBdiff team
Status: Pass
Severity of defect: High
Summary of defect: No resulting defect has been noticed
Comments: N/A

Table 17.13: Regression Test 8

Test ID: RT8
Description: Test if fixing the nbdiff failure when no cells are equal affected the nbdiff functionality. (Fix related to issue #128)
Date: March 9th 2014
Tester: NBdiff team
Status: Pass
Severity of defect: High
Summary of defect: No resulting defect has been noticed
Comments: N/A

Table 17.14: Regression Test 9

Test ID: RT9
Description: Test if the removal of the nbdiff-type from notebook metadata when notebook is saved affects the nbdiff functionality. (Fix related to issue #144)
Date: March 18th 2014
Tester: NBdiff team
Status: Pass
Severity of defect: Low
Summary of defect: No resulting defect has been noticed
Comments: N/A


Table 17.15: Regression Test 10

Test ID: RT10
Description: Test if fixing different formatting of added header in middle column affects nbdiff’s functionality. (Fix related to issue #163)
Date: March 10th 2014
Tester: NBdiff team
Status: Pass
Severity of defect: Low
Summary of defect: No resulting defect has been noticed
Comments: N/A

Table 17.16: Regression Test 11

Test ID: RT11
Description: Test if the renaming of ‘command.js’ due to problems with Windows causing the arrows to misbehave while merging notebooks affects nbdiff’s functionality. (Fix related to issue #186)
Date: March 27th 2014
Tester: NBdiff team
Status: Pass
Severity of defect: High
Summary of defect: No resulting defect has been noticed
Comments: N/A

Table 17.17: Regression Test 12

Test ID: RT12
Description: Test if fixing the merge of three empty notebooks resulting in a crash affects nbdiff’s functionality. (Fix related to issue #196)
Date: March 27th 2014
Tester: NBdiff team
Status: Pass
Severity of defect: High
Summary of defect: No resulting defect has been noticed
Comments: N/A


Table 17.18: Regression Test 13

Test ID: RT13
Description: Test if fixing the automatic notebook save at random times issue affects nbdiff’s functionality. (Fix related to issue #206)
Date: March 28th 2014
Tester: NBdiff team
Status: Pass
Severity of defect: Medium
Summary of defect: No resulting defect has been noticed
Comments: N/A

Table 17.19: Regression Test 14

Test ID: RT14
Description: Test if fixing the multiple tab opening issue when running nbmerge affects nbdiff’s functionality. (Fix related to issue #113)
Date: March 9th 2014
Tester: NBdiff team
Status: Pass
Severity of defect: Low
Summary of defect: No resulting defect has been noticed
Comments: N/A

Table 17.20: Regression Test 15

Test ID: RT15
Description: Test if fixing the server stopping on refresh issue affects nbdiff’s functionality. (Fix related to issue #156)
Date: March 9th 2014
Tester: NBdiff team
Status: Pass
Severity of defect: High
Summary of defect: No resulting defect has been noticed
Comments: N/A


Table 17.21: Regression Test 16

Test ID: RT16
Description: Test if fixing the bug where diffing of two empty lists would result in a crash affects nbdiff’s functionality. (Fix related to issue #183)
Date: March 20th 2014
Tester: NBdiff team
Status: Pass
Severity of defect: High
Summary of defect: No resulting defect has been noticed
Comments: N/A

17.5.6 Performance Testing

The performance testing for the NBDiff was done through the following benchmark test:

Benchmark Results

In order to verify performance requirements (viz., NFR-Eff-1) we performed a benchmark on test data. The benchmark was run on the final release of NBDiff, using the following steps:

1. A script (included in the code repository) generated two versions of a randomly created .ipynb file. These satisfied the constraints listed in NFR-Eff-1.
2. NBDiff was run on the two versions of the notebooks with the --check argument. This process was repeated 40 times.
3. The first 10 measurements were dropped to decrease noise in measurements.
4. A boxplot was generated showing the distribution of the measurements.

Run Times

A table with bottom quartile, median, and top quartile is also provided.
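
The measurement procedure in the benchmark steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the benchmark script from the code repository: the function name benchmark and its parameters are hypothetical, and the workload is passed in as a plain callable rather than an actual NBDiff invocation.

```python
import statistics
import time


def benchmark(run_once, repetitions=40, warmup=10):
    """Time `run_once` repeatedly and summarize the kept measurements.

    Mirrors the procedure described in the text: 40 repetitions, with
    the first 10 measurements dropped to reduce noise. `run_once` is a
    hypothetical stand-in for one NBDiff run on the generated notebooks.
    """
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)

    kept = timings[warmup:]
    # quantiles(n=4) returns the three cut points between the quartiles.
    q1, median, q3 = statistics.quantiles(kept, n=4)
    return {"runs": len(kept), "q1": q1, "median": median, "q3": q3}


if __name__ == "__main__":
    # Placeholder workload; a real run would invoke NBDiff instead.
    print(benchmark(lambda: sum(range(100_000))))
```

In a real benchmark, run_once would wrap one NBDiff run with the --check argument on the two generated notebooks, for example through a subprocess call. The three returned values correspond to the bottom quartile, median, and top quartile reported in the table.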

17.5.7 Usability Testing

This test was done at the early stages of the development process for the prototype, with two users who are familiar with the IPython Notebook (see usability testing/2012-11-18/subject1.pdf and usability-testing/2013-11-18/subject2). Another usability test was done after the implementation of the NBDiff software was finished, in order to get more feedback and suggestions. This helps ensure that the user interface of the system is more user friendly.

Figure 17.1: Figure 1

CHAPTER 18

Comparison of Existing Diff/Merge Tools

18.1 Overview

This is a list that we compiled of existing diffing and merging tools. We used it to study the user interface patterns implemented in existing tools.

18.1.1 SwiftCompare


http://www.oorjasoftware.com/product_info.html http://swiftcompare.software.informer.com/


18.1.2 FourierRocks

http://sourceforge.net/projects/fourierrocks/

http://www.aca.gr/index/hiend/hiendArticles?row=2063

18.1.3 AudioGrabber

http://www.audiograbber.org/


18.1.4 FolderSynch

Uses WinMerge/FilePro for comparisons.

http://saleensoftware.com/FolderSync


18.1.5 Audio DiffMaker


http://www.libinst.com/Audio%20DiffMaker.htm


18.1.6 Zynamics BinDiff

Changes are shown in yellow, new content in red.


http://www.zynamics.com/bindiff.html

18.1.7 BinDiff


http://www.codeproject.com/Articles/509425/BinDiff-A-tool-to-compare-binary-files

18.1.8 Image Compare

http://sourceforge.net/projects/imagecomp/

18.1.9 Bolide Audio Comparer

http://www.bolidesoft.com/audiocomparer/


18.1.10 Bolide Image Comparer

http://www.bolidesoft.com/imagecomparer.html


18.1.11 Bolide Compare Suite


http://www.bolidesoft.com/compare-suite/screenshots.html

18.1.12 WinMerge


http://winmerge.org/


18.1.13 QuickDiff

Online diff tool.

http://www.quickdiff.com/


18.1.14 PrestoSoft ExamDiff


http://www.prestosoft.com/edp_examdiffpro.asp


18.1.15 KaleidoScope


http://www.kaleidoscopeapp.com/


18.1.16 Docu-Proof Enterprise


http://www.globalvisioninc.com/products/docuproof.php


18.1.17 WorkShare Compare


http://www.workshare.com/products


18.1.18 SoftInterface Diff Doc

http://www.softinterface.com/MD%5CDocument-Comparison-Software.htm


18.1.19 Araxis Merge


Shows only the pixels that are different between the two files.


http://www.araxis.com/merge/

18.1.20 BitBQ Changes


http://bitbq.com/changes/images/TextDiff.png


18.1.21 Devart Code Compare


http://www.devart.com/codecompare/


18.1.22 Compare++

Able to compare files according to the programming language they are written in.


http://cmpp.coodesoft.com/


18.1.23 Sourcegear DiffMerge


http://sourcegear.com/diffmerge/index.html


18.1.24 Pretty Diff


http://prettydiff.com/

18.1.25 Kompare

http://www.caffeinated.me.uk/kompare/

18.1.26 Ultra Compare


http://www.ultraedit.com/products/ultracompare.html/


18.1.27 Code Difference Comparison Tool


http://www.tareeinternet.com/scripts/comparison-tool/


18.1.28 Diffuse

http://diffuse.sourceforge.net/index.html

18.1.29 Compare&Merge

http://www.compareandmerge.com/


18.1.30 Formula Software


http://www.formulasoft.com/index.html


18.1.31 ColorDiffs

Not really a diff tool, but a tool to improve the style of diff messages from Subversion, CVS, Mercurial, etc.

http://code.google.com/p/colorediffs/


18.1.32 Compare PDF


Seems either Bolide or Compare PDF used each other’s code...

http://www.compare-pdf.com/


18.1.33 DiffPDF

http://www.qtrac.eu/diffpdf.html

18.1.34 Meld


http://meldmerge.org/

CHAPTER 19

New Requirements for Capstone Projects

This document discusses societal and professional implications for our capstone project. We argue that we have carefully considered the ethics of our project, and that it does not impact society negatively in any meaningful way. To the contrary, it is providing a societal good in providing tools to scientists.

19.1 Impact of engineering on society

The success or failure of our project should not affect the wellbeing of many people. As it is a software project, it does not affect the environment people live in, nor does it affect their health any more than normal computer use would. The possible effects of NBDiff on society are at worst second-order: this tool is intended to help change the way scientists work, and the way scientists work affects the society in which they do their research. The only direct negative effect people may experience as a result of using our product is errors in scientific computation. If our tool does not work as intended, it may cause scientists to mangle their IPython Notebooks and introduce defects in their code, affecting the results of their experiments. Code review, which our tool might aid, may be an effective way to alleviate these concerns; however, bugs in our code may affect this process for our users. Errors in scientific research may impact society in negative ways; we will not discuss this issue in this document as it is well-covered by the mainstream press.

19.2 Ethics and equity

No ethical issues have appeared throughout the development of NBDiff.

19.3 Professionalism

The professional implications of our decisions as engineers during our capstone project are few. We do not believe our project will have a great impact on society as a whole; at best it might affect the way a part of the scientific community works, for the better. It will in no meaningful way affect the environment. The most difficult issue to resolve was the legal issue about releasing our project under the open source MIT license. This was resolved with the course coordinator early in the project cycle. For usability testing, we requested and obtained permission from the participants to publish the results of our research and anonymized the data in order to protect their identities.


19.4 Economics

1. Can you make money from your design?
   Our project is open source and thus not a lucrative project to sell. We could, however, provide paid support, but the project is intended primarily for scientific researchers, who benefit greatly from open, free tools for reproducibility.
2. Do you plan to make money from your design if an opportunity arises?
   We do not intend to make money from our design.
3. For software projects: Would you be interested in releasing your product to the public domain under an open source license?
   We have released our project under the MIT license (not the public domain – the distinction is important).
4. Can you manufacture?
   No.
5. Would you change your design to improve the marketability of your product?
   Not for commercial gains, no.

19.5 Lifelong Learning

In order to complete our project, we had to learn a lot about version control systems, both because our tool interacts with them and uses concepts from them, and because version control was integral to our open source development process. We learned how to develop open source software using common open source tools (GitHub, Travis-CI, etc.), which will make us better able to work on open source projects in the future. To integrate our project with the IPython Notebook, we had to learn how that project was architected and programmed. To learn more about that system, we read its online documentation and relevant parts of its source code. This was an invaluable experience for learning how to deal with other people’s code. In addition to the obvious issues surrounding version control systems, we had to educate ourselves about diffing algorithms. This involved researching papers from the literature and translating their formal algorithms into Python.

CHAPTER 20

Glossary

Diff
    Refers to the action where two files (or IPython notebooks in this case) are compared in order to determine how they differ.
Head
    Refers to the currently checked out branch.
IPYNB
    Short for IPython Notebook.
IPython Notebook
    A web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document.
Merge conflict
    A conflict which occurs when two users are editing the same cell in the IPython Notebook. The system will not know which changes to save unless this is stated by the user.
Merge
    Combining two Notebook files in such a way that the resulting file has the same organization as the two individual Notebooks.
Notebook
    Refers to the content visible in the web application, including inputs and outputs of the computations, explanatory text, mathematics, images, and rich media representations of objects.
Origin
    The remote repository from which the local copy of the repository was cloned.
Revision
    A file and its contents at a given point in the commit history.
Staging
    One of the three states that files in the working directory can be in: unmodified, modified, or staged. Staged files are included in the next commit.
Upstream
    A repository from which the user’s repository has been forked, but to which the user does not themselves have access. This is typically the repository maintained by the official maintainers of a project.
Version control
    Refers to the task of keeping the software system’s many versions and configurations well organized.
Versions
    Variant copies of a file and its contents at a given point in the commit history.
Working directory
    Refers to the directory containing the uncompressed, complete files as they currently are, including uncommitted modifications.
Before-file
    Refers to one of the files being diffed, namely the version of the IPython Notebook file before making any modifications.
After-file
    Refers to one of the files being diffed, namely the version of the IPython Notebook file after making some modifications.

CHAPTER 21

Authors

• Shurouq Abusalah
• Tavish Armstrong (leader)
• Marwa Malti
• Lina Nouh
• Boris Pipev
• Selena Sachdeva
• Richard Tang

CHAPTER 22

Indices and tables

• genindex
• modindex
• search
