Software Best Practices

Marco Mambelli – [email protected]
Engineering Week, 17 February 2020

Software

• A set of instructions, with its associated documentation, that tells a computer what to do or how to perform a task

• Any manuscript, artifact or product you write that is intended to be used by machines and humans

2/17/20 Marco Mambelli | Software best practices

Outline

• General applicability, more in detail
  – Version control and Git
  – Documentation
• More specific to coding
  – Requirements
  – Design
    • Technology selection
    • OS Requirements
    • Software inputs
    • Software logs, metrics and accounting
  – Code development
  – Validation and testing
  – Releases
  – Deployment
  – Bug tracking
  – Change management
  – Critical services operation

“Piled Higher and Deeper” by Jorge Cham, http://www.phdcomics.com

Version Control System

• Preserves different versions of a document
• Helps merge different contributions
• Answers important questions about the documents
  – What changed?
  – Who changed it?
  – Why?

Centralized vs distributed VCS

Common RCS

• SVN (Apache Subversion)
  – Newer system based on CVS
  – Includes atomic operations
  – Cheaper branch operations, slower comparative speed
  – Does not use peer-to-peer model
  – Still contains bugs relating to renaming files and directories
  – Insufficient repository management commands
• Mercurial
  – Easier to learn than Git (also more similar to CVS/SVN)
  – Distributed model
  – No merging of two parents
  – Less out-of-the-box power; written in Python; extension-based rather than scriptable
• Git
  – Very different from CVS/SVN
  – Cheap branch operations, dramatic increase in operation speed
  – Full history tree available offline
  – Distributed, peer-to-peer model
  – Not optimal for single developers

Git concepts – Local repository

• Snapshot with GUID (SHA1 hash)

• Repository – init

• Staging – add

– commit (checkout)

• Tag – tag
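The "snapshot with SHA1 hash" idea can be illustrated with a minimal Python sketch. It computes the object ID that Git (in its default SHA-1 object format) assigns to a file's content, a "blob": the hash of a short header followed by the raw bytes. The function name `git_blob_id` is made up for this illustration; commits and trees are hashed in the same way with a different header.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a header ("blob <size>" plus a NUL byte) followed by the data.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Same content always yields the same ID, whoever computes it.
    print(git_blob_id(b"hello world\n"))
```

Because the ID depends only on the content, two repositories holding the same file agree on its ID without talking to each other, which is what makes the distributed model workable.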

Git remotes – Remote repositories

• Remote
  – clone
  – remote add
  – push
  – fetch (pull)

[Diagram: working copy and server]

• Define a main repository!

Git resources

• Clients for the major platforms
  – Command line
  – GUI (Fork, GitHub Desktop, GitKraken, Tower, …)
• Online hosting
  – GitLab
  – GitHub
• Hosting at Fermilab
  – Git integrated with
  – GitLab instance

Do I need to put everything in GitHub?

• Word processors frequently have integrated version control
  – Track Changes in MS Word
  – Version history in Google Docs
  – Recording and Displaying Changes in LibreOffice
• Git is not ideal for large binary files
  – File-sharing services have versioning systems
    • Version history in Dropbox
    • Versions in Box
    • Version history in OneDrive
  – Dedicated versioning apps, e.g. ForeverSave
  – OS X Time Machine and other backup solutions

Use Git for …

• Code or text-like documents*
• When you need to know
  – What changed?
  – Who changed it?
  – Why?

* Large files can be handled with Git LFS (Large File Storage), a Git extension: https://git-lfs.github.com/

Centralized workflow

• Single (remote) repo
• Single ordered flow
• Conflicts are resolved one at a time by the developer

Feature branching workflow

• Single (remote) repo
• Leverage branches:
  – master (releases)
  – Development
  – Features
  – Hotfixes
• Easier to enforce policies

https://nvie.com/posts/a-successful-git-branching-model/

Fork and branch workflow

• Multiple forked repos
• Leverage branches
• Feature branches in forked repos
• Squash
• Rebase
• Pull requests
• Even easier to enforce policies
• Restricted access

[Diagram: forked repositories R1 and R2]

Final Git recommendations

• Write meaningful commit messages
  – First line is the summary
  – Enough detail to understand the changes
• Access to the repository based on software purpose
  – Least Privilege approach
  – Consider signing commits https://help.github.com/en/articles/signing-commits
• Public software should have a license
  – LICENSE (text file in the root of the repository)
  – BSD 3-clause, Apache 2.0
  – At Fermilab you can get help in picking and reviewing a license
    • Contact Aaron Sauers
    • https://cdcvs.fnal.gov/redmine/projects/scd-cst/wiki/Software_licensing
• A DOI (Digital Object Identifier) can facilitate citations
  – https://about.zenodo.org/
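The commit message recommendation can be checked mechanically. The following is a minimal sketch, not a standard tool: the function name `check_commit_message` and the 72-character limit are assumptions chosen for illustration (the slides only ask for a summary on the first line and enough detail after it).

```python
from typing import List


def check_commit_message(message: str, max_summary: int = 72) -> List[str]:
    """Return a list of problems found in a commit message (empty if none)."""
    problems = []
    lines = message.rstrip("\n").split("\n")
    summary = lines[0].strip()
    if not summary:
        problems.append("summary line is empty")
    elif len(summary) > max_summary:
        # The limit is an assumed convention, not something Git enforces.
        problems.append(f"summary line exceeds {max_summary} characters")
    # By common convention a blank line separates the summary from the body.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    return problems


if __name__ == "__main__":
    print(check_commit_message("Fix log rotation\n\nRotate at the size limit.\n"))
```

A check like this could be wired into a commit-msg hook or a CI job so that the policy is enforced rather than just documented.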

What should never go in Git?

• PASSWORDS!
• Any credential: SSH keys, certificates, …
• Private data or PII
  – IP addresses
  – Names, birth dates, SSN, …
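A simple scan before committing can catch some of these items. The sketch below is illustrative only: the three regular expressions and the name `find_secrets` are assumptions, and dedicated secret scanners use far larger rule sets. It flags lines that look like private key headers, password assignments, or US Social Security numbers.

```python
import re

# Illustrative patterns only; they will miss many real secrets.
SECRET_PATTERNS = {
    "private key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "password assignment": re.compile(
        r"(?i)(password|passwd|secret)\w*\s*[:=]\s*\S+"
    ),
    "US SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_secrets(text: str) -> list:
    """Return (line_number, rule_name) pairs for lines that look sensitive."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits


if __name__ == "__main__":
    sample = "user = alice\ndb_password = hunter2\n"
    for line_number, rule in find_secrets(sample):
        print(f"line {line_number}: possible {rule}")
```

Once a secret has been pushed it should be considered compromised and rotated, since removing it from the history does not remove copies already cloned.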

Documentation

• Target audience
  – Developers (including your future self)
  – Users (operators/end users)
• A simple project may need just a README (README.md)
• Common documents in a complex project
  – Requirement documentation
  – Design documentation
  – SQA/Test documentation
  – Installation and operation documentation
  – User manual
  – API specs

Documentation resources

• For software or to share information
• Fermilab supports
  – Microsoft SharePoint
  – WordPress
  – Custom website on Apache (HTML, CSS, includes)
  – DocDb
• GitHub Pages
  – https://pages.github.com/
  – Markdown

Outline

• Version control and Git
• Documentation
• Requirements
• Design
  – Technology selection
  – OS Requirements
  – Software inputs
  – Software logs, metrics and accounting
• Code development
• Validation and testing
• Releases
• Deployment
• Bug tracking
• Change management
• Critical services operation

Planning and Requirements

• Plan in advance
• Write requirements
  – Actors
  – Roles
  – The Major Inputs and Output
  – Behavioral requirements (use cases)
  – Constraints
• Example document
  – Link: https://drive.google.com/file/d/1Lk7ku0GcqZ5IJFO3pTEIM_R0DmG0aHvG/view?usp=sharing
  – Outline:

    • Introduction
      – Purpose
      – Scope
      – Rationale
      – Terminology
    • Overview
    • Requirements
      – Actors
      – The Major Inputs and Output
      – Behavioral requirements (use cases)
      – Constraints
    • Architectural Overview
      – Functional unit or Component block diagram
      – Physical unit block diagram
      – Deployment scenario
    • Component Interfaces
    • Protocols
    • Discussion
      – Decisions and Choices
      – Rationale
      – Implications resulting from Choices
      – Resulting rules
      – Constraints imposed on other systems
    • Testing considerations

Technology selection

• Is there a product providing similar functionality?
• Or a component you need?
• If yes
  – How widely is it adopted?
  – Open source?
  – With active community support?
  – Accepts feature requests/contributions?

Technology selection: more questions

• Performance and scalability requirements
• Current and future platform requirements
• Long-term costs
  – Licensing and support fees
  – In-house support
• Industry trends (be aware but independent)
• Strengths and weaknesses of the developers in the project
• Team consensus on the technology

OS requirements

• These are good practices in general and requirements at Fermilab
• Recent and supported OS versions
  – Lab configuration management practices
    https://cd-docdb.fnal.gov/cgi-bin/sso/RetrieveFile?docid=4264&filename=ConfigMgmt.pdf&version=3
  – Listed baseline configurations
    https://fermipoint.fnal.gov/org/cs/pages/computer-security-documents---general-computing-environment.aspx
• Rare and well-documented exceptions
  – ServiceNow request
    https://cd-docdb.fnal.gov/cgi-bin/sso/RetrieveFile?docid=4141&filename=Variance-Request-process-rc.docx&version=2
    https://fermi.service-now.com/navpage.do

Software inputs (configuration)

• Avoid hardcoded inputs!
• Evaluate all inputs
  – How frequently does it change (install, all invocations, …)?
  – Who is providing it (expert, trusted person, …)?
• Possible input mechanisms
  – Configuration files
  – Command line (options and arguments)
  – Environment variables
  – Interactive inputs (GUI, prompt, …)
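The input mechanisms above are often layered by how frequently a value changes. The sketch below shows one possible precedence (defaults, then configuration file, then environment, then command line) using only the Python standard library. The names `load_settings`, the `[server]` section, and the `MYAPP_*` variables are hypothetical, and the configuration is passed as text to keep the example self-contained; a real program would read it from a file.

```python
import argparse
import configparser
import os


def load_settings(argv=None, config_text="", environ=None):
    """Resolve settings: defaults < config file < environment < command line."""
    environ = os.environ if environ is None else environ
    settings = {"host": "localhost", "port": 8080}  # built-in defaults

    # Configuration file: values that change rarely, e.g. at install time.
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if parser.has_section("server"):
        if parser.has_option("server", "host"):
            settings["host"] = parser.get("server", "host")
        if parser.has_option("server", "port"):
            settings["port"] = parser.getint("server", "port")

    # Environment variables: per-session overrides (MYAPP_* is a made-up prefix).
    if "MYAPP_HOST" in environ:
        settings["host"] = environ["MYAPP_HOST"]
    if "MYAPP_PORT" in environ:
        settings["port"] = int(environ["MYAPP_PORT"])

    # Command line: per-invocation overrides.
    cli = argparse.ArgumentParser()
    cli.add_argument("--host")
    cli.add_argument("--port", type=int)
    args = cli.parse_args(argv)
    if args.host is not None:
        settings["host"] = args.host
    if args.port is not None:
        settings["port"] = args.port
    return settings


if __name__ == "__main__":
    print(load_settings())
```

Keeping the precedence in one function makes it easy to document and to test, and nothing in the rest of the code needs a hardcoded value.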

Software logs and metrics

Monitoring, accounting, troubleshooting and debugging

• Have a consistent structure
• Readable for humans
• Structured for machine parsing
• Configurable severity levels
• Distinguish between debug/info/warning/error/fatal messages
• Use an existing logging library (Python logging, …)
• Well-structured tags
• Independent control for logging of different components
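Several of these points map directly onto Python's standard `logging` module. The following is a minimal sketch under stated assumptions: the application name `myapp`, the component names, and the field order of the format string are all invented for illustration.

```python
import io
import logging


def configure_logging(stream, component_levels=None):
    """Send "myapp" logs to stream and set a severity level per component."""
    handler = logging.StreamHandler(stream)
    # Fixed field order: readable for humans, easy to split for machines.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    app_logger = logging.getLogger("myapp")  # hypothetical application name
    app_logger.handlers = [handler]
    app_logger.setLevel(logging.INFO)  # default severity threshold
    app_logger.propagate = False
    # Independent control: each component logger can override the default.
    for name, level in (component_levels or {}).items():
        logging.getLogger(name).setLevel(level)


def demo():
    """Log from two components and return the captured text."""
    buffer = io.StringIO()
    configure_logging(buffer, {"myapp.db": logging.DEBUG})
    logging.getLogger("myapp.db").debug("query took %d ms", 12)
    logging.getLogger("myapp.web").debug("not shown at INFO level")
    logging.getLogger("myapp.web").warning("slow response")
    return buffer.getvalue()


if __name__ == "__main__":
    print(demo(), end="")
```

Here the database component logs at DEBUG while the web component keeps the INFO default, so verbose output can be turned on for one part of the system without flooding the rest.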

Log levels

• TRACE: finer-grained informational events than DEBUG
• DEBUG: fine-grained informational events that are most useful to debug an application
• INFO: informational messages that highlight the progress of the application at a coarse-grained level
• WARN: potentially harmful situations
• ERROR: error events that might still allow the application to continue running
• FATAL: very severe error events that will presumably lead the application to abort

Software development (coding)

• Provide clear coding guidelines
  – PEP8, Google Python style guide, GNU coding standards
• Enforce code documentation and standards
  – Javadoc, Google Python docstrings
• Enforce code validation (linting)
  – lint, pylint, pycodestyle, shellcheck, jslint
  – Integrate in VCS workflow
• Use an IDE
  – vi+extensions, PyCharm, Brackets
• Enforce reviews
  – Pair programming, peer reviews
  – Integrate in VCS workflow
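As a small illustration of the documentation guideline, here is a function written to PEP 8 layout with a Google-style docstring (the `Args`, `Returns` and `Raises` sections). The function itself is a made-up example; the point is the shape that tools and reviewers can then check for.

```python
def mean(values):
    """Compute the arithmetic mean of a sequence of numbers.

    Args:
        values: A non-empty sequence of ints or floats.

    Returns:
        The mean of the values as a float.

    Raises:
        ValueError: If values is empty.
    """
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


if __name__ == "__main__":
    print(mean([1, 2, 3]))
```

Linters such as pylint or pycodestyle can then be run on every commit or pull request, so that style discussions do not take up review time.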

Software validation and testing

• Software should be tested!
• Requirement documentation is crucial
• Some classifications
  – Approach: static/dynamic/passive, exploratory, box
  – Level: unit, integration, system, operation
  https://en.wikipedia.org/wiki/Software_testing
• Consider Test Driven Development (TDD)
• Resources
  – Unit test libraries in most languages
  – Continuous Integration system (CI) provided by Fermilab https://cdcvs.fnal.gov/redmine/projects/ci
  – CI in GitLab and integrations in GitHub

Reference: G.J. Myers, The Art of Software Testing
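A unit test in the TDD spirit can be sketched with Python's built-in `unittest` library. In TDD the three test methods would be written first, fail, and then drive the implementation of the function. The function `parse_level` and its level numbers are invented for this example.

```python
import unittest


def parse_level(name):
    """Map a severity name to a number (written after the tests below)."""
    levels = {"debug": 10, "info": 20, "warning": 30, "error": 40}
    try:
        return levels[name.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown level: {name!r}") from None


class ParseLevelTest(unittest.TestCase):
    def test_known_level(self):
        self.assertEqual(parse_level("INFO"), 20)

    def test_whitespace_is_ignored(self):
        self.assertEqual(parse_level(" error "), 40)

    def test_unknown_level_raises(self):
        with self.assertRaises(ValueError):
            parse_level("verbose")


def run_tests():
    """Run the suite programmatically and report whether it passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseLevelTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    raise SystemExit(0 if run_tests() else 1)
```

The exit status makes the script usable as a CI step: a failing test fails the build.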

Software releases

• Document the release process
• A release should be tagged
• Release notes should be easily accessible
• Rare emergency patches in a well-known location
• Have a compatibility statement
• Documented announcement process
• Identify release managers (least privilege)
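A compatibility statement is easier to honor when release tags follow a fixed scheme. The sketch below assumes tags of the form `vMAJOR.MINOR.PATCH` and the rule "same major version means compatible"; both are assumptions made for illustration (the slides do not prescribe a versioning scheme), and the function names are hypothetical.

```python
import re

# Assumed tag format: v<major>.<minor>.<patch>, e.g. v1.4.2
TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag):
    """Split a release tag into a (major, minor, patch) tuple of ints."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a release tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


def is_compatible(installed, candidate):
    """True if candidate has the same major version and is not older."""
    old, new = parse_tag(installed), parse_tag(candidate)
    return new[0] == old[0] and new >= old


if __name__ == "__main__":
    print(is_compatible("v1.4.2", "v1.5.0"))
```

Whatever rule a project chooses, writing it down as code and as text in the release notes keeps the two from drifting apart.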

Software deployment

• Documented deployment procedure
• Choose deployment models
  – Cloning the repo (branch/tag)
  – Archive (TAR, ZIP, …)
  – RPM or APT package
  – Microservice (aka Container)
  – PIP package
  – JAR file
  – Spack
• Ask questions to guide the selection
  – System/user install?
  – Relocatable?
  – One or more installations per host?

Software deployment considerations

• Credentials are not part of the deployment
  – Retrieved separately
  – Stored securely
• Take special care with configuration
  – Separate the files you ship from admin-modified files
  – Secure it
• Test each release
  – Development testbed
  – Integration testbed (ITB)
• Least privilege/least exposure
  – Avoid privileged operations (sudo, admin) if possible
  – Deploy within a firewall
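One way to keep credentials out of the deployment is to read them at run time from a file that the operator installs separately. The sketch below assumes a POSIX system, where file permission bits are meaningful; the function name `read_credential` is hypothetical. It refuses to use a secret that group or other users could access.

```python
import os
import stat


def read_credential(path):
    """Read a secret from a file, refusing files open to group or others."""
    mode = os.stat(path).st_mode
    # POSIX assumption: any group/other permission bit is too permissive.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{path} must be accessible by its owner only")
    with open(path, encoding="utf-8") as handle:
        return handle.read().strip()


if __name__ == "__main__":
    import sys

    # Usage: python read_credential.py /path/to/secret (path chosen by the admin)
    secret = read_credential(sys.argv[1])
    print(f"loaded a credential of {len(secret)} characters")
```

The path itself is an ordinary configuration input, so the package can be public while the secret stays on the host.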

Bug tracking

• Essential to keep track of defects
• Use it consistently
• Some examples
  – Bugzilla
  – Redmine (at Fermilab) https://cdcvs.fnal.gov/redmine/
  – GitHub https://help.github.com/en/github/managing-your-work-on-github/about-issues
  – GitLab
• Used by the whole team
• Consistent feature/bug life cycle
  – new, in progress, feedback, resolved, …

Databases

• Some pointers about design (not covered here)
  – Books
    https://open.umn.edu/opentextbooks/textbooks/database-design-2nd-edition
    https://docs.microsoft.com/en-us/sql/sql-server/?redirectedfrom=MSDN&view=sql-server-ver15
  – Common mistakes
    https://www.red-gate.com/simple-talk/sql/database-administration/ten-common-database-design-mistakes/
• Role-based access (admin, users – Least Privilege)
• Periodic backups
• Perform a restore test!
• Development, integration and production instances (with the same environment)
• Review from an expert DBA (ER diagram, queries, …)
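The "perform a restore test" advice can be sketched with Python's built-in `sqlite3` module, which exposes SQLite's online backup API. This is only an illustration on an in-memory database; the `runs` table and the function names are made up, and a production database would use its own backup tooling.

```python
import sqlite3


def make_sample_db():
    """Create a small in-memory database (the 'runs' table is made up)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, label TEXT)")
    db.executemany("INSERT INTO runs (label) VALUES (?)", [("a",), ("b",)])
    db.commit()
    return db


def backup_and_verify(source):
    """Copy source into a fresh database and check that the copy is usable."""
    restored = sqlite3.connect(":memory:")
    source.backup(restored)  # SQLite online backup API
    # Restore test: the copy must pass an integrity check and hold the
    # same number of rows as the original.
    intact = restored.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
    count = "SELECT COUNT(*) FROM runs"
    same_rows = (source.execute(count).fetchone()
                 == restored.execute(count).fetchone())
    return restored, intact and same_rows


if __name__ == "__main__":
    _, ok = backup_and_verify(make_sample_db())
    print("restore test passed" if ok else "restore test FAILED")
```

A backup that has never been restored is only a hope; running a check like this on a schedule turns it into evidence.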

More Databases

• Use database design tools
  – DbSchema https://dbschema.com/
• Use database encapsulation
  – Isolates data-related code
  – Clear API for other project developers
  – Protects database performance from rogue clients

[Diagram: Application → DB Encapsulation Layer → DB]
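A database encapsulation layer can be as small as one class that owns all SQL for a given kind of data. The sketch below uses `sqlite3` from the standard library; the class `UserStore`, its table, and the result limit of 1000 rows are hypothetical choices made for illustration.

```python
import sqlite3


class UserStore:
    """Encapsulation layer: the only code that issues SQL for users."""

    def __init__(self, connection):
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS users ("
            "name TEXT PRIMARY KEY, email TEXT NOT NULL)"
        )

    def add_user(self, name, email):
        # Parameterized queries keep caller input out of the SQL text.
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT INTO users (name, email) VALUES (?, ?)", (name, email)
            )

    def get_email(self, name):
        row = self._db.execute(
            "SELECT email FROM users WHERE name = ?", (name,)
        ).fetchone()
        return None if row is None else row[0]

    def list_users(self, limit=100):
        # A bounded result size shields the database from runaway clients.
        limit = max(1, min(int(limit), 1000))
        rows = self._db.execute(
            "SELECT name FROM users ORDER BY name LIMIT ?", (limit,)
        )
        return [name for (name,) in rows]


if __name__ == "__main__":
    store = UserStore(sqlite3.connect(":memory:"))
    store.add_user("ada", "ada@example.org")
    print(store.list_users())
```

Application code calls `add_user` or `list_users` and never sees a query, so the schema or even the database engine can change behind the same API.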

Critical services operations

Additional recommendations for Critical Services https://en.wikipedia.org/wiki/Critical_system

• Identify the source of all actions (how, who)
• Minimize access permissions (Least privilege)
• Controlled access
  – SSO (Fermilab or other organizations)
  – X509 certificates
  – OAuth, tokens
  – User/password (stored encrypted, no sharing passwords!)
• Traceable modifications of metrics
• Record retention policy for access logs
  – safe storage and backup

Change management

• Should be considered for the build/release process in critical systems
• Goals
  – Minimize the impact of change-related incidents and problems
  – Reduce the number of backed-out and failed changes
• The “Fermilab Change Management” document linked below describes the process as adopted by the Computing Sector

https://en.wikipedia.org/wiki/Critical_system
http://cd-docdb.fnal.gov/cgi-bin/RetrieveFile?docid=3529&filename=Fermilab%20Change%20Management%20Policy.docx&version=3
https://en.wikipedia.org/wiki/Change_management_(ITSM)

We have seen recommendations on …

• Version control and Git
• Documentation
• Requirements
• Design
  – Technology selection
  – OS Requirements
  – Software inputs
  – Software logs, metrics and accounting
• Code development
• Validation and testing
• Releases
• Deployment
• Bug tracking
• Change management
• Critical services operation

General ideas

• Planning
• Simplicity
• Testing
• Coherence
• Reviews

[email protected]

Thank you

• Sources and references
  – Software Development and Deployment Best Practices
    • Multiple authors, Fermilab SCD
      https://docs.google.com/document/d/1c9ofaj9dBFFjXfqsMlV-_HogL8vONJUOSJBojnz7lHI/edit?usp=sharing
  – VCS documents and tutorials
    • https://help.github.com/
    • https://www.atlassian.com/git/tutorials/
    • https://git-scm.com/
    • https://swcarpentry.github.io/git-novice/

Hands on

• Git hands on
  – https://docs.google.com/document/d/1xsqjx7l5aq6lZy2w4brnnRKjvPgE9Z8XV4E_Z1ElTbk/edit?usp=sharing
  – https://tinyurl.com/s3e6zbu
