Software Best Practices
Marco Mambelli – Engineering Week, 17 February 2020
Software
• A set of instructions, and its associated documentation, that tells a computer what to do or how to perform a task
• Any manuscript/artifact/product you write with the purpose of being used by machines and humans
2 2/17/20 Marco Mambelli | Software best practices
Outline
• General applicability, more in detail
  – Version control and Git
  – Documentation
• More specific to coding
  – Requirements
  – Design
    • Technology selection
    • OS Requirements
    • Software inputs
    • Software logs, metrics and accounting
  – Code development
  – Validation and testing
  – Releases
  – Deployment
  – Bug tracking
  – Change management
  – Critical services operation
[Comic: “Piled Higher and Deeper” by Jorge Cham, http://www.phdcomics.com]
Version Control System
• Preserves different versions of a document
• Helps merge different contributions
• Answers important questions about the documents
  – What changed?
  – Who changed it?
  – Why?
Centralized vs distributed VCS
Common version control systems
• SVN (Apache Subversion)
  – Newer system based on CVS
  – Includes atomic operations
  – Cheaper branch operations, slower comparative speed
  – Does not use a peer-to-peer model
  – Still contains bugs relating to renaming files and directories
  – Insufficient repository management commands
• Mercurial
  – Easier to learn than Git (also more similar to CVS/SVN)
  – Distributed model
  – No merging of two parents
  – Less power out of the box; written in Python, extension-based rather than scriptable
• Git
  – Very different from CVS/SVN
  – Cheap branch operations, dramatic increase in operation speed
  – Full history tree available offline
  – Distributed, peer-to-peer model
  – Not optimal for single developers
Git concepts – Local repository
• Snapshot identified by a unique ID (SHA-1 hash)
• Repository – init
• Staging – add
• Commit – commit (checkout)
• Tag – tag
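To make the "snapshot identified by a SHA-1 hash" idea concrete: Git stores each file's content as a blob object whose ID is the SHA-1 of a short header plus the content, and trees and commits are identified the same way. The sketch below recomputes a blob ID with Python's standard `hashlib`; the function name `git_blob_id` is illustrative, and it assumes a repository using Git's default SHA-1 object format.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git gives a file's content (a blob) in a SHA-1 repository.

    Git hashes a short header ("blob", a space, the size in bytes, a NUL byte)
    followed by the raw content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Identical content always gets the same ID, so Git stores it only once.
print(git_blob_id(b"test content\n"))
```

Running `echo 'test content' | git hash-object --stdin` in a shell should print the same ID the sketch computes, d670460b4b4aece5915caf5c68d12f560a9fe3e4.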
Git remotes – Remote repositories
• Remote
  – clone
  – remote add
  – push
  – fetch (pull)
[Diagram: working copy and server]
• Define a main repository!
Git resources
• Clients for the major platforms – Command line – GUI (Fork, GitHub desktop, GitKraken, Tower,…)
• Online hosting – Bitbucket – GitLab – GitHub
• Hosting at Fermilab – Git integrated with Redmine – GitLab instance
Do I need to put everything in GitHub?
• Word processors frequently have integrated version control
  – Track changes in MS Word
  – Version history in Google Docs
  – Recording and Displaying Changes in LibreOffice
• Git is not ideal for large binary files
  – File-sharing services have versioning systems
    • Version history in Dropbox
    • Versions in Box
    • Version History in OneDrive
  – Dedicated versioning apps, e.g. ForeverSave
  – OS X Time Machine and other backup solutions
Use Git for …
• Code or text-like documents*
• When you need to know
  – What changed?
  – Who changed it?
  – Why?
* Git provides LFS (Large File Storage) https://git-lfs.github.com/
Centralized workflow
• Single (remote) repo
• Single ordered flow
• Conflicts solved one at a time by the developer
Feature branching workflow
• Single (remote) repo
• Leverage branches:
  – master (releases)
  – Development
  – Features
  – Hotfixes
• Easier to enforce policies
https://nvie.com/posts/a-successful-git-branching-model/
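One way to enforce policies in this workflow is an automated check, run in CI or a server-side hook, that rejects branch names outside the agreed scheme. This is a minimal sketch; the `feature/`, `hotfix/` and `release/` prefixes and the function name are assumptions for illustration, not conventions taken from the linked article.

```python
import re
import sys

# Hypothetical policy: two long-lived branches plus prefixed short-lived ones.
LONG_LIVED = {"master", "develop"}
SHORT_LIVED = re.compile(r"(feature|hotfix|release)/[a-z0-9][a-z0-9._-]*")


def branch_name_allowed(name: str) -> bool:
    """Return True if the branch name matches the (assumed) team policy."""
    return name in LONG_LIVED or SHORT_LIVED.fullmatch(name) is not None


# Example: check the names passed on the command line, e.g. from a CI job.
if __name__ == "__main__":
    for branch in sys.argv[1:]:
        print(branch, "ok" if branch_name_allowed(branch) else "REJECTED")
```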
Fork and branch workflow
[Diagram: repositories R1 and R2]
• Multiple forked repos
• Leverage branches
• Feature branches in forked repos
• Squash
• Rebase
• Pull requests
• Even easier to enforce policies
• Restricted access
Final Git recommendations
• Write meaningful commit messages
  – First line is the summary
  – Enough detail to understand the changes
• Access to the repository based on software purpose
  – Least Privilege approach
  – Consider signing commits https://help.github.com/en/articles/signing-commits
• Public software should have a license
  – LICENSE (text file in the root of the repository)
  – BSD 3-clause, Apache 2.0
  – At Fermilab you can get help in picking and reviewing a license
    • Contact Aaron Sauers
    • https://cdcvs.fnal.gov/redmine/projects/scd-cst/wiki/Software_licensing
• A DOI, Digital Object Identifier, can facilitate citations
  – https://about.zenodo.org/
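Git can check the "meaningful commit messages" rule automatically through a `commit-msg` hook: Git runs it with the path of the file holding the proposed message and aborts the commit if the hook exits with a non-zero status. The sketch below assumes a team convention of a summary line of at most 50 characters followed by a blank line; those limits and all names are illustrative.

```python
#!/usr/bin/env python3
"""Sketch of a Git commit-msg hook: save as .git/hooks/commit-msg and make it executable."""
import sys
from typing import List

MAX_SUMMARY = 50  # assumed team convention, not something Git enforces


def check_commit_message(message: str) -> List[str]:
    """Return the problems found in a commit message (an empty list means it looks fine)."""
    # Drop the '#' comment lines Git puts in the editor template.
    lines = [line for line in message.splitlines() if not line.startswith("#")]
    if not lines or not lines[0].strip():
        return ["the first line (summary) is empty"]
    problems = []
    if len(lines[0]) > MAX_SUMMARY:
        problems.append(f"summary is longer than {MAX_SUMMARY} characters")
    if len(lines) > 1 and lines[1].strip():
        problems.append("leave a blank line between the summary and the details")
    return problems


def main(message_path: str) -> int:
    with open(message_path, encoding="utf-8") as handle:
        problems = check_commit_message(handle.read())
    for problem in problems:
        print(f"commit-msg: {problem}", file=sys.stderr)
    return 1 if problems else 0  # a non-zero exit status makes Git abort the commit


# Git runs the hook with one argument: the path of the file holding the proposed message.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```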
What should never go in Git?
• PASSWORDS!
• Any credential: SSH keys, certificates, …
• Private data or PII
  – IP addresses
  – Names, birth dates, SSN …
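A simple pre-commit scan can catch the most obvious leaks before they reach the repository. This sketch flags private-key headers and `password = ...` style assignments; the patterns and names are illustrative assumptions, it will miss many secrets and raise some false alarms, and a dedicated secret-scanning tool is the better choice for real projects.

```python
import re
import sys
from typing import List, Tuple

# Illustrative patterns only; dedicated secret scanners use much larger rule sets.
SECRET_PATTERNS = {
    "private key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "password assignment": re.compile(r"(?i)\b(password|passwd|pwd)\s*[:=]\s*\S+"),
}


def find_secrets(text: str) -> List[Tuple[int, str]]:
    """Return (line number, pattern name) for each line that looks like a secret."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits


def main(paths: List[str]) -> int:
    """Scan the given files (e.g. the staged files in a pre-commit hook); 1 means stop."""
    found = False
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, name in find_secrets(handle.read()):
                print(f"{path}:{number}: possible {name}", file=sys.stderr)
                found = True
    return 1 if found else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```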
Documentation
• Target audience – Developers (including your future self) – Users (operators/end users)
• For simple projects, just a README (README.md)
• Common documents in a complex project – Requirement documentation – Design documentation – SQA/Test documentation – Installation and operation documentation – User manual – API specs
Documentation resources
• For software or to share information
• Fermilab supports – Microsoft SharePoint – WordPress – Custom website on Apache (HTML, CSS, includes) – DocDb
• GitHub Pages – https://pages.github.com/ – Markdown
Outline
• Version control and Git
• Documentation
• Requirements
• Design
  – Technology selection
  – OS Requirements
  – Software inputs
  – Software logs, metrics and accounting
• Code development
• Validation and testing
• Releases
• Deployment
• Bug tracking
• Change management
• Critical services operation
Planning and Requirements
• Plan in advance
• Write requirements
  – Actors
  – Roles
  – The Major Inputs and Output
  – Behavioral requirements (use cases)
  – Constraints
• Example document
  – Outline
  – Link: https://drive.google.com/file/d/1Lk7ku0GcqZ5IJFO3pTEIM_R0DmG0aHvG/view?usp=sharing
Example document outline:
• Introduction
  – Purpose
  – Scope
  – Rationale
  – Terminology
• Overview
• Requirements
  – Actors
  – The Major Inputs and Output
  – Behavioral requirements (use cases)
  – Constraints
• Architectural Overview
  – Functional unit or Component block diagram
  – Physical unit block diagram
  – Deployment scenario
• Component Interfaces
• Protocols
• Discussion
  – Decisions and Choices
  – Rationale
  – Implications resulting from Choices
  – Resulting rules
  – Constraints imposed on other systems
• Testing considerations
Technology selection
• Is there a product providing similar functionality?
• Or one providing a component you need?
• If yes
  – How widely is it adopted?
  – Is it Open Source?
  – Does it have active community support?
  – Does it accept feature requests/contributions?
Technology selection: more questions
• Performance and scalability requirements
• Current and future platform requirements
• Long-term costs
  – Licensing and support fees
  – In-house support
• Industry trends (be aware but independent)
• Strengths and weaknesses of the developers in the project
• Team consensus on the technology
OS requirements
• These are good practices in general and requirements at Fermilab
• Recent and supported versions of the Operating System
  – Lab configuration management practices https://cd-docdb.fnal.gov/cgi-bin/sso/RetrieveFile?docid=4264&filename=ConfigMgmt.pdf&version=3
  – Listed baseline configurations https://fermipoint.fnal.gov/org/cs/pages/computer-security-documents---general-computing-environment.aspx
• Rare and well-documented exceptions
  – ServiceNow request https://fermi.service-now.com/navpage.do
  – Variance request process https://cd-docdb.fnal.gov/cgi-bin/sso/RetrieveFile?docid=4141&filename=Variance-Request-process-rc.docx&version=2
Software inputs (configuration)
• Avoid hardcoded inputs!
• Evaluate all inputs
  – How frequently does it change (install, all invocations, …)?
  – Who is providing it (expert, trusted, person…)?
• Possible input mechanisms
  – Configuration files
  – Command line (options and arguments)
  – Environment variables
  – Interactive inputs (GUI, prompt, …)
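When an input can come from several of these mechanisms, it helps to fix and document the precedence. A minimal sketch, assuming the common convention defaults < configuration file < environment variables < command line; the application name `myapp`, the `MYAPP_` prefix and the setting names are hypothetical.

```python
import argparse
import configparser
from typing import Dict, List, Mapping

# Hypothetical settings of an illustrative application called "myapp".
DEFAULTS = {"host": "localhost", "port": "8080", "log_level": "INFO"}
ENV_PREFIX = "MYAPP_"  # e.g. MYAPP_LOG_LEVEL overrides log_level


def load_settings(argv: List[str], environ: Mapping[str, str],
                  config_text: str = "") -> Dict[str, str]:
    """Merge inputs with increasing precedence: defaults, config file, environment, command line."""
    settings = dict(DEFAULTS)

    # 1. Configuration file (INI syntax), section [myapp]; unknown keys are ignored.
    config = configparser.ConfigParser()
    config.read_string(config_text)
    if config.has_section("myapp"):
        for key, value in config["myapp"].items():
            if key in DEFAULTS:
                settings[key] = value

    # 2. Environment variables.
    for key in DEFAULTS:
        if ENV_PREFIX + key.upper() in environ:
            settings[key] = environ[ENV_PREFIX + key.upper()]

    # 3. Command-line options such as --log-level DEBUG win over everything else.
    parser = argparse.ArgumentParser()
    for key in DEFAULTS:
        parser.add_argument("--" + key.replace("_", "-"), dest=key)
    for key, value in vars(parser.parse_args(argv)).items():
        if value is not None:
            settings[key] = value
    return settings
```

In a real program the call would look like `load_settings(sys.argv[1:], os.environ, config_file_text)`; passing the arguments and environment explicitly also makes the function easy to unit test.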
Software logs and metrics
Monitoring, accounting, troubleshooting and debugging
• Have a consistent structure
• Readable for humans
• Structured for machine parsing
• Configurable severity levels
• Distinguish between debug/info/warning/error/fatal messages
• Use an existing library (log4j, python logging, …)
• Well-structured tags
• Independent control for logging of different components
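Several of these points map directly onto Python's standard `logging` module: one formatter gives every message the same human-readable but machine-splittable structure, and one logger per component allows independent levels. A sketch, assuming the hypothetical component names `myapp.db` and `myapp.net` and a `key=value` layout chosen for illustration.

```python
import logging


def configure_logging(stream=None, level=logging.INFO) -> None:
    """Route every log record through one handler with a consistent, parseable format."""
    handler = logging.StreamHandler(stream)  # None means sys.stderr
    # Fixed field order and key=value tags: readable by people, easy to split by tools.
    handler.setFormatter(logging.Formatter(
        "%(asctime)s level=%(levelname)s component=%(name)s msg=%(message)s"))
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)


configure_logging()

# One logger per component (hypothetical names), each with its own verbosity.
db_log = logging.getLogger("myapp.db")
net_log = logging.getLogger("myapp.net")
db_log.setLevel(logging.DEBUG)     # verbose while debugging the database layer
net_log.setLevel(logging.WARNING)  # only problems from the network layer

db_log.debug("connection pool size=%d", 5)  # emitted
net_log.info("connected to %s", "server")   # suppressed: below WARNING
net_log.error("timeout after %d s", 30)     # emitted
```

Python's `logging` has no TRACE level, and its highest level is named CRITICAL (`logging.FATAL` is an alias for it).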
Log levels
• TRACE – finer-grained informational events than DEBUG
• DEBUG – fine-grained informational events that are most useful to debug an application
• INFO – informational messages that highlight the progress of the application at a coarse-grained level
• WARN – potentially harmful situations
• ERROR – error events that might still allow the application to continue running
• FATAL – very severe error events that will presumably lead the application to abort
Software development (coding)
• Provide clear coding guidelines
  – PEP 8, Google Python style guide, GNU C coding standards
• Enforce code documentation standards
  – Javadoc, Google Python docstrings
• Enforce code validation (linting)
  – lint, pylint, pycodestyle, shellcheck, jslint
  – Integrate in VCS workflow
• Use an IDE
  – vi+extensions, PyCharm, Visual Studio Code, Brackets
• Enforce reviews
  – Pair programming, peer reviews
  – Integrate in VCS workflow
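As an illustration of the Google Python docstring style mentioned above, here is a small hypothetical function documented with `Args`, `Returns`, `Raises` and an example section. The `>>>` line doubles as a doctest, so `python -m doctest` on the file can check that the documentation stays correct.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Compute the simple moving average of a sequence.

    Args:
        values: Numbers to average, in order.
        window: How many consecutive values go into each average; must be >= 1.

    Returns:
        A list of len(values) - window + 1 averages, or an empty list if
        there are fewer values than the window size.

    Raises:
        ValueError: If window is smaller than 1.

    Example:
        >>> moving_average([1, 2, 3, 4], 2)
        [1.5, 2.5, 3.5]
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```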
Software validation and testing
“The Art of Software Testing”, G. J. Myers
• Software should be tested!
• Requirement documentation is crucial
• Some classifications
  – Approach: static/dynamic/passive, exploratory, box
  – Level: unit, integration, system, operation
  https://en.wikipedia.org/wiki/Software_testing
• Consider Test Driven Development (TDD)
• Resources
  – Unit test libraries in most languages
  – Continuous Integration (CI) system provided by Fermilab https://cdcvs.fnal.gov/redmine/projects/ci
  – CI in GitLab and integrations in GitHub
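A minimal unit test with Python's built-in `unittest`; under TDD these tests would be written before the function. The function `normalize_hostname` and its expected behavior are hypothetical, chosen only to show one test per requirement.

```python
import unittest


def normalize_hostname(name: str) -> str:
    """Hypothetical unit under test: lower-case a host name and drop a trailing dot."""
    name = name.strip().lower()
    if not name:
        raise ValueError("empty host name")
    return name[:-1] if name.endswith(".") else name


class NormalizeHostnameTest(unittest.TestCase):
    def test_lowercases(self):
        self.assertEqual(normalize_hostname("Example.ORG"), "example.org")

    def test_strips_trailing_dot(self):
        self.assertEqual(normalize_hostname("example.org."), "example.org")

    def test_rejects_empty(self):
        with self.assertRaises(ValueError):
            normalize_hostname("   ")
```

Saved in a file whose name starts with `test`, these tests are picked up by `python -m unittest` discovery, which is also an easy command for a CI job to run.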
Software releases
• Document the release process
• A release should be tagged
• Release notes should be easily accessible
• Rare emergency patches in a well-known location
• Have a compatibility statement
• Documented announcement process
• Identify release managers (least privilege)
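If the compatibility statement is based on Semantic Versioning (an assumption; the slide does not prescribe a scheme), release tags can be checked mechanically. A sketch, assuming tags of the form `vMAJOR.MINOR.PATCH`; the names are illustrative.

```python
import re
from typing import NamedTuple

TAG_PATTERN = re.compile(r"v(\d+)\.(\d+)\.(\d+)")  # assumed tag format, e.g. v3.6.2


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_tag(tag: str) -> Version:
    """Turn a release tag such as 'v3.6.2' into a Version that compares numerically."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a release tag: {tag!r}")
    return Version(*(int(part) for part in match.groups()))


def is_compatible_upgrade(installed: str, candidate: str) -> bool:
    """Assuming Semantic Versioning: same major version and not older means compatible."""
    old, new = parse_tag(installed), parse_tag(candidate)
    return new.major == old.major and new >= old
```

Comparing parsed tuples rather than strings avoids ordering `v3.10.0` before `v3.9.0`. The sketch ignores SemVer's special rules for `0.y.z` versions and pre-release suffixes.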
Software deployment
• Documented deployment procedure
• Choose deployment models
  – Cloning the repo (branch/tag)
  – Archive (TAR, ZIP, …)
  – RPM or APT package
  – Microservice (aka Container)
  – PIP package
  – JAR file
  – Spack
• Ask questions to guide the selection
  – System/user install?
  – Relocatable?
  – One or more installations per host?
Software deployment considerations
• Credentials are not part of the deployment
  – Retrieved separately
  – Stored securely
• Take special care with configuration
  – Separate your files from admin-modified files
  – Secure it
• Test each release
  – Development testbed
  – Integration testbed (ITB)
• Least privilege/least exposure
  – Avoid privileged operations (sudo, admin) if possible
  – Deploy within a firewall
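A sketch of "retrieved separately, stored securely" on a POSIX system: the secret lives in a file deployed outside the code, and the program refuses to use it if the file permissions are too open. The function name is hypothetical, and the permission rule (no access for group or others) is an assumed policy.

```python
import os
import stat


def read_credential(path: str) -> str:
    """Read a secret from a file that is deployed separately from the code (POSIX).

    Refuses files that group or other users could read or write, as a basic
    check that the secret is stored with restrictive permissions.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        raise PermissionError(f"{path} is accessible by group/others (mode {mode:o})")
    with open(path, encoding="utf-8") as handle:
        return handle.read().strip()
```

The path itself would come from configuration, for example a hypothetical `MYAPP_TOKEN_FILE` environment variable, so that neither the secret nor its location is hardcoded.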
Bug tracking
• Essential to keep track of defects
• Use it consistently
• Some examples
  – Jira
  – Bugzilla
  – Redmine (at Fermilab) https://cdcvs.fnal.gov/redmine/
  – GitHub https://help.github.com/en/github/managing-your-work-on-github/about-issues
  – GitLab
• Used by the whole team
• Consistent feature/bug life cycle
  – new, in progress, feedback, resolved, …
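Trackers usually let administrators configure which status changes are allowed; the same idea in plain Python is a small state machine. This is a hypothetical sketch using only the states named above; the allowed transitions are assumptions, and real workflows have more states.

```python
from enum import Enum
from typing import Dict, List, Set


class Status(Enum):
    NEW = "new"
    IN_PROGRESS = "in progress"
    FEEDBACK = "feedback"
    RESOLVED = "resolved"


# Hypothetical workflow: which status may follow which.
ALLOWED: Dict[Status, Set[Status]] = {
    Status.NEW: {Status.IN_PROGRESS},
    Status.IN_PROGRESS: {Status.FEEDBACK, Status.RESOLVED},
    Status.FEEDBACK: {Status.IN_PROGRESS, Status.RESOLVED},
    Status.RESOLVED: {Status.IN_PROGRESS},  # reopening a resolved issue
}


class Issue:
    def __init__(self, title: str) -> None:
        self.title = title
        self.status = Status.NEW
        self.history: List[Status] = [Status.NEW]

    def move_to(self, new_status: Status) -> None:
        """Change the status, rejecting transitions the team has not agreed on."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} is not allowed")
        self.status = new_status
        self.history.append(new_status)
```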
Databases
• Some pointers about design (not covered here)
  – Books and references https://open.umn.edu/opentextbooks/textbooks/database-design-2nd-edition https://docs.microsoft.com/en-us/sql/sql-server/?redirectedfrom=MSDN&view=sql-server-ver15
  – Common mistakes https://www.red-gate.com/simple-talk/sql/database-administration/ten-common-database-design-mistakes/
• Role-based access (admin, users – Least Privilege)
• Periodic backups
• Perform a restore test!
• Separate development, integration and production instances (with the same environment)
• Review by an expert DBA (ER diagram, queries, …)
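The "perform a restore test" advice can be automated. A self-contained sketch using SQLite through Python's standard `sqlite3` module, chosen only because it needs no server; with a production DBMS you would use its own backup and restore tools. Function names and the row-count check are illustrative.

```python
import sqlite3


def backup_database(source: sqlite3.Connection, target_path: str) -> None:
    """Copy a live SQLite database to target_path with SQLite's online backup API."""
    target = sqlite3.connect(target_path)
    try:
        source.backup(target)  # available since Python 3.7
    finally:
        target.close()


def restore_test(source: sqlite3.Connection, backup_path: str, table: str) -> bool:
    """Open the backup as a restore would and compare it with the source.

    Checks SQLite's integrity report and the row count of one table; `table`
    must come from trusted code because it is interpolated into the SQL.
    """
    restored = sqlite3.connect(backup_path)
    try:
        if restored.execute("PRAGMA integrity_check").fetchone()[0] != "ok":
            return False
        query = f"SELECT COUNT(*) FROM {table}"
        return (restored.execute(query).fetchone()[0]
                == source.execute(query).fetchone()[0])
    finally:
        restored.close()
```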
More Databases
• Use database design tools – DbSchema https://dbschema.com/
• Use database encapsulation
  – Isolates data-related code
  – Clear API for other project developers
  – Protects database performance from rogue clients
[Diagram: Application → DB Encapsulation Layer → DB]
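A sketch of an encapsulation layer: callers use a small Python API, while the SQL, parameter binding and a hard limit on result size stay inside one class. The class, table and `MAX_ROWS` value are hypothetical, and SQLite stands in for the real database.

```python
import sqlite3
from typing import List, Tuple


class RunCatalog:
    """Hypothetical encapsulation layer: the only code allowed to touch the runs table."""

    MAX_ROWS = 1000  # hard cap that protects the database from greedy clients

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS runs (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

    def add_run(self, name: str) -> int:
        """Insert a run and return its id, using a parameterized query."""
        with self._conn:  # commits on success, rolls back on an exception
            cursor = self._conn.execute("INSERT INTO runs (name) VALUES (?)", (name,))
        return cursor.lastrowid

    def recent_runs(self, limit: int = 10) -> List[Tuple[int, str]]:
        """Return the newest runs, never more than MAX_ROWS whatever the caller asks."""
        limit = max(1, min(limit, self.MAX_ROWS))
        return self._conn.execute(
            "SELECT id, name FROM runs ORDER BY id DESC LIMIT ?", (limit,)).fetchall()

    def close(self) -> None:
        self._conn.close()
```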
Critical services operations
Additional recommendations for Critical Services https://en.wikipedia.org/wiki/Critical_system
• Identify the source of all actions (how, who)
• Minimize access permissions (Least privilege)
• Controlled access
  – SSO (Fermilab or other organizations)
  – X509 certificates
  – OAuth, tokens
  – User/password (passwords stored encrypted or hashed, never shared!)
• Traceable modifications of metrics
• Record retention policy for access logs
  – Safe storage and backup
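For the user/password option, the widely recommended practice is to store a salted, deliberately slow hash rather than the password itself, so that a leaked table does not reveal passwords. A sketch with Python's standard `hashlib.pbkdf2_hmac` and `hmac.compare_digest`; the iteration count is an assumption to be tuned to current guidance and hardware.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 600_000  # assumed work factor; tune for your hardware and policy


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest) to store instead of the password itself."""
    salt = os.urandom(16)  # a fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```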
Change management
• Should be considered for the build/release process in critical systems
• Goals
  – Minimize the impact of change-related incidents and problems
  – Reduce the number of backed-out and failed changes
• The “Fermilab Change Management” document linked below describes the process as adopted by the Computing Sector
https://en.wikipedia.org/wiki/Critical_system
http://cd-docdb.fnal.gov/cgi-bin/RetrieveFile?docid=3529&filename=Fermilab%20Change%20Management%20Policy.docx&version=3
https://en.wikipedia.org/wiki/Change_management_(ITSM)
We have seen recommendations on …
• Version control and Git
• Documentation
• Requirements
• Design
  – Technology selection
  – OS Requirements
  – Software inputs
  – Software logs, metrics and accounting
• Code development
• Validation and testing
• Releases
• Deployment
• Bug tracking
• Change management
• Critical services operation
General ideas
• Planning
• Simplicity
• Testing
• Coherence
• Reviews
Thank you
• Sources and references
  – Software Development and Deployment Best Practices
    • Multiple authors, Fermilab SCD https://docs.google.com/document/d/1c9ofaj9dBFFjXfqsMlV-_HogL8vONJUOSJBojnz7lHI/edit?usp=sharing
  – VCS documents and tutorials
    • https://help.github.com/
    • https://www.atlassian.com/git/tutorials/
    • https://git-scm.com/
    • https://swcarpentry.github.io/git-novice/
Hands on
• Git hands on
  – https://docs.google.com/document/d/1xsqjx7l5aq6lZy2w4brnnRKjvPgE9Z8XV4E_Z1ElTbk/edit?usp=sharing
  – https://tinyurl.com/s3e6zbu