Infrastructure and Performance

An Islandoracon Workshop

Instructors: Gavin Morris & Luke Taylor

Instructor: Luke Taylor

DevOps Team Lead
discoverygarden inc.
155 Queen St. Suite 101
Charlottetown, PE C1A 4B4
discoverygarden.ca
[email protected]

Instructors - Gavin Morris

● Team Lead & Dev Ops at Born Digital
  ○ http://born-digital.com/

● Convener of the Islandora DevOps Interest Group
  ○ https://github.com/islandora-interest-groups/Islandora-DevOps-Interest-Group

● Led the Islandora DevOps Panel: Building Islandora at the inaugural Islandora Conference (Islandoracon) on Prince Edward Island, Canada

● Presented Automating Islandora Upgrations, Maintenance and Deploys at the Islandora Camp in Hartford, CT

Overview

● Intro to the stack
  ○ Build Types
  ○ Split the stack
  ○ Operating Systems
  ○ Packages
  ○ Services / Software

● Provisioning
  ○ Deploy & Config management tools
  ○ Pipeline

● Performance

● Scaling

● Security

● Best Practices

● The future of the stack
  ○ CLAW
  ○ ISLE

● Q&A

● Resources

Intro to the Stack - What is Islandora?

Source: http://islandora.mnpals.net/pals/islandora/object/PALSrepository%3A412/datastream/OBJ/download/2016-08_Detailed_Islandora_Introduction.pdf

Intro to the Stack - Build Types

● All in One

● 2-3 servers

● 5-7 servers

Intro to the Stack - Build Types: All in One

Recommended Minimum

● 4-6 cores

● 16GB - 32GB RAM

● 100-200GB for OS, Temp files etc

● Volume large enough for repository data

● Additional space could be required for staging content

Intro to the Stack - Build Types: 2-3 servers

Web & Database Server (Minimum Requirements)
● 1-2 cores
● 4-16 GB RAM (*depends on platform type e.g. staging, dev)
● 150-250GB for OS, Temp files etc
● Additional space could be required for staging content

Fedora Repository Server (Minimum Requirements)
● 2-4 cores
● 8-32 GB RAM (*depends on collection size)
● 150-250GB for OS, Temp files etc
● Additional space / volume for repository data e.g. 2-20 TB

Intro to the Stack - Build Types: 5-7 servers

[Diagram components: Storage mount (e.g. NFS); Staff/Ingest Front End; Public Front End; Read-only Fedora; DB server; Blazegraph; Solr; Read-write Fedora]

Intro to the Stack - Split the Stack

● Remote Solr
  ○ Use Gsearch 2.8+
  ○ Edit fgsindex.indexBase in index.properties in Gsearch.
  ○ Still have to maintain a “dummy” index on the Gsearch server.
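The index.properties edit for a remote Solr can be scripted so it is repeatable across deploys. The following is a minimal sketch, assuming index.properties is a plain Java-style `key = value` file; the helper name `set_property` and the remote Solr URL are hypothetical and used only for illustration.

```python
import re


def set_property(text: str, key: str, value: str) -> str:
    """Return properties text with `key` set to `value`.

    Replaces the first existing `key = ...` (or `key: ...`) line,
    or appends a new line if the key is not present.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*[=:][ \t]*).*$", re.MULTILINE
    )
    if pattern.search(text):
        # A function replacement avoids backslash handling in `value`.
        return pattern.sub(lambda m: m.group(1) + value, text, count=1)
    separator = "" if (not text or text.endswith("\n")) else "\n"
    return f"{text}{separator}{key} = {value}\n"


if __name__ == "__main__":
    # Hypothetical file content and remote Solr URL, for illustration only.
    original = "fgsindex.indexName = FgsIndex\nfgsindex.indexBase = http://localhost:8080/solr\n"
    print(set_property(original, "fgsindex.indexBase", "http://solr.example.org:8080/solr"))
```

In practice the same change is usually made through a template in whichever config management tool is in use; the point of the sketch is only that the property is a single-line substitution.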

● Blazegraph
  ○ Used to replace Mulgara (Triplestore) for performance and stability gains
  ○ https://github.com/discoverygarden/trippi-sail
  ○ https://github.com/Smithsonian/trippi-sparql

Intro to the Stack - Operating Systems

Current Stable Recommendations
● Ubuntu 14.04 LTS
● RHEL/CentOS 6.9

Needing more definitive testing
● Ubuntu 16.04 LTS (w/PHP7)
● RHEL/CentOS 7 (challenges with temporary file system)

Community Poll from Melissa Anez (Have your say!)
● Survey https://docs.google.com/forms/d/1E7NmS4944LD3E51A7SK_8MiNoOWCPnjgY8YWOUC7I-o
● Google Group topic https://groups.google.com/forum/?hl=en#!searchin/islandora/php$20testing|sort:relevance/islandora/WftNSPr7Xi0/vlh6eJUbAwAJ

Intro to the Stack - Operating Systems packages (basic)

man vim curl unzip automake subversion kernel-headers

gcc zip dkms bzip2 openssh mercurial pkg-config build-essential

git wget htop cmake libtool apt-utils kernel-devel libfreetype6-dev

ntp yasm nasm rsync autoconf zlib1g-dev -headers Development tools

Intro to the Stack - Services / Software

● Apache 2.2 - 2.4 Web server
  ○ Modules include but are not limited to
    ■ ssl, rewrite, deflate, headers, expires, xml2enc
    ■ reverse proxy for multi-systems:
      ● proxy, proxy_http, proxy_html, proxy_connect

● Databases
  ○ MySQL 5.5+
  ○ Percona
  ○ MariaDB
  ○ Postgres
  ○ Recommend UTF-8 encoding

● Tomcat 7.0.52+
  ○ Oracle Java JDK or OpenJDK 7/8
  ○ SSL & port 8443
    ■ Will need to compile own jks/P12/truststore (how to automate?)
  ○ See the Gotchas section re versions above 7.0.72/8.0.39+

● Apache Solr
  ○ Versions 4.2, 4.6.1, 4.10
  ○ Don’t use the Gsearch Ant-generated schema; it is not complete (missing catch_all entries etc.)
  ○ A helpful starting point for the schema.xml & solrconfig.xml files: https://github.com/discoverygarden/basic-solr-config

● PHP 5.3.x+
  ○ Drupal 7.54
    ■ Islandora 7.x / HEAD modules
    ■ Additional modules e.g. ctools, imagemagick, date, views etc.
  ○ Composer
    ■ Drush

● Fedora-Commons 3.8.1
  ○ Triplestore (Mulgara, Blazegraph)

● Fedoragsearch HEAD / 2.7.1
  ○ DGI GSearch Extensions https://github.com/discoverygarden/dgi_gsearch_extensions
  ○ XSL Transforms for Gsearch https://github.com/discoverygarden/islandora_transforms

● Binaries, Derivative generation
  ○ Imagemagick
  ○ LAME (audio, mp3 etc)
  ○ FFMPEG (video) from source 3.3
  ○ FITS
  ○ EXIF
  ○ XPDF
  ○ Ghostscript 9.05 (from source)
  ○ Tesseract (OCR)
  ○ Adore-djatoka 1.1
    ■ On multi-system setups libraries should be additionally installed on web servers
    ■ Requires use of Oracle JDK 7/8

Provisioning - Deploy & Config management tools

Tool | Language | Free edition | Paid edition | URL | Notes
Puppet | DSL / Ruby | Free | Puppet Enterprise | https://puppet.com/ | up to 10 nodes
Chef | DSL / Ruby | Free | Chef Automate / Hosted | https://www.chef.io |
Ansible | DSL / Python | Free | Tower | https://www.ansible.com/ | Red Hat owned; agentless
Saltstack | DSL / Python | Salt Open | Salt Enterprise | https://saltstack.com/ |
CFEngine | DSL | Community Edition | CFEngine Enterprise | https://cfengine.com/ |
Shell Scripts | Bash / sh | Free | | https://www.gnu.org/software/bash/ |
Packer | DSL / JSON | Free | | https://www.packer.io/ | Builds images

Provisioning - Pipeline

[Diagram: Example Pipeline]

● Developers #1, #2 and #3: each with a Web & DB server VM and a Fedora server VM

● Development: Web & DB server and Fedora repo server, with Continuous Integration w/ Testing Suites for Code & Data

● Production: Web & DB server and Fedora repo server

● Code Up! Theming, solution packs, modules, XSLTs, schemas, config etc.

● Data Down! Package & software updates, system configuration changes, data migrations, re-indexing of triplestore etc.

Performance

● Using Solr vs SPARQL/iTQL
  ○ Collection Solution Pack (Display Generation)
  ○ Islandora OAI (Query Backend)
  ○ Paged Content Module (Use Solr to derive pages and sequence numbers)
  ○ Breadcrumbs (Breadcrumb Generation)

● Breadcrumbs - Disable if not required or use Solr

● Enable Drupal caching options (Configuration - Development - Performance)
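To make the Solr-backed approach for paged content concrete, here is a rough sketch that builds a Solr select URL for the pages of one book and orders the returned documents by sequence number on the client side. The Solr base URL, the PID and both field names are assumptions for illustration only; the real field names depend on your Gsearch transforms and Solr schema.

```python
from urllib.parse import urlencode


def build_pages_query(solr_base, book_pid,
                      member_field="RELS_EXT_isMemberOf_uri_ms",
                      sequence_field="RELS_EXT_isSequenceNumber_literal_ms",
                      rows=1000):
    """Build a Solr /select URL listing the pages that belong to a book.

    Both default field names are hypothetical examples, not a fixed schema.
    """
    params = {
        "q": f'{member_field}:"info:fedora/{book_pid}"',
        "fl": f"PID,{sequence_field}",
        "rows": rows,
        "wt": "json",
    }
    return f"{solr_base.rstrip('/')}/select?{urlencode(params)}"


def order_pages(docs, sequence_field):
    """Return page PIDs ordered by sequence number.

    Sorting is done client-side because the sequence field may be
    multivalued, in which case Solr cannot sort on it directly.
    """
    def sequence(doc):
        value = doc.get(sequence_field, 0)
        if isinstance(value, list):
            value = value[0] if value else 0
        return int(value)

    return [doc["PID"] for doc in sorted(docs, key=sequence)]


if __name__ == "__main__":
    # Hypothetical Solr location and book PID.
    print(build_pages_query("http://localhost:8080/solr", "islandora:123"))
```

One Solr request like this replaces a triplestore query per page view, which is where the gain over SPARQL/iTQL comes from.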


“(XmlUsersFileModule) null” error

ERROR 2017-03-10 08:56:54.796 [http-8080-21] (XmlUsersFileModule) null
ERROR 2017-03-10 08:56:54.805 [http-8080-21] (AuthFilterJAAS) javax.security.auth.login.LoginException: Login Failure: all modules ignored

Source: /usr/local/fedora/server/logs/fedora.log
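To gauge how often this error occurs, the log can be scanned for the two entries shown above. This is a minimal sketch, assuming every entry follows the `LEVEL timestamp [thread] (source) message` layout of those two lines; the function name `find_jaas_failures` is hypothetical.

```python
import re

# Layout assumed from the two sample entries: LEVEL date time [thread] (source) message
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+) "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<thread>[^\]]+)\] "
    r"\((?P<source>[^)]+)\) "
    r"(?P<message>.*)$"
)


def find_jaas_failures(lines):
    """Return parsed ERROR entries matching the XmlUsersFileModule/JAAS failure."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match or match["level"] != "ERROR":
            continue
        is_null_error = (
            match["source"] == "XmlUsersFileModule" and match["message"] == "null"
        )
        is_login_failure = (
            match["source"] == "AuthFilterJAAS" and "LoginException" in match["message"]
        )
        if is_null_error or is_login_failure:
            hits.append(match.groupdict())
    return hits


# Usage, with the log path from the slide:
# with open("/usr/local/fedora/server/logs/fedora.log") as log:
#     for entry in find_jaas_failures(log):
#         print(entry["timestamp"], entry["source"], entry["message"])
```

Lines that do not fit the assumed layout, such as stack trace continuations, are skipped rather than reported.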

Reference:
https://issues.apache.org/jira/browse/XERCESJ-211
https://jira.duraspace.org/browse/FCREPO-1230

Fix! https://github.com/discoverygarden/fcrepo3-security-jaas

● Help, too many multisites!
  ○ Islandora installations with Drupal multisites can cause unnecessary database connections.

● Multi-site optimization
  ○ https://github.com/discoverygarden/fcrepo3-security-jaas

● Islandora Jobs
  ○ https://github.com/discoverygarden/islandora_job
  ○ Faster Ingests
  ○ Allows you to have multiple Gearman workers processing derivatives.

● Islandora Gsearcher
  ○ https://github.com/discoverygarden/islandora_gsearcher
  ○ Updates Solr index upon ingest completion vs waiting for ActiveMQ

Security

● Directory permissions Tomcat/Drupal

● Run services using non-privileged users with no shell.
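One way to audit the "no shell" rule is to check the login shell recorded for each service account. The sketch below parses text in the standard seven-field passwd format and flags service accounts whose shell is not a no-login shell. The account names and the set of accepted no-login shells are assumptions; adjust both to your distribution.

```python
# Shells treated as "no shell"; an assumption that varies by distribution.
NOLOGIN_SHELLS = {"/usr/sbin/nologin", "/sbin/nologin", "/bin/false"}


def accounts_with_login_shell(passwd_text, service_users):
    """Return (user, shell) pairs for service accounts that still have a login shell."""
    flagged = []
    for line in passwd_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7:
            continue  # skip malformed entries
        name, shell = fields[0], fields[6]
        if name in service_users and shell not in NOLOGIN_SHELLS:
            flagged.append((name, shell))
    return flagged


if __name__ == "__main__":
    # Hypothetical service account names, for illustration only.
    with open("/etc/passwd") as passwd:
        for user, shell in accounts_with_login_shell(
            passwd.read(), {"tomcat7", "www-data", "mysql"}
        ):
            print(f"{user} has login shell {shell}")
```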

● Firewalls
  ○ Fail2ban (https://www.fail2ban.org)
  ○ Modsec (https://modsecurity.org/)
  ○ Ports / Rules

● Central logging
  ○ Syslog
  ○ Tripwire (https://www.tripwire.com/) (can be used for extended logging in addition to security)
  ○ ELK (ElasticSearch, Logstash & Kibana) https://logz.io/learn/complete-guide-elk-stack/

Best Practices, Gotchas, Tips

● Gsearch issues with Tomcat 7.0.72/8.0.39+
  ○ https://github.com/discoverygarden/gsearch.git

● Try the Islandora Deploy on Ubuntu guide
  ○ https://github.com/islandora-interest-groups/Islandora-DevOps-Interest-Group/blob/master/Deployment%20Guides/Provisioning-Islandora-on-Ubuntu.md

● AWS S3 mounting as a file system
  ○ https://github.com/danilop/yas3fs
    ■ Debug mode first!
    ■ Make sure it re-mounts properly if system is restarted.
    ■ Gotcha: There may be an object size limit of 60 GB for ingested binaries e.g. video etc.
    ■ Mount the datastreamStore to S3 and leave objectStore on EBS for better performance
      ● Caution! Challenges with restoration!
  ○ Alternative https://bitbucket.org/nikratio/s3ql (same Gotchas apply!)

The future of the stack - Islandora 7.2.x - CLAW

https://github.com/Islandora-CLAW/CLAW/blob/master/docs/user-documentation/intro-to-claw.md
https://github.com/Islandora-CLAW/CLAW/blob/master/docs/mvp/mvp_doc.md

The future of the stack - ISLE

Islandora + = Enterprise (ISLE)

https://github.com/Islandora-Collaboration-Group
https://islandora-collaboration-group.github.io/
https://islandora.ca/content/islandora-together-meet-islandora-consortial-group

Q&A

Resources

● Islandora http://islandora.ca

● Islandora sandbox https://sandbox.islandora.ca/

● Vagrant up with Islandora Labs! https://github.com/Islandora-Labs/islandora_vagrant

● Please join the growing global community! http://islandora.ca/membership

● Perhaps jump on a call with one of the Islandora Interest groups?
  ○ https://github.com/islandora-interest-groups
  ○ https://github.com/islandora-interest-groups/Islandora-DevOps-Interest-Group

● One can learn so much from the Islandora Community on Google Groups!
  ○ https://groups.google.com/forum/?hl=en#!forum/islandora-dev
  ○ https://groups.google.com/forum/?hl=en#!forum/islandora

Thank you!