Infrastructure and Performance
An Islandoracon Workshop
Instructors: Gavin Morris & Luke Taylor

Instructor: Luke Taylor
DevOps Team Lead
discoverygarden inc.
155 Queen St. Suite 101
Charlottetown, PE C1A 4B4
discoverygarden.ca

Instructor: Gavin Morris
● Team Lead & Dev Ops at Born Digital
  ○ http://born-digital.com/
● Convener of the Islandora DevOps Interest Group
  ○ https://github.com/islandora-interest-groups/Islandora-DevOps-Interest-Group
● Led the Islandora DevOps Panel: Building Islandora at the inaugural Islandora Conference (Islandoracon) on Prince Edward Island, Canada
● Presented Automating Islandora Upgrations, Maintenance and Deploys at the Islandora Camp in Hartford, CT

Overview
● Intro to the stack
  ○ Build Types
  ○ Split the stack
  ○ Operating Systems
  ○ Packages
  ○ Services / Software
● Provisioning
  ○ Deploy & Config management tools
  ○ Pipeline
● Performance
● Scaling
● Security
● Best Practices
● The future of the stack
  ○ CLAW
  ○ ISLE
● Q&A
● Resources

Intro to the Stack - What is Islandora?
Source: http://islandora.mnpals.net/pals/islandora/object/PALSrepository%3A412/datastream/OBJ/download/2016-08_Detailed_Islandora_Introduction.pdf

Intro to the Stack - Build Types
● All in One
● 2-3 servers
● 5-7 servers

Intro to the Stack - Build Types: All in One
Recommended Minimum
● 4-6 cores
● 16GB - 32GB RAM
● 100-200GB for OS, Temp files etc
● Volume large enough for repository data
● Additional space could be required for staging content

Intro to the Stack - Build Types: 2-3 servers
Web & Database Server (Minimum Requirements)
● 1-2 cores
● 4-16 GB RAM (depends on platform type, e.g. staging, dev)
● 150-250 GB for OS, temp files, etc.
● Additional space could be required for staging content

Fedora Repository Server (Minimum Requirements)
● 2-4 cores
● 8-32 GB RAM (depends on collection size)
● 150-250 GB for OS, temp files, etc.
● Additional space / volume for repository data, e.g. 2-20 TB

Intro to the Stack - Build Types: 5-7 servers
[Diagram: 5-7 server layout. Components shown: Storage mount (e.g. NFS), Staff/Ingest Front End, Public Front End, Read-only Fedora, DB server, Blazegraph, Solr, Read-write Fedora]

Intro to the Stack - Split the Stack
● Remote Solr
  ○ Use Gsearch 2.8+
  ○ Edit fgsindex.indexBase in index.properties in Gsearch.
  ○ Still have to maintain a “dummy” index on the Gsearch server.
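As a rough illustration of the index.properties edit for a remote Solr, the Python sketch below rewrites the fgsindex.indexBase line in the text of a properties file. The helper name `set_property`, the sample file content, and the Solr URL `http://solr.example.org:8080/solr` are assumptions made for this example and do not come from the workshop.

```python
def set_property(properties_text: str, key: str, value: str) -> str:
    """Return properties_text with `key` set to `value`.

    Replaces the first uncommented `key = ...` line, or appends one
    if the key is not present. (Hypothetical helper for illustration.)
    """
    lines = properties_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip comments and lines that are not key/value pairs.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        name = stripped.split("=", 1)[0].strip()
        if name == key and not replaced:
            lines[i] = f"{key} = {value}"
            replaced = True
    if not replaced:
        lines.append(f"{key} = {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sample content is made up; a real index.properties has more settings.
    sample = (
        "# Gsearch index settings\n"
        "fgsindex.indexBase = http://localhost:8080/solr\n"
    )
    # The remote Solr URL is a placeholder.
    print(set_property(sample, "fgsindex.indexBase",
                       "http://solr.example.org:8080/solr"))
```

Working on the file's text rather than the file itself keeps the sketch easy to test before it is pointed at a real Gsearch install.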
● Blazegraph
  ○ Used to replace Mulgara (Triplestore) for performance and stability gains
  ○ https://github.com/discoverygarden/trippi-sail
  ○ https://github.com/Smithsonian/trippi-sparql

Intro to the Stack - Operating Systems
Current Stable Recommendations
● Ubuntu 14.04 LTS
● RHEL/CentOS 6.9

Needing more definitive testing
● Ubuntu 16.04 LTS (w/ PHP7)
● RHEL/CentOS 7 (challenges with temporary file system)

Community Poll from Melissa Anez (Have your say!)
● Survey https://docs.google.com/forms/d/1E7NmS4944LD3E51A7SK_8MiNoOWCPnjgY8YWOUC7I-o
● Google Group topic https://groups.google.com/forum/?hl=en#!searchin/islandora/php$20testing|sort:relevance/islandora/WftNSPr7Xi0/vlh6eJUbAwAJ

Intro to the Stack - Operating Systems packages (basic)
man vim curl perl unzip automake subversion kernel-headers
gcc zip dkms bzip2 openssh mercurial pkg-config build-essential
git wget htop cmake libtool apt-utils kernel-devel libfreetype6-dev
ntp yasm nasm rsync autoconf zlib1g-dev linux-headers Development tools
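The package list above appears to mix Debian/Ubuntu names (for example build-essential, zlib1g-dev) with RHEL/CentOS names (for example kernel-devel). The sketch below shows one way a provisioning script might filter the list per OS family before building an install command. The split into `DEBIAN_ONLY` and `RHEL_ONLY` is this example's reading of the slide, not something the workshop states, and the "Development tools" group is left out.

```python
# Names from the slide that look distro-specific (an assumption).
DEBIAN_ONLY = {"build-essential", "apt-utils", "zlib1g-dev",
               "libfreetype6-dev", "linux-headers"}
RHEL_ONLY = {"kernel-headers", "kernel-devel"}


def install_command(packages, family):
    """Build an install command (as an argument list) for `family`.

    `family` is "debian" or "rhel"; package names that belong only to
    the other family are skipped.
    """
    if family == "debian":
        skip, base = RHEL_ONLY, ["apt-get", "install", "-y"]
    elif family == "rhel":
        skip, base = DEBIAN_ONLY, ["yum", "install", "-y"]
    else:
        raise ValueError(f"unknown OS family: {family!r}")
    return base + [p for p in packages if p not in skip]


if __name__ == "__main__":
    wanted = ["git", "wget", "curl", "build-essential", "kernel-devel"]
    print(" ".join(install_command(wanted, "debian")))
    print(" ".join(install_command(wanted, "rhel")))
```

Returning an argument list rather than a shell string makes the result safe to hand to a process runner without extra quoting.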
Intro to the Stack - Services / Software
● Apache 2.2 - 2.4 Web server
  ○ Modules include but are not limited to
    ■ ssl, rewrite, deflate, headers, expires, xml2enc
    ■ reverse proxy for multi-systems:
      ● proxy, proxy_http, proxy_html, proxy_connect
● Databases
  ○ MySQL 5.5+
  ○ Percona
  ○ MariaDB
  ○ Postgres
  ○ Recommend UTF-8 encoding
● Tomcat 7.0.52+
  ○ Oracle Java JDK or OpenJDK 7/8
  ○ SSL & port 8443
    ■ Will need to compile own jks/P12/truststore (how to automate?)
  ○ See the Gotcha section regarding versions 7.0.72/8.0.39 and above
● Apache Solr
  ○ Versions 4.2, 4.6.1, 4.10
  ○ Don’t use the Gsearch Ant-generated schema (not complete; missing catch_all entries etc.)
  ○ A helpful starting point for the schema.xml and solrconfig.xml files: https://github.com/discoverygarden/basic-solr-config
● PHP 5.3.x+
  ○ Drupal 7.54
    ■ Islandora 7.x / HEAD modules
    ■ Additional modules e.g. ctools, imagemagick, date, views etc.
  ○ Composer
    ■ Drush
● Fedora-Commons 3.8.1
  ○ Triplestore (Mulgara, Blazegraph)
● Fedoragsearch HEAD / 2.7.1
  ○ DGI GSearch Extensions https://github.com/discoverygarden/dgi_gsearch_extensions
  ○ XSL Transforms for Gsearch https://github.com/discoverygarden/islandora_transforms
● Binaries, Derivative generation
  ○ ImageMagick
  ○ LAME (audio, mp3 etc)
  ○ FFmpeg (video) 3.3, from source
  ○ FITS
  ○ EXIF
  ○ XPDF
  ○ Ghostscript 9.05 (from source)
  ○ Tesseract (OCR)
  ○ Adore-djatoka 1.1
    ■ On multi-system setups libraries should be additionally installed on web servers
    ■ Requires use of Oracle JDK 7/8

Provisioning - Deploy & Config management tools
Tool           Language      Free offering      Commercial offering     URL                                 Notes
Puppet         DSL / Ruby    Free               Puppet Enterprise       https://puppet.com/                 up to 10 nodes
Chef           DSL / Ruby    Free               Chef Automate / Hosted  https://www.chef.io                 -
Ansible        DSL / Python  Free               Tower                   https://www.ansible.com/            Red Hat owned; agentless
Saltstack      DSL / Python  Salt Open          Salt Enterprise         https://saltstack.com/              -
CFEngine       DSL / C       Community Edition  CFEngine Enterprise     https://cfengine.com/               -
Shell Scripts  Bash / sh     Free               -                       https://www.gnu.org/software/bash/  -
Packer         DSL / JSON    Free               -                       https://www.packer.io/              Builds images
Provisioning - Pipeline

[Diagram: Example Pipeline. Environments shown: Developer #1, Developer #2 and Developer #3 (each with a Web & DB server VM and a Fedora server VM); Development (Web & DB server, Fedora repo server) with Continuous Integration w/ Testing Suites for Code & Data; Production (Web & DB server, Fedora repo server).
Code Up! Theming, solution packs, modules, XSLTs, schemas, config etc.
Data Down! Package & software updates, system configuration changes, data migrations, re-indexing of triplestore etc.]

Performance
● Using Solr vs SPARQL/iTQL
  ○ Collection Solution Pack (Display Generation)
  ○ Islandora OAI (Query Backend)
  ○ Paged Content Module (Use Solr to derive pages and sequence numbers)
  ○ Breadcrumbs (Breadcrumb Generation)
● Breadcrumbs - Disable if not required or use Solr
● Enable Drupal caching options (Configuration - Development - Performance)
● Memcached / Varnish
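To make the Solr-instead-of-SPARQL idea concrete, here is a minimal Python sketch that builds a Solr select URL listing the members of a collection. The field name `RELS_EXT_isMemberOfCollection_uri_ms`, the Solr base URL, and the collection PID are assumptions for illustration; the actual field depends on the Gsearch transforms and Solr schema in use.

```python
from urllib.parse import urlencode


def collection_members_url(solr_base, collection_pid, rows=20,
                           member_field="RELS_EXT_isMemberOfCollection_uri_ms"):
    """Build a Solr select URL that lists PIDs of a collection's members.

    `member_field` is an assumed field name; check your own Solr schema.
    """
    # Escape backslashes and double quotes inside the quoted phrase.
    value = f"info:fedora/{collection_pid}"
    value = value.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": f'{member_field}:"{value}"',
        "fl": "PID",
        "rows": rows,
        "wt": "json",
    }
    return f"{solr_base.rstrip('/')}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder host and collection PID.
    print(collection_members_url("http://localhost:8080/solr",
                                 "islandora:sp_basic_image_collection"))
```

A query like this is answered from the Solr index alone, which is the reason the modules listed above offer Solr as a query backend in place of the triplestore.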
“(XmlUsersFileModule) null” error
ERROR 2017-03-10 08:56:54.796 [http-8080-21] (XmlUsersFileModule) null
ERROR 2017-03-10 08:56:54.805 [http-8080-21] (AuthFilterJAAS) javax.security.auth.login.LoginException: Login Failure: all modules ignored
Source: /usr/local/fedora/server/logs/fedora.log
Reference: https://issues.apache.org/jira/browse/XERCESJ-211 https://jira.duraspace.org/browse/FCREPO-1230
Fix! https://github.com/discoverygarden/fcrepo3-security-jaas
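To gauge how often this error occurs before applying the fix, a log scan along the following lines may help. It is a minimal Python sketch that assumes fedora.log lines follow the layout of the two sample lines above (level, timestamp, thread in brackets, source in parentheses, message); the regular expression and function name are this example's own.

```python
import re

# Layout inferred from the sample lines above (an assumption):
# LEVEL YYYY-MM-DD HH:MM:SS.mmm [thread] (source) message
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+) "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<thread>[^\]]+)\] \((?P<source>[^)]+)\) (?P<message>.*)$"
)


def find_xml_users_errors(lines):
    """Return (timestamp, thread) for each '(XmlUsersFileModule) null' error."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if (m and m["level"] == "ERROR"
                and m["source"] == "XmlUsersFileModule"
                and m["message"] == "null"):
            hits.append((m["timestamp"], m["thread"]))
    return hits


if __name__ == "__main__":
    sample = [
        "ERROR 2017-03-10 08:56:54.796 [http-8080-21] (XmlUsersFileModule) null",
        "ERROR 2017-03-10 08:56:54.805 [http-8080-21] (AuthFilterJAAS) "
        "javax.security.auth.login.LoginException: Login Failure: all modules ignored",
    ]
    for timestamp, thread in find_xml_users_errors(sample):
        print(timestamp, thread)
```

To run it against a live system, pass an open file handle for /usr/local/fedora/server/logs/fedora.log in place of the sample list.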
● Help, too many multisites!
  ○ Islandora installations with Drupal multisites can cause unnecessary database connections.
● Multi-site optimization
  ○ https://github.com/discoverygarden/fcrepo3-security-jaas
● Islandora Jobs
  ○ https://github.com/discoverygarden/islandora_job
  ○ Faster Ingests
  ○ Allows you to have multiple Gearman workers processing derivatives.
● Islandora Gsearcher
  ○ https://github.com/discoverygarden/islandora_gsearcher
  ○ Updates Solr index upon ingest completion vs waiting for ActiveMQ

Security

● Directory permissions Tomcat/Drupal
● Run services using non-privileged users with no shell.
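One way to audit the no-shell recommendation is to check the passwd entries of the service accounts. The Python sketch below parses passwd-format text and flags accounts that have a login shell or UID 0. The account names in the demo (tomcat7, www-data, fedora) and the set of shells treated as "no shell" are assumptions for illustration.

```python
# Shells treated as "no login shell" (an assumption; adjust per distro).
NO_LOGIN_SHELLS = {"/sbin/nologin", "/usr/sbin/nologin", "/bin/false"}


def accounts_with_login_shell(passwd_text, service_users):
    """Return the sorted service users that have a login shell or UID 0.

    `passwd_text` is text in /etc/passwd format
    (name:password:uid:gid:gecos:home:shell).
    """
    flagged = set()
    for line in passwd_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7:
            continue  # Skip malformed entries.
        name, uid, shell = fields[0], fields[2], fields[6]
        if name in service_users and (uid == "0" or shell not in NO_LOGIN_SHELLS):
            flagged.add(name)
    return sorted(flagged)


if __name__ == "__main__":
    # Made-up entries; on a real host read /etc/passwd instead.
    sample = (
        "tomcat7:x:110:115::/usr/share/tomcat7:/bin/false\n"
        "www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin\n"
        "fedora:x:1001:1001::/home/fedora:/bin/bash\n"
    )
    print(accounts_with_login_shell(sample, {"tomcat7", "www-data", "fedora"}))
```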
● Firewalls
  ○ Fail2ban (https://www.fail2ban.org)
  ○ Modsec (https://modsecurity.org/)
  ○ Ports / Rules
● Central logging
  ○ Syslog
  ○ Tripwire (https://www.tripwire.com/) (can be used for extended logging in addition to security)
  ○ ELK (Elasticsearch, Logstash & Kibana) https://logz.io/learn/complete-guide-elk-stack/

Best Practices, Gotchas, Tips
● Gsearch issues with Tomcat 7.0.72/8.0.39+
  ○ https://github.com/discoverygarden/gsearch.git
● Try the Islandora Deploy on Ubuntu guide https://github.com/islandora-interest-groups/Islandora-DevOps-Interest-Group/blob/master/Deployment%20Guides/Provisioning-Islandora-on-Ubuntu.md
● AWS S3 mounting as a file system
  ○ https://github.com/danilop/yas3fs
    ■ Debug mode first!
    ■ Make sure it re-mounts properly if system is restarted.
    ■ Gotcha: There may be an object size limit of 60 GB for ingested binaries e.g. video etc.
    ■ Mount the datastreamStore to S3 and leave objectStore on EBS for better performance
      ● Caution! Challenges with restoration!
  ○ Alternative https://bitbucket.org/nikratio/s3ql (same Gotchas apply!)

The future of the stack - Islandora 7.2.x - CLAW
https://github.com/Islandora-CLAW/CLAW/blob/master/docs/user-documentation/intro-to-claw.md
https://github.com/Islandora-CLAW/CLAW/blob/master/docs/mvp/mvp_doc.md

The future of the stack - ISLE
Islandora + [image] = Enterprise (ISLE)
https://github.com/Islandora-Collaboration-Group
https://islandora-collaboration-group.github.io/
https://islandora.ca/content/islandora-together-meet-islandora-consortial-group

Q&A

Resources
● Islandora http://islandora.ca
● Islandora sandbox https://sandbox.islandora.ca/
● Vagrant up with Islandora Labs! https://github.com/Islandora-Labs/islandora_vagrant
● Please join the growing global community! http://islandora.ca/membership
● Perhaps jump on a call with one of the Islandora Interest groups?
  ○ https://github.com/islandora-interest-groups
  ○ https://github.com/islandora-interest-groups/Islandora-DevOps-Interest-Group
● One can learn so much from the Islandora Community on Google Groups!
  ○ https://groups.google.com/forum/?hl=en#!forum/islandora-dev
  ○ https://groups.google.com/forum/?hl=en#!forum/islandora

Thank you!