Infrastructure and Performance
An Islandoracon Workshop
Instructors: Gavin Morris & Luke Taylor

Instructor: Luke Taylor
DevOps Team Lead
discoverygarden inc.
155 Queen St. Suite 101
Charlottetown, PE C1A 4B4
discoverygarden.ca
[email protected]

Instructors - Gavin Morris
● Team Lead & Dev Ops at Born Digital
  ○ http://born-digital.com/
● Convener of the Islandora DevOps Interest Group
  ○ https://github.com/islandora-interest-groups/Islandora-DevOps-Interest-Group
● Led the Islandora DevOps Panel: Building Islandora at the inaugural Islandora Conference (Islandoracon) on Prince Edward Island, Canada
● Presented Automating Islandora Upgrations, Maintenance and Deploys at the Islandora Camp in Hartford, CT

Overview
● Intro to the stack
  ○ Build Types
  ○ Split the stack
  ○ Operating Systems
  ○ Packages
  ○ Services / Software
● Provisioning
  ○ Deploy & Config management tools
  ○ Pipeline
● Performance
● Scaling
● Security
● Best Practices
● The future of the stack
  ○ CLAW
  ○ ISLE
● Q&A
● Resources

Intro to the Stack - What is Islandora?
Source: http://islandora.mnpals.net/pals/islandora/object/PALSrepository%3A412/datastream/OBJ/download/2016-08_Detailed_Islandora_Introduction.pdf

Intro to the Stack - Build Types
● All in One
● 2-3 servers
● 5-7 servers

Intro to the Stack - Build Types: All in One
Recommended Minimum
● 4-6 cores
● 16-32 GB RAM
● 100-200 GB for OS, temp files etc.
● Volume large enough for repository data
● Additional space could be required for staging content

Intro to the Stack - Build Types: 2-3 servers
Web & Database Server (Minimum Requirements)
● 1-2 cores
● 4-16 GB RAM (*depends on platform type e.g. staging, dev)
● 150-250 GB for OS, temp files etc.
● Additional space could be required for staging content
Fedora Repository Server (Minimum Requirements)
● 2-4 cores
● 8-32 GB RAM (*depends on collection size)
● 150-250 GB for OS, temp files etc.
● Additional space / volume for repository data e.g. 2-20 TB

Intro to the Stack - Build Types: 5-7 servers
[Diagram; labelled components:]
● Storage mount (e.g. NFS)
● Staff/Ingest Front End
● Public Front End
● Read-only Fedora
● DB server
● Blazegraph
● Solr
● Read-write Fedora

Intro to the Stack - Split the Stack
● Remote Solr
  ○ Use Gsearch 2.8+
  ○ Edit fgsindex.indexBase in index.properties in Gsearch.
  ○ Still have to maintain a “dummy” index on the Gsearch server.
● Blazegraph
  ○ Used to replace Mulgara (triplestore) for performance and stability gains
  ○ https://github.com/discoverygarden/trippi-sail
  ○ https://github.com/Smithsonian/trippi-sparql

Intro to the Stack - Operating Systems
Current Stable Recommendations
● Ubuntu 14.04 LTS
● RHEL/CentOS 6.9
Needing more definitive testing
● Ubuntu 16.04 LTS (w/PHP7)
● RHEL/CentOS 7 (challenges with temporary file system)
Community Poll from Melissa Anez (Have your say!)
● Survey https://docs.google.com/forms/d/1E7NmS4944LD3E51A7SK_8MiNoOWCPnjgY8YWOUC7I-o
● Google Group topic https://groups.google.com/forum/?hl=en#!searchin/islandora/php$20testing|sort:relevance/islandora/WftNSPr7Xi0/vlh6eJUbAwAJ

Intro to the Stack - Operating Systems packages (basic)
man, vim, curl, perl, unzip, automake, subversion, kernel-headers, gcc, zip, dkms,
bzip2, openssh, mercurial, pkg-config, build-essential, git, wget, htop, cmake,
libtool, apt-utils, kernel-devel, libfreetype6-dev, ntp, yasm, nasm, rsync,
autoconf, zlib1g-dev, linux-headers, Development tools

Intro to the Stack - Services / Software
● Apache 2.2 - 2.4 web server
  ○ Modules include but are not limited to
    ■ ssl, rewrite, deflate, headers, expires, xml2enc
    ■ reverse proxy for multi-system setups:
      ● proxy, proxy_http, proxy_html, proxy_connect
● Databases
  ○ MySQL 5.5+
  ○ Percona
  ○ MariaDB
  ○ Postgres
  ○ Recommend UTF-8 encoding

Intro to the Stack - Services / Software
● Tomcat 7.0.52+
  ○ Oracle Java JDK or OpenJDK 7/8
  ○ SSL & port 8443
    ■ Will need to compile own jks/P12/truststore (how to automate?)
  ○ See the Gotchas section regarding Tomcat versions 7.0.72/8.0.39 and above
● Apache Solr
  ○ Versions 4.2, 4.6.1, 4.10
  ○ Don’t use the Gsearch Ant-generated schema: it is not complete (missing catch_all entries etc.)
  ○ A helpful starting point for the schema.xml and solrconfig.xml files: https://github.com/discoverygarden/basic-solr-config

Intro to the Stack - Services / Software
● PHP 5.3.x+
  ○ Drupal 7.5.4
    ■ Islandora 7.x / HEAD modules
    ■ Additional modules e.g. ctools, imagemagick, date, views etc.
  ○ Composer
    ■ Drush
● Fedora-Commons 3.8.1
  ○ Triplestore (Mulgara, Blazegraph)
● Fedoragsearch HEAD / 2.7.1
  ○ DGI GSearch Extensions https://github.com/discoverygarden/dgi_gsearch_extensions
  ○ XSL Transforms for Gsearch https://github.com/discoverygarden/islandora_transforms

Intro to the Stack - Services / Software
● Binaries, derivative generation
  ○ ImageMagick
  ○ LAME (audio, mp3 etc.)
  ○ FFmpeg 3.3 (video), from source
  ○ FITS
  ○ EXIF
  ○ XPDF
  ○ Ghostscript 9.05 (from source)
  ○ Tesseract (OCR)
  ○ Adore-djatoka 1.1
    ■ On multi-system setups libraries should be additionally installed on web servers
    ■ Requires use of Oracle JDK 7/8

Provisioning - Deploy & Config management tools
Tool           Language       Free                Paid                      URL                                 Notes
Puppet         DSL / Ruby     Free                Puppet Enterprise         https://puppet.com/                 up to 10 nodes
Chef           DSL / Ruby     Free                Chef Automate / Hosted    https://www.chef.io
Ansible        DSL / Python   Free                Tower                     https://www.ansible.com/            Red Hat owned; agentless
Saltstack      DSL / Python   Salt Open           Salt Enterprise           https://saltstack.com/
CFEngine       DSL / C        Community Edition   CFEngine Enterprise       https://cfengine.com/
Shell Scripts  Bash / sh      Free                                          https://www.gnu.org/software/bash/
Packer         DSL / JSON     Free                                          https://www.packer.io/              Builds images

Provisioning - Pipeline
[Diagram: Example Pipeline]
● Developers #1, #2 and #3: each with a Web & DB server VM and a Fedora server VM
● Development: Web & DB server, Fedora repo server
● Production: Web & DB server, Fedora repo server
● Code Up!: theming, solution packs, modules, XSLTs, schemas, config etc.
● Data Down!: package & software updates, system configuration changes, data migrations, re-indexing of triplestore etc.
● Continuous Integration w/ Testing Suites for Code & Data

Performance
● Using Solr vs SPARQL/iTQL
  ○ Collection Solution Pack (Display Generation)
  ○ Islandora OAI (Query Backend)
  ○ Paged Content Module (Use Solr to derive pages and sequence numbers)
  ○ Breadcrumbs (Breadcrumb Generation)
● Breadcrumbs - Disable if not required or use Solr
● Enable Drupal caching options (Configuration - Development - Performance)
● Memcached / Varnish

Performance
“(XmlUsersFileModule) null” error
  ERROR 2017-03-10 08:56:54.796 [http-8080-21] (XmlUsersFileModule) null
  ERROR 2017-03-10 08:56:54.805 [http-8080-21] (AuthFilterJAAS) javax.security.auth.login.LoginException: Login Failure: all modules ignored
Source: /usr/local/fedora/server/logs/fedora.log
Reference:
  https://issues.apache.org/jira/browse/XERCESJ-211
  https://jira.duraspace.org/browse/FCREPO-1230
Fix! https://github.com/discoverygarden/fcrepo3-security-jaas

Performance
● Help, too many multisites!
  ○ Islandora installations with Drupal multisites can cause unnecessary database connections.
● Multi-site optimization
  ○ https://github.com/discoverygarden/fcrepo3-security-jaas

Performance
● Islandora Jobs
  ○ https://github.com/discoverygarden/islandora_job
  ○ Faster ingests
  ○ Allows you to have multiple Gearman workers processing derivatives.
● Islandora Gsearcher
  ○ https://github.com/discoverygarden/islandora_gsearcher
  ○ Updates the Solr index upon ingest completion instead of waiting for ActiveMQ

Security
● Directory permissions Tomcat/Drupal
● Run services using non-privileged users with no shell.
● Firewalls
  ○ Fail2ban (https://www.fail2ban.org)
  ○ Modsec (https://modsecurity.org/)
  ○ Ports / Rules
● Central logging
  ○ Syslog
  ○ Tripwire (https://www.tripwire.com/) (can be used for extended logging in addition to security)
  ○ ELK (ElasticSearch, Logstash & Kibana) https://logz.io/learn/complete-guide-elk-stack/

Best Practices, Gotchas, Tips
● Gsearch issues with Tomcat 7.0.72/8.0.39+
  ○ https://github.com/discoverygarden/gsearch.git
● Try the Islandora Deploy on Ubuntu guide https://github.com/islandora-interest-groups/Islandora-DevOps-Interest-Group/blob/master/Deployment%20Guides/Provisioning-Islandora-on-Ubuntu.md
● AWS S3 mounting as a file system
  ○ https://github.com/danilop/yas3fs
    ■ Debug mode first!
    ■ Make sure it re-mounts properly if the system is restarted.
    ■ Gotcha: There may be an object size limit of 60 GB for ingested binaries e.g. video etc.
    ■ Mount the datastreamStore to S3 and leave objectStore on EBS for better performance
      ● Caution! Challenges with restoration!
  ○ Alternative https://bitbucket.org/nikratio/s3ql (same Gotchas apply!)

The future of the stack - Islandora 7.2.x - CLAW
https://github.com/Islandora-CLAW/CLAW/blob/master/docs/user-documentation/intro-to-claw.md
https://github.com/Islandora-CLAW/CLAW/blob/master/docs/mvp/mvp_doc.md

The future of the stack - ISLE
Islandora + [image] = Enterprise (ISLE)
https://github.com/Islandora-Collaboration-Group
https://islandora-collaboration-group.github.io/
https://islandora.ca/content/islandora-together-meet-islandora-consortial-group

Q&A

Resources
● Islandora http://islandora.ca
● Islandora sandbox https://sandbox.islandora.ca/
● Vagrant up with Islandora Labs! https://github.com/Islandora-Labs/islandora_vagrant
● Please join the growing global community! http://islandora.ca/membership
● Perhaps jump on a call with one of the Islandora Interest groups?
  ○ https://github.com/islandora-interest-groups
  ○ https://github.com/islandora-interest-groups/Islandora-DevOps-Interest-Group
● One can learn so much from the Islandora Community on Google Groups!
  ○ https://groups.google.com/forum/?hl=en#!forum/islandora-dev
  ○ https://groups.google.com/forum/?hl=en#!forum/islandora

Thank you!
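
Supplementary sketch for the Remote Solr bullet in Split the Stack ("Edit fgsindex.indexBase in index.properties in Gsearch"). This is one possible way to script that edit, not the presenters' method. The helper name set_property and the URL http://solr.example.org:8080/solr are hypothetical; the location of index.properties and the correct value for your site are assumptions you would need to confirm against your own Gsearch install.

```python
import re


def set_property(text: str, key: str, value: str) -> str:
    """Return `text` (the contents of a .properties file) with `key` set to `value`.

    If the key already exists, only the first matching line is rewritten and
    every other line, including comments, is left untouched. If the key is
    missing, a new `key = value` line is appended.
    """
    # Match "key = old" or "key: old" at the start of a line (spaces/tabs allowed).
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r"[ \t]*[=:][ \t]*).*$",
        re.MULTILINE,
    )
    if pattern.search(text):
        # A function replacement avoids backslash/group escapes in `value`.
        return pattern.sub(lambda m: m.group(1) + value, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + f"{key} = {value}\n"
```

A typical use would be to read index.properties, call set_property(contents, "fgsindex.indexBase", "http://solr.example.org:8080/solr"), write the result back, and restart Tomcat. Keeping the helper as a pure string function makes it easy to dry-run in a provisioning tool before touching the real file.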