Operating and Troubleshooting with MySQL

Percona Training http://www.percona.com/training

© 2011 – 2013 PERCONA

Table of Contents

1. Introduction
2. Basics and Tools
3. Backup and Recovery
4. Replication
5. Server Troubleshooting
6. Monitoring
7. Security
8. Hardware and OS
9. InnoDB Basics
10. InnoDB Internals
11. Concurrency & Locking
12. InnoDB Diagnostics
13. Wrap-Up

Welcome Everybody!

• Thank you for choosing a #Percona Training course.
• We also offer additional services:
  – Consulting
  – Support
  – Remote DBA
  – Development
• We support all MySQL distributions, including Oracle MySQL, MariaDB and Percona Server.

Housekeeping

• Bathrooms
• When do we break for lunch?
• When does the day end?
• Can I get a copy of the slides?
  – Yes: http://learning.percona.com

What You Need to Already Know

• Basic MySQL (DDL, DML, etc.)
• How to tune queries (using EXPLAIN)
• Basic operating systems (Linux) and system administration concepts and commands
  – processes, threads, ssh, top, rsync, nc, ...
• Basic hardware understanding
  – RAID, SSD vs. HDD, cache, filesystems, ...

Content in Detail

• Course introduction (this!)
• Basics and Tools
  – Storage engines and their main features
  – MySQL history, versions and current support
  – Different MySQL flavors and forks
  – MySQL provided tools
  – Essential third party tools (incl. Percona Toolkit)
• Backup and Recovery
  – Logical backups (mysqldump, mydumper)
  – Physical backups (mylvmbackup, XtraBackup)
  – Backup strategies and validation
• Replication
  – Replication internals
  – Setup and administration
  – Troubleshooting replication, helper tools and alternatives
• Server Troubleshooting
  – Understanding hardware components that affect MySQL performance
  – Learning 3rd party tools to profile server performance
  – Evaluating the current status of the server (SHOW GLOBAL STATUS) to find common bottlenecks, and increasing performance by changing the configuration
• Monitoring
  – Best practices for monitoring
  – Alerting tools (Nagios)
  – Trending tools (Cacti)
• Security
  – Configuration best practices for security
  – Authentication and authorization in MySQL
  – Potential security threats for MySQL
• Hardware and operating systems
  – Discussion on best practices for hardware purchase and operating system tuning
• InnoDB internals
  – On-disk information: tablespaces, file-per-table, row formats, log files, undo space
  – In-memory information: buffer pool and other structures
  – Logical InnoDB organization for data and indexes: clustered index
  – Advanced structures: adaptive hash, insert buffer and doublewrite
  – Essential configuration parameters
• Concurrency and Locking
  – MVCC implementation and isolation levels
  – Old data purging
  – Locking and its performance implications
  – Maintenance tasks and background threads
• InnoDB Diagnostics
  – Solving issues and improving InnoDB performance by using SHOW ENGINE INNODB STATUS, SHOW GLOBAL STATUS, INFORMATION_SCHEMA tables and other tools, including Percona XtraDB extensions

Useful Resources

• Online
  – MySQL Reference Manual
  – MySQL Internals Manual: InnoDB Storage Engine
  – MySQL Performance Blog
• Books
  – High Performance MySQL, 3rd edition

MySQL Basics and Tools

Table of Contents

1. Storage Engines
2. MySQL Versions and Support
3. MySQL Supplied Tools
4. Essential Third Party DBA Tools
5. Percona Toolkit

MySQL Basics and Tools: STORAGE ENGINES

Storage Engines

• MySQL separates SQL from storage.
  – Replication, partitioning and stored procedures all happen above the storage engine layer.
  – Storage happens in the storage engines.
• The most popular storage engine is InnoDB.
  – InnoDB is the default as of MySQL 5.5.
  – For 99% of people, MyISAM is probably the wrong choice.

Storage Engines: InnoDB

• Most popular; default since MySQL 5.5.
• Row-level locking.
• ACID transactions.
• Automatic crash recovery.
• Caches data and indexes.
• Referential integrity (foreign keys).
• Better performance and scalability when tuned well.
• Full-text indexing as of MySQL 5.6.

Storage Engines: MyISAM

• Default storage engine through MySQL 5.1.
• Table-level locking.
• Relies on filesystem caching, with a risk of corruption.
• Full-text indexing.
• GIS indexing.

Storage Engines: Others

• MEMORY
  – Stores data in volatile memory.
• BLACKHOLE
  – Stores no data, like /dev/null. Very fast! Small footprint!
  – Useful as a dummy target, while DML is written to the binlog.
• CSV
  – Stores data in text files using a comma-separated value format.
• ARCHIVE
  – Stores large amounts of unindexed data with transparent compression.
  – Supports only INSERT and SELECT.

Storage Engines: Not Recommended

• MERGE
  – Interface to a collection of identical MyISAM tables as one table.
  – Use partitioning instead.
• FEDERATED
  – Lets you access data from remote MySQL instances without using replication or cluster technology.
  – Roughly analogous to Oracle Database Links.
  – Not recommended; stability and performance issues.

Changing a Table’s Storage Engine

• Simple to convert:
    mysql> ALTER TABLE name ENGINE=InnoDB;
• The statement rebuilds the table (as many ALTER statements do), and the table is locked for the duration.
• Test carefully: you could truncate data or lose table details if data types or index types are not supported in the new storage engine.
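Before converting anything, it can help to see which tables would be affected. The following is only a sketch of one way to do that, not a Percona tool: it asks information_schema for every MyISAM table outside the system schemas and prints the matching ALTER TABLE statement so the list can be reviewed first. The function name list_myisam_conversions is made up for this example, and it assumes the mysql client can connect using your default options.

```shell
#!/bin/sh
# Print (but do not run) one ALTER TABLE statement per MyISAM table.
# The heredoc is quoted ('SQL') so the shell leaves the backticks alone.
list_myisam_conversions() {
    mysql --batch --skip-column-names <<'SQL'
SELECT CONCAT('ALTER TABLE `', table_schema, '`.`', table_name, '` ENGINE=InnoDB;')
FROM information_schema.tables
WHERE engine = 'MyISAM'
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');
SQL
}
```

Redirecting the output to a file (for example list_myisam_conversions > convert.sql) gives a script that can be tested on a copy of the data before it is run in production.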

MySQL Basics and Tools: MYSQL VERSIONS AND SUPPORT

Show of Hands

• What version of MySQL is everyone using?
  – MySQL 5.0 or earlier?
  – MySQL 5.1?
  – MySQL 5.5?
  – MySQL 5.6?
  – ... a third party release?

MySQL Lifetime Support Policies

Release             GA Date   Premier Support End  Extended Support End  Sustaining Support End
MySQL Database 5.0  Oct 2005  Dec 2011             Not available         Indefinite
MySQL Database 5.1  Dec 2008  Dec 2013             Not available         Indefinite
MySQL Database 5.5  Dec 2010  Dec 2015             Dec 2018              Indefinite
MySQL Database 5.6  Feb 2013  Feb 2018             Feb 2021              Indefinite
MySQL Cluster 6     Aug 2007  Mar 2013             Not available         Indefinite
MySQL Cluster 7     Apr 2009  Apr 2014             Not available         Indefinite

Source: http://www.oracle.com/us/support/library/lifetime-support-technology-069183.pdf

Upgrading

• Migrating from 5.1.xx to 5.1.yy “should be safe”.
  – Internally MySQL has a policy of “no new features in a point release”*.
  – The hardest regressions to catch are performance and optimizer related. They affect everyone differently.

* This policy is occasionally broken.

Upgrading (cont.)

• Migrating between major releases (e.g. 5.5.x to 5.6.x) is not guaranteed to be safe.
  – Features can be added or removed.
  – The on-disk format can change in an incompatible way.
• Incompatibilities are documented:
  – http://dev.mysql.com/doc/refman/5.0/en/upgrading-from-previous-series.html
  – http://dev.mysql.com/doc/refman/5.1/en/upgrading-from-previous-series.html
  – http://dev.mysql.com/doc/refman/5.5/en/upgrading-from-previous-series.html
  – http://dev.mysql.com/doc/refman/5.6/en/upgrading-from-previous-series.html

Version Number Soup

• It is possible to get confused looking at numbers:
  – MySQL Cluster (NDB) has its own versioning scheme; it is currently up to 7.2.
  – 5.2 was rebranded as 6.0; but 6.0 was canceled.
  – 5.4 was created to replace 6.0; some code in 6.0 was backported.
  – 5.4 was rebranded as 5.5.
  – InnoDB (plugin) now has its own versioning:
    • InnoDB 1.0.x for MySQL 5.1
    • InnoDB 1.1.x for MySQL 5.5

MySQL in Linux Repositories

Version    RHEL  CentOS  Ubuntu               Debian         OpenSuSE          Rpmfusion.org  Webtatic.com  Mariadb.com  Percona.com
MySQL 5.0  5.x   5.x     8.04                 --             11.0, 11.1        --             --            --           Yes [1]
MySQL 5.1  6.x   6.x     10.04, 11.10         6.0 (squeeze)  11.2, 11.3, 11.4  --             --            --           Yes [1]
MySQL 5.5  --    --      12.04, 12.10, 13.04  (wheezy)       12.2              Yes            Yes           Yes [2]      Yes [1]
MySQL 5.6  --    --      --                   --             --                --             --            --           Yes [1]

[1] Percona.com offers Percona Server packages.
[2] Mariadb.com offers MariaDB packages.

Forks and Patches

[Figure: lineage diagram of MySQL forks and patches. Upstream releases: Sun Microsystems MySQL 5.1, Oracle MySQL 5.5, Oracle MySQL 5.6. Storage engines: MyISAM, InnoDB (builtin), InnoDB Plugin 1.0, InnoDB Plugin 1.1, InnoDB Plugin 1.2, Aria, XtraDB. Derived servers: Percona Server 5.1, 5.5 and 5.6; MariaDB 5.1, 5.5 and 10; Drizzle. The diagram also carries an "(alpha)" marker.]

Percona Server

• At the MySQL level, very little is changed.
  – There is some additional instrumentation in the slow query log and some additional user statistics.
• As newer versions of MySQL Server are released, Percona will rebase its patches against them.
  – It’s not a true fork.
  – The on-disk format remains the same*.
  – There is no intention to rebase against every new MySQL release, and no promises as to reaction time when security vulnerabilities are found.

* Unless some specific features are enabled.

Percona Server (cont.)

• The main change is in storage engines.
  – Percona Server is built with XtraDB, a modified version of the InnoDB storage engine.
• XtraDB has a number of performance and usability enhancements over InnoDB.
  – XtraDB is not a true fork either: it rebases itself against newer versions of the InnoDB plugin.

MariaDB

• Monty Program’s enhanced version of MySQL Server.
  – It adds new features to MySQL, fixes bugs, and enables a lot more storage engines.
  – It also rebases off newer MySQL releases (up to 5.5).
  – It focused first on creating a transactional and crash-safe engine based on MyISAM: Aria.
  – Its main focus now is SQL layer optimization.

Facebook

• Facebook maintains its own patches for MySQL.
  – They do not release binaries, but a source tree is available on Launchpad.
  – More information at http://www.facebook.com/MySQLatFacebook

Drizzle

• The only true fork: it has no intention to maintain server-level compatibility.
  – However, there is API compatibility. The MySQL protocol remains as a way to speak to Drizzle.
• Does not change much from the storage engine perspective, but changes a lot above the storage engine layer.
• Status: GA (March 2011).
  – Second stable release in April 2012.

Additional Notes

• Other ‘forks’:
  – The Google patches are no longer being worked on.
  – Twitter patches: http://engineering.twitter.com/2012/04/mysql-at-twitter.html
• New storage engines:
  – PBXT, TokuDB, InfiniDB, Infobright.
• You are free to ask questions, but we won’t be talking about these today.
  – Some only solve niche markets, and not all are stable / ready for production.

MySQL Basics and Tools: MYSQL SUPPLIED TOOLS

The MySQL Server

• Start and stop MySQL Server with the init script:
    $ /etc/init.d/mysql [start|stop|restart|status]
• Some Linux distributions also support this style:
    $ service mysql [start|stop|restart|status]
• The init script launches mysqld_safe. This watchdog script runs the daemon mysqld, and restarts the daemon if it exits abnormally.

The MySQL Server (cont.)

• The mysqld daemon runs many threads for all the work of listening for client connections, running queries, logging, doing I/O, etc.

    $ pstree -a
    init
      ├─mysqld_safe /usr/bin/mysqld_safe --datadir=/var/lib/mysql --pid-file=/var/run/mysqld/mysqld.pid
      │   └─mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --pid-file=/v
      │       ├─{mysqld}
      │       ├─{mysqld}
      │       ├─{mysqld}

The MySQL Client

• The mysql client runs SQL commands interactively, or executes an SQL script in batch mode.
• You can enable default client options in $HOME/.my.cnf:

    [client]
    host = db1
    user = scott
    password = tiger

Client Builtins

• mysql> pager command
  – Filter output through a shell program. Turn off with nopager.
• mysql> tee file
  – Log session to a file. Turn off with notee.
• mysql> warnings
  – Show any warnings by default. Turn off with nowarning.
• mysql> edit
  – Edit the current command in $EDITOR.
• mysql> prompt string
  – Add metacharacters to prompt.
• mysql> delimiter string
  – Use string as statement terminator instead of the default ;
  – Needed for CREATE TRIGGER / PROCEDURE / FUNCTION, because those statements include unquoted ; characters.
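To make the delimiter builtin concrete, here is a minimal sketch of a batch script that creates a stored procedure. The procedure name count_demo_rows and the table demo are hypothetical, as is the wrapper function; the database name is passed as the first argument. Because the terminator is switched to //, the ; inside the procedure body no longer ends the CREATE PROCEDURE statement.

```shell
#!/bin/sh
# create_demo_procedure DATABASE
# Feeds a small script to the mysql client in batch mode.
create_demo_procedure() {
    mysql "$1" <<'SQL'
DELIMITER //
CREATE PROCEDURE count_demo_rows()
BEGIN
    SELECT COUNT(*) FROM demo;
END//
DELIMITER ;
SQL
}
```

The final DELIMITER ; line restores the default terminator, so any statements that follow in the same script behave as usual.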

Client Builtins (cont.)

• Vertical format output: \G statement terminator.
• readline/editline: controls to help simple command editing.
  – Control-A / E: move cursor to start / end of line.
  – Control-W: erase word.
  – Control-R: search SQL command history.
  – See http://www.faqs.org/docs/bashman/bashref_93.html

Other MySQL Tools

• mysqladmin: Run administration commands as arguments, making it easier to write scripts.
• mysqldump: Logical database dump tool.
• mysqlbinlog: Convert binary logs to SQL scripts.
• mysqlimport: Bulk load flat files to database.

MySQL GUI Tools

• MySQL Workbench
  – http://dev.mysql.com/downloads/workbench/
  – Browse database objects.
  – Prototype and test SQL queries.
  – Edit data model diagrams.
  – Administer server instances.
• MySQL Enterprise Monitor
  – Commercial tool available to subscribers of Oracle Support.
  – Monitoring and alerting for one or many MySQL instances.
  – Advisors for tuning and fixing issues.

Many Third-Party GUI Tools

Tool             Website                                      Price
dbForge Studio   http://www.devart.com/dbforge/mysql/studio/  Free to $99
HeidiSQL         http://www.heidisql.com/                     Free (GPL)
Navicat          http://www.navicat.com/                      $199-369
phpMyAdmin       http://www.phpmyadmin.net/                   Free (GPL)
Sequel Pro       http://www.sequelpro.com/                    Free (GPL)
SQLYog / MonYog  http://www.webyog.com/                       $99+ / $199+
Toad for MySQL   http://www.quest.com/toad-for-mysql/         Free

Note: Command Line Only

• Today’s examples assume use of the MySQL command line.
  – If you prefer to use MySQL Workbench or another GUI environment, you may do so on your own.

MySQL Basics and Tools: ESSENTIAL DBA THIRD PARTY TOOLS

Third Party Tools

• Percona Toolkit: helper scripts for DBA operations.
  http://www.percona.com/software/percona-toolkit/
• Innotop: dynamic console monitor for MySQL.
  http://code.google.com/p/innotop/
• Percona Monitoring Plugins: templates for Nagios & Cacti.
  http://www.percona.com/software/percona-monitoring-plugins/
• common_schema: “is to MySQL as jQuery is to JavaScript.”
  https://code.google.com/p/common-schema/
• MySQL Sandbox: run ad hoc instances.
  http://mysqlsandbox.net/
• New Relic: real-time application profiling.
  https://newrelic.com/percona-training

MySQL Basics and Tools: PERCONA TOOLKIT

Development Tools

• pt-duplicate-key-checker
  – Find duplicate indexes and foreign keys on MySQL tables.
• pt-online-schema-change
  – Perform online, non-blocking table schema changes.
• pt-query-advisor
  – Analyze queries and advise on possible problems.
• pt-query-digest
  – Analyze query execution logs and generate a query report.
• pt-show-grants
  – Canonicalize and print MySQL grants.
• pt-upgrade
  – Execute queries on multiple servers and check for differences.

Profiling Tools

• pt-index-usage
  – Read queries from a log and analyze how they use indexes.
• pt-pmp
  – Aggregate GDB stack traces for a selected program.

Configuration Tools

• pt-config-diff
  – Diff MySQL configuration files and server variables.
• pt-mysql-summary
  – Summarize MySQL information in a nice way.
• pt-variable-advisor
  – Analyze MySQL variables and advise on possible problems.

Monitoring Tools

• pt-deadlock-logger
  – Extract and log MySQL deadlock information.
• pt-fk-error-logger
  – Extract and log MySQL foreign key errors.
• pt-mext
  – Look at samples of SHOW GLOBAL STATUS side-by-side.

Replication Tools

• pt-heartbeat
  – Monitor MySQL replication delay.
• pt-slave-delay
  – Make a MySQL slave server lag behind its master.
• pt-slave-find
  – Find and print the replication hierarchy tree of MySQL slaves.
• pt-slave-restart
  – Watch and restart MySQL replication after errors.
• pt-table-checksum
  – Perform an online replication consistency check, or checksum MySQL tables efficiently on one or many servers.
• pt-table-sync
  – Synchronize MySQL table data efficiently.

System Tools

• pt-stalk
  – Collect information from a server for some period of time.
• pt-diskstats
  – Aggregate and summarize /proc/diskstats.
• pt-fifo-split
  – Split files and pipe lines to a fifo without really splitting.
• pt-summary
  – Summarize system information in a nice way.
• pt-sift
  – Browse files created by pt-stalk.

Utility Tools

• pt-archiver
  – Archive rows from a MySQL table into another table or a file.
• pt-find
  – Find MySQL tables and execute actions, like GNU find.
• pt-kill
  – Kill MySQL queries that match certain criteria.

Backup and Recovery

Table of Contents

1. Introduction
2. Backup Strategies
3. Logical Backups
4. Physical Backups
5. Incremental Backups
6. Point in Time Recovery
7. Delayed Slave
8. Validating Backups

Backup and Recovery: INTRODUCTION

Overview

• Many aspects to a good backup policy:
  – Fast to back up
  – Fast to recover
  – Recover up until point of failure
  – Recover a smaller portion of data
  – No production/availability impact
• We’ll start with the basics first, then fill in the other solutions later.

Backup Impact

• Hot backup
  – No locking; allows readers and writers to work on the database concurrently.
• Warm backup
  – Writes are blocked and must be queued until the backup is completed.
• Cold backup
  – System is unavailable for reads or writes during the backup window.

Backup Types: Physical

• Physical
  – Stores data in the “native” file format (e.g. .ibd files).
  – Faster to back up and restore.
  – Preserves physical fragmentation of data files, or even physical corruption.
  – Physical formats may also be version-dependent.

Backup Types: Logical

• Logical
  – Stores data in a generic, storage-independent format.
  – For example, a dump of SQL CREATE TABLE and INSERT statements to recreate the same logical data.
  – Slower to back up, slower to restore.

What’s the Best Solution?

• Backup in MySQL is a complicated story.
  – A hot backup is possible if you use all InnoDB tables.
  – As soon as you introduce one MyISAM table, the backup can only be warm.

Common Backup Tools

• mysqldump
• mydumper
• Filesystem Copy / LVM Snapshot
• MySQL Enterprise Backup
  – (requires Oracle support contract)
• Percona XtraBackup
• Backup from a replication slave (any of the above)

Comparing Backup Options

Mixed MyISAM and InnoDB:

Method                   Impact       Warmth    Backup time  Restore time  Flexibility
cold backup              very high    cold      very fast    fast          system
mysqldump                high         warm      medium       slow          row
snapshotting             high/medium  hot/warm  fast         fast          system
MySQL Enterprise Backup  low/medium   warm      fast         fast          table
Percona XtraBackup       low/medium   warm      fast         fast          table

Slide callouts:
• Popular for small databases.
• Popular for MyISAM-only databases.

InnoDB only:

Method                   Impact       Warmth    Backup time  Restore time  Flexibility
cold backup              very high    cold      very fast    fast          system
mysqldump                medium       hot       medium       slow          row
snapshotting             high/medium  hot/warm  fast         fast          system
MySQL Enterprise Backup  low          hot       fast         fast          table
Percona XtraBackup       low          hot       fast         fast          table

Slide callout:
• Popular for large InnoDB backups.

Backup and Recovery: LOGICAL BACKUPS

Example: mysqldump

    $ mysqldump --all-databases --single-transaction > backup-file.sql

• --all-databases dumps every database (including mysql).
• --single-transaction produces a hot backup, but it is only safe when all tables are InnoDB.
• --master-data=1 (optional) records the binary log coordinates.
• Restore with:
    $ mysql < backup-file.sql
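The dump and restore commands above can be wrapped in two small helpers. This is only a sketch: the function names dump_all and restore_all and the .partial suffix are choices made for this example, and the functions assume the mysqldump and mysql clients can connect with your default options. The dump is written to a temporary name first and renamed only if mysqldump exits successfully, so a failed run does not leave a truncated file that looks like a good backup.

```shell
#!/bin/sh
# dump_all FILE
# Dump every database into FILE. --single-transaction is only safe when
# all tables are InnoDB; --master-data=1 records the binary log coordinates.
dump_all() {
    mysqldump --all-databases --single-transaction --master-data=1 > "$1.partial" \
        && mv "$1.partial" "$1"
}

# restore_all FILE
# Replay a dump produced by dump_all.
restore_all() {
    mysql < "$1"
}
```

A typical call would be dump_all backup-file.sql, followed later by restore_all backup-file.sql on the server being rebuilt.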

mydumper

• Logical backup tool, “mysqldump reimagined.”
• http://mydumper.org/
• Multi-threaded, up to 10× faster than mysqldump, assuming you have enough cores and disks.
• File compression on the fly.
• Multi-threaded restore.
• Daemon mode for scheduled backups and continuous binary log dumps.

Install mydumper

• Download from https://launchpad.net/mydumper
• Package dependencies:
  – cmake glib2-devel mysql-devel pcre-devel zlib-devel
• Build:
    $ cmake .
    $ make
    $ make install

Example: mydumper

Dump the imdb database:

    $ mydumper --user=root --database=imdb --outputdir=dump --compress --verbose=3

Output:

    ** Message: Connected to a MySQL server
    ** Message: Started dump at: 2013-04-30 00:58:41

    ** Message: Written master status
    ** Message: Thread 1 connected using MySQL connection ID 17
    ** Message: Thread 2 connected using MySQL connection ID 18
    ** Message: Thread 3 connected using MySQL connection ID 19
    ** Message: Thread 4 connected using MySQL connection ID 20
    ** Message: Non-InnoDB dump complete, unlocking tables
    ** Message: Thread 1 dumping data for `imdb`.`aka_name`
    ** Message: Thread 2 dumping data for `imdb`.`aka_title`
    ** Message: Thread 3 dumping data for `imdb`.`cast_info`
    ** Message: Thread 4 dumping data for `imdb`.`char_name`
    ** Message: Thread 2 dumping data for `imdb`.`comp_cast_type`
    ** Message: Thread 2 dumping data for `imdb`.`company_name`
    ** Message: Thread 2 dumping data for `imdb`.`company_type`
    . . .

Resulting files:

    $ ls dump/
    imdb.aka_name-schema.sql.gz        imdb.link_type-schema.sql.gz
    imdb.aka_name.sql.gz               imdb.link_type.sql.gz
    imdb.aka_title-schema.sql.gz       imdb.movie_companies-schema.sql.gz
    imdb.aka_title.sql.gz              imdb.movie_companies.sql.gz
    imdb.cast_info-schema.sql.gz       imdb.movie_info_idx-schema.sql.gz
    imdb.cast_info.sql.gz              imdb.movie_info_idx.sql.gz
    imdb.char_name-schema.sql.gz       imdb.movie_info-schema.sql.gz
    imdb.char_name.sql.gz              imdb.movie_info.sql.gz
    imdb.company_name-schema.sql.gz    imdb.movie_keyword-schema.sql.gz
    imdb.company_name.sql.gz           imdb.movie_keyword.sql.gz
    imdb.company_type-schema.sql.gz    imdb.movie_link-schema.sql.gz
    imdb.company_type.sql.gz           imdb.movie_link.sql.gz
    imdb.comp_cast_type-schema.sql.gz  imdb.name-schema.sql.gz
    imdb.comp_cast_type.sql.gz         imdb.name.sql.gz
    imdb.complete_cast-schema.sql.gz   imdb.person_info-schema.sql.gz
    imdb.complete_cast.sql.gz          imdb.person_info.sql.gz
    imdb.info_type-schema.sql.gz       imdb.role_type-schema.sql.gz
    imdb.info_type.sql.gz              imdb.role_type.sql.gz
    imdb.keyword-schema.sql.gz         imdb.title-schema.sql.gz
    imdb.keyword.sql.gz                imdb.title.sql.gz
    imdb.kind_type-schema.sql.gz       metadata
    imdb.kind_type.sql.gz

Restore with myloader:

    $ myloader --database=imdb --directory=dump --queries-per-transaction=50000 --threads=4 --verbose=3

Output:

    ** Message: 4 threads created
    ** Message: Creating database `imdb`
    ** Message: Creating table `imdb`.`company_name`
    ** Message: Creating table `imdb`.`aka_name`
    ** Message: Creating table `imdb`.`info_type`
    ** Message: Creating table `imdb`.`person_info`
    . . .
    ** Message: Thread 1 restoring `imdb`.`comp_cast_type` part 0
    ** Message: Thread 2 restoring `imdb`.`company_name` part 0
    ** Message: Thread 3 restoring `imdb`.`role_type` part 0
    ** Message: Thread 4 restoring `imdb`.`aka_title` part 0
    ** Message: Thread 3 restoring `imdb`.`company_type` part 0
    ** Message: Thread 1 restoring `imdb`.`movie_link` part 0
    . . .
    ** Message: Thread 1 shutting down
    ** Message: Thread 3 shutting down
    ** Message: Thread 2 shutting down

Backup and Recovery: PHYSICAL BACKUPS

LVM Backup

• mylvmbackup
  – A wrapper around LVM, coordinating with MySQL.
  – http://www.lenzg.net/mylvmbackup/
• Requires a few things set up correctly to work:
  – InnoDB data and transaction logs are on the same partition.
  – The partition resides on an LVM volume.
  – The LVM volume has sufficient free space.

http://www.mysqlperformanceblog.com/2008/06/09/estimating-undo-space-needed-for-lvm-snapshot/
http://www.mysqlperformanceblog.com/2006/08/21/using-lvm-for-mysql-backup-and-replication-setup/

Percona XtraBackup

• Non-blocking, hot backup solution for InnoDB and XtraDB (supports MyISAM too).
  – Supports MySQL, Percona Server, MariaDB, Drizzle.
• Open-source, free (GPL).

http://www.percona.com/software/percona-xtrabackup

Percona XtraBackup Features

• Compressed backups
• Partial backups
• Throttling by IOPS
• PITR support
• Incremental backups
• Streaming backups
• Parallel backups
• Export individual tables
• Restore tables to a different server
• Analyze data & index files
• Encrypted backups*
• MySQL 5.6 support*

* Supported in Percona XtraBackup 2.1

What’s the Difference Between…

InnoDB only:

Method              Impact       Warmth    Backup time  Restore time  Flexibility
snapshotting        high/medium  hot/warm  fast         fast          system
Percona XtraBackup  low          hot       fast         fast          table

• I/O performance can become worse while LVM is maintaining a snapshot.
  – You need to over-provision your I/O capacity significantly to compensate for this overhead.

http://www.mysqlperformanceblog.com/2009/02/05/disaster-lvm-performance-in-snapshot-mode/

But You Said...

InnoDB only:

Method              Impact  Warmth  Backup time  Restore time  Flexibility
mysqldump           medium  hot     medium       slow          row
Percona XtraBackup  low     hot     fast         fast          table

• mysqldump may back up faster in some cases:
  – When all data is in memory, but InnoDB uses O_DIRECT.
• It may restore faster in some cases:
  – A really small database with big InnoDB log files.
• Recovery time is more important.

Example: Percona XtraBackup

1. Create a full backup:
    $ innobackupex /path/to/backup/
   – Creates a DATETIME subdirectory.
   – Makes a physical copy of data files and log files.
2. Prepare the backup for restore:
    $ innobackupex --apply-log /path/to/backup/DATETIME
   – Reapplies changes from the original transaction log.
3. Restore by copying the prepared backup to the datadir:
    $ innobackupex --copy-back /path/to/backup/DATETIME
    $ chown -R mysql:mysql /data/mysql

Backup and Recovery: INCREMENTAL BACKUPS

Incremental/Differential Backup

• Back up only the changes that have happened since a previous backup.
  – Very helpful if there is a lot of data, but be careful how it impacts recovery time.
• Differential means cumulative changes since the last full backup.
  – Restore in two steps: restore the last full backup, then restore the differential.
• Incremental means changes since the last full backup or the last incremental backup.
  – Restore in multiple steps: restore the last full backup, then restore every incremental backup in order.
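The difference in restore steps can be illustrated with a small helper that works out which backups must be restored, and in what order. This is a sketch for illustration only: the function name restore_chain and the full / diff / incr labels are invented here and are not part of any backup tool. It reads one backup per line, oldest first.

```shell
#!/bin/sh
# restore_chain: read "<type> <name>" lines on stdin (oldest first) and
# print the names to restore, in order, to reach the newest backup.
restore_chain() {
    awk '
        # A full backup starts a new chain.
        $1 == "full"           { n = 0; chain[++n] = $2; next }
        # Every incremental since the full backup is needed, in order.
        $1 == "incr" && n >= 1 { chain[++n] = $2; next }
        # A differential is cumulative: keep the full backup plus the latest one.
        $1 == "diff" && n >= 1 { n = 1; chain[++n] = $2; next }
        END { for (i = 1; i <= n; i++) print chain[i] }
    '
}
```

For a week that starts with a full backup on Sunday, three daily incrementals mean four restore steps on Wednesday, while three daily differentials still mean only two. That is the recovery-time trade-off the slide warns about.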

Incremental/Differential Backup (cont.)

• A few tools support incremental or differential backups:
  – Binary logs (all storage engines).*
  – XtraBackup (InnoDB only).
    • http://www.percona.com/doc/percona-xtrabackup/howtos/recipes_ibkx_inc.html

* Some restrictions with its use. http://www.mysqlperformanceblog.com/2009/07/21/just-how-useful-are-binary-logs-for-incremental-backups/

Example: Percona XtraBackup

1. Create a full backup:
    $ innobackupex /path/to/backup
   – Note the DATETIME subdirectory as $FULLBACKUP.
2. Prepare the full backup, without rolling back uncommitted changes:
    $ innobackupex --apply-log --redo-only $FULLBACKUP
3. Create an incremental backup:
    $ innobackupex --incremental /path/to/inc --incremental-basedir=$FULLBACKUP
   – Note the DATETIME subdirectory as $INCRBACKUP.
4. Apply the incremental:
    $ innobackupex --apply-log --redo-only $FULLBACKUP --incremental-dir=$INCRBACKUP
5. Repeat steps 3-4 each day (or at the interval you choose).
6. Prepare to restore:
    $ innobackupex --apply-log $FULLBACKUP
7. Restore as you would a normal full backup:
    $ innobackupex --copy-back $FULLBACKUP

Segmented Backups

• Intentionally partition the backup. When there is a disaster, you can recover the more important data as fast as possible.
  – For example, back up a 20GB users database, then separately back up 200GB of click-logging data.
  – At a restore rate of 50MB/s, it takes about 6.8 minutes to be back online, instead of over 1 hour.
  – Restore the click-logging data at a later time.
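The figures above follow from dividing the data size by the restore rate. As a quick worked check, the hypothetical helper below (the name restore_minutes is invented for this example) does the arithmetic, assuming 1 GB = 1024 MB and a constant restore rate.

```shell
#!/bin/sh
# restore_minutes SIZE_GB RATE_MB_PER_S
# Print the estimated restore time in minutes, to one decimal place.
restore_minutes() {
    awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", gb * 1024 / rate / 60 }'
}
```

restore_minutes 20 50 prints 6.8 for the users database alone, and restore_minutes 220 50 prints 75.1 for both data sets restored together, which is where "over 1 hour" comes from.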

Segmented Backups (cont.)

• No general-purpose tools exist to automate this.
  – Defining the subsets of data is application-specific.
  – Sites that use this technique must develop in-house scripts specific to their needs.
• The easiest way to partition backups is via databases.

Segmented Backups (cont.)

• mysqldump can specify databases, tables (--tables), and even rows (--where):
    $ mysqldump databasename --where="id > 1" > dump.sql

Segmented Backups (cont.)

• XtraBackup can specify databases (--databases), tables (--tables-file), or filter databases/tables by regular expressions (--include).
  – Recovery of a subset of tables requires separate tablespaces (innodb_file_per_table) and Percona Server with XtraDB.
  – http://www.percona.com/doc/percona-xtrabackup/innobackupex/partial_backups_innobackupex.html

Example: Percona XtraBackup

1. Create a backup of the world database using a regular expression:
    $ innobackupex --include='^world[.]' /path/to/backup
2. Prepare the backup:
    $ innobackupex --apply-log --export /path/to/backup/DATETIME
3. Create empty tables:
    mysql> CREATE DATABASE world2;
    mysql> CREATE TABLE world2.City LIKE world.City;
4. Remove tablespaces:
    mysql> ALTER TABLE world2.City DISCARD TABLESPACE;
5. Copy the .ibd file into place:
    $ cp /path/to/backup/DATETIME/world/City.{ibd,exp} /data/mysql/world2/
    $ chown mysql:mysql /data/mysql/world2/City.*
6. Enable tablespace import:
   – Percona Server 5.1:
    mysql> set global innodb_expand_import=1;
   – Percona Server 5.5:
    mysql> set global innodb_import_tables_from_xtrabackup=1;
7. Import the new tablespaces:
    mysql> ALTER TABLE world2.City IMPORT TABLESPACE;

Backup and Recovery: POINT IN TIME RECOVERY

Backups Miss Subsequent Changes

[Diagram: 1. a full backup captures changes up to binlog 25 (binary logs 20-25); binary logs 26-30 are written afterward, then disaster strikes; 2. restoring the full backup recovers data only up to binlog 25.]

Point in Time Recovery

• In combination with the binary log, you can recover up until the point of failure, or recover and skip over a malicious statement.
• The mysqlbinlog tool converts binary log files into SQL scripts that you can replay.
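As a rough sketch of how a replay command might be assembled, the Python helper below builds a mysqlbinlog command line. The helper name, the binlog file names, the start position, and the timestamp are all hypothetical; --start-position and --stop-datetime are standard mysqlbinlog options. The sketch only builds and prints the command. It does not run anything.

```python
import shlex


def mysqlbinlog_command(binlogs, start_position=None, stop_datetime=None):
    """Build an argv list that converts `binlogs`, in order, to SQL."""
    if not binlogs:
        raise ValueError("at least one binary log file is required")
    argv = ["mysqlbinlog"]
    if start_position is not None:
        # Applies to the first log file named on the command line.
        argv.append(f"--start-position={start_position}")
    if stop_datetime is not None:
        # Stop just before the bad statement (hypothetical timestamp below).
        argv.append(f"--stop-datetime={stop_datetime}")
    argv.extend(binlogs)
    return argv


cmd = mysqlbinlog_command(
    ["mysql-bin.000026", "mysql-bin.000027"],   # hypothetical file names
    start_position=107,                          # hypothetical coordinate
    stop_datetime="2013-06-01 11:59:59",         # hypothetical cut-off
)
# The generated SQL would typically be piped into the mysql client.
print(shlex.join(cmd) + " | mysql")
```

Listing all binlogs in one invocation, rather than one invocation per file, keeps session state such as temporary tables intact across file boundaries.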

Replay Changes from Logs

[Diagram: 1. take a full backup, up to binlog 25; 2. after the disaster, restore the full backup, up to binlog 25; 3. convert binlogs 26-30 to SQL and replay them.]

Exercise: Point in Time Recovery

• See exercise instructions in HTML file.

What if We Lose the Binary Logs?

• PITR requires a continuous set of binlogs from the time of the last backup forward.
• You can rsync your binlogs to a fileserver.
• MySQL 5.6 supports remote binlog download using the mysqlbinlog tool.

Backup and Recovery: DELAYED SLAVE

Delayed Slave

• Not a backup, but a kind of safety net.
• Helps recover from accidental changes:
  – A slave delays applying changes from the master (by 1 hour, 1 day, etc.).
  – In case of disaster, stop replication on the slave just before the malicious statement.
• MySQL 5.6 does this natively:
    CHANGE MASTER TO ... MASTER_DELAY=N; /* seconds */
• MySQL < 5.6 can use pt-slave-delay:
  – http://www.percona.com/doc/percona-toolkit/pt-slave-delay.html

Backup and Recovery: VALIDATING BACKUPS

Warning! Backups Can Fail

• An error during backup can produce invalid files.
• Don’t wait for an emergency to discover that you can’t restore your backup!
• Test restoration regularly.

How to Validate a Backup? (1)

1. Restore the backup on a test server.
2. Configure it as a slave of the backup’s source instance.
3. Use pt-table-checksum on the source instance.
4. Verify the checksums show no drift.

How to Validate a Backup? (2)

1. Restore the backup on a test server.
2. Apply some binary logs from the backup’s source instance.
3. Check for errors.

How to Validate a Backup? (3)

1. Restore the backup to a test server.
2. Run CHECKSUM TABLE tablename on selected tables.
  – http://dev.mysql.com/doc/refman/5.6/en/checksum-table.html
3. Comparing against the original checksums requires that you collected checksums on the source instance while the database was locked.
4. Even without a comparison, the checksum at least confirms that you can read every row in the checked tables.

How to Validate a Backup? (4)

1. Restore the backup to a test server.
2. Run a few smoke-test queries against selected tables.
3. Check that the queries return expected data, e.g.:
  – Count of rows.
  – Rows with recent date/time values.

Backing Up from a Slave

• This works fine, provided your slave is a true replica of the master.
• You must verify that the slave is a true replica before you back up.
  – pt-table-checksum to detect slave drift
  – pt-table-sync to correct slave drift

Replication

Table of Contents

1. Overview
2. Setting Up Replication
3. Under the Hood
4. Topologies
5. Administration and Maintenance
6. Problems and Solutions
7. Tools and Alternative Technologies

Replication: OVERVIEW

Replication Overview

• Replication is a mechanism for recording a series of changes on one database server and applying the same changes to a replica.
• The source is the “master” and its replica is a “slave.”

Replication Solutions

• High Availability: If the master server crashes, the slave serves as a hot spare.
• Load Balancing: The application can send some read queries to the slave, giving you greater capacity for read-only query load.
• Backups: You can create database backups on a slave, without worrying about impacting production traffic.
• Dedicated Queries: Reports or other offline tasks can read data from a slave.
• Data Distribution: The slave can be an off-site replica that is continually up to date.
• Testing: Experiment with queries, MySQL tuning, or version upgrades you aren’t ready to use on the master.

How Replication Works

• The master records committed changes in its binary log.
• The slave’s IO thread continually downloads the master’s binary logs.
  – These copies on the slave are called relay logs.
• The slave’s replication SQL thread executes the changes against its copy of the database.
  – Master and slave stay in sync as incremental changes are applied.
• Replication is asynchronous by default.
  – The slave can stop executing changes or stop downloading logs, and resume later where it left off.

[Diagram: the master writes its binary log; the slave copies it into its relay log and applies the changes.]

Semi-Synchronous Replication

• A commit on the master waits for at least one semi-sync slave to confirm receipt of the binary log events.
• This assures that a change is logged in two places, although the slave can still lag behind in executing the changes.

Clarification on the Binary Log

• In Oracle and some other RDBMS implementations, the transaction log is also used for replication.
• MySQL has two separate change logs:
  – InnoDB transaction log: records physical changes to InnoDB data pages, to ensure durability. Used only during crash recovery.
  – Binary log: records logical changes to data. Used for replication and point-in-time recovery.

Replication: SETTING UP

Setting Up Replication

1. Enable binary logs on the master.
2. Assign a server id to each server.
3. Grant a user on the master server.
4. Initialize the slave with a replica of data.
5. Configure the slave.
6. Start replication.

1. Enabling Binary Logs

• The log_bin config variable names a filename prefix for binlog files.
• MySQL generates a numeric suffix, incrementing it as it allocates new files.
• Configure in /etc/my.cnf:
    [server]
    log_bin = mysql-bin
• Enabling or disabling the binary log, or changing the file prefix, requires a restart of the MySQL instance.

2. Assigning Server-Id

• Each MySQL instance in a replication chain must have a distinct server id.
• Use any distinct positive integer between 1 and 2^32-1.
• Server id 0 means the instance cannot be a master or a slave.
• Configure in /etc/my.cnf:
    [server]
    server_id = 1234
• Changing the server id requires a restart of the MySQL instance.
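The server id rules above lend themselves to a quick sanity check before wiring servers together. The Python sketch below is only an illustration: the function name and the host names are hypothetical, and the ids would in practice be collected from each instance's configuration.

```python
MAX_SERVER_ID = 2**32 - 1


def check_server_ids(ids):
    """Return a list of problems found in a mapping of host -> server_id."""
    problems = []
    seen = {}
    for host, server_id in ids.items():
        if not 1 <= server_id <= MAX_SERVER_ID:
            problems.append(f"{host}: server_id {server_id} is outside 1..{MAX_SERVER_ID}")
        elif server_id in seen:
            problems.append(f"{host}: server_id {server_id} duplicates {seen[server_id]}")
        else:
            seen[server_id] = host
    return problems


# Hypothetical hosts: db3 reuses db1's id, and db4 was left at 0.
for problem in check_server_ids({"db1": 1, "db2": 2, "db3": 1, "db4": 0}):
    print(problem)
```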

3. Creating the Replication User

• The slave needs to connect to the master to download binary logs.
• The user must have at least the REPLICATION SLAVE privilege.
• You may also grant the REPLICATION CLIENT privilege so this user can run commands to report replication status.
    mysql> GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO repl@'192.168.0.%' IDENTIFIED BY 'xyzzy';

4. Initialize Data for the Slave

• Changes in the binary log are incremental, so the master and slave should start with a common baseline of data.
• The most important thing to do is note the current binary log file and position when you capture the initial data.
• For example, you can include the binary log coordinates in any backup from the master.
    $ mysqldump --master-data=1 …other options…
• Restore the data dump on the slave.

Locating Master Binlog Coordinates

• You can also view the current binlog position on the master at the time you capture the initial data you use for the slave:
    mysql> SHOW MASTER STATUS\G
    File: mysql-bin.000023
    Position: 107

5. Configure Replication

• Run on the slave:
    mysql> CHANGE MASTER TO
             MASTER_HOST='masterdb',
             MASTER_USER='repl',
             MASTER_PASSWORD='xyzzy',
             MASTER_LOG_FILE='mysql-bin.000023',
             MASTER_LOG_POS=107;
• Use the log file and position you noted on the master when you created the initial data.

5. Configure Replication (cont.)

• Older versions of MySQL supported options in /etc/my.cnf to configure the slave, but this is deprecated.
  – It was a bad idea anyway, since your server may restart and reset the binlog coordinates the slave subscribes to.

6. Start Replication

• Run on the slave:
    mysql> START SLAVE;
• To stop the slave:
    mysql> STOP SLAVE;
• You can also independently start and stop the IO thread (downloading binary logs) and the SQL thread (executing relay logs):
    mysql> START SLAVE IO_THREAD;
    mysql> START SLAVE SQL_THREAD;

Check Replication Status

    mysql> SHOW SLAVE STATUS\G

    Master_Host: 192.168.56.110
    Master_User: repl
    Master_Port: 3307
    Master_Server_Id: 2
    Connect_Retry: 60
    Master_Log_File: db2.000019
    Read_Master_Log_Pos: 302
    Relay_Master_Log_File: db2.000019
    Exec_Master_Log_Pos: 302

    Slave_IO_State: Waiting for master to send event
    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
    Seconds_Behind_Master: 0
    Last_Errno: 0
    Last_Error:
    Last_IO_Errno: 0
    Last_IO_Error:
    Last_SQL_Errno: 0
    Last_SQL_Error:
    Skip_Counter: 0

    Relay_Log_File: relay.000007
    Relay_Log_Pos: 4
    Relay_Log_Space: 107
    Until_Condition: None
    Until_Log_File:
    Until_Log_Pos: 0

    Replicate_Do_DB:
    Replicate_Ignore_DB:
    Replicate_Ignore_Server_Ids:
    Replicate_Do_Table:
    Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
    Replicate_Wild_Ignore_Table:

    Master_SSL_Allowed: No
    Master_SSL_CA_File:
    Master_SSL_CA_Path:
    Master_SSL_Cert:
    Master_SSL_Cipher:
    Master_SSL_Key:
    Master_SSL_Verify_Server_Cert: No

http://dev.mysql.com/doc/refman/5.6/en/replication-administration-status.html

Exercise: Set Up Replication

1. Configure replication between two instances.
2. Start replication and check replication status.
3. Verify that replication is running by creating a dummy table on the master and then looking for it on the slave.
    mysql> CREATE TABLE test.foo (id INT PRIMARY KEY);
4. Stop replication.
5. Create another dummy table on the master and look for it on the slave. It should not be there yet.
6. Start replication and look for the new table on the slave again.

Replication: UNDER THE HOOD

Replication Under the Hood

• Binary log formats
• More on log files
• Chains of replication
• Replication filtering

Binary Log Formats

• You can set the default binary log format on the master, in /etc/my.cnf, to one of STATEMENT, ROW, or MIXED:
    [server]
    binlog_format = STATEMENT
• In theory, you can change this dynamically, but some errors have been reported when attempting this on a busy server.
• To be safe, at least stop applications from making changes before changing binlog_format globally.

Statement Based Binary Logs

• The binary log can contain SQL statements to be executed.
• The slave re-parses SQL statements from the relay log and executes them against its replica data.
• Sensitive to discrepancies in data. Applying changes against wrong data can propagate and worsen the drift.
• Some statements are by nature non-deterministic, or have different effects on the master vs. the slave:
    UPDATE tablename SET ... WHERE columnname > SYSDATE();

Row Based Binary Logs

• The binary log contains the result of changes executed on the master. That is, copies of the rows affected by changes.
• Applying on the slave does not run SQL, it simply replaces the rows.
• Pros:
  – Avoids re-executing costly statements, possibly reducing CPU load on the slave.
  – Protects against slave drift in many cases.
  – Reduces locks necessary to ensure changes are applied in the correct order.

Row Based Binary Logs (cont.)

• Cons:
  – When a statement applies to many rows, all affected rows need to be copied into the binary log file.
  – Binary logs contain the row image before and after the change.
  – MySQL 5.6 mitigates this, storing only columns that changed.

Mixed Mode Binary Logs

• Defaults to STATEMENT format, and typically uses STATEMENT almost all the time.
• Switches to ROW format only for statements that MySQL detects are unsafe for replication.
• DDL (CREATE/ALTER/DROP) is always logged in STATEMENT format regardless.

Auxiliary Replication Files

• mysql-bin.index: MySQL uses this file to catalog which binary logs exist.
• mysql-relay-bin.index: A slave uses this file to catalog its relay logs.
• master.info: A slave stores its replication parameters here (those you set with CHANGE MASTER), including the replication password in plain text.
• relay-log.info: A slave uses this file to record how far it has executed changes.
  – Percona Server tracks a slave’s position in a crash-safe way: http://www.percona.com/doc/percona-server/5.5/reliability/innodb_recovery_update_relay_log.html
  – MySQL 5.6 does this too.

Suppressing Binary Logging

• You can make changes in a session without logging.
    mysql> SET SESSION SQL_LOG_BIN=0;
    mysql> ALTER TABLE title ADD INDEX (title(50), production_year);
• Resume logging by setting the variable back to 1, or else simply end the current session.
• Common technique to reduce downtime:
  – Apply changes to a slave.
  – Swap the roles of a slave and its master.
  – Apply changes to the former master.

Chains of Replication

• A slave can be the master of a downstream slave. The middle slave must write the changes it replicates to its own binary log.
    [server]
    log_bin = mysql-bin
    log_slave_updates = 1

[Diagram: Master (binary log) → Slave (relay log, binary log) → Slave (relay log)]

Chains and Binlog Format

• Tip: Intermediate slaves should use binlog_format=STATEMENT.
  – If the master sends STATEMENT binlog records, these stay in STATEMENT.
  – If the master sends ROW binlog records, these stay in ROW.
  – This configuration is important to support table checksums.

Replication Filters

• Replicate partial data, so slaves handle less traffic.
• http://dev.mysql.com/doc/refman/5.6/en/replication-rules.html

Replication Filter on the Master

• The master writes to its logs for only some databases.
• Then all slaves apply all changes in the logs.

[Diagram: Server 1 holds the databases wordpress, sessions, and logs; Servers 2, 3, and 4 each receive only wordpress.]

Replication Filter on the Slave

• The master writes changes for all databases to its logs.
• Then each slave downloads all binary logs, but executes only changes against specific databases.

[Diagram: Server 1 holds the databases wordpress, sessions, and logs; Server 2 applies wordpress, Server 3 applies sessions, and Server 4 applies logs.]

Replication Filter Risks

• Multi-database updates don’t work.
• Table checksums must be run per database.
• Slaves cannot be promoted to master, because they don’t have a complete set of databases.

Replication: TOPOLOGIES

Replication Topologies

• Master-Slave
• Master-Master
• Tiered-Slave
• Tree
• Master-Master + Tree
• Dual-Tree
• Ring

Master-Slave Topology

• Architecture suitable for most projects.
• Use case: slaves for running backups, analytics, or increasing capacity for read queries.
• Doesn’t help for failover/failback, or availability during upgrades.

Master-Multiple Slaves Topology

• Use case: additional slaves for other dedicated read-only queries (e.g. reporting), or increasing capacity for read queries.

[Diagram: Server 1 replicates to Servers 2, 3, and 4.]

Master-Master Topology

• Use CHANGE MASTER on both MySQL instances to subscribe each one to changes on the other.
• Safest if your applications write to one instance at a time; the other instance is set read-only.
• Use case: failover/failback, availability during upgrades.

[Diagram: a Writer/Reader pair whose roles can be swapped to Reader/Writer.]

Tiered-Slave Topology

• Avoids having multiple slaves each download binlogs from the master.
• Multiple downloads are not a burden for the master, but they cost bandwidth, for example if the slaves are in a remote data center.
• Use case: isolating sets of slaves. E.g., the slaves are in a separate data center, and this avoids redundant downloads of binlogs over the WAN.

[Diagram: Master → Slave → Slave]

Tree Topology

• Any slave can be a master for “downstream” slaves.
• Use case: mix of read scaling and isolating sets of slaves.

[Diagram: Server 1 replicates to Servers 2, 3, and 4; Servers 5 and 6 are downstream slaves in a second tier.]

Master-Master + Tree Topology

• One master-master pair, with additional slaves.
• All slaves use a single master, so that the passive master can be freely taken offline for maintenance or upgrades.
• Use case: mix of read scaling and failover.

[Diagram: Servers 1 and 2 form the master-master pair; Servers 3, 4, and 5 are slaves of one master.]

Dual-Tree Topology

• One master-master pair, with additional slaves on each master.
• Use case: mix of read scaling and failover to an alternate data center.

[Diagram: Servers 1 and 2 form the master-master pair; Servers 3, 4, 5, and 6 are slaves divided between the two masters.]

Ring Topology

• Possible, but not recommended.
• Use case: you get increased read capacity, and in theory any slave can take over as master.
• But if any instance fails, all downstream instances stop updating.

[Diagram: one Writer and three Readers connected in a ring.]

Exercises: Topologies

• Configure master-master replication.
• Apply a long-running change on one instance at a time, without writing to the binary log.
• Discuss which replication topologies will be relevant in your environment.

Replication: ADMINISTRATION AND MAINTENANCE

Replication Administration and Maintenance

• Starting slave automatically (or not)
• Managing log files
• Monitoring replication health
• Measuring slave lag
• Measuring slave drift
• Correcting slave drift
• Changing masters
• Failover and switchover

Starting Slave Automatically (or not)

• Replication slave threads start automatically, unless you set this in /etc/my.cnf:
    [server]
    skip_slave_start = 1
• Pros and cons of doing this?
  – Pro: gives the DBA the opportunity to CHANGE MASTER on the slave after startup (change the master, change the binlog coordinates, etc.).
  – Con: requires you to do one more manual step when restarting a slave.

Managing Log Files

• View the current binary logs at any time:
    mysql> SHOW BINARY LOGS;
    +------------+-----------+
    | Log_name   | File_size |
    +------------+-----------+
    | db1.000023 |       144 |
    | db1.000024 |       107 |
    +------------+-----------+

Managing Log Files (cont.)

• MySQL creates a new binary log file:
  – When the mysqld server restarts.
  – When the log file size exceeds max_binlog_size.
  – When you issue FLUSH LOGS;

Managing Log Files (cont.)

• Manually purge binary logs:
    mysql> PURGE BINARY LOGS TO 'db1.000024';
• Automatically purge binary logs:
    [server]
    expire_logs_days = 7
  – Percona Server also has an option to purge binary logs when storage exceeds a threshold, instead of by days.

Managing Log Files (cont.)

• RESET MASTER
  – Purges all binary logs and initializes master file and position to 1.
• RESET SLAVE
  – Rewrites the slave configuration with default values.
• RESET SLAVE ALL
  – Removes slave configuration completely.

Monitoring Replication Health

• Check for errors:
    mysql> SHOW SLAVE STATUS\G
    . . .
    Slave_IO_Running: Yes
    Slave_SQL_Running: No
    Last_Errno: 1062
    Last_Error: Error 'Duplicate entry '15218' for key 1' on query. Default database: 'db'. Query: 'INSERT INTO db.table ( FIELDS ) VALUES ( VALUES )'
    . . .
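A monitoring script can apply the same check automatically. The Python sketch below parses the text of SHOW SLAVE STATUS in its vertical format and reports the obvious problems. The function names and the sample text are illustrative assumptions; a real monitor would obtain the text from the mysql client or a database driver.

```python
def parse_slave_status(text):
    """Parse the text of SHOW SLAVE STATUS (vertical \\G format) into a dict."""
    status = {}
    for line in text.splitlines():
        # Split on the first colon only, since error messages contain colons too.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            status[key.strip()] = value.strip()
    return status


def replication_problems(status):
    """Return human-readable problems; an empty list means nothing obvious is wrong."""
    problems = []
    for thread in ("Slave_IO_Running", "Slave_SQL_Running"):
        if status.get(thread) != "Yes":
            problems.append(f"{thread} is {status.get(thread)!r}")
    if status.get("Last_Errno", "0") != "0":
        problems.append(f"error {status['Last_Errno']}: {status.get('Last_Error', '')}")
    return problems


# Illustrative sample, modeled on the output shown above.
sample = """
     Slave_IO_Running: Yes
    Slave_SQL_Running: No
           Last_Errno: 1062
           Last_Error: Error 'Duplicate entry '15218' for key 1' on query.
"""
for problem in replication_problems(parse_slave_status(sample)):
    print(problem)
```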

Measuring Slave Lag

• One measure of slave lag:
    mysql> SHOW SLAVE STATUS\G
    . . .
    Seconds_Behind_Master: 174
    . . .
• This is usually accurate, but it really reports the difference in timestamps between the last change executed by the SQL thread and the last change downloaded by the IO thread.
• There might be more binary logs on the master that haven’t been downloaded yet.

Measuring Slave Lag (cont.)

• On the master:
    mysql> REPLACE INTO dummy (timestamp) VALUES (NOW());
• On the slave:
    mysql> SELECT TIMESTAMPDIFF(SECOND, dummy.timestamp, NOW()) FROM dummy;
  – NOW() replicates the master’s timestamp; SYSDATE() would be re-evaluated on the slave under statement-based replication.
• This is how Percona Toolkit’s pt-heartbeat works.
  – Insert a timestamp into a dummy table once per second. The difference on the slave is always an accurate measure of the real slave lag (within 1 second).
  – http://www.percona.com/doc/percona-toolkit/pt-heartbeat.html
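The heartbeat calculation itself is just a subtraction, as the Python sketch below shows. It assumes the master and slave clocks are synchronized (for example by NTP); the function name and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def heartbeat_lag(slave_now, replicated_heartbeat):
    """Lag = slave clock minus the newest heartbeat the SQL thread has applied."""
    # Clamp at zero in case the clocks are slightly out of step.
    return max(slave_now - replicated_heartbeat, timedelta(0))


slave_now = datetime(2013, 6, 1, 12, 0, 0)       # current time on the slave
last_applied = datetime(2013, 6, 1, 11, 57, 6)   # heartbeat row as seen on the slave

lag = heartbeat_lag(slave_now, last_applied)
print(f"slave lag: {lag.total_seconds():.0f} seconds")
```

Because the heartbeat row travels through the same binary log and relay log as every other change, this measure also covers events the IO thread has not downloaded yet.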

Measuring Slave Drift

• Percona Toolkit’s pt-table-checksum
• http://www.percona.com/doc/percona-toolkit/pt-table-checksum.html

Changing Masters

• Making a slave subscribe to a different master:
    mysql> STOP SLAVE;
    mysql> CHANGE MASTER TO MASTER_HOST='192.168.56.202';
• The binary log position on the new master is almost certainly not in sync with the old master.
• Discovering the correct binlog coordinates on the new master, corresponding to the last change executed on the slave, can be tricky.
    mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000123', MASTER_LOG_POS=3289439;
    mysql> START SLAVE;

Failover and Switchover

• Failover is when the current master fails and one of the slaves is assigned to become the new master in an automatic, unattended manner.
  – This is even harder than it sounds to automate!
• Switchover also assigns another server as the new master, but in a planned manner.
  – This is much more achievable, if you can stop application traffic even for a few seconds.

Replication: PROBLEMS AND SOLUTIONS

Replication Problems and Solutions

• Slave lag
• Slave drift
• Data corruption
• Non-deterministic changes
• Out of band changes
• Bad server ids
• Non-replicated data
• Risks of dual-masters
• Logs out of sync
• Oversized packets
• Limited bandwidth
• Disk space exhaustion
• Lost events

Slave Lag

• Occasional slave lag is a fact of life, but sometimes it can get out of control.
• Mitigation of slave lag:
  – Faster CPU to execute SQL statements more quickly.
  – Faster I/O system to write changes more quickly.
  – Use binlog_format=ROW if the SQL statements are slow to execute.
  – Replicate fewer changes to slaves (replication filtering).
  – Balance writes over multiple master-slave pairs (sharding).
  – Pre-warm buffer pool on the slave so updates run faster.

Slave Drift

• Correct slave drift with Percona Toolkit’s pt-table-sync
• http://www.percona.com/doc/percona-toolkit/pt-table-sync.html

Data Corruption

• If the slave drift is too severe, it’s often a quicker and simpler operation to reinitialize the slave:
  – STOP SLAVE;
  – Drop all the databases (once we’ve decided they’re too far gone to be useful anyway).
  – Acquire a fresh backup from the master, or from another slave.
  – Restore the backup to reinitialize the damaged slave.
  – CHANGE MASTER to the right binlog coordinate.
  – START SLAVE;

Non-Deterministic Changes

• SQL statements may change data differently on the slave than on the master.
• Examples:
    UPDATE … ORDER BY RAND() LIMIT 1;
    INSERT INTO table (pk) VALUES (UUID());
    UPDATE … WHERE ts > SYSDATE();

Out of Band Changes

• Some misbehaving applications (or misbehaving users) may change data directly on the slave.
• Mitigation strategy:
  – Enable the read_only option for all instances except the primary master.
    mysql> SET GLOBAL read_only=1;
  – The root user and the replication SQL thread can still make changes.

Bad Server Ids

• Misconfiguration of server_id can prevent replication from running:
    mysql> START SLAVE;
    ERROR 1200 (HY000): The server is not configured as slave; fix in config file or with CHANGE MASTER TO
• Mitigation strategy: As the error suggests, set server_id and restart the instance.

Non-Replicated Data

• Some changes depend on data that doesn’t exist on the slave.
  – Temporary tables.
  – Replication-filtered tables.
• Mitigation strategies:
  – Avoid using temp tables as a source for hybrid read/write operations (e.g. INSERT…SELECT, multi-table UPDATE/DELETE, etc.).
  – Use ROW-based replication.

Risks of Dual Masters

• Since replication is asynchronous, your applications may change data on two masters simultaneously, introducing a consistency violation that isn’t caught until the changes propagate.
  – E.g., duplicate key violations.

Risks of Dual Masters (cont.)

• Mitigation strategies:
  – Write to one master at a time. Make the other read_only.
  – Let applications write changes to both masters, but be careful to write only to one instance or the other for a given subset of data.
  – Configure auto-increment so that one instance allocates odd values and the other allocates even values. E.g. in /etc/my.cnf, with N=1 on one instance and N=2 on the other:
    [server]
    auto_increment_increment=2
    auto_increment_offset=N
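To see why the odd/even scheme avoids duplicate keys, the Python sketch below models the ids each instance would allocate starting from an empty table. It is a simplified model of auto_increment_increment and auto_increment_offset, not MySQL's actual allocator, and the function name is illustrative.

```python
from itertools import count, islice


def auto_increment_values(offset, increment):
    """Return an iterator over offset, offset + increment, offset + 2*increment, ..."""
    return count(start=offset, step=increment)


master_a = list(islice(auto_increment_values(offset=1, increment=2), 5))
master_b = list(islice(auto_increment_values(offset=2, increment=2), 5))

print(master_a)  # [1, 3, 5, 7, 9]
print(master_b)  # [2, 4, 6, 8, 10]

# The two sequences never overlap, so concurrent inserts cannot collide.
assert not set(master_a) & set(master_b)
```

This protects only auto-increment keys; conflicting updates to the same row or to other unique keys can still occur.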

Logs Out of Sync

• Errors in downloading binary logs can stop replication and report an error:
    Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'
• For example, the slave was stopped for a long time, and when it resumed, the binary log file it had been reading had been purged on the master.

Oversized Packets

• Large data payloads (e.g. large BLOB/TEXT data) can exceed the default packet size limit.
• The default max_allowed_packet is 4MB (1MB in MySQL < 5.6); the maximum is 1GB.
• Mitigation strategy: increase this configuration setting in /etc/my.cnf on both master and slave:
    [server]
    max_allowed_packet = 100M
• MySQL 5.1.64, 5.5.26, and 5.6.6 introduce a new variable, slave_max_allowed_packet, with a default value of 1GB.

Disk Space Exhaustion

• The master can run out of disk space as binary logs accumulate, even if the database isn’t large.
• The slave can run out of disk space by downloading binary logs.
• Mitigation strategies:
  – Provision disks liberally, with plenty of space to spare. Don’t run at 90%+ disk full.
  – Set up tools to alert you when disk space is running out.
  – Use expire_logs_days and PURGE BINARY LOGS to free disk space as needed.
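As a minimal illustration of the alerting idea, the Python sketch below checks how full a filesystem is using the standard library. The function names are illustrative, the 90% threshold follows the guideline above, and the path "." is only a stand-in for the filesystem that holds your binary logs. A production setup would normally rely on a monitoring system rather than an ad hoc script.

```python
import shutil


def disk_used_fraction(path):
    """Fraction (0.0 to 1.0) of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def needs_alert(path, threshold=0.90):
    """True when the filesystem holding `path` is at or above the threshold."""
    return disk_used_fraction(path) >= threshold


# "." stands in for the binary log directory; substitute the real path.
print(f"{disk_used_fraction('.'):.0%} used, alert: {needs_alert('.')}")
```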

Lost Events

• Any GRANT statement that fails on the master causes replication to stop:
    Last_Errno: 1590
    Last_Error: The incident LOST_EVENTS occured on the master. Message: error writing to the binary log
• Skip the GRANT statement on all slaves, and restart replication.
• http://bugs.mysql.com/bug.php?id=68892

Replication: TOOLS AND ALTERNATIVES

Tools and Alternatives

• MMM
• Pacemaker and PRM
• MHA
• Percona XtraDB Cluster
• Continuent Tungsten
• DRBD

MySQL Multi-Master (MMM)

• Manages master-master pair, slaves, and virtual IPs.
• MMM was once popular, but bugs made it unreliable.
• It’s now “abandonware” and we don’t recommend it.

Pacemaker and PRM

• PRM: MySQL resource agent for Pacemaker, part of an open-source stack of tools for high availability.
• A distributed monitor daemon chooses one node to be master, and automatically executes CHANGE MASTER on the others to make them slaves.
• If the master fails, PRM chooses another node to be the new master and reassigns the “master” virtual IP.

MHA

• Automatic failover like PRM.
• Also fixes slaves to ensure they subscribe to the correct binlog position.
• Failover can therefore take more time.
• Monitor is a single point of failure (SPOF), and can do the wrong thing if cluster members are unreachable.

Percona XtraDB Cluster

• Synchronous replication: every node is writable.
  – http://www.percona.com/software/percona-xtradb-cluster
  – http://www.mysqlperformanceblog.com/2012/12/04/a-closer-look-at-percona--cluster-for-mysql/

Continuent Tungsten Replicator

• Replaces traditional MySQL replication.
• Advanced clustering and replication technology with many enhanced features.
  – Heterogeneous replication to Oracle and PostgreSQL.
  – Slaves can take changes from multiple masters.
  – Global transaction ID makes changing masters simpler.
  – Parallel replication makes it easier for slaves to stay in sync.
• http://www.continuent.com/

DRBD

• Block-level synchronous filesystem replication.
• MySQL can’t run on the slave while replicating.
• When you do start up the slave instance, MySQL must perform crash recovery.

Replication: CONCLUSION

Conclusion

• Replication is a common solution for several database administration problems, including scaling and redundancy.
• We have covered replication setup, maintenance, and troubleshooting, as well as tools and some alternatives.

Questions?

Server Troubleshooting

Table of Contents

1. Understanding Server Components
2. Diagnostic Tools
3. SHOW GLOBAL STATUS
4. Case Studies

Server Troubleshooting: UNDERSTANDING SYSTEM COMPONENTS

Overview

• Understanding resource consumption and how it is reported is very important. For example:
  – Is CPU usage at 100% a problem? How about at 60%?
  – What does Linux mean when it says my RAID controller is 100% utilized?

The Components of a System

• CPU
• Memory
• Disks
• Network

Let’s Start with Some Context

    L1 cache reference                            0.5 ns
    Branch mispredict                               5 ns
    L2 cache reference                              7 ns
    Mutex lock/unlock                              25 ns
    Main memory reference                         100 ns
    Compress 1K bytes with Zippy                3,000 ns
    Send 2K bytes over 1 Gbps network          20,000 ns
    Read 1 MB sequentially from memory        250,000 ns
    Round trip within same datacenter         500,000 ns
    Disk seek                              10,000,000 ns
    Read 1 MB sequentially from disk       20,000,000 ns
    Send packet CA->Netherlands->CA       150,000,000 ns

See:
http://www.linux-mag.com/cache/7589/1.html
http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
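These numbers are easier to compare once rescaled. The Python sketch below copies a few rows from the table and expresses each one relative to a main memory reference, as if a memory reference took exactly one second. The function name and the choice of baseline are only illustrative.

```python
# A few rows from the table above, in nanoseconds.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Main memory reference": 100,
    "Send 2K bytes over 1 Gbps network": 20_000,
    "Round trip within same datacenter": 500_000,
    "Disk seek": 10_000_000,
}


def scaled_seconds(name, base="Main memory reference"):
    """How long `name` would take if `base` took exactly one second."""
    return LATENCY_NS[name] / LATENCY_NS[base]


for name in LATENCY_NS:
    print(f"{name:36s} {scaled_seconds(name):>12,.3f} s")
```

On this scale a disk seek takes 100,000 seconds, which is more than a day of waiting for every second spent on a memory reference.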

Let’s Start with Some Context (cont.)

• SSDs are somewhere between the network and disk in the table above: about 100,000 ns.

CPUs

• CPUs are very fast relative to every other component.
• In a naive vmstat analysis, CPU will rarely look like the bottleneck.
  – Few CPU problems are reported as 100% utilization.
  – That does not mean they are not a problem.

CPU Scalability

• [Perfect World] As we add CPUs we get a linear throughput increase, provided we have sufficient concurrency:

CPU Scalability

• [Reality] We never quite follow the theoretical curve:

Notice the gap starting to widen?

Mutex Contention. This is the most likely reason.

What’s a Mutex?

• A mutex (mutual exclusion lock) allows only one thread at a time to run a protected section of code; any other thread that needs the same mutex must wait until it is released.

Mutexes become hotspots

• The longer the mutex is held, the more likely you can hold up other tasks—and reduce CPU scalability:

• CPUs in use.
• These CPUs are waiting. They can’t complete any work.
• It may not even show one CPU at 100%. For example the CPU holding the mutex could be waiting on blocking IO.
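The effect is easy to reproduce outside MySQL. The following is a minimal Python sketch, not server code: four worker threads each hold one shared lock while they "work", and the work is a sleep standing in for blocking IO. The names `run_workers` and `HOLD_SECONDS` are invented for this illustration.

```python
import threading
import time

HOLD_SECONDS = 0.05  # hypothetical time spent holding the mutex (simulated blocking IO)

def run_workers(n_threads, use_mutex):
    """Run n_threads workers and return the wall-clock time they take together."""
    mutex = threading.Lock()

    def worker():
        if use_mutex:
            with mutex:                   # only one thread may be in here at a time
                time.sleep(HOLD_SECONDS)  # the holder is idle on "IO", not burning CPU
        else:
            time.sleep(HOLD_SECONDS)      # no shared mutex: workers overlap freely

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start

serialized = run_workers(4, use_mutex=True)
parallel = run_workers(4, use_mutex=False)
print(f"with mutex:    {serialized:.2f}s")
print(f"without mutex: {parallel:.2f}s")
```

With the mutex the four workers finish one after another, so the run takes roughly four times as long, yet no CPU is busy while that happens.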

Next thing to know about CPUs

• Not all tasks arrive on time. Take the following example of a manufacturing process:

• Each widget is exactly one second apart.
• The mechanical arm can pick up 1 widget/second, stamp it, and place it on the second belt.

Throughput Question

• There is only one mechanical arm, so no parallelism is possible.
  – Service time of the mechanical arm is 1 second.
  – Maximum capacity is 60 widgets/minute.
• Can we have a throughput of 60 widgets/minute and a response time of 1 second?

In this example we can. But only because the arrival rate of the widgets is controlled.

Important Real-Life Difference

• The arrival rate of requests is not evenly distributed:

• 0.5 seconds apart. Some queuing has to apply.
• A lot of queuing applies to this last request.
• Timeslice is not used, and is ‘lost’ forever.

Takeaways

• If you have random arrivals—you may not be able to reach capacity and have an acceptable response time. • All CPUs hitting 100% may never happen. • Just because you don’t see CPUs hitting 100% it does not mean that you do not have a problem. – There may still be a response time impact.
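The mechanical-arm example can be simulated. The sketch below is only an illustration under stated assumptions: a single server with a fixed 1-second service time, and "random" arrivals modelled with exponentially distributed gaps (an assumption, real traffic may look different). All function names are invented here.

```python
import random

SERVICE_TIME = 1.0  # seconds per widget, as in the mechanical-arm example

def mean_response_time(arrival_times):
    """Single server, FIFO, fixed service time: return the mean time in system."""
    server_free_at = 0.0
    total = 0.0
    for arrival in arrival_times:
        start = max(arrival, server_free_at)   # wait if the arm is still busy
        server_free_at = start + SERVICE_TIME
        total += server_free_at - arrival      # queueing delay + service time
    return total / len(arrival_times)

def even_arrivals(n, gap):
    """Widgets arrive exactly `gap` seconds apart."""
    return [i * gap for i in range(n)]

def random_arrivals(n, mean_gap, seed=42):
    """Widgets arrive with exponentially distributed gaps (assumed distribution)."""
    rng = random.Random(seed)                  # fixed seed so runs are repeatable
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(1.0 / mean_gap)
        times.append(t)
    return times

N = 10_000
even = mean_response_time(even_arrivals(N, gap=1.0))          # 100% utilization
rand = mean_response_time(random_arrivals(N, mean_gap=1.25))  # about 80% utilization
print(f"evenly spaced, 100% busy: {even:.2f}s")
print(f"random gaps,  ~80% busy:  {rand:.2f}s")
```

Evenly spaced arrivals keep the response time at the 1-second service time even at full capacity. With random gaps the mean response time is well above one second although the arm is idle about a fifth of the time.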

Conclusion

• CPUs being used could be a good thing. – It shows the efficiency of a storage engine to be able to use resources—and not being blocked waiting. • CPUs being close to maxed out could be the symptom of a very bad thing. – It could be that response time is suffering because the chance of queuing gets higher as utilization increases.

The Components of a System

CPU Memory

Disks Network

Memory

• Think of memory as no different to any other cache. • Much like you can measure memcached hit/miss ratio, it’s very easy to find a memory miss:

[root@train ~]# vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy  id wa st
 0  0  45628  37812  37332 110324    0    0     1     3    1    1  0  0 100  0  0
 0  0  45628  37804  37332 110324    0    0     0     0   24   16  0  0 100  0  0
 0  0  45628  37804  37332 110324    0    0     0     0   19   15  0  0 100  0  0
..

Were there any physical reads in the last ten seconds? No.
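The same check can be scripted. This is a small sketch that assumes the vmstat text has already been captured with its column-name line first; `blocks_read` is a made-up helper, not part of any tool. It skips the first data row because that row holds averages since boot rather than the last interval.

```python
SAMPLE = """\
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy  id wa st
 0  0  45628  37812  37332 110324    0    0     1     3    1    1  0  0 100  0  0
 0  0  45628  37804  37332 110324    0    0     0     0   24   16  0  0 100  0  0
 0  0  45628  37804  37332 110324    0    0     0     0   19   15  0  0 100  0  0
"""

def blocks_read(vmstat_text, skip_first=True):
    """Sum the 'bi' (blocks in) column; input starts with the column-name line."""
    lines = vmstat_text.strip().splitlines()
    bi = lines[0].split().index("bi")            # locate the column by name
    rows = lines[2:] if skip_first else lines[1:]
    return sum(int(row.split()[bi]) for row in rows)

print(blocks_read(SAMPLE))  # two 5-second samples = the last ten seconds
```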

Huh? I Need to Read Files?

• If you throw enough memory at your system, you may not ever need to once caches are warm* • MySQL will read rows directly from caches. • The Operating System itself will allocate caches for filesystem activity. – Reading a file does not need to convert into a physical read operation.

* Filesystem(s) need to be mounted with noatime.

Filesystem Caches on Linux

• Linux will use leftover memory for filesystem caches. • It may even swap out some of our processes if it believes that the memory is better spent for caches.

[root@train ~]# free -m
             total       used       free     shared    buffers     cached
Mem:           245        208         36          0         36        107
-/+ buffers/cache:         64        181
Swap:          511         44        467

Space taken up by filesystem caches.

Decreasing Cache Misses

• Increase the amount of memory!
  – If it is reads (‘cache misses’) that are your problem, this is often a much more cost effective solution than buying fast disks/SSDs.

Performance degradation when a working set doesn’t fit in memory. SSDs don’t help as much as memory.

http://www.mysqlperformanceblog.com/2010/04/08/fast-ssd-or-more-memory/

Decreasing Cache Misses (cont.)

• Reduce what needs to be in cache. – More effective use of indexes? – Partitioning or archiving out older data? – Compressing large TEXT/BLOB columns? – Ensure that appropriate data types are used?

The Peril of Cold Caches

• Warm up speed is becoming very important. • A server with 64GB+ RAM could take hours to be functional post-restart. – ... and days to be completely warm. – It’s not just the speed it takes to read from disk, it’s the speed for caches to settle. – Buffer pool dump/restore feature may help in this case (MySQL 5.6 or Percona Server).

The Components of a System

CPU Memory

Disks Network

Reapplying Context:

L1 cache reference                         0.5 ns
Branch mispredict                            5 ns
L2 cache reference                           7 ns
Mutex lock/unlock                           25 ns
Main memory reference                      100 ns
Compress 1K bytes with Zippy             3,000 ns
Send 2K bytes over 1 Gbps network       20,000 ns
Read 1 MB sequentially from memory     250,000 ns
Round trip within same datacenter      500,000 ns
Disk seek                           10,000,000 ns
Read 1 MB sequentially from disk    20,000,000 ns
Send packet CA->Netherlands->CA    150,000,000 ns

See: http://www.linux-mag.com/cache/7589/1.html and Google http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf

Mental Math

• 10,000,000 ns = 10 ms per operation = 100 operations/second.
  – The figure Google quoted here is about the average for a 7200 RPM drive.
  – When we talk about our storage devices, we most commonly measure them in IOPS (IO operations per second).
  – So a 7200 RPM drive can do approximately 100 IOPS.
  – 15K RPM disks might do ~160-180 IOPS.
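The arithmetic, written out as a tiny sketch. The helper names are invented; the inputs are the figures quoted on this slide.

```python
NS_PER_SECOND = 1_000_000_000

def iops_from_service_time(ns_per_operation):
    """How many operations fit in one second if each takes ns_per_operation."""
    return NS_PER_SECOND / ns_per_operation

def service_time_ms(iops):
    """The reverse: average milliseconds per operation for a given IOPS figure."""
    return 1000.0 / iops

print(iops_from_service_time(10_000_000))          # disk seek from the latency table
print(service_time_ms(160), service_time_ms(180))  # the 15K RPM range quoted above
```

Going the other way, 160-180 IOPS works out to roughly 5.6-6.3 ms per operation, which is why faster spindles help only modestly.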

Why Count “Operations?”

• Because there’s not much difference between doing one small request versus one slightly larger request.

Disk seek 10,000,000 ns Read 1 MB sequentially from disk 20,000,000 ns

Includes a disk seek

Yes, There’s a Gap

• For each disk operation: – Millions of CPU operations can be done. – Hundreds of thousands of memory operations can be done. • MySQL algorithms try to avoid as much disk access as possible – Buffer pool, transaction log, read ahead, key cache, filesystem cache, …

Yes, There’s a Gap

• The next question becomes: – When is touching disks unavoidable?

First with Reads

• Reads are able to use caches. Add more memory to improve hit efficiency. • How much memory does not depend on the size of your data - it is related to what needs to be kept in memory. – This could be between 0-100% of data, depending on access pattern.

We call this concept a “working set.”

Then with Writes

• Writes are not able to use caches in the same way.
  – They can buffer changes (dirty pages) in memory, but the transaction log must be flushed to disk to guarantee durability.
  – [Default] InnoDB will flush the log buffer and wait on each commit. The syncing is often a performance bottleneck on systems without a RAID controller/write-back cache.

The Components of a System

CPU Memory

Disks Network

Network

• Does not commonly cause throughput problems. • May be the cause of response time problems. • Easiest ways to mitigate: – Reduce needless round trips; cache query results; merge queries before sending them to MySQL. – Reduce needlessly large result sets; avoid SELECT * and fetching too many rows; avoid “Full join.”

Network (cont.)

• Involves more “external” technology than solving problems for other components. Always interesting: – Are there a lot of sockets in TIME_WAIT? – A lot going to port 53 (DNS)? lsof is one tool to use here.

Server Troubleshooting DIAGNOSTIC TOOLS

Diagnostic Tools

✦ We’re going to focus on a few of these tools:

iostat vmstat mpstat

top dmesg perf/oprofile

innotop SHOW PROCESSLIST SAR

free strace Poor Man’s Profiler

GDB lsof ps

netstat ping SHOW GLOBAL STATUS

SHOW INNODB STATUS Raid Controller Utilities pt-query-digest

top

• Purely a sanity check. Before starting diagnostics, confirm what is really running.
  – On a DB server mysqld should be at the top.
  – Most of the time there should only be one mysqld process.
• Example usage:
  $ top -bn1

The -bn1 allows you to run it non-interactively. This is helpful when you want to record all of your diagnostic collection activities with ‘script’.

ps

• Check for all running MySQL servers:
  $ ps aux | grep mysqld
• Verify that the sum of VSZ adds up to roughly the amount of memory used by the system:
  $ ps -e -o vsz | awk '{size += $1}END{print(size)}'

This shows virtual size. It is helpful to know, because our database has fewer opportunities to use swap than other applications.

vmstat

• You can watch a variety of current resource usage on the system. For example: $ vmstat 5

This is very general-purpose, but may give you a good sense of the system as a whole, and what other diagnostic tools should be run.

vmstat Output

root@ubuntu:~# vmstat 5
procs ------------memory------------ --swap-- -----io----- ---system--- ----cpu----
 r  b  swpd   free   buff    cache   si  so    bi     bo    in     cs   us sy id wa
 9  7  2808 159360 378048 26494424    0   0     6     61     1      0    1  2 98  0
 0  1  2808 166260 378140 26343512    0   0 22483   8739  6244  59414    9  4 61 26
 0 11  2808 160064 378192 26188864    0   0 23728   4444  6191  71401   11  4 61 24
 7 13  2808 164408 378236 26023756    0   0 24036   4618  6769  75098   11  4 57 29
 7 10  2808 161044 378340 25860012    0   0 23203   8597  7266  84357   12  4 59 24
 7  8  2808 167432 378404 25705900    0   0 20429   5858  7047  84135   13  4 61 22
 7 12  2808 159216 378520 25565520    0   0 21101  11900  7494  89128   13  4 54 29

• Ignore the first line.
• si/so: These should be zero.
• bi/bo: These are IO statistics.
• us: What does “12%” mean if you have 8 cores?
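One way to answer the “12%” question is to convert the system-wide percentage into an equivalent number of busy cores. A minimal sketch; `busy_cores` is an invented helper and the inputs are the figures from the slide.

```python
def busy_cores(cpu_percent, n_cores):
    """Convert a system-wide CPU percentage into an equivalent number of busy cores."""
    return cpu_percent / 100.0 * n_cores

print(busy_cores(12, 8))
```

On 8 cores, 12% is about the same as one core being fully busy. vmstat alone cannot tell whether that is eight lightly loaded cores or one saturated core, which is why a per-core view is worth a look.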

mpstat

• Installed as part of sysstat—not installed by default on most Linux distros (sar is also in sysstat). • More useful than vmstat because it shows individual CPUs. Example: $ mpstat -P ALL 5

mpstat 5

10:36:12 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
10:36:17 PM  all   18.81    0.05    3.22    0.22    0.24    2.71    0.00   74.75  13247.40
10:36:17 PM    0   19.57    0.00    3.52    0.98    0.20    2.74    0.00   72.99   1939.00
10:36:17 PM    1   18.27    0.00    3.08    0.38    0.19    2.50    0.00   75.58   1615.40
10:36:17 PM    2   19.09    0.20    3.35    0.20    0.39    1.97    0.00   74.80   1615.60
10:36:17 PM    3   17.73    0.00    3.47    0.39    0.39    3.08    0.00   74.95   1615.40
10:36:17 PM    4   18.15    0.00    2.70    0.00    0.39    2.70    0.00   76.06   1615.60
10:36:17 PM    5   19.38    0.00    3.10    0.19    0.39    2.52    0.00   74.42   1615.40
10:36:17 PM    6   18.39    0.00    3.45    0.00    0.19    2.49    0.00   75.48   1615.40
10:36:17 PM    7   19.96    0.20    2.94    0.00    0.00    3.33    0.00   73.58   1615.40
10:36:17 PM    8    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00

Much better break down of utilization by CPU core.

free

• Doesn’t really show anything vmstat won’t.
• It has very handy one line math to show caches.

$ free -m
             total       used       free     shared    buffers     cached
Mem:         32177      30446       1730          0        368      16649
-/+ buffers/cache:      13428      18748
Swap:         4095          2       4093

• buffers/cached: Filesystem caches will appear in here.
• -/+ buffers/cache: This is memory used by applications, a lot less than the 30446 on the previous line.
• Swap: Ignore swap counts here. They are better observed in vmstat.
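The “one line math” is simply used minus buffers minus cached. A short sketch with the numbers from the output above; `application_memory_mb` is an invented name.

```python
def application_memory_mb(used, buffers, cached):
    """Memory used by applications once filesystem buffers and caches are excluded."""
    return used - buffers - cached

print(application_memory_mb(30446, 368, 16649))
```

The result is within a megabyte of the 13428 that free prints; the small difference comes from free rounding each column to whole megabytes.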

netstat (cont.)

• Show count of states:
  $ netstat -antp | awk '{print $6}' | sort | uniq -c | sort -rn
• Show count of peers:
  $ netstat -antp | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn

ping

• Useful for checking for packet loss and understanding how much of an effect network latency has on response time. For example:

$ ping sb1.percona.com
PING sb1.percona.com (66.135.55.221) 56(84) bytes of data.
64 bytes from sb1.percona.com (66.135.55.221): icmp_seq=1 ttl=56 time=8.30 ms
64 bytes from sb1.percona.com (66.135.55.221): icmp_seq=2 ttl=56 time=8.12 ms
64 bytes from sb1.percona.com (66.135.55.221): icmp_seq=3 ttl=56 time=8.07 ms
64 bytes from sb1.percona.com (66.135.55.221): icmp_seq=4 ttl=56 time=8.23 ms
64 bytes from sb1.percona.com (66.135.55.221): icmp_seq=5 ttl=56 time=8.13 ms
64 bytes from sb1.percona.com (66.135.55.221): icmp_seq=6 ttl=56 time=8.27 ms

--- sb1.percona.com ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5000ms
rtt min/avg/max/mdev = 8.074/8.189/8.303/0.111 ms

Good result in same datacenter is ~0.5ms. Higher latency networks might be 1-2ms.

iostat

• Much better IO statistics than what vmstat provides. Example usage:
  $ iostat -dx 5 (need x for extended statistics)
  $ iostat -kx 5 (show CPU stats at the same time)
• Main problems are that:
  – It lumps reads and writes together. A caching RAID controller makes writes cheaper, skewing the result.
  – The kernel is unaware of storage-level parallelism; it assumes all devices have a single platter.

root@ubuntu:~# iostat -dx 5
..
Device:   rrqm/s  wrqm/s      r/s     w/s    rsec/s    wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        10.40  224.20  1810.40  212.40  57920.00   8273.60    32.72    11.41   5.65   0.49 100.00
sda1        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
sda2       10.40  224.20  1810.40  212.40  57920.00   8273.60    32.72    11.41   5.65   0.49 100.00
sdb         0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
sdb1        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
dm-0        0.00    0.00     0.60    0.00      6.40      0.00    10.67     0.00   6.67   6.67   0.40
dm-1        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
dm-2        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
dm-3        0.00    0.00  1807.20  434.00  57510.40   8252.80    29.34    11.42   5.10   0.44  99.60
fioa        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00

Device:   rrqm/s  wrqm/s      r/s     w/s    rsec/s    wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        13.80  240.20  1933.20  250.80  61849.60  15096.00    35.23    12.34   5.65   0.46 100.00
sda1        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
sda2       13.80  240.20  1933.20  250.80  61849.60  15096.00    35.23    12.34   5.65   0.46 100.00
sdb         0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
sdb1        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
dm-0        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
dm-1        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
dm-2        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
dm-3        0.00    0.00  1947.80  491.00  61875.20  15096.00    31.56    12.49   5.11   0.41 100.00
fioa        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00

Device:   rrqm/s  wrqm/s      r/s     w/s    rsec/s    wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        21.60  310.00  2543.60  341.00  81376.00   8404.80    31.12    21.04   7.28   0.35 100.00
sda1        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
sda2       21.60  310.00  2543.60  341.00  81376.00   8404.80    31.12    21.04   7.28   0.35 100.00
sdb         0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
sdb1        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
dm-0        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
dm-1        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
dm-2        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
dm-3        0.00    0.00  2567.80  651.00  81459.20   8404.80    27.92    21.27   6.59   0.31 100.00
fioa        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00

Device:   rrqm/s  wrqm/s      r/s     w/s    rsec/s    wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        19.60  354.20  2486.40  450.80  79526.40  20241.60    33.97    21.69   7.38   0.34 100.00
sda1        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
sda2       19.60  354.20  2486.60  450.80  79532.80  20241.60    33.97    21.69   7.38   0.34 100.00
sdb         0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
sdb1        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
dm-0        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
dm-1        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
dm-2        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
dm-3        0.00    0.00  2506.80  805.00  79545.60  20241.60    30.13    22.06   6.66   0.30 100.00
fioa        0.00    0.00     0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00

• r/s: Read IOPS.
• w/s: Write IOPS.
• await: What is the average wait time for each request? Await might be longer for larger requests.
• avgqu-sz: What is the queue length?
• %util: isn’t accurate most of the time.
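The columns can be cross-checked against each other. The sketch below applies Little's law (average requests in flight = arrival rate × time in system) to the sda row of the first sample. The function names are invented, and treating svctm × IOPS as utilization is a simplifying assumption that only holds for a device serving one request at a time.

```python
def queue_length(reads_per_s, writes_per_s, await_ms):
    """Little's law: average requests in flight = arrival rate * time in system."""
    return (reads_per_s + writes_per_s) * await_ms / 1000.0

def utilization_pct(reads_per_s, writes_per_s, svctm_ms):
    """Busy percentage if the device handled exactly one request at a time."""
    return (reads_per_s + writes_per_s) * svctm_ms / 1000.0 * 100.0

# sda, first sample: r/s=1810.40, w/s=212.40, await=5.65 ms, svctm=0.49 ms
queue = queue_length(1810.40, 212.40, 5.65)
util = utilization_pct(1810.40, 212.40, 0.49)
print(f"estimated queue length: {queue:.2f} (iostat reported avgqu-sz 11.41)")
print(f"estimated utilization:  {util:.1f}% (iostat reported 100.00)")
```

Around eleven requests are in flight on average while the device reports 100% utilization. On storage that can serve requests in parallel, that says the device is always busy with something, not that it has no capacity left.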

perf/oprofile

See where the majority of CPU time is spent.

Poor Man’s Profiler

• The goal of pt-pmp is not necessarily to show where time is spent, but what threads are blocked on.
  – This is then aggregated.

http://poormansprofiler.org/

Poor Man’s Profiler

    291 pthread_cond_wait@@GLIBC_2.3.2,one_thread_per_connection_end,handle_one_connection
     57 read,my_real_read,my_net_read,do_command,handle_one_connection,start_thread
     26 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,os_aio_simulated_handle, ..
      3 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,srv_purge_worker_thread
      1 select,os_thread_sleep,srv_purge_thread
      1 select,os_thread_sleep,srv_master_thread
      1 select,os_thread_sleep,srv_lock_timeout_and_monitor_thread
      1 select,os_thread_sleep,srv_error_monitor_thread
      1 select,handle_connections_sockets,main,select
      1 read,vio_read_buff,my_real_read,my_net_read,cli_safe_read,handle_slave_io
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event, ..
      1 pread64,os_file_pread,os_file_read,fil_io,buf_read_page_low,buf_read_page, ..
      1 pread64,os_file_pread,os_file_read,fil_io,buf_read_page_low,buf_read_page, ..
      1 do_sigwait,sigwait,signal_hand

pt-query-digest

• Aggregate by query fingerprint and report in order of greatest response time.

root@ubuntu:~# perl pt-query-digest /bench/mysqldata/ubuntu-slow.log

# 1461.1s user time, 39.2s system time, 22.20M rss, 57.52M vsz
# Overall: 7.26M total, 38 unique, 17.28k QPS, 18.88x concurrency ______
#              total     min     max     avg     95%  stddev  median
# Exec time    7929s    12us   918ms     1ms     4ms    10ms   138us
# Lock time     154s       0    17ms    21us    36us    33us    18us
# Rows sent    5.90M       0     246    0.85    0.99    6.71    0.99
# Rows exam    6.90M       0     495    1.00    0.99   13.48       0
# Time range 2009-09-13 17:26:54 to 2009-09-13 17:33:54
# bytes      765.14M       6     599  110.56  202.40   65.01   80.10
# Rows read        0       0       0       0       0       0       0

http://www.percona.com/doc/percona-toolkit/pt-query-digest.html

..
# Rank Query ID           Response time    Calls   R/Call   Item
# ==== ================== ================ ======= ======== ===============
#    1 0x813031B8BBC3B329 1793.7763 23.9%   274698 0.006530 COMMIT
#    2 0x10BEBFE721A275F6 1758.1369 23.5%   859757 0.002045 INSERT order_line
#    3 0xBD195A4F9D50914F  924.4553 12.3%   859770 0.001075 SELECT UPDATE stock
#    4 0x6E70441DF63ACD21  596.6281  8.0%   859769 0.000694 UPDATE stock
#    5 0x5E61FF668A8E8456  448.0148  6.0%  1709675 0.000262 SELECT stock
#    6 0x0C3504CBDCA1EC89  308.9468  4.1%    86102 0.003588 UPDATE customer
#    7 0xA0352AA54FDD5DF2  307.4916  4.1%    86103 0.003571 UPDATE order_line
#    8 0xFFDA79BA14F0A223  192.8587  2.6%    86122 0.002239 SELECT customer warehouse
#    9 0x0C3DA99DF6138EB1  191.9911  2.6%    86120 0.002229 SELECT UPDATE customer
#   10 0xBF40A4C7016F2BAE  109.6601  1.5%   860614 0.000127 SELECT item
#   11 0x8B2716B5B486F6AA  107.9319  1.4%    86120 0.001253 INSERT history
#   12 0x255C57D761A899A9  103.9751  1.4%    86120 0.001207 UPDATE warehouse
#   13 0xF078A9E73D7A8520  102.8506  1.4%    86120 0.001194 UPDATE district
#   14 0x9577D48F480A1260   91.3182  1.2%    56947 0.001604 SELECT customer
#   15 0xE5E8C12332AD11C5   87.2532  1.2%    86122 0.001013 SELECT UPDATE district
#   16 0x2276F0D2E8CC6E22   86.1945  1.1%    86122 0.001001 UPDATE district
#   17 0x9EB8F1110813B80D   83.1471  1.1%    86106 0.000966 UPDATE orders
#   18 0x0BF7CEAD5D1D2D7E   80.5878  1.1%    86122 0.000936 INSERT orders
#   19 0xAC36DBE122042A66   74.5417  1.0%     8612 0.008656 SELECT order_line
#   20 0xF8A4D3E71E066ABA   46.7978  0.6%     8612 0.005434 SELECT orders

..
# Query 1: 655.60 QPS, 4.28x concurrency, ID 0x813031B8BBC3B329 at byte 518466
# This item is included in the report because it matches --limit.
#              pct   total     min     max     avg     95%  stddev  median
# Count          3  274698
# Exec time     22   1794s    12us   918ms     7ms     2ms    43ms   332us
# Lock time      0       0       0       0       0       0       0       0
# Rows sent      0       0       0       0       0       0       0       0
# Rows exam      0       0       0       0       0       0       0       0
# Users                  1  [root]
# Hosts                  1 localhost
# Databases              1    tpcc
# Time range 2009-09-13 17:26:55 to 2009-09-13 17:33:54
# bytes          0   1.57M       6       6       6       6       0       6
# Query_time distribution
#   1us
#  10us  ###
# 100us  ################################################################
#   1ms  ##
#  10ms  ##
# 100ms  #
#    1s
#  10s+
commit\G

..
# Query 2: 2.05k QPS, 4.20x concurrency, ID 0x10BEBFE721A275F6 at byte 17398977
# This item is included in the report because it matches --limit.
#              pct   total     min     max     avg     95%  stddev  median
# Count         11  859757
# Exec time     22   1758s    64us   812ms     2ms     9ms     9ms   224us
# Lock time     17     27s    13us     9ms    31us    44us    26us    28us
# Rows sent      0       0       0       0       0       0       0       0
# Rows exam      0       0       0       0       0       0       0       0
# Users                  1  [root]
# Hosts                  1 localhost
# Databases              1    tpcc
# Time range 2009-09-13 17:26:55 to 2009-09-13 17:33:54
# bytes         22 170.52M     192     213  207.97  202.40    0.58  202.40
# Query_time distribution
#   1us
#  10us  #
# 100us  ################################################################
#   1ms  ############
#  10ms  ###
# 100ms  #
#    1s
#  10s+
# Tables
#    SHOW TABLE STATUS FROM `tpcc` LIKE 'order_line'\G
#    SHOW CREATE TABLE `tpcc`.`order_line`\G
INSERT INTO order_line (ol_o_id, ol_d_id, ol_w_id, ol_number, ol_i_id, ol_supply_w_id, ol_quantity, ol_amount, ol_dist_info) VALUES (3669, 4, 65, 1, 6144, 38, 5, 286.943756103516, 'sRgq28BFdht7nemW14opejRj')\G

..
# Query 4: 2.05k QPS, 1.42x concurrency, ID 0x6E70441DF63ACD21 at byte 192769443
# This item is included in the report because it matches --limit.
#              pct   total     min     max     avg     95%  stddev  median
# Count         11  859769
# Exec time      7    597s    67us   794ms   693us   467us     6ms   159us
# Lock time     12     19s     9us    10ms    21us    31us    25us    19us
# Rows sent      0       0       0       0       0       0       0       0
# Rows exam      0       0       0       0       0       0       0       0
# Users                  1  [root]
# Hosts                  1 localhost
# Databases              1    tpcc
# Time range 2009-09-13 17:26:55 to 2009-09-13 17:33:54
# bytes          7  56.36M      64      70   68.73   65.89    0.30   65.89
# Query_time distribution
#   1us
#  10us  #
# 100us  ################################################################
#   1ms  #
#  10ms  #
# 100ms  #
#    1s
#  10s+
# Tables
#    SHOW TABLE STATUS FROM `tpcc` LIKE 'stock'\G
#    SHOW CREATE TABLE `tpcc`.`stock`\G
UPDATE stock SET s_quantity = 79 WHERE s_i_id = 89277 AND s_w_id = 51\G
# Converted for EXPLAIN
# EXPLAIN
select s_quantity = 79 from stock where s_i_id = 89277 AND s_w_id = 51\G

pt-query-digest (1)

• How to use: – Technique #1 - Try and find queries over a threshold. – Technique #2 - Log all queries and then report on what is expensive, and frequent.

pt-query-digest (2)

• The query_time distribution is very helpful! – It helps you visualize changes in execution plans or bottlenecks related to concurrency. – If you see a problem, then you can read min/max/avg/median/95pct/stddev.

Query Review

• pt-query-digest can show you only new queries that are slow, not the ones you’ve previously audited and have already tuned: $ pt-query-digest --review h=host1,D=test,t=query_review /path/to/slow.log

http://www.percona.com/doc/percona-toolkit/pt-query-digest.html

Server Troubleshooting SHOW GLOBAL STATUS

SHOW GLOBAL STATUS

• MySQL reports many counters of operations that have happened.
  – It’s very useful information to know: if we look at just something like iostat, we can never tell how many ‘cache hits’ we had.
• In some cases, there are opportunities to change configuration settings.
  – ... but be careful what you change!

Example

mysql> show global status [like 'pattern'];
+---------------------------+---------+
| Variable_name             | Value   |
+---------------------------+---------+
| Aborted_clients           | 0       |
| Aborted_connects          | 0       |
. . .
| Uptime                    | 8675309 |
| Uptime_since_flush_status | 8675309 |
+---------------------------+---------+
374 rows in set (0.00 sec)

Beware of Skew

• Don’t just run SHOW GLOBAL STATUS and make a decision.
  – The rate of change in recent moments is more important than the count since day one.
• Aggregate over an interval.
  – pt-mext is a simple aggregation tool: http://www.percona.com/doc/percona-toolkit/pt-mext.html
  – Graph the output to get an understanding of trends that change these counters.
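What “aggregate over an interval” means can be sketched in a few lines. This is an illustration only: it assumes two SHOW GLOBAL STATUS snapshots have already been loaded into dictionaries, the counter values are invented, and `rates_per_second` is a made-up helper rather than part of any tool.

```python
def rates_per_second(before, after, interval_seconds):
    """Per-second rate of change for each counter present in both snapshots."""
    return {
        name: (after[name] - before[name]) / interval_seconds
        for name in after
        if name in before
    }

# Hypothetical snapshots taken 10 seconds apart (values are invented).
snap1 = {"Opened_tables": 532432, "Sort_merge_passes": 9924}
snap2 = {"Opened_tables": 532482, "Sort_merge_passes": 9924}

rates = rates_per_second(snap1, snap2, 10)
print(rates)
```

A counter of 532432 looks alarming on its own. The rate over the last ten seconds is what tells you whether anything is happening right now.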

Example: pt-mext

$ pt-mext -r -- mysqladmin ext -i 10 -c 5 | grep "^Uptime "
Uptime 8675309 10 10 10

• -r: relative mode; columns show deltas instead of absolute counts.
• -i 10: delay in seconds between iterations.
• -c 5: count of iterations.

How It Starts

• Understanding the workload a little via read to write ratio. In 10 second mext averages:

Com_select   6285383  208991  206585
Com_insert    162929    5389    5381
Com_update    213110    7069    7043
Com_delete     12545     419     415
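The read to write ratio for one interval, worked out from the second column above. A small sketch; `read_write_ratio` is an invented helper.

```python
# Second column of the mext output above: change over one 10-second interval.
deltas = {"Com_select": 208991, "Com_insert": 5389, "Com_update": 7069, "Com_delete": 419}

def read_write_ratio(d):
    """SELECT statements per write statement (INSERT + UPDATE + DELETE)."""
    writes = d["Com_insert"] + d["Com_update"] + d["Com_delete"]
    return d["Com_select"] / writes

ratio = read_write_ratio(deltas)
print(f"{ratio:.1f} reads per write")
```

Roughly sixteen SELECTs for every write statement: by statement count this workload is read-heavy.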

Alternative View

• Another way of viewing this is to look at Handler counts. A handler operation is one inside the storage engine:

Handler_commit          10571381   351173   348265
Handler_delete            125076     4170     4150
Handler_read_first             1        0        0
Handler_read_key        17359898   577000   571492
Handler_read_next        7602090   252679   249803
Handler_read_prev              0        0        0
Handler_read_rnd               0        0        0
Handler_read_rnd_next  250478392  8394893  8211593
Handler_rollback            1263       36       38
Handler_update           3259942   108016   107546
Handler_write           41377463  1379512  1360403

Line of Thinking

• iostat is showing a lot of writes.
• We have a small number of Com_insert, Com_update and Com_delete operations, but a large number of Handler_write operations?
• Our workload should be read-heavy. What can cause more writing?
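Putting numbers on the mismatch, using the same 10-second interval from the two tables above. A sketch only; the interpretation in the comments is an assumption (Handler_write also counts rows written into internal temporary tables), not something the counters state directly.

```python
handler_write = 1379512  # Handler_write delta for one 10-second interval
com_insert = 5389        # Com_insert delta for the same interval

# If every row write came from an INSERT statement, each statement would have
# to write this many rows. A very large value hints that something else
# (assumed here: internal temporary tables) is doing the writing.
rows_per_insert = handler_write / com_insert
print(f"{rows_per_insert:.0f} rows written per INSERT statement")
```

About 256 rows per INSERT is implausible for a workload of small single-row inserts, so the next place to look is anything that writes rows without an INSERT statement.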

Temporary Tables on Disk?

• These are often caused by some internal GROUP BYs or an ORDER BY that can’t use an index. • They default to memory unless they grow too big, but... • All temporary tables with text/blob columns will be created on disk regardless!

Temporary Tables on Disk (cont.)

• It’s more important to see how many of these disk temporary tables are created at once, than a ratio of memory to disk. • You may be able to change tmp_table_size and max_heap_table_size to increase the threshold.

Created_tmp_disk_tables      0    0    0
Created_tmp_files            5    0    0
Created_tmp_tables       12550  421  417

Binary Log Cache?

• Updates are buffered before being written to the binary log. If they’re too big, the buffer creates a temporary file on disk:

mysql> show global status like 'binlog%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Binlog_cache_disk_use | 1082  |
| Binlog_cache_use      | 78328 |
+-----------------------+-------+
2 rows in set (0.00 sec)

• Corresponding Session Variables:
  – binlog_cache_size
  – binlog_stmt_cache_size (added in 5.5.9; it is used to cache changes to non-transactional tables during a transaction)

Sorting Data?

mysql> show global status like 'sort%';
+-------------------+---------+
| Variable_name     | Value   |
+-------------------+---------+
| Sort_merge_passes | 9924    |
| Sort_range        | 234234  |
| Sort_rows         | 9438998 |
| Sort_scan         | 24333   |
+-------------------+---------+
4 rows in set (0.00 sec)

• Corresponding Session Variable: sort_buffer_size

Sorting Data (cont.)

• Caused by:
  – ORDER BY (and not being able to use an index for sorting).
  – GROUP BY (instead of GROUP BY c ORDER BY NULL).
• Sort_merge_passes is incremented every time the internal sort algorithm has to loop over the data more than once to sort it.
  – A small number is healthy. Be careful not to set sort_buffer_size too high.
  – Sometimes I look at how many Sort_merge_passes occur per second (run SHOW GLOBAL STATUS more than once).

Every Setting Has a Range!

• You really can have too much of a good thing. • It takes more resources to allocate larger chunks of memory, and in some cases you’ll miss valuable CPU caches. • It’s better to set the default sort_buffer_size small. A given thread can increase it as a session variable.

Query Cache

mysql> show global status like 'Qcache%';
+-------------------------+--------+
| Variable_name           | Value  |
+-------------------------+--------+
| Qcache_free_memory      | 99812  |
| Qcache_hits             | 210213 |
| Qcache_inserts          | 82333  |
| Qcache_not_cached       | 2032   |
| Qcache_queries_in_cache | 5322   |
+-------------------------+--------+
8 rows in set (0.00 sec)

• You want a large ratio between hits and inserts, e.g. 10:1.
• Why wouldn’t a query be cached?

Query Cache (cont.)

• You really need to have many more hits than inserts, but query cache tuning is not that simple. • Increasing query cache allocation chunk size: – query_cache_min_res_unit=16K • The query cache doesn’t scale well on machines with many cores. You may disable it: – query_cache_type=0 – query_cache_size=0

Table Locks

• Some storage engines (MyISAM, Memory) have table level locking. Under concurrency this can be a real contention point:

mysql> show global status like 'table_locks%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Table_locks_immediate | 52323 |
| Table_locks_waited    | 3293  |
+-----------------------+-------+
2 rows in set (0.00 sec)

• Tip: You really need to watch this one in particular in mext. Locking problems tend to snowball.

Table Cache

• Every session requires a cached handle to each table used by that session.

mysql> show global status like 'Open%tables';
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| Open_tables   | 64     |
| Opened_tables | 532432 |
+---------------+--------+
2 rows in set (0.00 sec)

• Corresponding Global Variables:
  – table_open_cache
  – table_definition_cache

Table Cache (cont.)

• The table cache is searched linearly when MySQL tries to open tables.
• This means that opening tables may slow down as the size of the table cache is increased.
• Check ‘Opened_tables’ in SHOW GLOBAL STATUS. Consider increasing the cache until you see 10 opens/sec or fewer.

A larger cache will use more CPU but decrease IO. If you open a lot of tables you may see better overall performance with a larger cache.

Thread Cache

• Each connection in MySQL is a thread. You can reduce Operating System thread creation/destruction with a small thread_cache:

mysql> show global status like 'threads%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Threads_cached    | 16    |
| Threads_connected | 67    |
| Threads_created   | 4141  |
| Threads_running   | 6     |
+-------------------+-------+
4 rows in set (0.00 sec)

• Corresponding Global Variable: thread_cache_size

Not necessary for you to tune this on modern Linux unless you are Facebook.

Max Connections

• Seeing max_used_connections equal to max_connections indicates that a connection was likely refused at some point:

mysql> show global status like 'max%';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| Max_used_connections | 401   |
+----------------------+-------+
1 row in set (0.00 sec)

• Corresponding Global Variable: max_connections

Cartesian Products?

• Joining two tables without an index on either can often mean you’re doing something wrong.
• You can see this if Select_full_join > 0:

mysql> show global status like 'Select_full_join';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Select_full_join | 0     |
+------------------+-------+
1 row in set (0.00 sec)

Finding Cartesian Products

• Also happens when you have no join condition: – SELECT * FROM Table1, Table2;

• Watch the verbose slow-query log in Percona Server.

Best Way to Review Global Status?

• Automated tools look at ratios. Ratios conceal magnitude. • Many stat counters also conceal magnitude. For example – How many rows were inserted into that Created_tmp_disk_tables? – How many table locks occurred in the last 30 seconds?

Automated Tools (cont.)

• Matthew Montgomery’s Tuning Primer (Bash) – https://launchpad.net/mysql-tuning-primer • Mysqltuner (Perl) – https://launchpad.net/mysqltuner

• These tools suffer similar limitations to most automated tools—they average the statistics since server startup. You usually care more about a very small window during peak time.

Server Troubleshooting CASE STUDIES

Case Studies

• Three case studies. • Three ways to solve problems: – Measure, measure, measure. – Trust your instincts, and measure. – For cases that are difficult to measure, reason by logic. • “Reasoning by logic” is by far the hardest way to solve problems.

Server Troubleshooting CASE STUDY #1

Scope of the Problem

• Overnight the query performance went from <1ms to 50x worse. – Nothing changed in terms of server configuration, schema. • It wasn’t due to CPUs reaching capacity. The customer was able to throttle the server to 1/2 of its workload: – From 20k QPS to 10k QPS. – No improvement.

Train of Thought

• Change in config the client doesn't know about? • Hardware problem such as a failing disk? • Load increase, data growth, or QPS crossed a “tipping point”? – The QPS problem seems to have been eliminated, but it’s still possible that what needs to be in caches doesn’t have “a long tail.”

Train of Thought (cont.)

• Schema changes the client doesn't know about, or InnoDB has regenerated statistics and now has a bad plan? • Network component such as DNS? – This is a common offender. Without skip-name-resolve, new connections into MySQL will sometimes back up.

Elimination of Easy Possibilities

• ALL queries are slower in the slow query log. – Eliminates DNS as a possibility. – Eliminates “random bad plan” theory. • Queries are slow when run via Unix socket – Eliminates network. • No errors in dmesg or RAID controller – Suggests (doesn't eliminate) that hardware is not the problem.

Easy Elimination (cont.)

• Detailed historical metrics show no change in Handler_ graphs: – Suggests (doesn't eliminate) that indexing is not the problem. – Confirms that workload isn’t completely different (customer didn’t forget to tell us about a new feature they deployed on Friday). – Also, combined with the fact that ALL queries are 50x slower, this is a very strong reason to believe indexing is not the problem.

Investigation of the Obvious

• Aggregation of SHOW PROCESSLIST shows queries are not in Locked status. • Investigating SHOW INNODB STATUS shows no problems with semaphores, transaction states such as “commit,” the main thread, or other likely culprits.

Investigation (cont.)

• However, SHOW INNODB STATUS shows many queries in "" status, as here:

---TRANSACTION 4 3879540100, ACTIVE 0 sec, process no 26028, OS thread id 1344928080
MySQL thread id 344746, query id 1046183178 10.16.221.148 webuser
SELECT ....

Normally there is a state after the username. Is the query not running?

• All such queries are simple and well-optimized according to EXPLAIN.

Investigation (cont.)

• No obvious answer yet, so start gathering hardware information: – The system has 8 CPUs, Intel(R) Xeon(R) CPU E5450 @ 3.00GHz. – The system has a RAID controller with 8 Intel X25-E SSD drives behind it, with BBU and WriteBack caching.

vmstat 5

 r  b   swpd    free   buff   cache  si  so  bi   bo    in    cs us sy id wa
 4  0 875356 1052616 372540 8784584   0   0  13 3320 13162 49545 18  7 75  0
 4  0 875356 1070604 372540 8785072   0   0  29 4145 12995 47492 18  7 75  0
 3  0 875356 1051384 372544 8785652   0   0  38 5011 13612 55506 22  7 71  0

No swapping. Never waiting on disks. Percentage user time and percentage system time - aggregated across CPUs
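The same reading of the vmstat columns can be written down as a check. This is only a sketch using the first sample row from the slide; `parse_vmstat` is an illustrative helper, not part of any tool.

```python
def parse_vmstat(header, row):
    """Pair vmstat column names with the integer values of one sample row."""
    return dict(zip(header.split(), (int(value) for value in row.split())))

header = "r b swpd free buff cache si so bi bo in cs us sy id wa"
row = "4 0 875356 1052616 372540 8784584 0 0 13 3320 13162 49545 18 7 75 0"
sample = parse_vmstat(header, row)

# swpd is non-zero, but si/so show that nothing is being swapped right now.
swapping = sample["si"] > 0 or sample["so"] > 0
waiting_on_disk = sample["wa"] > 0
print(swapping, waiting_on_disk, sample["us"] + sample["sy"])  # False False 25
```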

iostat -dx 5

Device: rrqm/s wrqm/s  r/s    w/s  rkB/s   wkB/s avgrq-sz avgqu-sz await svctm %util
sda       0.00  61.20 1.20 329.20  15.20 4111.20    24.98     0.03  0.09  0.09  3.04
dm-0      0.00   0.00 0.80 390.60  12.80 4112.00    21.08     0.03  0.08  0.07  2.88

Device: rrqm/s wrqm/s  r/s    w/s  rkB/s   wkB/s avgrq-sz avgqu-sz await svctm %util
sda       0.00  65.80 0.60 346.40   9.60 4974.40    28.73     0.04  0.11  0.09  3.20
dm-0      0.00   0.00 0.60 410.80   9.60 4968.80    24.20     0.04  0.10  0.08  3.28

Device: rrqm/s wrqm/s  r/s    w/s  rkB/s   wkB/s avgrq-sz avgqu-sz await svctm %util
sda       0.40  58.20 1.00 308.80  16.00 3320.80    21.54     0.03  0.11  0.10  3.04
dm-0      0.00   0.00 1.40 362.00  16.00 3300.80    18.25     0.04  0.11  0.08  3.04

What does the backlog of tasks look like? (avgqu-sz)

How long does each task take to complete? (await, svctm) How much of the time is there an outstanding task? (%util)
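These three columns are related to each other, which gives a quick sanity check on any iostat sample. The sketch below applies Little's law (queue depth is roughly arrival rate times time in system) and the usual utilization relation to the first sda row from the slide. The results are approximate because iostat prints rounded values.

```python
def expected_queue_depth(reads_per_s, writes_per_s, await_ms):
    """Little's law: average queue size = IOPS x average time per request."""
    return (reads_per_s + writes_per_s) * await_ms / 1000.0

def expected_util_percent(reads_per_s, writes_per_s, svctm_ms):
    """Busy time per second, as a percentage: IOPS x service time."""
    return (reads_per_s + writes_per_s) * svctm_ms / 10.0

# First sda sample from the slide: r/s=1.20, w/s=329.20, await=0.09, svctm=0.09
queue = expected_queue_depth(1.20, 329.20, 0.09)
util = expected_util_percent(1.20, 329.20, 0.09)
print(f"queue ~ {queue:.2f} (iostat reported 0.03)")
print(f"util  ~ {util:.2f}% (iostat reported 3.04)")
```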

mpstat 5

10:36:12 PM  CPU  %user  %nice  %sys  %iowait  %irq  %soft  %steal  %idle    intr/s
10:36:17 PM  all  18.81   0.05  3.22     0.22  0.24   2.71    0.00  74.75  13247.40
10:36:17 PM    0  19.57   0.00  3.52     0.98  0.20   2.74    0.00  72.99   1939.00
10:36:17 PM    1  18.27   0.00  3.08     0.38  0.19   2.50    0.00  75.58   1615.40
10:36:17 PM    2  19.09   0.20  3.35     0.20  0.39   1.97    0.00  74.80   1615.60
10:36:17 PM    3  17.73   0.00  3.47     0.39  0.39   3.08    0.00  74.95   1615.40
10:36:17 PM    4  18.15   0.00  2.70     0.00  0.39   2.70    0.00  76.06   1615.60
10:36:17 PM    5  19.38   0.00  3.10     0.19  0.39   2.52    0.00  74.42   1615.40
10:36:17 PM    6  18.39   0.00  3.45     0.00  0.19   2.49    0.00  75.48   1615.40
10:36:17 PM    7  19.96   0.20  2.94     0.00  0.00   3.33    0.00  73.58   1615.40
10:36:17 PM    8   0.00   0.00  0.00     0.00  0.00   0.00    0.00   0.00      0.00

Are the CPUs at least evenly loaded? Good to know, but still doesn’t rule out mutex contention.

Premature Conclusion

• As a result of all the above, it is reasonably safe to conclude that nothing external to the database is obviously the problem. • The system is not virtualized – I expect the database to be able to perform normally. • What to do next?

What Next?

• Think about tool options. It’s not IO bound, so try a tool that makes things easy. • Solution: – use oprofile.

root@dbserver:~/percona# opreport --demangle=smart --symbols --merge tgid `which mysqld`
samples  %        image name  app name  symbol name
893793   31.1273  /no-vmlinux /no-vmlinux (no symbols)
325733   11.3440  mysqld mysqld Query_cache::free_memory_block(Query_cache_block*)
117732    4.1001  libc libc (no symbols)
102349    3.5644  mysqld mysqld my_hash_sort_bin
76977     2.6808  mysqld mysqld MYSQLparse(void*)
71599     2.4935  libpthread libpthread pthread_mutex_trylock
52203     1.8180  mysqld mysqld read_view_open_now
46516     1.6200  mysqld mysqld Query_cache::invalidate_query_block_list(THD*, Query_cache_block_table*)
42153     1.4680  mysqld mysqld Query_cache::write_result_data()
37359     1.3011  mysqld mysqld MYSQLlex(void*, void*)
35917     1.2508  libpthread libpthread __pthread_mutex_unlock_usercnt
34248     1.1927  mysqld mysqld __intel_new_memcpy
33825     1.1780  mysqld mysqld rec_get_offsets_func
25713     0.8955  mysqld mysqld my_pthread_fastmutex_lock
22541     0.7850  mysqld mysqld page_rec_get_n_recs_before
20322     0.7077  mysqld mysqld buf_page_get_gen
19037     0.6630  mysqld mysqld lex_start(THD*)
17818     0.6205  mysqld mysqld Query_cache::free_query(Query_cache_block*)
17509     0.6098  mysqld mysqld btr_search_guess_on_hash
17495     0.6093  mysqld mysqld find_field_in_table_ref(
14224     0.4954  mysqld mysqld build_template(row_prebuilt_struct*, THD*, st_table*, unsigned int)
13575     0.4728  mysqld mysqld query_cache_query_get_key
13308     0.4635  /usr/bin/oprofiled /usr/bin/oprofiled (no symbols)
13072     0.4552  mysqld mysqld Protocol::send_fields(List*, unsigned int)
12615     0.4393  /usr/lib/libperl.so.5.8.8 /usr/lib/libperl.so.5.8.8 (no symbols)
12242     0.4263  mysqld mysqld btr_cur_search_to_nth_level
11880     0.4137  mysqld mysqld page_cur_search_with_match
11343     0.3950  mysqld mysqld my_hash_search
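One way to read a profile like this is to group related symbols and add up their share of samples. The sketch below does that for the query cache rows, with the percentages copied from the opreport output; `share_of` is an illustrative helper and only a subset of rows is included.

```python
# (percent, symbol) pairs copied from the opreport output above (subset).
PROFILE = [
    (31.1273, "(no symbols)"),
    (11.3440, "Query_cache::free_memory_block(Query_cache_block*)"),
    (3.5644, "my_hash_sort_bin"),
    (2.6808, "MYSQLparse(void*)"),
    (1.6200, "Query_cache::invalidate_query_block_list(THD*, Query_cache_block_table*)"),
    (1.4680, "Query_cache::write_result_data()"),
    (0.6205, "Query_cache::free_query(Query_cache_block*)"),
    (0.4728, "query_cache_query_get_key"),
]

def share_of(profile, needle):
    """Sum the sample percentage of symbols whose name contains `needle`."""
    needle = needle.lower()
    return sum(pct for pct, symbol in profile if needle in symbol.lower())

query_cache_share = share_of(PROFILE, "query_cache")
print(f"query cache symbols: {query_cache_share:.2f}% of samples")
```

Taken together, the query cache symbols account for roughly 15.5% of all samples, more than any other named mysqld function in the listing.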

Solution

• Start innotop (just to have a realtime monitor). • Disable query cache. • Watch QPS change in innotop.

Additional Confirmation

• The slow query log also confirms queries back to normal

tail -f /var/log/slow.log | perl pt-query-digest --run-time 30s --report-format=profile

# Profile
# Rank Query ID           Response time Calls R/Call Item
# ==== ================== ============= ===== ====== ====
#    1 0x5CE5EC5A7CA344DD 2.3601 15.9%  12773 0.0002 SELECT team_member
#    2 0xE1D373DA4E0F4D7A 2.3244 15.6%   9488 0.0002 SELECT tg_user
#    3 0x950A5CF5173D3022 1.9800 13.3%   5693 0.0003 SELECT namespace_member
#    4 0x02B7087599A7C6BB 1.7745 11.9%   5662 0.0003 SELECT namespace_p?p_key
#    5 0x6D26A1663AE2F07A 1.6751 11.3%   7266 0.0002 SELECT host
#    6 0x75960C3BD6637C00 1.1919  8.0%   5318 0.0002 SELECT host
#    7 0x813031B8BBC3B329 1.1193  7.5%   8545 0.0001 COMMIT
#    8 0x0262228C76E3BDFD 0.9228  6.2%   5408 0.0002 SELECT pref
#    9 0x5B0232CD0D7A122F 0.3382  2.3%   1879 0.0002 SELECT namespace_member
#   10 0xFB44D5AA1D96A090 0.1700  1.1%   1142 0.0001 SELECT namespace_member
#   11 0xC83E431FCADB7E4B 0.1539  1.0%    850 0.0002 SELECT team_member
#   12 0x19C8068B5C1997CD 0.1464  1.0%   9637 0.0000 ROLLBACK
#   13 0x46ED81A7F2B93617 0.1381  0.9%    690 0.0002 UPDATE tg_user
#   14 0x010D1348A9CC32EC 0.1373  0.9%    846 0.0002 SELECT namespace
#   15 0xC5FF324E9F0795CB 0.1195  0.8%    544 0.0002 SELECT namespace_member
#   16 0xCCE9F94F19CB7DA2 0.1144  0.8%    673 0.0002 SELECT namespace
#   17 0xB269C2A859F7F1AE 0.1074  0.7%    561 0.0002 SELECT namespace
#   18 0x943798A09019B333 0.0984  0.7%   5315 0.0000 SHOW WARNINGS

Server Troubleshooting CASE STUDY #2

Information Provided

• About 4PM on Saturday, queries suddenly began taking insanely long to complete. – From sub-millisecond to many minutes. – As far as the customer knew, nothing had changed. – Nobody was at work. – They had disabled selected apps where possible to reduce load.

Overview

• They are running 5.0.77-percona-highperf-b13. • The server has an EMC SAN – with a RAID5 array of 5 disks, and LVM on top of that. – Server has 2 quad-core CPUs, Xeon L5420 @ 2.50GHz. – No virtualization. • They tried restarting mysqld – It has 64GB of RAM, so it's not warm yet.

Train of Thought

• The performance drop is way too sudden and large. – On a weekend, when no one is working on the system!? – Something is seriously wrong. – Look for things wrong first.

Elimination of Easy Possibilities

• First, confirm that queries are actually taking a long time to complete. – They all are, as seen in processlist. • Check the SAN status. – They checked and reported that it's not showing any errors or failed disks.

Investigation of the Obvious

• Server's incremental status variables don't look amiss • 150+ queries in commit status. • Many transactions are waiting for locks inside InnoDB – But no semaphore waits, and main thread seems OK. • iostat and vmstat at 5-second intervals: – Suspicious IO performance and a lot of iowait – But virtually no work being done.

iostat

Device: rrqm/s wrqm/s   r/s    w/s rsec/s  wsec/s avgrq-sz avgqu-sz await svctm  %util
sda       0.00   0.00  0.00   0.00   0.00    0.00     0.00     0.00  0.00  0.00   0.00
sdb       0.00  49.00 10.00 104.00 320.00 8472.00    77.12     2.29 20.15  8.78 100.10
sdb1      0.00  49.00 10.00 104.00 320.00 8472.00    77.12     2.29 20.15  8.78 100.10
sdc       0.00  17.00  0.00   6.00   0.00  184.00    30.67     0.00  0.00  0.00   0.00
sdc1      0.00   0.00  0.00   0.00   0.00    0.00     0.00     0.00  0.00  0.00   0.00
sdc2      0.00   0.00  0.00   0.00   0.00    0.00     0.00     0.00  0.00  0.00   0.00
sdc3      0.00   0.00  0.00   0.00   0.00    0.00     0.00     0.00  0.00  0.00   0.00
sdc4      0.00   0.00  0.00   0.00   0.00    0.00     0.00     0.00  0.00  0.00   0.00
sdc5      0.00  17.00  0.00   6.00   0.00  184.00    30.67     0.00  0.00  0.00   0.00
dm-0      0.00   0.00  0.00  23.00   0.00  184.00     8.00     0.00  0.00  0.00   0.00
dm-1      0.00   0.00  0.00   0.00   0.00    0.00     0.00     0.00  0.00  0.00   0.00
dm-2      0.00   0.00  9.00 152.00 288.00 7920.00    50.98     3.47 21.61  6.21 100.00

vmstat

r b swpd free buff cache si so bi bo in cs us sy id wa st
5 1 176 35607308 738468 19478720 0 0 48 351 0 0 1 0 96 3 0
0 1 176 35605912 738472 19478820 0 0 560 848 2019 2132 4 1 83 13 0
0 2 176 35605788 738480 19479048 0 0 608 872 2395 2231 0 1 85 14 0
0 1 176 35604664 738484 19479128 0 0 688 1692 2082 1785 0 0 85 15 0
1 2 176 35604540 738496 19479436 0 0 528 876 2513 2311 0 0 84 15 0
1 2 176 35604076 738500 19479484 0 0 480 1092 1962 1684 0 0 84 16 0
1 1 176 35603084 738500 19479572 0 0 624 808 1888 1635 0 0 84 16 0
1 2 176 35602348 738500 19479608 0 0 704 792 2014 1729 1 0 84 15 0
1 1 176 35601604 738504 19479704 0 0 496 1116 2140 1910 0 0 85 15 0
1 1 176 35601140 738508 19479736 0 0 464 896 2116 1927 0 0 85 14 0
1 3 176 35599900 738508 19479908 0 0 1328 1020 2083 1869 0 1 83 17 0
1 3 176 35596660 738508 19479944 0 0 1792 696 1855 1754 1 1 81 17 0
1 3 176 35594496 738512 19480028 0 0 1732 776 2016 1848 1 0 81 18 0

From vmstat/iostat

• It looks like something is blocking commits • Likely to be either a serious bug (a transaction that has gotten the commit mutex and is hung?) or a hardware problem. • IO unreasonably slow, so that is probably the problem.

Analysis

• Because the system is not doing anything… – Profiling where CPU time is spent is probably useless. – We already know that it's spent waiting on mutexes during commit, so oprofile will probably show nothing. – Other options that come to mind: • profile IO calls with strace -c • benchmark the IO system, since it seems to be suspicious. • But first, a bit more investigation.

Stack Dump

[root@db203 ~]# perl bt-aggregate stacktrace
154 threads with the following stack trace:
#0 0x000000359920ce74 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00000035992088e0 in _L_lock_1167 () from /lib64/libpthread.so.0
#2 0x0000003599208839 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000000062d9ed in innobase_xa_prepare (thd=0x2ab438cf5960, all=true) at ha_innodb.cc:7577
#4 0x000000000061d64e in ha_commit_trans (thd=0x2ab438cf5960, all=true) at handler.cc:706
1 threads with the following stack trace:
#0 0x000000359920dde8 in pread64 () from /lib64/libpthread.so.0
1 threads with the following stack trace:
#0 0x00000035986cc837 in fdatasync () from /lib64/libc.so.6
#1 0x00000000007d5a5f in my_sync (fd=12, my_flags=16) at my_sync.c:52
#2 0x00000000005e401e in MYSQL_LOG::flush_and_sync (this=) at log.cc:1819
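The aggregation idea behind a dump like this is simple: threads with an identical sequence of frames are counted together. The sketch below is not the bt-aggregate script itself, only an illustration of the grouping step on hand-built data shaped like the dump (function names copied from it, innermost frame first).

```python
from collections import Counter

def aggregate_stacks(stacks):
    """Group threads by identical call stacks, most common first."""
    counts = Counter(tuple(frames) for frames in stacks)
    return counts.most_common()

# Hand-built stacks mirroring the dump: 154 commit waiters, 1 reader, 1 syncer.
commit_wait = ["__lll_lock_wait", "pthread_mutex_lock",
               "innobase_xa_prepare", "ha_commit_trans"]
binlog_sync = ["fdatasync", "my_sync", "MYSQL_LOG::flush_and_sync"]
threads = [commit_wait] * 154 + [["pread64"], binlog_sync]

for frames, count in aggregate_stacks(threads):
    print(f"{count} threads: {' <- '.join(frames)}")
```

Read this way, the dump shows 154 threads queued on a mutex in the commit path while a single thread sits in fdatasync, which is consistent with slow IO holding up every commit.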

oprofile

• As expected: nothing useful in oprofile

samples  %        symbol name
6331     15.3942  buf_calc_page_new_checksum
2121      5.1573  sync_array_print_long_waits
2004      4.8728  MYSQLparse(void*)
1724      4.1920  srv_lock_timeout_and_monitor_thread
1441      3.5039  rec_get_offsets_func
1098      2.6698  my_utf8_uni
780       1.8966  mem_pool_fill_free_list
762       1.8528  my_strnncollsp_utf8
682       1.6583  buf_page_get_gen
650       1.5805  MYSQLlex(void*, void*)
604       1.4687  btr_search_guess_on_hash
566       1.3763  read_view_open_now

strace -c

• Nothing relevant after 30 seconds or so.

[root@db203 ~]# strace -cp 24078
Process 24078 attached - interrupt to quit
Process 24078 detached
% time     seconds  usecs/call     calls    errors syscall
100.00    0.098978       14140         7           select
  0.00    0.000000           0         7           accept
  0.00    0.000000           0         7           getsockname
  0.00    0.000000           0        14           setsockopt
  0.00    0.000000           0         2           clone
  0.00    0.000000           0        35           fcntl
  0.00    0.000000           0        10           futex

Examine History

• Look at sar for historical reference. • Ask the client to look at their graphs to see if there are obvious changes around 4:00PM.

sar

04:00:01 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
04:00:01 PM  all   0.73   0.00     0.43     5.33    0.00  93.51
04:10:01 PM  all   0.71   0.00     0.41     5.03    0.00  93.85
04:20:01 PM  all   0.68   0.00     0.39     4.76    0.00  94.17
04:30:01 PM  all   0.71   0.00     0.39     6.51    0.00  92.39
04:40:01 PM  all   0.42   0.00     0.22    16.44    0.00  82.92
04:50:01 PM  all   0.45   0.00     0.24    15.87    0.00  83.45
05:00:01 PM  all   0.49   0.00     0.25    15.81    0.00  83.45
05:10:01 PM  all   0.47   0.00     0.25    15.90    0.00  83.38
05:20:01 PM  all   0.46   0.00     0.24    15.77    0.00  83.53
05:30:01 PM  all   0.45   0.00     0.24    16.02    0.00  83.29

04:00:01 PM      tps    rtps     wtps  bread/s   bwrtn/s
04:00:01 PM  1211.86  101.74  1110.12  4137.71  28573.15
04:10:01 PM  1143.72   96.40  1047.33  3838.95  27059.94
04:20:01 PM  1088.95   92.68   996.27  3817.55  25423.51
04:30:01 PM  1081.20   91.65   989.55  3752.29  25487.12
04:40:01 PM   452.65   54.85   397.80  2633.19   8366.46
04:50:01 PM   511.75   52.75   459.00  2494.71  12460.27
05:00:01 PM   516.54   53.59   462.95  2515.42  10101.05
05:10:01 PM   517.63   54.63   463.01  2553.41  10248.53
05:20:01 PM   509.73   53.60   456.13  2568.57  11770.04
05:30:01 PM   515.03   58.53   456.50  2799.31  10294.01

Observations

• Writes dropped dramatically around 4:40. – … at the same time iowait increased a lot. – Corroborated by the client's graphs! • Points to decreased performance of the IO subsystem? • SAN attached by fibre channel, so it could be – this server – the SAN – the connection – the specific device on the SAN.
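Spotting the change point in the sar history can be sketched as a scan for the first sample that falls well below its predecessor. The wtps values are copied from the sar output; the 50% cut-off is an arbitrary assumption chosen for illustration.

```python
# (timestamp, wtps) pairs copied from the sar output.
WTPS = [
    ("04:00:01 PM", 1110.12), ("04:10:01 PM", 1047.33),
    ("04:20:01 PM", 996.27), ("04:30:01 PM", 989.55),
    ("04:40:01 PM", 397.80), ("04:50:01 PM", 459.00),
    ("05:00:01 PM", 462.95),
]

def first_sharp_drop(samples, keep_fraction=0.5):
    """Return the first timestamp whose value is below keep_fraction of the previous one."""
    for (_, previous), (stamp, current) in zip(samples, samples[1:]):
        if previous > 0 and current < previous * keep_fraction:
            return stamp
    return None

print(first_sharp_drop(WTPS))  # 04:40:01 PM
```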

Elimination of Options

• Benchmark /dev/sdb1 and see if it looks reasonable. • This box or the SAN? – Check the same thing from another server. • Tool: use iozone with the -I flag (O_DIRECT). • The result was 54 writes per second on the first iteration – Canceled it after that because that took so long.

Pay Dirt!

• Before I could repeat the test, the customer said the RAID had failed after all. • Moral of the story: information != facts. • Customer’s web browser had cached the SAN status page!

Server Troubleshooting CASE STUDY #3

Information from the Start

• Sometimes (once every day or two) the server starts to reject connections with a max_connections error. • This lasts from 10 seconds to a couple of minutes and is sporadic. • Server specs: – 16 cores – 12GB of RAM, 900MB data – Data on Intel X25-E SSD – Running MySQL 5.1 with InnoDB Plugin

Train of Thought

• Pile-ups cause long queue waits? – thus incoming new connections exceed max_connections? • Pile-ups can be – the query cache – InnoDB mutexes – et cetera...

Elimination

• There are no easy possibilities. • We'd previously worked with this client and the DB wasn't the problem then. • Queries aren't perfect, but are still running in less than 10ms normally.

Investigation

• Nothing is obviously wrong. • Server looks fine in normal circumstances.

Analysis

• We are going to have to capture server activity when the problem happens. • We can't do anything without good diagnostic data. • Decision: install pt-stalk and wait.
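The idea behind waiting with pt-stalk is a trigger: watch one metric and start collecting only when it stays above a threshold for several consecutive checks, so a single blip does not fire a collection. The function below is a simplified sketch of that idea, not pt-stalk's actual implementation; the readings and the threshold of 300 are invented.

```python
def should_collect(samples, threshold, cycles):
    """True when the most recent `cycles` samples all exceed `threshold`."""
    if cycles <= 0 or len(samples) < cycles:
        return False
    return all(value > threshold for value in samples[-cycles:])

# Hypothetical Threads_connected readings, one per check.
readings = [12, 9, 14, 310, 325, 340]
print(should_collect(readings, threshold=300, cycles=3))      # True
print(should_collect(readings[:5], threshold=300, cycles=3))  # False
```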

Analysis (cont.)

• After several pile-ups nothing very helpful was gathered – But then we got a good one – This took days/a week • Result of diagnostics data: too much information!

During the Freeze

• Connections increased from normal 5-15 to over 300. • QPS was about 1-10k. – Lots of Com_admin_commands. – Vast majority of “real” queries are Com_select (300-2000 per second) – There are only 5 or so Com_update, other Com_ are zero.

During the Freeze (cont.)

• No table locking. • Lots of query cache activity, but normal-looking. – no lowmem_prunes. • 20 to 100 sorts per second – between 1k and 12k rows sorted per second.

During the Freeze (cont.)

• Between 12 and 90 temp tables created per second – about 3 to 5 of them created on disk. • Most queries doing index scans or range scans—not full table scans or cross joins. • InnoDB operations are just reads, no writes. • InnoDB doesn't write much log or anything.

During the Freeze (cont.)

• InnoDB status: – InnoDB main thread was in “flushing buffer pool pages” and there were basically no dirty pages. – Most transactions were waiting in the InnoDB queue. “12 queries inside InnoDB, 495 queries in queue” – The log flush process was caught up. – The InnoDB buffer pool wasn't even close to being full (much bigger than the data size).

During the Freeze (cont.)

• There were mostly 2 types of queries in SHOW PROCESSLIST, most of them in the following states:

$ grep State: status-file | sort | uniq -c | sort -nr
    161 State: Copying to tmp table
    156 State: Sorting result
    136 State: statistics
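The grep, sort and uniq pipeline is a frequency count of processlist states. An equivalent sketch in Python, assuming the captured processlist is in the vertical "State: ..." layout; the sample text is invented.

```python
from collections import Counter

def count_states(processlist_text):
    """Count lines containing 'State:', most frequent first."""
    states = (line.strip() for line in processlist_text.splitlines()
              if "State:" in line)
    return Counter(states).most_common()

# Invented excerpt in the vertical processlist layout.
sample = """
     Id: 1
  State: Copying to tmp table
     Id: 2
  State: Sorting result
     Id: 3
  State: Copying to tmp table
"""
for state, count in count_states(sample):
    print(f"{count:7d} {state}")
```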

iostat

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda3 0.04 493.63 0.65 15.49 142.18 4073.09 261.18 0.17 10.68 1.02 1.65
sda3 0.00 8833.00 1.00 500.00 8.00 86216.00 172.10 5.05 11.95 0.59 29.40
sda3 0.00 33557.00 0.00 451.00 0.00 206248.00 457.31 123.25 238.00 1.90 85.90
sda3 0.00 33911.00 0.00 565.00 0.00 269792.00 477.51 143.80 245.43 1.77 100.00
sda3 0.00 38258.00 0.00 649.00 0.00 309248.00 476.50 143.01 231.30 1.54 100.10
sda3 0.00 34237.00 0.00 589.00 0.00 281784.00 478.41 142.58 232.15 1.70 100.00
sda3 0.00 11029.00 0.00 384.00 0.00 162008.00 421.90 71.80 238.39 1.73 66.60
sda3 0.00 36.00 0.00 14.00 0.00 400.00 28.57 0.01 0.93 0.36 0.50
sda3 0.00 18.00 0.00 13.00 0.00 248.00 19.08 0.01 0.92 0.23 0.30
sda3 0.00 38.00 0.00 13.00 0.00 408.00 31.38 0.01 0.92 0.23 0.30
sda3 0.00 15.00 0.00 13.00 0.00 224.00 17.23 0.00 0.15 0.15 0.20
sda3 0.00 40.00 0.00 16.00 0.00 448.00 28.00 0.01 0.50 0.19 0.30
sda3 0.00 19.00 0.00 12.00 0.00 248.00 20.67 0.01 0.42 0.17 0.20
sda3 0.00 17.00 0.00 14.00 0.00 248.00 17.71 0.01 0.36 0.21 0.30
sda3 0.00 22.00 0.00 12.00 0.00 272.00 22.67 0.00 0.17 0.17 0.20
sda3 0.00 734.00 0.00 131.00 0.00 6920.00 52.82 0.43 3.31 0.21 2.70
sda3 0.00 30.00 0.00 16.00 0.00 368.00 23.00 0.01 0.50 0.12 0.20
sda3 0.00 18.00 0.00 12.00 0.00 240.00 20.00 0.01 0.83 0.17 0.20
sda3 0.00 35.00 0.00 15.00 0.00 400.00 26.67 0.01 0.93 0.20 0.30
sda3 0.00 11.00 0.00 11.00 0.00 176.00 16.00 0.00 0.27 0.09 0.10
sda3 0.00 22.00 0.00 14.00 0.00 288.00 20.57 0.00 0.21 0.21 0.30
sda3 0.00 146.00 0.00 405.00 0.00 4408.00 10.88 1.71 4.22 0.08 3.30
sda3 0.00 20.00 0.00 13.00 0.00 264.00 20.31 0.01 0.54 0.15 0.20
sda3 0.00 13418.00 0.00 108.00 0.00 45576.00 422.00 23.98 70.35 1.59 17.20
sda3 0.00 31233.00 0.00 513.00 0.00 238480.00 464.87 125.17 219.29 1.95 100.00
sda3 0.00 19725.00 0.00 483.00 0.00 239784.00 496.45 124.55 318.01 2.03 98.10
sda3 0.00 62.00 0.00 19.00 0.00 648.00 34.11 0.02 1.00 0.16 0.30

vmstat

r b swpd free buff cache si so bi bo in cs us sy id wa st
50 2 86064 1186648 3087764 4475244 0 0 5 138 0 0 1 1 98 0 0
13 0 86064 1922060 3088700 4099104 0 0 4 37240 312832 50367 25 39 34 2 0
2 5 86064 2676932 3088812 3190344 0 0 0 136604 116527 30905 9 12 71 9 0
1 4 86064 2782040 3088812 3087336 0 0 0 153564 34739 10988 2 3 86 9 0
0 4 86064 2871880 3088812 2999636 0 0 0 163176 22950 6083 2 2 89 8 0
0 4 86064 3002924 3088812 2870352 0 0 0 131532 32138 9234 3 2 87 7 0
0 0 86064 3253988 3088836 2794932 0 0 0 29664 34756 11057 3 4 91 3 0
0 0 86064 3254104 3088860 2794604 0 0 0 200 24995 9419 1 1 97 0 0
0 0 86064 3255184 3088900 2794772 0 0 0 124 29767 10042 3 2 95 0 0
2 0 86064 3254660 3088900 2794840 0 0 0 204 12570 4181 2 1 98 0 0
1 0 86064 3254692 3088900 2794856 0 0 0 112 12447 3374 1 1 98 0 0
0 0 86064 3254556 3088912 2794876 0 0 0 224 22128 7584 2 2 97 0 0
1 0 86064 3255020 3088912 2794920 0 0 0 124 12875 3422 1 1 98 0 0
0 0 86064 3254952 3088912 2794936 0 0 0 124 15209 4333 1 1 98 0 0
0 0 86064 3255100 3088912 2794960 0 0 0 136 13568 4351 1 1 98 0 0
0 0 86064 3255120 3088912 2794980 0 0 0 3460 19657 5690 2 1 97 0 0
1 0 86064 3254488 3088912 2794996 0 0 0 184 31300 7393 5 2 94 0 0
0 0 86064 3255488 3088912 2795116 0 0 0 120 22892 6468 3 1 96 0 0
0 0 86064 3255080 3088936 2795136 0 0 0 200 21948 6303 3 1 96 0 0
2 0 86064 3255204 3088936 2795160 0 0 0 88 15222 4805 2 1 98 0 0
1 0 86064 3255896 3088936 2795176 0 0 0 144 20555 5956 2 1 97 0 0
0 0 86064 3254596 3088936 2795188 0 0 0 2204 18818 5079 2 1 95 2 0
4 0 86064 3255560 3088936 2795228 0 0 0 132 24550 6266 3 2 95 0 0
1 4 86064 3011800 3088952 3029380 0 0 0 70528 38483 10295 4 4 89 3 0
0 2 86064 3169196 3088956 2877628 0 0 0 143468 49020 9422 4 3 83 9 0
2 0 86064 3254888 3089028 2795476 0 0 0 47924 29703 7856 2 2 90 6 0
2 0 86064 3254912 3089028 2795512 0 0 0 324 27352 7536 3 2 95 0 0

iostat, Formatted Incrementally

m m dev reads rd_mrg rd_sectors ms_reading writes wr_mrg wr_sectors ms_writing cur_ios ms_doing_io ms_wghdt
0 0 sda3 1 0 8 51 498 8833 85192 5871 -28 292 4993
0 0 sda3 0 0 0 0 658 44808 304056 144370 130 1232 176432
0 0 sda3 0 0 0 0 569 34133 269472 155917 13 1005 144815
0 0 sda3 0 0 0 0 725 42361 349696 146777 -6 1004 143371
0 0 sda3 0 0 0 0 518 29256 239008 139677 -8 1005 145328
0 0 sda3 0 0 0 0 168 434 66848 37659 -129 280 14491
0 0 sda3 0 0 0 0 14 36 400 13 0 5 13
0 0 sda3 0 0 0 0 13 18 248 12 0 3 12
0 0 sda3 0 0 0 0 13 38 408 12 0 3 12
0 0 sda3 0 0 0 0 13 15 224 2 0 2 2
0 0 sda3 0 0 0 0 16 40 448 8 0 3 8
0 0 sda3 0 0 0 0 12 19 248 5 0 2 5
0 0 sda3 0 0 0 0 14 17 248 5 0 3 5
0 0 sda3 0 0 0 0 12 22 272 2 0 2 2
0 0 sda3 0 0 0 0 131 734 6920 434 0 27 434
0 0 sda3 0 0 0 0 16 30 368 8 0 2 8
0 0 sda3 0 0 0 0 12 18 240 10 0 2 10
0 0 sda3 0 0 0 0 15 35 400 14 0 3 14
0 0 sda3 0 0 0 0 11 11 176 3 0 1 3
0 0 sda3 0 0 0 0 398 143 4328 1703 0 34 1703
0 0 sda3 0 0 0 0 21 25 368 8 0 2 8
0 0 sda3 0 0 0 0 13 20 264 7 0 2 7
0 0 sda3 0 0 0 0 430 26860 194664 89081 48 766 99648
0 0 sda3 0 0 0 0 582 37453 284544 159783 41 1264 167989
0 0 sda3 0 0 0 0 92 63 44632 24832 -89 123 6059
0 0 sda3 0 0 0 0 19 62 648 19 0 3 19
0 0 sda3 0 0 0 0 96 510 4848 182 0 21 182
0 0 sda3 0 0 0 0 13 19 256 12 0 2 12
0 0 sda3 0 0 0 0 16 21 296 15 0 2 15

oprofile

samples  %        image name      app name        symbol name
473653   63.5323  no-vmlinux      no-vmlinux      /no-vmlinux
95164    12.7646  mysqld          mysqld          /usr/libexec/mysqld
53107     7.1234  libc-2.10.1.so  libc-2.10.1.so  memcpy
13698     1.8373  ha_innodb.so    ha_innodb.so    build_template()
13059     1.7516  ha_innodb.so    ha_innodb.so    btr_search_guess_on_hash
11724     1.5726  ha_innodb.so    ha_innodb.so    row_sel_store_mysql_rec
8872      1.1900  ha_innodb.so    ha_innodb.so    rec_init_offsets_comp_ordinary
7577      1.0163  ha_innodb.so    ha_innodb.so    row_search_for_mysql
6030      0.8088  ha_innodb.so    ha_innodb.so    rec_get_offsets_func
5268      0.7066  ha_innodb.so    ha_innodb.so    cmp_dtuple_rec_with_match

Analysis

• There is a lot of data here. • Most of it points to nothing in particular except “need more research.” – For example, in oprofile, what does build_template() do in InnoDB? – Why is memcpy() such a big consumer of time? – What is hidden within the 'mysqld' image/symbol? • We could spend a lot of time on these things.

Analysis (cont.)

• In looking for things that just don't make sense, the iostat data is very strange. • We can see well over 100 MB per second written to disk for sustained periods. • But there isn't even that much data in the whole database. • So clearly this can't simply be InnoDB’s “furious flushing” problem. • mysqladmin status confirms that.
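To put the iostat numbers in familiar units, the wsec/s column can be converted to MB/s (iostat counts 512-byte sectors). The sketch below uses the busiest and a typical quiet wsec/s sample from the iostat slide, plus the 900MB data size from the server specs.

```python
SECTOR_BYTES = 512  # iostat reports rsec/s and wsec/s in 512-byte sectors

def sectors_to_mb_per_s(sectors_per_s):
    """Convert a sectors-per-second rate to decimal megabytes per second."""
    return sectors_per_s * SECTOR_BYTES / 1_000_000

peak = sectors_to_mb_per_s(309248.00)  # busiest wsec/s sample
idle = sectors_to_mb_per_s(248.00)     # a typical quiet sample
seconds_to_write_dataset = 900 / peak  # 900MB of data, per the server specs

print(f"peak {peak:.1f} MB/s, idle {idle:.2f} MB/s")
print(f"whole dataset rewritten in about {seconds_to_write_dataset:.1f} s at peak")
```

At the peak rate the server writes the equivalent of its entire dataset in under six seconds, which is why the writes cannot plausibly be ordinary data flushing.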

Analysis (cont.)

• Virtually no reading from disk is happening in this period of time. • Raw disk stats show that all the time is consumed in writes. • There is an enormous queue on the disk.

Analysis (cont.)

• There was no swap activity, and 'ps' (not shown) confirmed that nothing else significant was happening. • 'df -h' and 'lsof' (not shown) showed that: – mysqld's temp files became large – disk free space was noticeably changed while this pattern happened. • So mysqld was writing GB to disk in short bursts.

Analysis (cont.)

• Although this is not fully instrumented inside of MySQL, we know that – MySQL only writes data, logs, sort files, and temp tables to disk. – InnoDB is doing almost no writes, so we can eliminate data and logs. • Discussion with developers revealed that some kinds of caches could expire and cause a stampede on the database.

Conclusion

• Based on reasoning and knowledge of internals: – It’s likely that poorly optimized queries are causing a storm of very large temp tables on disk.

Plan of Attack

• Solution: Optimize the two major kinds of queries found in SHOW PROCESSLIST, so they don't use temp tables on disk. • Queries may run fine one at a time, but when there is a rush on the database, they can pile up and overtax the I/O.

Case Studies Conclusion

#1 Query cache can cause more harm than good. #2 Hardware failure is misreported, but all other tools point toward this root cause. #3 Cache miss “stampede” of expensive queries.

Monitoring

Table of Contents

1. Introduction 2. Monitoring Tools 3. Monitoring Configuration 4. Trending

Introduction

• The industry tends to split monitoring into two different types of software: – Alerting Tools – Trending Tools • Some software will handle both tasks.

What Is Alerting?

• Something is seriously wrong, or outside the expected threshold we have set, e.g. – The server is down. – CPU usage is through the roof. – The battery has failed on the RAID controller.

Alerting Tools

• Nagios – It has a huge marketshare & a plugin interface, which makes it very popular. • Icinga: https://www.icinga.org/ – Forked from Nagios • Percona Monitoring Plugins – http://www.percona.com/software/percona-monitoring-plugins/

What Is Trending?

• Visual representation of system activity, e.g. – At 3AM each day we have this massive spike. Why is that? – It looks like we’re about to run out of disk space... – Are there any spikes caused by cache stampedes, or cron jobs?

Trending Tools

• Cacti, Munin, MRTG, Ganglia – ... there are actually quite a few based on RRDtool. • What is going to influence your decision the most is template (graph) availability. • Also in Percona Monitoring Plugins – http://www.percona.com/software/percona-monitoring-plugins/

Hybrid Tools

• MySQL Enterprise Monitor – Available as part of a subscription with Oracle.

OpenTSDB

• A distributed, scalable Time Series Database (TSDB) written on top of HBase. • http://opentsdb.net

© 2010–2013 The OpenTSDB Authors.

Graphite

• Graphite is a highly scalable real-time graphing system. • http://graphite.wikidot.com/

Image used under Creative Commons 3.0

Advice When Configuring Alerts + Trending

Limit False Positives

• One bad response time among many. – Given many thousands, it doesn’t matter if one person waited 5 seconds for a page to load. • An ongoing performance problem, or a condition with no clear action to take immediately. – e.g. there are always a lot of users online at a certain time of the day.

Limit False Positives (cont.)

• Alerts you want to wake you: – Server is down or response times (plural) are being critically missed. – Hardware failure detected. – You have detected a problem that is going to cost you more time to repair if you let it continue.
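The "response times (plural)" point can be sketched as a rule that pages only when a meaningful share of requests is slow, so one outlier never wakes anyone. The 1000 ms and 5% thresholds and the sample data are arbitrary assumptions for illustration.

```python
def should_page(response_times_ms, slow_ms, max_slow_fraction):
    """Page only if more than max_slow_fraction of requests exceeded slow_ms."""
    if not response_times_ms:
        return False
    slow = sum(1 for t in response_times_ms if t > slow_ms)
    return slow / len(response_times_ms) > max_slow_fraction

# Invented samples: one 5-second outlier vs. a widespread slowdown.
one_outlier = [40] * 999 + [5000]
widespread = [40] * 800 + [5000] * 200

print(should_page(one_outlier, slow_ms=1000, max_slow_fraction=0.05))  # False
print(should_page(widespread, slow_ms=1000, max_slow_fraction=0.05))   # True
```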

Alerts in Percona Monitoring Plugins

• LVM snapshots are running out of copy-on-write space. • pt-deadlock-logger has recorded too many recent deadlocks. • MySQL’s files are deleted. • MySQL data directory privileges are wrong. • Problems inside InnoDB. – Long-running transaction blocking others. – Too many transactions in LOCK WAIT state. – Any transaction is too long-running. • The mysqld PID file is missing. • MySQL processlist has dangerous patterns. • MySQL replication becomes delayed. • MySQL replication stops. • Check MySQL SHOW GLOBAL STATUS output. • pt-table-checksum finds data differences on a replica. • Low memory, or when a process uses too much memory. http://www.percona.com/doc/percona-monitoring-plugins/nagios/index.html#list-of-plugins

© 2011 – 2013 PERCONA 14 Identify Any Patterns

• Are these spikes reporting queries run from cron jobs? • Is it caches being regenerated?

© 2011 – 2013 PERCONA 16 Every Pattern Is Important

• Even a boring graph like this one can still help:

This is the best time to take a backup or run any scheduled maintenance.

If the gap between these two points is not wide enough, it’s very difficult to justify LVM backups without over provisioning.

© 2011 – 2013 PERCONA 17 Capacity Planning

• There is no ‘clear’ counter to indicate MySQL load, but a few graphs can give a general idea.

For future planning, it would be great to look into response times during these spikes!

These are the most basic ‘row’ operations inside of InnoDB.

© 2011 – 2013 PERCONA 18 Capacity Planning (cont.)

• This is another ‘general load’ counter. In this case it’s commands issued, not rows:

There is some skew, because a SELECT statement could retrieve 1 row or 100K.

© 2011 – 2013 PERCONA 19 Capacity Planning (cont.)

• This shows the number of connections the server has (Threads_connected). If Max_used_connections = max_connections, someone has been refused. Configuration item: max_connections.
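A minimal sketch of the check described on this slide, assuming the counters have already been fetched (for example from SHOW GLOBAL STATUS and SHOW GLOBAL VARIABLES) into plain dictionaries. The function name and the 80% warning threshold are illustrative assumptions, not part of any Percona tool.

```python
def connection_headroom(status, variables, warn_ratio=0.8):
    """Classify connection usage from status counters.

    status    -- dict with 'Threads_connected' and 'Max_used_connections'
    variables -- dict with 'max_connections'
    warn_ratio is an arbitrary illustrative threshold, not a MySQL default.
    """
    limit = int(variables["max_connections"])
    peak = int(status["Max_used_connections"])
    current = int(status["Threads_connected"])

    if peak >= limit:
        # The high-water mark reached the limit, so a client may have been refused.
        return "limit reached"
    if current >= warn_ratio * limit:
        return "warning"
    return "ok"


print(connection_headroom(
    {"Threads_connected": 40, "Max_used_connections": 151},
    {"max_connections": 151},
))
```

Graphing both counters against max_connections in the trending tool gives the same information over time, which is usually more useful than a single reading.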

© 2011 – 2013 PERCONA 20 Correlation to Server Events

• The Handler_read_rnd_next (spike in blue) looks like a mysqldump backup. Was this the best time?

Correlation normally spans many graphs. Was this the best time to backup the server?

Is there any random IO post backup?

Were there a lot of threads backed up?

© 2011 – 2013 PERCONA 21 Correlation (cont.)

• It looks like the server was restarted. How long did it take to load back up? Was it scheduled maintenance, or accidental?

How many physical reads before/after restart, and for how long?

What was the average user response time immediately post restart?

Can we lay off the reports while the server warms up?

If we trickle load activity, how long do we need to warm up? http://code.google.com/p/mysql-cacti-templates/wiki/MySQLTemplates#InnoDB_Buffer_Pool

© 2011 – 2013 PERCONA 22 InnoDB Health Issues

• Concurrent transactions • IO, disk flushing

© 2011 – 2013 PERCONA 23 Replication Health

• Amount of bytes that have not yet been flushed to disk.

© 2011 – 2013 PERCONA 24 System Related Resources

• Not specific to MySQL. • Monitoring general system resources in the same trending tool can be helpful, because you can correlate system events to MySQL events.

© 2011 – 2013 PERCONA 25 System Related Resources (cont.)

Disk full: Useful on SSDs w/TRIM when correlating to performance. Useful just to not run out of space!
CPU usage: May be related to poor response times?
I/O wait time: Unreliable performance could indicate contention. Is it a SAN, or a shared resource?
Swap activity: Rate of si/so is important, amount used is not important.
Filesystem cache: Important for binary logs, transaction logs, relay log performance.
Memory usage: Likely has correlation to filesystem cache size and swap activity. As data grows, so might memory needs.

© 2011 – 2013 PERCONA 26 Security

Percona Training http://www.percona.com/training

© 2011 – 2013 PERCONA 1 Table of Contents

1. Operating System/Network 6. Resource Consumption

2. MySQL Privilege System 7. Application-Introduced Problems

© 2011 – 2013 PERCONA 2 Introduction

• Security doesn’t completely exist in MySQL. • It is safe to assume that if you give someone a minimally privileged account, they can perpetrate denial-of-service attacks.

© 2011 – 2013 PERCONA 3 Security OPERATING SYSTEM / NETWORK

© 2011 – 2013 PERCONA 4 Introduction (cont.)

• We will divide security into four topics: – Operating System / Network – MySQL Privilege System – Resource Consumption – Application-Introduced Problems

© 2011 – 2013 PERCONA 5 Operating System / Network

• Most of these just make sense: – Don’t expose MySQL’s port 3306 to the world. – Ideally, run MySQL on a dedicated machine. – Have MySQL run as a mysql user which doesn’t have a shell log in. – Use an up to date Operating System under current vendor support.

© 2011 – 2013 PERCONA 6 OS / Network (cont.)

• Most people run MySQL with SELinux disabled. – It’s not well enough tested or documented with MySQL; expect additional debugging.

© 2011 – 2013 PERCONA 7 OS / Network (cont.)

• MySQL has built-in SSL, but using an OS tunnel or VPN is a better option where available. – MySQL security updates are only included in the latest minor release (e.g. 5.6.xx). There aren’t supposed to be any incompatible changes, but there occasionally are. – MySQL’s built-in SSL has a history of bugs. – Secure networks are important when sending replication traffic over the internet, or when using authentication plugins.

© 2011 – 2013 PERCONA 8 Security MYSQL PRIVILEGE SYSTEM

© 2011 – 2013 PERCONA 9 MySQL Passwords

• In MySQL authentication protocol, passwords are not sent as plaintext. – This is not true for MySQL’s pluggable authentication like PAM.

© 2011 – 2013 PERCONA 10 MySQL Passwords

• MySQL 5.6 passwords may be stored as a SHA-256 hash with salt. • MySQL 4.1-5.5 passwords are stored as a double-SHA1 hash. • MySQL 4.0 and earlier passwords were DES-encrypted. – Some sites still have old_passwords=1 even if they now use a more recent version of MySQL.

© 2011 – 2013 PERCONA 11 MySQL Passwords

• Password expiration • Password strength • Disabling accounts

© 2011 – 2013 PERCONA 12 Granting Privileges

• There is no concept of object ownership. You just grant a series of permissions based on a pattern. • No support for SQL roles or user groups. • Username and host combined grants access. It’s possible to have different permissions based on where you access from.

http://dev.mysql.com/doc/mysql-security-excerpt/5.6/en/privileges-provided.html

© 2011 – 2013 PERCONA 13 More Examples

• Column level privileges also exist - but are not recommended. Full list of privileges: – http://dev.mysql.com/doc/refman/5.6/en/grant.html

© 2011 – 2013 PERCONA 14 Security RESOURCE CONSUMPTION

© 2011 – 2013 PERCONA 15 Resource Consumption

• Very important–can’t prevent one user from perpetrating denial of service. • Trivial crashing bugs are fixed, but the more subtle ones always exist:

© 2011 – 2013 PERCONA 16 And...

© 2011 – 2013 PERCONA 17 What Happens?

• Much like session variables are unlimited, so are temporary files. It took a few minutes, but:

© 2011 – 2013 PERCONA 18 Security APPLICATION-INTRODUCED PROBLEMS

© 2011 – 2013 PERCONA 19 Application-Introduced Problems

• “SQL Injection.” – It turns out this is incredibly difficult to prevent. – And incredibly common! • Mitigation strategies: – Don’t execute untrusted content as SQL; let user input supply values, but not code. – Assign limited privileges. – Watch resource consumption. – ... There’s really not much more you can do with syntactically valid queries from a permitted user.
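A small sketch of the first mitigation above ("let user input supply values, but not code"). It uses Python's built-in sqlite3 module as a stand-in for MySQL purely so the example is self-contained; the table contents and function names are illustrative assumptions. MySQL drivers offer the same idea, usually with a different placeholder style.

```python
import sqlite3

# sqlite3 stands in for MySQL here only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE City (name TEXT, CountryCode TEXT)")
conn.executemany(
    "INSERT INTO City VALUES (?, ?)",
    [("Brisbane", "AUS"), ("Sydney", "AUS"), ("Utrecht", "NLD")],
)


def cities_unsafe(country_code):
    # Anti-pattern: the input is pasted into the SQL text, so it can supply code.
    sql = "SELECT name FROM City WHERE CountryCode = '%s'" % country_code
    return [row[0] for row in conn.execute(sql)]


def cities_safe(country_code):
    # The input is bound as a value; it is never parsed as SQL.
    sql = "SELECT name FROM City WHERE CountryCode = ?"
    return [row[0] for row in conn.execute(sql, (country_code,))]


hostile = "x' OR '1'='1"
print(len(cities_unsafe(hostile)))  # the injected OR clause matches every row
print(len(cities_safe(hostile)))    # no row has that literal country code
```

Binding values does not help against a permitted user sending expensive but valid queries, which is why limited privileges and watching resource consumption remain on the list.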

© 2011 – 2013 PERCONA 20 Additional Topics:

• More specific issues with SQL injection: – http://www.percona.com/webinars/2012-07-25-sql-injection-myths-and-fallacies • Staying on top of security vulnerabilities: – http://lists.mysql.com/announce • Achieving PCI Compliance: – http://www.youtube.com/watch?v=yt1EsE-M2a8 with slides at: http://www.percona.com/files/presentations/UC2010-Achieving-PCI-Compliance-with-MySQL.pdf

© 2011 – 2013 PERCONA 21 Hardware and Operating Systems

Percona Training http://www.percona.com/training

© 2011 – 2013 PERCONA 1 Common Questions

• Virtualization (cloud) or bare metal servers? • SSD or not SSD? • One big machine, or a few small machines? • Debian or Red Hat? • RAID 5 or RAID 10? • Filesystems? • Kernel / OS Settings?

© 2011 – 2013 PERCONA 2 Virtualization

• Not entirely a technical question. • Many cloud environments have limited hardware choices, with the higher power options very expensive: • On Amazon EC2, the most memory you can have is 68.4G for $1752/month or $1144 reserved for 12 months.

© 2011 – 2013 PERCONA 3 Virtualization (cont.)

• There is a range of technical problems best solved via hardware. • If one [large] machine could do the work, it normally* does not make sense to make changes to software to work with 10 smaller nodes. • For many customers “cloud = agility.” This is not true when unnecessary complexity reduces agility.

* Clarified in a few slides’ time!

© 2011 – 2013 PERCONA 4 Bare Metal Servers

• Newer Nehalem servers have up to 64 memory slots. • That’s 1TB memory using 16GB DIMMs or 256GB using the cheaper 4GB DIMMs. • For IO, there are flash PCI cards which are capable of 10K+ IOPS. • A hard drive might be capable of 100-200 IOPS.

There is no technical restriction from using flash or 256GB memory in a virtualized machine. It’s just not common in cloud hosting.

© 2011 – 2013 PERCONA 5 Bare Metal (cont.)

• Simple can be better. • You reliably know your minimum performance. • You can reliably tell that a gigabit ethernet link is yours alone. • You can size settings like innodb_io_capacity to use “all free capacity.” I.e., you don’t care if this results in more IO, you would rather all available capacity be used. • In practice this can make debugging problems much easier, due to fewer unknowns. Many of our customers cannot tell if it was last week’s deployment that suddenly made the application slow, or other users on the same system being more active.

© 2011 – 2013 PERCONA 6 One Big Machine or Many Small?

• Depends on the goal:
– One large machine is the easiest to deploy and manage.
– Many small servers can be helpful for the purposes of isolation (based on either task or customer).
• Examples of isolation:
– Create slaves for reporting queries.
– Create slaves which are used by Sphinx for fulltext indexing.

If your customers pay $1000 a month each for a SaaS application, they often have a certain “expectation” of performance.

© 2011 – 2013 PERCONA 7 SSD or Not?

• There are some very strong SSD options already available. Many customers are using them in production. • Understanding if it is the best choice is workload dependent. i.e. • Reads vs Writes / Memory Fit? • Need for better response or throughput?

© 2011 – 2013 PERCONA 8 Advantage of SSDs (1)

• Fast access time means that the cache miss path is far less expensive. i.e. • ~10ms changes to less than 1ms per IO. • This means that: • Some applications may not need nearly as high cache hit ratios. A 50ms page load can barely afford any cache misses. • Some operational issues become easier, such as a reduced warm up time post restart.
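A rough arithmetic sketch of the point above, assuming the slide's round figures of about 10 ms per cache miss on a hard drive and about 1 ms on an SSD. The function names are illustrative, and the model ignores queueing and parallel IO.

```python
def average_io_ms(hit_ratio, miss_ms, hit_ms=0.0):
    """Average cost per page access for a given cache hit ratio."""
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


def misses_within_budget(budget_ms, miss_ms):
    """How many serial cache misses fit in a response-time budget."""
    return int(budget_ms // miss_ms)


print(misses_within_budget(50, 10))  # hard drive at ~10 ms per miss
print(misses_within_budget(50, 1))   # SSD at ~1 ms per miss
print(average_io_ms(0.99, 10))       # average ms per access at a 99% hit ratio
```

With a 50 ms page-load budget, a hard drive leaves room for only a handful of serial misses, while the SSD figure leaves room for roughly ten times as many. That is the sense in which the required cache hit ratio drops.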

© 2011 – 2013 PERCONA 9 Advantage of SSDs (2)

• Throughput of SSDs is much cheaper than hard drives, even if storage costs more. • Many can do 10K+ IOPS.

© 2011 – 2013 PERCONA 10 Debian or Red Hat (etc)?

• Tends not to matter much. • What matters the most is that the release is supported for the duration of time the server will be deployed. • Fedora, Gentoo, Ubuntu (non LTS) are likely not good choices for this reason.

© 2011 – 2013 PERCONA 11 RAID5 or RAID10? (1)

• Workload specific—most likely answer is RAID10. • If cache fit is large enough, reads can be nearly eliminated, and writes are more an issue. • With RAID5 if you do not write a full stripe, you need to read before you write to recalculate parity. • For sequential writes only, RAID5 may perform better for the same number of disks.

© 2011 – 2013 PERCONA 12 RAID5 or RAID10? (2)

• Always difficult to answer this question with 100% confidence. • RAID controller vendors provide little transparency into internal operations. • Just because an optimization could theoretically apply, does not guarantee that it “does apply.”

© 2011 – 2013 PERCONA 13 Stripe Size?

• Similar difficulty to answer reliably. • What probably matters most is: • What is the vendor default? • Are you ever writing across stripe boundaries?

© 2011 – 2013 PERCONA 14 Stripe Size (cont.)

• What is the vendor default? • Likely has the most optimizations. Any changes need to be verified. • Are you ever writing across stripe boundaries? • InnoDB almost always writes 16K at a time. • If you have a 16K stripe, but InnoDB pages are non-aligned, each write will be on two stripes. • Aligning can be difficult [1]. • Some customers choose larger stripe sizes to “amortize” these boundary-writes.
[1] http://thunk.org/tytso/blog/2009/02/20/aligning-filesystems-to-an-ssds-erase-block-size/
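The boundary-write argument above can be checked with a little integer arithmetic. This is a sketch under simplifying assumptions: a single linear address space, 16K InnoDB pages, and a hypothetical 4K misalignment chosen only for illustration.

```python
KB = 1024
PAGE = 16 * KB  # InnoDB page size used on the slide


def stripes_touched(offset, length, stripe_size):
    """Number of stripes a write of `length` bytes at byte `offset` spans."""
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    return last - first + 1


def fraction_crossing(misalignment, stripe_size, pages=1024):
    """Share of consecutive page writes that land on more than one stripe."""
    crossing = sum(
        1 for n in range(pages)
        if stripes_touched(misalignment + n * PAGE, PAGE, stripe_size) > 1
    )
    return crossing / pages


print(stripes_touched(0, PAGE, 16 * KB))       # aligned page, 16K stripe
print(stripes_touched(4 * KB, PAGE, 16 * KB))  # misaligned page, 16K stripe
print(fraction_crossing(4 * KB, 16 * KB))      # every misaligned write crosses
print(fraction_crossing(4 * KB, 256 * KB))     # larger stripe: few writes cross
```

With the larger stripe only one page in sixteen crosses a boundary in this model, which is the "amortize" effect the last bullet refers to.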

© 2011 – 2013 PERCONA 15 Other RAID Controller Tips

• You want to purchase the battery option. • This allows you to configure caches to write-back mode: • Performance from the application on fsync is very good. • Data durability is still available. • Merging can happen on the RAID controller before writing down to the physical disks.

© 2011 – 2013 PERCONA 16 Filesystems

• Use XFS when using multiple disks. • Supports better concurrency. • ext3 will serialize a write to an individual file. • It cannot take advantage of InnoDB multiple write threads effectively. • An fsync operation is also serialized.

© 2011 – 2013 PERCONA 17 Kernel and OS Settings

• Mount filesystems with noatime • Set vm.swappiness = 0 in /etc/sysctl.conf • [With RAID] Change the IO schedulers from the default to either deadline or noop. • Check with: cat /sys/block/DEVICE/queue/scheduler • Change this persistently in /etc/grub.conf
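A small sketch for the scheduler check above. It assumes the usual format of the scheduler file, where all available schedulers are listed and the active one is shown in square brackets. The device name passed to read_scheduler is a placeholder you would replace with your own.

```python
import re


def active_scheduler(text):
    """Return the scheduler shown in [brackets], or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def read_scheduler(device):
    # `device` is a block device name such as "sda" (an assumption; use your own).
    with open("/sys/block/%s/queue/scheduler" % device) as handle:
        return active_scheduler(handle.read())


print(active_scheduler("noop deadline [cfq]"))
```

A monitoring check could call read_scheduler for each data volume and flag anything other than deadline or noop.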

© 2011 – 2013 PERCONA 18 Possible Networking Wins

• It might be worth increasing /proc/sys/net/ipv4/ip_local_port_range to get more local TCP/IP ports available if handling a lot of connections. • Decreasing the value of /proc/sys/net/ipv4/tcp_fin_timeout can help you reduce the time it takes to idle-recycle a connection. • Technically it breaks the standard, but should work fine on a local network.

© 2011 – 2013 PERCONA 19 InnoDB Basics

Percona Training http://www.percona.com/training

© 2011 – 2013 PERCONA 1 Table of Contents

1. History and Editions of InnoDB 4. Transaction Log

2. Storage Engines 5. UNDO Operation

3. Basic Operation 6. Checkpointing

© 2011 – 2013 PERCONA 2 InnoDB Basics HISTORY AND EDITIONS OF INNODB

© 2011 – 2013 PERCONA 3 InnoDB History

• 1994 – First line of code written by Innobase • 1999 – InnoDB “complete.” • 2001 – First alpha of working with MySQL. • 2004 – file-per-table introduced, • 2005 – MySQL 5.0 with COMPACT row format. • 2008 – MySQL 5.1 released w/ pluggable SE • 2008 – Oracle announces InnoDB Plugin 1.0 • 2010 – MySQL 5.5 released by Oracle – InnoDB Plugin 1.1 is the default storage engine • 2013 – InnoDB plugin built into MySQL 5.6

© 2011 – 2013 PERCONA 4 InnoDB History (cont.)

• In 2005 Innobase was purchased by Oracle • In 2008 MySQL AB was purchased by Sun Microsystems – Sun focused on the Falcon and Maria storage engines to compete with InnoDB, since the only major transactional storage engine was controlled by Oracle • Sun Microsystems was acquired by Oracle in 2009 – Oracle kills Falcon and Maria – InnoDB storage engine primary area of development for MySQL – InnoDB 1.1 + MySQL 5.5 first Oracle MySQL release – 2013 – MySQL 5.6.10 contains numerous InnoDB improvements

© 2011 – 2013 PERCONA 5 InnoDB Basics STORAGE ENGINES

© 2011 – 2013 PERCONA 6 InnoDB / MySQL Relationship

• The two systems communicate mostly on a row-based relationship, a.k.a: – The storage engine API. – The Handler Interface.

© 2011 – 2013 PERCONA 7 For example:

• SELECT * FROM title WHERE title='Bambi';

Before the query:

mysql> show session status like 'Handler%';
+----------------------------+---------+
| Variable_name              | Value   |
+----------------------------+---------+
| Handler_commit             | 0       |
| Handler_delete             | 0       |
| Handler_discover           | 0       |
| Handler_prepare            | 0       |
| Handler_read_first         | 0       |
| Handler_read_key           | 0       |
| Handler_read_next          | 0       |
| Handler_read_prev          | 0       |
| Handler_read_rnd           | 0       |
| Handler_read_rnd_next      | 0       |
| Handler_rollback           | 0       |
| Handler_savepoint          | 0       |
| Handler_savepoint_rollback | 0       |
| Handler_update             | 0       |
| Handler_write              | 0       |
+----------------------------+---------+
15 rows in set (0.01 sec)

After the query:

mysql> show session status like 'Handler%';
+----------------------------+---------+
| Variable_name              | Value   |
+----------------------------+---------+
| Handler_commit             | 1       |
| Handler_delete             | 0       |
| Handler_discover           | 0       |
| Handler_prepare            | 0       |
| Handler_read_first         | 1       |
| Handler_read_key           | 1       |
| Handler_read_next          | 0       |
| Handler_read_prev          | 0       |
| Handler_read_rnd           | 0       |
| Handler_read_rnd_next      | 1543765 |
| Handler_rollback           | 0       |
| Handler_savepoint          | 0       |
| Handler_savepoint_rollback | 0       |
| Handler_update             | 0       |
| Handler_write              | 43      |
+----------------------------+---------+
15 rows in set (0.00 sec)

© 2011 – 2013 PERCONA 8 InnoDB / MySQL Relationship

• What is a storage engine? – Handles storage of the rows inside of a MySQL table. – When a table is created the storage engine is selected. – Prior to MySQL 5.5 the default storage engine was MyISAM. – InnoDB is the preferred storage engine for performance and reliability.

http://www.mysqlperformanceblog.com/2007/01/08/innodb-vs-myisam-vs-falcon-benchmarks-part-1/

© 2011 – 2013 PERCONA 9 MyISAM vs InnoDB

MyISAM                      | InnoDB
Table level locking         | Row level locking
Not ACID compliant          | Fully ACID compliant
Caches only indexes         | Caches index and data
All blobs inline in row     | Off-page BLOB storage
No data checksums           | Page level checksum
No log, crash can lose data | Transaction log for crash safe recovery

© 2011 – 2013 PERCONA 10 This Course’s Focus

• We’re concerned mostly about what happens below that storage engine API. – i.e. “What’s Inside InnoDB.”

• Remember that there are other moving pieces sitting above InnoDB that we will not be talking about. – That means that if by slip of the tongue we refer to log files, you can assume we mean ib_logfile0 and ib_logfile1, and not the binary log.

© 2011 – 2013 PERCONA 11 Converting MyISAM to InnoDB

ALTER TABLE table_name ENGINE=INNODB;

This is also how you defragment or “rebuild” an InnoDB table.

If mysql_upgrade tells you to run REPAIR TABLE against an InnoDB table, it really means you should do this instead.

© 2011 – 2013 PERCONA 12 InnoDB Basics BASIC OPERATION

© 2011 – 2013 PERCONA 13 First, Some Prerequisites

© 2011 – 2013 PERCONA Numbers Everyone Should Know

L1 cache reference                           0.5 ns
Branch mispredict                              5 ns
L2 cache reference                             7 ns
Mutex lock/unlock                             25 ns
Main memory reference                        100 ns
Compress 1K bytes with Zippy               3,000 ns
Send 2K bytes over 1 Gbps network         20,000 ns
Read 1 MB sequentially from memory       250,000 ns
Round trip within same datacenter        500,000 ns
Disk seek                             10,000,000 ns
Read 1 MB sequentially from disk      20,000,000 ns
Send packet CA->Netherlands->CA      150,000,000 ns

See: http://www.linux-mag.com/cache/7589/1.html and Google http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf

© 2011 – 2013 PERCONA 15 Numbers Everyone Should Know

Same table as the previous slide, with one addition: SSDs are somewhere between network and disk, at about 100,000 ns.

© 2011 – 2013 PERCONA 16 About Disks

• 10,000,000 ns = 10ms = 100 operations/second. – This is about the average for a 7200RPM drive. • The actual time has dramatic variation (~0.2ms– ~20ms). – The variation is because disks are mechanical. – We can write sequentially much faster than randomly.

© 2011 – 2013 PERCONA 17 Everything Is Buffered!

• When you write to a file, here’s what happens in the Operating System:

Block 9, 10, 1, 4, 200, 5

Block 1, 4, 5, 9, 10, 200

What happens to this buffer if we lose power?

© 2011 – 2013 PERCONA 18 The OS provides a way!

• $ man fsync

Hint: MyISAM just writes to the OS buffer and has no durability.

Synopsis
#include <unistd.h>
int fsync(int fd);
int fdatasync(int fd);

Description fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device) where that file resides. The call blocks until the device reports that the transfer has completed. It also flushes metadata information associated with the file (see stat(2)).

http://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/

© 2011 – 2013 PERCONA 19 Knowing This

• InnoDB wants to try and reduce random IO. • It can’t (safely) rely on the operating system’s write buffering and be ACID compliant. – .. and InnoDB algorithms have to compensate.

© 2011 – 2013 PERCONA 20 Basic Operation (High Level)

[Diagram, repeated over five animation slides: Log Files, Buffer Pool and Tablespace, with the query below.]

SELECT * FROM City WHERE CountryCode='AUS'

© 2011 – 2013 PERCONA 25 Basic Operation (cont.)

[Diagram sequence showing Log Files, Buffer Pool and Tablespace at each step:]

• START TRANSACTION;
• UPDATE City SET name = 'Morgansville' WHERE name = 'Brisbane' AND CountryCode='AUS'
– One row updated.
• COMMIT;
– Will not return OK until log is synced.
• Background operation: flush dirty page to disk.
• Background operation: clean on disk and buffer pool.
• Checkpoint noted in log.

© 2011 – 2013 PERCONA 37 Why Don’t We Update?

• This is an optimization! – The log file IO is sequential and much cheaper than live updates. – The IO for the eventual updates to the tablespace can be optimized as well. • Provided that you saved enough to recover, this shouldn’t matter, should it? It turns out this is one of the main tuning points for InnoDB.

© 2011 – 2013 PERCONA 38 InnoDB Basics TRANSACTION LOG

© 2011 – 2013 PERCONA 39 More on Logs

• Logs are only read from during recovery. – Not even read when we need to write down dirty pages! • To track pages we have two lists: – The flush list: dirty pages need to be flushed before eviction. – The LRU: clean pages may be evicted. • All log writes are assigned a log sequence number (LSN). This is the number of bytes ever written to the log.

© 2011 – 2013 PERCONA 40 Recovery Process (Simplified)

• InnoDB replays through the log files, starting from the latest completed checkpoint. • Since dirty pages may have been flushed out of order, InnoDB may discover that some changes have already been made. – We can tell if a page is up to date, because each page internally stores the LSN when it was modified, and the LSN when it was flushed.
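The replay rule above can be illustrated with a toy model. This is a deliberately simplified sketch, not InnoDB's actual recovery code: pages are dictionary entries, each log record replaces a whole page value, and LSNs are small made-up integers.

```python
def replay(pages, log, checkpoint_lsn):
    """Redo log records newer than the checkpoint, skipping applied ones.

    pages -- dict: page number -> {"lsn": int, "data": value} as found on disk
    log   -- list of (lsn, page number, new value) in LSN order
    Returns the list of LSNs that actually had to be re-applied.
    """
    applied = []
    for lsn, page_no, value in log:
        if lsn <= checkpoint_lsn:
            continue  # everything up to the checkpoint is already on disk
        page = pages[page_no]
        if page["lsn"] >= lsn:
            continue  # page was flushed after this change: already present
        page["data"] = value
        page["lsn"] = lsn
        applied.append(lsn)
    return applied


# Page 1 was flushed out of order (after LSN 120); page 2 was last flushed at 100.
pages = {1: {"lsn": 120, "data": "b"}, 2: {"lsn": 100, "data": "y"}}
log = [(100, 2, "y"), (110, 1, "a"), (120, 1, "b"), (130, 2, "z")]
print(replay(pages, log, checkpoint_lsn=100))
```

Only the change at LSN 130 needs to be redone in this example; the two records for page 1 are skipped because the page's own LSN shows they are already on disk.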

© 2011 – 2013 PERCONA 41 Log Files and Checkpoints

• Most database systems work this way: – In Oracle the transaction logs are called “Redo Logs.” • The background process of syncing dirty pages is normally referred to as a “Checkpoint.” • InnoDB has fuzzy checkpointing (more on this later).

© 2011 – 2013 PERCONA 42 Log Writing

• You should increase innodb_log_file_size. Allows InnoDB to “smooth out” background IO. • You should put logs on a separate spindle by specifying innodb_log_group_home_dir. • Be aware that your total log space usage is: – innodb_log_file_size * innodb_log_files_in_group • You can change innodb_log_files_in_group, but there is no significant benefit from this.

© 2011 – 2013 PERCONA 43 Log Writing (cont.)

• innodb_flush_log_at_trx_commit = 1 • Change to 0 or 2 to reduce the durability of this write. – Requires less flushing—particularly helpful on systems without writeback caches! • innodb_log_buffer_size may also help buffer changes longer before writing to the logs. – Very workload dependent—tends to be more helpful for writing big TEXT/BLOB changes.

© 2011 – 2013 PERCONA 44 innodb_flush_log_at_trx_commit

[Diagram: Buffer Pool, Log Buffer, OS Cache, RAID Cache, Physical Storage (Log Files)]

1: fsync every commit
2: written to the OS cache at every commit, fsync once per second
0: written to the OS cache and fsynced once per second

© 2011 – 2013 PERCONA 45 Test Crash Recovery

First Command Window:

CREATE TABLE my_large_innodb (
  id INT AUTO_INCREMENT PRIMARY KEY,
  a char(255)) ENGINE=InnoDB;

# Create first row
INSERT INTO my_large_innodb (a) VALUES (REPEAT('a', 255));

# double the number of rows
INSERT INTO my_large_innodb (a) SELECT REPEAT('a', 255) FROM my_large_innodb;

# Double the rows repeatedly
# until it takes >10 seconds
INSERT INTO my_large_innodb SELECT NULL, REPEAT('a', 255) FROM my_large_innodb;

UPDATE my_large_innodb SET a = REPEAT('b', 255);
# Kill while UPDATE is executing! (see next)

Second Command Window:

# Kill the server during the UPDATE:
$ sudo killall -9 mysqld
# Watch the error log:
$ tail -f /data/mysql/db1.err
Jan 29 15:21:56 130129 15:21:56 InnoDB: Database was not shut down normally!
Jan 29 15:21:56 InnoDB: Starting crash recovery.
. . .
Jan 29 15:22:21 InnoDB: 1 transaction(s) which must be rolled back or cleaned up
Jan 29 15:22:21 InnoDB: in total 54683 row operations to undo
. . .
Jan 29 15:22:24 InnoDB: Starting in background the rollback of uncommitted transactions
Jan 29 15:22:24 130129 15:22:24 InnoDB: Rolling back trx with id 1E359, 54683 rows to undo
. . .
Jan 29 15:22:29 InnoDB: Rolling back of trx id 1E359 completed
Jan 29 15:22:29 130129 15:22:29 InnoDB: Rollback of non-prepared transactions completed
Jan 29 15:22:30 130129 15:22:30 [Note] /usr/sbin/mysqld: ready for connections.

© 2011 – 2013 PERCONA 46 What’s the Best Log File Size?

• It’s a tradeoff between performance benefit by delaying checkpointing versus time to perform crash recovery—so it depends on your recovery time objective. • It’s not possible to make generic statements such as “a 128M log file will recover in 10 minutes.”

© 2011 – 2013 PERCONA 47 Why Is That?

• Picture the log visually:

[Diagram: the log drawn as a sequence of changes, with two marked points.]

Not all flushing to the tablespace is done in order. By default request merging is done to neighbor pages.

Recoverability is the difference between these two points. If flushing dirty pages is behind, it will take longer.

Legend: change for which page has been modified in tablespace; change written to log file; unused space in the log file.

© 2011 – 2013 PERCONA 48 Those Numbers in INNODB STATUS

---
LOG
---
Log sequence number 43921309413
Log flushed up to 43921308508
Last checkpoint at 43497448671
..

• Log sequence number: last modification made, which has not necessarily been flushed.
• Log flushed up to: last successful operation that has been flushed to the log files.
• Last checkpoint at: last successful point for which all background operations have been completed.
• Rough sizing rule of thumb is to track change in “log sequence number” over an hour. i.e. – 3951840903-3836410803 = 115430100 bytes = 110M or (55M * 2 log files)
http://www.mysqlperformanceblog.com/2008/11/21/how-to-calculate-a-good-innodb-log-file-size/
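The rule of thumb can be worked through as a short calculation. This sketch assumes two "Log sequence number" readings taken about an hour apart (the slide's own figures are used) and the default of two log files in the group. The function name is illustrative.

```python
MB = 1024 * 1024


def suggested_log_file_size(lsn_start, lsn_end, files_in_group=2):
    """Bytes written in the sample window, and bytes per log file to hold them."""
    written = lsn_end - lsn_start
    per_file = -(-written // files_in_group)  # ceiling division
    return written, per_file


written, per_file = suggested_log_file_size(3836410803, 3951840903)
print(written, round(written / MB), round(per_file / MB))
```

The per-file figure is what would go into innodb_log_file_size; total log space is still innodb_log_file_size * innodb_log_files_in_group. Sampling during peak write load gives a more conservative estimate than sampling during a quiet hour.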

© 2011 – 2013 PERCONA 49 Frequently Asked Questions

Q: What do we write to the log? A) Committed changes B) Uncommitted changes C) Both types of changes

© 2011 – 2013 PERCONA 50 Frequently Asked Questions A: C) Both types of changes

© 2011 – 2013 PERCONA 51 Frequently Asked Questions Q: When doing crash recovery are all the changes since the last checkpoint (including the uncommitted changes) replayed by the database?

© 2011 – 2013 PERCONA 52 Frequently Asked Questions A: Yes. Both committed and uncommitted changes are replayed.

This means that the database must also retain information about how to UNDO the uncommitted changes.

© 2011 – 2013 PERCONA 53 Frequently Asked Questions Q: How does the database know which transactions are committed, and that the database shut down correctly?

© 2011 – 2013 PERCONA 54 Frequently Asked Questions A: The log records when transactions start and complete and if they committed or rolled back. It also marks shutdowns and checkpoints. If the log does not contain a commit marker for a transaction, then the transaction must be rolled back.

© 2011 – 2013 PERCONA 55 InnoDB Basics UNDO OPERATION

© 2011 – 2013 PERCONA 56 Undo Information

• The transaction logs store REDO information—and data is written whether committed or not. InnoDB also has to be able to UNDO changes. – It has to be able to show old versions of the data to older transactions that are still active. – It has to be able to reverse a transaction. • The undo information is stored in the InnoDB tablespace in a structure called a rollback segment. • MySQL 5.5 ships with 128 rollback segments.

© 2011 – 2013 PERCONA 57 Basic Operation in Higher Detail

[Diagram sequence showing Log Files, Buffer Pool and Tablespace at each step:]

• START TRANSACTION;
• DELETE FROM City WHERE name = 'Morgansville' AND CountryCode='AUS'
– Mark the page dirty.
– Copy old row to rollback segment.
– Return result to client: One row deleted.
• COMMIT;
– Will not return OK until log is synced.
• Checkpoint operation: flush dirty page to disk.
• Checkpoint operation: write page to disk and mark clean in buffer pool.
• Checkpoint noted in log. The deleted version of the row still needs to be cleaned up.
• Purge operation cleans up.
• Purge completed.

© 2011 – 2013 PERCONA 72 InnoDB Basics CHECKPOINTING

© 2011 – 2013 PERCONA 73 Checkpointing

• Flushing down dirty pages in the background is called checkpointing • InnoDB has fuzzy checkpointing, which means that there may still be dirty pages in the buffer pool after a checkpoint. • The InnoDB plugin tries to adapt to the rate at which pages are made dirty and flush faster. More on this later.

© 2011 – 2013 PERCONA 74 Fuzzy Checkpointing

[Figure: the log file with two markers, “Last checkpoint LSN” and “New checkpoint LSN”. Callouts: “These pages will be flushed to disk” and “These pages still dirty in buffer pool”. Legend: change for which the page has been modified in the tablespace; change written to the log file; unused space in the log file.]
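The relationship between dirty pages and the checkpoint LSN can be sketched in a few lines of Python. This is a toy model, not InnoDB's implementation: the class name `BufferPoolSketch`, its methods and the byte counts are assumptions made for illustration. It shows why a checkpoint can advance while some pages remain dirty.

```python
from dataclasses import dataclass, field


@dataclass
class BufferPoolSketch:
    """Toy model: tracks the oldest unflushed change per dirty page."""
    dirty: dict = field(default_factory=dict)  # page_id -> LSN where its oldest unflushed change starts
    current_lsn: int = 0                       # end of the log written so far

    def modify(self, page_id: int, log_bytes: int) -> None:
        """Append a change to the log and mark the page dirty."""
        start = self.current_lsn
        self.current_lsn += log_bytes
        # A page that is already dirty keeps its *oldest* unflushed LSN.
        self.dirty.setdefault(page_id, start)

    def flush_older_than(self, target_lsn: int) -> None:
        """Flush only the pages whose oldest change starts before target_lsn."""
        for page_id in [p for p, lsn in self.dirty.items() if lsn < target_lsn]:
            del self.dirty[page_id]

    def checkpoint_lsn(self) -> int:
        """All log before this LSN is already reflected in the tablespace."""
        return min(self.dirty.values(), default=self.current_lsn)


bp = BufferPoolSketch()
for page in (1, 2, 3):
    bp.modify(page, log_bytes=100)
bp.flush_older_than(200)   # fuzzy: page 3 stays dirty after the checkpoint
print(bp.checkpoint_lsn(), sorted(bp.dirty))
```

After the partial flush the checkpoint has moved forward, yet page 3 is still dirty, which is the "fuzzy" part.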

© 2011 – 2013 PERCONA 75 Writing Down Data

• Dirty pages are flushed to disk using the type of IO specified by innodb_flush_method:
– The default flush method (no value set) opens log and data files using buffered IO and calls fsync() to flush data to disk when necessary for ACID compliance.
– If the flush method is set to O_DSYNC, InnoDB opens log files with O_SYNC and uses buffered IO for data files.
– If the flush method is set to O_DIRECT, InnoDB opens log files with buffered IO and uses direct (unbuffered synchronous) IO on data files.

© 2011 – 2013 PERCONA 76 In Other Words

innodb_flush_method      Log Files            Data Files
unset (default)          Buffered I/O         Buffered I/O
O_DSYNC                  Write with O_SYNC    Buffered I/O
O_DIRECT                 Buffered I/O         Direct DMA write
ALL_O_DIRECT (XtraDB)    Direct DMA write     Direct DMA write
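As a rough illustration of the default row in the table (buffered IO plus fsync()), the Python sketch below appends to a file through the OS cache and then asks the OS to sync it. The function name `durable_append` and the file name are hypothetical; this only mirrors the system-call pattern, not InnoDB's code.

```python
import os
import tempfile


def durable_append(path: str, data: bytes) -> int:
    """Buffered write followed by fsync(), in the spirit of the default flush method."""
    # An ordinary buffered descriptor: no O_SYNC and no O_DIRECT flags.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        written = os.write(fd, data)  # lands in the OS cache first
        os.fsync(fd)                  # ask the OS to push it to stable storage
        return written
    finally:
        os.close(fd)


demo_path = os.path.join(tempfile.mkdtemp(), "logfile_demo")
durable_append(demo_path, b"redo record")
```

With O_DSYNC or O_DIRECT the difference would be in the flags passed when the file is opened, which is why the setting is chosen per file type in the table above.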

© 2011 – 2013 PERCONA 77 innodb_flush_method (default)

01010 0101 Log Files Buffered

OS Cache Buffered

Buffer Pool Tablespace

© 2011 – 2013 PERCONA 78 innodb_flush_method=O_DSYNC

01010 0101 Log Files O_SYNC

OS Cache Buffered

Buffer Pool Tablespace

© 2011 – 2013 PERCONA 79 innodb_flush_method=O_DIRECT

01010 0101 Log Files Buffered

OS Cache

Direct DMA write Buffer Pool Tablespace

© 2011 – 2013 PERCONA 80 innodb_flush_method=ALL_O_DIRECT

01010 0101

Direct DMA write Log Files

OS Cache

Direct DMA write Buffer Pool Tablespace

© 2011 – 2013 PERCONA 81 Why Use O_DIRECT?

• InnoDB caches data pages in the buffer pool. • Without direct I/O, the filesystem caches the same data that InnoDB is already caching, so we have double caching. • If you have a RAID controller, it’s probably caching data as well, so without O_DIRECT you will have triple caching.

© 2011 – 2013 PERCONA 82 Why Not Use O_DIRECT?

• Storage may be slow – No RAID cache – Storage on SAN – Amazon EBS. • Allowing the OS to cache IO requests can give better response time. • Hard to make a general rule—it’s best to test your hardware against your write load.

© 2011 – 2013 PERCONA 83 InnoDB Internals

Percona Training http://www.percona.com/training

© 2011 – 2013 PERCONA 1 Table of Contents

1. Tablespaces 6. Adaptive Hash

2. Row Formats 7. Insert Buffer

3. Other On Disk Structures 8. Double Write

4. In Memory Information 9. Configuration Parameters

5. Clustered Index

© 2011 – 2013 PERCONA 3 InnoDB Internals TABLESPACES

© 2011 – 2013 PERCONA 4 On Disk Format

• InnoDB stores everything in tablespaces (except the logs). • There is always a “system” or “central” tablespace. • This consists of one physical file called ibdata1 by default. • This file is located in innodb_data_home_dir • By default all tables and indexes go into this file • It’s possible to configure this to be many physical files: ibdata1 (20G), ibdata2 (20G), ibdata3 (20G). • The last file (ibdata3) can autoextend, but every other file is fixed in length.

© 2011 – 2013 PERCONA 5 On Disk Format (cont.)

• Q: Why would you have ibdata1, ibdata2, ibdata3 and not just ibdata1? • A: From InnoDB’s perspective it makes no difference. From your perspective it may be easier to backup split files. • Some filesystems may fragment / have different performance characteristics with very large files. Don’t assume, test! • Some backup tools can backup split files in parallel.

© 2011 – 2013 PERCONA 6 On Disk Format (cont.)

• InnoDB fills each tablespace file serially, that is the first before the next, until the last.
• There are no advanced features such as mirroring, striping or round robin writing to the physical files.
• InnoDB wants you to use the Operating System to do this (i.e. hardware or software RAID).
• Each page is filled to 15/16 (about 94%), leaving a small gap. This is also not configurable.

© 2011 – 2013 PERCONA 7 Raw Partitions

• You can bypass the filesystem and configure InnoDB raw partitions. • Not many people are doing this—it’s no longer a performance advantage when compared against modern filesystems like XFS, ext4 and ZFS. • Must be initialized specially before use. • Not recommended.

© 2011 – 2013 PERCONA 8 File Per Table Tablespaces

• You can configure innodb-file-per-table. • Each table has its own tablespace. • It may be composed of only one physical file per table. • You can store some tables in file-per-table and some in the central tablespace (but this is hard to maintain). • Can you store multiple tables in one file-per-table? Can you control storing one table in several physical files? • No, you cannot.

© 2011 – 2013 PERCONA 9 Is File Per Table Better? • Yes (now default in 5.6) • In a central tablespace, when you TRUNCATE/DROP a table, the space is marked as free, and may be reused, but the tablespace files never contract. • In file-per-table, space is reclaimed when you TRUNCATE/DROP a table, reducing the size of a physical backup. • This option is required to use the Barracuda file format. • Modern filesystems perform best with more than one file. • Backup tools can work in parallel.

© 2011 – 2013 PERCONA 10 Is File Per Table Better? (cont.)

• Performance is similar in most cases. • [Con] It incurs additional file descriptors. • [Con] More complicated internal synchronization. For example, on startup checking 100K tables exist takes some time. • [Pro] If using O_DIRECT and ext3, reduces IO serialization.

© 2011 – 2013 PERCONA 11 “Central” Tablespace Contents

• The central tablespace always contains: • A tablespace header • The InnoDB data dictionary • Undo Information • The Doublewrite buffer • The Insert buffer • It often also contains table data and indexes.

© 2011 – 2013 PERCONA 12 Contents, Visually*

Table Space Header

Data Dictionary

Undo Information

File Spaces

Free Space

* A simplification—not exact.

© 2011 – 2013 PERCONA 13 Contents, Visually*

Table Space Header

Data Dictionary

Undo Information

File Spaces

Free Space

* A simplification - not exact.

© 2011 – 2013 PERCONA 14 Contents, Visually*

Table Space Header

Data Dictionary

Undo Information

File Spaces

Free Space

* A simplification - not exact.

© 2011 – 2013 PERCONA 15 Contents, Visually*

Table Space Header

Data Dictionary

Undo Information

File Spaces

Free Space

* A simplification - not exact.

© 2011 – 2013 PERCONA 16 Contents, Visually*

Table 1 Table Space Header

Data Dictionary

Undo Information

File Spaces

Free Space

* A simplification - not exact.

© 2011 – 2013 PERCONA 17 Contents, Visually*

Table 1 Table Space Header

Data Dictionary

Table 2 Undo Information

File Spaces

Table 3 Free Space

* A simplification - not exact.

© 2011 – 2013 PERCONA 18 Visually with File Per Table*

Table Space Header

Data Dictionary

Table 1 Undo Information

File Spaces

Table 2 Free Space

* A simplification—not exact.

© 2011 – 2013 PERCONA 19 Expanding on Storage

• Most internal allocations are one extent at a time.* • innodb_autoextend_increment (how big to expand the tablespace at a time) is also measured in extents.

[Figure: a tablespace contains segments; each segment is made up of extents; each extent is 64 pages.]

* Not exact. Example coming in a few slides.

© 2011 – 2013 PERCONA 20 Expanding on Storage (cont.)

• Everything fits into pages, which are 16 KB by default*. • A data page looks something like this:

HEADER

Row Row

Row Row

Row Row

TRAILER

* Page size is configurable in MySQL 5.6 and in XtraDB.
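The page and extent sizes above lend themselves to quick arithmetic. The sketch below assumes the default 16 KB page and the 64-page extent shown on the earlier slide; the helper names are made up for illustration, and header/trailer overhead inside each page is ignored.

```python
PAGE_SIZE = 16 * 1024    # default page size from the slide, in bytes
PAGES_PER_EXTENT = 64    # extent size from the "Expanding on Storage" diagram


def extent_bytes(page_size: int = PAGE_SIZE) -> int:
    """Size of one extent in bytes."""
    return page_size * PAGES_PER_EXTENT


def pages_needed(n_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Whole pages needed to hold n_bytes (ceiling division)."""
    return -(-n_bytes // page_size)


print(extent_bytes() // 1024, "KB per extent")
```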

© 2011 – 2013 PERCONA 21 Expanding on Storage (cont.)

• Some of the meta data in the header includes:
– Information on what page type it is.
– A page checksum.
– The Log Sequence Number.
– The last value of the Log Sequence number as flushed to disk.
• Some of the information in the trailer includes:
– The last 4 bytes of the page LSN repeated.

© 2011 – 2013 PERCONA 22 Page Checksum

• A very important defensive feature of InnoDB.
• On read from storage into buffer pool a check is done to detect silent corruption.
• On failure, InnoDB intentionally crashes the server!*
• Checksums are updated when writing pages back to disk.
• Cost “should be” low relative to the cost of the IO. For example, “numbers everyone should know” says compressing 1K of data takes 3us, while an IO takes 10ms.
• “Fast checksum” is available in XtraDB. This may be important for workloads with a high page turnover rate, and fast storage (SSDs).

* Behavior can be changed in Percona Server with XtraDB.
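The verify-on-read idea can be sketched as follows. This is only an illustration: it uses CRC32 from Python's zlib and a made-up page layout (4 checksum bytes followed by the body), which is an assumption and not InnoDB's actual checksum algorithm or page format.

```python
import zlib

PAGE_SIZE = 16 * 1024
CHECKSUM_BYTES = 4


def seal_page(body: bytes) -> bytes:
    """Prefix the page body with its checksum before writing it out."""
    if len(body) != PAGE_SIZE - CHECKSUM_BYTES:
        raise ValueError("body must fill the page exactly")
    return zlib.crc32(body).to_bytes(CHECKSUM_BYTES, "big") + body


def read_page(page: bytes) -> bytes:
    """Verify the checksum when the page is read back; refuse corrupt pages."""
    stored = int.from_bytes(page[:CHECKSUM_BYTES], "big")
    body = page[CHECKSUM_BYTES:]
    if zlib.crc32(body) != stored:
        raise IOError("page checksum mismatch: possible silent corruption")
    return body
```

Raising an error here plays the role of the intentional crash described above: the corrupt page is never handed to the caller.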

© 2011 – 2013 PERCONA 23 InnoDB Internals ROW FORMATS

© 2011 – 2013 PERCONA 24 What Does a Row Look Like?

• The default format is COMPACT: this stores NULL values efficiently.
• Most important features to note here are:
– Storage of Transaction ID.
– Storage of pointer to older version of row in UNDO space.

[Figure: row layout, top to bottom: Transaction ID, Roll Pointers, Field Pointers, Field #1, Field #2, Field #3.]

© 2011 – 2013 PERCONA 25 Field -> Row -> Page

• All fields are always stored inline in the row (with exceptions; see the next slide).
• Rows cannot be split across pages.
• InnoDB requires that at least two rows fit within one page.*
• This means that there is an effective row size limit of approximately 8000 bytes (with the exception of variable width fields).

* This restriction changes to 1 row for the compressed row format.

© 2011 – 2013 PERCONA 26 Blob/Long Text Storage

• VARCHAR/TEXT/BLOB/VARBINARY (strings) are the special case. • They are all handled the same by InnoDB: • If they fit in the row, they will be stored in the row. • The InnoDB plugin optimizes off-page storage of string columns when using the Barracuda file format and the DYNAMIC or COMPRESSED row formats are selected for a table.

© 2011 – 2013 PERCONA 27 Blob/Long Text Storage

• Selecting columns for off-page storage: • If a row is too large to fit on the page due to strings, the longest string is selected for off page storage • If the row is still too large, the next largest string is selected • This process continues until the row fits, or all strings have been selected for off page storage. If the row still does not fit, an error results.
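The selection loop described above can be sketched in Python. This is a simplified model under stated assumptions: whole strings move off-page and leave a 20-byte pointer behind (as in the DYNAMIC/COMPRESSED description on the next slide), and `fixed_bytes`, `limit` and the function name are hypothetical inputs chosen for illustration.

```python
POINTER_BYTES = 20  # on-page pointer left behind for an off-page string


def choose_off_page(fixed_bytes: int, strings: dict, limit: int) -> list:
    """Return the string columns moved off-page, longest first, until the row fits.

    fixed_bytes: bytes used by non-string columns and row overhead
    strings:     {column_name: length_in_bytes}
    limit:       maximum in-page row size
    """
    inline = dict(strings)
    off_page = []

    def row_size() -> int:
        return fixed_bytes + sum(inline.values()) + POINTER_BYTES * len(off_page)

    while row_size() > limit and inline:
        longest = max(inline, key=inline.get)  # pick the longest remaining string
        del inline[longest]
        off_page.append(longest)

    if row_size() > limit:
        raise ValueError("row too large even with all strings off-page")
    return off_page


print(choose_off_page(100, {"bio": 5000, "notes": 4000, "tag": 100}, 8000))
```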

© 2011 – 2013 PERCONA 28 Blob/Long Text (cont.)

• How are blobs stored on the page? • Antelope or ROW_FORMAT=REDUNDANT|COMPACT • First 768 bytes are stored on the parent page, remainder is stored on overflow page(s) and a twenty byte pointer is stored on the parent page pointing to the overflow page(s). • Barracuda and ROW_FORMAT=DYNAMIC|COMPRESSED • The string is stored entirely on overflow page(s). On parent page, a 20 byte pointer to overflow page(s) is stored.

© 2011 – 2013 PERCONA 29 Blob/Long Text (cont.)

• Allocation of off-page storage for large strings • InnoDB uses modified InnoDB allocation rules: • Initially allocates 1 page at a time up to 32 pages. • Then allocates 1 extent at a time. • These segments will not be shared between off-page blobs.

© 2011 – 2013 PERCONA 30 Blob/Long Text “Bloat”

• A 600K blob will take approximately 1.5MB for storage!
• And a very small blob (800 bytes?) that does not fit in the main row will take a full 16K overflow page!
• Updates are also never in place; they always go to a new extent.
• Blobs are always effectively written at least twice*:
– Transaction Log and Tablespace.

* As well as doublewrite buffer (to be covered later) and the binary log if it is enabled
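The 1.5MB figure follows from the allocation rules on the previous slide (single pages up to 32, then whole extents). A small sketch of that arithmetic, assuming 16 KB pages and 64-page extents and ignoring per-page overhead and any on-page prefix; the function name is made up for illustration:

```python
PAGE = 16 * 1024          # bytes per page
EXTENT_PAGES = 64         # pages per extent
SINGLE_PAGE_LIMIT = 32    # pages allocated one at a time before switching to extents


def blob_allocated_bytes(blob_bytes: int) -> int:
    """Approximate space allocated for one off-page blob."""
    pages = -(-blob_bytes // PAGE)                 # ceiling division
    if pages <= SINGLE_PAGE_LIMIT:
        return pages * PAGE
    remaining = pages - SINGLE_PAGE_LIMIT
    extents = -(-remaining // EXTENT_PAGES)        # whole extents for the rest
    return (SINGLE_PAGE_LIMIT + extents * EXTENT_PAGES) * PAGE


print(blob_allocated_bytes(600 * 1024) / (1024 * 1024), "MB for a 600K blob")
print(blob_allocated_bytes(800) // 1024, "KB for an 800-byte blob")
```

A 600K blob needs 38 pages: the first 32 are allocated singly (512 KB), and the remaining 6 pull in a full 1 MB extent.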

© 2011 – 2013 PERCONA 31 Blob/Long Text (cont.)

• Understanding this makes some optimization possible:
– Simple compression (even application-side) may be enough to keep blobs under 32 pages or located on-page.
– One large blob may be faster than several medium ones due to wasted space in allocation.

© 2011 – 2013 PERCONA 32 Blob/Long Text (cont.)

• [False-Optimization] InnoDB will NOT read blobs unless they are touched by the query. • i.e. No need to apply optimizations such as moving BLOBs to separate table to condense buffer pool memory consumption. • [Good-optimization] It may sometimes make sense to serialize infrequently used/unindexed columns to a BLOB column on very wide tables. • [Don’t use when] The average blob still fits on the page. If this is the case the row size will still be inflated and you lose ability to easily access the columns.

© 2011 – 2013 PERCONA 33 InnoDB Internals OTHER ON DISK STRUCTURES

© 2011 – 2013 PERCONA 34 What does Undo look like?

• Undo information is stored in a rollback segment. • A rollback segment consists of 1024 undo slots. • Each open transaction requires at least 1 undo slot. • How many rollback segments are there? • MySQL up to 5.1: one segment. • MySQL 5.5: 128 segments. • MySQL 5.6: configurable up to 128 segments. • Rollback segments are in the global tablespace by default, but may optionally be located externally in MySQL 5.6.

© 2011 – 2013 PERCONA 35 What Does the Log File Look Like?

• The log file does not store a complete page, it just stores the changes that need to be made to recreate the page:
– Space ID + Page ID + Offset + Payload.
– Additional markers such as transaction start/end too.
• All writes are 512 byte aligned*.

Space ID + Page ID is the internal addressing system. ibdata1 is always Space ID 0. Page ID is the page number inside that space.

* Percona server allows innodb_log_block_size to be adjusted to 4096
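A minimal sketch of such a change record, assuming a simplified layout: the `RedoRecord` fields mirror the slide (space ID, page ID, offset, payload), while the class itself, `apply_record` and `aligned_write_size` are hypothetical names and do not reflect the real on-disk log format.

```python
from dataclasses import dataclass

LOG_BLOCK = 512  # bytes; the write alignment mentioned on the slide


@dataclass(frozen=True)
class RedoRecord:
    space_id: int   # 0 for ibdata1
    page_id: int    # page number inside that tablespace
    offset: int     # byte offset inside the page
    payload: bytes  # the new bytes to place at that offset


def apply_record(page: bytearray, rec: RedoRecord) -> None:
    """Re-apply one logged change to an in-memory copy of the page."""
    page[rec.offset:rec.offset + len(rec.payload)] = rec.payload


def aligned_write_size(n_bytes: int, block: int = LOG_BLOCK) -> int:
    """Round a log write up to a whole number of blocks."""
    return -(-n_bytes // block) * block
```

The point of the sketch is the size difference: the record carries a few bytes of payload plus addressing, not a 16 KB page image.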

© 2011 – 2013 PERCONA 36 InnoDB Internals IN MEMORY INFORMATION

© 2011 – 2013 PERCONA 37 InnoDB Buffer Pool

• Configured by innodb_buffer_pool_size.
• The main setting for InnoDB caching, responsible for all page types (data, indexes, undo, insert buffer...).
• You may be surprised how much insert buffer and undo pages can use at times!

© 2011 – 2013 PERCONA 42 InnoDB Buffer Pool (cont.)

• Recommended size is “about 50-80% of memory,” but there are better ways of calculating. • [Warning] Meta data always consumes an additional 5-10% more space on top of that. • [Warning] Some subsystems in MySQL rely on healthy OS caches, i.e. binary logs, relay logs.

http://www.mysqlperformanceblog.com/2010/08/23/innodb-memory-allocation-ulimit-and-opensuse/

© 2011 – 2013 PERCONA 43 InnoDB Buffer Pool (cont.)

• Exception cases always exist… • Your database is small and fits in a few MB of buffer pool. • You run other services on the same host. • You have multiple MySQL instances on the same host.

© 2011 – 2013 PERCONA 44 InnoDB Buffer Pool (cont.)

• It’s okay to pick an 80%-like number as a starting point. • It’s important to confirm with statistics on free memory and refine accordingly:

$ free -m
                     total    used    free  shared  buffers  cached
Mem:                 32177   30446    1730       0      368   16649
-/+ buffers/cache:           13428   18748
Swap:                 4095       2    4093

• buffers and cached: filesystem caches.
• -/+ buffers/cache, used: memory used excluding caches.

TIP: Most operating systems delay allocation. Even if InnoDB asks for 25GB on startup, the memory is only used the first time it’s read/written to.

© 2011 – 2013 PERCONA 45 Buffer Pool Recycling

• The LRU is a list of the clean (unmodified) pages in the buffer pool. • Lesser-used pages are likely to be evicted so that queries can load new pages when the buffer pool is full. • The pages to be evicted are the least recently used, i.e. the “oldest” pages. • Any read operation causes a page to move to the top of the LRU, making it most recently used, or “young.”

© 2011 – 2013 PERCONA 46 Buffer Pool Recycling (cont.)

most recently used

least recently used

© 2011 – 2013 PERCONA 47 Buffer Pool Recycling (cont.)

new page loads from disk, becomes “youngest”

oldest page is evicted

© 2011 – 2013 PERCONA 48 Buffer Pool Recycling (cont.)

a query reads an existing page, moving it to the top of the LRU—the “youngest” position

© 2011 – 2013 PERCONA 49 Buffer Pool Eviction Protection

a large table-scan has the risk of filling the buffer pool with pages that will only be read once, and causing many other pages to be evicted

© 2011 – 2013 PERCONA 50 Buffer Pool Eviction Protection (cont.)

• The solution is that the buffer pool is actually divided into two parts: • Top 5/8 part contains “hot” pages that have lived in the buffer pool for a while. They are requested frequently enough to avoid eviction. • Bottom 3/8 part contains “cold” pages that have lived in the buffer pool for a short time. This counts as the bottom of the LRU, and the pages are more likely to be evicted.

This feature was introduced in MySQL 5.1 with the InnoDB plugin.

© 2011 – 2013 PERCONA 51 Buffer Pool Eviction Protection (cont.)

“hot” = “youngest”

“cold” = “oldest”

© 2011 – 2013 PERCONA 52 Buffer Pool Eviction Protection (cont.)

new page loads from disk, but enters in the old part

oldest page is evicted

© 2011 – 2013 PERCONA 53 Buffer Pool Eviction Protection (cont.)

at worst, a table-scan causes evictions only in the old part of the buffer pool, which allows the most frequently- requested (“hot”) pages to stay in the young part.

© 2011 – 2013 PERCONA 54 Buffer Pool Eviction Protection (cont.)

• You can tune the size of the part of the buffer pool reserved for old pages, as a percentage: • innodb_old_blocks_pct = 37 (default for 3/8)

• How long does a page have to stay in the old part before it’s promoted to the young part? • innodb_old_blocks_time = N (milliseconds) • MySQL 5.6 default is 1000ms. • MySQL 5.1 & 5.5 default is 0ms (but 1000ms works well).
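The two settings above can be illustrated with a toy LRU that has a young and an old part. This is a sketch under assumptions, not InnoDB's algorithm: the class `MidpointLRU`, its parameters and the rule "rebalance only once the pool is full" are simplifications chosen for illustration.

```python
class MidpointLRU:
    """Toy buffer pool LRU with a young ("hot") and an old ("cold") sublist."""

    def __init__(self, capacity, old_pct=37, old_time_ms=1000):
        self.capacity = capacity
        self.old_target = max(1, capacity * old_pct // 100)
        self.old_time_ms = old_time_ms
        self.young = []        # index 0 = most recently used
        self.old = []          # index 0 = midpoint, last item = next victim
        self.first_seen = {}   # page -> time it entered the pool

    def access(self, page, now_ms):
        if page in self.young:
            self.young.remove(page)
            self.young.insert(0, page)
        elif page in self.old:
            # Promote only if the page has survived long enough in the old part.
            if now_ms - self.first_seen[page] >= self.old_time_ms:
                self.old.remove(page)
                self.young.insert(0, page)
        else:
            if len(self.young) + len(self.old) >= self.capacity:
                victim = self.old.pop()       # evict the oldest page
                del self.first_seen[victim]
            self.old.insert(0, page)          # new pages enter at the midpoint
            self.first_seen[page] = now_ms
        self._rebalance()

    def _rebalance(self):
        # Once the pool is full, keep the old part at its target size by
        # demoting pages from the tail of the young part.
        full = len(self.young) + len(self.old) >= self.capacity
        while full and len(self.old) < self.old_target and self.young:
            self.old.insert(0, self.young.pop())
```

In this model a scan that touches each new page only within the `old_time_ms` window cycles pages through the old part and leaves the young part untouched.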

© 2011 – 2013 PERCONA 55 Leave Some Memory for FS cache

• InnoDB uses the filesystem cache for writing to the transaction logs, and by default for writing to data files. • It’s important to leave enough free memory in the system for FS cache. • FS cache is also used when writing the binary and relay logs, temporary tables on disk, the binary log cache and other disk based operations like filesorts.

© 2011 – 2013 PERCONA 56 Other Memory Settings

• Maximum percentage of the buffer pool that can contain dirty pages: innodb_max_dirty_pages_pct=N (default: 75) • You could decrease this if you are concerned that you are evicting clean pages too frequently to service reads, because the dirty pages can’t be made free. • Previously, this setting was the only way to increase the rate at which the background thread flushed dirty pages.

In InnoDB plugin/XtraDB, a better way to control this is likely innodb_io_capacity=N.

© 2011 – 2013 PERCONA 57 innodb_io_capacity

• MySQL 5.1 had an upper limit of 100 background operations* per second (hardcoded).
• MySQL 5.5 introduced innodb_io_capacity as a configurable variable. Its default value is 200 IOPS.
• Only used:
– During “aggressive flushing” (i.e. when modified pages exceed innodb_max_dirty_pages_pct).
– For “estimate” adaptive flushing.
– When the server is idle.

* Only main thread background operations count toward this limit (flushing dirty pages and insert buffer merges), not purge or foreground operations.

© 2011 – 2013 PERCONA 58 Other Memory Settings (cont.)

• innodb_log_buffer_size = N (default 8MB) • Size of buffer to spool changes before writing to logs. • SHOW GLOBAL STATUS LIKE 'innodb_log_waits'; • This indicates the number of times the buffer was full and needed to be synchronously flushed. • Default is normally fine. Usually only an issue when writing large TEXT/BLOBs.

© 2011 – 2013 PERCONA 59 Statistics Sampling

• InnoDB only keeps statistics in memory, not on disk.*
• Sampling is performed when a table is first opened, and estimates are based on 8 randomly sampled pages.
• This number is used whether the table has 10 rows or 10 million rows.

In InnoDB plugin this is now configurable with innodb_stats_sample_pages. The setting is global, and will apply to all tables.

*In XtraDB 12 statistics can be retained with innodb_use_sys_stats_table=1.

© 2011 – 2013 PERCONA 60 Statistics (cont.)

• Statistics are regenerated on most metadata commands:
– SHOW TABLE STATUS
– SHOW INDEX
– Queries against INFORMATION_SCHEMA.TABLES and INFORMATION_SCHEMA.STATISTICS. This is why I_S seems so slow!
– Disable with innodb_stats_on_metadata=0 (this is disabled by default in MySQL 5.6).
• Also:
– When using ANALYZE TABLE.
– When the table size changes by more than 1/16th.
– If more than 2,000,000,000 rows have been inserted.
– Disable with innodb_stats_auto_update=0 (XtraDB only).

© 2011 – 2013 PERCONA 61 InnoDB Internals CLUSTERED INDEX

© 2011 – 2013 PERCONA 62 Clustered Index

• Everything in InnoDB is an index: • Data is stored in a clustered index organized by the primary key. In the absence of a primary key, the first unique not null key is selected*. • Other indexes are stored in secondary indexes.

* In the absence of a unique key, a hidden 6 byte key is created.
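The selection rule for the clustered key can be written down as a short function. The `Index` dataclass and its fields are hypothetical stand-ins for a table definition; the rule itself is the one stated above.

```python
from dataclasses import dataclass

HIDDEN_ROW_ID = "hidden 6-byte row id"


@dataclass
class Index:
    name: str
    columns: tuple
    unique: bool = False
    primary: bool = False
    nullable: bool = False   # True if any indexed column allows NULL


def clustered_key(indexes: list) -> str:
    """Pick the index InnoDB would cluster the table on, per the rule above."""
    for idx in indexes:
        if idx.primary:
            return idx.name
    for idx in indexes:               # declaration order decides between candidates
        if idx.unique and not idx.nullable:
            return idx.name
    return HIDDEN_ROW_ID
```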

© 2011 – 2013 PERCONA 63 What Is a Clustered Index?

• First let’s look at how MyISAM stores data*:

[Figure: the data file Staff.MYD (columns ID, First Name; rows 1 Peter, 2 Vadim, 7 Morgan, 4 Justin, 5 Baron, ..) next to the index file Staff.MYI, drawn as a tree with root 8, then 4 and 12, then 2, 6, 10, 14, and leaves 1 to 15.]

Data is stored “roughly” in insertion order, with no guarantees, i.e. deleted rows may be filled with newer records.

* Illustrating B-Tree as Binary Tree for simplicity

© 2011 – 2013 PERCONA 64 What Is a Clustered Index?

• First let’s look at how MyISAM stores data*: Staff.MYD Staff.MYI 8 ID First Name 1 Peter 4 12 2 Vadim 2 6 10 14 1 3 5 7 9 11 13 15 7 Morgan 4 Justin 5 Baron .. ..

* Illustrating B-Tree as Binary Tree for simplicity

© 2011 – 2013 PERCONA 65 What Is a Clustered Index? (cont.)

• A MyISAM primary key lookup looks something like this:
– Traverse the index (Staff.MYI) to find the address of the row we are looking for.

© 2011 – 2013 PERCONA 66 What Is a Clustered Index? (cont.)

• A MyISAM primary key lookup looks something like this:
– Traverse the index (Staff.MYI) to find the address of the row we are looking for.
– Look up the address in the data file (Staff.MYD).

© 2011 – 2013 PERCONA 67 What Is a Clustered Index? (cont.)

• MyISAM secondary key lookups are the same as PK lookups:
– Step 1: traverse the index (here on Extension number) to get the file offset.
– Step 2: fseek to that offset in Staff.MYD.

© 2011 – 2013 PERCONA 68 What Is a Clustered Index? (cont.)

• An InnoDB Primary Key lookup looks like this:

[Figure: Staff.ibd drawn as a tree. Each internal node holds a key value and a page pointer (shown as 0xACDC); the leaf level holds the rows for ids 1 to 15.]

* Illustrating B-Tree as Binary Tree for simplicity

© 2011 – 2013 PERCONA What Is a Clustered Index (cont.)

• An InnoDB Primary Key lookup looks like this:
– Traverse the index (Staff.ibd) to find the full row.
– Stop here: the leaf entry already holds the row, e.g. 2 Vadim, 1234, male, 7, ..

* Illustrating B-Tree as Binary Tree for simplicity

© 2011 – 2013 PERCONA Secondary Index

• A secondary key lookup looks like this:
– Traverse the index (here on extension_number) to find the value of the primary key. Extension 7 “points to” PK id value 5.
– Traverse the primary key to find the full row.

[Figure: the extension_number index tree next to the clustered index tree.]
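The two-step lookup can be mimicked with two Python dictionaries. Dictionaries are hash maps rather than B-trees, so this only shows the indirection, not the traversal cost. The names come from the MyISAM figure earlier in the deck; the extension numbers are made-up sample values, except that extension 7 maps to id 5 as on the slide.

```python
# Clustered index: primary key -> full row (first_name, extension).
clustered = {
    1: ("Peter", 3),
    2: ("Vadim", 9),
    4: ("Justin", 1),
    5: ("Baron", 7),
    7: ("Morgan", 12),
}

# Secondary index on extension: stores the primary key value, not a row address.
secondary = {ext: pk for pk, (_, ext) in clustered.items()}


def lookup_by_pk(pk):
    """One traversal: the clustered index leaf already holds the row."""
    return clustered.get(pk)


def lookup_by_extension(ext):
    """Two traversals: secondary index first, then the clustered index."""
    pk = secondary.get(ext)       # step 1: find the primary key value
    if pk is None:
        return None
    return lookup_by_pk(pk)       # step 2: find the full row


print(lookup_by_extension(7))
```

Because every secondary index entry carries the primary key value, a wide primary key makes every secondary index larger.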

© 2011 – 2013 PERCONA 73 Index Consequences

• This design has some interesting consequences: • Primary key lookups are very fast. • Inserting data in order is fast—out of order can be very slow, and cause fragmentation. • Secondary indexes can become very large if you have a large primary key. • In practical terms this means: • Don’t use UUIDs for InnoDB tables!

© 2011 – 2013 PERCONA 74 InnoDB Internals ADAPTIVE HASH

© 2011 – 2013 PERCONA 75 Adaptive Hash

• Secondary index lookups are slower in InnoDB:
– First you need to scan the secondary index.
– Then you can scan the primary key index.
• But most workloads have hotspots.
• InnoDB monitors index usage. Frequently accessed values are inserted into an in-memory hash table to accelerate lookups.
• The hash table does not have to cover the whole index.
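A rough sketch of the idea, assuming a simple "promote after N lookups" rule: the threshold, the class name `AdaptiveHashSketch` and the counters are hypothetical, and the real adaptive hash index is built on index pages and key prefixes rather than on whole rows.

```python
from collections import Counter


class AdaptiveHashSketch:
    """Caches hot secondary-key lookups in a hash table."""

    def __init__(self, secondary, clustered, threshold=3):
        self.secondary = secondary     # key -> primary key
        self.clustered = clustered     # primary key -> row
        self.threshold = threshold     # hypothetical cutoff for "hot"
        self.hits = Counter()
        self.hash = {}                 # hot key -> row, skips both traversals
        self.hash_searches = 0
        self.non_hash_searches = 0

    def lookup(self, key):
        if key in self.hash:
            self.hash_searches += 1
            return self.hash[key]
        self.non_hash_searches += 1
        pk = self.secondary.get(key)   # traversal 1: secondary index
        if pk is None:
            return None
        row = self.clustered[pk]       # traversal 2: clustered index
        self.hits[key] += 1
        if self.hits[key] >= self.threshold:
            self.hash[key] = row       # key is hot: remember the shortcut
        return row
```

Only keys that are looked up repeatedly end up in the hash, which is why it does not need to cover the whole index.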

© 2011 – 2013 PERCONA 76 Adaptive Hash (cont.)

[Figure: the extension_number index tree next to the clustered index tree.]

• Search the adaptive hash.
• Find a pointer directly to the leaf node of the clustered index.

© 2011 – 2013 PERCONA 79 Adaptive Hash (cont.)

• Not much transparency into its operation:

-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
..
Hash table size 31874747, node heap has 9526 buffer(s)
25448.49 hash searches/s, 54424.21 non-hash searches/s

• Hash table size: how many spaces in the hash table.
• node heap buffer(s): number of pages that the hash table takes up, i.e. 9526 pages of 16 KB, roughly 149 MB.

© 2011 – 2013 PERCONA 80 Adaptive Hash Concerns

• Can be the cause of some multi-CPU scalability issues.
• This problem increases as the number of cores increases.
• In SHOW ENGINE INNODB STATUS, waits on btr0sea.c may indicate adaptive hash contention.
• Disabling it will cause worse “performance,” but better concurrency.

© 2011 – 2013 PERCONA 81 innodb_adaptive_hash_index_partitions

• Percona Server feature. • Partition selection is based on index_id. • A hot table or index can prevent this from being useful—table partitioning can help in this case

© 2011 – 2013 PERCONA 82 Adaptive Hash Concerns

• Look in SHOW ENGINE INNODB STATUS for locks on btr0sea.c.

----------
SEMAPHORES
----------
Thread 140054029002496 has waited at btr0sea.c line 631 for 1.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x78733f8 created in file btr0sea.c line 182
a writer (thread id 140054029002496) has reserved it in mode wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file btr0sea.c line 879
Last time write locked in file btr0cur.c line 1896

© 2011 – 2013 PERCONA 83 InnoDB Internals CHANGE BUFFER

© 2011 – 2013 PERCONA 84 Reads vs. Writes

• Adaptive Hash helps improve secondary index reads. • Change Buffer helps improve secondary index writes.

© 2011 – 2013 PERCONA 85 Change Buffer

• Very useful for non-unique indexes.
• Reduces random IO by delaying changes to index pages and first writing them to a buffer.
• Potential for merging update requests.
• Only works for non-unique indexes, because the index has to be checked for uniqueness violations when a unique index is modified.
• Only used when the index page is not already in the buffer pool; otherwise the modification is made directly to the page.

© 2011 – 2013 PERCONA 86 Change Buffer (cont.)

• Builds the index pages just-in-time when the destination page is loaded into the buffer pool.
• Entirely safe: never returns wrong data.
• But slows down lookup operations slightly, because the insert buffer has to be checked.
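The buffering and just-in-time merge can be sketched with a toy model. The class `ChangeBufferSketch`, the page numbers and the sample names are assumptions for illustration; real index pages are B-tree nodes and merges can also happen in the background.

```python
class ChangeBufferSketch:
    """Defers secondary index inserts for pages that are not in memory."""

    def __init__(self, disk_pages):
        self.disk = disk_pages     # page_no -> list of index entries on "disk"
        self.buffer_pool = {}      # page_no -> list of entries held in memory
        self.change_buffer = {}    # page_no -> pending inserts
        self.disk_reads = 0

    def insert(self, page_no, entry):
        if page_no in self.buffer_pool:
            # Page is already cached: apply the change directly.
            self.buffer_pool[page_no].append(entry)
        else:
            # Page is not cached: remember the change instead of reading the page.
            self.change_buffer.setdefault(page_no, []).append(entry)

    def read_page(self, page_no):
        if page_no not in self.buffer_pool:
            self.disk_reads += 1
            page = list(self.disk.get(page_no, []))
            page.extend(self.change_buffer.pop(page_no, []))  # merge just in time
            self.buffer_pool[page_no] = page
        return self.buffer_pool[page_no]
```

Three inserts into uncached pages cost no reads at all in this model; the single read happens later, when a query actually needs the page.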

© 2011 – 2013 PERCONA 87 Change Buffer Insert in Buckley

The insert performance relies on the tree fitting in memory.

© 2011 – 2013 PERCONA 88 Change Buffer

Insert in Myers

The insert performance relies on the tree fitting in memory.

© 2011 – 2013 PERCONA 89 Change Buffer

Insert in Jones

The insert performance relies on the tree fitting in memory.

© 2011 – 2013 PERCONA 90 Change Buffer

Random insert performance relies on the tree fitting in memory.

© 2011 – 2013 PERCONA 91 Change Buffer (cont.) Insert in Buckley

Insert Buffer Insert in Myers

Insert in Jones

© 2011 – 2013 PERCONA 92 Change Buffer (cont.)

Select Jones:
1. Check the insert buffer.
2. Get the page.
3. Return merged results.

© 2011 – 2013 PERCONA 93 Change Buffer Concerns

• Potentially takes up to 25% of the buffer pool by default!
• Max memory is configurable in MySQL 5.6.
• May want to disable it on SSDs.
• Merging still reduces the number of write requests, but the reduction in random IO is of little help on SSDs.
• Buffer pool may be better spent on other pages.

© 2011 – 2013 PERCONA 94 Insert Buffer Efficiency Current size of used memory in the insert buffer in pages. i.e. 1 page = 16K ------INSERT BUFFER AND ADAPTIVE HASH INDEX ------Ibuf: size 1, free list len 16829, seg size 16831, 0 inserts, 0 merged recs, 1 merges .. Total amount of space allocated, but Total amount of space that has been unused. 16829 pages = 262 MB. allocated for the insert buffer. 16831 pages = 262 MB.

© 2011 – 2013 PERCONA 95 Insert Buffer Efficiency (cont.)

-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 16829, seg size 16831,
0 inserts, 0 merged recs, 1 merges
..

• inserts: number of row inserts into the insert buffer since server startup.
• merged recs: how many merged records the inserts have resulted in.
• merges: a “merge” indicates success at eliminating a physical IO operation.
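Converting the page counts in the Ibuf line to megabytes is easy to script. The sketch below assumes the line format shown on the slide and the default 16 KB page; the regular expression and helper names are made up for illustration and may need adjusting for other server versions.

```python
import re

PAGE_KB = 16  # default page size from the slides

IBUF_RE = re.compile(
    r"Ibuf: size (?P<size>\d+), free list len (?P<free>\d+), "
    r"seg size (?P<seg>\d+),\s*(?P<inserts>\d+) inserts, "
    r"(?P<merged>\d+) merged recs, (?P<merges>\d+) merges"
)


def parse_ibuf(line: str) -> dict:
    """Extract the counters from one Ibuf status line."""
    m = IBUF_RE.search(line)
    if m is None:
        raise ValueError("not an Ibuf status line")
    return {k: int(v) for k, v in m.groupdict().items()}


def pages_to_mb(pages: int) -> float:
    """Convert a page count to megabytes at 16 KB per page."""
    return pages * PAGE_KB / 1024


stats = parse_ibuf(
    "Ibuf: size 1, free list len 16829, seg size 16831, "
    "0 inserts, 0 merged recs, 1 merges"
)
print(round(pages_to_mb(stats["seg"])), "MB allocated")
```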

© 2011 – 2013 PERCONA 96 Additional Notes

• Change buffering was originally called insert buffering.
• In many parts of the source code and server diagnostics it still appears as “insert buffer.”
• The change buffer has been extended to handle DELETE and UPDATE in MySQL 5.5. You can enable or disable individual features, for example: innodb_change_buffering=inserts

© 2011 – 2013 PERCONA 97 InnoDB Internals DOUBLE WRITE BUFFER

© 2011 – 2013 PERCONA 98 Double Write Buffer

• This is a feature for data integrity, not performance. • Changes to table space are “double written” to a section of the main tablespace by default. This prevents partially written/corrupt pages. • Filesystem journals prevent meta data corruption, but data can still be problematic.

© 2011 – 2013 PERCONA 99 Double Write Buffer (cont.) • The extended description of how the background thread flushes dirty pages: Sync to double write area.

Buffer

Sync individual pages to correct destinations. Buffer Pool Tablespace

© 2011 – 2013 PERCONA 100 Actual Implementation

• The double-write area is 2MB and consists of two parts. • i.e. 2x64 pages = two extents • Writing to this buffer is serialized. InnoDB synchronously confirms that data is written. • Then individual pages are written asynchronously to destination locations using InnoDB write threads.
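The two-phase write and the recovery it enables can be sketched as follows. This is a toy model under assumptions: pages are Python byte strings protected by CRC32, a crash is simulated by truncating one page, and the class `DoublewriteSketch` and its methods are hypothetical names rather than InnoDB internals.

```python
import zlib

PAGE_SIZE = 16 * 1024


def make_page(body: bytes) -> bytes:
    """Build a full-size page: 4 checksum bytes followed by the padded body."""
    padded = body.ljust(PAGE_SIZE - 4, b"\0")
    return zlib.crc32(padded).to_bytes(4, "big") + padded


def checksum_ok(page: bytes) -> bool:
    if len(page) != PAGE_SIZE:
        return False                     # a short page is a torn write
    return zlib.crc32(page[4:]) == int.from_bytes(page[:4], "big")


class DoublewriteSketch:
    def __init__(self):
        self.doublewrite_area = {}       # page_no -> image, written and synced first
        self.tablespace = {}             # page_no -> image at its final location

    def flush(self, pages, crash_after=None):
        # Step 1: write the whole batch to the doublewrite area and sync it.
        self.doublewrite_area = dict(pages)
        # Step 2: write each page to its real destination.
        for i, (page_no, image) in enumerate(pages.items()):
            if crash_after is not None and i == crash_after:
                self.tablespace[page_no] = image[: len(image) // 2]  # torn write
                return
            self.tablespace[page_no] = image

    def recover(self):
        # Replace any torn page with the intact copy from the doublewrite area.
        for page_no, image in self.doublewrite_area.items():
            current = self.tablespace.get(page_no, b"")
            if not checksum_ok(current) and checksum_ok(image):
                self.tablespace[page_no] = image
```

Because the doublewrite copy is synced before any destination write begins, a page torn in step 2 always has an intact image to restore from.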

© 2011 – 2013 PERCONA 101 Double Write Buffer (cont.)

• Q: When would you want to disable the double write buffer? • A: Transactional filesystems like ZFS or btrfs ensure that writes are atomic. This is about the only reason you would want to disable it.

© 2011 – 2013 PERCONA 102 Double Write Buffer Concerns

• Doublewrite buffer concentrates writes to one main area of the ibdata1 tablespace. • Performance becomes more critical on storage media with limited write capacity and/or media that has no random IO penalty (SSDs). • In XtraDB it’s possible to move the doublewrite buffer out of the main tablespace.

InnoDB Internals: CONFIGURATION PARAMETERS

Let’s Recap…

Basic Operation (again)

SELECT * FROM City WHERE CountryCode='AUS'

[Diagram: Log Files, Buffer Pool, Tablespace]

Basic Operation (again)

UPDATE City SET name = 'Morgansville'
 WHERE name = 'Brisbane' AND CountryCode='AUS'

[Diagram: Log Files (01010), Buffer Pool, Tablespace]

Basic Operation (again)

UPDATE City SET name = 'Morgansville'
 WHERE name = 'Brisbane' AND CountryCode='AUS'

• Set innodb_buffer_pool_size to “50-80%” of memory.
• Move the log files to separate spindles (sequential IO).
• Set innodb_flush_log_at_trx_commit=2 if durability is not as important.
• If innodb_log_waits > 0, the buffer being filled before writing to the log files (innodb_log_buffer_size) may be too small.
• Typically use innodb_flush_method=O_DIRECT if using a hardware RAID controller.
• Increase the size of the log files:
  – innodb_log_file_size
  – innodb_log_files_in_group

[Diagram: Log Files (01010), Buffer Pool, Tablespace]

InnoDB Concurrency and Locking

Table of Contents

1. MVCC
2. Purging Old Data
3. Locking
4. Configuration Parameters
5. Maintenance Tasks

InnoDB Concurrency: MVCC

Multiversion Concurrency Control

• InnoDB has a transaction ID counter which just keeps on incrementing:

------------
TRANSACTIONS
------------
Trx id counter 0 14080

• The transaction’s ID, taken from this counter, is written to the row header of each row it modifies.

© 2011 – 2013 PERCONA 4 The Reason Why Users

id name favourite food User Friendship 1 Morgan Onion Soup

2 Tom Meatloaf user_left user_right

3 Baron Organic Chocolate 1 2 2 1

Blog Posts 3 1

id user_id favourite food 1 3

1 1 Hello, today is the ..

2 2 Yow!

3 1 An update from ...

© 2011 – 2013 PERCONA 5 The Reason Why Users

id name favourite food User Friendship 1 Morgan Onion Soup

2 Tom Meatloaf user_left user_right

3 Baron Organic Chocolate 1 2 2 1

Blog Posts 3 1

id user_id favourite food 1 3

1 1 Hello, today is the ..

2 2 Yow!

3 1 An update from ...

© 2011 – 2013 PERCONA 6 The Reason Why Users

id name favourite food User Friendship 1 Morgan Onion Soup

2 Tom Meatloaf user_left user_right

3 Baron Organic Chocolate 1 2

4 Justin Pizza 2 1 Blog Posts 3 1

id user_id favourite food 1 3

1 1 Hello, today is the ..

2 2 Yow!

3 1 An update from ...

© 2011 – 2013 PERCONA 7 The Reason Why Users

id name favourite food User Friendship 1 Morgan Onion Soup

2 Tom Meatloaf user_left user_right

3 Baron Organic Chocolate 1 2

4 Justin Pizza 2 1 Blog Posts 3 1

id user_id favourite food 1 3

1 1 Hello, today is the ..

2 2 Yow!

3 1 An update from ...

© 2011 – 2013 PERCONA 8 The Reason Why Users

id name favourite food User Friendship 1 Morgan Onion Soup

2 Tom Meatloaf user_left user_right

3 Baron Organic Chocolate 1 2

4 Justin Pizza 2 1 Blog Posts 3 1

id user_id favourite food 1 3

1 1 Hello, today is the ..

2 2 Yow!

3 1 An update from ...

© 2011 – 2013 PERCONA 9 The Reason Why Users

id name favourite food User Friendship 1 Morgan Onion Soup

2 Tom Meatloaf user_left user_right

3 Baron Organic Chocolate 1 2

4 Justin Pizza 2 1 Blog Posts 3 1

id user_id favourite food 1 3

1 1 Hello, today is the .. 3 4

2 2 Yow! 4 3

3 1 An update from ...

© 2011 – 2013 PERCONA 10 The Reason Why Users

id name favourite food User Friendship 1 Morgan Onion Soup

2 Tom Meatloaf user_left user_right

3 Baron Organic Chocolate 1 2

4 Justin Pizza 2 1 Blog Posts 3 1

id user_id favourite food 1 3

1 1 Hello, today is the .. 3 4

2 2 Yow! 4 3

3 1 An update from ...

© 2011 – 2013 PERCONA 11 Upon Recovery Users

id name favourite food User Friendship 1 Morgan Onion Soup

2 Tom Meatloaf user_left user_right

3 Baron Organic Chocolate 1 2 2 1

Blog Posts 3 1

id user_id favourite food 1 3

1 1 Hello, today is the .. 3 4

2 2 Yow! 4 3

3 1 An update from ...

© 2011 – 2013 PERCONA 12 The Problem?

• This race condition is true whether it be three tables or a single table. • ACID requirement: • Every transaction in the database should see the database as it were at one consistent point in time. • If you don’t use transaction commands, each statement is it’s own transaction - but the same must also be true.
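The race can be reproduced with a toy model. The sketch below is only an illustration in Python, not how any real backup tool works: the "database" is a dict of lists, and `naive_backup` and `concurrent_writer` are hypothetical names. It copies tables one at a time while a writer commits in between, which leaves friendships pointing at a user the backup never saw.

```python
import copy

# Toy "database": tables held as plain Python lists.
db = {
    "users": [(1, "Morgan"), (2, "Tom"), (3, "Baron")],
    "friendship": [(1, 2), (2, 1), (3, 1), (1, 3)],
}


def naive_backup(database, between_tables):
    """Copy tables one at a time; `between_tables` runs after each copy."""
    backup = {}
    for name in ("users", "friendship"):
        backup[name] = copy.deepcopy(database[name])
        between_tables(name)
    return backup


def concurrent_writer(just_copied):
    # A writer that commits right after "users" has been copied.
    if just_copied == "users":
        db["users"].append((4, "Justin"))
        db["friendship"] += [(3, 4), (4, 3)]


backup = naive_backup(db, concurrent_writer)
known_ids = {uid for uid, _ in backup["users"]}
# Friendships that reference a user missing from the backed-up Users table.
orphans = [f for f in backup["friendship"] if not set(f) <= known_ids]
print(orphans)
```

The orphaned rows are the same ones that show up in the recovered tables: the friendships with user 4 exist, but user 4 does not.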

ACID

• Atomicity
• Consistency
• Isolation
• Durability

How MyISAM Solves This

Users
id  name    favourite food
1   Morgan  Onion Soup
2   Tom     Meatloaf
3   Baron   Organic Chocolate

User Friendship
user_left  user_right
1          2
2          1
3          1
1          3

Blog Posts
id  user_id  content
1   1        Hello, today is the ..
2   2        Yow!
3   1        An update from ...

© 2011 – 2013 PERCONA 15 How MyISAM Solves This Users

[Read id name favourite food Lock] User Friendship 1 Morgan Onion Soup [Read user_left user_right 2 Tom Meatloaf Lock] 3 Baron Organic Chocolate 1 2 2 1

Blog Posts 3 1

[Read id user_id favourite food 1 3 Lock] 1 1 Hello, today is the ..

2 2 Yow!

3 1 An update from ...

© 2011 – 2013 PERCONA 16 How MyISAM Solves This Users

[Read id name favourite food Lock] User Friendship 1 Morgan Onion Soup [Read user_left user_right 2 Tom Meatloaf Lock] 3 Baron Organic Chocolate 1 2 2 1

Blog Posts 3 1

[Read id user_id favourite food 1 3 Lock] 1 1 Hello, today is the ..

2 2 Yow!

3 1 An update from ...

© 2011 – 2013 PERCONA 17 How MyISAM Solves This Users

[Read id name favourite food Lock] User Friendship 1 Morgan Onion Soup [Read user_left user_right 2 Tom Meatloaf Lock] 3 Baron Organic Chocolate 1 2 2 1

Blog Posts 3 1

[Read id user_id favourite food 1 3 Lock] 1 1 Hello, today is the ..

2 2 Yow!

3 1 An update from ...

© 2011 – 2013 PERCONA 18 How MyISAM Solves This Users

[Read id name favourite food Lock] User Friendship 1 Morgan Onion Soup [Read user_left user_right 2 Tom Meatloaf Lock] 3 Baron Organic Chocolate 1 2 2 1

Blog Posts 3 1

[Read id user_id favourite food 1 3 Lock] 1 1 Hello, today is the ..

2 2 Yow!

3 1 An update from ...

How InnoDB Implements This

Users
Trx ID  id  name    favourite food
65      1   Morgan  Onion Soup
72      2   Tom     Meatloaf
108     3   Baron   Organic Chocolate

User Friendship
Trx ID  user_left  user_right
80      1          2
80      2          1
110     3          1
110     1          3

Blog Posts
Trx ID  id  user_id  content
81      1   1        Hello, today is the ..
92      2   2        Yow!
130     3   1        An update from ...

The Trx ID is hidden metadata. We can make sure the read operation doesn’t see newer versions it is not supposed to see yet.

Consistent Snapshots

mysql> START TRANSACTION WITH CONSISTENT SNAPSHOT;

Starts a new transaction. We’ll say this transaction is transaction id 131.

Read During a Transaction (cont.)

Users
Trx ID  id  name    favourite food
65      1   Morgan  Onion Soup
72      2   Tom     Meatloaf
108     3   Baron   Organic Chocolate

User Friendship
Trx ID  user_left  user_right
80      1          2
80      2          1
110     3          1
110     1          3

Blog Posts
Trx ID  id  user_id  content
81      1   1        Hello, today is the ..
92      2   2        Yow!
132     3   1        This row changed

The reading transaction is Trx 131. When the backup examines the third row it will look in the rollback segment to get the older version of the row.

In MVCC

• Multiversion Concurrency Control
• Readers don’t block writers.
• You can get a consistent view of the data by just remembering what transaction id you are up to:

Trx read view will not see trx with id>= 0 80157601, sees <0 80157597

• Older versions can be read from UNDO space.
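The visibility rule can be sketched in a few lines of Python. This is a deliberately simplified model: `RowVersion` and `read` are hypothetical names, the `older` link merely stands in for the undo space, and a real read view also tracks which transactions were still active when the snapshot was taken.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    trx_id: int                    # transaction that wrote this version
    value: str
    older: Optional["RowVersion"]  # previous version (stands in for undo)


def read(version, snapshot_trx_id):
    """Walk back through older versions until one is visible."""
    while version is not None and version.trx_id >= snapshot_trx_id:
        version = version.older
    return None if version is None else version.value


# Blog post 3 was written by trx 130, then changed by trx 132.
old = RowVersion(130, "An update from ...", None)
new = RowVersion(132, "This row changed", old)

print(read(new, 131))  # snapshot taken as trx 131: sees the trx 130 version
print(read(new, 133))  # a later snapshot: sees the trx 132 version
```

The writer (trx 132) never had to wait for the reader (trx 131); the reader simply follows the chain back to a version it is allowed to see.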

MVCC Exercise

• Demonstrates how MVCC allows concurrent reads and writes against the same data.
• Also demonstrates a deadlock.
• Note: This requires two separate sessions, and all of the steps must be completed in this exact order.

$ mysqladmin -uroot create world
$ gunzip -c world-innodb.sql.gz | \
    mysql -uroot world

ALTER TABLE City ENGINE=InnoDB;
ALTER TABLE City ADD INDEX (Name);
START TRANSACTION;

START TRANSACTION;

SELECT * FROM City WHERE name = 'Seattle';

SELECT * FROM City WHERE name = 'Seattle';

UPDATE City SET Name = 'Not Seattle' WHERE name = 'Seattle';

SELECT * FROM City WHERE name = 'Seattle';

UPDATE City SET name = 'Not New York' WHERE name = 'New York';

SELECT * FROM City WHERE name = 'New York';

UPDATE City SET name = 'Not Boston' WHERE name = 'Boston';

UPDATE City SET name = 'Seattle2' WHERE name = 'Seattle';

UPDATE City SET Name = 'New York2' WHERE name = 'New York';

MVCC Drawbacks (1/2)

• On modification, previous versions of rows are relocated to UNDO space.
• With one very old transaction, or many concurrent transactions, you may need to retain a lot of old data:

------------
TRANSACTIONS
------------
Trx id counter 0 14080
Purge done for trx's n:o < 0 12929 undo n:o < 0 0
History list length 3

History list length shows the number of unpurged transactions that have modified data.

Tip: As well as holding many old versions, long transactions may increase locking contention, since locks are held for the duration of the transaction.
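To watch this value over time, the status text can be scanned for the history list line. A minimal Python sketch, assuming the text of SHOW ENGINE INNODB STATUS has already been fetched into a string (`history_list_length` is a hypothetical helper, not part of any MySQL client library):

```python
import re


def history_list_length(status_text):
    """Return the History list length from status output, or None."""
    match = re.search(r"History list length (\d+)", status_text)
    return int(match.group(1)) if match else None


sample = """Trx id counter 0 14080
Purge done for trx's n:o < 0 12929 undo n:o < 0 0
History list length 3
"""
print(history_list_length(sample))
```

Sampling this periodically and plotting it is usually more telling than any single reading.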

MVCC Drawbacks (2/2)

• The row must store coordinates for MVCC to be able to work. This storage overhead is most visible, and can be significant, with very short rows:
  – DB_TRX_ID (6 bytes): the transaction that inserted/updated the row.
  – DB_ROLL_PTR (7 bytes): pointer to the previous row version in undo space.
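A rough feel for “very short rows” comes from simple arithmetic. The Python sketch below counts only the two hidden columns listed here and ignores every other per-record byte (record header, page overhead), so it is a lower bound and not an exact row size; the payload sizes are made-up examples.

```python
DB_TRX_ID_BYTES = 6
DB_ROLL_PTR_BYTES = 7


def mvcc_overhead_fraction(payload_bytes):
    """Share of (payload + hidden MVCC columns) taken by the hidden columns."""
    hidden = DB_TRX_ID_BYTES + DB_ROLL_PTR_BYTES
    return hidden / (payload_bytes + hidden)


# Hypothetical payload sizes, in bytes of user data per row.
for payload in (8, 100, 1000):
    print(payload, f"{mvcc_overhead_fraction(payload):.1%}")
```

For a row carrying two 4-byte integers the hidden columns outweigh the data; for a 1000-byte row they are close to noise.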

Multi Versioning Indexes

• Indexes contain pointers to all versions.
  – i.e. index key 5 will point to all rows which were 5 in the past.
• Index pages also contain TRX_ID information, so they understand versioning. This is required to be able to support “covering indexes.”
• Many old versions in indexes are a potential performance problem:
  – They can slow down access through additional traversal overhead.
  – Even when older versions are purged, “holes” are left in their place.

MVCC Performance (1/2)

• It is the whole row (excluding off-page blobs) that is relocated to undo space on modification.
• This means very wide rows with only one field being modified incur significant overhead.
• A separate table to store “counters” or other hot columns can often make sense.

MVCC Performance (2/2)

• Not all performance characteristics are well instrumented. For example:
  – InnoDB has a counter “Rows Read.”
  – One single “row read” could correspond to thousands of versions/index entries being traversed.
• [Tip] It is best to keep transactions as short-lived as possible, especially if there are many concurrent updates.

InnoDB Concurrency: PURGING OLD DATA

Cleaning up Garbage

• Old row and index entries need to be removed. This cannot happen until they are no longer needed by any active transaction:
  – REPEATABLE READ requires that all changes be available dating back to the start of the transaction.
  – READ-COMMITTED requires that everything be available as of statement start.
• The purging is handled via the “server main thread”*.

* Other options exist in XtraDB / MySQL 5.5

Purging Entries from Undo (1/2)

• The default purge operation is single threaded, and is performed along with other maintenance tasks by the main thread (5.1 behavior).
• If many updates happen concurrently, it is possible that this thread may not be able to keep up.
  – i.e. the history list length just grows, showing a count of unpurged transactions.
• You can use innodb_max_purge_lag (default: 0) to throttle updates when purging is behind.

Purging Entries from Undo (2/2)

• It is possible to configure the purge operation to work in its own separate thread:
  – MySQL 5.1 (only XtraDB): innodb_use_purge_thread=1
  – MySQL 5.5: innodb_purge_threads=1
• XtraDB 5.1 and MySQL 5.6 allow for multiple threads (number of threads > 1), but with diminishing returns.
  – http://www.mysqlperformanceblog.com/2011/05/03/multiple-purge-threads-in-percona-server-5-1-56-and-mysql-5-6-2/

InnoDB Concurrency: LOCKING

Locking

• Some common advice is, “indexes help SELECT, but add overhead to INSERT/UPDATE/DELETE.”
• But it’s critical that UPDATE/DELETE queries are well indexed too.
  – Without indexes, InnoDB has to lock more rows than you might expect.
  – This additional locking is performed by InnoDB to ensure consistency with statement based replication...

Repeat Without Index

• Note: This requires two separate sessions, and all of the steps must be completed in this exact order.
• Important: you must drop and re-import the “world” database in order to complete this exercise.

ALTER TABLE City DROP INDEX Name;

START TRANSACTION;

START TRANSACTION;

SELECT * FROM City WHERE name = 'Seattle';

SELECT * FROM City WHERE name = 'Seattle';

UPDATE City SET Name = 'Not Seattle' WHERE name = 'Seattle';

SELECT * FROM City WHERE name = 'Seattle';

UPDATE City SET name = 'Not New York' WHERE name = 'New York';

SELECT * FROM City WHERE name = 'New York';

UPDATE City SET name = 'Not Boston' WHERE name = 'Boston';

UPDATE City SET name = 'Seattle2' WHERE name = 'Seattle';

UPDATE City SET Name = 'New York2' WHERE name = 'New York';

Locking and Replication

• Updates always apply to the latest committed row version, in spite of MVCC.
• On a replication slave, a hybrid read/update can force locks on a SELECT.
• For example, this query locks rows in OriginTable, and therefore blocks writes by the replication thread:
  – INSERT INTO TempTable SELECT * FROM OriginTable;
• Likewise REPLACE INTO…SELECT, CREATE TABLE…SELECT, multi-table UPDATE, etc.

Locking and Replication (cont.)

• You can improve this by using row-based replication and READ-COMMITTED, so that it’s okay and expected for replication updates to apply to the most recently committed version of data.
  – binlog-format = ROW
  – transaction-isolation = READ-COMMITTED
• It won’t work for REPEATABLE-READ (the default).

http://harrison-fisk.blogspot.com/2009/02/my-favorite-new-feature-of-mysql-51.html

More on Locking

• Two common types of locks you may be familiar with:
  – SHARED (S), aka READ locks.
  – EXCLUSIVE (X), aka WRITE locks.
• In InnoDB, most of the time when we talk about “row locking” we mean X locks:
  – An S lock never occurs for plain reads because of MVCC.
  – S locks are normally only visible with foreign key constraints.

Foreign Key Locking

CREATE TABLE parent (
  id INT AUTO_INCREMENT PRIMARY KEY,
  bogus_column CHAR(32)
) ENGINE=InnoDB;

CREATE TABLE child (
  id INT AUTO_INCREMENT PRIMARY KEY,
  parent_id INT NOT NULL,
  bogus_column CHAR(32),
  FOREIGN KEY (parent_id) REFERENCES parent (id)
) ENGINE=InnoDB;

INSERT INTO parent (bogus_column)
VALUES ('aaa'), ('bbb'), ('ccc'), ('ddd'), ('eee');

INSERT INTO child (parent_id, bogus_column)
VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc'), (4, 'ddd'), (5, 'eee');

START TRANSACTION;

START TRANSACTION;

UPDATE child SET parent_id = 5 WHERE parent_id = 4;

UPDATE parent SET bogus_column = 'new!' WHERE id = 4;

UPDATE parent SET bogus_column = 'new!' WHERE id = 5;

This last statement will block waiting on a lock. InnoDB will expose this in information_schema.innodb_locks

InnoDB Concurrency: CONFIGURATION PARAMETERS

No Longer Encouraged Settings

• These settings were particularly important in early 5.0 releases for CPU scalability:
  – innodb_thread_concurrency = N (default now: 0)
  – innodb_concurrency_tickets = N (default: 500)
• Both are discouraged for the majority of users, as InnoDB scalability has improved internally, and setting a thread concurrency actually adds some locking.

http://www.mysqlperformanceblog.com/2010/05/24/tuning-innodb-concurrency-tickets/

Tuning Concurrency Tickets

• [Common Choice] Set innodb_concurrency_tickets to the 99th percentile of the number of rows each transaction reads.
  – Most transactions will therefore run unimpeded.
  – The 1 percent of more expensive queries may take longer as they are executed in steps.

http://www.mysqlperformanceblog.com/2010/05/24/tuning-innodb-concurrency-tickets/
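Picking the 99th percentile is a small calculation once per-transaction row counts are available. The Python sketch below uses the nearest-rank method; the sample data is made up for illustration, and how the counts are collected (slow log, application metrics) is outside its scope.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with >= pct% of data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical rows-read-per-transaction samples (not from the slides).
rows_read = [10] * 95 + [200] * 4 + [50000]
print(percentile_nearest_rank(rows_read, 99))
```

With this sample a setting of 200 covers 99 of the 100 transactions, while the single 50000-row outlier would be executed in steps.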

InnoDB Concurrency: MAINTENANCE TASKS

Maintenance Tasks

• [Default] InnoDB has one background thread. This is referred to as “the main thread,” or srv_master_thread.
• In MySQL 5.5 and in XtraDB, it’s also possible to purge undo in its own thread(s).
• Deadlock detection is also a thread, but we have very little visibility into its operation.

Source: Mark Callaghan (Facebook) http://www.percona.com/ppc2009/PPC2009_Life_of_a_dirty_pageInnoDB_disk_IO.pdf

The Main Thread

• It has three main “loops”:
  – Once per second.
  – Once per ten seconds.
  – On idle.

Once Per Second

• Typical tasks completed:
  – Force transaction log to disk.
  – Merge up to 5 insert buffer pages if idle. (Tunable on XtraDB with innodb_ibuf_accel_rate.)
  – Write 100 dirty pages from the buffer pool. (Tuned by innodb_io_capacity if using the InnoDB plugin or XtraDB!)

Once Per 10 Seconds

• Typical tasks completed:
  – Force transaction log to disk.
  – Merge up to 5* insert buffer pages if idle.
  – Write 100* dirty pages from the buffer pool.
  – Remove deleted rows.

* See the tuning notes on the “Once Per Second” slide.

When Server Is Idle

• Typical tasks completed:
  – Remove deleted rows.
  – Merge up to 20* insert buffer pages.
  – Write 100* dirty pages from the buffer pool.

* See the tuning notes on the “Once Per Second” slide.

InnoDB Diagnostics

Table of Contents

1. Example Diagnostics
2. SHOW ENGINE INNODB STATUS
3. Show Global Status
4. INFORMATION_SCHEMA
5. InnoTop

InnoDB Diagnostics: EXAMPLE DIAGNOSTICS

InnoDB Status File

• When innodb_status_file=1, InnoDB outputs SHOW ENGINE INNODB STATUS to a file called innodb_status.<pid> every 15 seconds.
• InnoDB removes the file on normal shutdown.
• If a file exists for a <pid> that is not the current mysqld process id, then a crash occurred at about the time of the last file update.
• Multiple innodb_status.<pid> files will exist if you have had multiple crashes.

Monitor Tables

• Monitor tables exist as a way of asking InnoDB to print information to your error log.

CREATE TABLE innodb_monitor (a INT) ENGINE=InnoDB;
CREATE TABLE innodb_lock_monitor (a INT) ENGINE=InnoDB;
CREATE TABLE innodb_table_monitor (a INT) ENGINE=InnoDB;
CREATE TABLE innodb_tablespace_monitor (a INT) ENGINE=InnoDB;

• It doesn’t matter which database these tables are in, or what columns are defined, or what data the tables contain. Only the table name and the engine matter.

Monitor / Lock Monitor

• The innodb_monitor writes the output of SHOW ENGINE INNODB STATUS to your error log file every 15 seconds.
• innodb_lock_monitor outputs the same content to the error log, plus a subset of the locks for each transaction.
• The monitor is the only guaranteed way to get full output:
  – MySQL <= 5.1 truncates status output at 64KB.
  – MySQL >= 5.5 truncates status output at 1MB.

Table Monitor

• The innodb_table_monitor writes information about InnoDB’s internal data dictionary.
• MySQL 5.6 and XtraDB offer some new INFORMATION_SCHEMA tables for the data dictionary (though this doesn’t output automatically like the monitor).

Tablespace Monitor

• The tablespace monitor shows information about physical storage:

100901 11:51:31 INNODB TABLESPACE MONITOR OUTPUT
================================================
--------------------------------------
FILE SPACE INFO: id 0
size 640, free limit 320, free extents 2
not full frag extents 1: used pages 52, full frag extents 0
first seg id not used 0 2730
SEGMENT id 0 1 space 0; page 2; res 2 used 2; full ext 0
fragm pages 2; free extents 0; not full extents 0: pages 0
SEGMENT id 0 2 space 0; page 2; res 1 used 1; full ext 0
fragm pages 1; free extents 0; not full extents 0: pages 0
SEGMENT id 0 3 space 0; page 2; res 1 used 1; full ext 0
fragm pages 1; free extents 0; not full extents 0: pages 0
..
SEGMENT id 0 2725 space 0; page 2; res 1 used 1; full ext 0
fragm pages 1; free extents 0; not full extents 0: pages 0
NUMBER of file segments: 26
Validating tablespace ..
Validation ok

• Space 0 is the system tablespace files (ibdata1).
• This monitor doesn’t support file-per-table.
• We can find out more about what each segment ID relates to with the table monitor.
• res = pages allocated for the segment.
• used = how many of these are in use.
• full ext = extents completely full.

More information: http://www.markleith.co.uk/?p=25

Monitor Tables (cont.)

• [Warning] Constantly writing to the error log file reduces its readability.
• In Percona Server the monitors are less important:
  – The “TRANSACTIONS” section of SHOW ENGINE INNODB STATUS is moved to the end (meaning most useful data fits in the 64KB limit).
  – The data dictionary is exposed via INFORMATION_SCHEMA.
  – XtraBackup can report fragmentation statistics.

InnoDB Diagnostics: SHOW ENGINE INNODB STATUS

InnoDB Status

mysql> SHOW ENGINE INNODB STATUS\G
*************************** 1. row ***************************
  Type: InnoDB
  Name:
Status:
=====================================
100618 16:25:26 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 4 seconds

Check that the averages you are reading cover at least 20-30 seconds. If the period is too brief, you’ll get skewed averages. Run the command again.

InnoDB Status Sections

• Semaphores
• Deadlocks
• Foreign Keys
• Transactions
• File I/O
• Insert Buffer and Adaptive Hash
• Log
• Buffer Pool and Memory
• Row Operations

----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 996617, signal count 628914
--Thread 1397500240 has waited at ../../storage/innobase/include/log0log.ic line 322 for 0.0000 seconds the semaphore:
Mutex at 0x19a112f0 created file log/log0log.c line 746, lock var 0
waiters flag 0
--Thread 1403357520 has waited at ../../storage/innobase/include/log0log.ic line 322 for 0.0000 seconds the semaphore:
Mutex at 0x19a112f0 created file log/log0log.c line 746, lock var 0
waiters flag 0
--Thread 1399363920 has waited at ../../storage/innobase/include/log0log.ic line 322 for 0.0000 seconds the semaphore:
Mutex at 0x19a112f0 created file log/log0log.c line 746, lock var 0
waiters flag 0
--Thread 1396967760 has waited at ../../storage/innobase/include/log0log.ic line 322 for 0.0000 seconds the semaphore:
Mutex at 0x19a112f0 created file log/log0log.c line 746, lock var 0
waiters flag 0
--Thread 1400961360 has waited at ../../storage/innobase/include/log0log.ic line 322 for 0.0000 seconds the semaphore:
Mutex at 0x19a112f0 created file log/log0log.c line 746, lock var 0
waiters flag 0
--Thread 1401760080 has waited at btr/btr0cur.c line 442 for 0.0000 seconds the semaphore:
S-lock on RW-latch at 0x19d2fc60 created in file dict/dict0dict.c line 1635
number of readers 0, waiters flag 0, lock_word: 100000
Last time read locked in file btr/btr0cur.c line 442
Last time write locked in file btr/btr0cur.c line 435
Mutex spin waits 4126944, rounds 24444227, OS waits 265505
RW-shared spins 566249, OS waits 598523; RW-excl spins 91973, OS waits 44173
Spin rounds per wait: 5.92 mutex, 25.80 RW-shared, 39.81 RW-excl

Semaphores

• Can show if the default spin lock was used, or an OS wait. See “Mutex spin waits 5672442, rounds 3899888, OS waits 4719”
  – A spin lock burns CPU.
  – An OS wait requires expensive context switching back in.
  – Increase innodb_sync_spin_loops to context switch less but wait longer on mutexes.
• If an OS wait was required, it should show where in the source this occurred.
  – “btr0sea” are adaptive hash waits.
  – “trx0rseg” is rollback segment.
  – “buf0buf” is the main buffer pool mutex.
  – “Fixes” exist for all of these hot spots.
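The “Spin rounds per wait” figure can be recomputed from the mutex counters, which is a handy sanity check when comparing two samples. A small Python sketch, using the mutex line from the SEMAPHORES sample output (`mutex_spin_stats` is a hypothetical helper name):

```python
import re

MUTEX_LINE = re.compile(
    r"Mutex spin waits (\d+), rounds (\d+), OS waits (\d+)"
)


def mutex_spin_stats(status_text):
    """Return (spin_waits, rounds, os_waits) or None if the line is absent."""
    match = MUTEX_LINE.search(status_text)
    return tuple(int(g) for g in match.groups()) if match else None


line = "Mutex spin waits 4126944, rounds 24444227, OS waits 265505"
spin_waits, rounds, os_waits = mutex_spin_stats(line)
print(round(rounds / spin_waits, 2))    # spin rounds per wait
print(round(os_waits / spin_waits, 3))  # share of spin waits that ended in an OS wait
```

The first number matches the 5.92 that InnoDB prints for mutexes in that sample. The counters are cumulative since startup, so differences between two samples are more useful than absolute values.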

..
------------------------
LATEST DETECTED DEADLOCK
------------------------
060717 4:16:48
*** (1) TRANSACTION:
TRANSACTION 0 42313619, ACTIVE 49 sec, process no 10099, OS thread id 3771312 starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 320
MySQL thread id 30898, query id 100626 localhost root Updating
update iz set pad='a' where i=2
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 16403 n bits 72 index `PRIMARY` of table `test/iz` trx id 0 42313619 lock_mode X locks rec but not gap waiting
Record lock, heap no 5 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 4; hex 80000002; asc ;; 1: len 6; hex 00000285a78f; asc ;; 2: len 7; hex 00000040150110; asc @ ;; 3: len 10; hex 61202020202020202020; asc a ;;

*** (2) TRANSACTION:
TRANSACTION 0 42313620, ACTIVE 24 sec, process no 10099, OS thread id 4078512 starting index read, thread declared inside InnoDB 500
mysql tables in use 1, locked 1
3 lock struct(s), heap size 320
MySQL thread id 30899, query id 100627 localhost root Updating
update iz set pad='a' where i=1
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 16403 n bits 72 index `PRIMARY` of table `test/iz` trx id 0 42313620 lock_mode X locks rec but not gap
Record lock, heap no 5 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 4; hex 80000002; asc ;; 1: len 6; hex 00000285a78f; asc ;; 2: len 7; hex 00000040150110; asc @ ;; 3: len 10; hex 61202020202020202020; asc a ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 16403 n bits 72 index `PRIMARY` of table `test/iz` trx id 0 42313620 lock_mode X locks rec but not gap waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 4; hex 80000001; asc ;; 1: len 6; hex 00000285a78e; asc ;; 2: len 7; hex 000000003411d9; asc 4 ;; 3: len 10; hex 61202020202020202020; asc a ;;

*** WE ROLL BACK TRANSACTION (2)

In MySQL 5.6, you can log all deadlocks to the error log.

------------------------
LATEST DETECTED DEADLOCK
------------------------
090221 15:54:57
*** (1) TRANSACTION:
TRANSACTION 0 1736253712, ACTIVE 5 sec, process no 8189, OS thread id 2011474240 setting auto-inc lock
mysql tables in use 1, locked 1
LOCK WAIT 1 lock struct(s), heap size 368
MySQL thread id 6304968, query id 1702990793 10.x.x.x webuser update
INSERT INTO `my_table` (`event_id`, `updated_at`, `news_feed_only`, `actee_id`, `extra_id`, `actor_id`, `actee_type`, `user_id`, `actor_type`, `extra_type`, `event_type`, `created_at`) VALUES(251183223, '2009-02-21 23:54:52', 0, 114040217, 742361767, 110698807, 'user', 114040217, 'user', 'bonus', 'standard', '2009-02-21 23:54:52')
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `my_database/my_table` trx id 0 1736253712 lock mode AUTO-INC waiting

*** (2) TRANSACTION:
TRANSACTION 0 1736253703, ACTIVE (PREPARED) 5 sec, process no 8189, OS thread id 1844541760
mysql tables in use 1, locked 1
2 lock struct(s), heap size 368, undo log entries 1
MySQL thread id 6304653, query id 1702990765 10.x.x.x webuser update
INSERT INTO `my_table` (`event_id`, `updated_at`, `news_feed_only`, `actee_id`, `extra_id`, `actor_id`, `actee_type`, `user_id`, `actor_type`, `extra_type`, `event_type`, `created_at`) VALUES(251183379, '2009-02-21 23:54:51', 0, 113419017, 742361765, 115279155, 'user', 115279155, 'user', 'NewsStoryData', 'stanard', '2009-02-21 23:54:51')
*** (2) HOLDS THE LOCK(S):
TABLE LOCK table `my_database/my_table` trx id 0 1736253703 lock mode AUTO-INC
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `my_database/my_table` trx id 0 1736254081 lock mode AUTO-INC waiting
TOO DEEP OR LONG SEARCH IN THE LOCK TABLE WAITS-FOR GRAPH
*** WE ROLL BACK TRANSACTION (2)

Deadlocks

• Locking is done at the index level, so if your queries are not indexed well, it’s not quite row level locking.
• Two common issues you should be aware of:
  – Auto_increment scalability is improved in 5.1: http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html
  – Updating the same row near simultaneously can cause a deadlock. Both connections acquire a shared lock before one can escalate to an exclusive lock.

Deadlocks (cont.)

• The “not quite row level” is next-key locking for binary log safety. Sometimes it’s interesting to spot that one of the transactions holds way too many row locks:

---TRANSACTION 1931, ACTIVE 39 sec, OS thread id 4327256064 fetching rows
mysql tables in use 1, locked 1
23137 lock struct(s), heap size 2062320, 1249317 row lock(s), undo log entries 1
MySQL thread id 1, query id 67 localhost root Updating
UPDATE my_locking_innodb SET a = REPEAT('b', 255) WHERE id2 = 323255
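Spotting lock-heavy transactions by eye gets tedious in long output. The Python sketch below scans status text for them. It assumes the single-number transaction id format shown in this example (the older two-part “0 42313619” style would need a different pattern), and the threshold is an arbitrary illustration value.

```python
import re

TRX_HEADER = re.compile(r"---TRANSACTION (\d+), ACTIVE (\d+) sec")
ROW_LOCKS = re.compile(r"(\d+) row lock\(s\)")


def transactions_with_many_locks(status_text, threshold):
    """Return (trx_id, active_seconds, row_locks) for lock-heavy transactions."""
    results = []
    current = None
    for line in status_text.splitlines():
        header = TRX_HEADER.search(line)
        if header:
            current = (int(header.group(1)), int(header.group(2)))
        locks = ROW_LOCKS.search(line)
        if locks and current and int(locks.group(1)) >= threshold:
            results.append(current + (int(locks.group(1)),))
    return results


sample = """---TRANSACTION 1931, ACTIVE 39 sec, OS thread id 4327256064 fetching rows
mysql tables in use 1, locked 1
23137 lock struct(s), heap size 2062320, 1249317 row lock(s), undo log entries 1
"""
print(transactions_with_many_locks(sample, threshold=100000))
```

A transaction that updates one logical row but holds over a million row locks is a strong hint that the WHERE clause is not using an index.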

..
------------------------
LATEST FOREIGN KEY ERROR
------------------------
060717 4:29:00 Transaction:
TRANSACTION 0 336342767, ACTIVE 0 sec, process no 3946, OS thread id 1151088992 inserting, thread declared inside InnoDB 500
mysql tables in use 1, locked 1
3 lock struct(s), heap size 368, undo log entries 1
MySQL thread id 9697561, query id 188161264 localhost root update
insert into child values(2,2)
Foreign key constraint fails for table `test/child`:
,
  CONSTRAINT `child_ibfk_1` FOREIGN KEY (`parent_id`) REFERENCES `parent` (`id`) ON DELETE CASCADE
Trying to add in child table, in index `par_ind` tuple:
DATA TUPLE: 2 fields;
 0: len 4; hex 80000002; asc ;; 1: len 6; hex 000000000401; asc ;;

But in parent table `test/parent`, in index `PRIMARY`,
the closest match we can find is record:
PHYSICAL RECORD: n_fields 3; 1-byte offs TRUE; info bits 0
 0: len 4; hex 80000001; asc ;; 1: len 6; hex 0000140c2d8f; asc - ;; 2: len 7; hex 80009c40050084; asc @ ;;

Foreign Keys

• This section (like others) is only added when required.
• Usually fairly simple to debug, even without detailed locking information, e.g.:

MySQL thread id 9697561, query id 188161264 localhost root update
insert into child values(2,2)
Foreign key constraint fails for table `test/child`:
CONSTRAINT `child_ibfk_1` FOREIGN KEY (`parent_id`) REFERENCES `parent` (`id`) ON DELETE CASCADE

..
------------
TRANSACTIONS
------------
Trx id counter 0 80157601
Purge done for trx's n:o <0 80154573 undo n:o <0 0
History list length 6
Total number of lock structs in row lock hash table 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 0, not started, process no 3396, OS thread id 1152440672
MySQL thread id 8080, query id 728900 localhost root
show innodb status
---TRANSACTION 0 80157600, ACTIVE 4 sec, process no 3396, OS thread id 1148250464, thread declared inside InnoDB 442
mysql tables in use 1, locked 0
MySQL thread id 8079, query id 728899 localhost root Sending data
select sql_calc_found_rows * from b limit 5
Trx read view will not see trx with id>= 0 80157601, sees <0 80157597
---TRANSACTION 0 80157599, ACTIVE 5 sec, process no 3396, OS thread id 1150142816 fetching rows, thread declared inside InnoDB 166
mysql tables in use 1, locked 0
MySQL thread id 8078, query id 728898 localhost root Sending data
select sql_calc_found_rows * from b limit 5
Trx read view will not see trx with id>= 0 80157600, sees <0 80157596

..
------------
TRANSACTIONS
------------
Trx id counter 0 80157601
Purge done for trx's n:o <0 80154573 undo n:o <0 0
History list length 6
..

• History list length is the number of unpurged transactions in undo space. If this keeps increasing, then the purge can’t keep up.

• In "MySQL thread id 8080, query id 728900", 8080 is the thread number shown in SHOW PROCESSLIST. You can KILL CONNECTION 8080;
• 728900 is the corresponding query number. KILL QUERY takes the thread id rather than the query id, so to stop only the running statement and keep the connection open you can KILL QUERY 8080;

• Some numbers are not well documented. For example, in "thread declared inside InnoDB 442", 442 is "the number of concurrency tickets left for this thread."
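The fields called out on the preceding slides can be pulled out of the text with a few regular expressions. The following is a minimal Python sketch, assuming the output keeps the layout of the sample above; the function names are hypothetical and the excerpt is copied from the slide.

```python
import re

# Excerpt of the TRANSACTIONS sample shown on the slides above.
SAMPLE = """\
History list length 6
---TRANSACTION 0 0, not started, process no 3396, OS thread id 1152440672
MySQL thread id 8080, query id 728900 localhost root
show innodb status
---TRANSACTION 0 80157600, ACTIVE 4 sec, process no 3396, OS thread id 1148250464, thread declared inside InnoDB 442
mysql tables in use 1, locked 0
MySQL thread id 8079, query id 728899 localhost root Sending data
select sql_calc_found_rows * from b limit 5
"""


def history_list_length(text):
    """Return the number of unpurged transactions reported in the section."""
    return int(re.search(r"History list length (\d+)", text).group(1))


def active_transactions(text):
    """Return (active_seconds, mysql_thread_id, tickets) per ACTIVE transaction.

    tickets is None when the 'declared inside InnoDB' field is absent.
    """
    found = []
    for block in text.split("---TRANSACTION")[1:]:
        active = re.search(r"ACTIVE (\d+) sec", block)
        thread = re.search(r"MySQL thread id (\d+)", block)
        if active is None or thread is None:
            continue  # "not started" transactions carry no ACTIVE time
        tickets = re.search(r"declared inside InnoDB (\d+)", block)
        found.append((
            int(active.group(1)),
            int(thread.group(1)),
            int(tickets.group(1)) if tickets else None,
        ))
    return found


print("history list length:", history_list_length(SAMPLE))
for seconds, thread_id, tickets in active_transactions(SAMPLE):
    # The thread id is the value that KILL expects.
    print(f"thread {thread_id}: active {seconds}s, tickets left {tickets}")
```

Sampling the history list length at regular intervals and graphing it is one way to see whether purge is keeping up.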

-- XtraDB --

------------
TRANSACTIONS
------------
Trx id counter 11D771A
Purge done for trx's n:o < 11D1AC8 undo n:o < 0
History list length 11436
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started, process no 25247, OS thread id 1091651920
MySQL thread id 34, query id 5287216 localhost pz
show engine innodb status
---TRANSACTION 0, not started, process no 25247, OS thread id 1091385680
MySQL thread id 33, query id 90089 localhost root
---TRANSACTION 11D7719, ACTIVE 0 sec, process no 25247, OS thread id 1401227600 updating or deleting
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1216, 2 row lock(s), undo log entries 2
MySQL thread id 21, query id 5287414 localhost root Updating
UPDATE orders SET o_carrier_id = ? WHERE o_id = ? AND o_d_id = ? AND o_w_id = ?
Trx read view will not see trx with id >= 11D771A, sees < 11D7665
TABLE LOCK table `tpcc`.`new_orders` trx id 11D7719 lock mode IX
RECORD LOCKS space id 49 page no 1867 n bits 664 index `PRIMARY` of table `tpcc`.`new_orders` trx id 11D7719 lock_mode X locks rec but not gap
TABLE LOCK table `tpcc`.`orders` trx id 11D7719 lock mode IX
RECORD LOCKS space id 50 page no 22715 n bits 456 index `PRIMARY` of table `tpcc`.`orders` trx id 11D7719 lock_mode X locks rec but not gap

• The 48-bit transaction ID numbers are rewritten in hex.
• Percona adds details about locks held by this transaction.
• Note the "IX" table locks, which only block table-level changes.
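To compare transaction IDs across the two output styles, it helps to turn both into plain integers. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the two decimal numbers in the older form are the high and low 32-bit halves of the ID.

```python
def trx_id_from_hex(text):
    """Parse the newer single hexadecimal form, e.g. '11D771A'."""
    return int(text, 16)


def trx_id_from_parts(high, low):
    """Combine the older two-number form, e.g. '0 80157601'.

    Assumption: the two decimal numbers are the high and low 32-bit halves.
    """
    return (high << 32) + low


counter = trx_id_from_hex("11D771A")    # Trx id counter
purged_to = trx_id_from_hex("11D1AC8")  # Purge done for trx's n:o <
active = trx_id_from_hex("11D7719")     # the ACTIVE transaction in the sample

print("ids between purge point and counter:", counter - purged_to)
print("active trx is", counter - active, "id(s) behind the counter")
```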

..
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (write thread)
Pending normal aio reads: 0, aio writes: 0, ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
17909940 OS file reads, 22088963 OS file writes, 1743764 OS fsyncs
0.20 reads/s, 16384 avg bytes/read, 5.00 writes/s, 0.80 fsyncs/s

--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (read thread)
I/O thread 4 state: waiting for i/o request (read thread)
I/O thread 5 state: waiting for i/o request (read thread)
I/O thread 6 state: waiting for i/o request (read thread)
I/O thread 7 state: waiting for i/o request (read thread)
I/O thread 8 state: waiting for i/o request (read thread)
I/O thread 9 state: waiting for i/o request (read thread)
I/O thread 10 state: waiting for i/o request (read thread)
I/O thread 11 state: waiting for i/o request (read thread)
I/O thread 12 state: waiting for i/o request (read thread)
I/O thread 13 state: waiting for i/o request (read thread)
I/O thread 14 state: waiting for i/o request (read thread)
I/O thread 15 state: waiting for i/o request (read thread)
I/O thread 16 state: waiting for i/o request (read thread)
I/O thread 17 state: waiting for i/o request (read thread)
I/O thread 18 state: waiting for i/o request (write thread)
I/O thread 19 state: waiting for i/o request (write thread)
I/O thread 20 state: waiting for i/o request (write thread)
I/O thread 21 state: waiting for i/o request (write thread)
I/O thread 22 state: waiting for i/o request (write thread)
I/O thread 23 state: waiting for i/o request (write thread)
I/O thread 24 state: waiting for i/o request (write thread)
I/O thread 25 state: waiting for i/o request (write thread)
I/O thread 26 state: waiting for i/o request (write thread)
I/O thread 27 state: waiting for i/o request (write thread)
I/O thread 28 state: waiting for i/o request (write thread)
I/O thread 29 state: waiting for i/o request (write thread)
I/O thread 30 state: waiting for i/o request (write thread)
I/O thread 31 state: waiting for i/o request (write thread)
I/O thread 32 state: waiting for i/o request (write thread)
I/O thread 33 state: waiting for i/o request (write thread)
Pending normal aio reads: 0, aio writes: 0, ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
295336 OS file reads, 159620 OS file writes, 154669 OS fsyncs
738.63 reads/s, 16397 avg bytes/read, 1027.19 writes/s, 1010.44 fsyncs/s

File I/O

• These are tablespace reads and writes. A later section shows operations on the InnoDB log.
• If any of the "pending" counts is consistently high, the I/O system may be overloaded.
• You can count reads + writes + fsyncs in terms of IOPS.
• I/O from temp tables, filesorts, etc. isn't counted here.
• Weaker I/O systems really suffer on fsyncs, but this should be a non-issue with a battery-backed write-back cache.
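The IOPS arithmetic from the bullets above can be sketched in Python. This is a minimal sketch, assuming the per-second summary line keeps the format of the FILE I/O samples; the function name file_io_iops is hypothetical.

```python
import re

# Per-second summary line from the second FILE I/O sample above.
SAMPLE = "738.63 reads/s, 16397 avg bytes/read, 1027.19 writes/s, 1010.44 fsyncs/s"


def file_io_iops(line):
    """Return (reads, writes, fsyncs, total) per second parsed from the line."""
    rates = {}
    for value, name in re.findall(r"([\d.]+) (reads|writes|fsyncs)/s", line):
        rates[name] = float(value)
    total = rates["reads"] + rates["writes"] + rates["fsyncs"]
    return rates["reads"], rates["writes"], rates["fsyncs"], total


reads, writes, fsyncs, total = file_io_iops(SAMPLE)
print(f"{total:.2f} IOPS = {reads} reads + {writes} writes + {fsyncs} fsyncs")
```

Comparing the total with what the storage can sustain gives a rough idea of how much headroom is left.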

..
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 4289, free list len 6928, seg size 11218,
689814 inserts, 67723 merged recs, 17341 merges
Hash table size 31874747, node heap has 9526 buffer(s)
25448.49 hash searches/s, 54424.21 non-hash searches/s

Insert Buffer

• All measurements are in pages:
 – size = current size
 – seg size = size allocated so far
 – free list len = free space
• The ratio of merges to inserts is approximately the insert buffer efficiency.
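The insert buffer counters can be turned into ratios with a few lines of Python. This is a sketch under the assumption that the "Ibuf:" lines keep the layout of the sample above; ibuf_stats is a hypothetical helper name.

```python
import re

# The two insert buffer lines from the sample above, joined into one string.
SAMPLE = ("Ibuf: size 4289, free list len 6928, seg size 11218, "
          "689814 inserts, 67723 merged recs, 17341 merges")


def ibuf_stats(text):
    """Extract the insert buffer counters into a dict of ints."""
    patterns = {
        "size": r"Ibuf: size (\d+)",
        "free_list_len": r"free list len (\d+)",
        "seg_size": r"seg size (\d+)",
        "inserts": r"(\d+) inserts",
        "merged_recs": r"(\d+) merged recs",
        "merges": r"(\d+) merges",
    }
    return {name: int(re.search(rx, text).group(1))
            for name, rx in patterns.items()}


stats = ibuf_stats(SAMPLE)
merge_ratio = stats["merges"] / stats["inserts"]
recs_per_merge = stats["merged_recs"] / stats["merges"]
print(f"merges per insert: {merge_ratio:.4f}")
print(f"records per merge: {recs_per_merge:.2f}")
```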

Adaptive Hash

• There isn't much you can do with these numbers; they are purely informational.

..
---
LOG
---
Log sequence number 84 3000620880
Log flushed up to   84 3000611265
Last checkpoint at  84 2939889199
0 pending log writes, 0 pending chkp writes
14073669 log i/o's done, 10.90 log i/o's/second

-- XtraDB Example --

---
LOG
---
Log sequence number 43921309413
Log flushed up to   43921308508
Last checkpoint at  43497448671
Max checkpoint age  1303883551
Modified age        423860742
Checkpoint age      423860742
0 pending log writes, 0 pending chkp writes
154286 log i/o's done, 1008.78 log i/o's/second

Log

• The first three numbers say a lot about how background flushing is going.
• Log sequence number is the current number "handed out".
• Log flushed up to shows how far we have written in the transaction logs.
• Last checkpoint at shows how far behind our dirty page writing may be.
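The distances between the three numbers are simple subtractions. The Python sketch below uses the values from the two samples above; the function names are hypothetical, and treating the older two-number form as high and low 32-bit halves is an assumption.

```python
def lsn_from_parts(high, low):
    """Combine the older two-number form, e.g. '84 3000620880'.

    Assumption: the two numbers are the high and low 32-bit halves.
    """
    return (high << 32) + low


def log_ages(lsn, flushed, checkpoint):
    """Return (bytes not yet flushed to the log, checkpoint age in bytes)."""
    return lsn - flushed, lsn - checkpoint


# XtraDB example: the result can be checked against the printed "Checkpoint age".
xtradb_unflushed, xtradb_age = log_ages(43921309413, 43921308508, 43497448671)
max_checkpoint_age = 1303883551
print("checkpoint age:", xtradb_age)
print(f"share of max checkpoint age: {100.0 * xtradb_age / max_checkpoint_age:.1f}%")

# Older format from the first sample.
old_unflushed, old_age = log_ages(
    lsn_from_parts(84, 3000620880),
    lsn_from_parts(84, 3000611265),
    lsn_from_parts(84, 2939889199),
)
print("older sample checkpoint age:", old_age)
```

In the XtraDB sample the computed checkpoint age matches the "Checkpoint age" line that XtraDB prints directly.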

..
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 16515072000; in additional pool allocated 0
Internal hash tables (constant factor + variable factor)
    Adaptive hash index 411076168 (254997976 + 156078192)
    Page hash           15938152
    Dictionary cache    63827784 (63751024 + 76760)
    File system         87336 (82672 + 4664)
    Lock system         39898760 (39844312 + 54448)
    Recovery system     0 (0 + 0)
    Threads             414936 (406936 + 8000)
Dictionary memory allocated 76760
Buffer pool size        983040
Buffer pool size, bytes 16106127360
Free buffers            669543
Database pages          303971
Modified db pages       246726
Pending reads 1
Pending writes: LRU 0, flush list 0, single page 0
Pages read 295410, created 8561, written 12606
739.63 reads/s, 62.56 creates/s, 57.94 writes/s
Buffer pool hit rate 998 / 1000
LRU len: 303971, unzip_LRU len: 0
I/O sum[40576]:cur[526], unzip sum[0]:cur[0]

Buffer Pool

• It is always interesting to compare Total memory allocated with Buffer pool size * 16KB.
• The difference shows how much extra memory InnoDB has taken for overhead.
• Modified db pages are "dirty pages".
• The hit rate shows the efficiency of the cache, i.e. how often a page was found in memory rather than loaded from disk.
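The comparisons described above are plain arithmetic. The Python sketch below works through them with the values from the sample output; the variable names are illustrative.

```python
PAGE_SIZE = 16 * 1024  # bytes; the 16KB page size the slide refers to

# Values copied from the BUFFER POOL AND MEMORY sample above.
total_memory_allocated = 16515072000
buffer_pool_pages = 983040
database_pages = 303971
modified_db_pages = 246726
hit_rate = (998, 1000)  # "Buffer pool hit rate 998 / 1000"

buffer_pool_bytes = buffer_pool_pages * PAGE_SIZE
overhead_bytes = total_memory_allocated - buffer_pool_bytes
dirty_pct = 100.0 * modified_db_pages / database_pages
misses_per_1000 = hit_rate[1] - hit_rate[0]

print("buffer pool bytes:", buffer_pool_bytes)
print(f"overhead: {overhead_bytes / 2**20:.0f} MiB")
print(f"dirty pages: {dirty_pct:.1f}% of database pages")
print("page requests per 1000 that went to disk:", misses_per_1000)
```

The computed buffer pool size agrees with the "Buffer pool size, bytes" line in the sample.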

Defining a Working Set

• You don't need as much memory as you have data.
• Data naturally has hotspots that need to be accessed frequently, but older data can just reside on disk.
• Some applications have a working set of just a few percent; others are close to 100%.

--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
33 read views open inside InnoDB
Main thread process no. 25247, id 1395636560, state: sleeping
Number of rows inserted 813805, updated 1627101, deleted 62432, read 6888247
5632.92 inserts/s, 11247.52 updates/s, 432.17 deletes/s, 47422.71 reads/s

Row Operations

• It is always interesting to compare the logical row operations with what the file I/O and buffer pool sections report.
• These numbers are great to graph as an indication of the server's capacity.
• The "main thread" is the server background thread.
• "sleeping" means the thread is in the idle loop.
• "waiting for server activity" means all background work is complete.
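One way to make the logical versus physical comparison concrete is to divide row reads by page reads. The Python sketch below parses the per-second line from the ROW OPERATIONS sample; it assumes the 739.63 page reads/s from the buffer pool sample were taken on the same server at the same moment, which the slides do not state, and the function name is hypothetical.

```python
import re

# Per-second line from the ROW OPERATIONS sample above.
SAMPLE = "5632.92 inserts/s, 11247.52 updates/s, 432.17 deletes/s, 47422.71 reads/s"


def row_ops_per_second(line):
    """Return a dict such as {'inserts': 5632.92, ...} parsed from the line."""
    return {name: float(value)
            for value, name in re.findall(r"([\d.]+) (\w+)/s", line)}


ops = row_ops_per_second(SAMPLE)
total_ops = sum(ops.values())

# Assumption: physical page reads/s taken from the buffer pool sample.
page_reads_per_second = 739.63
rows_per_page_read = ops["reads"] / page_reads_per_second

print(f"total row operations/s: {total_ops:.2f}")
print(f"rows read per page read from disk: {rows_per_page_read:.1f}")
```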

InnoDB Diagnostics: SHOW GLOBAL STATUS

Show Global Status

• Many InnoDB statistics are in SHOW GLOBAL STATUS.
    mysql> show global status like 'innodb%';
• These values are the same as some of the indicators in SHOW ENGINE INNODB STATUS.
• Display statistics in graphical monitoring interfaces (e.g. Cacti) to reduce skew and show trends.
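Many of these status values are counters that accumulate from server start, so trending means looking at the change between two samples. The Python sketch below shows that calculation; status_rates is a hypothetical helper, and the second sample is invented for illustration.

```python
def status_rates(before, after, interval_seconds):
    """Per-second rate for every counter present in both samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return {
        name: (after[name] - before[name]) / interval_seconds
        for name in after
        if name in before
    }


# Two samples taken 10 seconds apart. The second one is made up.
first = {"Innodb_data_reads": 42, "Innodb_data_writes": 7, "Innodb_data_fsyncs": 7}
second = {"Innodb_data_reads": 142, "Innodb_data_writes": 57, "Innodb_data_fsyncs": 37}

rates = status_rates(first, second, 10)
for name, per_second in sorted(rates.items()):
    print(f"{name}: {per_second:.1f}/s")
```

Values that describe a current state, such as Innodb_buffer_pool_pages_dirty, are graphed as they are rather than as a difference.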

+---------------------------------------+---------+
| Variable_name                         | Value   |
+---------------------------------------+---------+
| Innodb_buffer_pool_pages_data         | 28      |
| Innodb_buffer_pool_pages_dirty        | 0       |
| Innodb_buffer_pool_pages_flushed      | 1       |
| Innodb_buffer_pool_pages_free         | 31972   |
| Innodb_buffer_pool_pages_misc         | 0       |
| Innodb_buffer_pool_pages_total        | 32000   |
| Innodb_buffer_pool_read_ahead         | 0       |
| Innodb_buffer_pool_read_ahead_evicted | 0       |
| Innodb_buffer_pool_read_requests      | 374     |
| Innodb_buffer_pool_reads              | 29      |
| Innodb_buffer_pool_wait_free          | 0       |
| Innodb_buffer_pool_write_requests     | 1       |
. . .

. . .
| Innodb_data_fsyncs                    | 7       |
| Innodb_data_pending_fsyncs            | 0       |
| Innodb_data_pending_reads             | 0       |
| Innodb_data_pending_writes            | 0       |
| Innodb_data_read                      | 2658304 |
| Innodb_data_reads                     | 42      |
| Innodb_data_writes                    | 7       |
| Innodb_data_written                   | 35328   |
| Innodb_dblwr_pages_written            | 1       |
| Innodb_dblwr_writes                   | 1       |
| Innodb_have_atomic_builtins           | ON      |
| Innodb_log_waits                      | 0       |
| Innodb_log_write_requests             | 0       |
| Innodb_log_writes                     | 2       |
. . .

. . .
| Innodb_os_log_fsyncs                  | 5       |
| Innodb_os_log_pending_fsyncs          | 0       |
| Innodb_os_log_pending_writes          | 0       |
| Innodb_os_log_written                 | 1024    |
| Innodb_page_size                      | 16384   |
| Innodb_pages_created                  | 0       |
| Innodb_pages_read                     | 28      |
| Innodb_pages_written                  | 1       |
| Innodb_row_lock_current_waits         | 0       |
| Innodb_row_lock_time                  | 0       |
| Innodb_row_lock_time_avg              | 0       |
| Innodb_row_lock_time_max              | 0       |
| Innodb_row_lock_waits                 | 0       |
| Innodb_rows_deleted                   | 0       |
| Innodb_rows_inserted                  | 0       |
| Innodb_rows_read                      | 14      |
| Innodb_rows_updated                   | 0       |
+---------------------------------------+---------+
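A few derived figures can be computed directly from these variables. The Python sketch below uses the values from the sample listing; the variable names on the left are illustrative. The sample looks like a freshly started, nearly idle server, so the resulting ratios are only useful for showing the arithmetic.

```python
# Values copied from the SHOW GLOBAL STATUS sample above.
status = {
    "Innodb_buffer_pool_pages_total": 32000,
    "Innodb_buffer_pool_pages_free": 31972,
    "Innodb_buffer_pool_read_requests": 374,
    "Innodb_buffer_pool_reads": 29,
    "Innodb_page_size": 16384,
}

# Buffer pool size in bytes: total pages times page size.
pool_bytes = status["Innodb_buffer_pool_pages_total"] * status["Innodb_page_size"]

# Pages in use: everything that is not on the free list.
used_pages = (status["Innodb_buffer_pool_pages_total"]
              - status["Innodb_buffer_pool_pages_free"])

# Share of logical read requests that did not need a read from disk.
hit_ratio = 1 - (status["Innodb_buffer_pool_reads"]
                 / status["Innodb_buffer_pool_read_requests"])

print("buffer pool bytes:", pool_bytes)
print("pages in use:", used_pages)
print(f"hit ratio: {hit_ratio:.3f}")
```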

InnoDB Diagnostics: INFORMATION_SCHEMA

Information Schema Tables

Table Name                           Description
INNODB_CMP & INNODB_CMP_RESET        Operations on compressed tables.
INNODB_CMPMEM & INNODB_CMPMEM_RESET  Status of compressed tables in the buffer pool.
INNODB_TRX                           State of every currently running transaction.
INNODB_LOCKS                         Information about locks being waited for.
INNODB_LOCK_WAITS                    Transactions waiting for locks.
INNODB_BUFFER_PAGE                   Information about each page in the buffer pool.
INNODB_BUFFER_PAGE_LRU               Eviction order of clean pages in the buffer pool.
INNODB_BUFFER_POOL_STATS             Buffer pool info, as SHOW INNODB STATUS.
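INNODB_TRX and INNODB_LOCK_WAITS are commonly joined to see who is blocking whom. The Python sketch below runs such a join against an in-memory SQLite database that stands in for the MySQL server, so it can be tried without one. The two stand-in tables contain only the columns the query needs, and the rows are invented for illustration. On a real server the tables live in the information_schema database.

```python
import sqlite3

# Join of lock waits to the waiting (r) and blocking (b) transactions.
LOCK_WAIT_QUERY = """
SELECT r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query
FROM INNODB_LOCK_WAITS w
JOIN INNODB_TRX b ON b.trx_id = w.blocking_trx_id
JOIN INNODB_TRX r ON r.trx_id = w.requesting_trx_id
"""


def make_stand_in_db():
    """Build a tiny SQLite stand-in with made-up rows (not real server data)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE INNODB_TRX "
               "(trx_id TEXT, trx_mysql_thread_id INTEGER, trx_query TEXT)")
    db.execute("CREATE TABLE INNODB_LOCK_WAITS "
               "(requesting_trx_id TEXT, blocking_trx_id TEXT)")
    db.executemany("INSERT INTO INNODB_TRX VALUES (?, ?, ?)", [
        ("11D7719", 21, None),  # blocker, idle inside an open transaction
        ("11D771A", 22, "UPDATE orders SET o_carrier_id = 1 WHERE o_id = 2"),
    ])
    db.execute("INSERT INTO INNODB_LOCK_WAITS VALUES (?, ?)",
               ("11D771A", "11D7719"))
    return db


rows = make_stand_in_db().execute(LOCK_WAIT_QUERY).fetchall()
for waiting_thread, waiting_query, blocking_thread, blocking_query in rows:
    print(f"thread {waiting_thread} waits for thread {blocking_thread}")
```

The blocking thread id in the result is the value to pass to KILL if the blocker has to be removed.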

Extra Tables in Percona Server

Table Name              Description
CLIENT_STATISTICS       Stats about client connections.
INDEX_STATISTICS        Index usage.
GLOBAL_TEMPORARY_TABLE  All threads' temporary tables.
QUERY_RESPONSE_TIME     No. queries per order of magnitude.
TABLE_STATISTICS        Table usage.
TEMPORARY_TABLES        Current thread's temporary tables.
THREAD_STATISTICS       Activity per thread.
USER_STATISTICS         Activity per user.
XTRADB_ADMIN_COMMAND    For ad hoc save/restore of buffer pool.

Extra Tables in Percona Server

Table Name                      Description
INNODB_BUFFER_POOL_PAGES        All allocated pages in the buffer pool.
INNODB_BUFFER_POOL_PAGES_BLOB   Blob-type pages in the buffer pool.
INNODB_BUFFER_POOL_PAGES_INDEX  Index-type pages in the buffer pool.
INNODB_CHANGED_PAGES            Page change tracking.
INNODB_INDEX_STATS              Stats of cached indexes.
INNODB_RSEG                     Rollback segments.
INNODB_TABLE_STATS              Stats of cached tables.
INNODB_UNDO_LOGS                Status of internal undo log records.

Extra Tables in Percona Server

Table Name               Description
INNODB_SYS_COLUMNS       Info on columns.
INNODB_SYS_FIELDS        InnoDB index key fields.
INNODB_SYS_FOREIGN       Info on foreign key constraints.
INNODB_SYS_FOREIGN_COLS  Info on columns in foreign keys.
INNODB_SYS_INDEXES       Info on indexes.
INNODB_SYS_STATS         Stats of table indexes.
INNODB_SYS_TABLES        Info on tables.
INNODB_SYS_TABLESTATS    Performance stats of tables.

InnoDB Diagnostics: INNOTOP

Innotop

• A powerful console tool, similar to "top", that displays a dynamically updating view of queries, InnoDB resource use, transactions, and locks.
• https://code.google.com/p/innotop/
• Also useful as an interface for killing runaway processes.
• Safe to run on production systems under load: it adds no more burden than running SHOW ENGINE INNODB STATUS every few seconds.

Conclusion

Percona Training http://www.percona.com/training

Thank you!

• Thank you again for choosing to take a Percona course.
 – http://percona.com/training

Feedback Survey

• You will find it as the last link at the bottom of the Moodle page for this course. • After completing it, you will have access to your certificate of completion for this course.

• The survey is anonymous, but we encourage you to leave your name and email address so we can follow up.

Q & A

Find Us On Social Media

• Twitter:
 – https://twitter.com/Perconatraining
 – https://twitter.com/Percona

• Facebook:
 – https://www.facebook.com/Percona
