WHITEPAPER: Why Choose MySQL?

MySQL is one of the most widely used relational databases, powering nine out of ten websites around the world. With this level of adoption, it must be a good fit for your needs, right?

Not so fast. Even though it is used by many top websites, it is not a good fit for everyone. A 90% usage rate is impressive, but 10% of websites do not use MySQL. Why not? MySQL’s ease of deployment, open source knowledge base, scalability, and transactional capability make it a great choice in many, but not all, scenarios. In this whitepaper, we’ll look at where MySQL makes sense and at some places where it is a less ideal choice.

MySQL is based on the widely known and documented ANSI SQL 99 standard. It was designed by MySQL AB and is available in both open source and enterprise editions.

What is MySQL?

MySQL is a relational database management system. It stores data in tables and supports queries written in Structured Query Language (SQL). SQL has been around for a long time and is well known and documented, which provides a solid knowledge base for working with MySQL.

MySQL excels at processing transactions and can be configured to be fully ACID-compliant (for example, with the InnoDB storage engine). ACID stands for Atomicity, Consistency, Isolation, and Durability. Together, these properties describe how reliably a database processes transactions: each transaction either completes in full or not at all, leaves the data in a valid state, does not interfere with concurrent transactions, and persists once committed.

MySQL was designed to support very large databases and can be scaled to meet high data storage and user demand. Within each table, data follows a field/value model.

Sometimes, MySQL is perceived as being too rigid in its data structures. This perception comes from its ability to create and enforce primary key and foreign key constraints, and from the requirement that the data being loaded fit the current schema.
Primary and foreign key constraints determine, for example, that you cannot load a financial transaction if the account number recorded in the transaction does not exist in another table. The table recording transactions would have account_number as a foreign key, and the table recording account owners would have account_number as a primary key. The inherent properties of a primary key also mean that the account owner table cannot contain a duplicate account_number, nor can the field be null. These keys make future searches easier because the key values are stored in sorted order, much as locating a file in a filing cabinet is easier because the contents are usually sorted alphabetically.

Additionally, MySQL requires that the database schema be designed before any data is written or read. The schema lists the fields expected in the file(s) to be loaded and defines a data type for each field. For example, you may have a transaction_time field defined as storing date/time information. Thus, a value of 09:22:44 is acceptable, but a value of “2 PM” is not. If a file being loaded contains field names that MySQL does not recognize, that data is not loaded. In our example, if the field is named transaction_time in the database but the data file names it xaction_time, the data cannot be loaded.
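The account example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard library's sqlite3 module as a stand-in so that it runs without a MySQL server, and the column names other than account_number and transaction_time, the sample values, and the try_insert helper are all hypothetical. Type names and error messages in a real MySQL deployment would differ.

```python
import sqlite3

# SQLite stands in for MySQL here so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE account_owners (
    account_number INTEGER PRIMARY KEY,   -- unique identifier for each owner
    owner_name     TEXT NOT NULL
);
CREATE TABLE transactions (
    transaction_id   INTEGER PRIMARY KEY,
    account_number   INTEGER NOT NULL REFERENCES account_owners(account_number),
    transaction_time TEXT NOT NULL,       -- MySQL would use a TIME or DATETIME type
    amount           REAL NOT NULL
);
""")

def try_insert(sql, params):
    """Run one INSERT and report whether a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

owner_sql = "INSERT INTO account_owners (account_number, owner_name) VALUES (?, ?)"
txn_sql = ("INSERT INTO transactions (account_number, transaction_time, amount) "
           "VALUES (?, ?, ?)")

first_owner = try_insert(owner_sql, (1001, "Sample Owner"))
duplicate_owner = try_insert(owner_sql, (1001, "Another Owner"))  # same primary key
known_account = try_insert(txn_sql, (1001, "09:22:44", 25.0))
unknown_account = try_insert(txn_sql, (9999, "09:22:44", 25.0))   # no such owner

for label, result in [("first owner", first_owner),
                      ("duplicate owner", duplicate_owner),
                      ("transaction, known account", known_account),
                      ("transaction, unknown account", unknown_account)]:
    print(f"{label}: {result}")
```

The second and fourth inserts are refused by the database itself rather than by application code, which is the behavior the constraints are meant to guarantee.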
MySQL ingests data from a variety of sources through data loading. The source files can come from different places, but their contents must be standardized to meet the requirements of the existing database design. MySQL can replicate data to other nodes either in a master/slave relationship or in a clustered environment, where multiple servers are kept up to date through the application.

When data needs to be read from the database, a MySQL query is used. This query, which conforms to the ANSI SQL 99 language standard, is run against the database. Again, if the query references tables or fields that do not exist in the current schema, the query fails. Otherwise, the query returns the requested information to the application.

In MySQL (and other open source databases), as the database grows in size, applications can take longer and longer to search for and find data. This hurts performance as the number of application requests against the database scales up. Primary and foreign keys act as a shortcut of sorts to finding the relevant data for a particular application request. This speeds up responses from the database and improves overall application performance.

If MySQL receives a data feed that contains unknown or incorrectly named fields, the load is rejected. For some purposes, like financial markets, this strictness can be helpful, but it limits flexibility. If you know the structure of your data and can define it solidly, MySQL can be a great fit, because data being loaded into MySQL must match the current schema design. The schema is enforced on load and then used during querying to determine what data to find and where to look for it. This concept is referred to as “schema on write”.
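The load-time and query-time schema checks described above can be illustrated with a short Python sketch. It is a sketch under assumptions, not MySQL's own loader: sqlite3 from the standard library stands in for MySQL, and the load_row and run_query helpers are hypothetical names introduced for the illustration.

```python
import sqlite3

# SQLite stands in for MySQL here so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "account_number INTEGER NOT NULL, "
    "transaction_time TEXT NOT NULL)"
)

def load_row(row):
    """Insert one record, treating the dict keys as column names."""
    columns = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    sql = f"INSERT INTO transactions ({columns}) VALUES ({placeholders})"
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, tuple(row.values()))
        return "loaded"
    except sqlite3.Error as exc:
        return f"rejected: {exc}"

def run_query(sql):
    """Run a SELECT, returning its rows or a failure message."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"query failed: {exc}"

# The field name matches the schema, so the record loads.
good = load_row({"account_number": 1001, "transaction_time": "09:22:44"})
# The feed calls the field xaction_time, which the schema does not know.
bad = load_row({"account_number": 1001, "xaction_time": "09:22:44"})

rows = run_query("SELECT account_number, transaction_time FROM transactions")
missing = run_query("SELECT xaction_time FROM transactions")

print(good)
print(bad)
print(rows)
print(missing)
```

Only the correctly named record reaches the table, and the query that references the unknown field fails instead of returning partial results.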
When you need schema flexibility, with structure applied only when the data is queried rather than when it is loaded, MySQL may not be ideal.

MySQL supports a high availability (HA) environment, where each piece of data is stored redundantly on another node in a cluster. This ensures that the data is still accessible even if a node fails. MySQL uses a quorum model for high availability and will stay up and running as long as a majority of the nodes in a cluster are up and running.

Customer Relationship Management (CRM)

Let’s say that you run a CRM site, used to record and track all of your customer interactions. It is important that the information retrieved when a customer contacts you is up to date and contains everything needed to manage their account. You also need to be able to scale up to handle data from a large number of customers and a high volume of read and write transactions. In this case, MySQL is a good choice.

One of the benefits of storing this type of data is that the shape of the interaction information is known before an interaction occurs. There will be identifying information, dates and timestamps of interactions, and reports on what occurred during the interaction. Because the data being stored is predictable, it fits well into MySQL’s controlled schema. If a new interaction data point is needed, the schema can be altered to support the new field(s), but this is a non-trivial task and should be undertaken only when needed.

The relational aspect of MySQL enables it to use storage efficiently. With each interaction, there is some user information that needs to be recorded. Rather than recording the user’s name, email, phone number, and so on for each interaction, we can use a primary/foreign key relationship to manage this data more efficiently. There will be a users table that holds a primary key field called user_id. Because this field is defined as a primary key, it is guaranteed to always have a unique value and cannot be left empty.
This prevents us from tracking information that does not come from our user community. Next, that user_id field is added to an interaction table as a foreign key. This ensures that each interaction is associated with a known user and prevents us from reporting on an interaction for someone who is not already a known user in our database. The referential integrity constraints keep the data current and allow us to record only the smallest value needed to identify the user for each interaction. Additionally, if a user needs to change some aspect of their account data, for example because they moved to a new address, the change need only be made in the users table to be reflected in all of that user’s interactions.

The other areas where MySQL excels in this instance are scalability and responsiveness during periods of high usage. MySQL is designed to scale as needed to accommodate the data it needs to consume. Many of the largest databases in existence are managed using MySQL. It can also handle the high rate of read and write requests in an environment like this. HA is also important here, since you need to know that you will always have access to relevant customer data.

The benefits of a known schema, the ability to scale, and responsiveness to high user demand make MySQL a good choice in an environment like this. If the data being managed is more variable, as would be the case when a company receives data feeds from multiple disparate sources, MySQL may be too constrained to manage this traffic.
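The users and interaction tables from the CRM scenario can be sketched as follows. This is a minimal illustration, assuming sqlite3 from the Python standard library as a stand-in for MySQL; apart from user_id, every column name and every sample row is hypothetical and exists only to make the sketch runnable.

```python
import sqlite3

# SQLite stands in for MySQL here so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    address TEXT NOT NULL
);
CREATE TABLE interactions (
    interaction_id INTEGER PRIMARY KEY,
    user_id        INTEGER NOT NULL REFERENCES users(user_id),
    occurred_at    TEXT NOT NULL,
    notes          TEXT NOT NULL
);
-- Illustrative sample data only.
INSERT INTO users VALUES (1, 'Ana Example', '1 Old Street');
INSERT INTO interactions VALUES (10, 1, '2024-01-05 09:22:44', 'Asked about billing');
INSERT INTO interactions VALUES (11, 1, '2024-02-11 14:00:00', 'Requested a callback');
""")

# Each interaction stores only user_id; the user's details are joined in on read.
REPORT_SQL = """
SELECT i.interaction_id, u.name, u.address
FROM interactions AS i
JOIN users AS u ON u.user_id = i.user_id
ORDER BY i.interaction_id
"""

# The user moves: one row in the users table changes.
with conn:
    conn.execute("UPDATE users SET address = ? WHERE user_id = ?",
                 ("2 New Avenue", 1))

report = conn.execute(REPORT_SQL).fetchall()
for row in report:
    print(row)
```

A single UPDATE against the users table is enough for every interaction in the report to show the new address, because the interaction rows never held a copy of it.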
