MaxScale 1.0-beta is out – Happy Birthday MaxScale!

It was a year ago, on a nice Sunday night of the English Summer (apologies for the oxymoron), that Mark Riddoch came to see me and together we headed to the Vansittart Arms, our local family pub round the corner. A pint of London Pride on one side and a Honey Dew on the other were the perfect add-on to Mark’s MacBook Pro, on which Mark was showing me the 0.1 version of MaxScale. It was the result of the joint efforts of Mark’s team, Massimiliano and Vilho, who had worked hard to bring to life the first version of something that I believe will be a natural addition to clusters of MySQL/Percona/MariaDB servers in the near future.

A year ago, Mark showed me a basic debugging interface for MaxScale. We went through some parts of the code and the internal structures, and we looked at the way his team had kept everything sleek and lightweight. It was the implementation of hours and hours spent jotting ideas, diagrams, comments from users and customers. We spent hours at Caffe Trieste in San Francisco, at the Google Campus and at the National Theatre in London, in Berlin with my friend Kris and in many busy airports around Europe and US. Last but not least, the whiteboard in my garden office had been filled and wiped hundreds of times in order to try and find good ideas and the right components for MaxScale.

Today MaxScale has reached its beta stage. It is like a birthday, not only because a year has passed and we have reached a certain level of maturity, but mainly because, as it happens for humans, we like to set milestones in order to better organise our life and our work, and to set and achieve goals. But MaxScale is already evolving from 1.0: the code in various branches on GitHub is already showing more interesting and exciting features, which the team is developing for the next versions.

What is important and I want to talk about today is what is available in the 1.0-beta. We have now a new set of modules that are used to create filters, log queries and results, transform the requests to the database and the data retrieved. The read/write splitter is now available for both MySQL Replication and Galera, opening new possibilities to scale even better. In addition to these features, we now have dynamic balance server weighting, Node and Session Replication Consistency checks, automatic failover to multiple slaves and a more clear mechanism to implement high availability for MaxScale itself.

Let’s stick to the basics

For those who do not know MaxScale yet, here is the 60 seconds pitch. MaxScale is a database-centric proxy. It is “database centric” because it has been designed with database operations in mind, covering the typical I/O, computation and resource management of a databases. “Proxy” means that it sits between two components of a data infrastructure, where at least one of these components is a database. This means that MaxScale can sit between a client and one or more database servers, between a database master and one or more database slaves, or between two or more paired databases.

MaxScale’s core is based in Linux epoll calls and is optimised to be lightweight and low latency. MaxScale’s architecture relies on the use of pluggable modules that are combined together to offer authentication, protocol management, filtering, logging, monitoring and routing. In simple terms, the opportunities are endless: you simply need to define your objective and know the MaxScale API to build a proxy system that will take care of the communication and the resource usage of its components.

MaxScale as a proxy between client applications and a cluster of MySQL and MariaDB Replication servers and Galera Cluster. Red lines are read/write, blue lines are read/only.

MaxScale 1.0 beta comes with a set of interesting modules:

  • Authentication: MySQL/MariaDB authentication is operated inside MaxScale and the authentication to one or more servers is executed asynchronously. This module reduces the overall latency, especially when MaxScale is co-located with the application server.
  • Protocol: MaxScale provides client and backend MySQL connectivity.
  • Monitoring: MaxScale 1.0 beta comes with monitoring modules that are designed to work with single MySQL, Percona and MariaDB servers, with MySQL and MariaDB Replication and with Galera-based clusters.
  • Filter & Logging: this is the last addition to the set of modules in MaxScale. There are now some interesting logging modules used to monitor queries and results, and to transform queries captured using regular expressions.
  • Routing: MaxScale 1.0 beta comes with 2 routing modules, one to load-balance read/only connections on slave nodes or to load-balance read/write connections to Galera nodes, and another to route statements on nodes that are part of MySQL, MariaDB Replication and Galera-based clusters.
High availability for MaxScale: redundant MaxScales and co-location with the application servers.

You can find more details on the modules in Mark Riddoch’s blog posts.

Why a Proxy?

You may have seen the latest announcements from Oracle regarding a long-awaited and great product, MySQL Fabric. Fabric proudly claims to be proxy-free, and people usually ask me to compare MaxScale with Fabric and what the pros and cons of the two products are. First of all, I believe these products serve different scopes and they overlap only for a small set of features.

Fabric, as Oracle says in the first sentence of its web page is a framework for managing a farm of MySQL Servers. The focus is on the management of a number of servers that work together to provide a database infrastructure for your application. The servers are mapped together to provide availability and scalability, for example through database sharding. In order to use Fabric, you must upgrade to the newest versions of your database connectors and servers. For some applications, you may also need to modify the code in order to use some Fabric features (for example when Fabric is used with MySQL Replication, in order to load balance workload on read slaves).

MaxScale is meant to be a dispatcher of your database communications. In doing so, MaxScale can reduce the number of I/O ops, log and modify queries and results, and optimize the use of the database servers. MaxScale is designed to work transparently with all the connectors and database servers from version 4.1 to the latest MariaDB 10.X, and to react in real time to the requests of the clients, to the current workload and to the status of the database infrastructure. By doing so, MaxScale offers better availability and scalability – you may say, like Fabric does, but looking at what scalability and availability means, MaxScale is focused on the optimal use of the database servers (for example with a continuous monitoring of the database workload), instead of looking at a farm of servers as a whole.

The first baby step

This is the first baby step for MaxScale and users are warned that there is still a lot of work to do to improve it and to make it more stable. The fact that MaxScale is now beta means that it has reached a maturity in terms of features and many bugs have been fixed in the last 6 months, but the product is not production ready yet, unless used in a thoroughly tested and consolidated environment, i.e. where no changes in terms of versions or features are applied to the database and application servers. The next months will be devoted to catch more bugs, to benchmark MaxScale on real cases and in extreme conditions, such as heavy workloads in typical web-based applications for social networking, e-commerce, gaming and collaboration. The next objective is of course to see a robust version that can be declared production ready, i.e. in common terms we have thoroughly tested on live environments and we have caught and fixed all the known P1 & P2 bugs.

As usual, there is more to come, but in the meantime, we need your help to improve MaxScale – you can find the source code here (warning: the build is not optimal yet!). The fuss-free compiled versions are heremaxscale@googlegroups.com is now very active, we have many people who submit comments and requests every day, and this is already a success per se.

 

MaxScale 0.6

Kudos to the SkySQL Engineering team, who released the 3rd alpha version of MaxScale, a database proxy for MySQL, MariaDB and Percona servers, labeled MaxScale 0.6.

This version comes with two important additions:

  • A feature-complete read/write splitting module, i.e. read and write operations are now balanced in a smarter way to master or slave servers.
  • New client-based features, such as a version string that provides compatibility with the major MariaDB and MySQL connectors, the ability to connect through the root user and the use of the unix socket when MaxScale is co-located with a client application on the same server.

Binaries and source code are “hidden” here: http://downloads.skysql.com/files/SkySQL/MaxScale.

The project is on GitHub: https://github.com/skysql/MaxScale/

Other helpful links are:

We have still some work to do before we can reach a beta stage. We definitely need to catch more bugs, but we also want to extend the features of the product. We are working on different types of load balancing, on the compatibility with Galera, MySQL Cluster/NDB, MariaDB Spider and on many other aspects.

At the moment we have two routing modules:

  • The readconnroute module is a connection load balancer that accepts 2 types of connections from the client (read/write and read/only) and it balances the connections among all the masters and the slaves in a MySQL and MariaDB Cluster. This is helpful for applications that have apre-defined read/write and read/only operations. If the backend is based on a Galera Cluster, readconnroute acts as a “simple” load balancer for all the connections.
  • The readwritesplit module is a statement load balancer that accepts a connection that can send read and write operations. The module parses each statement and decides whether must be sent to a master server, a slave server or all the servers. The backend is a MariaDB or a MySQL Replication cluster.

There is definitely more to come. For example, we want to provide read/write splitting for Galera clusters in order to avoid deadlocks and to guarantee read operations without replication lag. We are also working on a version that can handle slaves with different specification and clusters that can be organised on multiple locations, i.e. where there is a mix of local and remote servers. Last but not least, we are working on benchmarks to show the clear benefits of MaxScale, where it fits and where it doesn’t. in a typical IT infrastructure.

As usual, help, suggestions and comments are more than welcome.

When the Innovator’s Dilemma hit the MySQL World

What’s the connection between databases and fruit flies?

Some of you may be familiar with the bestseller in business literature The Innovator’s Dilemma: When New Technologies Cause Great Firms to Fail, by Harvard professor Clayton Christensen. In his book, Prof. Christensen compared disk drives to fruit flies. The comparison relates to the rapid changes that disrupted the disk drive industry for decades. That disruption is compared to the rapid changes that take place to fruit flies that live only for few hours, and for this reason researchers can study and analyse their behaviour.

In the software business, you can replace disk drives with databases. Obviously, databases live longer than fruit flies, but it is an industry that sometimes shows schizophrenic changes. Whilst it is true that relational databases have being dominated the scene for decades, readers with grey hair and reading glasses have certainly experienced tens, if not hundreds of innovative changes in databases since the 70’s. Hierarchical, relational, object oriented, object relational, framework dependent, multi-dimensional and many other types of databases have been introduced to satisfy the need for faster, wider, more sophisticated data management. The 21st century, with the advent of web and mobile applications, has been so far the playground for even more innovative products, most of them under the umbrella of two overused terms, Big Data and NoSQL. Make no mistake, there is even more to come: the Internet of Things will sparkle a new wave of innovation in the industry.

How does this long premise apply to MySQL and MariaDB? The truth is, MySQL has been stagnating for almost 6 years. In 6 years, we have been witnesses to the explosion of a huge amount of data that has to be stored, retrieved and managed. We have also seen a significant change in the type of data: accounting, retail and financial information were predominant in the 80’s and 90’s whilst today it is all about multimedia content – videos, photos, audio clips, but also documents, books, executable programs, text messages, logs and tracing information – today’s “typical” data.

When I say that MySQL has been stagnating, I do not mean that MySQL has not been improved. Versions 5.5 and 5.6 are substantial milestones in MySQL’s history, but Oracle engineers have been focusing on fixing old-time issues, mostly performance related. In a way, some of the companies in the MySQL ecosystem have been led to follow the same path. These companies have worked on improving and stabilising their products, mainly due to the limits in the investments in disruptive technologies that they can afford. A good example is Tokutek: Tim Callaghan and Co. have done a tremendous job and their storage engine has reached a good and strong maturity, but the prize for the most innovative product in the Tokutek offices today goes to TokuMX, which is related to MongoDB, not MySQL.

5.5 and 5.6 are what professor Christensen calls sustaining technologies. Sustaining technologies, generally speaking, improve the performance of a product – “performance” not only in database terms, but in a more generic term, applicable to any product. What we are seeking here is disruptive technologies, and they are difficult to find. Disruptive technologies look at a problem with a new perspective, with a new way to fix issues, they introduce new and more innovative features. At the beginning, they may provide worse performances, but in the long term they simply provide “a better solution.”

The proof to this point – i.e. that 5.5 and 5.6 are sustaining technologies – is the recent announcement from Facebook, Google, Twitter and LinkedIn about WebScaleSQL. WebScaleSQL is today an even better MySQL, with more tests, better code, and better optimisation. In WebScaleSQL there are some interesting features that are the seed for more disruptive technology, such as the read-ahead mechanism and more.

Disruptive technologies in the MySQL ecosystem

Are there disruptive technologies in the MySQL ecosystem? Indeed there are. MySQL 5.6 was announced as a GA on 5th of February 2013. That is a sustaining technology. The following day, the MariaDB Foundation announced the second alpha version of MariaDB 10. Version 10.0.1 was a very incomplete and disruptive technology. It took almost 14 months to create something innovative and to incorporate components that other companies have been trying to develop, sometimes for many years.

Why do I call MariaDB 10 a disruptive technology? Here are the most important points, in case you have missed some:

  • For the first time, we have removed (or at least significantly reduced) the slave replication lag. Parallel replication is one of the most important and long awaited features which opens the doors to a more extensive use of master/slave replication for read scalability. See Kristian Nielsen’s blog on the subject.
  • We have introduced multi-source replication and more sophisticated replication topologies. We can now consolidate data coming from multiple masters into a slave node. It means, for example, that we can manage to connect multiple departments and data centers together. This is a significant contribution provided by Lixun Peng and Taobao.
  • We are sensibly reducing the downtime for online databases. The combination of MariaDB’s Global Transaction ID (which is fundamentally different from MySQL 5.6 GTID) and Galera Replication from Codership takes the availability of MySQL databases into a brand new territory, at a lower cost and with a significant reduction in the complexity of the infrastructure. Please note that MariaDB 10 Galera Cluster is not GA yet.
  • We have a significantly improved built-in sharding solution for MariaDB. 6 years ago, a young man in a Spiderman suit presented for the first time a new storage engine, called Spider. A few days ago, that same man, Kentoku Shiba, released Spider 3.2. The engine is now included in MariaDB 10 and is an extremely promising technology that has been substantially improved, and now includes also high availability, in-shard scalability and distributed transactions. Spider 3.2 is not GA yet, but previous versions are already used in production in medium size sites around the globe and we expect a significant boost in its usage in the future.
  • We have reorganised the way we integrate MariaDB/MySQL with other data sources. On one side, we have improved the way storage engines can interact with the core of MariaDB: for example, tables can be created or altered using engine-specific attributes that simplify the use of new features in the engine. One example of new engines developed by MariaDB is Cassandra, which allows the use of MariaDB and the MySQL clients as an interface to read and write data from/to a Cassandra cluster. We have also introduced a brand new storage engine called Connect. The Connect engine, as the name says, connects external sources, such as external DB tables (also from other DBMS), files, folders, logs, etc., in reading and writing. Users can now use applications built on MySQL and MariaDB to directly access heterogenous environments, without moving data back and forth among databases. They can also create joins and view results that are a combination of data coming from multiple sources. This is the very first version of Connect, so you should use it carefully.

This is only a subset of the new features available in MariaDB. They are paired with the robustness of consolidated engines and the core of MariaDB, which can be used for the application that made MySQL and MariaDB so popular.

From this analysis, there are important takeaways. Small companies in our business may struggle to innovate, due to the amount of investments needed. To some extent, the MariaDB Foundation has become the hub for such a collaboration and I really hope that MariaDB and WebScaleSQL will one day be able to join their efforts and innovate together. Disruptive technologies usually take a longer time and bigger investments to reach a reasonable maturity: a close collaboration can speed up this process, make features available sooner and allow MySQL and MariaDB to compete in innovation with NoSQL initiatives and products.

From the innovator’s to the DBA’s dilemma – innovation vs. compatibility?

A by-product of innovation is the difficulty of keeping the product compatible with the past. No matter how hard one works to make everything completely compatible with previous versions and features, disruption in innovation may also mean disruption in the compatibility with the mainstream technology. This is the reason why, above all, MariaDB is now a fork of MySQL and no longer a branch. Although the codebase has its strong roots in MySQL 5.5, it is inevitable that MariaDB and MySQL will diverge more and more in the future.

That said, in our industry backward compatibility is paramount and we break it only when it becomes an insurmountable obstacle to innovation – insurmountable here means that the effort and complexity added to keep a new version compatible with previous versions, is not justified. The good news is that MariaDB 10 does not present significant backward compatibility with MySQL 5.5, but forward compatibility with MySQL 5.6 and future versions. Once you fork a product, you inevitably create a new product. The other good news is that modularity makes things simple. For example, if XtraDB and InnoDB evolve within the boundaries of the current storage engine API, it will be relatively simple for MariaDB 10 to adopt the new engines.

How much do MySQL 5.6 and MariaDB 10 diverge today? Can we still say that MariaDB is a drop-in replacement for MySQL? This question attracts different answers, depending on who’s answering it. The truth is, nothing, not even minor versions of the same product, can claim to be fully compatible. I remember a customer who migrated from MySQL 5.5. to MariaDB 5.5. When one of our best consultants tested the application with MariaDB, he found some issues. Ironically, the issues were caused by a patch that fixed an old bug in MySQL 5.5: the application was designed to live with that bug and the fix in MariaDB caused an unexpected behaviour.

Back to more details, we can reasonably say that MySQL 5.6 and MariaDB 10 are “application compatible”. It means that from a typical application perspective, where developers and end-users use standard DML and DDL commands, there are no significant differences between the two products. What might differ is the DBA’s view. Replication is an example of this incompatibility. MySQL 5.6 and MariaDB 10 have a different GTID: MySQL has a transaction ID based on the server UUID and a transaction sequence, whilst MariaDB has a combination of domain, server and event ID that identifies a set of transactions in an event group. The incompatibility is not only in the functions and in the instruments available to the DBA, but most importantly it is in the infrastructure and in the tools that the DBA can use. The same applies to the automatic failover in Replication and the use of multi-source replication, which again provides an incompatible format for the commands used to administer replication.

Unfortunately, there is not a full list of the differences between MySQL 5.6 and MariaDB 10, and the list would have little meaning, considering that a new feature adds lots of small changes in the database. The list below is a small set of what you can expect to be different. More details are available here and here.

  • MySQL 5.6 has Unicode support in the mysql command line utility
  • At the moment, the memcached protocol is not available in MariaDB 10
  • In MariaDB 10 the default storage engine for temporary tables cannot be specified
  • In MySQL 5.6 it is possible to exchange partitions between tables
  • Binlog variables and settings are different
  • The block encryption mode for block-mode AES algorithms are handled by a parameter in Mysql 5.6
  • MySQL 5.6 can specify where to write the core file if the server crashes
  • MariaDB 10 provides a full set of parameters for the Aria engine
  • MariaDB 10 provides more parameters to handle the connection pool
  • MariaDB 10 provides more parameters to change the behaviour of the optimizer
  • MariaDB 10 offers XtraDB parameters that are not available in MySQL 5.6
  • MariaDB 10 identifies the transactional state of a session
  • MariaDB 10 provides dynamic and virtual columns functions

Time for conclusions

This post is way longer than I expected, I hope it has been interesting for some of my usual 25 readers… I dare to make one final statement: the MySQL, MariaDB and the ecosystem offer today a choice of sustaining and disruptive technologies. Many databases only offer one option or the other and very few offer both. The beauty of this story is yet to come. Let’s find out more this week at Percona Live.

 

When the Innovator’s Dilemma hit the MySQL World

What’s the connection between databases and fruit flies?

Some of you may be familiar with the bestseller in business literature The Innovator’s Dilemma: When New Technologies Cause Great Firms to Fail, by Harvard professor Clayton Christensen. In his book, Prof. Christensen compared disk drives to fruit flies. The comparison relates to the rapid changes that disrupted the disk drive industry for decades. That disruption is compared to the rapid changes that take place to fruit flies that live only for few hours, and for this reason researchers can study and analyse their behaviour.

In the software business, you can replace disk drives with databases. Obviously, databases live longer than fruit flies, but it is an industry that sometimes shows schizophrenic changes. Whilst it is true that relational databases have being dominated the scene for decades, readers with grey hair and reading glasses have certainly experienced tens, if not hundreds of innovative changes in databases since the 70’s. Hierarchical, relational, object oriented, object relational, framework dependent, multi-dimensional and many other types of databases have been introduced to satisfy the need for faster, wider, more sophisticated data management. The 21st century, with the advent of web and mobile applications, has been so far the playground for even more innovative products, most of them under the umbrella of two overused terms, Big Data and NoSQL. Make no mistake, there is even more to come: the Internet of Things will sparkle a new wave of innovation in the industry.

How does this long premise apply to MySQL and MariaDB? The truth is, MySQL has been stagnating for almost 6 years. In 6 years, we have been witnesses to the explosion of a huge amount of data that has to be stored, retrieved and managed. We have also seen a significant change in the type of data: accounting, retail and financial information were predominant in the 80’s and 90’s whilst today it is all about multimedia content – videos, photos, audio clips, but also documents, books, executable programs, text messages, logs and tracing information – today’s “typical” data.

When I say that MySQL has been stagnating, I do not mean that MySQL has not been improved. Versions 5.5 and 5.6 are substantial milestones in MySQL’s history, but Oracle engineers have been focusing on fixing old-time issues, mostly performance related. In a way, some of the companies in the MySQL ecosystem have been led to follow the same path. These companies have worked on improving and stabilising their products, mainly due to the limits in the investments in disruptive technologies that they can afford. A good example is Tokutek: Tim Callaghan and Co. have done a tremendous job and their storage engine has reached a good and strong maturity, but the prize for the most innovative product in the Tokutek offices today goes to TokuMX, which is related to MongoDB, not MySQL.

5.5 and 5.6 are what professor Christensen calls sustaining technologies. Sustaining technologies, generally speaking, improve the performance of a product – “performance” not only in database terms, but in a more generic term, applicable to any product. What we are seeking here is disruptive technologies, and they are difficult to find. Disruptive technologies look at a problem with a new perspective, with a new way to fix issues, they introduce new and more innovative features. At the beginning, they may provide worse performances, but in the long term they simply provide “a better solution.”

The proof to this point – i.e. that 5.5 and 5.6 are sustaining technologies – is the recent announcement from Facebook, Google, Twitter and LinkedIn about WebScaleSQL. WebScaleSQL is today an even better MySQL, with more tests, better code, and better optimisation. In WebScaleSQL there are some interesting features that are the seed for more disruptive technology, such as the read-ahead mechanism and more.

Disruptive technologies in the MySQL ecosystem

Are there disruptive technologies in the MySQL ecosystem? Indeed there are. MySQL 5.6 was announced as a GA on 5th of February 2013. That is a sustaining technology. The following day, the MariaDB Foundation announced the second alpha version of MariaDB 10. Version 10.0.1 was a very incomplete and disruptive technology. It took almost 14 months to create something innovative and to incorporate components that other companies have been trying to develop, sometimes for many years.

Why do I call MariaDB 10 a disruptive technology? Here are the most important points, in case you have missed some:

  • For the first time, we have removed (or at least significantly reduced) the slave replication lag. Parallel replication is one of the most important and long awaited features which opens the doors to a more extensive use of master/slave replication for read scalability. See Kristian Nielsen’s blog on the subject.
  • We have introduced multi-source replication and more sophisticated replication topologies. We can now consolidate data coming from multiple masters into a slave node. It means, for example, that we can manage to connect multiple departments and data centers together. This is a significant contribution provided by Lixun Peng and Taobao.
  • We are sensibly reducing the downtime for online databases. The combination of MariaDB’s Global Transaction ID(which is fundamentally different from MySQL 5.6 GTID) and Galera Replication from Codership takes the availability of MySQL databases into a brand new territory, at a lower cost and with a significant reduction in the complexity of the infrastructure. Please note that MariaDB 10 Galera Cluster is not GA yet.
  • We have a significantly improved built-in sharding solution for MariaDB. 6 years ago, a young man in a Spiderman suit presented for the first time a new storage engine, called Spider. A few days ago, that same man, Kentoku Shiba, released Spider 3.2. The engine is now included in MariaDB 10 and is an extremely promising technology that has been substantially improved, and now includes also high availability, in-shard scalability and distributed transactions. Spider 3.2 is not GA yet, but previous versions are already used in production in medium size sites around the globe and we expect a significant boost in its usage in the future.
  • We have reorganised the way we integrate MariaDB/MySQL with other data sources. On one side, we have improved the way storage engines can interact with the core of MariaDB: for example, tables can be created or altered using engine-specific attributes that simplify the use of new features in the engine. One example of new engines developed by MariaDB is Cassandra, which allows the use of MariaDB and the MySQL clients as an interface to read and write data from/to a Cassandra cluster. We have also introduced a brand new storage engine called Connect. The Connect engine, as the name says, connects external sources, such as external DB tables (also from other DBMS), files, folders, logs, etc., in reading and writing. Users can now use applications built on MySQL and MariaDB to directly access heterogenous environments, without moving data back and forth among databases. They can also create joins and view results that are a combination of data coming from multiple sources. This is the very first version of Connect, so you should use it carefully.
This is only a subset of the new features available in MariaDB. They are paired with the robustness of consolidated engines and the core of MariaDB, which can be used for the application that made MySQL and MariaDB so popular.
From this analysis, there are important takeaways. Small companies in our business may struggle to innovate, due to the amount of investments needed. To some extent, the MariaDB Foundation has become the hub for such a collaboration and I really hope that MariaDB and WebScaleSQL will one day be able to join their efforts and innovate together. Disruptive technologies usually take a longer time and bigger investments to reach a reasonable maturity: a close collaboration can speed up this process, make features available sooner and allow MySQL and MariaDB to compete in innovation with NoSQL initiatives and products.

From the innovator’s to the DBA’s dilemma – innovation vs. compatibility?

A by-product of innovation is the difficulty of keeping the product compatible with the past. No matter how hard one works to make everything completely compatible with previous versions and features, disruption in innovation may also mean disruption in the compatibility with the mainstream technology. This is the reason why, above all, MariaDB is now a fork of MySQL and no longer a branch. Although the codebase has its strong roots in MySQL 5.5, it is inevitable that MariaDB and MySQL will diverge more and more in the future.
That said, in our industry backward compatibility is paramount and we break it only when it becomes an insurmountable obstacle to innovation – insurmountable here means that the effort and complexity added to keep a new version compatible with previous versions, is not justified. The good news is that MariaDB 10 does not present significant backward compatibility with MySQL 5.5, but forward compatibility with MySQL 5.6 and future versions. Once you fork a product, you inevitably create a new product. The other good news is that modularity makes things simple. For example, if XtraDB and InnoDB evolve within the boundaries of the current storage engine API, it will be relatively simple for MariaDB 10 to adopt the new engines.
How much do MySQL 5.6 and MariaDB 10 diverge today? Can we still say that MariaDB is a drop-in replacement for MySQL? This question attracts different answers, depending on who’s answering it. The truth is, nothing, not even minor versions of the same product, can claim to be fully compatible. I remember a customer who migrated from MySQL 5.5. to MariaDB 5.5. When one of our best consultants tested the application with MariaDB, he found some issues. Ironically, the issues were caused by a patch that fixed an old bug in MySQL 5.5: the application was designed to live with that bug and the fix in MariaDB caused an unexpected behaviour.
Back to more details, we can reasonably say that MySQL 5.6 and MariaDB 10 are “application compatible”. It means that from a typical application perspective, where developers and end-users use standard DML and DDL commands, there are no significant differences between the two products. What might differ is the DBA’s view. Replication is an example of this incompatibility. MySQL 5.6 and MariaDB 10 have a different GTID: MySQL has a transaction ID based on the server UUID and a transaction sequence, whilst MariaDB has a combination of domain, server and event ID that identifies a set of transactions in an event group. The incompatibility is not only in the functions and in the instruments available to the DBA, but most importantly it is in the infrastructure and in the tools that the DBA can use. The same applies to the automatic failover in Replication and the use of multi-source replication, which again provides an incompatible format for the commands used to administer replication.
Unfortunately, there is not a full list of the differences between MySQL 5.6 and MariaDB 10, and the list would have little meaning, considering that a new feature adds lots of small changes in the database. The list below is a small set of what you can expect to be different. More details are available here and here.
  • MySQL 5.6 has Unicode support in the mysql command line utility
  • At the moment, the memcached protocol is not available in MariaDB 10
  • In MariaDB 10 the default storage engine for temporary tables cannot be specified
  • In MySQL 5.6 it is possible to exchange partitions between tables
  • Binlog variables and settings are different
  • The block encryption mode for block-mode AES algorithms are handled by a parameter in Mysql 5.6
  • MySQL 5.6 can specify where to write the core file if the server crashes
  • MariaDB 10 provides a full set of parameters for the Aria engine
  • MariaDB 10 provides more parameters to handle the connection pool
  • MariaDB 10 provides more parameters to change the behaviour of the optimizer
  • MariaDB 10 offers XtraDB parameters that are not available in MySQL 5.6
  • MariaDB 10 identifies the transactional state of a session
  • MariaDB 10 provides dynamic and virtual columns functions

 

Time for conclusions

This post is way longer than I expected, I hope it has been interesting for some of my usual 25 readers… I dare to make one final statement: the MySQL, MariaDB and the ecosystem offer today a choice of sustaining and disruptive technologies. Many databases only offer one option or the other and very few offer both. The beauty of this story is yet to come. Let’s find out more this week at Percona Live.

 

A Close Encounter with MaxScale

MaxScale is the new proxy server from the SkySQL/MariaDB team. It provides Connection Load Balancing (CLB) and Statement Load Balancing (SLB) out of the box. This post is a [relatively] quick “how to” install, configure and test SLB with the read/write splitting module.

Step 1 – Server preparation

If you do not have many HW resources, you may run everything on a single Linux instance, but the best way to test MaxScale is to use at least 4 servers: one for MaxScale and for the client apps, one as Master and two as slaves – so, 4 in total. In this post I am going a bit further, I will use 5 servers:
Max 0 – For client apps (192.168.56.20)
Max 1 – The master server (192.168.56.21)
Max 2 – The first slave (192.168.56.22)
Max 3 – The second slave (192.168.56.23)
Max 4 – The third slave (192.168.56.24)
Max 6 – The MaxScale server (192.168.56.26)

In order to do proper tests (i.e. you may want to test performance and read scalability), bare metal servers are recommended, but just for a taste of how you can use MaxScale, 5 VMs or instances on your favourite cloud provider will suffice.
Because of my nomadic behaviour, I have prepared everything on my laptop using VirtualBox. I have created one VM with CentOS 6.5, using 512MB RAM and I have now 5 copies of the same machine. The IP addresses assigned in this test are listed above within brackets.

If you use VMs, make sure they are in the same network and they can connect to each other (in VirtualBox, if you use a host-only network, you should set Allow All in the Promiscous Mode setting. If you use a host-only adapter, you should also have a NAT or a Bridge adapter to connect to the Internet and download the necessary packages.

Adapter 1 is used to connect to the internet and download the packages needed for the test:

Adapter 2 is used to connect to the other VMs with a fixed IP address:

Step 2 – Installation

First of all, you must install your favourite version of MySQL/MariaDB/Percona on your servers. In this test, I used MariaDB 10.0.7, downloaded from a yum repo. You can refer to https://downloads.mariadb.org/mariadb/repositories to find out the right repository and download the package, or simply go to this page for the other downloads.
In this test, the repo file is:
[root@Sky0 ~]# cat /etc/yum.repos.d/MariaDB.repo
# MariaDB 10.0 CentOS repository list - created 2013-12-31 11:11 UTC
# http://mariadb.org/mariadb/repositories/
[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.0/centos6-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1
The software installed on the 5 VMs is listed here:

On Max 0:

  • MariaDB Client
[root@Max0 ~]# yum install MariaDB-client

On Max 1 to 4:

  • MariaDB Client
  • MariaDB Server
[root@Max1 ~]# yum install MariaDB-server MariaDB-client

On Max 6:

  • MaxScale
[root@Max6 ~]# curl https://downloads.skysql.com/files/SkySQL/MaxScale/maxscale.preview.0.4.tar.gz > maxscale.preview.0.4.tar.gz
[root@Max6 ~]# cd /usr/local/
[root@Max6 local]# mkdir skysql
[root@Max6 local]# cd skysql
[root@Max6 skysql]# tar xzvf ~/maxscale.preview.0.4.tar.gz
maxscale/
maxscale/Documentation
/maxscale/Documentation/MaxScale Configuration And Usage Scenarios.pdf
...
maxscale/SETUP
[root@Max6 skysql]#
If you prefer to compile your own version of MaxScale, you can get the source code from GitHub.

Step 3 – Configuration

Max 1 to 4 must run MariaDB Replication, with Max 1 as master and Max 2, 3 & 4 as slaves.
The configuration file should have the server-id and the binary log setup.
[root@Max6 ~]# tail -5 /etc/my.cnf.d/server.cnf 
[mariadb-10.0]
server-id=1   <<<— and 2, 3 & 4 for Max 2, Max 3 and Max 4
log-bin
Start the four servers in the usual way:
[root@Max6 ~]# /etc/init.d/mysql start
Some DB users are needed for Replication and for MaxScale. In this case we have:
  • repluser – for MariaDB Replication
  • maxuser – generic user for MaxScale
  • maxmon – user for the MaxScale Monitoring module
In this test we do not care much about security, we simply create the users as:
 MariaDB [test]> create user repluser identified by 'maxpwd';
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [test]> grant replication slave on *.* to repluser@'%';
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [test]> create user maxuser identified by 'maxpwd';
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [test]> grant all on *.* to maxuser@'%';
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [test]> create user maxmon identified by 'maxpwd';
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [test]> grant replication client on *.* to maxmon@'%';
Query OK, 0 rows affected (0.00 sec)
Now you can set Replication between Max 1 and the other three servers.
On Max 2, 3 and 4 you can execute these two commands:
MariaDB [(none)]> change master to master_host='192.168.56.21',
                  master_port=3306,
                  master_user='repulser',
                  master_password='repluser',
                  master_use_gtid=slave_pos;
Query OK, 0 rows affected (0.01 sec)
 
MariaDB [(none)]> start slave;
 
Query OK, 0 rows affected (0.01 sec)
On Max 6, the most important step is to create the MaxScale configuration file. The file is located by default in $MAXSCALE_HOME/etc, in our test is /usr/local/skysql/maxscale/MaxScale/etc and the file is MaxScale.cnf.
The file has various sections.
MaxScale Section
This is a generic section. for the moment we simply tell MaxScale to use a single thread (for users)
[maxscale] 
threads=1
Monitor Section
This section is used to set the monitoring module. The servers to monitor are named here and they will be configured in other sections.
[MariaDB10 Monitor]
type=monitor
module=mysqlmon
servers=max1,max2,max3,max4
user=maxmon
passwd=maxpwd
SLB Section
This section is used to set the Statement Load Balancer. We will use the readwritesplit module available with MaxScale.
[RW Split Router]
type=service
router=readwritesplit
servers=max1,max2,max3,max4
user=maxuser
passwd=maxpwd
HTTP Section
This section is used to provide a RESTful API and it is still experimental.
[HTTPD Router]
type=service
router=testroute
servers=max1,max2,max3,max4
Debug Section
We will use this section to administer and debug MaxScale. MaxScale is still at early stage and at the moment the debug module is also used as administration module.
[Debug Interface]
type=service
router=debugcli
Listener Sections
We set three listeners, for the SLB module, for the Debug module and for the HTTP module.
[RW Split Listener]
type=listener
service=RW Split Router
protocol=MySQLClient
port=4006
 
[Debug Listener]
type=listener
service=Debug Interface
protocol=telnetd
port=4442
 
[HTTPD Listener]
type=listener
service=HTTPD Router
protocol=HTTPD
port=6444
Servers Sections
Finally, we set the 4 servers:
[max1]
type=server
address=192.168.56.21
port=3306
protocol=MySQLBackend
 
[max2]
type=server
address=192.168.56.22
port=3306
protocol=MySQLBackend
 
[max3]
type=server
address=192.168.56.23
port=3306
protocol=MySQLBackend
 
[max4]
type=server
address=192.168.56.24
port=3306
 
protocol=MySQLBackend
Putting all together, the file looks like this:
#
# Number of server threads
# Valid options are:
#      threads=
[maxscale]
threads=1

#
# Define a monitor that can be used to determine the state and role of
# the servers.
#
# Valid options are:
#
#      module=
#      servers=,,...
#      user =
#                          slave client privileges>
#      passwd=
[MariaDB10 Monitor]
type=monitor
module=mysqlmon
servers=max1,max2,max3,max4
user=maxmon
passwd=maxpwd

#
# A series of service definition
#
# Valid options are:
#
#      router=
#      servers=,,...
#      user=
#      passwd=
#
# Valid router modules currently are:
#      readwritesplit, readconnroute and debugcli
[RW Split Router]
type=service
router=readwritesplit
servers=max1,max2,max3,max4
user=maxuser
passwd=maxpwd

[HTTPD Router]
type=service
router=testroute
servers=max1,max2,max3,max4

[Debug Interface]
type=service
router=debugcli
 
#
# Listener definitions for the services
#
# Valid options are:
#
#      service=
#      protocol=
#      port=
[RW Split Listener]
type=listener
service=RW Split Router
protocol=MySQLClient
port=4006
 
[Debug Listener]
type=listener
service=Debug Interface
protocol=telnetd
port=4442
 
[HTTPD Listener]
type=listener
service=HTTPD Router
protocol=HTTPD
port=6444
 
# Definition of the servers
[max1]
type=server
address=192.168.56.21
port=3306
protocol=MySQLBackend
 
[max2]
type=server
address=192.168.56.22
port=3306
protocol=MySQLBackend
 
[max3]
type=server
address=192.168.56.23
port=3306
protocol=MySQLBackend
 
[max4]
type=server
address=192.168.56.24
port=3306
protocol=MySQLBackend

Step 4 – Running MaxScale

You can start MaxScale in many ways, but I would recommend to create a script like this:
[root@Max6 ~]# cat bin/maxscale
MAXSCALE_HOME=/usr/local/skysql/maxscale/MaxScale
LD_LIBRARY_PATH=/usr/local/skysql/maxscale/lib
/usr/local/skysql/maxscale/bin/maxscale
For the administration commands, the script would simply be:
[root@Max6 ~]# cat bin/maxscale_admin
telnet localhost 4442
Now we are ready to test MaxScale.
[root@Max6 ~]# maxscale
 
SkySQL MaxScale     Sun Jan  5 21:10:33 2014
------------------------------------------------------
Info :  MaxScale will be run in a daemon process.
        See the log from the following log files:
Error log     :     /usr/local/skysql/maxscale/MaxScale/log/skygw_err1.log
Message log   :     /usr/local/skysql/maxscale/MaxScale/log/skygw_msg1.log
Trace log     :     /usr/local/skysql/maxscale/MaxScale/log/skygw_trace1.log
Debug log     :     /usr/local/skysql/maxscale/MaxScale/log/skygw_debug1.log 

Listening MySQL connections at 0.0.0.0:4006
Listening http connections at 0.0.0.0:6444 
Listening telnet connections at 0.0.0.0:4442
You can execute this command on Max 0:
[root@Max0 ~]# mysql -u maxuser -h192.168.56.26 -P4006 -pmaxpwd
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 1608
Server version: 5.5.22-SKYSQL-0.1.0 MariaDB Server
 
Copyright (c) 2000, 2013, Oracle, Monty Program Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MySQL [(none)]> \s
--------------
mysql  Ver 15.1 Distrib 10.0.7-MariaDB, for Linux (x86_64) using readline 5.1
 
Connection id:         1608
Current database:
Current user:          maxuser@skycluster6
SSL:                   Not in use
Current pager:         stdout
Using outfile:         ''
Using delimiter:       ;
Server:                MySQL
Server version:        5.5.22-SKYSQL-0.1.0
MariaDB Server
Protocol version:      10
Connection:            192.168.56.26 via TCP/IP
Server characterset:   latin1
Db     characterset:   latin1
Client characterset:   latin1
Conn.  characterset:   latin1
TCP port:              4006
Uptime:                2 hours 55 min 2 sec
 
Threads: 6  Questions: 150  Slow queries: 0  Opens: 1  Flush tables: 1  Open tables: 63  Queries per second avg: 0.014
--------------
 
MySQL [(none)]>
The server and server versions are old names that do not provide any meaning and they will fixed in the next release – but they refer to MaxScale.
In order to check if MaxScale is working, check the process list on each MariaDB Server:
Max 1:
MariaDB [test]> show processlist;
+----+---------+-------------------+------+-------------+------+-----------------------------------------------------------------------+------------------+----------+
| Id | User    | Host              | db   | Command     | Time | State                                                                 | Info             | Progress |
+----+---------+-------------------+------+-------------+------+-----------------------------------------------------------------------+------------------+----------+
|  3 | root    | localhost         | test | Query       |    0 | init                                                                  | show processlist |    0.000 |
|  4 | root    | skycluster2:34549 | NULL | Binlog Dump | 3066 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL             |    0.000 |
|  5 | root    | skycluster4:59562 | NULL | Binlog Dump | 2941 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL             |    0.000 |
|  6 | root    | skycluster3:48031 | NULL | Binlog Dump | 2936 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL             |    0.000 |
| 11 | maxmon  | skycluster6:32784 | NULL | Sleep       |    7 |                                                                       | NULL             |    0.000 |
| 13 | maxuser | skycluster6:32791 | NULL | Sleep       |   45 |                                                                       | NULL             |    0.000 |
+----+---------+-------------------+------+-------------+------+-----------------------------------------------------------------------+------------------+----------+
6 rows in set (0.00 sec)
Max 2:
MariaDB [test]> show processlist;
+----+-------------+-------------------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
| Id | User        | Host              | db   | Command | Time | State                                                                       | Info             | Progress |
+----+-------------+-------------------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
|  4 | system user |                   | NULL | Connect | 3143 | Waiting for master to send event                                            | NULL             |    0.000 |
|  5 | system user |                   | NULL | Connect | -952 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL             |    0.000 |
|  6 | root        | localhost         | test | Query   |    0 | init                                                                        | show processlist |    0.000 |
|  9 | maxmon      | skycluster6:54821 | NULL | Sleep   |    5 |                                                                             | NULL             |    0.000 |
+----+-------------+-------------------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
4 rows in set (0.00 sec)
Max 3:
MariaDB [(none)]> show processlist;
+----+-------------+-------------------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
| Id | User        | Host              | db   | Command | Time | State                                                                       | Info             | Progress |
+----+-------------+-------------------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
|  3 | root        | localhost         | NULL | Query   |    0 | init                                                                        | show processlist |    0.000 |
|  4 | system user |                   | NULL | Connect | 3058 | Waiting for master to send event                                            | NULL             |    0.000 |
|  5 | system user |                   | NULL | Connect | -905 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL             |    0.000 |
|  8 | maxmon      | skycluster6:57298 | NULL | Sleep   |    9 |                                                                             | NULL             |    0.000 |
|  9 | maxuser     | skycluster6:57302 | NULL | Sleep   |  167 |                                                                             | NULL             |    0.000 |
+----+-------------+-------------------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
5 rows in set (0.00 sec)
Max 4:
MariaDB [test]> show processlist;
+----+-------------+-------------------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
| Id | User        | Host              | db   | Command | Time | State                                                                       | Info             | Progress |
+----+-------------+-------------------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
|  3 | root        | localhost         | test | Query   |    0 | init                                                                        | show processlist |    0.000 |
|  4 | system user |                   | NULL | Connect | 3112 | Waiting for master to send event                                            | NULL             |    0.000 |
|  5 | system user |                   | NULL | Connect |  655 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL             |    0.000 |
| 13 | maxmon      | skycluster6:32771 | NULL | Sleep   |    8 |                                                                             | NULL             |    0.000 |
+----+-------------+-------------------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
4 rows in set (0.00 sec)
As you can see from the process list, the connection from Max 0 has been set to Max 1 for R/W and to Max 3 for R/O operations.

Step 5 Testing MaxScale

A super-simple test would be to check the number of queries and connections on each server before, during and after a mysqlslap session.
First, let’s prepare a basic command like:
[root@Sky1 ~]# cat mon.sql
select * from information_schema.global_status
 where variable_name in (
 'COM_SELECT',
 'COM_INSERT',
 'THREADS_CONNECTED’ );
…and…
[root@Sky1 ~]# cat mon
watch --interval=1 'mysql < mon.sql'
So you can run ./mon on each MariaDB/MySQL server, you should see a page like this, refreshing every second:
Every 1.0s: mysql < mon.sql     Tue Jan  7 23:11:17 2014 
VARIABLE_NAME   VARIABLE_VALUE
COM_INSERT               54856
COM_SELECT               85798
THREADS_CONNECTED            2
The threads on the master server are, as you have already seen in the process list, the replication threads and the MaxScale monitor. At least one of the thread on the slave servers is the MaxScale Monitor. On these servers, you will see the value of the COM_SELECT variable stepping +2 every second because the MaxScale monitor checks the status of the database every half second.
Now, if we run mysqlslap with 128 concurrent connections from Max 0:
[root@Sky0 ~]# mysqlslap -a -umaxuser -h192.168.56.26 -P4006 -pmaxpwd --create-schema=slap1 -c128
Benchmark
     Average number of seconds to run all queries: 6.792 seconds
     Minimum number of seconds to run all queries: 6.792 seconds
     Maximum number of seconds to run all queries: 6.792 seconds
     Number of clients running queries: 128
     Average number of queries per client: 0
Running the test on my laptop, the time is pretty irrelevant. What is relevant though is the increment that you can notice with the watch command on the DB servers:
The number of connected threads on Max 1 (the master node) should go a bit beyond 128 connections;
  • The number of connected threads on Max 2, 3 and 4 (the slave nodes) should be a bit more than 43;
  • On Max 1, the number of COM_INSERT increases significantly, COM_SELECT keeps increasing at the pace of 1 every 0.5 second;
  • On Max 2, 3 and 4, both the number of COM_INSERT and COM_SELECT increases. COM_INSERT increases because the replication thread is adding writing data. The number of COM_SELECT will increase equally on all the slaves because MaxScale is balancing the read queries on all the available slaves.
More tests and in-depth to come. In the meantime, please help us by testing maxscale, provide feedback, comments, suggestions, and submit bugs.
The MaxScale project is on GitHub, here: https://github.com/skysql/MaxScale