What’s the connection between databases and fruit flies?
Some of you may be familiar with the bestseller in business literature The Innovator’s Dilemma: When New Technologies Cause Great Firms to Fail, by Harvard professor Clayton Christensen. In his book, Prof. Christensen compared disk drives to fruit flies. The comparison relates to the rapid changes that disrupted the disk drive industry for decades. That disruption is compared to the rapid changes that take place to fruit flies that live only for few hours, and for this reason researchers can study and analyse their behaviour.
In the software business, you can replace disk drives with databases. Obviously, databases live longer than fruit flies, but it is an industry that sometimes shows schizophrenic changes. Whilst it is true that relational databases have being dominated the scene for decades, readers with grey hair and reading glasses have certainly experienced tens, if not hundreds of innovative changes in databases since the 70’s. Hierarchical, relational, object oriented, object relational, framework dependent, multi-dimensional and many other types of databases have been introduced to satisfy the need for faster, wider, more sophisticated data management. The 21st century, with the advent of web and mobile applications, has been so far the playground for even more innovative products, most of them under the umbrella of two overused terms, Big Data and NoSQL. Make no mistake, there is even more to come: the Internet of Things will sparkle a new wave of innovation in the industry.
How does this long premise apply to MySQL and MariaDB? The truth is, MySQL has been stagnating for almost 6 years. In 6 years, we have been witnesses to the explosion of a huge amount of data that has to be stored, retrieved and managed. We have also seen a significant change in the type of data: accounting, retail and financial information were predominant in the 80’s and 90’s whilst today it is all about multimedia content – videos, photos, audio clips, but also documents, books, executable programs, text messages, logs and tracing information – today’s “typical” data.
When I say that MySQL has been stagnating, I do not mean that MySQL has not been improved. Versions 5.5 and 5.6 are substantial milestones in MySQL’s history, but Oracle engineers have been focusing on fixing old-time issues, mostly performance related. In a way, some of the companies in the MySQL ecosystem have been led to follow the same path. These companies have worked on improving and stabilising their products, mainly due to the limits in the investments in disruptive technologies that they can afford. A good example is Tokutek: Tim Callaghan and Co. have done a tremendous job and their storage engine has reached a good and strong maturity, but the prize for the most innovative product in the Tokutek offices today goes to TokuMX, which is related to MongoDB, not MySQL.
5.5 and 5.6 are what professor Christensen calls sustaining technologies. Sustaining technologies, generally speaking, improve the performance of a product – “performance” not only in database terms, but in a more generic term, applicable to any product. What we are seeking here is disruptive technologies, and they are difficult to find. Disruptive technologies look at a problem with a new perspective, with a new way to fix issues, they introduce new and more innovative features. At the beginning, they may provide worse performances, but in the long term they simply provide “a better solution.”
The proof to this point – i.e. that 5.5 and 5.6 are sustaining technologies – is the recent announcement from Facebook, Google, Twitter and LinkedIn about WebScaleSQL. WebScaleSQL is today an even better MySQL, with more tests, better code, and better optimisation. In WebScaleSQL there are some interesting features that are the seed for more disruptive technology, such as the read-ahead mechanism and more.
Disruptive technologies in the MySQL ecosystem
Are there disruptive technologies in the MySQL ecosystem? Indeed there are. MySQL 5.6 was announced as a GA on 5th of February 2013. That is a sustaining technology. The following day, the MariaDB Foundation announced the second alpha version of MariaDB 10. Version 10.0.1 was a very incomplete and disruptive technology. It took almost 14 months to create something innovative and to incorporate components that other companies have been trying to develop, sometimes for many years.
Why do I call MariaDB 10 a disruptive technology? Here are the most important points, in case you have missed some:
- For the first time, we have removed (or at least significantly reduced) the slave replication lag. Parallel replication is one of the most important and long awaited features which opens the doors to a more extensive use of master/slave replication for read scalability. See Kristian Nielsen’s blog on the subject.
- We have introduced multi-source replication and more sophisticated replication topologies. We can now consolidate data coming from multiple masters into a slave node. It means, for example, that we can manage to connect multiple departments and data centers together. This is a significant contribution provided by Lixun Peng and Taobao.
- We are sensibly reducing the downtime for online databases. The combination of MariaDB’s Global Transaction ID (which is fundamentally different from MySQL 5.6 GTID) and Galera Replication from Codership takes the availability of MySQL databases into a brand new territory, at a lower cost and with a significant reduction in the complexity of the infrastructure. Please note that MariaDB 10 Galera Cluster is not GA yet.
- We have a significantly improved built-in sharding solution for MariaDB. 6 years ago, a young man in a Spiderman suit presented for the first time a new storage engine, called Spider. A few days ago, that same man, Kentoku Shiba, released Spider 3.2. The engine is now included in MariaDB 10 and is an extremely promising technology that has been substantially improved, and now includes also high availability, in-shard scalability and distributed transactions. Spider 3.2 is not GA yet, but previous versions are already used in production in medium size sites around the globe and we expect a significant boost in its usage in the future.
- We have reorganised the way we integrate MariaDB/MySQL with other data sources. On one side, we have improved the way storage engines can interact with the core of MariaDB: for example, tables can be created or altered using engine-specific attributes that simplify the use of new features in the engine. One example of new engines developed by MariaDB is Cassandra, which allows the use of MariaDB and the MySQL clients as an interface to read and write data from/to a Cassandra cluster. We have also introduced a brand new storage engine called Connect. The Connect engine, as the name says, connects external sources, such as external DB tables (also from other DBMS), files, folders, logs, etc., in reading and writing. Users can now use applications built on MySQL and MariaDB to directly access heterogenous environments, without moving data back and forth among databases. They can also create joins and view results that are a combination of data coming from multiple sources. This is the very first version of Connect, so you should use it carefully.
This is only a subset of the new features available in MariaDB. They are paired with the robustness of consolidated engines and the core of MariaDB, which can be used for the application that made MySQL and MariaDB so popular.
From this analysis, there are important takeaways. Small companies in our business may struggle to innovate, due to the amount of investments needed. To some extent, the MariaDB Foundation has become the hub for such a collaboration and I really hope that MariaDB and WebScaleSQL will one day be able to join their efforts and innovate together. Disruptive technologies usually take a longer time and bigger investments to reach a reasonable maturity: a close collaboration can speed up this process, make features available sooner and allow MySQL and MariaDB to compete in innovation with NoSQL initiatives and products.
From the innovator’s to the DBA’s dilemma – innovation vs. compatibility?
A by-product of innovation is the difficulty of keeping the product compatible with the past. No matter how hard one works to make everything completely compatible with previous versions and features, disruption in innovation may also mean disruption in the compatibility with the mainstream technology. This is the reason why, above all, MariaDB is now a fork of MySQL and no longer a branch. Although the codebase has its strong roots in MySQL 5.5, it is inevitable that MariaDB and MySQL will diverge more and more in the future.
That said, in our industry backward compatibility is paramount and we break it only when it becomes an insurmountable obstacle to innovation – insurmountable here means that the effort and complexity added to keep a new version compatible with previous versions, is not justified. The good news is that MariaDB 10 does not present significant backward compatibility with MySQL 5.5, but forward compatibility with MySQL 5.6 and future versions. Once you fork a product, you inevitably create a new product. The other good news is that modularity makes things simple. For example, if XtraDB and InnoDB evolve within the boundaries of the current storage engine API, it will be relatively simple for MariaDB 10 to adopt the new engines.
How much do MySQL 5.6 and MariaDB 10 diverge today? Can we still say that MariaDB is a drop-in replacement for MySQL? This question attracts different answers, depending on who’s answering it. The truth is, nothing, not even minor versions of the same product, can claim to be fully compatible. I remember a customer who migrated from MySQL 5.5. to MariaDB 5.5. When one of our best consultants tested the application with MariaDB, he found some issues. Ironically, the issues were caused by a patch that fixed an old bug in MySQL 5.5: the application was designed to live with that bug and the fix in MariaDB caused an unexpected behaviour.
Back to more details, we can reasonably say that MySQL 5.6 and MariaDB 10 are “application compatible”. It means that from a typical application perspective, where developers and end-users use standard DML and DDL commands, there are no significant differences between the two products. What might differ is the DBA’s view. Replication is an example of this incompatibility. MySQL 5.6 and MariaDB 10 have a different GTID: MySQL has a transaction ID based on the server UUID and a transaction sequence, whilst MariaDB has a combination of domain, server and event ID that identifies a set of transactions in an event group. The incompatibility is not only in the functions and in the instruments available to the DBA, but most importantly it is in the infrastructure and in the tools that the DBA can use. The same applies to the automatic failover in Replication and the use of multi-source replication, which again provides an incompatible format for the commands used to administer replication.
Unfortunately, there is not a full list of the differences between MySQL 5.6 and MariaDB 10, and the list would have little meaning, considering that a new feature adds lots of small changes in the database. The list below is a small set of what you can expect to be different. More details are available here and here.
- MySQL 5.6 has Unicode support in the mysql command line utility
- At the moment, the memcached protocol is not available in MariaDB 10
- In MariaDB 10 the default storage engine for temporary tables cannot be specified
- In MySQL 5.6 it is possible to exchange partitions between tables
- Binlog variables and settings are different
- The block encryption mode for block-mode AES algorithms are handled by a parameter in Mysql 5.6
- MySQL 5.6 can specify where to write the core file if the server crashes
- MariaDB 10 provides a full set of parameters for the Aria engine
- MariaDB 10 provides more parameters to handle the connection pool
- MariaDB 10 provides more parameters to change the behaviour of the optimizer
- MariaDB 10 offers XtraDB parameters that are not available in MySQL 5.6
- MariaDB 10 identifies the transactional state of a session
- MariaDB 10 provides dynamic and virtual columns functions
Time for conclusions
This post is way longer than I expected, I hope it has been interesting for some of my usual 25 readers… I dare to make one final statement: the MySQL, MariaDB and the ecosystem offer today a choice of sustaining and disruptive technologies. Many databases only offer one option or the other and very few offer both. The beauty of this story is yet to come. Let’s find out more this week at Percona Live.