Percona Live Europe is now over, MySQL is not

Percona Live Europe is now more than a week behind us. I left Amsterdam with a positive thought: it has been the best European event for MySQL so far. Maybe the reason is that I saw the attendance increasing, or maybe it was the quality of the talks, or because I heard others making the same comment, and I also saw a reinvigorated MySQL ecosystem.
There are three main aspects I want to highlight.

1. MySQL 5.7 and the strong presence of the Oracle/MySQL team

There have been good talks and keynotes on MySQL 5.7. It is a sign of the strong commitment of Oracle towards MySQL. I think there is an even more important point. The most interesting features in 5.7 and the projects still in MySQL Labs derive or are in some way inspired by features available from other vendors. Some examples:

  • The JSON datatype from MySQL and MariaDB – two fairly different approaches, but definitely an interesting addition
  • Improvements in the optimizer from MySQL and MariaDB. There is a pretty long list of differences, this slide deck can help understand them a bit better…
  • Improvement for semi-sync replication from MySQL and WebScaleSQL
  • Automatic failover with replication from MySQL and MHA
  • Multi-source replication from MySQL and MariaDB 10
  • Group replication in MySQL and MariaDB 10 – Here things differ quite a lot, but the concept is similar.
  • MySQL router in MySQL and MaxScale – Again, a different approach but similar concepts to achieve the same results

My intent here is not to compare the features – I am simply pointing out that the competition among projects in the MySQL ecosystem is at least inspirational and can offer great advantages to the end user. Of course the other side of the coin is the creation of almost identical features, and the addition of more confusion and incompatibilities among the distributions.

2. The Pluggable Storage Engine Architecture is alive and kicking

Oracle’s commitment to improving InnoDB has been great so far, and hopefully InnoDB will get even better in the future. That said, the Pluggable Storage Engine Architecture was a unique feature for a long time. There have been two recent additions to the list of storage engines that have been around for a long time. Today TokuDB, Infobright, InfiniDB, and ScaleDB share the advantage of being pluggable to MySQL with Deep and RocksDB. RocksDB is also pluggable to MongoDB, and even more important, it has been designed with a specific use case in mind.

3. Great support from the users

The three aspects have similar weight in measuring the health of MySQL, but this is my favourite, because it demonstrates how important MySQL is for some of the most innovative companies on the planet. Despite Kristian Koehntopp’s great keynote, showing us how boring the technology is at Booking.com, nobody really thought it was true. Using a stable and mature product like MySQL is not boring, it is wise. But this was not the only presentation that we enjoyed from the end users. Many showed a great use of MySQL, especially compared to the levels of scalability and performance that NoSQL databases (these two combined aspects being the number 1 reason for using a NoSQL DB) struggle to produce with certain workloads.
I am looking forward to seeing the next episode, at Percona Live 2016 in Santa Clara.

This time it is real…

A few months ago I updated my profile on LinkedIn, and adjusted my position as CTO and founder of Athoa Ltd, a British company currently active in translation services and events that in the past hosted a couple of interesting open source projects. I simply forgot to disable the email notification to my connections, set by default, and in 2-3 hours I received tens of messages from friends and ex-colleagues who were curious to hear about my new adventure.

Today, I changed my profile on LinkedIn again and have left the email notification set on purpose.

As of today, I join the team at ScaleDB. My role is to define the product and the strategy for the company, working closely with CEO Tom Arthur, CTO Moshe Shadmon, CMO Mike Hogan and the rest of the team.

Leaving Canonical

The last nine months at Canonical have been an outstanding and crazily intense journey. I learned as I never learned before about systems and network infrastructures, and I met an amazing team of core engineers. It has been a unique experience, one of those that only come along once in a lifetime – I really mean it – and I will never forget it.

The decision to leave Canonical came after a lot of thinking and many sleepless nights. I met so many great people who, in many ways, are making history in IT. In my team under Dan Poler, I worked with experienced Cloud and Solutions Architects who can analyze problems, discuss architectures and suggest solutions from the high level view, down to the kernel of the operating system and even to the silicon of systems and devices. Chris Kenyon’s and John Zannos’s teams are called “Sales”, but they are really advisors for a growing ecosystem of providers and adopters of Ubuntu and OpenStack technologies.

I have been inspired by the dedication and leadership of Canonical CEO Jane Silber. Jane has the difficult job of leading a company that is moving at lightspeed in many different directions, so that the technology that powers clouds, networks, end users and small devices can share the same kernel and will eventually converge. Jane is in my opinion the leading force, making Canonical blossom like a plum tree in mid-winter, when the rest of nature still sleeps under the snow.

My greatest experience at Canonical has been working with Mark Shuttleworth. Mark is an inspiration not only for the people of Canonical or for the users of Ubuntu, but for us all. Mark’s energy and passion are second only to his great vision for the future. I recommend everybody to follow Mark’s blog and watch or attend his talks. His attention to detail and search for perfection never shadows the core message and understanding of the big picture; for this reason, both experienced listeners and newbies will have takeaways from his talks.

Back in June last year, I decided to join Canonical because of Mark’s vision. His ideas were in sync with what I wanted to bring to SkySQL/MariaDB. At Canonical, I could see this vision materialize in the direction the products were going, only on a larger scale. This experience has reinforced in me the belief that we have an amazing opportunity right in front of us. The world is changing dramatically and at a speed that is incomparable with the past, even when compared with the first 10 years of the new millennium. We must think out of the box and reconsider the models that companies have used so far to sustain their business, since some of them are already anachronistic and create artificial barriers that will eventually collapse.

This experience at Canonical will stay with me forever and I hope to make a good use of what I have learned so far and all that I will learn in the future from Mark.

Joining ScaleDB

The last Percona Live was a great event. It was great to see so many friends and ex-colleagues again, now working at different companies but gathering together once a year as in a school reunion. Percona has now become a mature company, but more importantly, it has reached its maturity growing organically. The results are outstanding and the new course to be a global player in the world of databases looks even more promising.

The list of people and companies I would like to mention is simply too long and it would be a subject for a post per se. I found the MySQL world more active than ever. In this Percona Live I found the perfect balance between solid and mature technologies that are constantly improving, and new and disruptive technologies that are coming out under the same MySQL roof.

I simply feel that I am part of this world, and it is part of me. I have worked with databases in many different roles all my life, first with Digital/Oracle RDB and Digital/HP Datatrieve, then with IBM/Informix, Oracle, Sybase and SQLServer, and last with MySQL. I am looking at this world with the eyes of someone who has been enriched by new experiences. I simply think I have more to offer to this market than to networks and systems infrastructures. I therefore decided to come back. I also feel I can offer more in designing and defining products than in running services.

ScaleDB seems to me the company where I can express myself and I can help more at this point of my working life. With my previous role as advisor for the company, working on products and strategies just feels natural to me. The position is also compatible with my intention to improve and extend my involvement in the MySQL ecosystem, not only as MariaDB Ambassador, but also and equally advocating for Oracle and Percona products.

I also believe that MySQL should not be an isolated world from the rest of the database market. I already expressed my interest in Hadoop and other DB technologies in the past, and I believe that there should be more integration and sharing of information and experiences among these products.

I’ve known and have been working with Moshe Shadmon, ScaleDB CTO, for many years. Back in 2007, we spent time together discussing the use, advantages and disadvantages of distributed databases. At the time, we were talking about the differences between Oracle RAC, MySQL/NDB and DB/2, their strong and weak points, what needed to be improved. That was the time when ScaleDB as a technology started taking the shape that it has today.

ScaleDB is an amazing technology. It is currently usable as a storage engine with MariaDB 10.0, and it has been developed with the idea of a cluster database from the ground up. As for MySQL in 2005, when the goal was to provide performance, scalability and ease of use in a single product, ScaleDB today provides more performance and greater scalability, without compromising availability and the use of standard SQL/MySQL. The engineering team at ScaleDB has recently worked on an amazing extension of their technology to sustain fast inserts and real-time queries on commodity hardware, at a fraction of the cost of NoSQL alternatives. This addition makes ScaleDB the perfect solution for storing and retrieving time series data, which is the essence of stream analytics and the Internet of Things.

I believe ScaleDB has the incredible potential to become a significant player in the DB world, not only in MySQL. I feel excited and honored to be given the opportunity to work on this new adventure. I will try my hardest to serve the MySQL ecosystem in the best possible way, contributing to its success and improving the collaboration of companies – providers, customers, developers and end users – in MySQL and in the world of databases.

Now hop onto the new ride, the future is already here…

It does not matter if Aurora performs 1x or 10x MySQL: it _is_ a big thing

I spent the last 4 years at SkySQL/MariaDB working on versions of MySQL that could be “suitable for the cloud”. I strongly believed that the world needed a version of MySQL that could work in the cloud even better than its comparable version on bare metal. Users and administrators wanted to benefit from the use of cloud infrastructures and at the same time they wanted to achieve the same performance and overall stability of their installations on bare metal. Unfortunately, ACID-compliant databases in the cloud suffer from the issues that any centrally controlled and strictly persistent system can get when hosted on highly distributed and natively stateless infrastructures.

In this post I am not going to talk about the improvements needed for MySQL in the cloud – I will tackle this topic in a future post. Today I’d like to focus on the business side of RDS and Aurora.

In the last 4 years I had endless discussions over the use of Galera running in AWS on standard EC2 instances. I tried to explain many times that having Galera in such environment was almost pointless, since administrators did not have real control of the infrastructure. The reasons have nothing to do with the quality and the features of Galera, rather with the use of a technology placed in the wrong layer of the *aaS stack. Last but not least, I tried many times to guide the IT managers through the jungle of hidden costs of an installation of Galera (and other clustering technologies) in EC2, working through VPCs, guaranteed IOPs, dedicated instances and AZs etc.

I had interesting meetings with customers and prospects to help them in the analysis of the ROI of a migration and the TCO of an IT service in a public cloud. One example in particular, a media company in North America, was extremely interesting. The head of IT decided that a web service had to be migrated to AWS. The service had predictable web access peaks, mainly during public events – a perfect fit for AWS. When an event is approaching, sysadmins can launch more instances, then they can close them when the event ends. Unfortunately, the same approach cannot be applied to database servers, as their systems need to keep data available at all times. Each new event requires more block storage with higher IOPs and the size and flavour of the DB instances becomes so high spec that the overall cost of running everything in EC2 is higher than the original installation.

Aurora from an end customer perspective

Why is Aurora a big thing? Here are some points to consider:

1. No hidden costs in public clouds

The examples of Galera and the DB servers in AWS that I mentioned are only two of the surprises that IT managers will find in their bills. There is a very good reason why public clouds have (or should have) a DBaaS offering: databases should be part of IaaS. They must make the most out of the knowledge of the bare metal layer, in terms of physical location, computing and storage performance, redundancy and reliability etc. Cloud users must use the database confidently, leaving typical administration tasks such as data backups and replication to automated systems that are part of the infrastructure. Furthermore, end customers want to work with databases that do not suffer resource contention in terms of processing, storage and network – or at least not in a way that is perceivable from an application standpoint. As we select EBS disks with requested IOPs, we must be able to use a database server with requested QPSs – whatever we define as “Query”. The same should happen for private clouds, since technologies, benefits and disadvantages are substantially the same. In AWS, RDS already has these features, but Aurora simply promises a better experience with more performance and reliability. Sadly, not many alternatives are available from other cloud providers.

2. Reduce the churn rate

A consequence of the real or expected hidden costs is a relatively high churn rate that affects many IT projects in AWS. DevOps start with AWS because it is simple and available immediately, but as the business grows, so does the bill, and sometimes the growth is not proportional. Amazon needed to remove the increase in costs for their database as one of the reasons to leave or reduce the use of a public cloud, and Aurora is a significant step in this direction. I expect end customers to be more keen to keep their applications on AWS in the long run.

A strong message to the MySQL Ecosystem

There are lots of presentations and analyses around MySQL and the MySQL flavours, yet none of these analyses looks at the generated revenues from the right perspective. Between 2005 and 2010, MySQL was a hot technology that many considered as a serious alternative to closed source relational databases. In 2014, we see an amazing combination of factors:

  • A vast number of options available as open source technologies in the database market
  • A substantial change in the IT infrastructure, focused on virtualisation and cloud operations
  • A substantial change in the development of applications and in the types of applications, now dominated by a DevOps approach
  • A fracture in the MySQL ecosystem, caused by forks and branches that generated competition but also confusion in the market
  • An increasing demand for databases focused on rich media and big data
  • A relatively stable and consolidated MySQL
  • A good level of knowledge and skills available in the market
    (…and the list goes on…)

All these factors have not only limited the growth in revenues in the MySQL ecosystem, but have basically shrunk them – if you do not consider the revenues coming from DBaaS. Here is a pure speculation: Oracle gets a good chunk of their revenues for MySQL from OEMs (i.e. commercial licenses) and from existing not-only-MySQL Oracle customers. Although Percona works hard in producing a more differentiated software product (and kudos for the work that the Percona software team does in terms of tooling and integration), the company adopted a healthy, but clearly services-focused business model. The MariaDB approach is more similar to Oracle, but without commercial licenses and without a multi-billion$ customer base. Yet, when you review the now 18-month-old keynote from 451 research at Percona Live, you realise that the focus on “Who uses MySQL?” is pretty irrelevant: MySQL is ubiquitous and will be the most used open source database in the upcoming years. The question we should ask is rather, “Who pays for MySQL?”, or even better, “Why should one pay for MySQL?”: a reasonable fee paid for MySQL is the lymph that companies in the MySQL ecosystem need to survive and innovate.

Right now, in the MySQL ecosystem, Amazon is the real winner. Unfortunately, there are no public figures that can prove my point, not from the MySQL vendors, nor from Amazon. DBaaS is at the moment the only way to make users pay for a standard, fully compatible MySQL. In topping up X% of standard EC2 instances, Amazon provides a risk free, ready to use and near-zero administration database – and this is a big thing for DevOps, startup, application-focused teams who need to keep their resource costs down and will never hire super experts in MySQL.

With Aurora, Amazon has significantly raised the bar in the DBaaS competition, where there are no real competitors to Amazon at the moment. Dimitry Kravtchuck wrote in his blog that “MySQL 5.7 is already doing better” than Aurora. I have no doubts that a pure 5.7 on bare metal delivers great performance, but I think there are some aspects to consider. First of all, Aurora target customers do not want to deal with my.cnf parameters – even if we found out that Aurora is in reality nothing more than a smart configurator for MySQL, which can magically adapt mysqld instances on a given workload, it would still be good enough for the end customers. Second and most important point, Aurora is [supposed to be] the combination of standard MySQL (i.e. not an esoteric and innodb-incompatible storage engine) that delivers good performance in AWS – if Amazon found out that they can provide the same cloud features using a new and stable version of MySQL 5.7, I have no doubts they would replace Aurora with a better version of MySQL, probably keeping the same name and ignoring the version number, and even more importantly enjoying the revenues that the new improved version would generate.

The ball is in our court

With Aurora, Amazon is well ahead of any other vendor – public cloud and MySQL-based technology vendors – in providing MySQL as DBaaS. Rest assured that Google, Azure, Rackspace, HPCloud, Joyent and others are not sitting there watching, but in the meantime they are behind Aurora. Some interesting projects that can fill the gap are going on at the moment. Tesora is probably the most active company in this area, focusing on OpenStack as the platform for public and private clouds. Continuent provides a technology that could be adapted for this use too and the recent acquisition from VMware may give a push to some projects in this area. Considering the absence of the traditional MySQL players in this area, on one side this is an opportunity for new players to invest in products that are very likely to generate revenues in the near future. On the other hand, it is a concern that the lack of options for DBaaS will convince more end customers to adopt NoSQL technologies, which are more suitable to work in distributed, cloud infrastructures.

Moving On

I have a difficult task of making this post interesting, helpful and personal at the same time. I think the main goal is to balance these aspects, and I really appreciate your comments and suggestions that I will add here.

For the busy readers who may be put off by the length of this post, here is a very short summary: I spent 4 wonderful years at SkySQL, first as the head of Field Services, then as CTO. I believe it is now time for a change, so I am leaving SkySQL. I am leaving behind a great company and very good friends, but I am not disappearing completely, and I will continue supporting the work I started and the projects I created with the help of such great people.

For many, leaving a company is not easy, and it is extremely difficult if you have contributed to its creation and development since the beginning. Even more difficult is to depart from ideas and projects that you have shaped and designed, together with the people who have contributed to them and who, I am sure, will continue to work on these projects with great success.

The reasons for SkySQL

In the past 4 years, I have been asked many times why we had created SkySQL and how SkySQL is different from other providers, such as Percona and Oracle.
Since the beginning, the first and most important objective for SkySQL was to provide the best products and services around MySQL. In order to achieve this objective, we had created a network of partners and we were working closely with them to support our customers in the best possible way. The value added by SkySQL to the offering was a strong team of consultants and architects who could suggest and implement MySQL solutions, and a stellar Technical Support team who could provide the best possible answers to a large variety of technical and consultative issues that customers might encounter.
Having many options to choose from was certainly good, but it introduced another issue: not all the products could work well together. Customers demanded solutions from a single vendor that could go beyond the “typical” MySQL database + backup + monitor: they wanted to have a set of products that was tested and guaranteed to work together. This was the first motivation for the first big effort at SkySQL in terms of products and tools, when we defined the SkySQL Reference Architecture.

The Reference Architecture was the result of many hours spent in meetings and solitary thinking in my home office during the Christmas holidays in 2010, when the business slowed down and I could dedicate more CPU cycles to the subject. We worked on the project for 4 months and we launched the Architecture as a concept at the MySQL/Percona Conference in 2011. We demonstrated the SkySQL Reference Architecture with a tool that users could access online, in order to automatically generate and activate a fully functional cluster of MySQL replicated servers with MONyog, MySQL Replication, a cluster software with resource agent in AWS. Severalnines had a similar approach for MySQL Cluster and later for MySQL Replication and Galera, but at that time only SkySQL had the full automation and a selection of different engines and uses, from the configuration to the MySQL prompt. Later, Percona introduced a web tool that could provide an optimised configuration file.

The evolution of the Reference Architecture was the SkySQL Data Suite (SDS). The concept was similar, but the main difference was that for the first time we added SkySQL Intellectual Property to MySQL. The suite was packaged with an administration tool that was designed and built by SkySQL. The first target was the Cloud, specifically AWS and OpenStack. The initial idea was to have SDS seamlessly deployed on bare OS, on clouds or in hybrid environments. All the tools have been designed with programmable and user interfaces, in order to satisfy different customers’ needs. An independent presentation of SDS is available here.

In 2013, the company merged with Monty Program, and we suddenly found ourselves in a position where software development was a fundamental part of our offering. We moved the focus of the Data Suite to MariaDB and we rebranded it as MariaDB Enterprise, but more importantly, we combined the value and the skills of our services team with the core team of the original development of MySQL. The merge resulted in a company with all the credentials needed to excel and innovate in the MySQL world. But the key question at this point was: is this enough to make MySQL even more successful? Is a better MariaDB (or indeed MySQL) the right answer to the data management needs in 2010s and beyond?

The evolution of MySQL and MariaDB

The answer to the previous questions is not surprisingly a “no”. Indeed, users need a better MySQL (or MariaDB). Traditionally, they demanded more performance, more availability and more scalability, and many players have contributed in their own way to the cause.
Still, there is something missing. The competition from NoSQL solutions is, to say the least, intense. It is probably true that the MySQL adoption is not declining (as some analysts say), but the adoption of NoSQL is way bigger in absolute terms. And more important, the majority of the new initiatives and startups that once were the lymph that flowed in the MySQL Community, have now moved to NoSQL.
From a purely technical (and generic) perspective, when MySQL and NoSQL are tested and measured in a fair way, MySQL can provide in many cases better performance and robustness. Scalability, on the other hand, is a big issue as it has always been – it was an issue for bigger servers in the past, it is an issue for distributed systems now. The search for a better scalability is the primary reason why we have created MaxScale.

You may have read a lot about MaxScale, or you may want to read more here and here. In simple terms, MaxScale is a highly scalable, lightweight proxy system aimed at distributing and scaling parts of a database server that do not need to reside in its core. There is a similarity to this approach in the NoSQL world and certainly in many home made solutions. The mongos / mongod binomial is a good example of what MaxScale can achieve with MySQL, but this is only half of the story. MaxScale is generic in nature; what makes it a relevant component of the IT infrastructure are its plugins. By loading different plugins you can make MaxScale a proxy for multiple client protocols, or a proxy for geographically replicated servers, or a way to integrate different replication technologies, and so on.

I believe that we need MaxScale for MySQL and MariaDB. Incidentally, Max is the name of Monty’s son, so we have covered all his heirs (at least so far). In designing MaxScale, I wanted to provide a link between a technology that was good for servers available in the 90s and today’s infrastructures.

A difficult choice?

One might ask, if I feel so strongly about MaxScale and its fundamental role, why am I leaving it behind? The fact is, I am not. The project is in good hands, thanks to the great work and dedication from Mark Riddoch, Massimiliano Pinto and Vilho Raatikka. The concept, the ideas and the architecture are here to stay. MaxScale is shaped today as we – Mark Riddoch, Massimo Brignoli and I – wanted it, as we have designed it during long hours of work and passionate discussions.
When your kids grow up and are ready to walk alone in the world, you need to let them go. MaxScale can now walk with MySQL and MariaDB, and SkySQL will take good care of their path together. So now, I may move on and I have time to raise other kids.

A look at the future

As for me, I am technically embracing a wider range of technologies. I will not be focused only on MySQL, but rest assured that these 10 years will always remain in my heart. I will work on IT infrastructures and systems where databases play the central and most important role, but I will look at the customers’ needs as a whole. I will carry on my duty for the MySQL User Group in London, which has now reached the reasonable size of 40-50 attendees per session, every other month. I will not move my MySQL blog, so any MySQL-related post will be available on izoratti.blogspot.com and it will be aggregated on PlanetMySQL. But I will have a collection of various topics in my personal blog www.ivanzoratti.com. I will cover more databases, HPC, OpenStack and OSs. I will also have a section dedicated to an important aspect of my life, which is the study of Kung Fu in its inner and outer styles. I started learning Kung Fu almost 30 years ago, first for 12 years, then I abandoned it for another 12 years, until I realised the importance of this practice in my life. I have to thank some of my best friends for that, they really helped me a lot in good and bad times.

So, even if I will not wear a T-shirt with a seal (or a sea lion, as it is more fashionable these days), you will probably see me around at conferences and exhibitions, or perhaps you will not see me, but I will work, as I have done in these 10 years, behind the scenes to make MySQL the good and strong database that can help in creating the next Facebook or the next Twitter of this world.

All the best to all of you.

MaxScale 1.0-beta is out – Happy Birthday MaxScale!

It was a year ago, on a nice Sunday night of the English Summer (apologies for the oxymoron), that Mark Riddoch came to see me and together we headed to the Vansittart Arms, our local family pub round the corner. A pint of London Pride on one side and a Honey Dew on the other were the perfect add-on to Mark’s MacBook Pro, on which Mark was showing me the 0.1 version of MaxScale. It was the result of the joint efforts of Mark’s team, Massimiliano and Vilho, who had worked hard to bring to life the first version of something that I believe will be a natural addition to clusters of MySQL/Percona/MariaDB servers in the near future.

A year ago, Mark showed me a basic debugging interface for MaxScale. We went through some parts of the code and the internal structures, and we looked at the way his team had kept everything sleek and lightweight. It was the implementation of hours and hours spent jotting ideas, diagrams, comments from users and customers. We spent hours at Caffe Trieste in San Francisco, at the Google Campus and at the National Theatre in London, in Berlin with my friend Kris and in many busy airports around Europe and US. Last but not least, the whiteboard in my garden office had been filled and wiped hundreds of times in order to try and find good ideas and the right components for MaxScale.

Today MaxScale has reached its beta stage. It is like a birthday, not only because a year has passed and we have reached a certain level of maturity, but mainly because, as it happens for humans, we like to set milestones in order to better organise our life and our work, and to set and achieve goals. But MaxScale is already evolving from 1.0: the code in various branches on GitHub is already showing more interesting and exciting features, which the team is developing for the next versions.

What is important, and what I want to talk about today, is what is available in the 1.0-beta. We have now a new set of modules that are used to create filters, log queries and results, transform the requests to the database and the data retrieved. The read/write splitter is now available for both MySQL Replication and Galera, opening new possibilities to scale even better. In addition to these features, we now have dynamic balance server weighting, Node and Session Replication Consistency checks, automatic failover to multiple slaves and a clearer mechanism to implement high availability for MaxScale itself.
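To give an idea of what a read/write splitter does, here is a minimal sketch in Python. It is not MaxScale code: the class name, the server labels and the rule "only plain SELECT statements go to the slaves" are assumptions made for illustration, and a real splitter also has to deal with transactions, session state and statements such as SELECT … FOR UPDATE.

```python
import itertools


class ReadWriteSplitter:
    """Hypothetical router: writes go to the master, reads rotate over slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin iterator over the slaves, or None when there are none.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql):
        """Return the server label that should receive this statement."""
        words = sql.split(None, 1)
        first_word = words[0].upper() if words else ""
        # Simplifying assumption: only statements starting with SELECT are reads.
        if first_word == "SELECT" and self._slaves is not None:
            return next(self._slaves)
        # Everything else, including empty input, goes to the master.
        return self.master
```

With one master and two slaves, consecutive SELECT statements alternate between the slaves, while an INSERT always lands on the master. When no slave is configured, every statement falls back to the master.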

Let’s stick to the basics

For those who do not know MaxScale yet, here is the 60-second pitch. MaxScale is a database-centric proxy. It is “database centric” because it has been designed with database operations in mind, covering the typical I/O, computation and resource management of a database. “Proxy” means that it sits between two components of a data infrastructure, where at least one of these components is a database. This means that MaxScale can sit between a client and one or more database servers, between a database master and one or more database slaves, or between two or more paired databases.

MaxScale’s core is based on Linux epoll calls and is optimised to be lightweight and low latency. MaxScale’s architecture relies on pluggable modules that are combined to offer authentication, protocol management, filtering, logging, monitoring and routing. In simple terms, the opportunities are endless: you simply need to define your objective and know the MaxScale API to build a proxy system that will take care of the communication and the resource usage of its components.
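The way pluggable modules combine can be pictured with a small Python sketch. Everything in it is hypothetical: MaxScale modules are not written in Python, and the function names (regex_filter, logging_filter, run_chain) and the sample table names are made up for illustration. The sketch only shows the general shape of a chain in which each module receives a query, does its job and hands the result to the next one.

```python
import re


def regex_filter(pattern, replacement):
    """Build a filter that rewrites a query with a regular expression."""
    compiled = re.compile(pattern, re.IGNORECASE)

    def apply(query):
        return compiled.sub(replacement, query)

    return apply


def logging_filter(log):
    """Build a filter that records each query and passes it on unchanged."""
    def apply(query):
        log.append(query)
        return query

    return apply


def run_chain(filters, query):
    """Pass the query through every filter in order, then return it."""
    for f in filters:
        query = f(query)
    return query


seen = []  # stands in for a query log
chain = [
    logging_filter(seen),
    regex_filter(r"\bfrom\s+old_orders\b", "FROM orders"),
]
result = run_chain(chain, "SELECT id FROM old_orders WHERE id = 7")
```

The order of the chain matters: here the logger sits first, so it records the query as the client sent it, before the regular expression rewrites the table name.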

MaxScale as a proxy between client applications and a cluster of MySQL and MariaDB Replication servers and Galera Cluster. Red lines are read/write, blue lines are read/only.

MaxScale 1.0 beta comes with a set of interesting modules:

  • Authentication: MySQL/MariaDB authentication is operated inside MaxScale and the authentication to one or more servers is executed asynchronously. This module reduces the overall latency, especially when MaxScale is co-located with the application server.
  • Protocol: MaxScale provides client and backend MySQL connectivity.
  • Monitoring: MaxScale 1.0 beta comes with monitoring modules that are designed to work with single MySQL, Percona and MariaDB servers, with MySQL and MariaDB Replication and with Galera-based clusters.
  • Filter & Logging: this is the latest addition to the set of modules in MaxScale. There are now some interesting logging modules used to monitor queries and results, and to transform queries captured using regular expressions.
  • Routing: MaxScale 1.0 beta comes with two routing modules, one to load-balance read/only connections on slave nodes or to load-balance read/write connections to Galera nodes, and another to route statements to nodes that are part of MySQL, MariaDB Replication and Galera-based clusters.
High availability for MaxScale: redundant MaxScales and co-location with the application servers.

You can find more details on the modules in Mark Riddoch’s blog posts.
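As an illustration of statement-based routing, here is a minimal Python sketch of a read/write splitter. It is a simplification under stated assumptions, not MaxScale's router: the ReadWriteSplitter class and the server names are hypothetical, a statement is classified only by its first keyword, and slaves are picked in simple round-robin order.

```python
import itertools


class ReadWriteSplitter:
    """Hypothetical router: reads go to slaves, everything else to the master."""

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; fall back to the master if there are none.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, statement):
        """Return the name of the server that should run the statement."""
        words = statement.split()
        if not words:
            raise ValueError("empty statement")
        is_read = words[0].upper() == "SELECT"
        # SELECT ... FOR UPDATE takes locks, so treat it as a write.
        if is_read and statement.upper().rstrip(" ;").endswith("FOR UPDATE"):
            is_read = False
        if is_read and self._slaves is not None:
            return next(self._slaves)
        return self.master


splitter = ReadWriteSplitter("master", ["slave1", "slave2"])
routes = [
    splitter.route("SELECT * FROM t"),
    splitter.route("INSERT INTO t VALUES (1)"),
    splitter.route("select * from t where id = 1"),
    splitter.route("SELECT * FROM t FOR UPDATE;"),
]
```

Anything that is not clearly a read goes to the master, which is the safe default. A real router has to consider much more than the first keyword, for example open transactions and session state.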

Why a Proxy?

You may have seen the latest announcements from Oracle regarding a long-awaited and great product, MySQL Fabric. Fabric proudly claims to be proxy-free, and people often ask me to compare MaxScale with Fabric and to explain the pros and cons of the two products. First of all, I believe these products serve different purposes and they overlap only for a small set of features.

Fabric, as Oracle says in the first sentence of its web page, is a framework for managing a farm of MySQL Servers. The focus is on the management of a number of servers that work together to provide a database infrastructure for your application. The servers are mapped together to provide availability and scalability, for example through database sharding. In order to use Fabric, you must upgrade to the newest versions of your database connectors and servers. For some applications, you may also need to modify the code in order to use some Fabric features (for example when Fabric is used with MySQL Replication, in order to load-balance the workload on read slaves).

MaxScale is meant to be a dispatcher of your database communications. In doing so, MaxScale can reduce the number of I/O ops, log and modify queries and results, and optimize the use of the database servers. MaxScale is designed to work transparently with all the connectors and database servers from version 4.1 to the latest MariaDB 10.X, and to react in real time to the requests of the clients, to the current workload and to the status of the database infrastructure. By doing so, MaxScale offers better availability and scalability. You may say that Fabric does the same, but looking at what scalability and availability mean, MaxScale is focused on the optimal use of the database servers (for example with continuous monitoring of the database workload), instead of looking at a farm of servers as a whole.

The first baby step

This is the first baby step for MaxScale and users are warned that there is still a lot of work to do to improve it and to make it more stable. The fact that MaxScale is now beta means that it has reached a certain maturity in terms of features and many bugs have been fixed in the last 6 months, but the product is not production ready yet, unless used in a thoroughly tested and consolidated environment, i.e. where no changes in terms of versions or features are applied to the database and application servers. The next months will be devoted to catching more bugs and to benchmarking MaxScale on real cases and in extreme conditions, such as heavy workloads in typical web-based applications for social networking, e-commerce, gaming and collaboration. The next objective is of course to see a robust version that can be declared production ready, i.e., in common terms, one that we have thoroughly tested in live environments and for which we have caught and fixed all the known P1 & P2 bugs.

As usual, there is more to come, but in the meantime, we need your help to improve MaxScale – you can find the source code here (warning: the build is not optimal yet!). The fuss-free compiled versions are here. The maxscale@googlegroups.com group is now very active: we have many people who submit comments and requests every day, and this is already a success per se.

 

When the Innovator’s Dilemma hit the MySQL World

What’s the connection between databases and fruit flies?

Some of you may be familiar with the bestseller in business literature The Innovator’s Dilemma: When New Technologies Cause Great Firms to Fail, by Harvard professor Clayton Christensen. In his book, Prof. Christensen compared disk drives to fruit flies. The comparison relates to the rapid changes that disrupted the disk drive industry for decades. That disruption is compared to the rapid changes that take place in fruit flies, which live only for a few hours: for this reason, researchers can study and analyse their behaviour across many generations.

In the software business, you can replace disk drives with databases. Obviously, databases live longer than fruit flies, but it is an industry that sometimes shows schizophrenic changes. Whilst it is true that relational databases have dominated the scene for decades, readers with grey hair and reading glasses have certainly experienced tens, if not hundreds, of innovative changes in databases since the 70s. Hierarchical, relational, object oriented, object relational, framework dependent, multi-dimensional and many other types of databases have been introduced to satisfy the need for faster, wider, more sophisticated data management. The 21st century, with the advent of web and mobile applications, has been so far the playground for even more innovative products, most of them under the umbrella of two overused terms, Big Data and NoSQL. Make no mistake, there is even more to come: the Internet of Things will spark a new wave of innovation in the industry.

How does this long premise apply to MySQL and MariaDB? The truth is, MySQL has been stagnating for almost 6 years. In 6 years, we have been witnesses to the explosion of a huge amount of data that has to be stored, retrieved and managed. We have also seen a significant change in the type of data: accounting, retail and financial information were predominant in the 80’s and 90’s whilst today it is all about multimedia content – videos, photos, audio clips, but also documents, books, executable programs, text messages, logs and tracing information – today’s “typical” data.

When I say that MySQL has been stagnating, I do not mean that MySQL has not been improved. Versions 5.5 and 5.6 are substantial milestones in MySQL’s history, but Oracle engineers have been focusing on fixing old-time issues, mostly performance related. In a way, some of the companies in the MySQL ecosystem have been led to follow the same path. These companies have worked on improving and stabilising their products, mainly due to the limits in the investments in disruptive technologies that they can afford. A good example is Tokutek: Tim Callaghan and Co. have done a tremendous job and their storage engine has reached a good and strong maturity, but the prize for the most innovative product in the Tokutek offices today goes to TokuMX, which is related to MongoDB, not MySQL.

5.5 and 5.6 are what Professor Christensen calls sustaining technologies. Sustaining technologies, generally speaking, improve the performance of a product – “performance” not only in database terms, but in a more generic sense, applicable to any product. What we are seeking here is disruptive technologies, and they are difficult to find. Disruptive technologies look at a problem from a new perspective and with a new way to fix issues, and they introduce new and more innovative features. At the beginning, they may provide worse performance, but in the long term they simply provide “a better solution.”

The proof of this point – i.e. that 5.5 and 5.6 are sustaining technologies – is the recent announcement from Facebook, Google, Twitter and LinkedIn about WebScaleSQL. WebScaleSQL is today an even better MySQL, with more tests, better code, and better optimisation. In WebScaleSQL there are some interesting features that are the seed for more disruptive technology, such as the read-ahead mechanism and more.

Disruptive technologies in the MySQL ecosystem

Are there disruptive technologies in the MySQL ecosystem? Indeed there are. MySQL 5.6 was announced as a GA on 5th of February 2013. That is a sustaining technology. The following day, the MariaDB Foundation announced the second alpha version of MariaDB 10. Version 10.0.1 was a very incomplete and disruptive technology. It took almost 14 months to create something innovative and to incorporate components that other companies have been trying to develop, sometimes for many years.

Why do I call MariaDB 10 a disruptive technology? Here are the most important points, in case you have missed some:

  • For the first time, we have removed (or at least significantly reduced) the slave replication lag. Parallel replication is one of the most important and long-awaited features, and it opens the doors to a more extensive use of master/slave replication for read scalability. See Kristian Nielsen’s blog on the subject.
  • We have introduced multi-source replication and more sophisticated replication topologies. We can now consolidate data coming from multiple masters into a slave node. It means, for example, that we can manage to connect multiple departments and data centers together. This is a significant contribution provided by Lixun Peng and Taobao.
  • We are significantly reducing the downtime for online databases. The combination of MariaDB’s Global Transaction ID (which is fundamentally different from MySQL 5.6 GTID) and Galera Replication from Codership takes the availability of MySQL databases into a brand new territory, at a lower cost and with a significant reduction in the complexity of the infrastructure. Please note that MariaDB 10 Galera Cluster is not GA yet.
  • We have a significantly improved built-in sharding solution for MariaDB. 6 years ago, a young man in a Spiderman suit presented for the first time a new storage engine, called Spider. A few days ago, that same man, Kentoku Shiba, released Spider 3.2. The engine is now included in MariaDB 10 and is an extremely promising technology that has been substantially improved, and now also includes high availability, in-shard scalability and distributed transactions. Spider 3.2 is not GA yet, but previous versions are already used in production in medium-size sites around the globe and we expect a significant boost in its usage in the future.
  • We have reorganised the way we integrate MariaDB/MySQL with other data sources. On one side, we have improved the way storage engines can interact with the core of MariaDB: for example, tables can be created or altered using engine-specific attributes that simplify the use of new features in the engine. One example of new engines developed by MariaDB is Cassandra, which allows the use of MariaDB and the MySQL clients as an interface to read and write data from/to a Cassandra cluster. We have also introduced a brand new storage engine called Connect. The Connect engine, as the name says, connects external sources, such as external DB tables (also from other DBMS), files, folders, logs, etc., for both reading and writing. Users can now use applications built on MySQL and MariaDB to directly access heterogeneous environments, without moving data back and forth among databases. They can also create joins and view results that are a combination of data coming from multiple sources. This is the very first version of Connect, so you should use it carefully.

This is only a subset of the new features available in MariaDB. They are paired with the robustness of consolidated engines and the core of MariaDB, which can be used for the applications that made MySQL and MariaDB so popular.

From this analysis, there are important takeaways. Small companies in our business may struggle to innovate on their own, due to the amount of investment needed, so collaboration matters. To some extent, the MariaDB Foundation has become the hub for such a collaboration and I really hope that MariaDB and WebScaleSQL will one day be able to join their efforts and innovate together. Disruptive technologies usually take a longer time and bigger investments to reach a reasonable maturity: a close collaboration can speed up this process, make features available sooner and allow MySQL and MariaDB to compete in innovation with NoSQL initiatives and products.

From the innovator’s to the DBA’s dilemma – innovation vs. compatibility?

A by-product of innovation is the difficulty of keeping the product compatible with the past. No matter how hard one works to make everything completely compatible with previous versions and features, disruption in innovation may also mean disruption in the compatibility with the mainstream technology. This is the reason why, above all, MariaDB is now a fork of MySQL and no longer a branch. Although the codebase has its strong roots in MySQL 5.5, it is inevitable that MariaDB and MySQL will diverge more and more in the future.

That said, in our industry backward compatibility is paramount and we break it only when it becomes an insurmountable obstacle to innovation – insurmountable here means that the effort and complexity added to keep a new version compatible with previous versions are not justified. The good news is that MariaDB 10 does not present significant backward-compatibility issues with MySQL 5.5; forward compatibility with MySQL 5.6 and future versions is a different matter. Once you fork a product, you inevitably create a new product. The other good news is that modularity makes things simple. For example, if XtraDB and InnoDB evolve within the boundaries of the current storage engine API, it will be relatively simple for MariaDB 10 to adopt the new engines.

How much do MySQL 5.6 and MariaDB 10 diverge today? Can we still say that MariaDB is a drop-in replacement for MySQL? This question attracts different answers, depending on who’s answering it. The truth is, nothing, not even minor versions of the same product, can claim to be fully compatible. I remember a customer who migrated from MySQL 5.5 to MariaDB 5.5. When one of our best consultants tested the application with MariaDB, he found some issues. Ironically, the issues were caused by a patch that fixed an old bug in MySQL 5.5: the application was designed to live with that bug and the fix in MariaDB caused unexpected behaviour.

Back to more details, we can reasonably say that MySQL 5.6 and MariaDB 10 are “application compatible”. It means that from a typical application perspective, where developers and end-users use standard DML and DDL commands, there are no significant differences between the two products. What might differ is the DBA’s view. Replication is an example of this incompatibility. MySQL 5.6 and MariaDB 10 have a different GTID: MySQL has a transaction ID based on the server UUID and a transaction sequence, whilst MariaDB has a combination of domain, server and event ID that identifies a set of transactions in an event group. The incompatibility is not only in the functions and in the instruments available to the DBA, but most importantly it is in the infrastructure and in the tools that the DBA can use. The same applies to the automatic failover in Replication and the use of multi-source replication, which again provides an incompatible format for the commands used to administer replication.
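The difference between the two GTID formats is easy to see in their textual form. The Python sketch below parses a single GTID of either kind. It assumes the usual notations, server_uuid:transaction_id for MySQL 5.6 and domain-server-sequence for MariaDB 10; the class names, the parse_gtid function and the sample values are made up for illustration, and ranges or sets of GTIDs are deliberately left out.

```python
import uuid
from typing import NamedTuple


class MySQLGtid(NamedTuple):
    server_uuid: uuid.UUID   # UUID of the server that originated the transaction
    transaction_id: int      # sequence number on that server


class MariaDBGtid(NamedTuple):
    domain_id: int
    server_id: int
    sequence: int


def parse_gtid(text):
    """Parse a single GTID in either format; raise ValueError otherwise."""
    if ":" in text:
        # MySQL style: the UUID itself contains dashes, so split on the colon.
        source, _, number = text.partition(":")
        return MySQLGtid(uuid.UUID(source), int(number))
    parts = text.split("-")
    if len(parts) != 3:
        raise ValueError(f"unrecognised GTID: {text!r}")
    domain_id, server_id, sequence = (int(p) for p in parts)
    return MariaDBGtid(domain_id, server_id, sequence)


# Sample values are illustrative only.
mysql_gtid = parse_gtid("3E11FA47-71CA-11E1-9E33-C80AA9429562:23")
mariadb_gtid = parse_gtid("0-1-100")
```

The two results have different fields and no common key, which is a small-scale picture of why tools written around one GTID format cannot simply be pointed at the other.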

Unfortunately, there is not a full list of the differences between MySQL 5.6 and MariaDB 10, and the list would have little meaning, considering that a new feature adds lots of small changes in the database. The list below is a small set of what you can expect to be different. More details are available here and here.

  • MySQL 5.6 has Unicode support in the mysql command line utility
  • At the moment, the memcached protocol is not available in MariaDB 10
  • In MariaDB 10 the default storage engine for temporary tables cannot be specified
  • In MySQL 5.6 it is possible to exchange partitions between tables
  • Binlog variables and settings are different
  • The block encryption mode for block-mode AES algorithms is handled by a parameter in MySQL 5.6
  • MySQL 5.6 can specify where to write the core file if the server crashes
  • MariaDB 10 provides a full set of parameters for the Aria engine
  • MariaDB 10 provides more parameters to handle the connection pool
  • MariaDB 10 provides more parameters to change the behaviour of the optimizer
  • MariaDB 10 offers XtraDB parameters that are not available in MySQL 5.6
  • MariaDB 10 identifies the transactional state of a session
  • MariaDB 10 provides functions for dynamic and virtual columns

Time for conclusions

This post is way longer than I expected, but I hope it has been interesting for some of my usual 25 readers… I dare to make one final statement: MySQL, MariaDB and the ecosystem today offer a choice of sustaining and disruptive technologies. Many databases only offer one option or the other and very few offer both. The beauty of this story is yet to come. Let’s find out more this week at Percona Live.

 
