ScaleDB ONE: Let’s Get Started

ScaleDB 15.10 is out. Some users have already downloaded and tested it: we have received quite positive feedback, but also some requests for more information and help on how to start. I will condense here the basic steps to install and test ScaleDB ONE for the first time.

First of all, some terminology. We have two versions: ScaleDB ONE and ScaleDB Cluster. ScaleDB ONE is meant to be used on a single machine (ONE = One Node Edition), whether it is a VM, a cloud instance or a physical server, whilst ScaleDB Cluster is the full-scale, multi-node cluster that everybody expects to run for mission-critical applications. This means that the typical use cases for ScaleDB ONE are testing and development, data marts, and streaming data collection and analysis that fit on a single server (although you can always replicate your data to another server using standard MySQL replication). ScaleDB Cluster, instead, is highly available out of the box, with no single point of failure, and scalable on demand (i.e. you do not need replication to set up a highly available and more scalable environment).

From now on, in this post I will refer to ScaleDB as ScaleDB ONE.

Now, some prerequisites. ScaleDB has been tested on CentOS 6.7, CentOS 7.1 and Ubuntu 14.04.3 Trusty Tahr. 1GB of memory and a few GB of free disk space are enough to test the product but, as with any other database, the more cores, memory and storage you can add, the better. One interesting aspect is that, if you are planning to store a large amount of data, you will be very pleased with the performance you can get from ScaleDB using magnetic HDDs instead of SSDs (but this is a topic for another post).

On CentOS 6.7, you must add nmap and nc, since you will need them later to interact with the ScaleDB daemons: a yum install nmap-ncat should do the trick. If you have done a minimal install of CentOS 7.1, I would also recommend installing net-tools.

Another mandatory requirement is the installation of the AIO libraries: yum install libaio on CentOS, sudo apt-get install libaio1 on Ubuntu 14.04.3.
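
To recap, these are the prerequisite packages in one place (a sketch based on the notes above; package names may differ on other releases):

sudo yum install nmap-ncat        # CentOS 6.7 / 7.1: provides nc
sudo yum install libaio           # CentOS: AIO libraries
sudo yum install net-tools        # recommended on a minimal CentOS 7.1
sudo apt-get install libaio1      # Ubuntu 14.04.3: AIO libraries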

The last two steps are not mandatory, but they will make your life easier: create a scaledb user with sudo privileges and disable the firewall on your testing machine. From now on, I assume that you are logged in as scaledb.

Downloading the software

If you have not downloaded ScaleDB ONE yet, you will have to hit a few pages on the ScaleDB website, but the process is very straightforward. Just go to the website, click the Download button, scroll down to the bottom and click another Download button (this time it is grey). The next screen is used to select the type of download. You have two choices:

  • VirtualBox Image, which will allow you to download an OVA (a compressed VirtualBox image file), so everything is self-contained in a CentOS 7 image and you do not need to install any software.
  • Tarball, which allows you to decompress and unarchive the ScaleDB product. We do not have Linux packages yet, but they will be available soon.

Once you have selected your favourite download, all you have to do is fill in five fields; you will then receive an email with your personalised link to the downloads. You can keep this link: we will update your environment with new versions. Soon we will also add a yum and an apt repository.

You can use the unique URL to download ScaleDB ONE. You must add /scaledb-15.10/latest-release to the URL, or you can simply browse the repository to find the release you need. In the latest-release folder, you will find 3 tarballs:

  1. The ScaleDB ONE UDE: UDE stands for Universal Data Engine. This is the ScaleDB engine. Right now (2015-11-17), the latest UDE tarball is scaledb-15.10.1-13199-ude.tgz.
  2. MariaDB: We use MariaDB as the database server. ScaleDB can work on its own, i.e. you can access the engine by using the ScaleDB API, but for MySQL and MariaDB users we have created a storage engine layer, so ScaleDB is fully accessible from MariaDB. The version available with 15.10 is MariaDB 10.0.14; soon we will release a version that works with the latest MariaDB 10.1. For Ubuntu you can use scaledb-15.10.1-mariadb-10.0.14-glibc2.14.tgz, for CentOS scaledb-15.10.1-mariadb-10.0.14.tgz (these two builds account for the second and third tarball).

You can download the tarballs on your own machine and then copy them to the testing machine, or you can download them directly on the testing machine with a wget command.
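
For example (a sketch: the base URL below is only a placeholder for the personalised link you received by email):

wget http://<your-personalised-link>/scaledb-15.10/latest-release/scaledb-15.10.1-13199-ude.tgz
wget http://<your-personalised-link>/scaledb-15.10/latest-release/scaledb-15.10.1-mariadb-10.0.14.tgz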

Assuming that the tarballs are in the home directory of the scaledb user, you can now uncompress and copy the files to /usr/local with these commands:

sudo tar xzvf ~/scaledb-15.10.1-13199-ude.tgz -C /


sudo tar xzvf ~/scaledb-15.10.1-mariadb-10.0.14.tgz -C /
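
On Ubuntu, use the glibc2.14 MariaDB tarball mentioned above for the second command:

sudo tar xzvf ~/scaledb-15.10.1-mariadb-10.0.14-glibc2.14.tgz -C /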

That’s it! ScaleDB is ready to be used.

In order to make things easier for anybody who wants to test ScaleDB ONE, we have assumed that the software will be installed in /usr/local and we have a predefined configuration. More specifically:

For MariaDB:

  • The base directory is /usr/local/mysql
  • The data directory is /usr/local/mysql/data
  • The configuration file is /usr/local/mysql/my.cnf
  • The admin user is root (no password)

For the ScaleDB engine:

  • The base directory is /usr/local/scaledb
  • The data directory is /usr/local/scaledb/data
  • There are three configuration files:
    • Storage Engine: /usr/local/mysql/scaledb.cnf
    • Storage Node: /usr/local/scaledb/cas.cnf
    • Lock Manager: /usr/local/scaledb/slm.cnf

At this point you can launch ScaleDB ONE with this script:

/usr/local/scaledb/scripts/scaledb_one start

If you add /usr/local/scaledb/scripts to your PATH, you will see something like:

scaledb@ONE:~$ scaledb_one start
 Starting ScaleDB CAS server...
 ScaleDB CAS Server started.
 Starting ScaleDB SLM server...
 ScaleDB SLM server started.
 Starting MariaDB Server...
 MariaDB Server started.
 ScaleDB ONE started.

The script starts both the MariaDB server and the ScaleDB engine. The ScaleDB engine is formed of two daemons, the CAS Server (Cache Accelerated Storage Server) and the SLM Server (ScaleDB Lock Manager Server). The same script must be used to stop the environment, simply with scaledb_one stop.
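
Once ScaleDB ONE is running, you can connect to MariaDB with the bundled client (a sketch, assuming the client sits in the bin directory under the MariaDB base directory; remember that root has no password in the predefined configuration):

/usr/local/mysql/bin/mysql -u root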

Testing ScaleDB ONE

I will give you more information on how to properly test ScaleDB very soon. In the meantime, let’s just see if it works as expected.

The MariaDB interface is unchanged; the only news is the ScaleDB storage engine.
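
A quick way to verify it is to run SHOW ENGINES and check that ScaleDB appears in the list. The transcript below is only a sketch, with the output abridged and the Comment text illustrative:

MariaDB [(none)]> SHOW ENGINES;
 | Engine  | Support | Comment                    |
 | ScaleDB | YES     | ScaleDB storage engine     |
 | InnoDB  | DEFAULT | Supports transactions, ... |
 ...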


Now it is time to test the engine. We can start by creating a table. The statement to create a streaming table is a bit different from that of a standard InnoDB table. Here is an example:

MariaDB [(none)]> CREATE TABLE test.test (
 -> id bigint UNSIGNED NOT NULL AUTO_INCREMENT,
 -> create_time timestamp NOT NULL,
 -> account char(8) NOT NULL,
 -> store int(10) UNSIGNED NOT NULL,
 -> amount decimal(8,2) NOT NULL,
 -> PRIMARY KEY (id),
 -> KEY create_time (create_time) RANGE_KEY=SYSTEM )
 -> ENGINE=ScaleDB TABLE_TYPE=STREAMING;
 Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]>

And here is the explanation line by line:

  • Streaming tables are special tables in ScaleDB that are used to store streaming data. They are really fast and work as a constant stream, i.e. you can INSERT the data, you can SELECT to run a query and you can DELETE the oldest data, but you cannot UPDATE any row or DELETE a row with a general condition.
  • The id column is the primary key: streaming tables always require a primary key.
  • The create_time is a timestamp associated with a range key, a special time-based key used in ScaleDB to query by a time interval (see the example query after the SELECT below).
  • The table attributes to add are the engine (ScaleDB) and the table type (STREAMING). At the moment we do not recommend using any other type of table.

When you have received an OK, you have created your first ScaleDB table – congratulations!

And finally a simple INSERT and SELECT:

MariaDB [(none)]> INSERT INTO test.test VALUES (NULL, NULL, 'A', 1, 100);
Query OK, 1 row affected (1.54 sec)

MariaDB [(none)]> SELECT * FROM test.test;
 +-----+---------------------+---------+-------+--------+
 | id  | create_time         | account | store | amount |
 +-----+---------------------+---------+-------+--------+
 | 256 | 2015-11-16 06:30:24 | A       |     1 | 100.00 |
 +-----+---------------------+---------+-------+--------+
 1 row in set (0.00 sec)

MariaDB [(none)]>
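
Since create_time is a range key, the natural access pattern is a query over a time interval, as in this sketch (the date literals are only illustrative):

MariaDB [(none)]> SELECT account, SUM(amount)
 -> FROM test.test
 -> WHERE create_time BETWEEN '2015-11-16 00:00:00' AND '2015-11-16 23:59:59'
 -> GROUP BY account;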

One warning: if you run a SELECT query immediately after the INSERT and you do not see the rows that you have just inserted, it is “normal behaviour”. By default, ScaleDB uses a time window of 30 seconds to flush a large number of rows in one go. This behaviour can be changed to a more typical OLTP behaviour, but the price is a lower load rate. In ScaleDB Cluster, the rows are safely stored on two servers, so they are not lost in case of a fault.

Welcome ScaleDB 15.10!

Time really flies. A bit less than 4 months ago, I wrote a post about my decision to join ScaleDB. Today, after 4 months and a lot of excitement working with a great team and genuinely good people, I am proud to announce that the first version of ScaleDB is available to the public.

ScaleDB 15.10 Ararat

We decided to number this version 15.10 and to name it Ararat. We intend to follow the release cycle of other famous software projects, such as Ubuntu, OpenStack and, recently, CentOS. Our logo is a peak: we are all about scaling, as in our name and as the main objective of our products. ‘To scale’ comes from the Latin word scandere, i.e. ‘to climb’. Mount Ararat is one of the most beautiful peaks in the world, yet hard to climb and full of significance and mystery for many. It seemed natural for us to start our journey by naming the product after this mountain.

ScaleDB 15.10 is the first public version of our product. So far, we’ve been running a private beta, working with users, developers and DBAs to make the product ready for the public and for more general use.

We have customers and community users who run ScaleDB in production. In the last year we have worked hard to fix all the S1 bugs known to us but, as with any software, we cannot guarantee that the quality of the product will be top notch right from its first public version; therefore we strongly recommend that you thoroughly test ScaleDB 15.10 before deploying it in a production environment.

The software is available for download from our website as a tarball, and we are going to provide Red Hat and Ubuntu packages very soon. You can click here, fill in a quick form and receive information on how to download and use ScaleDB within minutes. We will set up an account for you, in which you will also find updates, patches and new releases.

Streaming data, time series and realtime analytics

The main objective of ScaleDB 15.10 is to provide a Big Data solution for MySQL, currently in the form of streaming data and realtime analytics (see some extra info here). From the perspective of a MySQL user, ScaleDB is a standard storage engine that can be plugged into MariaDB. Behind the scenes, we make extensive use of special handlers in MariaDB 10.1 that extend condition pushdown to the storage engine (although for the very first version of 15.10 we still recommend MariaDB 10.0) and to a cluster of machines. We also call the ScaleDB Cluster the IDC, Intelligent Data Cluster.

ScaleDB can handle millions of inserts per second loaded by parallel workers, whilst hundreds of concurrent users can analyse the very same data in realtime. We ran some basic tests and one of them is published here: it can give you an idea of the type of analysis and scalability you may expect from ScaleDB.

We use a different and innovative approach to storing, indexing and accessing data. The best fit for ScaleDB is time series data, which probably represents a significant part of the data currently qualified as Big Data. That said, ScaleDB can be used to store and analyse not only time series but any kind of data, although we are not currently focused on rich data such as documents and multimedia.

ScaleDB ONE and ScaleDB Cluster

ScaleDB 15.10 comes in two flavours, ScaleDB ONE and ScaleDB Cluster.

ScaleDB ONE stands for One Node Edition: it is the single-node version of the product, which DBAs can install and use on a single server. Performance is great for many use cases, and ScaleDB ONE can already sustain hundreds of thousands of inserts per second and real-time analysis on a single node. ScaleDB ONE is completely free, and support can be purchased on request.

ScaleDB Cluster is the fully loaded version of ScaleDB that can scale up to many Terabytes of data and hundreds of concurrent users. ScaleDB Cluster is available as a commercial license with technical support that can be purchased from ScaleDB Inc.

What’s next?

Well, this is just the start. We will talk more about ScaleDB in future posts, from its internal structures, to advanced indexing, scalability, roadmap and much more! As they often tell me as a frequent flyer: sit back, relax, and enjoy the journey with ScaleDB.

Percona Live Europe is now over, MySQL is not

Percona Live Europe ended more than a week ago. I left Amsterdam with a positive thought: it has been the best European event for MySQL so far. Maybe the reason is that I saw the attendance increasing, or maybe it was the quality of the talks, or the fact that I heard others making the same comment; I also saw a reinvigorated MySQL ecosystem.
There are three main aspects I want to highlight.

1. MySQL 5.7 and the strong presence of the Oracle/MySQL team

There have been good talks and keynotes on MySQL 5.7, a sign of the strong commitment of Oracle towards MySQL. I think there is an even more important point: the most interesting features in 5.7 and the projects still in MySQL Labs derive from, or are in some way inspired by, features available from other vendors. Some examples:

  • The JSON datatype from MySQL and MariaDB – two fairly different approaches, but definitely an interesting addition
  • Improvements in the optimizer from MySQL and MariaDB. There is a pretty long list of differences, this slide deck can help understand them a bit better…
  • Improvement for semi-sync replication from MySQL and WebScaleSQL
  • Automatic failover with replication from MySQL and MHA
  • Multi-source replication from MySQL and MariaDB 10
  • Group replication in MySQL and MariaDB 10 – Here things differ quite a lot, but the concept is similar.
  • MySQL router in MySQL and MaxScale – Again, a different approach but similar concepts to achieve the same results

My intent here is not to compare the features – I am simply pointing out that the competition among projects in the MySQL ecosystem is at least inspirational and can offer great advantages to the end user. Of course, the other side of the coin is the creation of almost identical features, and the addition of more confusion and incompatibilities among the distributions.

2. The Pluggable Storage Engine Architecture is alive and kicking

Oracle’s commitment to improving InnoDB has been great so far, and hopefully InnoDB will get even better in the future. That said, the Pluggable Storage Engine Architecture has been a unique feature of MySQL for a long time, and there have been two recent additions to the list of storage engines. Today TokuDB, Infobright, InfiniDB and ScaleDB share the advantage of being pluggable into MySQL with Deep and RocksDB. RocksDB is also pluggable into MongoDB and, even more important, it has been designed with a specific use case in mind.

3. Great support from the users

The three aspects have similar weight in measuring the health of MySQL, but this is my favourite, because it demonstrates how important MySQL is for some of the most innovative companies on the planet. Despite Kristian Koehntopp’s great keynote, showing us how boring the technology he works on is, nobody really thought it was true. Using a stable and mature product like MySQL is not boring, it is wise. But this was not the only presentation from the end users that we enjoyed. Many showed a great use of MySQL, especially at the levels of scalability and performance (these two combined aspects being the number one reason for choosing a NoSQL DB) that NoSQL databases struggle to deliver with certain workloads.
I am looking forward to seeing the next episode, at Percona Live 2016 in Santa Clara.

This time it is real…

A few months ago I updated my profile on LinkedIn and adjusted my position as CTO and founder of Athoa Ltd, a British company currently active in translation services and events, which in the past hosted a couple of interesting open source projects. I simply forgot to disable the email notification to my connections, set by default, and in 2-3 hours I received tens of messages from friends and ex-colleagues who were curious to hear about my new adventure.

Today, I changed my profile on LinkedIn again and left the email notification on, this time on purpose.

As of today, I join the team at ScaleDB. My role is to define the product and the strategy for the company, working closely with CEO Tom Arthur, CTO Moshe Shadmon, CMO Mike Hogan and the rest of the team.

Leaving Canonical

The last nine months at Canonical have been an outstanding and crazily intense journey. I learned as I never learned before about systems and network infrastructures, and I met an amazing team of core engineers. It has been a unique experience, one of those that only come along once in a lifetime – I really mean it – and I will never forget it.

The decision to leave Canonical came after a lot of thinking and many sleepless nights. I met so many great people who, in many ways, are making history in IT. In my team under Dan Poler, I worked with experienced Cloud and Solutions Architects who can analyze problems, discuss architectures and suggest solutions from the high-level view down to the kernel of the operating system, and even to the silicon of systems and devices. Chris Kenyon’s and John Zannos’s teams are called “Sales”, but they are really advisors for a growing ecosystem of providers and adopters of Ubuntu and OpenStack technologies.

I have been inspired by the dedication and leadership of Canonical CEO Jane Silber. Jane has the difficult job of leading a company that is moving at lightspeed in many different directions, so that the technology that powers clouds, networks, end users and small devices can share the same kernel and will eventually converge. Jane is in my opinion the leading force, making Canonical blossom like a plum tree in mid-winter, when the rest of nature still sleeps under the snow.

My greatest experience at Canonical has been working with Mark Shuttleworth. Mark is an inspiration not only for the people of Canonical or the users of Ubuntu, but for us all. Mark’s energy and passion are second only to his great vision for the future. I recommend that everybody follow Mark’s blog and watch or attend his talks. His attention to detail and search for perfection never overshadow the core message and the understanding of the big picture; for this reason, both experienced listeners and newbies will have takeaways from his talks.

Back in June last year, I decided to join Canonical because of Mark’s vision. His ideas were in sync with what I wanted to bring to SkySQL/MariaDB. At Canonical, I could see this vision materialise in the direction the products were going, only on a larger scale. This experience has reinforced my belief that we have an amazing opportunity right in front of us. The world is changing dramatically, at a speed that is incomparable with the past, even when compared with the first 10 years of the new millennium. We must think outside the box and reconsider the models that companies have used so far to sustain their business, since some of them are already anachronistic and create artificial barriers that will eventually collapse.

This experience at Canonical will stay with me forever and I hope to make a good use of what I have learned so far and all that I will learn in the future from Mark.

Joining ScaleDB

The last Percona Live was a great event. It was great to see so many friends and ex-colleagues again, now working at different companies but gathering together once a year as in a school reunion. Percona has now become a mature company but, more importantly, it has reached its maturity by growing organically. The results are outstanding, and its new course as a global player in the world of databases looks even more promising.

The list of people and companies I would like to mention is simply too long and it would be a subject for a post per se. I found the MySQL world more active than ever. In this Percona Live I found the perfect balance between solid and mature technologies that are constantly improving, and new and disruptive technologies that are coming out under the same MySQL roof.

I simply feel that I am part of this world, and it is part of me. I have worked with databases in many different roles all my life: first with Digital/Oracle RDB and Digital/HP Datatrieve, then with IBM/Informix, Oracle, Sybase and SQL Server, and finally with MySQL. I am looking at this world with the eyes of someone who has been enriched by new experiences. I simply think I have more to offer to this market than to network and systems infrastructures, so I decided to come back. I also feel I can offer more in designing and defining products than in running services.

ScaleDB seems to me the company where I can best express myself and help most at this point of my working life. Having previously been an advisor for the company, working on products and strategies just feels natural to me. The position is also compatible with my intention to improve and extend my involvement in the MySQL ecosystem, not only as a MariaDB Ambassador, but also, and equally, advocating for Oracle and Percona products.

I also believe that MySQL should not be an isolated world from the rest of the database market. I already expressed my interest in Hadoop and other DB technologies in the past, and I believe that there should be more integration and sharing of information and experiences among these products.

I have known and worked with Moshe Shadmon, ScaleDB CTO, for many years. Back in 2007, we spent time together discussing the use, advantages and disadvantages of distributed databases. At the time, we were talking about the differences between Oracle RAC, MySQL/NDB and DB2, their strong and weak points, and what needed to be improved. That was the time when ScaleDB as a technology started taking the shape that it has today.

ScaleDB is an amazing technology. It is currently usable as a storage engine with MariaDB 10.0, and it has been developed with the idea of a cluster database from the ground up. As with MySQL in 2005, when the goal was to provide performance, scalability and ease of use in a single product, ScaleDB today provides more performance and greater scalability, without compromising availability or the use of standard SQL/MySQL. The engineering team at ScaleDB has recently worked on an amazing extension of their technology to sustain fast inserts and real-time queries on commodity hardware, at a fraction of the cost of NoSQL alternatives. This addition makes ScaleDB the perfect solution for storing and retrieving time series data, which is the essence of stream analytics and the Internet of Things.

I believe ScaleDB has the incredible potential to become a significant player in the DB world, not only in MySQL. I feel excited and honored to be given the opportunity to work on this new adventure. I will try my hardest to serve the MySQL ecosystem in the best possible way, contributing to its success and improving the collaboration of companies – providers, customers, developers and end users – in MySQL and in the world of databases.

Now hop onto the new ride, the future is already here…

2015: More innovation, but still a year of transition

First things first: I could use this title every year, it is an evergreen. For this title to make sense, there must be a specific context, and in this case the context is Big Data. We saw new ideas and many announcements in 2014; in 2015 those ideas will take shape and early versions of innovative products will start flourishing.

Like many other people, I prepared some comments and opinions to post back in early January. Then, soon after the season’s break, I started flying around the world and the daily routine kept me away from the blog for some time. So, as the last blogger to do so, it may be time for me to post my own predictions, for the joy of my usual 25 readers.

Small Data, Big Data, Any Data

The term Big Data is often misused. Many different architectures, objectives, projects and issues deviate from its initial meaning. Everything today seems to be “Big Data” – whether you collect structured or unstructured information, documents, text and patterns, there is so much hype that every company and marketing department wants to associate its offering with Big Data.

Big Data is becoming the way to say that your organisation is dealing with a vast amount of information and it is becoming a synonym for Database. Marketing aside, there are reasons behind this misuse. We are literally inundated by a vast amount of data of all sorts, and we have been told that all this data has some value. Fact is, more and more organisations want to use this data, and in some way they are pushing for the commoditisation of Big Data solutions.

There are valid reasons behind the commoditisation of Big Data. The first one is that data is data and, big or small, it should be simple and easy to manage and use. If this is not the case, then it is an issue that database providers should solve, and an opportunity to inspire entrepreneurs to provide new products. Managers, users and administrators demand this commoditisation. They do not want to treat Big Data differently from any other data: they do not want a batch-only mode, a Lambda junction or another complex architecture. Many organisations need real time analysis, small queries and transactions for the data they collect or generate.

Developers and devops have their say on Big Data too. They need more ways to access Hadoop and Lambda architectures. They long for the simplicity of the good old days of the LAMP stack or for today’s agility of Node.js and MongoDB. They want to code faster, release often, run and fix bugs in minutes (not weeks or months), also on Big Data.

In my humble opinion, the key point for Big Data in 2015 is the convergence towards Hadoop. Everything will be in some way related to Hadoop, whether it is a distributed file system, a map/reduce approach or other related technologies. In some way, established Big Data vendors will create more interfaces. Other SQLs and NoSQLs will reach the Hadoop haven, by integrating their existing products, creating more connectors, or providing hybrid architectures.

The two big issues to tackle are on the administration side and the user side. For administrators, Big Data architectures must be simple to provision, configure and deploy, and eventually modify. For users, Big Data solutions must be simple to use for their analysis or online applications. In both cases, the issue is currently Big Data = Big Complexity.

Some predictions

A convergence towards Hadoop is inevitable. Even the most traditional companies active in the DB world, like Oracle and Microsoft, are taking large steps in this direction. Here we are not talking about integration through adapters or loaders, we are referring to a deeper convergence where Hadoop will be (in some way) part of the commercial products.

There will be more interfaces that allow developers to reuse their skills or existing code to work with Hadoop. This aspect will be interesting for ad-hoc applications, but even more important for BI and Business Analytics vendors, who will integrate their tools with Hadoop with “minimal” effort. An evolution in this area will have the same impact that tools like Business Objects, Cognos and MicroStrategy had for data warehousing in the ‘90s. Users will have the ability to consume data in a DIY fashion, saving money and ultimately bringing commoditisation to Big Data.

But we need more innovation to make Big Data a real commodity. We need more Hadoop as a service, something that is starting only this year. We need cloud-friendly, or “cloudified” architectures. The natural distribution of the Lambda architecture fits well with the Cloud model, but now the issue is to optimise performance and avoid unnecessary resource consumption in cloud-based Big Data infrastructures.

Orchestration is the magic word for Big Data in 2015, and certainly for one or two more years to come. Too many moving parts create complex architectures that are difficult to manage. Orchestration tools will play the most important role in the commoditisation of Big Data infrastructures. Projects will be delivered faster and in a more agile way, cutting costs and making the technology suitable for more uses.

The missing players

In this scenario, we sadly miss PostgreSQL, MySQL and some others.

PostgreSQL still has a large number of enthusiasts and great developers who provide improvements, but big investments are missing. EnterpriseDB monetises migrations of costly Oracle-based applications to PostgreSQL. This is, in my opinion, a pretty correct and pragmatic approach from a tactical business perspective. The support business around Postgres will go on for many years, but we should not expect big innovations in this area. We can see the use of Postgres technology in Greenplum and in Pivotal HAWQ, but that product falls more into the bucket of the Hadoop adapters than into a standard PostgreSQL engine.

MySQL is another player that is missing the boat. The great improvements made in MySQL 5.7, in WebScaleSQL and in MariaDB all move in one direction: the MySQL install base. It looks like the world stopped in 2006 and no new technologies have emerged since then. Fact is, almost all developers have adopted Hadoop and NoSQL technologies for their new projects, leaving [as happens for Postgres] the MySQL ecosystem still in business for the support of existing installations.

Finally, the traditional NoSQL players are catching up. The fact that they do not have a large install base allows these players to change direction faster and sometimes drastically. Datastax leads the pack, adding Hadoop to its Enterprise solution based on Cassandra. MongoDB benefits from large investments that give this database more bandwidth in the long term. The first step for MongoDB has been the introduction of a new pluggable storage architecture; now we need to wait for the next step, towards a Hadoop pluggable engine. Couchbase and Basho/Riak still maintain their position as servers that can be integrated with Hadoop, but Hadoop is not a component of their enterprise products.

Obviously, I may be completely wrong with my predictions and in 12 months’ time we might see Hadoop more concentrated on real Big Data and none of the missing players joining the bandwagon. Let’s just wait and see.

In the meantime, there is more to come in this area. The future of Big Data is very much connected to the Internet of Things, which will bring even more complexity, along with the need for real time analytics combined with batch data analysis. On top of everything, the orchestration of a large number of components is an essential piece of technology for Big Data and IoT. Without the right orchestration, Devops will spend 80% of their time on operations and 20% on development, but it should be the other way around.
More to come in Jan 2016.

VirtualBox extensions for MAAS

During the last season’s holidays, I spent some time cleaning up demos and code that I use for my daily activities at Canonical. It’s nothing really sophisticated, but for me, and I suspect for some others too, a small set of scripts makes a big difference.

In my daily job, I like to show live demos, and I need to install a large set of machines, scale workloads, and monitor and administer servers and data centres. Many people I meet do not want to know only the theoretical details: they want to see the software in action. But, as you can imagine, the process of spinning up 8 or 10 machines and installing and running a full version of OpenStack in 10-15 minutes, while you also explain how the tools work and perhaps even try to give suggestions on how to implement a specific solution, is not something you can handle easily without help. Yet, that is what CTOs and Chief Architects want to see in order to decide whether a technology is good for them or not.

At Canonical, workloads are orchestrated, provisioned, monitored and administered using MAAS, Juju and Landscape, around Ubuntu Cloud, which is the Canonical OpenStack offering. These are the products that can do the magic I described, providing in minutes something that usually takes days to install, set up and run.

In addition to this long preface, I am an enthusiastic Mac user. I do love and prefer Ubuntu software, and I am not entirely happy with many technical decisions around OS X, but I have also found Mac laptops to be fantastic hardware that simply fits my needs. Unfortunately, a KVM port for OS X is not available yet, hence the easiest and most stable way to spin up Linux VMs on OS X is to use VMware Fusion, Parallels or VirtualBox. Coming from Sun/Oracle and willing to use open source software as much as I can, VirtualBox is my favourite and natural choice.

Now, if you mix all the technologies mentioned above, you end up with a specific need: the integration of VirtualBox hosts, specifically running on OS X (but not only), with Ubuntu Server running MAAS. The current version of MAAS (1.5 GA in the Ubuntu archives and 1.7 RC in the maintainers branch), supports virsh for power management (i.e. you can use MAAS to power up, power check and power down your physical and virtual machines), but the VirtualBox integration with virsh is limited to socket communication, i.e. you cannot connect to a remote VirtualBox host, or in other words MAAS and VirtualBox must run in the same OS environment.

Connections to local and remote VirtualBox hosts


My first instinct was to solve the core issue, i.e. add support to remote VirtualBox hosts, but I simply did not have enough bandwidth to embark on such an adventure, and becoming accustomed to the virsh libraries would have taken a significant amount of time. So I opted for a smaller, quicker and dirtier approach: to emulate the most simple power management features in MAAS using scripts that would interact with VirtualBox.

MAAS – Metal As A Service, the open source product available from Canonical to provision bare metal and VMs in data centres, relies on the use of templates for power management. The templates cover all the hardware certified by Canonical and the majority of the hardware and virtualised solutions available today, but unfortunately they do not specifically cover VirtualBox. For my workaround, I modified the most basic power template provided for the Wake-On-LAN option. The template simply manages the power up of a VM, and leaves the power check and power down to other software components.

The scripts I have prepared are available on my GitHub account and are licensed under GPL v2, so you are absolutely free to download them, study them, use them and, even more important, provide suggestions and ideas to improve them.

The README file in GitHub is quite extensive, so I am not going to replicate here what has been written already, but I am going to give a wider architectural overview, so you may better consider whether it makes sense to use the patches or not.

MAAS, VirtualBox and OS X

The testing scenario that I have prepared and used includes OS X (I am still on Mavericks, as some of the software I need does not work well on Yosemite), VirtualBox and MAAS. What I need for my tests and demos is shown in the picture below. I can use one or more machines connected together, so I can distribute workloads on multiple physical machines. The use of a single machine makes things simpler, but of course it puts a big limitation on the scalability of the tests and demos.

A simplified testbed with MAAS set as VM that can control other VMs, all within a single OS X VirtualBox Host machine

The basic testbed I need to use is formed by a set of VMs prepared to be controlled by MAAS. The VMs are visible in this screenshot of the VirtualBox console.

VirtualBox Console

Two aspects are extremely important here. First, the VMs must be connected using a network that allows direct communication between the MAAS node and the VMs. This can be achieved locally by using a host-only adapter, where MAAS provides DNS and DHCP services and each VM has the Allow All option set in the Promiscuous Mode combo.
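
The same setup can be sketched from the command line, where --nicpromisc1 allow-all corresponds to the Allow All option (the VM name node01 and the adapter name vboxnet0 are hypothetical):

VBoxManage hostonlyif create
VBoxManage modifyvm "node01" --nic1 hostonly \
           --hostonlyadapter1 vboxnet0 \
           --nicpromisc1 allow-all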

VirtualBox Network

Secondly, the VMs must have PXE boot switched on. In VirtualBox, this is achieved by selecting the network boot option as the first boot option in the System tab.
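
Again, the equivalent VBoxManage command (with the same hypothetical VM name):

VBoxManage modifyvm "node01" --boot1 net --boot2 disk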

VirtualBox Boot Options


In this way, the VMs can start for the very first time and PXE boot using a cloud image provided by MAAS. Once MAAS has enlisted the VM as a node, administrators can edit the node via the web UI, the CLI app or the RESTful API. Apart from changing the name, what is really important is the setting of the Power mode and the physical zone. The Power mode must be set as Wake-On-LAN, and the MAC address is the last part of the VM id in VirtualBox (with colons). The Physical zone must be associated with the VirtualBox host machine.
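
For example, with a hypothetical VM id as reported by vboxmanage list vms:

"node01" {5ca1ab1e-0000-4000-8000-080027e6b2a9}

the last 12 hex digits are 080027e6b2a9, so the MAC address to enter in MAAS is 08:00:27:e6:b2:a9.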

MAAS Edit Node

In the picture above the Physical zone is set as izMBP13. The description of the Physical zone must contain the UID and the hostname or IP address of the host machine.

Physical Zone

Once the node has been set properly, it can be commissioned by simply clicking the Commission node button in the Node page. If the VM starts and loads the cloud image, then MAAS has been set correctly.

The MAAS instance interacts with the VirtualBox host via SSH and responds to PXE boot requests from the VMs

A quick look at the workflow

The workflow used to connect MAAS and the VMs is relatively simple. It is based on the components listed below.

A. MAAS control

Although I have already prepared scripts to Power Check and Power Off the VMs, at the moment MAAS can only control the Power On. Power On is executed by many actions, such as Commission node or the explicit Start node in MAAS. You can always check the result of this action in the event log on the Node page.

MAAS Node

B. Power template

The Power On action is handled through a template, which in the case of Wake-On-LAN and of the patched version for VirtualBox is a shell script.

The small fragment of code used by the template is listed here and it is part of the file /etc/maas/templates/power/ether_wake.template:

if [ "${power_change}" != 'on' ]
elif [ -x ${home_dir}/VBox_extensions/power_on ]
    ${home_dir}/VBox_extensions/power_on \

C. MAAS script

The script ${home_dir}/VBox_extensions/power_on is called by the template. This is the fragment of code used to modify the MAC address and to execute a script on the VirtualBox Host machine:


# Check if there is the @ sign, typical of ssh
# user@address
if [[ ${vbox_host_credentials} == *"@"* ]]
then
  # Create the command string
  command_to_execute="ssh \
    ${vbox_host_credentials} \
    '~/VBox_host_extensions/startvm \
    ${vm_to_start}'" # assumption: the identifier derived from the MAC address
  # Execute the command string
  eval "${command_to_execute}"
fi

D. VirtualBox host script

The script in ~/VBox_host_extensions/startvm is called by the MAAS script and executes the startvm command locally:

# ${1} is the search string that identifies the VM
start_this_vm=`vboxmanage list vms \
| grep "${1}" \
| sort \
| head -1 \
| awk '{ print $NF }'` # keep only the VM id field (assumption: one match is enough)
VBoxManage startvm ${start_this_vm} \
           --type headless

The final result will be a set of VMs ready to be used, for example by Juju to deploy Ubuntu OpenStack, as you can see in the image below.

MAAS Nodes (Ready)


Next Steps

I am not sure when I will have time to review the scripts, but they certainly have a lot of room for improvement. First of all, by adopting a richer power management option, MAAS would not only power on the VMs, but also power them off and check their status. Another improvement regards the physical zones: right now, the scripts loop through all the available VirtualBox hosts. Finally, it would be ideal to use the standard virsh library to interact with VirtualBox. I can’t promise when, but I am going to look into it at some point this new year.

It does not matter if Aurora performs 1x or 10x MySQL: it _is_ a big thing

I spent the last 4 years at SkySQL/MariaDB working on versions of MySQL that could be “suitable for the cloud”. I strongly believed that the world needed a version of MySQL that could work in the cloud even better than the comparable version on bare metal. Users and administrators wanted to benefit from the use of cloud infrastructures and, at the same time, achieve the same performance and overall stability as their installations on bare metal. Unfortunately, ACID-compliant databases in the cloud suffer from the issues that any centrally controlled and strictly persistent system encounters when hosted on highly distributed and natively stateless infrastructures.

In this post I am not going to talk about the improvements needed for MySQL in the cloud – I will tackle this topic in a future post. Today I’d like to focus on the business side of RDS and Aurora.

In the last 4 years I have had endless discussions over the use of Galera running in AWS on standard EC2 instances. I tried to explain many times that having Galera in such an environment was almost pointless, since administrators did not have real control of the infrastructure. The reasons have nothing to do with the quality and the features of Galera, but rather with the use of a technology placed in the wrong layer of the *aaS stack. Last but not least, I tried many times to guide IT managers through the jungle of hidden costs of an installation of Galera (and other clustering technologies) in EC2, working through VPCs, guaranteed IOPS, dedicated instances, AZs etc.

I had interesting meetings with customers and prospects to help them analyse the ROI of a migration and the TCO of an IT service in a public cloud. One example in particular, a media company in North America, was extremely interesting. The head of IT decided that a web service had to be migrated to AWS. The service had predictable web access peaks, mainly during public events – a perfect fit for AWS. When an event is approaching, sysadmins can launch more instances, then close them when the event ends. Unfortunately, the same approach cannot be applied to database servers, as these systems require data to be available at all times. Each new event requires more block storage with higher IOPS, and the size and flavour of the DB instances become so high spec that the overall cost of running everything in EC2 is higher than the original installation.

Aurora from an end customer perspective

Why is Aurora a big thing? Here are some points to consider:

1. No hidden costs in public clouds

The examples of Galera and the DB servers in AWS that I mentioned are only two of the surprises that IT managers will find in their bills. There is a very good reason why public clouds have (or should have) a DBaaS offering: databases should be part of IaaS. They must make the most of the knowledge of the bare metal layer, in terms of physical location, computing and storage performance, redundancy, reliability etc. Cloud users must be able to use the database confidently, leaving typical administration tasks such as backups and replication to automated systems that are part of the infrastructure. Furthermore, end customers want to work with databases that do not suffer resource contention in terms of processing, storage and network – or at least not in a way that is perceivable from an application standpoint. Just as we select EBS disks with requested IOPS, we must be able to use a database server with requested QPS – whatever we define as a “query”. The same should happen for private clouds, since the technologies, benefits and disadvantages are substantially the same. In AWS, RDS already has these features, but Aurora simply promises a better experience with more performance and reliability. Sadly, not many alternatives are available from other cloud providers.

2. Reduce the churn rate

A consequence of the real or expected hidden costs is a relatively high churn rate that affects many IT projects in AWS. DevOps teams start with AWS because it is simple and available immediately, but as the business grows, so does the bill, and sometimes the growth is not proportional. Amazon needed to remove the increasing cost of the database as one of the reasons to leave or reduce the use of a public cloud, and Aurora is a significant step in this direction. I expect end customers to be more keen to keep their applications on AWS in the long run.

A strong message to the MySQL Ecosystem

There are lots of presentations and analyses around MySQL and the MySQL flavours, yet none of these analyses looks at the generated revenues from the right perspective. Between 2005 and 2010, MySQL was a hot technology that many considered a serious alternative to closed source relational databases. Then 2014 brought an amazing combination of factors:

  • A vast number of options available as open source technologies in the database market
  • A substantial change in the IT infrastructure, focused on virtualisation and cloud operations
  • A substantial change in the development of applications and in the types of applications, now dominated by a DevOps approach
  • A fracture in the MySQL ecosystem, caused by forks and branches that generated competition but also confusion in the market
  • An increasing demand for databases focused on rich media and big data
  • A relatively stable and consolidated MySQL
  • A good level of knowledge and skills available in the market
    (…and the list goes on…)

All these factors have not only limited the growth in revenues in the MySQL ecosystem, but have basically shrunk them – if you do not consider the revenues coming from DBaaS. Here is pure speculation: Oracle gets a good chunk of its MySQL revenues from OEMs (i.e. commercial licenses) and from existing not-only-MySQL Oracle customers. Although Percona works hard at producing a more differentiated software product (and kudos for the work that the Percona software team does in terms of tooling and integration), the company adopted a healthy, but clearly services-focused, business model. The MariaDB approach is more similar to Oracle’s, but without commercial licenses and without a multi-billion-dollar customer base. Yet, when you review the now 18-month-old keynote from 451 Research at Percona Live, you realise that the focus on “Who uses MySQL?” is pretty irrelevant: MySQL is ubiquitous and will be the most used open source database in the upcoming years. The question we should ask is rather “Who pays for MySQL?”, or even better, “Why should one pay for MySQL?”: a reasonable fee paid for MySQL is the lymph that companies in the MySQL ecosystem need to survive and innovate.

Right now, in the MySQL ecosystem, Amazon is the real winner. Unfortunately, there are no public figures that can prove my point, not from the MySQL vendors, nor from Amazon. DBaaS is at the moment the only way to make users pay for a standard, fully compatible MySQL. By topping up the price of standard EC2 instances by X%, Amazon provides a risk-free, ready-to-use and near-zero-administration database – and this is a big thing for DevOps, startups and application-focused teams who need to keep their resource costs down and will never hire super experts in MySQL.

With Aurora, Amazon has significantly raised the bar in the DBaaS competition, where there are no real competitors to Amazon at the moment. Dimitri Kravtchuk wrote in his blog that “MySQL 5.7 is already doing better” than Aurora. I have no doubt that a pure 5.7 on bare metal delivers great performance, but I think there are some aspects to consider. First of all, Aurora target customers do not want to deal with my.cnf parameters – even if we found out that Aurora is in reality nothing more than a smart configurator for MySQL, which can magically adapt mysqld instances to a given workload, it would still be good enough for the end customers. Second, and most important, Aurora is [supposed to be] a combination of standard MySQL (i.e. not an esoteric, InnoDB-incompatible storage engine) and good performance in AWS – if Amazon found out that they could provide the same cloud features using a new and stable version of MySQL 5.7, I have no doubt they would replace Aurora with that better version of MySQL, probably keeping the same name, ignoring the version number and, even more importantly, enjoying the revenues that the new improved version would generate.

The ball is in our court

With Aurora, Amazon is well ahead of any other vendor – public cloud and MySQL-based technology vendors alike – in providing MySQL as DBaaS. Rest assured that Google, Azure, Rackspace, HPCloud, Joyent and others are not just sitting there watching, but in the meantime they are behind Aurora. Some interesting projects that could fill the gap are going on at the moment. Tesora is probably the most active company in this area, focusing on OpenStack as the platform for public and private clouds. Continuent provides a technology that could be adapted for this use too, and its recent acquisition by VMware may give a push to some projects in this area. Considering the absence of the traditional MySQL players in this area, on one side this is an opportunity for new players to invest in products that are very likely to generate revenues in the near future. On the other hand, there is a concern that the lack of options for DBaaS will convince more end customers to adopt NoSQL technologies, which are more suitable for distributed cloud infrastructures.

Why we should care about wearable Linux

These thoughts have been inspired by the not-so-recent-anymore announcement of the Apple Watch and by Jono Bacon’s post about Ubuntu for smartwatches. They do not specifically refer to Ubuntu, and they reflect my personal opinion only.

Wearable devices are essentially another aspect of the Internet of Things. IoT now expands to include your clothes and tools, not to mention your body, or parts of it. In the next decade, people will experience the next generation of mobile applications, evolving from the ones that are now used to share your recent workout or your last marathon. Rest assured: there will be all sorts of new applications, and existing applications will be greatly improved by the adoption of affordable wearable hardware.

Right now, the two main choices for wearable applications are the iOS and Android platforms, understandably evolved to fit the new hardware. A third and a fourth option are based on Microsoft and Linux platforms, but at the moment there is very little to say about their ability to gather a good slice of the market.

Fact is, a market dominated by two companies that already control 99% of mobile applications is not a good thing. Indeed, it is within their rights to take advantage of the huge investments they have made to build iOS and Android, but that does not mean that the world should not have other alternatives. Although it is not a walk in the park, Linux is the only platform that can rise as truly independent and have a chance to compete in this war.

Why wearable Linux is important to consumers

The majority of the applications we use are free. The cost of the platform, whether it is iOS or Android, is paid in other ways. One of them is control over the applications available to consumers. Apple and Google can decide what goes in and what stays out of your mobile devices, and the same will happen for your wearables. Any attempt to escape from this can easily be prevented by the owners of the technology. So, in simple terms, consumers have fewer choices and eventually less freedom.

Another aspect which is pretty concerning nowadays is data collection. This is, and has always been, the most controversial aspect of mobile applications. Terms and conditions may allow providers to take advantage of the data collected through your smartphone or your wearable in any way they want. It could even be “their data”, and for consumers it would be just another way to pay for the “free application”. In the best case scenario, the data will not be used individually, i.e. it will not be connected to your name, but it will be aggregated with that of many other users in order to analyse behaviours, define strategies, and sell trends and results. In one way or another, rest assured that the data collected is a highly valuable asset for the technology providers.

Now, take these two aspects and consider them in connection with the availability and adoption of wearable Linux. A truly open platform, perhaps controlled in terms of technical choices but not in the way it is used, will not by itself prevent the control of an app store or data collection, but it will give you choice. Companies may be application providers or may even create their own application stores, more or less open than Google’s and Apple’s. Eventually, people will choose, and they will have more options. The choice is likely to be based on the applications available, but again it is up to the manufacturers and developers to offer an appealing product. This is not dissimilar from the choice of a cloud infrastructure – for example, comparing what you can do under the full control of a single provider in AWS with what comes with the adoption of OpenStack. To some extent, wearable Linux is just an important piece of a jigsaw that goes into the definition of an open infrastructure for the IoT.

Why wearable Linux is important for developers and manufacturers

Take a look at the most prominent open source projects available today: the majority of them are backed by foundations and by companies working together, and then competing to provide the best solution on the market. This is not the case for Android, which is fully controlled by Google, not to mention iOS. Sure, Google is keeping Android open and is making it available to any developer and manufacturer. In a similar way, Apple takes pretty good care of its developers, but both companies simply have their own agendas. A wearable stack that is part of a full platform developed within a foundation would have its own huge internal politics, but also more contribution and independence. Put simply, there would be more players and more opportunities to do business in the biggest revolution in computing since the semiconductor.

From a more technical point of view, developers would immensely benefit from the use of the full Linux stack. If you look at the way Android and iOS have evolved, you will realise that the concept of lightweight has significantly changed since their early versions. On the other hand, some of the libraries available on Linux are not in Android, and this is a significant limitation for developers. Having a common platform with portability from the mainframe to a wearable is a big, huge asset, and it is not inconceivable in modern scale-out infrastructures. Interoperability, robustness, rapid development and ease of deployment are not trivial aspects of a Linux wearable stack, and they would increase the number of applications and the competition in the market.

The essential takeaway is that wearable applications are a great opportunity to expand the current market of mobile infrastructures, to introduce more players and competition, and to be ready for the next revolution of the IoT. Whether or not new and old companies will take this opportunity remains to be seen, but for us consumers it can only be positive in the long term.


From Mavericks to Trusty – Intro

I have been on Mac hardware and OS X for 8 years now. I remember the evening in Santa Clara when Colin Charles showed me his MacBook Pro 15”, still based on the Motorola chipset, and OS X. I fell in love with the sleek design of the MacBook and with the backlight that in 2006 was like a “d’oh!”: something so obviously helpful that nobody else had thought about it.

I moved from my Sony Vaio Z1 to a MacBook Pro 15” with the Intel chipset in no time. I was so pleased to get rid of Outlook and the other clunky office tools, and to use Apple Mail and the like. I loved it so much.

Then the iPhone came, and then the iPad, and with them iOS and OS X started to converge in some ways. I now have an i7 MacBook Air 11” with 8GB of RAM and a 1/2TB flash drive, a dream machine for me: feather-light, with a real 5-6 hours of battery (the way I use it) and all the power I need, with the obvious limitation of the small non-Retina screen, which I complement with a USB monitor when I really need to. The software has improved, but it has also changed a lot. OS X Mavericks is no longer the sleek and non-intrusive OS that I saw on my old MacBooks. There are lots of great features, but also some very annoying issues that I really do not like. Perfection is something you must aim at but will never reach: this is the reality for software too.

In my work, I have always needed to use Linux in one way or another. So far, I have used VMs and cloud instances, but now this is not enough: I need to move the core OS too, and I find this exciting. I am going to replace OS X with Ubuntu, specifically 10.9 Mavericks with 14.04 LTS Trusty.
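
Since this is only the intro post, a concrete taste of the first step may help: a minimal sketch of preparing a bootable Ubuntu USB stick from OS X itself, following the usual hdiutil/dd route. The ISO filename and the /dev/disk2 identifier below are placeholders; check your own device with diskutil list before writing anything.

    # Convert the Ubuntu ISO to a writable image; hdiutil appends .dmg to the output
    hdiutil convert ubuntu-14.04-desktop-amd64.iso -format UDRW -o ubuntu-trusty.img

    # Identify the USB stick (here assumed to be /dev/disk2), then unmount it
    diskutil list
    diskutil unmountDisk /dev/disk2

    # Write the image to the raw device; bs=1m is the OS X dd syntax
    sudo dd if=ubuntu-trusty.img.dmg of=/dev/rdisk2 bs=1m

    # Eject the stick once dd completes
    diskutil eject /dev/disk2

Writing to the raw rdisk device is noticeably faster than writing to the buffered disk device, which is why the dd target differs from the diskutil one.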

I am not going to replace my hardware. I did some research, and the closest non-Apple laptop that I might want to use is the Dell XPS 13 Developer Edition, but it is still far, in many respects, from what a MacBook can provide. My decision, though, is not based only on pure technology features. For 8 years, I have been spoiled by Apple. I found a limited but very clear choice of hardware, software and accessories available on the Internet and in the Apple Stores. On top of that, I have highly valued the reliability and the longevity of the MacBook models.

Here is an example. In my search, I stumbled upon a very interesting laptop, the Lenovo X1 Carbon. It was not the size of laptop I was looking for, but I was intrigued by its performance and its features. I looked at reviews and videos, then I went to check the official site, and here came the surprise. For a laptop that aims at the top of the range and, with its innovative design, wants to be a landmark, I found outdated web pages; I could not buy it online from official sites (though I could certainly browse eBay and find it there); the “where to buy” section of the website pointed me at stores that listed all their laptops in a random way; and many sites did not carry that model at all. After a while, I simply gave up. (The X1 can, at least, be purchased in the US from the Lenovo website.)
Dell was different. In this case, there is a reliable source, which is the online store. Online, you can select, configure, check the specs and buy a laptop, and you know what to expect inside the parcel delivered at your doorstep. Dell also gives you a sense of continuity: the product lines have evolved a lot, but they still share a positioning and a target market with the previous models.

But as I said, I will stick with Apple hardware. It has been a difficult decision, since I know that Apple discourages, in many ways, users who buy its hardware and then install non-Apple software. I also know I am going to find issues with cards, components and new hardware: for example, my current MacBook Air has a PCI HD camera that does not have any Linux driver. But this, making my favourite hardware work against all odds, is a challenge that I like to take.

So watch this space: I am going to update it with more info soon…

Moving On

I have the difficult task of making this post interesting, helpful and personal at the same time. I think the main goal is to balance these aspects, and I will really appreciate your comments and suggestions, which I will add here.

For the busy readers who may be put off by the length of this post, here is a very short summary: I have spent 4 wonderful years at SkySQL, first as the head of Field Services, then as CTO; I believe it is now time for a change, so I am leaving. I am leaving behind a great company and very good friends, but I am not disappearing completely, and I will continue supporting the work I started and the projects I created with the help of such great people.

For many, leaving a company is not easy, and it is extremely difficult if you have contributed to its creation and development since the beginning. Even more difficult is departing from ideas and projects that you have shaped and designed together with the people who have contributed to them, and who I am sure will continue to work on these projects with great success.

The reasons for SkySQL

In the past 4 years, I have been asked many times why we created SkySQL and how SkySQL differs from other providers, such as Percona and Oracle.
Since the beginning, the first and most important objective for SkySQL has been to provide the best products and services around MySQL. In order to achieve this objective, we created a network of partners and worked closely with them to support our customers in the best possible way. The value added by SkySQL to the offering was a strong team of consultants and architects who could suggest and implement MySQL solutions, and a stellar Technical Support team who could provide the best possible answers to the large variety of technical and consultative issues that customers might encounter.
Having many options to choose from was certainly good, but it introduced another issue: not all the products worked well together. Customers demanded solutions from a single vendor that could go beyond the “typical” MySQL database + backup + monitor: they wanted a set of products that was tested and guaranteed to work together. This was the motivation for the first big effort at SkySQL in terms of products and tools, when we defined the SkySQL Reference Architecture.

The Reference Architecture was the result of many hours spent in meetings, and of solitary thinking in my home office during the Christmas holidays in 2010, when the business slowed down and I could dedicate more CPU cycles to the subject. We worked on the project for 4 months and launched the Architecture as a concept at the MySQL/Percona Conference in 2011. We demonstrated the SkySQL Reference Architecture with a tool that users could access online to automatically generate and activate a fully functional cluster of replicated MySQL servers in AWS, complete with MONyog, MySQL Replication and cluster software with a resource agent. Severalnines had a similar approach for MySQL Cluster, and later for MySQL Replication and Galera, but at that time only SkySQL offered full automation and a selection of different engines and uses, from the configuration down to the MySQL prompt. Later, Percona introduced a web tool that could generate an optimised configuration file.

The evolution of the Reference Architecture was the SkySQL Data Suite (SDS). The concept was similar, but the main difference was that, for the first time, we added SkySQL intellectual property to MySQL: the suite was packaged with an administration tool designed and built by SkySQL. The first target was the cloud, specifically AWS and OpenStack. The initial idea was to have SDS deployed seamlessly on a bare OS, on clouds or in hybrid environments. All the tools were designed with both programmatic and user interfaces, in order to satisfy different customers’ needs. An independent presentation of SDS is available here.

In 2013, the company merged with Monty Program, and we suddenly found ourselves in a position where software development was a fundamental part of our offering. We moved the focus of the Data Suite to MariaDB and rebranded it as MariaDB Enterprise but, more importantly, we combined the value and the skills of our services team with the core team of the original development of MySQL. The merge resulted in a company with all the credentials needed to excel and innovate in the MySQL world. But the key question at this point was: is this enough to make MySQL even more successful? Is a better MariaDB (or indeed MySQL) the right answer to the data management needs of the 2010s and beyond?

The evolution of MySQL and MariaDB

The answer to the previous questions is, not surprisingly, a “no”. Users do indeed need a better MySQL (or MariaDB): traditionally, they have demanded more performance, more availability and more scalability, and many players have contributed in their own way to the cause.
Still, there is something missing. The competition from NoSQL solutions is, to say the least, intense. It is probably true that MySQL adoption is not declining (as some analysts say), but the adoption of NoSQL is way bigger in absolute terms. More importantly, the majority of the new initiatives and startups that were once the lifeblood of the MySQL Community have now moved to NoSQL.
From a purely technical (and generic) perspective, when MySQL and NoSQL are tested and measured in a fair way, MySQL can in many cases provide better performance and robustness. Scalability, on the other hand, is as big an issue as it has always been: it was an issue for bigger servers in the past, and it is an issue for distributed systems now. The search for better scalability is the primary reason why we created MaxScale.

You may have read a lot about MaxScale, or you may want to read more here and here. In simple terms, MaxScale is a highly scalable, lightweight proxy system aimed at distributing and scaling the parts of a database server that do not need to reside in its core. There is a similarity to this approach in the NoSQL world, and certainly in many homemade solutions: the mongos/mongod pairing is a good example of what MaxScale can achieve with MySQL, but that is only half of the story. MaxScale is generic in nature; what makes it a relevant component of the IT infrastructure are its plugins. By loading different plugins, you can make MaxScale a proxy for multiple client protocols, a proxy for geographically replicated servers, a bridge between different replication technologies, and so on.
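
To make the plugin model concrete, here is a minimal sketch of what a MaxScale configuration could look like with a read/write splitting router in front of two backends. Every name, address and credential here is hypothetical, and module names can vary between releases; treat it as an illustration of the structure, not a production setup.

    [maxscale]
    threads=4

    # Two hypothetical backend servers
    [dbserv1]
    type=server
    address=192.168.0.11
    port=3306
    protocol=MySQLBackend

    [dbserv2]
    type=server
    address=192.168.0.12
    port=3306
    protocol=MySQLBackend

    # A monitor plugin keeps track of which backend is the master
    [MySQL Monitor]
    type=monitor
    module=mysqlmon
    servers=dbserv1,dbserv2
    user=maxmon
    passwd=maxmonpwd

    # The router plugin: writes go to the master, reads are spread over the slaves
    [Splitter Service]
    type=service
    router=readwritesplit
    servers=dbserv1,dbserv2
    user=maxuser
    passwd=maxpwd

    # The client-side protocol plugin: MaxScale speaks plain MySQL on port 4006
    [Splitter Listener]
    type=listener
    service=Splitter Service
    protocol=MySQLClient
    port=4006

An application connects to port 4006 as if it were a single MySQL server, while the service routes writes to the master and spreads reads across the slaves; swapping the router or protocol modules changes the role MaxScale plays without touching the backends, which is exactly the point made above.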

I believe that we need MaxScale for MySQL and MariaDB. Incidentally, Max is the name of Monty’s son, so we have covered all his heirs (at least so far). In designing MaxScale, I wanted to provide a link between a technology that was good for the servers of the 90s and today’s infrastructures.

A difficult choice?

One might ask: if I feel so strongly about MaxScale and its fundamental role, why am I leaving it behind? The fact is, I am not. The project is in good hands, thanks to the great work and dedication of Mark Riddoch, Massimiliano Pinto and Vilho Raatikka. The concept, the ideas and the architecture are here to stay. MaxScale is shaped today as we wanted it: Mark Riddoch, Massimo Brignoli and I designed it during long hours of work and passionate discussions.
When your kids grow up and are ready to walk alone in the world, you need to let them go. MaxScale can now walk with MySQL and MariaDB, and SkySQL will take good care of their path together. So now I can move on, and I have time to raise other kids.

A look at the future

As for me, I am technically embracing a wider range of technologies. I will no longer focus only on MySQL, but rest assured that these 10 years will always remain in my heart. I will work on IT infrastructures and systems where databases play the central and most important role, but I will look at the customers’ needs as a whole. I will carry on my duty for the MySQL User Group in London, which has now reached the respectable size of 40-50 attendees per session, every other month. I will not move my MySQL blog, so any MySQL-related post will still be available on and aggregated on PlanetMySQL. In my personal blog, instead, I will cover a collection of various topics: more databases, HPC, OpenStack and operating systems. I will also have a section dedicated to an important aspect of my life, the study of Kung Fu in its inner and outer styles. I started learning Kung Fu almost 30 years ago: I practised for 12 years, then abandoned it for another 12, until I realised the importance of this practice in my life. I have to thank some of my best friends for that; they really helped me a lot, in good and bad times.

So, even if I will not wear a T-shirt with a seal (or a sea lion, as is more fashionable these days), you will probably see me around at conferences and exhibitions. Or perhaps you will not see me, but I will be working, as I have done in these 10 years, behind the scenes to make MySQL the good and strong database that can help create the next Facebook or the next Twitter of this world.

All the best to all of you.