Dear MySQLers, the Cloud Is Good, But the Fog Is the Next Big Thing

And here we go again: here is another term. We are still debating the real meaning of the Cloud and whether a database fits in IaaS, PaaS, SaaS or a combination of all three, and now we are experiencing another wave of change.

To be fair, the term Fog Computing has been around for quite some time, but it has never enjoyed the popularity of its buddy “the Cloud”.

Is Fog Computing another pointless renaming of well-known technologies and IT infrastructures? Or is it something genuinely new that Systems and Database Administrators should look at, study and embrace? My intent here is not to support one side or the other, but simply to instil a few thoughts that may come in handy in understanding where the IT market is going, and more specifically where MySQL can be a good fit in some of the new trends in technology.

IoT: Internet of Things? No! IT + OT

When we talk about IoT in general, everyone agrees that it is already changing the world we live in. Furthermore, analysts predict trillions of dollars in business for IoT, and clearly all the big high-tech companies want a large slice of the pie. Things become interesting, though, when we ask analysts, decision makers and engineers what IoT is, or even better, what an implementation of IoT looks like. The thought goes immediately to our wearables, smartphones, or devices at home: smart fridges and smart kettles are embarrassing examples of something that looks like the new seasonal fashion trend. These devices are certainly a significant part of IoT, and they make ordinary people aware of IoT, but they are not the only thing developers and administrators should look at. The multi-trillion-dollar business predicted by analysts is a mix of smart devices that connect cities and rural areas, homes and large buildings, offices and manufacturing plants, mines, farms, trains, ships, cars… but also goods and even animals and human beings. All these connected elements have one thing in common: they generate a massive amount of data. This data must be collected, stored, validated, moved, analyzed… and this is not a trivial job.

Many refer to IoT as the Internet of Things, and to IIoT as the Industrial Internet of Things, i.e. the part of IoT related to industrial processes. In industrial processes, we add more complexity to the equation: the environment is sometimes inhospitable, intermittently accessible and unattended by operators and users (or there are literally no users). All this may also be true for non-IIoT environments; the difference is that if your Fitbit runs out of power you may be disappointed, but if a sensor on an oil platform or an actuator on a train does not have power, that is a much bigger deal.

To me, IoT is clearly all of the above, with IIoT being a subset of IoT. Personally, I have a slightly different approach to IoT. For almost my entire working life I have been involved in the IT (i.e. Information Technology) side of the business, recently with databases, but previously designing and building CRM and ERP products and solutions. In my mind IoT means IT meets OT (i.e. Operational Technology), and the two technologies cannot be treated separately: they are tightly related, and any product in IoT has an IT and an OT aspect to consider. It also means that OT is no longer relegated to the industrial and manufacturing world of PLC and SCADA systems, and is now widely adopted in any environment and at any level, even in what we wear or implant.

The convergence of IT and OT into IoT makes IT physical, something that has been missing in many IT solutions. We, as IT people dealing with data, tend to manage data in an abstract manner. When we consider something physical, we refer to the performance we can squeeze from the hardware where our databases reside. With OT, we need to think broadly of the physical world where the data comes from or goes to and, even more importantly, of the journey of that data through every bit of the IT and OT infrastructure.

It’s a Database! No, It’s a Router! No, It’s Both!

The journey! That is the key point. MySQLers think about data as data stored in a database. When we think about the movement of data, we refer to it in terms of data extraction or data loading. The fact is, in IoT, data has value and it must be treated and considered both when stored somewhere (data at rest) and when moving from one place to another (data in motion). Moving data in IoT terms means data streaming. There is a plethora of solutions for streaming, like Kafka, RabbitMQ and many other *MQ products, but their main focus is to store and forward data, not to use it while it is in motion. The problem is that infrastructures are so complicated, with multiple layers and with too many cases where data stops while in motion, that it becomes a priority to analyse and “use” the data even when it is transiting from one component to another.

This is a call to build the next generation of databases, optimised for IoT, with features that go beyond the ability to store and analyse data. Data streaming and analysis of streamed data must be part of a modern database, as also highlighted by a recent Gartner report. If you are a Database Administrator, you may consider it a database with all the features of a traditional database, but with routing and streaming capabilities. If you are a Network and Systems Administrator, you may consider it a router or a streaming system with database capabilities. One way or another, the database needed for IoT must incorporate the features of a traditional database and those of a traditional router. Furthermore, it must take into consideration all the security aspects of data moved and stored multiple times and, even more importantly, it must provide safe data attestation (but let’s reserve this aspect for another post).

Welcome to Fog Computing

So, here it is: Fog Computing is all of the above. Take three layers:

  • The Edge: where things, animals and human beings live, where data is collected, and where results from analysis go.
  • The Cloud: where a massive amount of data is stored and analysed, where systems are reliable and scalable, and are attended by operators and administrators.
  • The Fog: everything in between. It is, in Eastern terms, “where the ground meets the sky”. The Fog is the layer that is still close to the Edge, but must provide features that are typically associated with the Cloud. It is also the layer where data is collected from a vast number of things, and is consolidated and sent to the Cloud whenever possible.

The term Fog Computing is so vague that for some analysts it refers to everything from the Edge of sensors and devices to regional concentrators, routers and gateways. For other analysts, Fog Computing refers only to the layer above the Edge, i.e. to the gateways and routers. Personally, I like to think that the former, i.e. Edge + middle layer, offers a more practical definition of Fog Computing.

In Fog Computing, we bring the capabilities of Cloud computing into a more complex, constrained and often technically inhospitable environment. We must collect and store a large amount of data on constrained devices the size of a wristwatch, where the processing power is mostly used to operate the system and data management is a secondary concern. Although the power of an Edge system is increasing exponentially, we no longer have the luxury of a stable, always-on environment. It is a bit like going back 20 years or more, to when we started using personal computers to manage data. It is a fascinating challenge, certainly unwelcome to lazy administrators, but one that brings excitement to experienced developers.

Where Is MySQL in All This?

Here is the catch: Fog Computing desperately needs databases. Products that can handle data at rest and in motion, on constrained devices, with a small footprint; databases that can maximise the use of hardware resources, are reliable, and can be installed in many flavours so as to be almost 100% available when needed. Many NoSQL solutions are good in theory (because of the way they manage unstructured data), but they are often too resource-hungry to compete in this environment, or they lack features that MySQL implemented more than a decade ago. Embedded databases sit at the other end of the offering, but their features are often limited, making the resulting solutions rather incomplete.

Sound familiar? Edge and Fog Computing are the perfect place for MySQL, or at least for solutions based on MySQL, where more features must be added. At the moment there are no real database and data management products for Fog Computing. The current solutions are mostly based on MySQL, but they are built ad hoc and their implementation is not replicable: a situation that slows the growth of this market, making the overall cost of a solution higher than it should be.

The opportunity is huge, but also challenging. The first implementation does not have to be a brand new product; it can be something achievable step by step. As for more examples and real, live projects, watch this space!

 

ScaleDB ONE: Let’s Get Started

ScaleDB 15.10 is out. Some users have downloaded and tested it, and we have received pretty positive feedback, but also some requests for more information and help on how to get started. I will try to condense here the basic steps to install and test ScaleDB ONE for the first time.

First of all, some terminology. We have two versions: ScaleDB ONE and ScaleDB Cluster. ScaleDB ONE is meant to be used on a single machine (ONE = One Node Edition), whether it is a VM, a cloud instance or a physical server, whilst ScaleDB Cluster is the full-size, multi-node cluster that everybody expects to run for mission-critical applications. This means that the typical use cases for ScaleDB ONE are testing and development, data marts, and streaming data collection and analysis that can be handled by a single server (although you can always replicate your data to another server using standard MySQL Replication). ScaleDB Cluster, instead, is highly available out of the box, with no single point of failure, and scalable on demand (i.e. you do not need replication to set up a highly available and more scalable environment).

From now on, in this post I will refer to ScaleDB as ScaleDB ONE.

Now, some prerequisites. ScaleDB has been tested on CentOS 6.7, CentOS 7.1 and Ubuntu 14.04.3 Trusty Tahr. 1GB of memory and a few GB of free disk space are enough to test the product but, as with any other database, the more cores, memory and storage you can add, the better. One interesting aspect is that, if you are planning to store a large amount of data, you will be very pleased with the performance you can get from ScaleDB using magnetic HDDs instead of SSDs (but this is a topic for another post).

On CentOS 6.7, you must add nmap and nc, since you will need them later to interact with the ScaleDB daemons. A yum install nmap-ncat should do the trick. If you have done a minimal install of CentOS 7.1, I would also recommend installing net-tools.

Another mandatory requirement is the installation of the AIO libraries: on CentOS with yum install libaio, on Ubuntu 14.04.3 with sudo apt-get install libaio1.

The last two steps are not mandatory, but they will make your life easier: create a scaledb user with sudo privileges and disable the firewall on your testing machine. From now on, I assume that you will log in as scaledb.
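Condensed into commands, the preparation on CentOS 7.1 could look like this (a minimal sketch based on the steps above; adapt the package manager and firewall commands to your distribution):

# Tools to interact with the ScaleDB daemons, plus the AIO library
sudo yum install nmap-ncat net-tools libaio

# Optional: a dedicated sudoer user and no firewall on the testing machine
sudo useradd -m scaledb
sudo usermod -aG wheel scaledb
sudo systemctl stop firewalld
sudo systemctl disable firewalld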

Downloading the software

If you have not downloaded ScaleDB ONE yet, you will have to hit a few pages on the ScaleDB website, but the process is very straightforward. Just go to scaledb.com, click the Download button, scroll down to the bottom and click another Download button (this time it is grey). The next screen is used to select the type of download. You have two choices:

  • VirtualBox Image, which will allow you to download an OVA (a compressed VirtualBox image file), so everything is self-contained in a CentOS 7 image and you do not need to install any software.
  • Tarball, which allows you to decompress and unarchive the ScaleDB product. We do not have Linux packages yet, but they will be available soon.

Once you select your favourite download, all you have to do is fill in five fields; you will then receive an email with your personalised link to the downloads. You can keep this link: we will update your environment with new versions. Soon we will also add a yum and an apt repository.

You can use the unique URL to download ScaleDB ONE. You must add /scaledb-15.10/latest-release to the URL, or you can simply browse the repository to find the release you need. In the latest-release folder, you will find 3 tarballs:

  1. The ScaleDB ONE UDE: UDE stands for Universal Data Engine. This is the ScaleDB engine. Right now (2015-11-17), the latest UDE tarball is scaledb-15.10.1-13199-ude.tgz.
  2. MariaDB: We use MariaDB as the database server. ScaleDB can work on its own, i.e. you can access the engine by using the ScaleDB API, but for MySQL and MariaDB users we have created a storage engine layer, so ScaleDB is fully accessible from MariaDB. The version available with 15.10 is MariaDB 10.0.14; soon we will release a version that works with the latest MariaDB 10.1. For Ubuntu you can use scaledb-15.10.1-mariadb-10.0.14-glibc2.14.tgz, for CentOS scaledb-15.10.1-mariadb-10.0.14.tgz.

You can download the tarballs on your own machine and then copy them to the testing machine, or you can download them directly onto the testing machine with a wget command.
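For example, once you have received your personalised link, a direct download could look like this (a sketch; <your-personal-link> is just a placeholder for the URL in the email):

wget <your-personal-link>/scaledb-15.10/latest-release/scaledb-15.10.1-13199-ude.tgz
wget <your-personal-link>/scaledb-15.10/latest-release/scaledb-15.10.1-mariadb-10.0.14.tgz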

Assuming that the tarballs are in the home directory of the scaledb user, you can now uncompress and copy the files into /usr/local with these commands:

sudo tar xzvf ~/scaledb-15.10.1-13199-ude.tgz -C /

…and…

sudo tar xzvf ~/scaledb-15.10.1-mariadb-10.0.14.tgz -C /

That’s it! ScaleDB is ready to be used.

In order to make things easier for anybody who wants to test ScaleDB ONE, we have assumed that the software will be installed in /usr/local and we have a predefined configuration. More specifically:

For MariaDB:

  • The base directory is /usr/local/mysql
  • The data directory is /usr/local/mysql/data
  • The configuration file is /usr/local/mysql/my.cnf
  • The admin user is root (no password)

For the ScaleDB engine:

  • The base directory is /usr/local/scaledb
  • The data directory is /usr/local/scaledb/data
  • There are three configuration files:
    • Storage Engine: /usr/local/mysql/scaledb.cnf
    • Storage Node: /usr/local/scaledb/cas.cnf
    • Lock Manager: /usr/local/scaledb/slm.cnf
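A quick way to check that everything is in place after the extraction is to list the configuration files described above (a minimal sketch):

ls /usr/local/mysql/my.cnf /usr/local/mysql/scaledb.cnf /usr/local/scaledb/cas.cnf /usr/local/scaledb/slm.cnf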

At this point you can launch ScaleDB ONE with this script:

/usr/local/scaledb/scripts/scaledb_one start

If you add /usr/local/scaledb/scripts to your PATH, you will see something like:

scaledb@ONE:~$ scaledb_one start
 Starting ScaleDB CAS server...
 ScaleDB CAS Server started.
 Starting ScaleDB SLM server...
 ScaleDB SLM server started.
 Starting MariaDB Server...
 MariaDB Server started.
 ScaleDB ONE started.
scaledb@ONE:~$

The script starts both the MariaDB server and the ScaleDB engine. The ScaleDB engine consists of two daemons, the CAS Server (Cache Accelerated Storage Server) and the SLM Server (ScaleDB Lock Manager Server). The same script is used to stop the environment, by simply running scaledb_one stop.
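For convenience, you can make the script available in your shell and use the same tool to stop everything (a minimal sketch, assuming the default installation path):

# Make scaledb_one available in the shell of the scaledb user
echo 'export PATH=$PATH:/usr/local/scaledb/scripts' >> ~/.bashrc
source ~/.bashrc

# Stop MariaDB and the ScaleDB daemons when you are done
scaledb_one stop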

Testing ScaleDB ONE

I will give you more information on how to properly test ScaleDB very soon. In the meantime, let’s just see if it works as expected.

The MariaDB interface is unchanged; the only novelty is the ScaleDB storage engine, which shows up in the engine list:

SHOW ENGINES;

Now it is time to test the engine. We can start by creating a table. The command to create a streaming table is a bit different from that of a standard InnoDB table. Here is an example:

MariaDB [(none)]> CREATE TABLE test.test (
 -> id bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
 -> create_time timestamp NOT NULL,
 -> account char(8) NOT NULL,
 -> store int(10) UNSIGNED NOT NULL,
 -> amount decimal(8,2) NOT NULL,
 -> PRIMARY KEY (id) STREAMING_KEY=YES,
 -> KEY create_time (create_time) RANGE_KEY=SYSTEM )
 -> ENGINE=ScaleDB TABLE_TYPE=STREAMING;
 Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]>

And here is the explanation line by line:

  • Streaming tables are special tables in ScaleDB that are used to store streaming data. They are really fast and work as a constant stream, i.e. you can INSERT data, you can SELECT to run a query and you can DELETE the oldest data, but you cannot UPDATE any row or DELETE a row with an arbitrary condition.
  • The id column is the primary key: streaming tables always require a primary key.
  • The create_time column is a timestamp associated with a range key, a special time-based key used in ScaleDB to query by time interval.
  • The table attributes to add are the engine (ScaleDB) and the table type (STREAMING). At the moment we do not recommend using any other type of table.

Once you receive an OK, you have created your first ScaleDB table – congratulations!

And finally a simple INSERT and SELECT:

MariaDB [(none)]> INSERT INTO test.test VALUES (NULL, NULL, 'A', 1, 100);
Query OK, 1 row affected (1.54 sec)

MariaDB [(none)]> SELECT * FROM test.test;
 +-----+---------------------+---------+-------+--------+
 | id  | create_time         | account | store | amount |
 +-----+---------------------+---------+-------+--------+
 | 256 | 2015-11-16 06:30:24 | A       |     1 | 100.00 |
 +-----+---------------------+---------+-------+--------+
 1 row in set (0.00 sec)

MariaDB [(none)]>

One warning: if you run a SELECT query immediately after the INSERT and you do not see the row that you have just inserted, it is “normal behaviour”. By default, ScaleDB uses a 30-second time window to flush a large number of rows in one go. This behaviour can be changed to a more typical OLTP behaviour, but the price is a lower load rate. In ScaleDB Cluster, the rows are safely stored on two servers and are not lost in case of a fault.
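As a final note, the range key on create_time is what makes time-interval queries cheap; a query like the following is the typical access pattern for a streaming table (a sketch, using the test table created above):

SELECT account, store, SUM(amount) AS total
  FROM test.test
 WHERE create_time BETWEEN '2015-11-16 00:00:00' AND '2015-11-16 23:59:59'
 GROUP BY account, store;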

Welcome ScaleDB 15.10!

Time really flies. A little less than 4 months ago, I wrote a post about my decision to join ScaleDB. Today, after 4 months of excitement working with a great team and genuinely good people, I am proud to announce that the first version of ScaleDB is available to the public.

ScaleDB 15.10 Ararat

We decided to number this version 15.10 and to name it Ararat. Indeed, we intend to follow the release cycle of other famous software projects, such as Ubuntu, OpenStack and, more recently, CentOS. Our logo is a peak, and we are all about scaling, as in our name and as the main objective of our products. “To scale” comes from the Latin word scandere, i.e. ‘to climb’. Mount Ararat is one of the most beautiful peaks in the world, yet hard to climb and full of significance and mystery for many. It seemed natural for us to start our journey by naming the product after this mountain.

ScaleDB 15.10 is the first public version of our product. So far we have been running a private beta, working with users, developers and DBAs to make the product ready for more general public use.

We have customers and community users who use ScaleDB in production. In the last year we have worked hard to fix all the S1 bugs known to us but, as with any software, we cannot guarantee that the quality of the product will be top notch right from its first public version; therefore we strongly recommend that you thoroughly test ScaleDB 15.10 before deploying it in a production environment.

The software is available for download from our website as a tarball, and we are going to provide Red Hat and Ubuntu packages very soon. You can click here, fill in a quick form and receive information on how to download and use ScaleDB within minutes. We will set up an account for you, in which you will also find updates, patches and new releases.

Streaming data, time series and realtime analytics

The main objective of ScaleDB 15.10 is to provide a Big Data solution with MySQL, currently in the form of streaming data and realtime analytics (see some extra info here). From the perspective of a MySQL user, ScaleDB is a standard storage engine that can be plugged into MariaDB. Behind the scenes, we make extensive use of special handlers in MariaDB 10.1 that extend condition pushdown to the storage engine (although in the very first version of 15.10 we still recommend MariaDB 10.0) and to a cluster of machines. We also call the ScaleDB Cluster IDC, or Intelligent Data Cluster.

ScaleDB can handle millions of inserts per second loaded by parallel workers, whilst hundreds of concurrent users can analyse the very same data in realtime. We ran some basic tests and one of them is published here: it can give you an idea of the type of analysis and scalability you may expect from ScaleDB.

We use a different and innovative approach to storing, indexing and accessing the data. The best fit for ScaleDB is time series data, which probably represents a significant part of the data currently qualified as Big Data. That said, ScaleDB can be used to store and analyse not only time series but any kind of data, although we are not currently focused on rich data such as documents and multimedia.

ScaleDB ONE and ScaleDB Cluster

ScaleDB 15.10 comes in two flavours, ScaleDB ONE and ScaleDB Cluster.

ScaleDB ONE stands for One Node Edition. It is a single-node version of the product that DBAs can install and use on a single machine. Performance is great for many use cases: ScaleDB ONE can already sustain hundreds of thousands of inserts per second and real time analysis on a single node. ScaleDB ONE is completely free, and support can be purchased on request.

ScaleDB Cluster is the fully loaded version of ScaleDB that can scale up to many terabytes of data and hundreds of concurrent users. ScaleDB Cluster is available under a commercial license with technical support that can be purchased from ScaleDB Inc.

What’s next?

Well, this is just the start. We will talk more about ScaleDB in future posts, from its internal structures, to advanced indexing, scalability, roadmap and much more! As they often tell me as a frequent flyer: sit back, relax, and enjoy the journey with ScaleDB.

Percona Live Europe is now over, MySQL is not

Percona Live Europe ended more than a week ago. I left Amsterdam with a positive thought: it has been the best European event for MySQL so far. Maybe the reason is that I saw attendance increasing, or maybe it was the quality of the talks, or the fact that I heard others making the same comment; I also saw a reinvigorated MySQL ecosystem.
There are three main aspects I want to highlight.

1. MySQL 5.7 and the strong presence of the Oracle/MySQL team

There have been good talks and keynotes on MySQL 5.7. It is a sign of Oracle's strong commitment to MySQL. I think there is an even more important point: the most interesting features in 5.7 and the projects still in MySQL Labs derive from, or are in some way inspired by, features available from other vendors. Some examples:

  • The JSON datatype from MySQL and MariaDB – two fairly different approaches, but definitely an interesting addition
  • Improvements in the optimizer from MySQL and MariaDB. There is a pretty long list of differences; this slide deck can help you understand them a bit better…
  • Improvement for semi-sync replication from MySQL and WebScaleSQL
  • Automatic failover with replication from MySQL and MHA
  • Multi-source replication from MySQL and MariaDB 10
  • Group replication in MySQL and MariaDB 10 – Here things differ quite a lot, but the concept is similar.
  • MySQL router in MySQL and MaxScale – Again, a different approach but similar concepts to achieve the same results

My intent here is not to compare the features; I am simply pointing out that the competition among projects in the MySQL ecosystem is at least inspirational and can offer great advantages to the end user. Of course, the other side of the coin is the creation of almost identical features, and the addition of more confusion and incompatibilities among the distributions.

2. The Pluggable Storage Engine Architecture is alive and kicking

Oracle’s commitment to improving InnoDB has been great so far, and hopefully InnoDB will get even better in the future. That said, the Pluggable Storage Engine Architecture has been a unique feature for a long time, and two recent additions have joined the list of storage engines that have been around for a long time: today TokuDB, Infobright, InfiniDB and ScaleDB share the advantage of being pluggable into MySQL with Deep and RocksDB. RocksDB is also pluggable into MongoDB and, even more importantly, it has been designed with a specific use case in mind.

3. Great support from the users

The three aspects have similar weight in measuring the health of MySQL, but this is my favourite, because it demonstrates how important MySQL is for some of the most innovative companies on the planet. Despite Kristian Koehntopp’s great keynote showing us how boring the technology is at Booking.com, nobody really thought it was true. Using a stable and mature product like MySQL is not boring, it is wise. But this was not the only presentation that we enjoyed from the end users. Many showed great use of MySQL, especially in terms of the levels of scalability and performance (these two combined aspects being the number 1 reason for using a NoSQL DB) that NoSQL databases struggle to produce with certain workloads.
I am looking forward to seeing the next episode, at Percona Live 2016 in Santa Clara.

This time it is real…

A few months ago I updated my profile on LinkedIn, adjusting my position as CTO and founder of Athoa Ltd, a British company currently active in translation services and events, which in the past hosted a couple of interesting open source projects. I simply forgot to disable the email notification to my connections, which is set by default, and within 2-3 hours I received dozens of messages from friends and ex-colleagues who were curious to hear about my new adventure.

Today, I changed my profile on LinkedIn again and, this time, left the email notification enabled on purpose.

As of today, I join the team at ScaleDB. My role is to define the product and the strategy for the company, working closely with CEO Tom Arthur, CTO Moshe Shadmon, CMO Mike Hogan and the rest of the team.

Leaving Canonical

The last nine months at Canonical have been an outstanding and crazily intense journey. I learned more than I ever had before about systems and network infrastructures, and I met an amazing team of core engineers. It has been a unique experience, one of those that only come along once in a lifetime – I really mean it – and I will never forget it.

The decision to leave Canonical came after a lot of thinking and many sleepless nights. I met so many great people who, in many ways, are making history in IT. In my team under Dan Poler, I worked with experienced Cloud and Solutions Architects who can analyze problems, discuss architectures and suggest solutions from the high-level view down to the kernel of the operating system, and even to the silicon of systems and devices. Chris Kenyon's and John Zannos's teams are called “Sales”, but they are really advisors for a growing ecosystem of providers and adopters of Ubuntu and OpenStack technologies.

I have been inspired by the dedication and leadership of Canonical CEO Jane Silber. Jane has the difficult job of leading a company that is moving at lightspeed in many different directions, so that the technology that powers clouds, networks, end users and small devices can share the same kernel and will eventually converge. Jane is, in my opinion, the leading force making Canonical blossom like a plum tree in mid-winter, when the rest of nature still sleeps under the snow.

My greatest experience at Canonical has been working with Mark Shuttleworth. Mark is an inspiration not only for the people of Canonical or the users of Ubuntu, but for us all. Mark's energy and passion are second only to his great vision for the future. I recommend that everybody follow Mark's blog and watch or attend his talks. His attention to detail and search for perfection never overshadow the core message and understanding of the big picture; for this reason, both experienced listeners and newbies will take something away from his talks.

Back in June last year, I decided to join Canonical because of Mark's vision. His ideas were in sync with what I wanted to bring to SkySQL/MariaDB. At Canonical, I could see this vision materialize in the direction the products were going, only on a larger scale. This experience has reinforced my belief that we have an amazing opportunity right in front of us. The world is changing dramatically, and at a speed that is incomparable with the past, even when compared with the first 10 years of the new millennium. We must think out of the box and reconsider the models that companies have used so far to sustain their business, since some of them are already anachronistic and create artificial barriers that will eventually collapse.

This experience at Canonical will stay with me forever, and I hope to make good use of what I have learned so far and all that I will learn in the future from Mark.

Joining ScaleDB

The last Percona Live was a great event. It was great to see so many friends and ex-colleagues again, now working at different companies but gathering together once a year as at a school reunion. Percona has now become a mature company and, more importantly, it has reached its maturity by growing organically. The results are outstanding, and its new course towards becoming a global player in the world of databases looks even more promising.

The list of people and companies I would like to mention is simply too long and would be a subject for a post of its own. I found the MySQL world more active than ever. At this Percona Live I found the perfect balance between solid and mature technologies that are constantly improving, and new and disruptive technologies that are coming out under the same MySQL roof.

I simply feel that I am part of this world, and it is part of me. I have worked with databases in many different roles all my life, first with Digital/Oracle RDB and Digital/HP Datatrieve, then with IBM/Informix, Oracle, Sybase and SQL Server, and lastly with MySQL. I am looking at this world with the eyes of someone who has been enriched by new experiences. I simply think I have more to offer to this market than to networks and systems infrastructures. I therefore decided to come back. I also feel I can offer more in designing and defining products than in running services.

ScaleDB seems to me the company where I can best express myself and help the most at this point in my working life. Having previously been an advisor for the company, working on products and strategies just feels natural to me. The position is also compatible with my intention to improve and extend my involvement in the MySQL ecosystem, not only as a MariaDB Ambassador, but also, and equally, advocating for Oracle and Percona products.

I also believe that MySQL should not be a world isolated from the rest of the database market. I have already expressed my interest in Hadoop and other DB technologies in the past, and I believe that there should be more integration and sharing of information and experiences among these products.

I have known and worked with Moshe Shadmon, ScaleDB CTO, for many years. Back in 2007, we spent time together discussing the use, advantages and disadvantages of distributed databases. At the time, we were talking about the differences between Oracle RAC, MySQL/NDB and DB/2, their strong and weak points, and what needed to be improved. That was the time when ScaleDB as a technology started taking the shape that it has today.

ScaleDB is an amazing technology. It is currently usable as a storage engine with MariaDB 10.0, and it has been developed with the idea of a cluster database from the ground up. As with MySQL in 2005, when the goal was to provide performance, scalability and ease of use in a single product, ScaleDB today provides more performance and greater scalability, without compromising availability or the use of standard SQL/MySQL. The engineering team at ScaleDB has recently worked on an amazing extension of their technology to sustain fast inserts and real-time queries on commodity hardware, at a fraction of the cost of NoSQL alternatives. This addition makes ScaleDB the perfect solution for storing and retrieving time series data, which is the essence of stream analytics and the Internet of Things.

I believe ScaleDB has the incredible potential to become a significant player in the DB world, not only in MySQL. I feel excited and honored to be given the opportunity to work on this new adventure. I will try my hardest to serve the MySQL ecosystem in the best possible way, contributing to its success and improving the collaboration of companies – providers, customers, developers and end users – in MySQL and in the world of databases.

Now hop onto the new ride, the future is already here…

2015: More innovation, but still a year of transition

First things first: I could use this title every year; it is an evergreen. For this title to make sense, there must be a specific context, and in this case the context is Big Data. We saw new ideas and many announcements in 2014, and in 2015 those ideas will take shape and early versions of innovative products will start flourishing.

Like many other people, I prepared some comments and opinions to post back in early January; then, soon after the season's break, I started flying around the world and the daily routine kept me away from the blog for some time. So, as the last blogger to weigh in, it may be time for me to post my own predictions, for the joy of my usual 25 readers.

Small Data, Big Data, Any Data

The term Big Data is often misused. Many different architectures, objectives, projects and issues deviate from its initial meaning. Everything today seems to be “Big Data” – whether you collect structured or unstructured information, documents, text or patterns, there is so much hype that every company and marketing department wants to associate its offering with Big Data.

Big Data is becoming the way to say that your organisation is dealing with a vast amount of information, and it is becoming a synonym for Database. Marketing aside, there are reasons behind this misuse. We are literally inundated with a vast amount of data of all sorts, and we have been told that all this data has some value. The fact is, more and more organisations want to use this data, and in some way they are pushing for the commoditisation of Big Data solutions.

There are valid reasons behind the commoditisation of Big Data. The first one is that data is data and, big or small, it should be simple and easy to manage and use. If this is not the case, then it is an issue that database providers should solve, and an opportunity to inspire entrepreneurs to provide new products. Managers, users and administrators demand this commoditisation. They do not want to treat Big Data differently from any other data: they do not want a batch-only mode, a Lambda junction or another complex architecture. Many organisations need real time analysis, small queries and transactions for the data they collect or generate.

Developers and devops have their say on Big Data too. They need more ways to access Hadoop and Lambda architectures. They long for the simplicity of the good old days of the LAMP stack or for today’s agility of Node.js and MongoDB. They want to code faster, release often, run and fix bugs in minutes (not weeks or months), also on Big Data.

In my humble opinion, the key point for Big Data in 2015 is the convergence towards Hadoop. Everything will be in some way related to Hadoop, whether it is a distributed file system, a map/reduce approach or other related technologies. In some way, established Big Data vendors will create more interfaces. Other SQLs and NoSQLs will reach the Hadoop haven, by integrating their existing products, creating more connectors, or providing hybrid architectures.

The two big issues to tackle are on the administration side and the user side. For administrators, Big Data architectures must be simple to provision, configure and deploy, and eventually modify. For users, Big Data solutions must be simple to use for their analysis or online applications. In both cases, the issue is currently Big Data = Big Complexity.

Some predictions

A convergence towards Hadoop is inevitable. Even the most traditional companies active in the DB world, like Oracle and Microsoft, are taking large steps in this direction. Here we are not talking about integration through adapters or loaders; we are referring to a deeper convergence where Hadoop will be (in some way) part of the commercial products.

There will be more interfaces that allow developers to reuse their skills or existing code to work with Hadoop. This aspect will be interesting for ad-hoc applications, but even more important for BI and Business Analytics vendors, who will integrate their tools with Hadoop with “minimal” effort. An evolution in this area will have the same impact that tools like Business Objects, Cognos and MicroStrategy had for data warehousing in the ‘90s. Users will have the ability to consume data in a DIY fashion, saving money and ultimately bringing commoditisation to Big Data.

But we need more innovation to make Big Data a real commodity. We need more Hadoop as a service, something that is starting only this year. We need cloud-friendly, or “cloudified” architectures. The natural distribution of the Lambda architecture fits well with the Cloud model, but now the issue is to optimise performance and avoid unnecessary resource consumption in cloud-based Big Data infrastructures.

Orchestration is the magic word for Big Data in 2015, and certainly for one or two more years to come. Too many moving parts create complex architectures that are difficult to manage. Orchestration tools will play the most important role in the commoditisation of Big Data infrastructures. Projects will be delivered faster and in a more agile way, cutting costs and making the technology suitable for more uses.

The missing players

In this scenario, we sadly miss PostgreSQL, MySQL and some others.

PostgreSQL still has a large number of enthusiasts and great developers who provide improvements, but big investments are missing. EnterpriseDB monetises migrations of costly Oracle-based applications to PostgreSQL. This is, in my opinion, a pretty correct and pragmatic approach from a tactical business perspective. The support business around Postgres will go on for many years, but we should not expect any innovations in this area. We can see the use of Postgres technology in Greenplum and in Pivotal HAWQ, but that product falls more into the bucket of Hadoop adapters than of a standard PostgreSQL engine.

MySQL is another player that is missing the boat. The great improvements made in MySQL 5.7, in WebScaleSQL and in MariaDB all move in one direction: the MySQL install base. It looks as if the world stopped in 2006 and no new technologies have emerged since then. The fact is, almost all developers have adopted Hadoop and NoSQL technologies for their new projects, leaving [as happens for Postgres] the MySQL ecosystem still in business for the support of existing installations.

Finally, the traditional NoSQL players are catching up. The fact that they do not have a large install base allows these players to change direction faster and sometimes drastically. Datastax leads the pack, adding Hadoop to its Enterprise solution based on Cassandra. MongoDB benefits from large investments that give this database more bandwidth in the long term. The first step for MongoDB has been the introduction of a new pluggable storage architecture; now we need to wait for the next step towards a Hadoop pluggable engine. Couchbase and Basho/Riak still maintain their position as servers that can be integrated with Hadoop, but Hadoop is not a component of their enterprise products.

Obviously, I may be completely wrong with my predictions and in 12 months’ time we might see Hadoop more concentrated on real Big Data and none of the missing players joining the bandwagon. Let’s just wait and see.

In the meantime, there is more to come in this area. The future of Big Data is very much connected to the Internet of Things, which will bring even more complexity, along with the need for real time analytics combined with batch data analysis. On top of everything, the orchestration of a large number of components is an essential piece of technology for Big Data and IoT. Without the right orchestration, Devops will spend 80% of their time on operations and 20% on development, but it should be the other way around.
More to come in Jan 2016.

VirtualBox extensions for MAAS

During the last season’s holidays, I spent some time cleaning up demos and code that I use for my daily activities at Canonical. It’s nothing really sophisticated, but for me, and I suspect for some others too, a small set of scripts makes a big difference.

In my daily job, I like to show live demos and I need to install large sets of machines, scale workloads, and monitor and administer servers and data centres. Many people I meet don't want to know only the theoretical details, they want to see the software in action; but as you can imagine, the process of spinning up 8 or 10 machines and installing and running a full version of OpenStack in 10-15 minutes, while you also explain how the tools work and perhaps even try to give suggestions on how to implement a specific solution, is not something you can handle easily without help. Yet, that is what CTOs and Chief Architects want to see in order to decide whether a technology is good for them or not.

At Canonical, workloads are orchestrated, provisioned, monitored and administered using MAAS, Juju and Landscape, around Ubuntu Cloud, which is the Canonical OpenStack offering. These are the products that can do the magic I described, providing in minutes something that usually takes days to install, set up and run.

In addition to this long preface, I am an enthusiastic Mac user. I do love and prefer Ubuntu software and I am not entirely happy with many technical decisions around OS X, but I have also found Mac laptops to be fantastic hardware that simply fits my needs. Unfortunately, the KVM port to OS X is not available yet, hence the easiest and most stable way to spin up Linux VMs on OS X is to use VMware Fusion, Parallels or VirtualBox. Coming from Sun/Oracle and willing to use open source software as much as I can, VirtualBox is my favourite and natural choice.

Now, if you mix all the technologies mentioned above, you end up with a specific need: the integration of VirtualBox hosts, specifically those running on OS X (but not only), with Ubuntu Server running MAAS. The current version of MAAS (1.5 GA in the Ubuntu archives and 1.7 RC in the maintainers branch) supports virsh for power management (i.e. you can use MAAS to power up, power check and power down your physical and virtual machines), but the VirtualBox integration with virsh is limited to socket communication, i.e. you cannot connect to a remote VirtualBox host; in other words, MAAS and VirtualBox must run in the same OS environment.

Connections to local and remote VirtualBox hosts

 

My first instinct was to solve the core issue, i.e. add support for remote VirtualBox hosts, but I simply did not have enough bandwidth to embark on such an adventure, and becoming accustomed to the virsh libraries would have taken a significant amount of time. So I opted for a smaller, quicker and dirtier approach: to emulate the simplest power management features in MAAS using scripts that interact with VirtualBox.

MAAS – Metal As A Service, the open source product available from Canonical to provision bare metal and VMs in data centres, relies on the use of templates for power management. The templates cover all the hardware certified by Canonical and the majority of the hardware and virtualised solutions available today, but unfortunately they do not specifically cover VirtualBox. For my workaround, I modified the most basic power template provided for the Wake-On-LAN option. The template simply manages the power up of a VM, and leaves the power check and power down to other software components.

The scripts I have prepared are available on my GitHub account and are licensed under GPL v2, so you are absolutely free to download them, study them, use them and, even more importantly, provide suggestions and ideas to improve them.

The README file in GitHub is quite extensive, so I am not going to replicate here what has been written already, but I am going to give a wider architectural overview, so you may better consider whether it makes sense to use the patches or not.

MAAS, VirtualBox and OS X

The testing scenario that I have prepared and used includes OS X (I am still on Mavericks, as some of the software I need does not work well on Yosemite), VirtualBox and MAAS. What I need for my tests and demos is shown in the picture below. I can use one or more machines connected together, so I can distribute workloads on multiple physical machines. The use of a single machine makes things simpler, but of course it places a big limitation on the scalability of the tests and demos.

A simplified testbed with MAAS set as VM that can control other VMs, all within a single OS X VirtualBox Host machine

The basic testbed I need to use is formed by a set of VMs prepared to be controlled by MAAS. The VMs are visible in this screenshot of the VirtualBox console.

VirtualBox Console

Two aspects are extremely important here. First, the VMs must be connected using a network that allows direct communication between the MAAS node and the VMs. This can be achieved locally by using a host-only adapter where MAAS provides DNS and DHCP services and each VM has the Allow All option set in the Promiscuous Mode combo.

VirtualBox Network
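For reference, the same setup can be scripted with VBoxManage; a minimal sketch, where vboxnet0 and node01 are just example names for the host-only interface and the VM:

# Create a host-only interface (typically named vboxnet0)
VBoxManage hostonlyif create

# Attach the VM to it and set promiscuous mode to "Allow All"
VBoxManage modifyvm node01 --nic1 hostonly --hostonlyadapter1 vboxnet0 --nicpromisc1 allow-all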

Secondly, the VMs must have PXE boot enabled. In VirtualBox, this is achieved by selecting network boot as the first boot option in the System tab.

VirtualBox Boot Options
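Again, this can be scripted; a sketch with the same hypothetical VM name:

# Boot from the network first, then from disk
VBoxManage modifyvm node01 --boot1 net --boot2 disk --boot3 none --boot4 none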

 

In this way, the VMs can start for the very first time and PXE boot using a cloud image provided by MAAS. Once MAAS has the VM enlisted as a node, administrators can edit the node via the web UI, the CLI app or the RESTful API. Apart from changing the name, what is really important is the setting of the power mode and the physical zone. The power mode must be set as Wake-On-LAN and the MAC address is the last part of the VM id in VirtualBox (with colons). The physical zone must be associated with the VirtualBox host machine.

MAAS Edit Node

In the picture above the Physical zone is set as izMBP13. The description of the Physical zone must contain the UID and the hostname or IP address of the host machine.

Physical Zone

Once the node has been set properly, it can be commissioned by simply clicking the Commission node button in the Node page. If the VM starts and loads the cloud image, then MAAS has been set correctly.

The MAAS instance interacts with the VirtualBox host via SSH and responds to PXE boot requests from the VMs

A quick look at the workflow

The workflow used to connect MAAS and the VMs is relatively simple. It is based on the components listed below.

A. MAAS control

Although I have already prepared scripts to power check and power off the VMs, at the moment MAAS can only control the power on. Power On is triggered by many actions, such as Commission node or an explicit Start node in MAAS. You can always check the result of this action in the event log on the Node page.

MAAS Node

B. Power template

The Power On action is handled through a template, which in the case of Wake-On-LAN and of the patched version for VirtualBox is a shell script.

The small fragment of code used by the template is listed here; it is part of the file /etc/maas/templates/power/ether_wake.template:

...
if [ "${power_change}" != 'on' ]
then
...
elif [ -x ${home_dir}/VBox_extensions/power_on ]
then
    ${home_dir}/VBox_extensions/power_on \
    $mac_address
...
fi
...

C. MAAS script

The script ${home_dir}/VBox_extensions/power_on is called by the template. This is the fragment of code used to modify the MAC address and to execute a script on the VirtualBox Host machine:

...
vbox_host_credentials=${zone_description//\"}

# Check if there is the @ sign, typical of ssh
# user@address
if [[ ${vbox_host_credentials} == *"@"* ]]
then
  # Create the command string
  command_to_execute="ssh \
    ${vbox_host_credentials} \
    '~/VBox_host_extensions/startvm \
     ${vm_to_start}'"
  # Execute the command string
  eval "${command_to_execute}"
...

D. VirtualBox host script

The script in ~/VBox_host_extensions/startvm is called by the MAAS script and executes the startvm command locally:

...
# Find the VM whose id matches the identifier passed as the first argument
start_this_vm=`vboxmanage list vms \
| grep "${1}" \
| sort \
| head -1`
# Extract the UUID between the curly braces
start_this_vm=${start_this_vm#*\{}
start_this_vm=${start_this_vm%\}*}
# Start the VM without a GUI
VBoxManage startvm ${start_this_vm} \
           --type headless
...

The final result will be a set of VMs that are then ready to be used, for example by Juju, to deploy Ubuntu OpenStack, as you can see in the image below.

MAAS Nodes (Ready)

 

Next Steps

I am not sure when I will have time to review the scripts, but they certainly have a lot of room for improvement. First of all, by adopting a richer power management option, MAAS will not only power on the VMs, but also power them off and check their status. Another improvement concerns the physical zones: right now, the scripts loop through all the available VirtualBox hosts. Finally, it would be ideal to use the standard virsh library to interact with VirtualBox. I can't promise when, but I am going to look into it at some point this new year.
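To give an idea of the direction, a power check and a power off could be as simple as the following VBoxManage calls (a hypothetical sketch, not part of the published scripts; node01 is an example VM name):

# Power check: is the VM among the running ones?
VBoxManage list runningvms | grep "node01"

# Power off: ask the guest for a graceful shutdown (or use poweroff to force it)
VBoxManage controlvm node01 acpipowerbutton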

It does not matter if Aurora performs 1x or 10x MySQL: it _is_ a big thing

I spent the last 4 years at SkySQL/MariaDB working on versions of MySQL that could be “suitable for the cloud”. I strongly believed that the world needed a version of MySQL that could work in the cloud even better than its comparable version on bare metal. Users and administrators wanted to benefit from the use of cloud infrastructures and at the same time achieve the same performance and overall stability as their installations on bare metal. Unfortunately, ACID-compliant databases in the cloud suffer from the issues that any centrally controlled and strictly persistent system runs into when hosted on highly distributed and natively stateless infrastructures.

In this post I am not going to talk about the improvements needed for MySQL in the cloud – I will tackle this topic in a future post. Today I’d like to focus on the business side of RDS and Aurora.

In the last 4 years I have had endless discussions over the use of Galera running in AWS on standard EC2 instances. I tried to explain many times that having Galera in such an environment was almost pointless, since administrators did not have real control of the infrastructure. The reasons have nothing to do with the quality and the features of Galera, but rather with the use of a technology placed in the wrong layer of the *aaS stack. Last but not least, I tried many times to guide IT managers through the jungle of hidden costs of an installation of Galera (and other clustering technologies) in EC2, working through VPCs, guaranteed IOPS, dedicated instances, AZs and so on.

I had interesting meetings with customers and prospects to help them analyse the ROI of a migration and the TCO of an IT service in a public cloud. One example in particular, a media company in North America, was extremely interesting. The head of IT decided that a web service had to be migrated to AWS. The service had predictable web access peaks, mainly during public events – a perfect fit for AWS. When an event is approaching, sysadmins can launch more instances, then close them when the event ends. Unfortunately, the same approach cannot be applied to database servers, since these systems must keep data available at all times. Each new event requires more block storage with higher IOPS, and the size and flavour of the DB instances become so high-spec that the overall cost of running everything in EC2 ends up higher than that of the original installation.

Aurora from an end customer perspective

Why is Aurora a big thing? Here are some points to consider:

1. No hidden costs in public clouds

The examples of Galera and the DB servers in AWS that I mentioned are only two of the surprises that IT managers will find in their bills. There is a very good reason why public clouds have (or should have) a DBaaS offering: databases should be part of IaaS. They must make the most of the knowledge of the bare metal layer, in terms of physical location, computing and storage performance, redundancy and reliability, etc. Cloud users must be able to use the database confidently, leaving typical administration tasks such as data backups and replication to automated systems that are part of the infrastructure. Furthermore, end customers want to work with databases that do not suffer resource contention in terms of processing, storage and network – or at least not in a way that is perceivable from an application standpoint. Just as we select EBS disks with requested IOPS, we must be able to use a database server with requested QPS – however we define a “Query”. The same should happen for private clouds, since the technologies, benefits and disadvantages are substantially the same. In AWS, RDS already has these features, but Aurora simply promises a better experience with more performance and reliability. Sadly, not many alternatives are available from other cloud providers.

2. Reduce the churn rate

A consequence of the real or expected hidden costs is a relatively high churn rate that affects many IT projects in AWS. DevOps teams start with AWS because it is simple and available immediately, but as the business grows, so does the bill, and sometimes the growth is not proportional. Amazon needed to remove the increasing cost of the database as one of the reasons to leave or reduce the use of a public cloud, and Aurora is a significant step in this direction. I expect end customers to be more keen to keep their applications on AWS in the long run.

A strong message to the MySQL Ecosystem

There are lots of presentations and analyses around MySQL and the MySQL flavours, yet none of them looks at the generated revenues from the right perspective. Between 2005 and 2010, MySQL was a hot technology that many considered a serious alternative to closed source relational databases. In 2014, there is an amazing combination of factors:

  • A vast number of options available as open source technologies in the database market
  • A substantial change in the IT infrastructure, focused on virtualisation and cloud operations
  • A substantial change in the development of applications and in the types of applications, now dominated by a DevOps approach
  • A fracture in the MySQL ecosystem, caused by forks and branches that generated competition but also confusion in the market
  • An increasing demand for databases focused on rich media and big data
  • A relatively stable and consolidated MySQL
  • A good level of knowledge and skills available in the market
    (…and the list goes on…)

All these factors have not only limited the growth in revenues in the MySQL ecosystem, but have basically shrunk them – if you do not count the revenues coming from DBaaS. Here is pure speculation: Oracle gets a good chunk of its MySQL revenues from OEMs (i.e. commercial licenses) and from existing not-only-MySQL Oracle customers. Although Percona works hard at producing a more differentiated software product (and kudos for the work that the Percona software team does in terms of tooling and integration), the company adopted a healthy, but clearly services-focused, business model. The MariaDB approach is more similar to Oracle's, but without commercial licenses and without a multi-billion-dollar customer base. Yet, when you review the now 18-month-old keynote from 451 Research at Percona Live, you realise that the focus on “Who uses MySQL?” is pretty irrelevant: MySQL is ubiquitous and will be the most used open source database in the upcoming years. The question we should ask is rather, “Who pays for MySQL?”, or even better, “Why should one pay for MySQL?”: a reasonable fee paid for MySQL is the lifeblood that companies in the MySQL ecosystem need to survive and innovate.

Right now, in the MySQL ecosystem, Amazon is the real winner. Unfortunately, there are no public figures that can prove my point, neither from the MySQL vendors nor from Amazon. DBaaS is at the moment the only way to make users pay for a standard, fully compatible MySQL. By charging X% on top of standard EC2 instances, Amazon provides a risk-free, ready-to-use and near-zero-administration database – and this is a big thing for DevOps, startup and application-focused teams who need to keep their resource costs down and will never hire super experts in MySQL.

With Aurora, Amazon has significantly raised the bar in the DBaaS competition, where there are no real competitors to Amazon at the moment. Dimitri Kravtchuk wrote in his blog that “MySQL 5.7 is already doing better” than Aurora. I have no doubt that a pure 5.7 on bare metal delivers great performance, but I think there are some aspects to consider. First of all, Aurora target customers do not want to deal with my.cnf parameters – even if we found out that Aurora is in reality nothing more than a smart configurator for MySQL, able to magically adapt mysqld instances to a given workload, it would still be good enough for the end customers. The second and most important point is that Aurora is [supposed to be] a combination of standard MySQL (i.e. not an esoteric, InnoDB-incompatible storage engine) and good performance in AWS – if Amazon found out that they could provide the same cloud features using a new and stable version of MySQL 5.7, I have no doubt they would replace Aurora with that better version of MySQL, probably keeping the same name, ignoring the version number and, even more importantly, enjoying the revenues that the new, improved version would generate.
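Just to make the my.cnf point concrete, here is a minimal sketch, using the pymysql driver with hypothetical connection details and an illustrative list of variables, of the kind of knobs a DBA would normally size by hand and that a DBaaS customer is happy to delegate:

    import pymysql

    # Hypothetical connection details: on RDS or Aurora you would point this
    # at the endpoint handed to you by the service.
    conn = pymysql.connect(host="db.example.com", user="admin",
                           password="change-me", database="mysql")

    # A few of the classic knobs a DBA sizes by hand in my.cnf on bare metal:
    # buffer pool, redo log size, flush/durability policy, connection limit.
    knobs = ("innodb_buffer_pool_size", "innodb_log_file_size",
             "innodb_flush_log_at_trx_commit", "max_connections")

    with conn.cursor() as cur:
        for knob in knobs:
            cur.execute("SHOW GLOBAL VARIABLES LIKE %s", (knob,))
            row = cur.fetchone()
            if row:
                print(row[0], "=", row[1])

    conn.close()

On RDS and Aurora these values are largely managed for you, through instance class defaults and parameter groups, which is precisely the point: the target customer does not want to touch them.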

The ball is in our court

With Aurora, Amazon is well ahead of any other vendor – public cloud and MySQL-based technology vendors alike – in providing MySQL as DBaaS. Rest assured that Google, Azure, Rackspace, HPCloud, Joyent and others are not just sitting there watching, but for the time being they are behind Aurora. Some interesting projects that could fill the gap are going on at the moment. Tesora is probably the most active company in this area, focusing on OpenStack as the platform for public and private clouds. Continuent provides a technology that could be adapted for this use too, and the recent acquisition by VMware may give a push to some projects in this area. Considering the absence of the traditional MySQL players in this area, on one hand this is an opportunity for new players to invest in products that are very likely to generate revenues in the near future. On the other hand, there is a concern that the lack of options for DBaaS will convince more end customers to adopt NoSQL technologies, which are better suited to distributed, cloud infrastructures.

Why we should care about wearable Linux

These thoughts were inspired by the not-so-recent-anymore announcement of the Apple Watch and by Jono Bacon’s post about Ubuntu for Smartwatches. They do not refer specifically to Ubuntu, and they reflect my personal opinion only.

Wearable devices are essentially another facet of the Internet of Things. IoT now extends to your clothes and tools, not to mention your body, or parts of it. In the next decade, people will experience the next generation of mobile applications, the successors of the ones we now use to share a recent workout or the last marathon. Rest assured: there will be all sorts of new applications, and existing applications will be greatly improved by the adoption of affordable wearable hardware.

Right now, the two main choices for wearable applications are the iOS and Android platforms, understandably evolved to fit the new hardware. A third and a fourth option are based on Microsoft and Linux platforms, but at the moment there is very little to say about their ability to capture a good slice of the market.

Fact is, a market dominated by two companies that already control 99% of mobile applications is not a good thing. Indeed, it is within their rights to take advantage of the huge investments they have made to build iOS and Android, but that does not mean that the world should not have other alternatives. Although it is not a walk in the park, Linux is the only platform that can rise as truly independent and have a chance to compete in this war.

Why wearable Linux is important to consumers

The majority of the applications we use are free. The cost of the platform, whether it is iOS or Android, is paid in other ways. One of them is control over the applications available to consumers. Apple and Google can decide what goes in and what stays out of your mobile devices, and the same will happen with your wearables. Any attempt to escape from this can easily be prevented by the owners of the technology. So, in simple terms, consumers have fewer choices and eventually less freedom.

Another aspect which is pretty concerning nowadays is data collection. This is, and has always been, the most controversial aspect of mobile applications. Terms and conditions may allow providers to take advantage of the data collected through your smartphone or your wearable in any way they want. It could even be “their data”, and for consumers it would be just another way to pay for the “free application”. In the best-case scenario, the data will not be used individually, i.e. it will not be connected to your name, but it will be aggregated with that of many other users in order to analyse behaviours, define strategies, and sell trends and results. One way or another, rest assured that the data collected is a highly valuable asset for the technology providers.

Now, take these two aspects and consider them in connection with the availability and adoption of wearable Linux. A truly open platform, perhaps controlled in terms of technical choices but not in the way it will be used, will not prevent app store control or data collection in any way, but it will give you choice. Companies may become application providers or may even create their own application stores, more or less open than Google’s and Apple’s. Eventually, people will choose, and they will have more options. The choice is likely to be based on the applications available, but again it is up to the manufacturers and developers to offer an appealing product. This is not dissimilar from the choice of a cloud infrastructure – for example, comparing what you can do under the full control of a single provider in AWS with what comes with the adoption of OpenStack. To some extent, wearable Linux is just an important piece of a jigsaw that goes into the definition of an open infrastructure for IoT.

Why wearable Linux is important for developers and manufacturers

Take a look at the most prominent open source projects available today: the majority of them are backed by foundations and by companies working together, and then competing to provide the best solution on the market. This is not the case for Android, which is fully controlled by Google, not to mention iOS. Sure, Google keeps Android open and makes it available to any developer and manufacturer. In a similar way, Apple takes pretty good care of its developers, but both companies simply have their own agenda. A wearable stack which is part of a full platform developed within a foundation would have its own huge internal politics, but also more contribution and more independence. Put simply, there would be more players and more opportunities to do business in the biggest revolution in computing since the semiconductor.

From a more technical point of view, developers would benefit immensely from the use of the full Linux stack. If you look at the way Android and iOS have evolved, you will realise that the concept of “lightweight” has changed significantly since their early versions. On the other hand, some of the libraries available on Linux are not available on Android, and this is a significant limitation for developers. Having a common platform where code is portable from the mainframe to a wearable is a huge asset, and it is not inconceivable in modern scale-out infrastructures. Interoperability, robustness, rapid development and ease of deployment are not trivial aspects of a Linux wearable stack, and they would increase the number of applications and the competition in the market.

The essential takeaway is that wearable applications are a great opportunity to expand the current market of mobile infrastructures, to introduce more players and competition, and to be ready for the next revolution of IoT. Whether or not new and old companies will take this opportunity remains to be seen, but for us consumers it can only be positive in the long term.

 

From Mavericks to Trusty – Intro

I have been on Mac hardware and OS X for 8 years now. I remember the evening in Santa Clara when Colin Charles showed me his MacBook Pro 15”, still based on the Motorola chipset, and OS X. I fell in love with the sleek design of the MacBook and with the backlight, which in 2006 was a “d’oh!” feature: something so obviously helpful that nobody else had thought of it.

I moved from my Sony Vaio Z1 to a MacBook Pro 15” with an Intel chipset in no time. I was so pleased to get rid of Outlook and the other clunky office tools and to use Apple Mail and its companions. I loved it so much.

Then the iPhone came, and then the iPad, and with them iOS and OS X started converging in some ways. I now have an i7 MacBook Air 11” with 8GB of RAM and a 1/2TB flash drive, a dream machine for me: feather-light, a real 5-6 hours of battery (the way I use it), all the power I need, with the obvious limitation of the small non-Retina screen, which I supplement with a USB monitor when I really need to. The software improved, but it also changed a lot. OS X Mavericks is no longer the sleek and non-intrusive OS that I saw on my old MacBooks. There are lots of great features, but also some very annoying issues that I really do not like. Perfection is something you must aim at but will never reach: this is the reality for software too.

In my work, I have always needed to use Linux in one way or another. So far, I have used VMs and cloud instances, but now this is not enough. I need to move the core OS too, and I find this exciting. I am going to replace OS X with Ubuntu – specifically, 10.9 Mavericks with 14.04 LTS Trusty.

I am not going to replace my hardware. I did some research, and the closest non-Apple laptop that I might want to use is the Dell XPS 13 Developer Edition, but it is still far, in many respects, from what a MacBook can provide. My decision, though, is not only based on pure technology features. For 8 years, I have been spoiled by Apple. I found a limited but very clear choice of hardware, software and accessories available on the Internet and in the Apple Stores. On top of that, I have highly valued the reliability and the lifetime of a MacBook model.

Here is an example. In my search, I stumbled upon a very interesting laptop, the Lenovo X1 Carbon. It was not the size I was looking for, but I was intrigued by its performance and features. I looked at reviews and videos, then I wanted to check the official site, and here came the surprise. For a laptop that aims to be at the top of the range, and with its innovative design also a landmark, I found outdated web pages and no way to buy it online on the official sites (although I could certainly browse and find it on eBay). The “where to buy” section of the website pointed me at stores that listed all their laptops in a random way, and many sites did not have that model at all. After a while, I simply gave up. (The X1 can be purchased in the US from the Lenovo website.)

Dell was different. In this case, there is a reliable source: the online store. There you can select, configure, check the specs and buy a laptop, and you know what to expect inside the parcel delivered to your doorstep. Dell also gives you a sense of continuity, with product lines that have evolved a lot but still share a positioning and a target market with the previous models.

But as I said, I will stick with Apple hardware. It has been a difficult decision, since I know that Apple discourages, in many ways, users who buy its hardware and then install non-Apple software. I also know I am going to run into issues with cards, components and new hardware. For example, my current MacBook Air has a PCI HD camera that does not have any Linux driver. But this – making my favourite hardware work against all odds – is a challenge that I like to take.

So, watch this space: I am going to update it with more info soon…