2015: More innovation, but still a year of transition

First things first: I could use this title for every year, it is an evergreen. For it to make sense there must be a specific context, and in this case the context is Big Data. We saw new ideas and many announcements in 2014, and in 2015 those ideas will take shape and early versions of innovative products will start to flourish.

Like many other people, I prepared some comments and opinions to post back in early January. Then, soon after the season’s break, I started flying around the world and the daily routine kept me away from the blog for some time. So, arriving last as usual, it may be time for me to post my own predictions, for the joy of my usual 25 readers.

Small Data, Big Data, Any Data

The term Big Data is often misused. Many different architectures, objectives, projects and issues deviate from its initial meaning. Everything today seems to be “Big Data” – whether you collect structured or unstructured information, documents, text and patterns, there is so much hype that every company and marketing department wants to associate its offering with Big Data.

Big Data is becoming the way to say that your organisation is dealing with a vast amount of information and it is becoming a synonym for Database. Marketing aside, there are reasons behind this misuse. We are literally inundated by a vast amount of data of all sorts, and we have been told that all this data has some value. Fact is, more and more organisations want to use this data, and in some way they are pushing for the commoditisation of Big Data solutions.

There are valid reasons behind the commoditisation of Big Data. The first one is that data is data and, big or small, it should be simple and easy to manage and use. If this is not the case, then it is an issue that database providers should solve, and an opportunity to inspire entrepreneurs to provide new products. Managers, users and administrators demand this commoditisation. They do not want to treat Big Data differently from any other data: they do not want a batch-only mode, a Lambda architecture or another complex stack. Many organisations need real-time analysis, small queries and transactions for the data they collect or generate.

Developers and devops have their say on Big Data too. They need more ways to access Hadoop and Lambda architectures. They long for the simplicity of the good old days of the LAMP stack or for today’s agility of Node.js and MongoDB. They want to code faster, release often, run and fix bugs in minutes (not weeks or months), also on Big Data.

In my humble opinion, the key point for Big Data in 2015 is the convergence towards Hadoop. Everything will be in some way related to Hadoop, whether it is a distributed file system, a map/reduce approach or other related technologies. In some way, established Big Data vendors will create more interfaces. Other SQLs and NoSQLs will reach the Hadoop haven, by integrating their existing products, creating more connectors, or providing hybrid architectures.

The two big issues to tackle are on the administration side and the user side. For administrators, Big Data architectures must be simple to provision, configure and deploy, and eventually modify. For users, Big Data solutions must be simple to use for their analysis or online applications. In both cases, the issue is currently Big Data = Big Complexity.

Some predictions

A convergence towards Hadoop is inevitable. Even the most traditional companies active in the DB world, like Oracle and Microsoft, are taking large steps in this direction. Here we are not talking about integration through adapters or loaders, we are referring to a deeper convergence where Hadoop will be (in some way) part of the commercial products.

There will be more interfaces that allow developers to reuse their skills or existing code to work with Hadoop. This aspect will be interesting for ad-hoc applications, but even more important for BI and Business Analytics vendors, who will integrate their tools with Hadoop with “minimal” effort. An evolution in this area will have the same impact that tools like Business Objects, Cognos and MicroStrategy had for data warehousing in the ‘90s. Users will have the ability to consume data in a DIY fashion, saving money and ultimately bringing commoditisation to Big Data.

But we need more innovation to make Big Data a real commodity. We need more Hadoop as a service, something that is starting only this year. We need cloud-friendly, or “cloudified” architectures. The natural distribution of the Lambda architecture fits well with the Cloud model, but now the issue is to optimise performance and avoid unnecessary resource consumption in cloud-based Big Data infrastructures.

Orchestration is the magic word for Big Data in 2015 and certainly for one or two more years to come. Too many moving parts create complex architectures that are difficult to manage. Orchestration tools will play the most important role in the commoditisation of Big Data infrastructures. Projects will be delivered faster and in a more agile way, in order to cut costs and make the technology suitable for more uses.

The missing players

In this scenario, we sadly miss PostgreSQL, MySQL and some others.

PostgreSQL still has a large number of enthusiasts and great developers who provide improvements, but big investments are missing. EnterpriseDB monetises migrations of costly Oracle-based applications to PostgreSQL. This is, in my opinion, a correct and pragmatic approach from a tactical business perspective. The support business around Postgres will go on for many years, but we should not expect much innovation in this area. We can see Postgres technology used in Greenplum and in Pivotal HAWQ, but those products fall more into the bucket of Hadoop adapters than into a standard PostgreSQL engine.

MySQL is another player that is missing the boat. The great improvements made in MySQL 5.7, in WebScaleSQL and in MariaDB all move in one direction: the MySQL install base. It looks like the world stopped in 2006 and no new technologies have emerged since then. Fact is, almost all developers have adopted Hadoop and NoSQL technologies for their new projects, leaving (as happens for Postgres) the MySQL ecosystem still in business for the support of existing installations.

Finally, the traditional NoSQL players are catching up. The fact that they do not have a large install base allows these players to change direction faster and sometimes drastically. Datastax leads the pack, adding Hadoop to its Enterprise solution based on Cassandra. MongoDB benefits from large investments that give this database more bandwidth in the long term. The first step for MongoDB has been the introduction of a new pluggable storage architecture. Now we need to wait for the next step towards a Hadoop pluggable engine. Couchbase and Basho/Riak still maintain their position as servers that can be integrated with Hadoop, but Hadoop is not a component of their enterprise products.

Obviously, I may be completely wrong with my predictions and in 12 months’ time we might see Hadoop more concentrated on real Big Data and none of the missing players joining the bandwagon. Let’s just wait and see.

In the meantime, there is more to come in this area. The future of Big Data is very much connected to the Internet of Things, which will bring even more complexity, along with the need for real time analytics combined with batch data analysis. On top of everything, the orchestration of a large number of components is an essential piece of technology for Big Data and IoT. Without the right orchestration, Devops will spend 80% of their time on operations and 20% on development, but it should be the other way around.
More to come in Jan 2016.

VirtualBox extensions for MAAS

During the last season’s holidays, I spent some time cleaning up demos and code that I use for my daily activities at Canonical. It’s nothing really sophisticated, but for me, and I suspect for some others too, a small set of scripts makes a big difference.

In my daily job, I like to show live demos: I need to install a large set of machines, scale workloads, and monitor and administer servers and data centres. Many people I meet don’t want to hear only the theoretical details, they want to see the software in action. As you can imagine, spinning up 8 or 10 machines and installing and running a full version of OpenStack in 10-15 minutes, while you also explain how the tools work and perhaps even give suggestions on how to implement a specific solution, is not something you can handle easily without help. Yet, that is what CTOs and Chief Architects want to see in order to decide whether a technology is good for them or not.

At Canonical, workloads are orchestrated, provisioned, monitored and administered using MAAS, Juju and Landscape, around Ubuntu Cloud, which is the Canonical OpenStack offering. These are the products that can do the magic I described above, providing in minutes something that usually takes days to install, set up and run.

In addition to this long preface, I should say that I am an enthusiastic Mac user. I do love and prefer Ubuntu software, and I am not entirely happy with many technical decisions around OS X, but I have also found Mac laptops to be fantastic hardware that simply fits my needs. Unfortunately, there is no KVM port for OS X yet, hence the easiest and most stable way to spin up Linux VMs on OS X is to use VMware Fusion, Parallels or VirtualBox. Coming from Sun/Oracle and willing to use open source software as much as I can, VirtualBox is my favourite and natural choice.

Now, if you mix all the technologies mentioned above, you end up with a specific need: the integration of VirtualBox hosts, specifically running on OS X (but not only), with Ubuntu Server running MAAS. The current version of MAAS (1.5 GA in the Ubuntu archives and 1.7 RC in the maintainers branch) supports virsh for power management (i.e. you can use MAAS to power up, power check and power down your physical and virtual machines), but the VirtualBox integration with virsh is limited to socket communication: you cannot connect to a remote VirtualBox host, or in other words MAAS and VirtualBox must run in the same OS environment.
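
To make the limitation concrete, here is a rough sketch of what does and does not work, assuming the libvirt VirtualBox driver is installed; the VM name is a hypothetical placeholder:

# Local session: virsh can list and start VirtualBox VMs
# running on the same machine
virsh -c vbox:///session list --all
virsh -c vbox:///session start maas-node-01

# What MAAS would need is a remote URI, similar to the
# qemu+ssh://ubuntu@kvm-host/system form used for KVM hosts,
# but there is no equivalent for a remote VirtualBox host.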

Connections to local and remote VirtualBox hosts

My first instinct was to solve the core issue, i.e. add support to remote VirtualBox hosts, but I simply did not have enough bandwidth to embark on such an adventure, and becoming accustomed to the virsh libraries would have taken a significant amount of time. So I opted for a smaller, quicker and dirtier approach: to emulate the most simple power management features in MAAS using scripts that would interact with VirtualBox.

MAAS – Metal As A Service, the open source product available from Canonical to provision bare metal and VMs in data centres, relies on the use of templates for power management. The templates cover all the hardware certified by Canonical and the majority of the hardware and virtualised solutions available today, but unfortunately they do not specifically cover VirtualBox. For my workaround, I modified the most basic power template provided for the Wake-On-LAN option. The template simply manages the power up of a VM, and leaves the power check and power down to other software components.

The scripts I have prepared are available on my GitHub account and are licensed under GPL v2, so you are absolutely free to download them, study them, use them and, even more importantly, provide suggestions and ideas to improve them.

The README file in GitHub is quite extensive, so I am not going to replicate here what has been written already, but I am going to give a wider architectural overview, so you may better consider whether it makes sense to use the patches or not.

MAAS, VirtualBox and OS X

The testing scenario that I have prepared and used includes OS X (I am still on Mavericks as some of the software I need does not work well on Yosemite), VirtualBox and MAAS. What I need for my tests and demos is shown in the picture below. I can use one or more machines connected together, so I can distribute workloads on multiple physical machines. The use of a single machine makes things simpler, but of course it puts a big limitation on the scalability of the tests and demos.

A simplified testbed with MAAS set as VM that can control other VMs, all within a single OS X VirtualBox Host machine

The basic testbed I need to use is formed by a set of VMs prepared to be controlled by MAAS. The VMs are visible in this screenshot of the VirtualBox console.

VirtualBox Console

Two aspects are extremely important here. First, the VMs must be connected using a network that allows direct communication between the MAAS node and the VMs. This can be achieved locally by using a host-only adapter where MAAS provides DNS and DHCP services and each VM has the Allow All option selected in the promiscuous mode combo box.

VirtualBox Network
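
For reference, the same network setup can be scripted from the OS X host with VBoxManage. This is only a sketch, and the interface and VM names (vboxnet0, maas-node-01) are hypothetical placeholders:

# Create a host-only interface (typically vboxnet0) on the host
VBoxManage hostonlyif create

# Remove the built-in VirtualBox DHCP server on that interface,
# since the MAAS VM will provide DHCP and DNS on this network
VBoxManage dhcpserver remove --ifname vboxnet0

# Attach a node VM to the host-only network and set the
# promiscuous mode policy to Allow All
VBoxManage modifyvm "maas-node-01" --nic1 hostonly \
    --hostonlyadapter1 vboxnet0 --nicpromisc1 allow-all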

Secondly, the VMs must have PXE boot enabled. In VirtualBox, this is achieved by selecting the network boot option as the first option in the System tab.

VirtualBox Boot Options
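
The equivalent VBoxManage command, again with a placeholder VM name, would be something like:

# Boot from the network (PXE) first, then from the disk
VBoxManage modifyvm "maas-node-01" --boot1 net --boot2 disk \
    --boot3 none --boot4 none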

 

In this way, the VMs can start for the very first time and PXE boot using a cloud image provided by MAAS. Once MAAS has the VM enlisted as a node, administrators can edit the node via the web UI, the CLI or the RESTful API. Apart from changing the name, what really matters is the setting of the power mode and the physical zone. The power mode must be set to Wake-On-LAN, and the MAC address is the last part of the VM id in VirtualBox (written with colons). The physical zone must be associated with the VirtualBox host machine.
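
To illustrate the MAC address convention with a purely hypothetical UUID:

# List the VMs and their UUIDs on the VirtualBox host
vboxmanage list vms
#   "maas-node-01" {c9d1ad4e-5f32-4c65-9c4d-0a1b2c3d4e5f}

# The last 12 hex digits of the UUID, written with colons,
# are the value to enter as the node's MAC address in MAAS
echo "0a1b2c3d4e5f" | sed 's/../&:/g; s/:$//'
#   0a:1b:2c:3d:4e:5f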

MAAS Edit Node

In the picture above, the Physical zone is set to izMBP13. The description of the Physical zone must contain the user ID and the hostname or IP address of the host machine.

Physical Zone
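
As a hypothetical example, a description that matches what the scripts expect (an ssh-style user@host string) would look like this:

# Physical zone description, as read by the power_on script:
#   ubuntu@192.168.56.1
# i.e. <user on the VirtualBox host>@<address of the host machine>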

Once the node has been set properly, it can be commissioned by simply clicking the Commission node button in the Node page. If the VM starts and loads the cloud image, then MAAS has been set correctly.
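
The same action is also available from the CLI. Roughly, for the 1.x API (the profile name, server address, key and system id are placeholders, and the exact sub-command may differ between MAAS versions):

# Log in once with the API key taken from the MAAS web UI
maas login myprofile http://<maas-ip>/MAAS/api/1.0 <api-key>

# Commission the node using its system id
maas myprofile node commission <system-id>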

The MAAS instance interacts with the VirtualBox host via SSH and responds to PXE Boot requests from the VMs

A quick look at the workflow

The workflow used to connect MAAS and the VMs is relatively simple. It is based on the components listed below.

A. MAAS control

Although I have already prepared scripts to Power Check and Power Off the VMs, at the moment MAAS can only control Power On. Power On is triggered by several actions, such as Commission node or an explicit Start node in MAAS. You can always verify the result of these actions by checking the event log in the Node page.

MAAS Node

B. Power template

The Power On action is handled through a template, which in the case of Wake-On-LAN and of the patched version for VirtualBox is a shell script.

The small fragment of code used by the template is listed here and it is part of the file /etc/maas/templates/power/ether_wake.template:

...
# Branch on the power change requested by MAAS
if [ "${power_change}" != 'on' ]
then
...
# If the VirtualBox extension script is present and executable,
# delegate the power-on to it, passing the node's MAC address
elif [ -x ${home_dir}/VBox_extensions/power_on ]
then
    ${home_dir}/VBox_extensions/power_on \
    $mac_address
...
fi
...

C. MAAS script

The script ${home_dir}/VBox_extensions/power_on is called by the template. This is the fragment of code used to modify the MAC address and to execute a script on the VirtualBox Host machine:

...
vbox_host_credentials=${zone_description//\"}

# Check if there is the @ sign, typical of ssh
# user@address
if [[ ${vbox_host_credentials} == *"@"* ]]
then
  # Create the command string
  command_to_execute="ssh \
    ${vbox_host_credentials} \
    '~/VBox_host_extensions/startvm \
     ${vm_to_start}'"
  # Execute the command string
  eval "${command_to_execute}"
...

D. VirtualBox host script

The script in ~/VBox_host_extensions/startvm is called by the MAAS script and executes the startvm command locally:

...
# Find the VM whose identifier matches the argument passed
# by the MAAS script (take the first match only)
start_this_vm=`vboxmanage list vms \
| grep "${1}" \
| sort \
| head -1`
# Keep only the UUID between the curly braces
start_this_vm=${start_this_vm#*\{}
start_this_vm=${start_this_vm%\}*}
# Start the VM without opening a GUI window
VBoxManage startvm ${start_this_vm} \
           --type headless
...

The final result will be a set of VMs that are then ready to be used for example by Juju to deploy Ubuntu OpenStack, as you can see in the image below.

MAAS Nodes (Ready)
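
As a pointer only, this is roughly how the Juju side would consume those nodes with the 1.x tooling of the time; the environment name and the contents of environments.yaml are assumptions, not something from the published scripts:

# Assuming ~/.juju/environments.yaml contains a "maas" environment
# (type: maas, maas-server, maas-oauth), bootstrap and deploy:
juju switch maas
juju bootstrap
juju deploy mysql
juju status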

 

Next Steps

I am not sure when I will have time to review the scripts, but they certainly have a lot of room for improvement. First of all, by adopting a richer power management option, MAAS will not only power on the VMs, but also power them off and check their status. Another improvement regards the physical zones: right now, the scripts loop through all the available VirtualBox hosts. Finally, it would be ideal to use the standard virsh library to interact with VirtualBox. I can’t promise when, but I am going to look into it at some point this new year.
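
For what it is worth, the power check and power off parts could be handled on the VirtualBox host along these lines. This is only a sketch of the idea, not what the published scripts currently do, and it reuses the same UUID variable as the startvm script above:

# Report the current state of the VM (running, poweroff, ...)
VBoxManage showvminfo ${start_this_vm} --machinereadable \
| grep '^VMState='

# Power the VM off, either gracefully via ACPI or immediately
VBoxManage controlvm ${start_this_vm} acpipowerbutton
# VBoxManage controlvm ${start_this_vm} poweroff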

Why we should care about wearable Linux

These thoughts have been inspired by the not-so-recent-anymore announcement of the Apple Watch and by Jono Bacon’s post about Ubuntu for smartwatches. They do not refer specifically to Ubuntu, and they reflect my personal opinion only.

Wearable devices are essentially another aspect of the Internet of Things. The IoT now expands to include your clothes and tools, not to mention your body, or parts of it. In the next decade, people will experience the next generation of the mobile applications that are now used to share your latest workout or your last marathon. Rest assured: there will be all sorts of new applications, and existing applications will be greatly improved by the adoption of affordable wearable hardware.

Right now, the two main choices for wearable applications are the iOS and Android platforms, understandably evolved to fit the new hardware. A third and a fourth option are based on the Microsoft and Linux platforms, but at the moment there is very little to say about their ability to gain a good slice of the market.

Fact is, a market dominated by two companies that already control 99% of mobile applications is not a good thing. They are certainly within their rights to take advantage of the huge investments they have made to build iOS and Android, but that does not mean the world should not have alternatives. Although it is not a walk in the park, Linux is the only platform that can rise as truly independent and have a chance to compete in this war.

Why wearable Linux is important to consumers

The majority of the applications we use are free. The cost of the platform, whether it is iOS or Android, is paid in other ways. One of them is the control over the applications available to consumers. Apple and Google can decide what goes in and what stays out of your mobile devices, and the same will happen for your wearables. Any attempt to escape from this can easily be prevented by the owners of the technology. So, in simple terms, consumers have fewer choices and, eventually, less freedom.

Another aspect which is pretty concerning nowadays is data collection. This is and has always been the most controversial aspect of mobile applications. Terms and conditions may allow providers to use the data collected through your smartphone or your wearable in any way they want. It could even be “their data”, and for consumers it would just be another way to pay for the “free application”. In the best-case scenario, the data will not be used individually, i.e. it will not be connected to your name, but it will be aggregated with that of many other users in order to analyse behaviours, define strategies, and sell trends and results. In one way or another, rest assured that the data collected is a highly valuable asset for the technology providers.

Now, take these two aspects and consider them in connection with the availability and adoption of wearable Linux. A truly open platform, perhaps controlled in its technical choices but not in the way it is used, will not by itself prevent app store control or data collection, but it will give you choice. Companies may be application providers or may even create their own application stores, more or less open than Google’s and Apple’s. Eventually, people will choose, and they will have more options. The choice is likely to be based on the applications available, but again it is up to the manufacturers and developers to offer an appealing product. This is not dissimilar from the choice of a cloud infrastructure – for example, comparing what you can do under the full control of a single provider like AWS with what you get by adopting OpenStack. To some extent, wearable Linux is just an important piece of a jigsaw that goes into the definition of an open infrastructure for the IoT.

Why wearable Linux is important for developers and manufacturers

Take a look at the most prominent open source projects available today: the majority of them are backed by foundations and by companies working together, and then competing to provide the best solution on the market. This is not the case for Android, which is fully controlled by Google, not to mention iOS. Sure, Google is keeping Android open and is making it available to any developer and manufacturer. In a similar way, Apple takes pretty good care of its developers, but both companies simply have their own agenda. A wearable stack that is part of a full platform developed within a foundation would have its own internal politics, but also more contributions and more independence. Put simply, there would be more players and more opportunities to do business in the biggest revolution in computing since the semiconductor.

From a more technical point of view, developers would benefit immensely from the use of the full Linux stack. If you look at the way Android and iOS have evolved, you will realise that the concept of lightweight has changed significantly since their early versions. On the other hand, some of the libraries available on Linux are not in Android, and this is a significant limitation for developers. A common platform that offers portability from the mainframe to a wearable is a huge asset, and it is not inconceivable in modern scale-out infrastructures. Interoperability, robustness, rapid development and ease of deployment are not trivial aspects of a Linux wearable stack, and they would increase the number of applications and the competition in the market.

The essential takeaway is that wearable applications are a great opportunity to expand the current market of mobile infrastructures, introducing more players and competition, and to be ready for the next revolution of the IoT. Whether or not new and old companies will take this opportunity remains to be seen, but for us consumers it can only be positive in the long term.

 

From Mavericks to Trusty – Intro

I have been on Mac hardware and OS X for 8 years now. I remember the evening in Santa Clara when Colin Charles showed me his MacBook Pro 15”, still based on the Motorola chipset, and OS X. I fell in love with the sleek design of the MacBook and with the backlight that in 2006 was like a “duh!” moment – something so obviously helpful that nobody else had thought about it.

I moved from my Sony Vaio Z1 to a MacBook Pro 15” with Intel chipset in no time. I was so pleased I got rid of Outlook and the other clunky office tools, to use Apple Mail and others. I loved it so much.

Then the iPhone came, and then the iPad, and with them iOS and OS X were in some way converging. I now have an i7 MacBook Air 11 with 8GB RAM and a 1/2TB flash drive, a dream machine for me: feather-light, a real 5-6 hours of battery life (the way I use it), all the power I need, with the obvious limitation of the small non-Retina screen, which I supplement with a USB monitor when I really need to. The software improved, but it also changed a lot. OS X Mavericks is no longer the sleek and non-intrusive OS that I saw on my old MacBooks. There are lots of great features, but also some very annoying issues that I really do not like. Perfection is something you must aim at but will never reach: this is the reality for software too.

In my work, I have always needed to use Linux, in one way or another. So far, I have used VMs and cloud instances, but now this is not enough. I need to move the core OS too, and I find this exciting. I am going to replace OS X with Ubuntu, specifically 10.9 Mavericks with 14.04 LTS Trusty.

I am not going to replace my hardware. I did some research, and the closest non-Apple laptop that I may want to use is the Dell XPS 13 Developer Edition, but it is still far, in many respects, from what a MacBook can provide. My decision, though, is not only based on pure technology features. For 8 years, I have been spoiled by Apple. I found a limited but very clear choice of hardware, software and accessories available on the Internet and in the Apple Stores. On top of that, I have highly valued the reliability and the lifetime of a MacBook model.

Here is an example. In my search, I stumbled upon a very interesting laptop, the Lenovo X1 Carbon. It was not a laptop of the size I was looking for, but I was intrigued by its performance and features. I looked at reviews and videos, then I wanted to check the official site, and here came the surprise. For a laptop that wants to be at the top of the range, and a landmark with its innovative design, I found outdated web pages; I could not buy it online from official sites (although I can certainly browse and find it on eBay); the “where to buy” section of the website pointed me at stores that listed all their laptops in a random way; many sites did not carry that model at all. After a while, I simply gave up. (The X1 can be purchased in the US from the Lenovo website.)

Dell was different. In this case, there is a reliable source, which is the online store. Online, you can select, configure, check the specs and buy a laptop, and you know what to expect inside the parcel delivered at your doorstep. Dell also gives you a sense of continuity, with product lines that have evolved a lot but still share a positioning and a target market with the previous models.

But as I said, I will stick with Apple hardware. It has been a difficult decision, since I know that Apple in many ways discourages users who buy its hardware and then install non-Apple software. Also, I know I am going to find issues with cards, components and new hardware. For example, my current MacBook Air has a PCI HD camera that does not have any Linux driver. But this – making my favourite hardware work against all odds – is a challenge that I like to take.

So, watch this space, I am going to update it with more info soon…