How to waste $5M on containerized infrastructure

“We’ve built a 70,000-node Mesos cluster for our developers, but they won’t use it. Can you help?” So began a conversation with the VP of infrastructure operations at a very large, well-known company. Impressive a feat as it was, it was also by far the largest containerized infrastructure setup I had seen go unused; nor, sadly, was it an isolated incident.

I’ve talked about this encounter with a large number of customers, analysts, friends, colleagues, partners, venture capitalists, and competitors. We had all had similar experiences, and all wanted to know why this keeps happening. After all, if so many resources are being wasted in our industry, we are all risking a great deal by not understanding and solving the problem. Otherwise, the next wave of adopters might start to doubt that containers can help their businesses, and we would all need to start polishing our résumés.

I have to be honest here: I am a developer, an engineer, and a technologist who loves to build products and use the latest technologies. So, the first place I looked in my quest to answer this 70,000-node question was the technology. Was Mesos the wrong choice? Was it implemented the wrong way? Did they use open source or proprietary software? Was a system integrator involved? Questions like these came to mind first. In hindsight, those were probably the wrong questions.

The answer came to me when I remembered a day in my career 15 years ago:

Sitting at my desk as a developer in a large bank, I remember impeccably dressed salespeople coming and going from our meeting rooms, courting our VP of infrastructure and his team. They were from VMware, back then the company for virtualized infrastructure. I was just a developer at the bank, and not even my boss, or his boss, or even his boss’s boss was invited to any of the steakhouse dinners the VMware people were hosting almost every week. VMware salespeople were interested only in the operations and infrastructure decision makers. Two or three months later, our team was told that a deal with VMware had been signed and that we would soon be moving our services over to VMs. Shortly after, the move took place over a couple of weekends.

Then one Monday morning, the services my team was responsible for were running on VMs instead of the old bare-metal servers with flashing blue lights and noisy fans. That was all. Our entire infrastructure was virtualized in a matter of months without much say from the developers, and while we put up some token resistance to the change (who likes change, after all?) and grudgingly agreed to be on standby over a couple of weekends, we couldn’t really tell the difference between the old and the new setup: everything was the same. Our VM servers behaved and felt like “real” servers. I am sure we wouldn’t have been able to tell the difference in a double-blind test, if anyone had conducted one.

Remembering those days made me wonder why the new wave of containerized infrastructure doesn’t work the same way. Why can’t we build a Mesos or Kubernetes cluster over a weekend or two and send the developers a memo with the subject: “Welcome to the future of infrastructure. You’re welcome!”?

The answer, as we all know, is that containerization is not going to work without developers’ involvement and buy-in. Developers need to build applications for a containerized setup, and containers, with APIs like those of Kubernetes exposed across the software life cycle, require developers and operators to change the way they work and communicate with each other. The reason a shiny 70,000-node cluster runs tumbleweed instead of business applications is that the tools we have built for this transition do not address this fundamental and essential organizational change: the meshing together of devs and ops.

The exciting reality is that setting up containerized infrastructure is getting easier, as there is an abundance of open source solutions that get you up and running with a Kubernetes cluster. If you are already running on a major cloud provider, you are a couple of clicks away from having your own containerized cluster, managed, serviced, and billed by the minute. The benefits of running containerized infrastructure are visible to operations teams: single-configuration servers (no more “snowflakes”), built-in high availability and resilience, and improved resource utilization, to name just a few. Developers also see the value of running in a containerized setup: more influence over the running environment, improved control over libraries and dependencies, and a narrower gap between production and development environments, among others.
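How low the barrier has become is easy to demonstrate. As a minimal sketch, assuming a managed cluster already exists and your kubeconfig points at it, the official Kubernetes Python client (`pip install kubernetes`) can confirm the control plane is reachable and list its nodes in a few lines:

```python
# Minimal sketch: assumes a running cluster and a valid ~/.kube/config.
# It only verifies the control plane answers and lists the worker nodes.
from kubernetes import client, config

config.load_kube_config()  # reads ~/.kube/config by default
v1 = client.CoreV1Api()

for node in v1.list_node().items:
    # Each node reports a "Ready" condition; anything else means trouble.
    ready = next(
        (c.status for c in node.status.conditions if c.type == "Ready"),
        "Unknown",
    )
    print(f"{node.metadata.name}: Ready={ready}")
```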

Each side of this equation (devs and ops) has its own vendors, tools, and open source projects to help them with what it takes to move to a containerized world—but that’s not enough. We are still missing the framework for devs and ops to work together to make this a success. There are simply very few, if any, tools and technologies available that facilitate this communication.

We are all so focused on our individual areas of innovation, from network to storage and orchestration, that we can lose sight of our customers’ business goals. In such an environment, system integrators, consultancies, and professional services companies do well, as they are the only ones focused on the end result and on delivering across the software supply chain; but this is not sustainable. Technologies that require customers to pay consultancies heavily to make them work are not going to be breakthrough technologies. Let’s face it: if virtualization had needed McKinsey ever-present on the payroll to work, there would be no cloud today.

For us all to benefit from a breakthrough technology like infrastructure containerization, we need to think more broadly than our single-purpose tools or primary focus areas and rethink the way we build products for this industry. This is different from the revolution of virtualization and cloud, and the sooner we realize that, the greater the benefits to our customers.

Devops is not just a bunch of fragmented tools or a fancy “digital transformation” project; it is a method of working collaboratively across functions, enabled by technology. Therefore, any technology aimed at the devops market, especially around containerization, needs to address this continuous-collaboration mindset before anything else. So let’s all build products with that in mind, products that start and maintain a conversation between developers and operators.


SaaS economy in the age of containers

The killer feature of any SaaS business is the massive reduction of operational cost and complexity, including setup, configuration, ongoing maintenance, and upgrades. Given the runaway success of SaaS in recent decades, we can clearly see that using a SaaS model instead of traditional shrink-wrapped software has made perfect financial sense for many customers. This is in line with the theory that there is a constant level of complexity in a system at any given time: IT organizations can deal with it in-house (investing in a team that handles the complexity) or outsource it to a partner or SaaS/PaaS/IaaS vendor (exchanging money for complexity). When you combine the latter with an OpEx rather than CapEx financial model, easy installation and setup, and flexible pay-as-you-grow options, it becomes very difficult to justify any other delivery model for generic software.

SaaS businesses, for their part, make this model profitable by distributing the operational running costs across many customers, almost always via a subscription model. While legacy software delivery models share the same cost-spreading principle at the R&D level, they lack the ability to arbitrage the operational cost at the delivery level: it is costly to deliver secure, highly available, and continuously upgraded software at scale, and it takes a skilled team of developers and operators to keep that software within customer-defined SLA boundaries over time.

Since a large portion of the cost of SaaS development and delivery goes into building the robust and secure infrastructure needed to host the service, vendors make money by building a large, robust infrastructure, breaking it up into smaller pieces of the same quality, and selling those pieces to many customers. SaaS infrastructure usually consists of many components, from databases to load balancers, each configured specifically to deliver the service in a particular way, with component-level high availability (HA), redundancy, and security requirements. Think of a typical CRM SaaS: you will need a multi-zone replicated database server, a group of load-balanced and securely firewalled frontend servers, and a cluster of servers to take care of background jobs and housekeeping for the system.

As an example, to keep the details of your 2,000 customers, you will need about 12 servers, two load balancers, and several gigabytes of storage; on top of that, add the Ops team cost of maintaining those databases and servers. All of this probably represents a cost of around $20k per month just to get going. To make things worse, even with this investment you are not going to get anywhere near the five-nines (99.999%) uptime that a SaaS vendor will give you at a fraction of the price. In this scenario it makes perfect sense to sign up for a SaaS alternative, paying $2,000 per month for a service that is always up, upgraded, and backed up.
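To make the arithmetic explicit, here is a back-of-the-envelope sketch. Only the 12 servers, two load balancers, and the $2,000 subscription come from the example above; the per-unit costs are illustrative assumptions chosen to land near that $20k figure, not real quotes:

```python
# Back-of-the-envelope version of the example above. Every per-unit
# figure is an illustrative assumption, not a quote.
servers = 12
load_balancers = 2
server_cost = 1_200   # assumed $/month per server, incl. storage share
lb_cost = 400         # assumed $/month per load balancer
ops_team = 5_000      # assumed $/month share of the Ops team's time

self_hosted = servers * server_cost + load_balancers * lb_cost + ops_team
saas = 2_000          # the SaaS subscription from the example

print(f"Self-hosted: ~${self_hosted:,}/month")  # ~$20,200
print(f"SaaS:         ${saas:,}/month")         # roughly a tenth of the cost
```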

This, however, might change

To see why, it’s worth understanding why running highly available, secure, and robust infrastructure is so expensive. When it comes to infrastructure, the chain is only as strong as its weakest link: high availability and security cannot be achieved by hardening only parts of the system. It has to be done across each and every component, which adds cost and complexity at every layer and drives the bill even higher.
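There is simple arithmetic behind the weakest-link intuition: for components that all have to be up, availabilities multiply, so the system as a whole is always less available than its worst part. A small sketch with illustrative uptime figures:

```python
# For components in series, availabilities multiply. All uptime
# figures below are illustrative assumptions.
database = 0.9999   # four nines
frontend = 0.9995   # the weak link
balancer = 0.9999

system = database * frontend * balancer
print(f"System availability: {system:.4%}")  # ~99.93%, below every component
```

To reach five nines overall, every single component has to be pushed well beyond its own target or made redundant, and that is exactly where the cost goes.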

Now, consider what happens if all those requirements are built into a generic, self-healing, hyper-scale infrastructure, so that any application running on top of it is inherently highly available, redundant, and secure. This is the promise of containers. Instead of teams spending time getting each individual service to a high SLA, the infrastructure takes care of this at a lower level and provides these attributes as a service to the user. By doing so, containers take away one of the biggest benefits of the SaaS delivery model: the infrastructure arbitrage defined earlier in this post.
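As a concrete illustration of what “HA as a service from the infrastructure” looks like, here is a minimal sketch using the official Kubernetes Python client. The deployment name, labels, and image are hypothetical; the point is that you declare three replicas and the platform, not an Ops team, keeps that many running, rescheduling pods when nodes fail:

```python
# Minimal sketch: declare desired state (three replicas of a
# hypothetical frontend) and let the cluster enforce it continuously.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="crm-frontend"),
    spec=client.V1DeploymentSpec(
        replicas=3,  # the platform restores this count after any failure
        selector=client.V1LabelSelector(match_labels={"app": "crm-frontend"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "crm-frontend"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(name="web", image="example/crm-frontend:1.0")
                ]
            ),
        ),
    ),
)

apps.create_namespaced_deployment(namespace="default", body=deployment)
```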

Container-based infrastructure systems like Kubernetes allow companies of any size to build their own custom, highly available, and robust infrastructure on top of private data centers or public clouds, with high granularity and flexibility and without compromising much in return. In this new world of container-based infrastructure, IT teams spend their time building and maintaining a few Kubernetes clusters, while external vendors and in-house developers use those clusters to provide services to their clients.

It will probably take years before this shift affects the SaaS industry at a significant level. However, if we look carefully, we can already see savvy IT teams working to bring this future about: building pipelines for their code, as well as application management stacks that unlock the automation of containerized infrastructure, on both public and private clouds.

The SaaS delivery model still has a lot going for it; for one, it is now the dominant model for consuming software, wherever it sits and however it is procured. However, infrastructure arbitrage is not going to be one of its key advantages for long. While cloud computing was the killer application for virtualization, changing the economics of SaaS might be the killer application for containerization.
