How to waste $5M on containerized infrastructure

“We’ve built a 70,000-node Mesos cluster for our developers, but they won’t use it. Can you help?” This was the beginning of a conversation with the VP of infrastructure operations at a very large and famous company. While an impressive feat to accomplish, it was also by far the largest containerized infrastructure setup I had seen go unused. Nor, sadly, was it an isolated incident.

I’ve talked about this encounter with a large number of customers, analysts, friends, colleagues, partners, venture capitalists, and competitors. We had all had similar experiences, and all wanted to know why this is so. After all, if so many resources are being wasted in our industry, we are all risking a great deal by not understanding and solving the problem. Otherwise, the next wave of adopters might start to doubt containers can help their businesses, and we would all need to start polishing our resumes.

I have to be honest here: I am a developer, an engineer, and a technologist who loves to build products and use the latest technologies. So the first place I looked, in my quest to answer this 70,000-node question, was the technology used. Was Mesos the wrong technology? Was it implemented the wrong way? Did they use open source or closed source? Was there an SI involved? Questions like these came to mind first. In hindsight, I think those were probably the wrong questions.

The answer came to me when I remembered a day in my career 15 years ago:

Sitting at my desk as a developer in a large bank, I remember impeccably dressed salespeople coming and going from our meeting rooms, courting our VP of infrastructure and his team. They were from VMware, back then the company for virtualized infrastructure. I was just a developer at the bank, but not even my boss or his boss or even his boss’s boss were invited to any of the steakhouse dinner events the VMware people were hosting almost every week. VMware salespeople were only interested in the operations and infrastructure decision makers. Two or three months later, our team was told that a deal with VMware had been signed and that we would soon be moving our services over to VMs. Shortly afterward, the move took place over a couple of weekends.

Then one Monday morning, the services my team was responsible for were running on VMs instead of the old bare-metal servers with flashing blue lights and noisy fans. That was all. Our entire infrastructure was virtualized in a matter of months without much say from the developers. While we put up some token resistance to the change (who likes change, after all?) and grudgingly agreed to be on standby over a couple of weekends, we couldn’t really tell the difference between the old and the new setup: everything was the same. Our VM servers behaved and felt like “real” servers. I am sure we wouldn’t have been able to tell the difference in a double-blind test if anyone had conducted one.

Remembering those days made me wonder why the new containerized wave of infrastructure change doesn’t work the same way. Why can’t we build a Mesos or Kubernetes cluster over a weekend or two and send a memo to the developers with the subject: “Welcome to the future of infrastructure. You’re welcome!”?

The answer, as we all know, is that containerization is not going to work without developers’ involvement and buy-in. Developers need to build applications for a containerized setup, and because containers expose APIs like those of Kubernetes across the software life cycle, developers and operators are compelled to change the way they work and communicate with each other. The reason a shiny 70,000-node cluster runs tumbleweed instead of business applications is that the tools we have built for this transition do not address this fundamental and essential organizational change: the meshing together of devs and ops.

The exciting reality is that setting up containerized infrastructure is getting easier, as there is an abundance of open source solutions that get you up and running with a Kubernetes cluster. If you are already running on a major cloud provider, you are simply a couple of clicks away from having your own containerized cluster, managed, serviced, and billed by the minute. The benefits of running a containerized infrastructure are visible to operations teams: single-configuration servers (no more “snowflakes”), built-in high availability and resilience, and improved resource utilization, to name just a few. Developers also see the value of running in a containerized setup: more influence over the running environment, improved control over libraries and dependencies, and a narrower gap between production and development environments, among others.
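To illustrate how low that barrier has become, here is a sketch of spinning up a managed cluster with Google Cloud's `gcloud` CLI. The cluster name, zone, and node count are illustrative placeholders; the equivalent on other providers (EKS, AKS) differs only in the CLI.

```shell
# Create a small managed Kubernetes cluster (name and zone are illustrative).
gcloud container clusters create demo-cluster \
    --zone us-central1-a \
    --num-nodes 3

# Fetch kubectl credentials and confirm the nodes are up.
gcloud container clusters get-credentials demo-cluster --zone us-central1-a
kubectl get nodes
```

Everything from node provisioning to control-plane upgrades is handled by the provider from that point on.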

Each side of this equation (devs and ops) has its own vendors, tools, and open source projects to help them with what it takes to move to a containerized world—but that’s not enough. We are still missing the framework for devs and ops to work together to make this a success. There are simply very few, if any, tools and technologies available that facilitate this communication.

We are all so focused on our individual areas of innovation, from network to storage to orchestration, that we can lose sight of our customers’ business goals. In such an environment, system integrators, consultancies, and professional services companies do well, as they are the only ones focused on the end result and on delivering across the software supply chain; but this is not sustainable. Technologies that require customers to pay consultancies heavily to make them work are not going to be breakthrough technologies. Let’s face it: if virtualization had needed McKinsey ever-present on the payroll to work, there would be no cloud today.

For us all to benefit from a breakthrough technology like infrastructure containerization, we need to think more broadly than our single-purpose tools or primary focus areas and rethink the way we build products for this industry. This is different from the revolution of virtualization and cloud, and the sooner we realize that, the greater the benefits to our customers.

Devops is not just a bunch of fragmented tools or a fancy “digital transformation” project; it is a method of working collaboratively between functions, enabled by technology. Therefore, any technology aimed at the devops market, specifically around containerization, needs to address this continuous-collaboration mindset before anything else. So let’s all build products with that in mind, to start and maintain a conversation between developers and operators.


This post was first published here


SaaS economy in the age of containers

The killer feature of any SaaS business is the massive reduction of operational cost and complexity, including setup, configuration, ongoing maintenance, and upgrades. Given the runaway success of SaaS in recent decades, we can clearly see that using a SaaS model instead of traditional shrink-wrapped software has made perfect financial sense for many customers. This is in line with a theory that says there is a constant level of complexity in a system at any given time: IT organizations can deal with it by investing in-house (building a team that handles the complexity) or by outsourcing to a partner or SaaS/PaaS/IaaS vendor (exchanging money for complexity). When one combines the latter with an OpEx-instead-of-CapEx financial model, easy installation and setup, and flexible pay-as-you-grow options, it becomes very difficult to justify any other delivery model for generic software.

SaaS vendors, for their part, make this model profitable by distributing the operational running costs across many customers, almost always using a subscription-based model. While legacy software delivery models share the same cost-spreading principle at the R&D level, they lack the ability to arbitrage the operational cost at the delivery level: it is costly to deliver secure, highly available, and continuously upgraded software at scale, and it takes a skilled team of developers and operators to deliver software within customer-defined SLA boundaries over time.

Since a large portion of the cost of SaaS development and delivery goes into building the robust and secure infrastructure needed to host the service, vendors make money by building a large, robust infrastructure, breaking it up into smaller pieces of the same quality, and selling those pieces to many customers. SaaS infrastructure usually consists of many components—from databases to load balancers—each configured specifically to deliver the service in a particular way, with component-level high-availability (HA), redundancy, and security requirements. Think of a typical CRM SaaS: you will need a multi-zone replicated database server, a group of load-balanced and securely firewalled frontend servers, and a cluster of servers to take care of background jobs and housekeeping of the system.

As an example, to keep the details of your 2,000 customers, you will need about 12 servers, two load balancers and several gigabytes of storage; on top of that, add the Ops team cost of maintaining those databases and servers—all of this probably represents a cost of $20k per month just to get going. To make things worse, even with this investment, you are not going to get anywhere near the five-9’s (99.999%) uptime that a SaaS vendor is going to give you at a fraction of the price. In this scenario it makes perfect sense to sign up for a SaaS alternative, paying $2,000 per month for a service that’s always up, upgraded and backed up.
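A back-of-the-envelope version of this comparison can make the gap concrete. All unit costs below are illustrative assumptions chosen to land near the figures above, not real quotes; per-server and ops-team costs vary widely.

```python
# Rough DIY-vs-SaaS monthly cost comparison for the CRM example above.
# Every unit cost here is an assumption for illustration only.

SERVER_MONTHLY = 500          # assumed cost per server, USD/month
LB_MONTHLY = 250              # assumed cost per load balancer, USD/month
OPS_TEAM_MONTHLY = 13_500     # assumed share of an Ops team's time, USD/month

def in_house_monthly(servers: int, load_balancers: int) -> int:
    """Rough monthly cost of running the stack yourself."""
    return (servers * SERVER_MONTHLY
            + load_balancers * LB_MONTHLY
            + OPS_TEAM_MONTHLY)

saas_monthly = 2_000  # the SaaS subscription from the example

diy = in_house_monthly(servers=12, load_balancers=2)
print(f"DIY: ${diy:,}/month vs SaaS: ${saas_monthly:,}/month")
# → DIY: $20,000/month vs SaaS: $2,000/month
```

Even with generous assumptions, the in-house bill is an order of magnitude higher before you account for the uptime gap.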

This, however, might change

To see why, it’s worth understanding why running a highly available, secure, and robust infrastructure is so expensive. When it comes to infrastructure, the chain is only as strong as its weakest link: high availability and security cannot be achieved by making only parts of the system highly available and secure. This needs to be done across each and every component, which adds cost and complexity, increasing the bill even further.

Now, consider if all those requirements were built into a generic, self-healing, hyper-scale infrastructure, so that any application running on top of it was inherently highly available, redundant, and secure. This is the promise of containers. Instead of each service having to be engineered for a high SLA individually, the infrastructure takes care of this at a lower level and provides these attributes as a service to the user. By doing this, containers take away one of the biggest benefits of the SaaS delivery model: the infrastructure arbitrage I described earlier in the post.
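To make "HA as a service" concrete, here is a minimal sketch of a Kubernetes Deployment for a hypothetical CRM frontend (the name and image are placeholders). The application declares that it wants three replicas; spreading them across nodes and replacing failed ones is the cluster's job, not the application's.

```yaml
# Minimal sketch: the app asks for redundancy declaratively.
# Name and image are hypothetical placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: crm-frontend
spec:
  replicas: 3                  # the cluster keeps three copies running
  selector:
    matchLabels:
      app: crm-frontend
  template:
    metadata:
      labels:
        app: crm-frontend
    spec:
      containers:
      - name: web
        image: example/crm-frontend:1.0
        ports:
        - containerPort: 8080
```

If a node dies, the scheduler restarts the missing replica elsewhere; the HA logic lives entirely in the infrastructure layer.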

Container-based infrastructure systems like Kubernetes allow companies of any size to build their own custom, highly-available and robust infrastructure on top of private data centers or public clouds, at a high granularity and flexibility, without compromising much in return. In this new world of container-based infrastructure, IT teams spend their time on building and maintaining a few Kubernetes clusters, while external vendors and in-house developers use those clusters to provide services to their clients.

It will probably take years to get to a point where this shift affects the SaaS industry at a significant level. However, if we look carefully, we can already see savvy IT teams working to bring about this future: building pipelines for their code, as well as application management stacks that unlock the automation of containerized infrastructure, on both public and private clouds.

The SaaS delivery model still has a lot of great things going for it—for one, it is now the dominant model for consuming software, wherever it sits or however it is procured. However, infrastructure arbitrage is not going to be one of its key advantages for long. While cloud computing was the killer application for virtualization, changing the economy of SaaS might be the killer application for containerization.

This post was first published here

Google's infrastructure for everyone else

Our office in San Francisco has communal bathroom facilities, which, like other communal areas, get thoroughly cleaned every morning. But every day around 5pm, if you go to the men’s restroom, you’re usually confronted with a pool of urine right in front of the urinals.

This is obviously not a pleasant experience for anyone, including those who contribute to the aforementioned pool of fluids. So it got me thinking: Why does this happen? And why would you not stand just that little bit closer to the porcelain to avoid pool formations by the end of the day?

I couldn't really think of a reason, but I thought of a solution: Printing a sign and attaching it at eye level that reads: You’re not as big as you think, please step closer!

I haven’t tried this solution yet; I might report on its success or failure in a future post. However, it made me think about a current industry trend: Google Infrastructure For Everyone Else, or in its short and cute form, GIFEE.

This term was coined by fellow container evangelist companies, who are trying to sell Google's way of managing its infrastructure to the rest of us. We’re told that Google is millions of light years ahead of everyone else in building and managing infrastructure. And we’re told that Google has been running containers in production for everything since the dawn of time in a system called Borg. We’re also told that products like Kubernetes are based on Borg and are built to help us benefit from their years of experience in the field.

I think most of what we’re told is true: Google is indeed light years ahead of many others in running infrastructure. I also have no reason to believe Google hasn’t been using containers in production, nor do I think systems like Borg don't exist.

I would, however, question two claims: that Kubernetes was built by Google to let us benefit from its expertise in running containers, and that everyone is better off running infrastructure like Google does.

The truth is, Google is unique. With all the talk about unicorns and the next Google or Facebook, the likelihood of your startup making it to the unicorn league, let alone becoming the next Google, is lower than that of being hit by lightning while eating an ice cream as you swim away from a shark attack during The X Factor final.

That’s OK. Even if you’re not a unicorn with a valuation in the billions and VCs falling over themselves to give you money, there’s a decent chance you can build a profitable business you can be proud of. Let’s be honest with each other: you won’t sign up PricewaterhouseCoopers (PwC) or Ernst & Young for your accounting, or Merrill Lynch to run your current account, and you won’t attend Davos instead of the next Ruby meetup.

But wait, doesn't Google use PwC and Merrill Lynch, and isn't Eric Schmidt part of the furniture at Davos? So why wouldn’t you do the same?

The answer is simple: those services are built for Google’s size. You don’t find a company saying GAFEE (Google Accounting For Everyone Else). That would be ridiculously absurd, and we all know it. Interestingly, Google’s accounts are more likely to look like a normal multinational’s than its infrastructure is. I can think of at least a dozen companies with the same accounting practices as Google: Unilever, Procter & Gamble, GlaxoSmithKline, Volkswagen, ExxonMobil, BP ... but none of them are anything like Google in terms of infrastructure sophistication, and we can imagine why.

“So what’s the problem?” you might ask. “OK, we get the point: we’re not as big as Google, and we don’t use Google’s accountants because their practices don’t apply to us (or are too expensive).”

“But what’s wrong with using Google’s infrastructure when they’re giving it to us for free?” I hear you say.

In reality, it’s not all about the nominal price. Getting Merrill Lynch to do your banking, even if it’s for free, might not be a good idea for your company because of the burden it puts on you and your admin department. It would be akin, at best, to taking a Formula One car on the school run.

The issue is that by using tools that aren’t built for your goals, size, and achievable targets, you’ll be burdening your business with unnecessary complications, both now and in the future. As software engineers, we’re familiar with Donald Knuth’s saying: premature optimization is the root of all evil.

Why does Google promote tools like Kubernetes? Google’s promotion of containers is largely about taking on Amazon. In short, Google has no way to beat AWS at its own game of compute, network, and storage, the traditional building blocks of cloud computing. But Google has a lot of experience running infrastructure that doesn’t expose those traditional components, because it uses containers. By promoting containers as the building blocks of infrastructure, Google is hoping to leapfrog Amazon and become the infrastructure setup of the future. Its advertising campaign for Google Cloud Platform also points to this goal.

You surely noticed that I said “infrastructure setup of the future.” I do see containers as the building blocks of this infrastructure (otherwise I wouldn’t be spending every waking moment of my life building a business on this premise). But while I think we’re going to be better off building our next-gen infrastructure on containers, I don’t think we all need to build and manage it like Google does, via these super-configurable and modular tools. Most of us need simple tools that just work and get out of the way, letting us do what we should be doing: building a business.

By using tools that aren’t right for our size, we run the risk of contributing to, and stepping in, a pool of urine as the day draws to an end. It’s not only technically harmful; the administrative costs can also quickly become a burden. As start-ups, we all want to believe we’re destined for greatness as the next big thing. But chasing the aspiration of becoming the next unicorn turns us into unsustainable businesses, addicted to the next round of funding. We see this every day in Silicon Valley, and trying to imitate Google’s infrastructure is just one aspect of this mentality.

And if all else fails, it's always useful to have a cautionary sign in front of us as a reminder: You’re not as big as you think you are, please step closer!

This post was first published here