How To Avoid Overspend on Your Cloud Resources 

One of the reasons I loved being a CIO was that it gave me hands-on insight into what kind of new tech was coming down the pike. It also gave me a direct look at the problems created by new technologies. I always found the latter of these two insights to be the more compelling and informative. Good technologies (groundbreaking ones that deliver exponential improvements over whatever came before) always rise to the top. I knew that whatever transformational opportunities lay ahead would never be a secret. What I was far more curious about was the unintended ways they’d impact IT environments and, of course, my job.

Back in 2016, about a year into my time as CIO at Pure Storage, I started to notice a trend. As more workloads moved to the cloud, cost was becoming harder and harder to predict. Now, in most organizations, the CIO reports to the CFO. We were no different. And the last thing you want to tell your CFO is that you’re not sure what something is going to cost, or that your costs jumped unpredictably. It’s far worse when what you’re unsure about is potentially your organization’s most significant spend. A multitude of startups recognized this emerging issue, and it seemed like the number increased every month. My inbox was jam-packed with startups promising they could help me triage my cloud costs.

But we didn’t simply need a temporary fix. We needed to be able to predict those costs longer term, and we needed a plan for the inevitability of the unexpected. 

Since then, the problem has only grown. Software is embedded into product strategy today more than ever before, and most of the time that software is built on cloud platforms. We’ve also seen an increase in services offered by those cloud platforms, and building and distributing software has become easier and easier over time. As a result, companies are spending more on cloud platforms than ever before, while governance and discipline around use have lagged way behind adoption of the tech.

Palvi Mehta, a friend and colleague as well as former CFO of ExtraHop Networks and Operating Partner/CFO at Pioneer Square Labs, is intimately familiar with this runaway problem from a financial perspective. The problem of unchecked and unpredictable cloud spend quickly became a top priority when, one month, the AWS spend far exceeded what was anticipated and budgeted, hitting both margins and EBITDA. So many different groups were using cloud resources, from sales engineering and customer success to marketing, test, and software engineering, that optimizing the spend was a challenge in itself.

Mehta notes that when it comes to cloud, most organizations start small, with storage or apps. The spend is minuscule at the beginning and flies somewhat under the radar from a budget and group spend perspective. Over time, more and more shifts to the cloud. The spend can then spike quickly and go relatively unnoticed. Usage typically extends across multiple groups that have no insight into one another’s needs.

What’s needed now is a balance between engineering agility and financial control and predictability. Agility is what the cloud gives us: it allows us to move quickly, and at scale, regardless of the limitations of our own physical infrastructure. But what about control and predictability? That’s the missing piece, or has been. OpEx costs are increasing at a pace we’ve never seen before, to a point where they’re impacting not just the bottom line but margins and earnings. It’s something that requires focus across the board. Good, prudent fiscal behavior requires it.

There are solutions that acknowledge the issue, but most essentially provide after-the-fact reporting, a look in the rearview mirror. They don’t provide prescriptive action, and they fail to enable dynamic operation in a constantly changing cloud environment.

Managing these costs comes down to one key principle: DevOps teams are not financial engineers, they’re software engineers. Mehta explains that at the end of the day, the people actually using and managing public cloud resources within an organization aren’t concerned about optimizing spend. Their performance is measured by different deliverables, which translate to a different set of priorities. For engineers and DevOps teams, reliability and usage are far more important than costs. Putting gates around spending can slow the process of innovation and delivery, and can hurt customer service and customer satisfaction.

Conversely, financial teams are incentivized to optimize spending; however, they are not enabled to see and understand cloud costs at the level of depth needed to take meaningful action. This presents a classic example of the principal-agent problem. Finance and executive team members control the budget, but cannot operationalize any changes without the involvement of engineering and DevOps team members.

From a cost-optimization perspective, the simplest fix for Finance teams to implement is to buy longer-term commitments with deep discount rates for resources their teams have been using, such as 3-year reserved instances. But that isn’t necessarily the best solution. According to Mehta, it doesn’t allow for the unexpected. COVID, for example, made demand skyrocket for some organizations and all but disappear for others. It’s not only money that is locked into commitments, but the actual infrastructure covered by them as well. The inability to change infrastructure based on the needs of the product takes away the agility promised to DevOps by the cloud. Also, what CFO wants to be locked into spend on something that can be highly variable?
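To make the trade-off concrete, here is a minimal back-of-the-envelope sketch. The hourly rates are made-up numbers chosen for illustration, not real provider pricing, and `monthly_cost` and `break_even_utilization` are hypothetical helper names. The sketch assumes a commitment is paid for whether or not it is used, and that any need beyond it is bought on demand.

```python
# Hypothetical rates for illustration only; real pricing varies by
# provider, region, and instance type.
ON_DEMAND_RATE = 0.10   # assumed $/instance-hour on demand
RESERVED_RATE = 0.04    # assumed effective $/instance-hour under a 3-year commitment
HOURS_PER_MONTH = 730


def monthly_cost(committed: int, needed: int) -> float:
    """Monthly cost for a given commitment and actual need.

    Committed instances are paid for whether or not they are used;
    anything needed beyond the commitment is bought on demand.
    """
    overflow = max(needed - committed, 0)
    return (committed * RESERVED_RATE + overflow * ON_DEMAND_RATE) * HOURS_PER_MONTH


def break_even_utilization() -> float:
    """Fraction of a commitment that must be used for it to beat on-demand."""
    return RESERVED_RATE / ON_DEMAND_RATE


if __name__ == "__main__":
    # Demand holds at 100 instances: the commitment wins easily.
    print(monthly_cost(committed=100, needed=100), monthly_cost(committed=0, needed=100))
    # Demand falls to 30 instances: the commitment now costs more than on-demand.
    print(monthly_cost(committed=100, needed=30), monthly_cost(committed=0, needed=30))
    print(break_even_utilization())
```

Under these assumed rates, the commitment only pays off while roughly 40 percent or more of it is actually in use. If demand falls below that, the discount turns into a premium.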

Organizations need to put a plan together that considers not only financial concerns but also engineering concerns. Each organization must find the right balance of flexibility and savings that aligns with its business projections. This is no easy task and requires weighing thousands of different purchasing strategies for its infrastructure. Even if done successfully, there is no guarantee the commitments selected will be the right ones in six months.
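As a toy illustration of what weighing purchasing strategies against business projections can look like, the sketch below scores a handful of commitment levels against three demand scenarios and picks the one with the lowest expected cost. Everything in it is an assumption made for the example: the rates, the scenario probabilities, and the function names. A real evaluation would cover far more options (terms, payment models, instance families) and is not represented here.

```python
# All numbers are hypothetical and exist only to show the shape of the problem.
ON_DEMAND_RATE = 0.10   # assumed $/instance-hour on demand
RESERVED_RATE = 0.04    # assumed effective $/instance-hour when committed
HOURS_PER_MONTH = 730

# Assumed business projections: (probability, instances needed).
SCENARIOS = [(0.25, 40), (0.50, 100), (0.25, 160)]


def expected_monthly_cost(committed: int) -> float:
    """Probability-weighted monthly cost of a given commitment level."""
    total = 0.0
    for probability, needed in SCENARIOS:
        # Need beyond the commitment is bought on demand.
        overflow = max(needed - committed, 0)
        cost = (committed * RESERVED_RATE + overflow * ON_DEMAND_RATE) * HOURS_PER_MONTH
        total += probability * cost
    return total


def best_commitment(candidates):
    """Return the candidate commitment level with the lowest expected cost."""
    return min(candidates, key=expected_monthly_cost)


if __name__ == "__main__":
    options = range(0, 201, 20)  # commit to 0, 20, ..., 200 instances
    choice = best_commitment(options)
    print(choice, expected_monthly_cost(choice))
```

Change the probabilities in `SCENARIOS` and the best choice moves with them, which is exactly why a commitment that looks right today may not be the right one in six months.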

Enter Archera, a cloud resource automation solution that proactively manages and de-risks an organization’s cloud resources. Archera enables finance and engineering teams to optimize their mix of instances in a way that maximizes financial efficiency without any adverse impact on delivery to customers. Their approach continuously manages commitments to maximize flexibility and provides guaranteed buy-back of commitments that may go unused. This means teams can leverage the discounts provided by 3-year reserved instances without actually being stuck in that contract as things change.

To accomplish this, they’ve taken a different approach in a few key areas that will help organizations drive more meaningful outcomes. 

  1. Organizational Context - they recognize that cloud resource management has required highly involved cross-team collaboration between DevOps and Finance, which is simply not feasible to maintain given competing priorities and time limitations. They empower finance teams to get the visibility they need without requiring the continuous involvement of DevOps teams.
  2. Maximizing Benefits without Risk - Archera is the only platform that offers guaranteed buy-backs of commitments through an innovation they developed known as “Leased RIs”. This allows customers to select commitments with higher savings without the risk of getting stuck with infrastructure that no longer meets their needs.
  3. Built for Dynamic Operation in the Cloud - Archera was built with a priority on responding and adapting to rapidly changing cloud environments. Their algorithms are able to fit new pricing models, shifting infrastructural needs, and changing budgetary constraints in real time, allowing users to maintain an optimized set of commitments across multiple cloud platforms.