Businesses have bought into the Cloud/SaaS model to improve their performance by offloading the heavy lifting work to the Cloud Service Providers (CSPs) and software-as-a-service (SaaS) providers. Many of these providers are delivering on this promise, releasing a stream of new features and functionality, but businesses are struggling to keep pace.

In the rush to modernise, many Enterprises have not realised how some of their core business practices are being derailed by modern software development processes that have not taken into account the reality of how enterprises operate. By addressing these issues businesses can uncover large optimisation opportunities, reduce their carbon emissions, and increase agility.

Enterprises are highly complex, very large scale and full of history. Much of that history is filled with Re-Orgs, Mergers & Acquisitions, and other changes in the strategy, as the business changes quickly to adapt to an evolving market. These changes in direction are critical for a business to survive, and that criticality will drive focus to be targeted on the new strategy, at the expense of the now ‘legacy’ systems.

Some examples that may be familiar:

“The Gamma team will be merged with the Omega team. The priority for all team members is the Omega roadmap for this year”
“We are now part of even-bigger-corp.com. Over the next 6 months we will be merging the following departments together…”
“Project DeathStar had a good PoC, but we’ve decided to reallocate funding to other more critical areas of the business due to a change in our high-level strategy”
“We’ve been using vendor-A-tool for our SCM and CI/CD, but want to switch to big-vendor-B-tool, because our business can get big-vendor-B-tool at very low cost as part of a wider bundle deal”

In previous times, many of the processes related to the legacy systems were manual, for example when installing new builds of software onto a server. There are two aspects of the older techniques worth noting:

The rate at which new artefacts were being created was relatively slow.
If team members were moved onto other projects, the manual processes that were driving growth in costs stopped when the people stopped working on the systems.

The automated software development processes we operate these days are a very different beast. This automation brings huge benefits in terms of stability and reliability to software systems; however, many of these processes are not being designed with business agility in mind. As businesses attempt to quickly adapt in the ever-changing world of technology, modern software development practices coupled with incomplete migration strategies are leaving a wake of complex waste in their path.

What went wrong?

Modern cloud software development practices automate areas like Infrastructure-as-Code, Data pipelines, CI/CD, and Observability, allowing the application teams to focus on delivering real business value. However, when businesses change direction, it can result in unintended consequences:

The startup-mode mentality needed to get the new project up and running stomps all over the weak cloud environment controls, generating large amounts of stale resources.

Often when systems are deprioritised the first change that happens is the movement of team members. The core infrastructure delivering the application may get decommissioned, but all the related software development systems get left running.

If you’re a software engineer, how familiar do the following scenarios sound?

We need to get the CI/CD and Infra for this new project up and running quickly, let’s get a basic pipeline up and running, producing artefacts and deploying them to the Dev environment. We need to get this deployed quickly to help collect feedback on the system as soon as possible. Don’t worry about building out a complete solution, just get the bare bones running any way you can for now – our main focus is creating the application itself.

We seem to have quite a lot of old data and automation running, but we’re not completely sure it’s safe to remove it. We’d need to spend some time to investigate, but we need to prioritise the new features we’ve agreed to deliver for the customer first.

The new Re-org means we are now part of team Falcon, not team Pidgeon. Falcon is the project initiative started by our new head of department. All our focus should be placed on project Falcon. We’re not sure what will happen to project Pidgeon – the automation is all set up and ticking along, so we’ll just leave it running because no one wants us working on it, and no one is asking us to shut it down. Maybe it will come in useful in the future…

These scenarios happen all the time, creating silent data explosions. New projects are automating the creation of vast amounts of new data that is expected to be made more efficient at some later date, and as projects slip into End-Of-Life, much of the automation is left running. These anti-patterns drive the delivery of stale data and infrastructure into your business.

The existing controls that many businesses use to cover these risks are often only skin deep:

Governance controls often fail to stop new projects from creating inefficient processes. Even if these issues are flagged at a later date, the team will be tasked with prioritising new feature delivery over ‘technical debt that can be sorted out later’.
Governance over end-of-life applications often only focuses on closing the cloud accounts the application was running in, without looking at the wider software development systems that exist around the application.

A lot of this waste will show up as nicely optimised waste through your controls:

The CI/CD system is left running. It might be opted out of standard FinOps measurements because, even if the system’s infra is underutilised, the team didn’t want automated policies stopping the core CI/CD system. It lives forever.
Many applications with less-than-optimal CI/CD systems have build processes triggered by CRON schedules, which keep running, churning out daily build artefacts into storage.
Backend data pipelines and their event triggers are left running.
All the logging/alerting systems are still in place and running, with dashboards no one needs.
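As a rough illustration of how scheduled builds that have outlived their project might be spotted, the sketch below flags pipelines whose CRON trigger has fired recently even though no person has changed the code for a long time. The record fields, the thresholds and the sample pipeline names are all assumptions made for this example; a real check would read the equivalent data from whatever SCM and CI/CD tools the business uses.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ScheduledPipeline:
    # Hypothetical record; a real CI/CD tool would expose similar fields.
    name: str
    last_human_commit: date   # last change made by a person, not a bot
    last_scheduled_run: date  # last time the CRON trigger fired


def find_zombie_pipelines(pipelines, today, idle_days=180, active_days=7):
    """Return pipelines that still run on a schedule although nobody
    has touched the code for at least `idle_days` days."""
    idle_cutoff = today - timedelta(days=idle_days)
    active_cutoff = today - timedelta(days=active_days)
    return [
        p for p in pipelines
        if p.last_human_commit < idle_cutoff
        and p.last_scheduled_run >= active_cutoff
    ]


# Illustrative data only.
pipelines = [
    ScheduledPipeline("falcon-api", date(2024, 5, 28), date(2024, 6, 1)),
    ScheduledPipeline("pidgeon-batch", date(2023, 2, 10), date(2024, 6, 1)),
    ScheduledPipeline("deathstar-poc", date(2022, 11, 3), date(2023, 1, 15)),
]

zombies = find_zombie_pipelines(pipelines, today=date(2024, 6, 1))
print([p.name for p in zombies])
```

With this sample data only the second pipeline is flagged: it is still producing builds every day, but no one has worked on it for over a year. The third pipeline has gone quiet on its own, so it is not flagged by this check, although its stored artefacts may still be worth a look.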

Eventually these issues might come up on some form of internal radar within the business – some examples could be when systems related to an application:

Start to look unoptimized from a FinOps perspective
Get flagged for new security vulnerabilities
Stand out because they are slowing down the organisation’s ability to migrate onto a new service.

The above options are only possibilities if we assume there were cost & resource allocation processes in place to properly identify the resources in question. What if the resources relate to a PoC that got dropped, an MVP where the team were planning to ‘sort all the tagging out later’ or only the actual application infrastructure was tagged and not the systems used for developing and supporting the software? In this case, you end up with the same problems, but the resources are not attributable to anyone in the business. Now the detection of the problem looks like this:

Unknown resources start to look unoptimised from a FinOps perspective
Unknown resources get flagged for new security vulnerabilities
Unknown resources stand out because they are slowing down the organisation’s ability to migrate onto a new service.
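To make the attribution gap concrete, here is a minimal sketch that splits a resource inventory into resources that can be traced to an owner and resources that cannot. The required tag names, the inventory layout and the sample resources are assumptions for illustration and are not taken from any particular cloud provider's API.

```python
from collections import defaultdict

# Hypothetical tagging policy: both tags must be present and non-empty.
REQUIRED_TAGS = ("owner", "application")


def split_by_attribution(resources):
    """Group resource ids by owner, and collect the ids of resources
    that are missing any required tag."""
    attributed = defaultdict(list)
    unknown = []
    for resource in resources:
        tags = resource.get("tags", {})
        if all(tags.get(key) for key in REQUIRED_TAGS):
            attributed[tags["owner"]].append(resource["id"])
        else:
            unknown.append(resource["id"])
    return dict(attributed), unknown


# Illustrative inventory: the application infra is tagged,
# the development systems around it are not.
inventory = [
    {"id": "app-db-01", "tags": {"owner": "team-falcon", "application": "falcon"}},
    {"id": "ci-runner-17", "tags": {"application": "pidgeon"}},
    {"id": "artefact-bucket-3", "tags": {}},
    {"id": "poc-cluster-9"},
]

attributed, unknown = split_by_attribution(inventory)
print(attributed)
print(unknown)
```

In this example three of the four resources end up in the unknown list, which mirrors the case described above where only the application infrastructure itself was tagged.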

It starts to look like a bit of a tricky problem to fix. But we’re not done yet, as this is where the ‘Enterprise scale-combo-multiplier-smackdown’ comes in. The above examples are for one project. At that scale, not much of a problem for the business. However, within enterprises, this is happening across large swathes of applications, multiple times, over long periods of time. How many re-orgs and team changes has your organisation gone through since you started delivering automated processes?

How bad is the problem?

Over time these problems generate very large amounts of waste that no one owns. This is the cause of some big problems:

Increased CO2 emissions and Cost
- Sometimes the direct costs attributable may not be that large on the surface; however, the impact on CO2e is very significant (see video).

https://www.youtube.com/watch?v=gGLQe_TA38g

Increased inertia within the business
- It’s very hard to migrate to new services if the existing systems are bloated with junk that no one wants to take responsibility for. This can cost a business a great deal of money, which may appear as:
- Failures to turn off legacy systems after migration, as no one will take responsibility for the remaining legacy artefacts. Now the business is paying for both the old and the new systems, destroying any expected business gains.
- Over specified business requirements, derived from analysis of the existing bloated legacy configuration. Now the business overpays for unnecessary capacity in the new services.

Businesses are paying the Cloud and SaaS providers to help them stay up to date… but the lack of agility within these businesses means they cannot make use of the latest technologies on offer.

Within Enterprises, these problems can remain hiding in plain sight for a long time. The first time they may be noticed is when the issues surface at the platform level, as people start to question why the software development processes are using such large amounts of resources.

FinOps: “Why are we paying for 300 TB of storage for our artefact repository and supporting 20,000 Git repos, despite only having 2000 applications in production?”

The first few times these questions get asked, teams will probably default to a defensive stance. This can result in dialogues like the following:

Platform team: “We just provide the Storage/CI-CD/Logging service. Our company does DevOps, meaning the app teams ‘build it, run it, own it’. It’s up to the app teams to take responsibility for their systems!”

App Teams: “That’s not our App, or our data. Our company does DevOps, meaning the app teams ‘build it, run it, own it’. It’s up to the app team who own these artefacts to take responsibility for their systems!”

In short, no one seems very motivated to fix the problem, until it’s so big that it starts to be considered as a material threat to the business, much like lax security controls are nowadays.

Potential Solutions

Addressing this problem involves creating software development practices that work in empathy with your business agility. Re-orgs and similar changes will continue to happen, or your business will stagnate and die. You need your systems to be empathetic to this reality.

Don’t be fooled into thinking you can fix this with a documented process for how applications must all contain exit strategies. In this situation, when the time comes to run the exit strategy, the people needed to action the shutdown are already gone.

Instead, look for the anti-patterns in your systems and fix them at source (and see our upcoming articles on anti-patterns)

Weak controls across the software development environment
Stale ‘Golden source’ data
Data and infra explosions through automation
Nothing is getting deleted

Fixing these problems involves detecting these anti-patterns and using graceful resolutions that work in concert with your application teams, not in opposition. Developing these types of processes at scale requires a cross functional team comprised of DevOps, FinOps and business capabilities.

The business capabilities help drive and fund the initiative.
The DevOps capabilities provide knowledge of the systems being affected.
The FinOps capabilities provide techniques to deal with the problems in a structured framework.
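As an illustration of what a graceful resolution might look like in practice, the sketch below escalates in stages (notify, then archive, then delete) based on how long an artefact has been idle, and lets a team opt out explicitly. The stage names, the day thresholds and the opt-out flag are assumptions made for this example, not a prescribed policy; each business would set its own.

```python
from datetime import date, timedelta
from enum import Enum


class Action(Enum):
    KEEP = "keep"
    NOTIFY = "notify owner"
    ARCHIVE = "archive"
    DELETE = "delete"


def next_action(last_used, today, opted_out=False,
                notify_after=90, archive_after=180, delete_after=365):
    """Pick the next clean-up step for an artefact from its idle time.

    Thresholds are in days and are illustrative defaults only."""
    if opted_out:
        # A team has explicitly asked to keep this artefact.
        return Action.KEEP
    idle_days = (today - last_used).days
    if idle_days >= delete_after:
        return Action.DELETE
    if idle_days >= archive_after:
        return Action.ARCHIVE
    if idle_days >= notify_after:
        return Action.NOTIFY
    return Action.KEEP


today = date(2024, 6, 1)
for idle in (30, 100, 200, 400):
    action = next_action(today - timedelta(days=idle), today)
    print(idle, "days idle ->", action.value)
```

Escalating in stages gives any remaining owner a chance to object before anything is removed, which keeps the process working with application teams instead of against them. Because the rule runs at the platform level, it keeps working after the original team has moved on.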

The team who delivers this capability needs to be proactive, because other areas of the business will not be motivated to solve this problem as a priority.

It’s important to view the problem through the correct context. Rather than ‘we should clean this stuff up because it’s waste’, a more effective way to view this problem is ‘how do we define and enforce our business’ software development practices?’ All businesses are different; however, when you view the problem this way you start to see how these types of solutions live above the application team level and find a much better fit at the platform level, working as part of a business’ SecOps, DevOps and FinOps systems.

With hindsight, it’s easy to see this problem, but this is not a problem that occurred overnight. Instead, the problem stems organically from how the use of modern software practices has gradually increased over time. Anyone who sees this problem within their business should know they are not alone – this issue is affecting lots of large-scale businesses. Businesses that address this issue will increase their agility in the arena of software development, allowing them to outperform and undercut their competition.

About Devoteam A Cloud

With 500 clients across Europe, Devoteam A Cloud has offered excellent know-how on AWS technologies since 2012. Our team of 550+ AWS experts supports customers with scalable infrastructure and new ways of thinking and operating enabled by AWS, so that they can explore new possibilities, re-invent their business, and evolve into an enterprise platform.

Devoteam A Cloud is an AWS Premier APN Consulting Partner, with 4 competencies: DevOps, Data & Analytics, Security, Migration. In 2021, it was awarded AWS APN Migration Partner France, following 2020, where Devoteam was awarded AWS APN Consulting Partner of the Year.

At Devoteam we specialise in this type of scale complexity. By combining the latest skillsets in Cloud technologies with decades of experience working within enterprise businesses, we can help businesses build out these capabilities internally.

Andrew Thompson
Principal Consultant
FinOps Devoteam UK