The Problem with PaaS Market-sizing

Figuring out the market for PaaS has always been difficult. At the moment, I tend to estimate it at $20–25bn sometime in the future (5–10 years from now?), based on a model of converting the existing middleware and application development market. Sizing this market has been something of an annual bugbear for me across my time at Dell doing cloud strategy, at 451 Research covering cloud, and now at Pivotal.

A bias against private PaaS

This number is in contrast to the figures you usually see from analysts, which are in the single-digit billions. Most analysts think of PaaS only as public PaaS, tracking just Force.com, Heroku, parts of AWS, Azure, and Google, and a bunch of “Other.” This is mostly due, I think, to historical reasons: several years ago “private cloud” was seen as goofy and made-up, and I’ve found that many analysts still view it as such. Thus, their models started off covering just public PaaS and have largely remained so.

I was once a “public cloud bigot” myself, but having worked more closely with large organizations over the past five years, I now see that much of the spending on PaaS is on private PaaS. Indeed, if you look at the history of Pivotal Cloud Foundry, we didn’t start making major money until we gave customers what they wanted to buy: a private PaaS platform. The current product/market fit, then, for PaaS in large organizations seems to be private PaaS.

(Of course, I’d suggest a wording change: when you end up running your own PaaS you actually end up running your own cloud and, thus, end up with a cloud platform. Also, things are getting even more ambiguous at the infrastructure layer all the time; perhaps “private PaaS” means more “owning” the PaaS layer, regardless of who “owns” the IaaS layer.)

How much do you have budgeted?

With this premise — that people want private PaaS — I then look at existing middleware and application development market-sizes. Recently, I’ve collected some figures for that:

  • IDC’s Application Development forecast puts the application development market (which includes ALM tools and platforms) at $24bn in 2015, growing to $30bn in 2019. The commentary notes that the influence of PaaS will drive much growth here.
  • Recently from Ovum: “Ovum forecasts the global spend on middleware software is expected to grow at a compound annual growth rate (CAGR) of 8.8 percent between 2014 and 2019, amounting to $US22.8 billion by end of 2019.”
  • And there’s my old pull from a Goldman Sachs report that pulled from Gartner, where middleware is $24bn in 2015 (that’s from a Dec 2014 forecast).

When dealing with numbers this large and this much speculation, I prefer ranges. Thus, the PaaS TAM I tend to use nowadays is something like “it’s going after a $20–25bn market, you know, over the next 5 to 10 years.” That is, the pot of current money PaaS is looking to convert is somewhere in that range. That’s the amount of money organizations are currently willing to spend on this type of thing (middleware and application development), so it’s a good estimate of how much they’ll spend on a new type of this thing (PaaS) to help solve the same problems.

Things get slightly dicey depending on whether you include databases, ALM tools, and the underlying virtualization and infrastructure software: some PaaSes include some, none, or all of these in their products. Databases are a huge market (~$40bn), as is virtualization (~$4.5bn). The other ancillary buckets are relatively small. I don’t think “PaaS” eats too much of the database market, but it probably takes some of “virtualization.”
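
To make that range arithmetic explicit, here’s a back-of-the-envelope sketch; the bucket figures are the analyst numbers cited above, while the conversion shares are purely my own illustrative assumptions, not anyone’s forecast.

```python
# Back-of-the-envelope PaaS TAM: how much existing spend could convert.
# Bucket figures ($bn) are the analyst numbers cited above; the conversion
# shares are my own illustrative assumptions, not anyone's forecast.
buckets = {
    "middleware (Gartner via Goldman Sachs, 2015)": 24.0,
    "application development incl. ALM (IDC, 2015)": 24.0,
    "virtualization, partial overlap": 4.5,
}
conversion = {"low": 0.40, "high": 0.50}

total = sum(buckets.values())
low, high = total * conversion["low"], total * conversion["high"]
print(f"PaaS TAM over 5-10 years: ~${low:.0f}-{high:.0f}bn of ${total:.0f}bn spend")
```

Running that lands at roughly $21–26bn, in line with the $20–25bn range I use above; tweak the shares and the range moves, which is exactly why I prefer stating it as a range.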

So, if you accept that PaaS is both public and private PaaS and that it’s going after the middleware and appdev market, it’s a lot more than a few billion dollars.

(Ironic-clipart from my favorite source, geralt.)

Roles and Responsibilities for DevOps and Agile Teams

This post is pretty old and possibly out of date. There’s an updated version of it in my book, Monolithic Transformation.

Overview

There are a handful of “standard” roles used in typical Agile and DevOps teams. Any application of Agile and DevOps is highly contextual and crafted to fit your organization and goals. This document goes over the typical roles and their responsibilities and discusses how Pivotal often sees companies staffing teams. It draws on “the Pivotal Way,” our approach to creating software for clients, recommendations for staffing the operations roles for Pivotal Cloud Foundry (our cloud platform), and recent recommendations and studies from agile and DevOps literature and research.

I don’t discuss definitions for “Agile” or “DevOps” except to say: organizations that want to improve the quality (both the lack of bugs and the “design” quality, as in “the software is useful”) and the uptime/resilience of their software look to the product development practices of agile software development and DevOps to achieve such results. I see DevOps as inclusive of “Agile” in this discussion, and so will use the terms interchangeably. For more background, see one of my recent presentations on this topic.

Finally, there is no fully correct and stable answer to the question of what the roles and responsibilities are in teams like this. They fluctuate constantly as new practices are discovered and new technologies remove past operational constraints, or create new ones! This is currently my best understanding of all this, based on both organizations that are using Cloud Foundry and those using other cloud platforms. As you discover new, better ways, I’d encourage you to reach out to me so we can update this document.

Core Roles for DevOps Teams

There are generally two types of teams we see: “business capabilities teams” who work on the actual software or services (“the application”) in question and “agile operations teams” as seen below:

While not depicted in the diagram above, the number of staff in each layer reduces dramatically as you go “down” the stack. The largest number of people are in the business capability layer working on the actual applications and services; far fewer individuals work on creating and customizing capabilities in Pivotal Cloud Foundry (e.g., a service to interact with a reservation or ERP system that is unique to the organization); and fewer still “operators” keep the cloud platform up and running and updated, and handle hardware and networking issues.

Making the Applications — Business Capability Roles

These teams work on the application or services being delivered. The composition of these teams changes over time as each team “gels,” learning the necessary operations skills to be “DevOps” oriented and mastering the domain knowledge needed to make good design choices.

These are the core roles in this layer:

  • Developer/Engineer — those who not only “code,” but gain the operations knowledge needed to support the application in production, with the help of…
  • Operations — those who work with developers on production needs and help support the application in production. This role may lessen over time as developers become more operations aware, or it may remain a dedicated role.
  • Product Owner/Product Manager — the “owner” of the application in development that specifies what should be done each iteration, prioritizing the list of “requirements.”
  • Designer — studies how users interact with the software and systematically designs ways to improve that interaction. For user-facing applications this also includes visual design.

These roles are not always needed, and they can sometimes be filled partially by staff who are shared across teams but designated to this one:

  • Tester — staff who help ensure the software does what was intended and functions properly.
  • Architect — in large organizations, the role of architect is often used to ensure that individual teams align with the larger organization’s goals and strategy, while also acting as a consultative enabler who helps teams be successful and share knowledge.
  • Data scientist — if data analysis is core to the application being developed, getting the right analysis and domain skills will be key.

Developer/Engineer

These are the programmers, or “software developers.” Through the practice of pairing, knowledge is quickly spread amongst developers, ensuring that no “empires” are built and addressing the risks of a low “bus factor.” Developers are also encouraged to “rotate” through various roles, from front- to back-end, to get good exposure to all parts of the project. By using a cloud platform, like Pivotal Cloud Foundry, developers are also able to package and deploy code on their own through continuous integration and continuous delivery tools.

Developers are not expected to be experts at operations concerns, but by relying on the self-service and automation capabilities of cloud platforms do not need to “wait” for operations staff to perform configuration management tasks to deploy applications. Over time, with this reliance on a cloud platform which cleanly specifies how to best build applications so that they’re easily supported in production, developers gain enough operations knowledge to work without dedicated operations support.
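
To make that self-service deployment concrete, here’s a minimal sketch of the kind of scripted deploy step a team’s pipeline might run using the Cloud Foundry CLI. It assumes `cf` is installed and already authenticated; the app name and manifest path are hypothetical.

```python
# A sketch of a CI deploy step: push the app with the Cloud Foundry CLI.
# Assumes `cf` is installed and `cf login` has already happened; the app
# name and manifest path below are made up for illustration.
import subprocess
import sys


def deploy(app_name: str, manifest: str = "manifest.yml") -> None:
    """Run `cf push`, which stages, starts, and routes the app on the platform."""
    result = subprocess.run(
        ["cf", "push", app_name, "-f", manifest],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        sys.exit(f"deploy failed:\n{result.stderr}")
    print(f"{app_name} deployed")


if __name__ == "__main__":
    deploy("reservation-service")  # hypothetical app name
```

The point is less the script itself than who runs it: the developer (or their pipeline), not a ticket queue.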

The number of developers on each team varies, but so far, following the two pizza team rule of thumb, we see anywhere from 1 to 3 pairs, that is, 2 to 6 developers, and sometimes more.

Operations

In a cloud native mode, until business capabilities teams have learned the necessary skills to operate applications on their own, they will need operations support. This support will come in the form of understanding (and co-learning!) how the cloud platform works, and assistance troubleshooting applications in production. Early on, you should plan for heavy operations involvement to help collaborate with developers and share knowledge, mostly around getting the best from the cloud platform in place. You may need to “assign” operations staff to the team at the beginning, making them so-called designated operations staff instead of dedicated, as explained in Effective DevOps.

Many teams find that the operations role never leaves the team, which is perfectly normal. Indeed, the desired end-state is that the application teams have all the development and operations skills and knowledge needed to be successful.

As two cases to guide your understanding of this role:

  • Etsy has approximately 15 operations engineers to a few hundred other engineers, according to Effective DevOps.
  • As re-told by Diego Lapiduz at 18F, early on when teams were learning how to use and operate cloud.gov, he and a handful of other operations staff spent much of their time with development teams, getting intimately familiar with each application. Now, because the practice of designated operations staff is less needed, he and his operations peers are less involved and have little knowledge of the applications in use…which is good, and as intended!

As a side note, it’s common for operations people to freak out at this point, thinking they’re being eliminated. While it’s true that margin-berserked management could choose to look at operations staff as “waste,” it’s more likely that, following Jevons Paradox, operations staff will be needed even more as the number of applications and services multiplies.

Product Owner/Product Manager

This is the role that defines and guides the requirements of the application. It is also one of the roles that varies most in responsibilities across products. At its core, this role is the “owner” of the software under development. In that respect, they help prioritize, plan, and deliver software that meets your requirements. Someone has to be “the final word” on what happens in high-functioning teams like this. The amount of control versus consensus-driven management is the main point of variability in this role, plus the topic areas the product owner must be knowledgeable about.

It’s best to approach the product owner as a “breadth first role”: they have to understand the business, the customer, and the technical capabilities. This broad knowledge helps them make sure they’re making the right prioritization decisions.

In Pivotal Labs engagements, this role is often performed by a Pivotal employee, pairing up with one of your staff to train and transfer knowledge. Whether during a project or through workshops, they help you understand the Pivotal, iterative development approach, as well as mentor and train your internal staff to help them learn agile methods and skills, which will enable them to move on with confidence when the engagement is complete.

Designer

One of the major lessons of contemporary software is that design matters, far more than previously believed. The “small batch” mentality of learning and improving software afforded by cloud platforms like Pivotal Cloud Foundry gives you the ability to design more rapidly and with more data-driven precision than ever before. Hence, the role of designer is core to cloud native teams.

The designer focuses on identifying the feature set for the application and translating that to a user experience for the development team. Activities may include completing the information architecture, user flows, wireframes, visual design, and high-fidelity mock-ups and style guides. Most importantly, designers have to “get out of the building” and not only see what actual users are doing with the software, but get to know those users and their needs intimately.

Testers (partial/optional)

While the product manager and overall team are charged with testing their software, some organizations either want or need additional testing. Often this is “exploratory testing,” where a third party (the tester or testers) tries to systematically find the edge cases and other “bugs” the development team didn’t think of.

If you find yourself in that situation, it’s worth questioning whether you really need separate testers. Much routine “QA” is now automated (and can, thus, be done by the team and automated CI/CD pipelines), but you may want exploratory, manual testing in addition to what the team is already doing, plus verification that the software does as promised and functions under acceptable duress. But even that can be automated in some situations, as the Chaos Monkey and Chaos Lemur show.
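
As a flavor of how much of that edge-case hunting can be automated, here’s a minimal property-based test sketch using the Hypothesis library (install with pip, run with pytest). The function under test is a toy stand-in, not from any project discussed here.

```python
# Property-based testing: the tool, not a human tester, searches for
# surprising inputs. `normalize_quantity` is a toy stand-in function.
from hypothesis import given, strategies as st


def normalize_quantity(qty: int) -> int:
    """Clamp an order quantity into the valid 1-100 range."""
    return max(1, min(qty, 100))


@given(st.integers())
def test_quantity_always_valid(qty):
    # Must hold for *any* integer Hypothesis throws at it, including
    # the weird edge cases a human might not think to try.
    assert 1 <= normalize_quantity(qty) <= 100
```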

Architect (partial/optional)

Traditionally, this role is responsible for conducting enterprise analysis, design, planning, and implementation, using a “big picture” approach to ensure a successful development and execution of strategy. Those goals can still exist in many large organizations, but the role of an architect is evolving to be an enabler for more self-sufficient, and decoupled teams. Too often this role has become a “Dr. No” in most large organizations, so care must be taken to ensure that the architect supports the team, not the other way around.

Architects are typically more senior technical staff who are “domain experts.” They may also be more technically astute and, in a consultative way, help ensure the long-term quality and flexibility of the software that the team creates, share best practices with teams, and otherwise enable the teams to be successful. As such, this role may be fully dedicated, filled by someone who, hopefully, still spends much of their time coding so as not to “go soft” and lose not only the trust of developers but also an intimate enough knowledge of technology to know what’s possible and not possible in contemporary software.

Data Science (partial/optional)

If your application includes a large amount of data analysis, you should consider including a data scientist role on the team. This role can follow the dedicated/designated pattern as discussed with the operations role above.

Data science today is where design might have been a few years ago. It is not yet considered a primary role within a product team, but more and more products are introducing a level of insight not seen before. Google Now surfaces contextual information; SwiftKey offers word predictions based on swipe patterns; Netflix offers recommendations based on what other people are watching; and Uber offers predictive arrival times for its drivers. These features help turn transactional products into smart products.

Other Roles

There are many other roles that can and do exist in IT organizations. These are roles like database administrators (DBAs), security operations, network operations, or storage operations. In general, as with any “tool,” you should use what you need when you need it. However, as with the architect role above, any role must reorient itself to enabling the core teams rather than “governing” them. As the DevOps community has discussed at length for nearly ten years, the more you divide up your staffing by function, the further you move from a small, integrated team, and your goal of consistently and regularly building quality software will become harder.

Agile Operations Roles

Roles here focus on operating, supporting, and extending the cloud platform in use. These roles typically sit “under” the Agile and DevOps teams, so the discussion here is briefer. Each role is described in terms of the roles and responsibilities typically encountered in Pivotal Cloud Foundry installs. These can vary by organization and deployment (public vs. private cloud, the need for multi-cloud support, types of IaaS used, etc.).

For a brief definition of a cloud platform, see “the use of a cloud platform” section below.

Application Operator

These are typically the “operations” people described above and serve as a supporting and oversight function to the business capabilities teams, whether designated or dedicated. Typical responsibilities are:

  • Manages lifecycle and release management processes for apps running in Pivotal Cloud Foundry.
  • Responsible for the continuous delivery process to build, deploy, and promote Pivotal Cloud Foundry applications.
  • Ensures apps have automated functional tests that are used by the continuous delivery process to determine successful deployment and operation of applications (see the sketch after this list).
  • Ensures monitoring of applications is configured and has rules/alerts for routine and exceptional application conditions.
  • Acts as second-level support for applications, triaging issues and disseminating them to the platform operator, platform developer, or application developer as required.
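
As one small, concrete example of the testing and promotion responsibilities above, here’s a sketch of a post-deployment smoke check that a continuous delivery pipeline might gate promotion on. The URL and endpoint are hypothetical; a real suite would exercise key user journeys, not just a health ping.

```python
# A minimal post-deployment smoke check a CD pipeline could gate on.
# The base URL and /health endpoint are hypothetical.
import sys
import urllib.request


def smoke_test(base_url: str) -> bool:
    """Return True if the app's health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:  # connection refused, DNS failure, timeout, etc.
        return False


if __name__ == "__main__":
    ok = smoke_test("https://reservation-service.example.com")
    sys.exit(0 if ok else "smoke test failed; halting promotion")
```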

A highly related, sometimes overlapping, role is that of centralized development tool provider. This role creates, sources, and manages the tools used by developers, from commonly used libraries, to version control and project management tools, to maintaining custom-written frameworks. Companies like Netflix maintain “tools teams” like this, often open sourcing projects and practices they develop.

Platform Operator

This is the typical “sys admin” for the cloud platform itself:

  • Manages the IaaS infrastructure that Pivotal Cloud Foundry is deployed to, or coordinates with the team that does.
  • Installs and configures Pivotal Cloud Foundry.
  • Performs capacity, availability, issue, and change management processes for Pivotal Cloud Foundry.
  • Scales Pivotal Cloud Foundry, forecasting, adding, and removing IaaS and physical capacity as required.
  • Upgrades Pivotal Cloud Foundry.
  • Ensures management and monitoring tools are integrated with Pivotal Cloud Foundry and have rules/alerts for routine and exceptional operations conditions.

Platform Engineering

This team and its roles are responsible for extending the capabilities of the cloud platform in use. What this role does per organization can vary, but common tasks of this role for organizations using Pivotal Cloud Foundry are to:

  • Makes enhancements to existing buildpacks and builds new buildpacks for the platform.
  • Builds service brokers to manage the lifecycle of external resources and make them available to Pivotal Cloud Foundry apps.
  • Builds Pivotal Cloud Foundry tiles with associated BOSH releases and service brokers to enable managed services in Pivotal Cloud Foundry.
  • Manages the release and promotion process for buildpacks, service brokers, and tiles across the Pivotal Cloud Foundry deployment topology.
  • Integrates Pivotal Cloud Foundry APIs with external tools when required.

Physical Infrastructure Operations

While not commonly covered in this type of discussion, someone has to maintain the hardware and data centers. In a cloud native organization this function is typically so highly abstracted and automated — if not outsourced to a service provider or public cloud altogether — that it does not often play a major role in cloud native operations. However, especially at first, as your organization is transforming to this new way of operating, you will need to work with physical infrastructure operations staff, whether in-house or at your outsourcer.

Supporting Best Practices

The use of a cloud platform

Much of the above is predicated on the use of a cloud platform. A cloud platform is a system, such as Pivotal Cloud Foundry, that provides the runtime and production needs to support applications on top of cloud infrastructure (public and private IaaS), often, as in the case of Pivotal Cloud Foundry, with fully integrated middleware services that are natively supported (such as databases, queues, and application development frameworks).

Two of the key ways of illustrating the efficiency of using Pivotal Cloud Foundry are (a) the ratio of operators to applications running in the platform and (b) the reduction in lead time. Here are some relevant metrics for Pivotal Cloud Foundry:

  • One large financial institution is currently supporting 145 applications with two operations staff.
  • 18F was able to reduce Authorization to Operate (ATO) from 9+ months to 3 days once auditors understood the automation in the open source Cloud Foundry instance at cloud.gov.
  • When planning the first product developed on Pivotal Cloud Foundry, CoreLogic allocated a team of 12 developers with four QA staff and a management team. The goal was to deliver the product in 14 months. Instead, the project ended up requiring only a product manager, one user experience designer and six engineers who delivered the desired product in just six months. As the second, third, and next projects roll on, you can expect delivery time to reduce even more.
  • A large retail user of Pivotal Cloud Foundry was running over 250 applications in non-production and nearly 100 applications in production after only 4 months using the platform.
  • Humana was able to launch an Apple Watch application in just five weeks.

Without a cloud platform like Pivotal Cloud Foundry in place, organizations will find it difficult, if not impossible, to achieve the infrastructure efficiencies in both development and at runtime needed to operate at full speed and effectiveness.

For reference, the below diagram based on Pivotal Cloud Foundry, is a “white board” of the functions a cloud platform provides:

For a gentle introduction to this stack, see the introductory discussion video at The New Stack. And, for even more discussion of the nature of cloud platforms, see Brian Gracely’s discussion of “structured cloud platforms” and Casey West’s write-up on the same topic.

Sizing: two pizza teams

While some sizing guidelines have been listed throughout, there are no hard and fast rules. The notion of a “two pizza team” is a well-accepted theory of software team sizes. As Amazon CEO Jeff Bezos is said to have decreed: if a team can’t be fed with two pizzas, it’s too big. Without making too many assumptions about the size of a “large” pizza or how many slices each individual needs to eat, you could estimate this at around six to fifteen people. This may vary, of course, but the point is to keep things as small as possible to minimize communication overhead, context switching, and responsibility evasion due to “that’s not my job,” among other “wastes.”

The assumption with teams this small is that many of them will organize to create big, overall results. Architectural approaches like microservices that emphasize loose coupling and defensive use of services help enable that coordination at a technical level, while making sure that teams are aware of, and aligned to, overall organizational goals helps coordinate the “meatware.” This coordination is difficult, to be sure, but start looking at your organization as a set of autonomous units working together, rather than one giant team. As an example, Pivotal Cloud Foundry engineering is composed of around 300 developers spread over 40 loosely coupled teams.

Dedicated, integrated teams lead to the best outcomes

Agile-think and DevOps teams seek to put every role and, thus, person, needed to deliver the software product from inception to running in production on the same team, with their time fully dedicated to that task. These are often called “integrated teams” or “balanced teams.”

IT has long organized itself in the opposite way, separating out roles (and, thus, people) into distinct teams of operators, QA, developers, and DBAs in the hopes of optimizing the usage of these roles. This is the so-called “silo” approach. When you’re operating in the more exploratory, rapid, agile cloud native fashion, this attempt at “local optimization” creates “global optimization” problems due to hand-offs between teams (leading to waste from lost time and communication errors). A silo approach can also lead to poor interactions across teams, which damages software quality and availability. “That’s not my responsibility” often leads to it being no one’s responsibility.

Most organizations have a “project” mindset when it comes to software as well. The functional roles emerge from their silos as needed to work on the project, and then disband once done. Good software development benefits from a “product” mindset where the team, instead, stays dedicated to the product.

Operating in this way, of course, is not always possible, especially at first as organizations switch their mindset from “siloed” teams to dedicated teams. So care must be taken when slicing a person up between different teams: intuitively we understand that people cannot truly “multitask,” and numerous studies have shown the high cost of context switching and of spreading a person’s attention across many different projects. In discussing roles, keep in mind that the further you are from truly integrated, dedicated teams, the less efficiency and productivity you’ll get from the people on those teams and, thus, from the project as a whole.

More Reading

As described in the introduction, this is a quick overview of the roles and responsibilities for Agile and DevOps teams. The book Effective DevOps contains much discussion of these topics and background on studies and proof-points. For a slightly more “enterprise-y” take, see the concepts in disciplined agile delivery, which should be read as slightly “pre-cloud native” but is nonetheless helpful.

Additionally, Pivotal Labs and the Pivotal Customer Success and Transformation teams can discuss these topics further and help transform your organization accordingly. With over twenty years of experience and as the maintainers of Pivotal Cloud Foundry, one of the leading cloud platforms, we have been learning and perfecting these methods for some time.

Thanks to the many reviewer comments from Pivotal folks, in particular Robbie Clutton, Cornelia Davis, and David McClure. And, as always, excellent post-post-ironic corporate clipart from geralt.

For a more recent, detailed study on this topic, check out my book Monolithic Transformation.

Self-motivated teams lead to better software

This post is pretty old and possibly out of date. There’s an updated version of it in my book, Monolithic Transformation.

In contrast to the way traditional organizations operate, cloud native enterprises are typically composed of self-motivated and self-directed teams. This reduces the amount of time it takes to make decisions, deploy code, and see if the results helped move the needle. More than just focusing on speed of decision making and execution, building up these intrinsically motivated teams helps spark creativity, fueling innovative thinking. Instead of being motivated by metrics like number of tickets closed or number of stories/points implemented, these teams are motivated by ideas like increasing customer satisfaction with faster forms processing or helping people track and improve their health.

In my experience, one of the first steps in shifting how your people think — away from the Pavlovian KPI bell — is to clearly explain your strategy and corporate principles. Having worked in strategy roles in the past, I’ve learned how poorly companies articulate their goals, constraints, and strategy. While there are many beautiful (and not so beautiful!) presentations that extol a company’s top motivations and finely worded strategies, most employees are left wondering how they can help day to day.

Whether you’re a leader or an individual contributor, you need to know the actionable details of the overall strategy and goals. Knowing your company’s strategy, and how it will implement that strategy, is not only necessary for breeding self-motivated and self-managed people, but also comes in handy when you’re looking to apply agile and lean principles to your overall continuous delivery pipeline. Tactically, this means taking the time to create detailed maps of how your company executes its strategy. For example, you might do value-stream mapping, alignment maps, or something like value chain mapping. This is a case where, again, I find that companies have much less than they think they do: organizations often have piles of diagrams and documents, but very rarely have something that illustrates everything, from having an idea for a feature to getting it into customers’ hands. A cloud native enterprise will always seek to be learning from and improving that end-to-end process. So, it’s required to map your entire business process out and have everyone understand it. Then, we all know how we deliver value.

The process of value-stream mapping, for example, can be valuable for simply finding waste (so much so that the authors of Learning to See cite focusing only on waste removal as a lean anti-pattern). As one anecdote goes: after working for several weeks to finally get everything up on the big white board, one of the executives looked up at all the steps and time wasted in their process to get software out the door and said, “there’s a whole lot of stupid up there.” The goal of these exercises is not only removing the stupid, but also focusing on and improving the process overall.
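
To make the mechanics concrete, here’s a minimal sketch of a value-stream map as data; the steps and durations are invented for illustration, not taken from any real organization’s map.

```python
# A toy value-stream map: each step from idea to production, with
# hands-on working time vs. waiting time, in days. All numbers invented.
steps = [
    ("write requirements", 5, 15),
    ("approval board",     1, 30),
    ("development",       20, 10),
    ("QA hand-off",        5, 20),
    ("change review",      1, 25),
    ("deploy to prod",     1, 10),
]

work = sum(w for _, w, _ in steps)
wait = sum(q for _, _, q in steps)
total = work + wait
print(f"lead time: {total} days; value-added: {work} days ({work / total:.0%})")
# Typical result: most of the lead time is waiting, not working, which is
# exactly the "whole lot of stupid" the executive was looking at.
```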

 

For a more recent, detailed study on this topic, check out my book Monolithic Transformation.

Addressing the DevOps compliance problem

Satisfying the mythical auditors is often one of the first barriers to spreading DevOps initiatives more widely inside an organization. While these process-driven barriers can be annoying and onerous, once you follow the DevOps tradition of empathetic inclusion — being all “one team” — they can not only stop slowing you down but actually help the overall quality of the product. Indeed, the very reason these audit checks were introduced in the first place was to ensure the overall quality of the software and the business. There are some excellent, exhaustive overviews out there of dealing with audits and the like in DevOps. In this column, I wanted to go through a little mental re-orientation for how to start thinking about and approaching the “compliance problem.”

Three-Ring Binder Ninjas

In this context, I think of “auditors” as falling into the category of governance, risk, and compliance (GRC): any function that acts as a check on code, and on how the code is produced and run, as it goes through its lifecycle. I would put security in here as well, though that tends to be such a broad, important topic that it often warrants its own category (and the security people seem to like maintaining their occultic silo-tude, anyhow).

The GRC function(s) may impose self-created policies (like code and architectural review), third party and government imposed regulations (like industry standard compliance and laws such as HIPAA), and verification that risky behavior is being avoided (if you write the code, you can’t be the same person who then uses that code for cash payouts, perhaps, to yourself, for example). In all cases, “compliance” is there to ensure overall quality of the product and the process that created it. That “quality” may be the prevention of malicious and undesired behavior; that is, in a compliance-driven software development mindset, the ends rarely justify the means.

In many cases, the GRC function is more interested in proof that there is a process in place than actually auditing each execution of that process. This is a curious thing at first. Any developer knows that the proof is in the code, not the documentation. And, indeed, for some types of GRC the amount of automation that a DevOps mindset puts into place could likely improve the quality of GRC, ironically.

Establishing trust and automating compliance

Indeed, automation is one of the first areas to look at when reducing DevOps/GRC friction. First, treat complying with policies as you would any other feature: describe it, prioritize it, and track it. Once you have gotten your hands around it, you can start to figure out how to best implement that “feature.” Ideally, you can code and automate your way out of having to do too much manual work.

There’s work being done in the US Federal government along these lines that’s helpful because it’s visible and at scale. First, as covered in a recent talk by Diego Lapiduz, part of what auditors are looking for is to trust the software and infrastructure stack that apps are running on. This is especially true from a security standpoint. The current way that software is spec’d out and developed in most organizations follows a certain “do whatever,” or even YOLO, principle. App teams are allowed to specify which operating systems, orchestration layers, and middleware components they want. This may be within an approved list of options, but more often than not it results in unique software stacks per application.

As outlined by Diego, this variation in the stack meant that government auditors had to review just about everything, taking up to months to approve even the simplest line of code. To solve this problem, 18F standardized on one stack — Cloud Foundry — to run applications on, not allowing for variance at the infrastructure layer. They then worked with the auditors to build trust in the platform. Then, when there was just the metaphoric or literal “one line of code” to deploy, auditors could focus on much less, certainly not the entire stack. This brought approval time down to just days. A huge speed up.

When it comes to all the paperwork, also look to ways to automate the generation of the needed listings of certifications and compliance artifacts. This shouldn’t be a process that’s done in opaque documents, nor manually, if at all possible. Just as we’d now recoil in horror at manually deploying software into production, we should try to achieve “compliance as code” that’s as autogenerated (but accurate!) as possible. To that end, the work being done in the OpenControl project is showing an interesting and likely helpful approach.
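
As a toy illustration of the idea (and only that: the control names and config keys below are invented, not OpenControl’s actual schema), compliance checks can be expressed as executable assertions that emit a machine-readable artifact instead of a Word document:

```python
# "Compliance as code" in miniature: controls as executable checks over a
# deployment description, emitting JSON instead of a .docx. The control
# names and config keys are invented for illustration.
import json

deployment = {"tls": True, "audit_logging": True, "coder_is_deployer": False}

controls = {
    "encrypted transport required": lambda d: d["tls"],
    "audit logging enabled":        lambda d: d["audit_logging"],
    "separation of duties":         lambda d: not d["coder_is_deployer"],
}

results = {name: check(deployment) for name, check in controls.items()}
print(json.dumps(results, indent=2))  # the auto-generated compliance artifact
assert all(results.values()), "compliance gate failed"
```

Because the checks run on every build, the artifact is always current, which is precisely what a paper-based process can never promise.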

The lessons for DevOps teams here is clear: Standardize your stack as much as possible and work with auditors to build their trust in that platform. Also, look into how you can automate the generation of compliance documents beyond the usual .docx and .pptx suspects. This will help your GRC process move at DevOps speed. And it will also allow your auditors to still act as a third party governing your code. They’ll probably even do a better job if they have these new, smaller batches of changes to review.

Refactoring the compliance process

To address the compliance issue fully, you’ll need to start working with the actual compliance stakeholders directly to change the process. There’s a subtle point right there: work with the people responsible for setting compliance, not those responsible for enforcing it, like IT. All too often, people in IT will take the strictest view of compliance rules, which results in saying “no” to virtually anything new. Coupled with Larman’s Law, you’ll soon find that, mysteriously, nothing new ever happens and you’re back to pre-DevOps deployment speeds, software quality levels, and timelines. You can’t blame IT staff for being unimaginative here: they’re not experts in compliance, and it’d be risky for them to imagine “workarounds.” So, when you’re looking to change your compliance process, make sure you’re including the actual auditors and policy setters in your conversations. If they’re not “in the room,” you’re likely wasting your time.

As an example, one of the common compliance problems is around “developers deploying to production.” In many cases and industries, a separation of duties is required between coding and deploying. When deploying code to production was a more manual, complicated process, this could be extremely onerous. But once deployments are push-button automated with a good continuous delivery pipeline, you might consider having the product manager, or someone else who hasn’t written the code, be the deployer. This ensures that you can “deploy at will,” but keeps the actual coders’ fingers off the button.
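
Here’s a minimal sketch of what such a separation-of-duties gate could look like inside a pipeline; the data is hypothetical, and a real implementation would pull change authors from version control and the deployer’s identity from the CD tool.

```python
# Separation of duties as a pipeline check: whoever presses the deploy
# button must not be an author of the change set. Data is hypothetical.
def can_deploy(deployer: str, change_authors: set) -> bool:
    """Allow the deployment only if the deployer wrote none of the code."""
    return deployer not in change_authors


authors = {"alice", "bob"}  # committers on this release

assert can_deploy("pat_the_pm", authors)   # product manager deploying: fine
assert not can_deploy("alice", authors)    # coder deploying own code: blocked
```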

Another intriguing compliance strategy, suggested by Home Depot’s Tony McCulley (who also suggested the above approach to separation of duties), is to give GRC staff access to your continuous delivery process and deployment environment. This means that instead of having to answer questions and check for controls on their behalf, you can allow GRC staff to just do it on their own. Effectively, you’re letting GRC staff peer into, and even help out with, controls in your software. I’d argue that this only works if you have a well-structured platform supporting your CD pipeline with good UIs that non-technical staff can access.

It might be a bit of a stretch, but inviting your GRC people into your DevOps world, especially early on, may be your best bet at preventing compliance slowdowns. And, if there’s any core lesson of DevOps, it’s that the real problems are not in the software or hardware, but the meatware. Figuring out how to work better with the people involved will go a long way towards addressing the compliance problem.

(I originally wrote this December 2015 for FierceDevOps, a site which has made it either impossible or impossibly tedious to find these articles. Hence, it’s now here.)

Barriers to DevOps in government

There’s just as much pull for DevOps in government as there is in the private sector. While most of our focus around adoption is on how businesses can and are using DevOps and continuous delivery, supported by cloud, to create better software, many government agencies are in the same position and would benefit greatly from figuring out how to apply DevOps in their organizations.

Just 13% of respondents in a recent MeriTalk/Accenture survey of 152 US Federal IT managers believed they could “develop and deploy new systems as fast as the mission requires.” The impact of improving on that could be huge. For example, the US Federal government, by conservative estimates, spends $84 billion a year on IT. And yet, the Standish Group believes that 94% of government IT projects fail. These are huge numbers that, with even small improvements, can have massive impact. And that’s before even considering the benefits of simply improving the quality of software used to provide government services.

As with any organization, the first filter for applicability is whether or not the government organization is using custom-written software to accomplish its goals. If all the organization is doing is managing desktops, mobile, and packaged software, it’s likely that just SaaS and BYOD are the important areas to focus on. DevOps doesn’t really apply unless there’s software being written and deployed in your organization or, as is more common in government agencies, for your organization, as we’ll get to when we discuss “contractors.”

When it comes to adopting and being successful with DevOps, the game isn’t too different than in the business world: much of the change will have to do with changing your organization’s processes and “culture,” as well as adopting new tools that automate much of what was previously manual. You’ll still need to actually take advantage of the feedback loop that helps you improve the quality of your software, with respect to defects, performance in production, and design quality. A few things that tend to be more common in government organizations bear some discussion: having to cut through red tape, dealing with contractors, and a focus on budget.

Living with red tape

While “enterprise” IT management tasks can be onerous and full of change review boards and process, government organizations seem to have mastered the art of paperwork, three-ring binders, and red tape in IT. As an example, in the US Federal government, any change needs to achieve “Authority To Operate,” which includes updating the runbook covering numerous failure conditions, certifying security, and otherwise documenting every aspect of the change in (to the DevOps-minded) infinitesimal detail. And why not? When was the last time your government “failed fast” and you said “gosh, I guess they’re learning and innovating! I hope they fail again!” No, indeed. Governments are given little leash for failure, and when things go terribly wrong you don’t just get a tongue-lashing from your boss; you might get to go talk to Congress, and not in the fun, field-trip, how-a-bill-is-made kind of way. Being less cynical: in the military, intelligence, and law enforcement parts of government, if things go wrong, more terrible things can happen than being denied the ability to upload a picture of your pot roast to Instagram. It’s understandable — perhaps, “explainable” — that government IT would be wrapped up in red tape.

However, when trying to get the benefits of continuous delivery, DevOps, and cloud (or “cloud native,” as that triptych of buzzwords is coming to be known), government organizations have been demonstrating that the comforting mantle of red tape can be stripped away. For example, in the GSA, the 18F group has reduced the time it takes to get a change through from 9–14 months to just two to three days.

They achieved this because now, when they deploy applications on their cloud native platform (a Cloud Foundry instance that they run on Amazon Web Services), they are only changing the application, not the whole stack of software and hardware below the application layer. This means they don’t need to re-certify the middleware, runtimes, and development frameworks, let alone the entire cloud platform, operating systems, networking, hardware, and security configurations. Of course, the new lines of application code need to be checked, but because they’re following the small-batch principles of continuous delivery, those net-new lines are few.

The lesson here is that you’ll need to get your change review process — the red-tape spinners — to trust the standard cloud platform you’re deploying your applications on. There are numerous ways to do this, from using a widely adopted cloud platform like Cloud Foundry, to building up trusted, automated build processes, to creating your own platform and software release pipelines that are trusted by your red-tape mavens.

Contractors & Lost Competency

If you want to get staff in a government IT department ranting at you all night long, ask them about contractors. They loathe and despise them and will tell you that they’re “killing” government IT. Their complaint is that contractors cannot structurally deal with an Agile mentality that refuses to lock down a full list of features to be delivered on a specific date. As you shift to not even a “DevOps mindset” but an Agile mindset, where the product team discovers with each iteration what the product will be and how best to implement it, you need the ability to change scope throughout the project as you learn and adapt. There is no “fail fast” (read: learning) when the deliverables 12 months out are defined in a 300-page document that took 3–6 months to scope and define.

Once again, how we got into this state is explainable: it’s not so much that any one actor is responsible; it’s more that the management in government IT departments is now responsible for fixing the problem. The problem is more than a square-peg (waterfall mentalities from contractors) in a round-hole (government IT departments that want to be more Agile) issue. After several decades of outsourcing to contractors, there’s also a skills and cultural gap in the IT departments. Just as custom-written software is becoming strategically important to more organizations, many large IT departments find themselves with little experience and even less skill when it comes to software and product development. I hear these same complaints frequently from private sector organizations that have outsourced IT for many years, if not decades.

The Agile community has long discussed this problem, and there are always interesting, novel efforts to get back to insourcing. A huge part is simply getting the terms of outsourcing agreements to be more compatible. The flip side of this is simplifying the process of becoming a government contractor: it’s sure not easy at the moment. Many of the newer, more Agile- and DevOps-minded contractors are smaller shops that will find the prospect of working with the government daunting and, well, less profitable than working with other organizations. Making it easier for more shops to sign up would introduce more competition into what is now a smaller market, strangled by paperwork. The current pool of government contractors seems mostly dominated by larger shops that can navigate the government procurement process and seem, for whatever reason, to be the most inflexible and waterfall-y.

Another part is refusing to cede project management and scoping management to external parties, and making sure you have the appropriate skills in-house to do so. Finally, the management layers in both the public and private sector need to recognize this as a gap that needs to be filled and start recruiting more in-house talent. Otherwise, the highly integrated state of DevOps — let alone a product focus vs. a project focus — will be very hard to achieve.

Addressing budgetary concerns with waste removal

Every organization faces budget problems. We call the exceptions “unicorns” partly because they have this mythical quality of seemingly unlimited budget. The spiral-horn-festooned are the exception that proves the rule: all organizations are expected to spend money wisely. Government, however, seems to operate in a permanent state of shrinking IT budgets. And even when government organizations experience a rare influx of cash, there’s hyper-scrutiny on how it’s spent. To me, the difference is that private sector companies can justify spending “a lot” of money if “a lot” of profit results, whereas government organizations can’t make such calculations as easily. Effectively, government IT departments have to prove that they’re spending only as much money as necessary and strategically plan for having their budget stripped down in each budgetary cycle.

Here, the Lean-think part of DevOps can actually be very helpful and, indeed, may become a core motivation for government to look to DevOps. My simplification of the goals of DevOps is to:

  1. Ensure that the software has good availability (which it does by focusing on resilience vs. perfection, the ability to recover from failure quickly rather than avoiding all failure by rarely changing anything). This is something that recent failures in US Federal government IT can appreciate.
  2. Enable the weekly, if not daily, deployment of new code into production with continuous delivery. The goal here is to improve the quality of the software, both bugs and “design” quality, ensuring that the software is what users actually want by iterating over features frequently.

Those two goals end up working harmoniously together (with smaller batches of code deployed more frequently, you reduce the risk of each change causing major downtime, for example). For government organizations focused on “budget,” the push to remove as much “waste” from the system as possible, in order to speed up the delivery cycle, starts to look very attractive to the cost-cutting minded. A well-functioning DevOps shop will spend much time analyzing the entire end-to-end cycle with value-stream mapping, stripping out all the “stupid” from the process. The intention of removing waste in DevOps-think is more about speeding up the software release process and helping ensure better resilience in production, but a “side effect” can be removing costs from the system.

Often, in the private sector, we say that resources (time, money, and organizational attention) saved in this process can be reallocated to helping grow the business. This is certainly the case in government, where “the business” is, of course, understood not as seeking profits but as delivering government services and fulfilling “mission” requirements. However, simply reducing costs by finding and removing unneeded “waste” may be a highly attractive outcome of adopting DevOps for governments.

“Bureaucracy” doesn’t have to be a bad word

As with any large organization, governments can be horrendous bureaucracies. Pulling out the DevOps empathy card, it’s easy to understand why people in such government bureaucracies can start to stagnate and calcify, themselves becoming grit in the gears of change if not outright monkey-wrenches.

In particular, there are two mind-sets that need to change as government staff adopt DevOps:

  1. Analysis paralysis — the almost default impulse to over-analyze and to specify, in ponderous, multi-hundred-page documents, the shift to a more Agile and DevOps mindset. A large part of the magic of DevOps and Agile-think is avoiding analysis paralysis and learning by doing rather than thinking in .docx. Government teams not familiar with smaller-batch, experiment-based approaches to software development would do well to read up on Lean Startup think, perhaps checking out Lean Enterprise for a compendium of current best practices and, well, mindsets.
  2. Stagnant minds — large organizations, particularly government ones, can breed a certain learned helplessness and even laziness in individuals. If things are slow moving, impossible to change, and managed in a “tall blade of grass gets cut” style, individuals will tune out rapidly. If DevOps is understood as a practice to help jump-start all-too-slow IT organizations, it’ll often be the case that individuals in those organizations are in this stagnated mindset. One of the key challenges becomes inspiring and then motivating staff to care enough to try something new and stick with it.

Again, these problems frequently happen in the private sector, but they seem to be larger problems in government and bear closer attention. Thankfully, it seems like leaders in government know this: in a recent global Gartner survey, 40% of government CIOs said they needed to focus more on developing and communicating their vision and do more coaching, and 60% said they needed to reduce the time spent in command-and-control mode. Leading, rather than just managing, the IT department is, as ever, key to the transformative use of IT.

More than rats dragging pizza

At any given time, it’s easy to be dismissive of government as wasteful and even incompetent. That’s the case in the U.S., at least, if you can judge by the many politicians who seem to center their political campaigns around the idea of government waste. In contrast, we praise the private sector for their ability to wield IT to…better target ads that get us to buy sugar-coated corn flakes. Don’t get me wrong: I’m part of the private sector and I like my role chasing profit. But we in the “enterprise” who are busy roaming the halls of capitalism don’t often get the chance to positively affect, let alone simply help and improve the lives of, everyone on a daily basis. Government has that chance, and when you speak with most people who are passionate about using IT better in government, they want to do it because they are morally motivated to help society.

The benefits of adopting DevOps have been clearly demonstrated in recent years, and for businesses we’re seeing truth in the statement that you’re either becoming a software organization or losing to someone who is. As government organizations start to think about improving how they do IT, they have the chance to help all of us; “winning” isn’t zero-sum like it can be in the business world. To that end, as we in the industry find new, better ways to create and deliver software, it behooves us to figure out how government can benefit as well. That’ll get us even closer to making software suck less, something we’ll all benefit from.

(I originally wrote this September 2015 for FierceDevOps, a site which has made it either impossible or impossibly tedious to find these articles. Hence, it’s now here.)

Management’s role in DevOps: orchestrating the why

Donkey teamwork

What’s the point of it all? Why are we doing this? These questions pop up frequently in IT teams where the reason for doing your daily activities — like churning through tickets, whizzing up builds, or “doing the DevOps” — seems only that someone, somewhere told you to do it.

If you’re in this situation — you have no idea how your activities are helping your organization make money — you should stop and find out quickly what your company’s goals and strategies are to make sure you’re not wasting time. The good news is the confusion is probably not your fault; the bad news is that you’ll have to convince management that the fault is theirs.

Gratuitous optimization by technology

The adoption of things like DevOps or the cloud sometimes happens for wrong or unknown reasons — gratuitous plans without a tight connection to business goals. We used to call this “management by magazine,” and it happens now more than ever. A process — even “cultural” — change like DevOps is not like the easy improvement fodder of virtualization. But you can’t blame IT management for trying gratuitous optimization by technology. The magic of VMware was that you just installed it, and things got better because it improved resource utilization. You didn’t need to figure out why or match it to goals. If you inject DevOps into an organization expecting it to just improve things without tightly coupling it to strategy, you’ll get weird results. You’ll probably just create more work!

If you don’t know where you are going, any road will get you there

Agile, DevOps, and now “cloud native” (I hope you’re updating your buzzword lexicons!) need strong connections to the business goals — some would say “strategy” — to be successful. In order to operate in a lean fashion, you want to only do things that are valuable and useful to the customer (or obligatory to stay in business, like compliance and auditability). Indeed, being able to sort out what’s valuable and useful to the business is a key tool for doing DevOps successfully. You want to cut out all the stuff that doesn’t matter, or at least minimize it. Otherwise, you just sort of do everything and anything because there’s no way to determine if any given activity is helpful.

So how do you align your work with the overall business strategy?

There are tried and true (though seemingly new to the IT department) techniques like value-stream mapping: take any given business process and map out all the activities that happen end to end, questioning whether each is needed. Most people are shocked at how much “stupid” is going on in such maps, and it’s a great technique for finding and removing bottlenecks.

If you’re in the consumer business — like so many “unicorns” are — it’s easy to understand the mission and the goals: get more people buying books, downloading your app, streaming more videos, and so forth. But in other, more traditional settings, it’s common to find a willful disentanglement between how IT is used and how it contributes to customer value. More often than not, the stasis-inducing lull of time and success just numbs people’s collective minds and sets them into auto-pilot here.

You see this happen most often around decision-making processes in business: things that need approval, planning processes, and market assessments. People in large companies love cogitating and wrapping process around activities that cause change in the company; it almost feels like they enjoy slowing down change and activity. You might even codify a whole process, with change review board meetings and careful inspection of all the changed components by a panel of architectural and security audit wizards.

Cultivating complainers

You can also identify where your processes aren’t matching with business goals and strategies by cultivating squeaky wheels.

When change happens, individuals often pipe up asking, “Why are we doing this? Why is this valuable to the customer?” More than likely, they’re seen as troublemakers or sand in the gears, and are shut down by the group, Five Monkeys style. At best, these individuals cope with learned helplessness; at worst, they leave, kicking off a sort of Idiocracy effect in the remaining organization.

These “complainers” are actually a valuable source of data for testing how well understood a company’s goals and strategies are. You want to court these types of people to continually test how effective the organization is at setting goals and strategy. One fun practice, as mentioned by Ticketmaster’s Jody Mulkey, is to interview new employees a month after they start, asking them what seems “screwy around here” before they get used to it.

Blame management

So what do you do when these complainers, or any other process you’ve tried, identify real disconnects between what you’re doing and why? The fun begins, because it’s management’s job to fix this bug. The role of mid- and upper-level management in the cloud native era is poorly understood and documented (it’s always been so, of course, in creative-driven endeavors like software). To be successful at these types of initiatives, management has a lot of work to do, and the managers who are overseeing DevOps teams can’t assume things will just proceed as normal. This is why, as with software, you need to continually test the assumption that people know the business goals and strategy.

This point has been stuck in my brain after reading Leading the Transformation (an excellent book for managers figuring out how to apply DevOps “in the large”), which states the point more plainly than I can:

Management needs to establish strategic objectives that make sense and that can be used to drive plans and track progress at the enterprise level. These should include key deliverables for the business and process changes for improving the effectiveness of the organization.

What I like about this advice (and the rest in the book) is that it’s geared to defining management’s job in all this DevOps hoopla. In said hoopla, we spend a lot of time on what the team does, but we don’t spend too much time on what management should do. It’s lovely thinking about flattening the organization and having everyone act as responsible peers, but in large organizations, this isn’t done easily or quickly. Just as with all those zippy containers, you need an orchestration layer to make sense of all the DevOps teams working together. That’s the job of management in the cloud native era: setting goals and orchestrating all the teams to make sure they’re all pulling in the right direction.

(I originally wrote this August 2015 for FierceDevOps, a site which has made it either impossible or impossibly tedious to find these articles. Hence, it’s now here.)

There’s no easy way to model DevOps ROI

Think you can show DevOps ROI? Think again

“What is the ROI for DevOps?” is a question that has been tossed my way frequently of late. There are numerous reasons why this is at once an absurd and an important question.

Modeling DevOps ROI is absurd because predicting the gains and costs of a process, let alone one as new as DevOps, is difficult and dependent on all sorts of unique variables per organization.

However, thinking through DevOps ROI is an important step for adoption because the promises of DevOps are so grandiose and the changes needed sound large and almost impossible to achieve for “normal” people.

That is, DevOps is an unmeasurable process with respect to ROI (it has value, to be sure, but is nearly impossible to measure independently and precisely) and, yet, because “doing DevOps” seems to be such a big change, organizations need assurances that transformation will be “worth it.”

So, if you’re asked to help show the ROI for DevOps, what can you do? Let’s cover three ways to approach the problem. I don’t think any of them are a real answer, but they get closer to satisfying some possible motivations for asking for ROI in the first place.

What is ROI?

First, what is ROI? I misuse economic and accounting terms all the time, but I think of “ROI” as showing the profit you achieve after a given period of time; for our purposes, by doing something new and different with IT: you might buy some new software (running on a cloud platform like Pivotal Cloud Foundry instead of just IaaS), do your software development and delivery differently (like “doing DevOps”), and so forth.

With ROI, you’re not only interested in the question “does it work,” you’re interested in the question “did this make me money?” Oftentimes, you’re also interested in comparing the costs of competing approaches, or just inflicting vendors with the thrill of “bake offs” and ROI spreadsheet fights.

To figure that basic ROI, you use a brutally simple formula:

(Gain - Cost)/Cost = ROI

You can convert the end result to a percentage if you’re not into the whole decimal thing.

As a simple example, let’s say you sell an app that allows people to track how many apples they eat each day, so they can keep those ravenous doctors at bay. After it’s shipped for a month, you’ve made $20,000 in sales for the app. To get to that point, it cost you $5,000 in developer time and $5,000 in infrastructure charges (the back-end that analyzes the data, mashes it up with Facebook and Twitter profiles, and then sells that data to the Apple Sellers Association of Tomorrow takes some horse-power and storage!).

So, the ROI for the apple muncher app is:

($20,000 - $10,000)/$10,000 = 100%

A pretty good return on your investment! It’s certainly better than the rate I get on any of my personal investments.
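If you’d rather check this kind of arithmetic in code than in a spreadsheet, the formula is a one-liner. A minimal sketch, using the apple muncher numbers from above:

```python
def roi(gain, cost):
    """Basic ROI: (gain - cost) / cost, as a fraction of the investment."""
    return (gain - cost) / cost

# The apple muncher example: $20,000 in sales against $5,000 of
# developer time plus $5,000 of infrastructure charges.
print(f"{roi(20_000, 5_000 + 5_000):.0%}")  # prints: 100%
```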

So, what would be the ROI of introducing DevOps to that process? More importantly, how could you predict it? There are many ways to answer the ROI question, including the favorite “that’s a bad question, you shouldn’t want that” which can take on all sorts of subtle and helpful forms. Let’s look at three possible approaches.

1. Bottoms-up ROI: We know everything and have put it in this spreadsheet

If you have clear inputs and outputs — your gains and costs — then things can be realistically simple. This is the favorite approach of ROI spreadsheets: they’ll cost out software license costs, hardware/IaaS costs, and people costs (employees and consultants).

Once you’ve figured out costs, you need to estimate what your gains will be: either based on historic run rates or, more likely, on a mixture of prediction and hope for how much you’ll make in the future. Tracking the demand for software can be hard, and this estimate is one of the most dangerous parts of this simple method. If all you want to do is track the ROI of saving money, perhaps things are a little easier. And while this implies that you’re not looking to DevOps to support a revenue growth strategy, perhaps that’s good input: if you’re not looking to grow your business, maybe DevOps isn’t right for you and will have negative ROI.

You then have to pick a period of time to snapshot, and then you just run the math.
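As a sketch of what those spreadsheets boil down to, here’s the “math” in a few lines of Python. Every figure below is invented; a real model would pull these from your actual licenses, cloud bills, and payroll:

```python
# Hypothetical bottoms-up ROI over a six-month snapshot period.
costs = {
    "software licenses": 30_000,
    "hardware/IaaS": 45_000,
    "staff": 240_000,
    "consultants": 60_000,
}
estimated_gain = 500_000  # the dangerous part: predicted revenue

total_cost = sum(costs.values())
roi = (estimated_gain - total_cost) / total_cost
print(f"Total cost: ${total_cost:,}; ROI: {roi:.0%}")  # 33%
```

The spreadsheet’s real work is in defending each of those numbers, especially the gain estimate; the division at the end is trivial.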

Of course, few, if any, of the things you’re costing out here are “DevOps.” You might spend money on a commercial continuous integration tool, on a cloud platform or a DevOps consultant. You’ll certainly spend money on people…but you didn’t really spend money on “doing DevOps.”

You might be tempted to simply ascribe gains to DevOps. “For this release, we were doing DevOps, and we made $30,000 with apple muncher! DevOps brought us $10,000 in new revenue.” But that doesn’t feel right.

Still, if you have a good handle on the costs during some period of time where you were doing DevOps and the gain that resulted from that period of time, you could come up with a bottoms-up ROI analysis. I think it’ll be somewhat dicey since it’s so hard to attribute costs and gains directly to DevOps but, hey, it’s better than either telling people they’re asking the wrong question or its mute cousin: nothing.

2. Were the efforts to change worth it? That is, DevOps is all cost

As you might be teasing out, one of the problems with ROI is that it doesn’t really take time into account. You need to draw clear lines around the time period in which you’re including the factors that create your gains and costs. (If you’re interested in an approach that does take time into account, check out Rex Morrow’s suggestion to use IRR instead of ROI.)
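For the curious, IRR works on a time series of cash flows rather than a single gain/cost pair, which is exactly what makes it time-aware. A minimal sketch using the numpy-financial package, with invented monthly cash flows (an up-front investment that pays back gradually):

```python
import numpy_financial as npf  # pip install numpy-financial

# Hypothetical monthly cash flows: a $50,000 up-front investment,
# then gradually improving returns as the new process beds in.
cash_flows = [-50_000, 2_000, 5_000, 9_000, 14_000, 18_000, 22_000]

print(f"Monthly IRR: {npf.irr(cash_flows):.1%}")
```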

Using this lack-of-time problem as a generative constraint, you could instead study the ROI of changing to DevOps. What did switching over to DevOps cost us? What did it cost us compared to maintaining our current process as it stands?

Here, you’re taking whatever your regular ROI calculation is and just adding the one-time cost, in time and money, it took to change to DevOps. Figuring out what your gain is will be problematic. Again, what you’ll be gaining are new capabilities (to deliver software faster and increase your uptime in production); how those contribute to gains is still left as a mysterious exercise for the reader.

Still, if you want to run the numbers on something like “they tell me it will take three months and $50,000 in training and consultants to ‘do the DevOps’” this might satisfy your ROI craving. Again, you’ll need to have a pre-existing ROI at hand to simply plug your DevOps costs into.
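As a quick sketch of that “plugging in,” take an existing per-release ROI and fold the hypothetical one-time switching cost into it (all numbers here are made up):

```python
def roi(gain, cost):
    return (gain - cost) / cost

gain, cost = 30_000, 10_000   # an existing, per-release calculation
switching_cost = 50_000       # one-time: training, consultants, time

print(f"Steady state: {roi(gain, cost):.0%}")  # 200%
print(f"Including the switch: {roi(gain, cost + switching_cost):.0%}")  # -50%
```

Which mostly demonstrates the time problem again: a one-time switching cost has to be amortized over many release cycles before a single-period ROI looks like anything but a disaster.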

3. Pain avoidance and remediation

In the “DevOps is all cost” ROI scenario, we avoided ascribing gain to changing to DevOps. Again, while this is overly simplified, the deliverables of DevOps are to provide a continuous delivery process for your product and ensure that your product has excellent uptime (that is, “it works”). How could you account for the gain of those two desirables? You could create a way of assigning value to the knowledge you gain from weekly iterations about how to improve your product. You could also calculate the savings from avoided downtime.

It’s fun to model out placing value on the first part, “knowing,” but most people asking for ROI will likely look at that as a “soft” metric and, therefore, not really useful to their “hard”-centric minds. Including money saved (or generated?) by avoiding downtime could be interesting at a point in time (if a trading system goes down, money is lost when no one can trade), but how do you account for it on an ongoing basis?

The issue with including DevOps in this “easy” type of ROI calculation is figuring out how much gain and cost to attribute to DevOps.

As with uptime, sometimes it can be easy: before we did DevOps, the system was down two hours a day; now it’s only down five to 10 minutes a day, if at all. If there’s a pain you’re seeking to remove, then perhaps this model will work.

Your pain might also be “it takes us too long to deliver software,” which is a common problem for DevOps adopters. If you know how to measure the gain of time to market, for example, then you can do one of these bottoms-up ROI cases: “We were able to deliver a third release of apple muncher in two weeks instead of the six it had been taking. This means we could start charging for the new in-app purchases sooner, gaining us $5,000 more over that two week period.”
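If you want to run those numbers, the arithmetic is again simple. A sketch using the hypothetical figures from above (the value of an hour of uptime is invented):

```python
# Downtime pain: from ~2 hours down per day to ~10 minutes per day.
revenue_per_hour = 1_000  # invented: what an hour of uptime is worth
hours_saved_per_day = 2 - (10 / 60)
print(f"Avoided downtime, 30-day month: "
      f"${hours_saved_per_day * revenue_per_hour * 30:,.0f}")  # $55,000

# Time-to-market pain: shipping in week two instead of week six, at
# the $2,500/week implied by the $5,000-over-two-weeks figure above.
print(f"Earlier-release gain, four extra weeks: ${2_500 * 4:,}")  # $10,000
```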

If you like this kind of figuring, check out Zend’s suggestion for how to do continuous delivery ROI for some inspiration. Like all “good” ROI calculations, it requires changing ROI around slightly to fit what’s measurable…and some good estimating.

So, can you send me a spreadsheet with all this in it?

I’ve deftly avoided actually giving you anything actionable here. Calculating ROI is a very numbers- and spreadsheet-friendly exercise, and any answer should really include at least a starter spreadsheet to get you calculating things. However, as the above hopefully shows, asking for “DevOps ROI” is indeed the wrong question. The “ROI” is getting the process and tools in place to create a better product. Obviously, as you rack up the costs associated with DevOps (both in money and time spent), you can start to model the overall ROI of the project versus the revenue and profit you generate, but there’s little DevOps-specific about that.

Beyond such obvious answers, when I see people asking for “DevOps ROI,” what we can offer them is over-thinking like the above and examples of it working at organizations. Examples like Allstate and Humana are good mainstream cases, and you can listen in to more on the excellent Goat Farm podcast.

Additionally, I would suggest looking at DevOps as a continual improvement initiative rather than trying to predict its ROI. Much of the problem with figuring out ROI is in having to predict costs and gains. Instead, focus on tracking and trying to improve how you’re doing things in short intervals. In my experience, most organizations devalue the idea of continuously learning and trying to improve their process. Focusing on that might be a better use of time than summoning up a solid case for DevOps ROI.

I’d love to see examples of how you did a DevOps ROI case…or avoided it altogether. If we can accumulate enough after-the-fact studies, then at least we could make a “rule of thumb” collection. Leave a comment below!

(I originally wrote this July 2015 for FierceDevOps, a site which has made it either impossible or impossibly tedious to find these articles. Hence, it’s now here.)

Here’s how we can help push DevOps into the mainstream

Can DevOps declare victory yet? Not quite, but soon.

Figuring out when a technology inflection point happens is always hard, if not impossible, in real-time. It’s easy to point backwards and say when ERP, agile software development, the Web, business intelligence, mobile or cloud suddenly became “normal.” I think DevOps is right at the door of that point, and as some recent Gartner predictions have proffered, we could see something like a quarter of all large enterprises using DevOps next year.

But it won’t be easy. The same house of industry sages also threw some cold water on that exuberance by predicting a 90 percent failure rate among organizations attempting DevOps if they fail to properly address process and culture.

As DevOps spreads to more and more IT shops, what can we in the DevOps community do to help? Clearly, we need to keep up the overall conversation about what DevOps is and the process/cultural changes needed to be successful. Another critical element is to start telling more and more stories of how non-technology companies are succeeding with cloud and DevOps. I think the recent Humana profile provides an interesting template here, as does Standard Bank’s forays into DevOps.

In addition to keeping up the good work, there are four key areas that will be helpful.

1. Define the goal properly

In one of my favorite straw-polls, groups who focus on the wrong outcomes and goals with private cloud have failure rates similar to those Gartner describes for organizations attempting to do DevOps.

What exactly those right goals are, for both DevOps and cloud adoption, is a new theory of mine that I wanted to road-test with the DevOpsDays Austin crowd. As far as I can tell, the best goal of both a “cloud project” and “doing DevOps” is continuous delivery. So, cloud and DevOps let businesses set up the process and technologies needed to deliver custom written software on weekly, if not shorter, turns, and to actively study, learn, and adapt software based on the feedback of actual people using it. This is the path to becoming a software defined business, and DevOps is the definitive “how” of how that’s done.

Making DevOps and continuous delivery synonymous

To that end, I suggested to the audience in Austin that we should start, more or less, thinking of continuous delivery and DevOps as synonymous. Once you frame what DevOps is — what DevOps enables — as that, the conversation becomes crisper and, I believe, easier for everyone to understand and act on. As I discussed last time, becoming a software defined business entails (a.) starting to think in a product-oriented manner (greatly facilitated by continuous delivery), and (b.) ensuring that you have the overall cloud platform in place that provides the feathered infrastructure bed for everything.

And, to add to the tracking of DevOps’ ascension to the mainstream, if you think of it as continuous delivery, some recent studies have shown that while overall CD use is low, growth has been ramping up year over year, just like DevOps.

2. Clearly explain the DevOps stack(s)

While even the best tools without the proper process (or “culture”) are ineffective, most people think in terms of stacks and tool-chains. So many of the DevOps conversations I’ve been involved in over recent years start by talking about tools and technologies. We’re in IT: it’s what we know.

I’d really like to see us start discussing common tool-chains and patterns of use (“cookbooks” to use an older, common programming documentation metaphor) for doing DevOps. Reference implementations even! Vendors do well telling you what they think the toolchain should be — please, oh please, feel free to ask me! ;). In fact, I’d say there’s almost an unhelpful amount of fragmentation in the infrastructure management layer at the moment: there are so many options that one can be left confused and overwhelmed.

Instead of letting us vendors define those stacks, I’d like to see the overall community get even more involved. Don’t be afraid to talk about tools in the face of all this culture talk! And don’t let us vendors steal the show.

3. Work with legacy code

Almost by definition, the IT shop at a non-technology company will be chock full of existing IT and “legacy code.” That’s the very IT that was once the growth-engine darling of the company and laid the foundation for where they are now.

As we all know but try to shy away from admitting too loudly, the new cycle of code and tech rarely works with the previous cycle’s code. I talk with companies almost weekly that are very interested in the question of how to integrate new cloud-native and mobile applications with five, 10, even 15+ year old centralized IT services. They want all the power of cloud and continuous delivery, but need help rationalizing and working with what they already have.

In my view, this is a conversation that doesn’t happen often enough in the DevOps community. It starts with a — wait for it! — seemingly doddering old term: “IT portfolio management.” That is, taking the time to assess what IT you have and understand the business priorities around it. Without that kind of big-picture, systems-based understanding of what you have, any whiz-bang awesomeness you get with DevOps will pale in comparison to the rumbling, rebar-festooned concrete ball of legacy IT you have to deal with. (Damon Edwards gave a great talk right before mine introducing one method of getting down and dirty with portfolio management.)

There are many thought-technologies of how to approach this, from Gartner’s bi-modal IT approach, to some interesting work going on over at the Cutter Consortium. The point is to have the discipline and maturity to actually do portfolio management so you can start to improve everything and better prioritize your time and projects.

And, to point out the obvious: we need to start documenting how the applications being written and supported by DevOps teams are integrating and co-existing with non-DevOps (or “legacy”) applications and services.

4. Keep up the Land-grab

As its name implies, DevOps has been on a land-grab mission that started back in the Agile days. If agile had gone the portmanteau route in naming itself, we might have seen DevQA, or even ProductDevQA. Agile development very consciously crossed silos and unified product management and QA with development, so much so that by the time we came up with DevOps, “Dev” represented all of those traditional roles.

Now, as companies are looking to IT and custom written software to help them become software defined businesses, DevOps-minded folks need to start thinking about how they can get more involved with “the business.” Do you know who these mythical business people are, what they’re worried about, how they think? Can you speak their language and help them learn yours?

To pick one very specific item that’s always a punji pit of IT despair: what KPIs and metrics should you use to communicate “up the chain”? (Ernest Mueller and Karthik Gaekwad have a great presentation on just this topic from last year.)

Think of it this way: what is the “API” for your business, and how can you start programming it…if not designing the API? Once DevOps is tightly integrated with the business side, and most companies are actively thinking about how custom written software can help run, grow, and innovate their business…then we’ll be able to declare DevOps success in the mainstream.

(I originally wrote this May 2015 for FierceDevOps, a site which has made it either impossible or impossibly tedious to find these articles. Hence, it’s now here.)

Software Defined Businesses need Software Defined IT Departments

(I originally wrote this April 2015 for FierceDevOps, a site which has made it either impossible or impossibly tedious to find these articles. Hence, it’s now here.)

Quick tip: if you’re in a room full of managers and executives from non-technology companies and one of them asks, “what kind of company do you think we are?”…no matter what type of company they are, the answer is always “a technology company.” That’s the trope we in the technology industry have successfully deployed into the market in recent years. And, indeed, rather than backhanded mocking, this tip is praise. These companies are taking advantage of the opportunity to use software and connected devices in novel ways to establish competitive advantage in their businesses. They’re angling to win customer cash by having better software and technology than their competitors.

What does it look like “on the ground,” though, when it comes to “being a technology company”? I’d argue that the traditional ways we think about structuring the IT department are different from how technology companies structure themselves. To massively simplify: traditional IT departments are oriented around working on projects, whereas technology companies are oriented around working on products.

The project-oriented IT department

Project-oriented thinking takes in requests from an outside entity and works on solving an immediate, well understood problem. There’s often a definitive end to the project: the delivery of the new “service.” Project-oriented thinking is good for creating an initial version of an application, installing and upgrading existing packaged software, setting up new offices, on-boarding employees, and other things that have a definitive completion date and well known tasks.

When organizing around this type of work, you set up a functional organization that can be assembled to solve the specific problem. Here, by “functional organization,” I mean groups of people who are defined by their expertise in something: networking, server administration, software development, project management, audit and compliance, security, and so on. These people are typically shared across various projects as needed and usually are responsible for just what they know about. (For a very different take on how to use functional organizations, see Horace Dediu’s discussion of how Apple organizes itself.)

On top of this, you take a request-driven approach to change management, which defines when to launch a new project or make “small” changes to an existing one (like adding a user). In the 2000s, we fancied up this concept by calling it “service management.”

Once the project is up and running, there may be something called “maintenance mode” which sees IT making sure, for example, that the ERP application has enough disk space available, that new users are added to the application, and that extra capacity is added when needed.

This mind-set is very handy when you’re dealing with keeping a bunch of products from tech vendors up and running. It’s even good if you have custom written applications that are not changed frequently. What’s also great is that because each project is well defined at the start and has a definitive end, you can measure success and financial metrics easily: how many requests did we handle (tickets closed) this month? Did we deliver on time? Did we deliver on budget? Is the project profitable (thus, did we pay too much for that software or get a good deal)?

However, two things have been changing this state of affairs, pushing IT to be more exploratory in nature. As a consequence, the structure of the IT department will need to change as well to maximize IT’s value to the overall business.

Cloud changes the organization into a “software eating the world” type of situation

I often joke that it’s been impossible to see a keynote in recent years without seeing the horsemen of the digital apocalypse. These are the cliche topics that seem to come up in every keynote. Two of these lay the groundwork for why the structure of the IT department needs to change:

  1. Software is eating the world — Cloud technologies and practices have made huge improvements in productivity and costs when it comes to creating and running custom written applications. It’s easier to write and run software now, and the rise of “always on” devices (all those super-computers in our pockets that are on the Internet 24/7) creates a massive footprint for computation: an endless buffet for software.
  2. Change or die! — With this huge buffet of opportunity, there’s a rallying call for companies to invent new business models that rely heavily on software. This means that most every business has the opportunity to use custom written software to change the nature of its business. Think of the opportunity for taxi companies to use software to change how they operate, or for the hotel industry to come up with a brand new business model to sell empty capacity…and you’re thinking of Uber and AirBnB. The “or die” part is a rhetorical trick to position this imperative as dire. And, indeed, studies have shown that remaining on top has become harder in recent decades. Change is needed to survive.

These two alone create a pull for more custom written software in businesses. It’s faster and cheaper to create software, and competitors are relying on that to create new business models that challenge incumbents or, rather, those businesses that are not evolving how they run their businesses with software. Again: think of all those taxi services versus Uber.

IT - SaaS = what?

There’s a third “horseman” in the broader industry that’s driving the need to change how IT departments are structured: the rise of SaaS. Before the advent of SaaS across application categories, software had to be run and managed in-house (or handed off to outsourcers to run): each company needed its own team of people to manage each instance of the application.

Source: Two studies, the first with 1,137 respondents and the second with 1,097, of people involved in their company’s IT buying decisions, surveyed in January 2014 and July 2014, including 470 and 445 whose companies currently use public cloud. “Corporate Cloud Computing Trends,” 451 ChangeWave, Feb 2014 & “Corporate Cloud Computing Trends,” 451 ChangeWave, Aug 2014.

As SaaS use grows, that staffing need changes. How many IT staff members are needed to keep Google Apps or Microsoft’s Office 365 up and running? How many IT staff do you need to manage the storage for Salesforce or SuccessFactors? Indeed, I would argue that as companies use more and more SaaS instead of on-premises packaged software, the staffing needs change dramatically: they lessen. You can look at this in a cost-cutting way, as in “let’s reduce the budget!” Hopefully you can look at it in a growth way instead: we’ve freed up budget to focus on something more valuable to the business. In most cases, that thing will be writing custom software. That is: developers.

The product oriented IT department

This is where the shift to thinking like a product organization is vital. First of all, if you feel the need to develop more custom software — as you should! — you’ll need to hire and train more software developers, product managers, QA staff, and related folks. You’ll also want to cultivate an environment where new ideas can be explored, user-tested in production, and then quickly refined in a loop that spans mere weeks, if not one week. You’ll need to become a continuous delivery and learning organization. Jonathan Murray has called this type of organization a “software factory” and has explained how he implemented the change while at Warner Music. More recently, books like Lean Enterprise have explained how this type of thinking can be applied outside of “startup culture,” whose concerns tend to be more about achieving a high valuation to get the company acquired or to IPO than about building and maintaining sustainable business models.

Setting up an organization like this requires not only developers, but creating the actual “factory” that they operate in. I think of this factory as a “platform” and the folks responsible for standing up and caring for that platform are a new type of operations staff. They’re in charge of, really, providing the “cloud” that developers effortlessly deploy and run their applications in.

This new type of IT staff has to think about how they add in as many self-service and highly elastic services in their “cloud” as possible. They too are creating a “product,” one that’s targeted at the internal developer teams and which must continually have new features added to it.

Meanwhile, your developers will be arranged into product-centric teams, hopefully working more closely with the line-of-business managers and staff who are helping craft and grow new applications. No doubt they’ll need operations skills on the team: staff who know how to properly architect and operationalize cloud-native applications.

This is where the now classic DevOps mentality comes in: in order to properly focus on a product, the team must be responsible for all parts of that product’s life, from development through production and back. With a proper cloud platform in place and the operations team to support it, these goals are more achievable than if the product team has to start from bare metal, or work with IT through a ticket system.

To be pragmatic, you probably can’t dedicate all people fully to a product and will need to share them. This carries large risks, however: namely, making sure you properly prioritize an individual’s time, and realizing that they’ll have a harder time keeping up with more products rather than fewer. Quality and the ability to deliver on time will likely decrease. It may seem like an impossible goal, but often, in order to stay competitive — to survive — large, seemingly impossible changes are needed.