Core DevOps (tech) metrics, from Nicole Forsgren

Everyone always wants to know metrics. While the answer is always a solid “it depends – I mean, what are your business goals and then we can come up with some KPIs,” there’s a reoccurring set of technical metrics. Nicole lists some off:

These IT performance metrics capture speed and stability of software delivery: lead time for changes (from code commit to code deploy), deployment frequency, mean time to restore (MTTR), and change fail rate. It’s important to capture all of these because they are in tension with each other (speaking to both speed and stability, reflecting priorities of both the dev and ops sides of the team), and they reflect overall goals of the team. These metrics as a whole have also been shown to drive organizational performance.

And, then, further summarized by Daniel Bryant:

Key metrics for IT performance capture speed and stability of software delivery, and include: lead time for changes (from code commit to code deploy), deployment frequency, mean time to restore (MTTR), and change fail rate.

Also in the interview, a concise DevOps definition:

I define DevOps as a technology transformation that drives value to organizations through an ability to deliver code with both speed and stability.

See the rest.

Coté Show: Patrick Debois on using serverless for a year and half, defining DevOps vs. SRE vs. design, and meatware over tools

Continuing the DevOpsDays Austin interview series, Barton and I talked with Patrick Debois on a variety of topics:

Check out the full show notes for more and podcast subscription options.

Scaling DevOps in large organizations – My April Register column


My The Register column this month is on scaling DevOps/cloud-native teams to the entire organization. It’s easy to build one team that does software in a new and exciting way, but how can you move to two teams, five teams, and then 100’s? It goes over the amalgamation of a few case studies and plenty of over-the-top gonzo analogies, per usual.

Check it out, and check out past ones if you’re curious for more.

Pivotal Conversations: How microservices enable DevOps, with Josh Long by Pivotal Conversations

It was a pretty good episode:

In preparation for his DevOpsDays Atlanta talk, Josh and Coté (well, mostly Coté) talk about the relationship between microservices and DevOps. They use the CAMS framing to go over how microservices could provide the architectural requirements to make DevOps possible.

Pivotal Conversations: “Running like Google,” the CRE Program & Pivotal, with Andrew Shafer

The summary:

What does it really mean to “run like Google”? Is that even a good idea? Andrew Shafer comes back to the podcast to talk with Coté about how the Google SRE book and the newly announced Google CRE program start addressing those questions. We discuss some of the general principals, and “small” ones too that are in those bodies of work and how they represent an interesting evolution of it IT management is done. Many of the concepts that the DevOps and cloud-native community talks about pop in Google’s approach to operations and software delivery, providing a good, hyper-scale case study of how to do IT management and software development for distributed applications. We also discuss Pivotal’s involvement in the Google CRE program.

Check out the SoundCloud listing, or download the MP3 directly.

We’re getting exactly the government IT we asked for

If there’s one complaint that I hear consistently in my studies of IT in large organizations, it’s that government IT, as traditionally practiced, is fucked. Compared to the private sector, the amount of paperwork, the role of contractors, and the seeming separation between doing a good job and working software drives all sorts of angst and failure.

Mark Schwartz’s book on figuring out “business value” in IT is turning out to be pretty amazing and refreshing, especially on the topic of government IT. He’s put together one of the better “these aren’t the Droids you’re looking for” responses to ROI for IT.

You know that answer: you just want to figure out the business case, ROI, or whatever numbers driven thing, and all the DevOps-heads are like “doo, doo, doo-doo – driving through a tunnel, can’t hear you!” and then they pelt you with Goldratt and Deming books, blended in with some O’Reilly books and The Phoenix Project. “Also, you’re argument is invalid, because reasons.”

A Zen-like calm comes over them, they close their eyes and breath in, and then start repeating a mantra like some cowl-bedecked character in a Lovecraft story: “survival is not mandatory. Survival is not mandatory. Survival is not mandatory!”

Real helpful, that lot. I kid, I jest. The point of their maniacally confusing non-answers is, accurately, that your base assumptions about most everything are wrong, so before we can even approach something as precise as ROI, we need to really re-think what you’re doing. (And also, you do a lot of dumb shit, so let’s work on that.)

But you know, no one wants to hear they’re broken in the first therapy session. So you have to throw out some beguiling, mind-altering, Lemarchand’s boxes to change the state of things and make sure they come to the next appointment.

Works as Designed

Anyhow, back to Schwartz’s book. I’ll hopefully write a longer book review over at The New Stack when I’m done with it, but this one passage is an excellent representation of what motivates the book pelters and also a good unmasking of why things are the way we they are…because we asked for them to be so:

The US government is based on a system of “checks and balances”—in other words, a system of distrust. The great freedom enjoyed by the press, especially in reporting on the actions of the government, is another indication of the public’s lack of trust in the government. As a result, you find that the government places a high value on transparency. While companies can keep secrets, government is accountable to the public and must disclose its actions and decisions. There is a business need for continued demonstrations of trustworthiness, or we might as well say a business value assigned to demonstrating trustworthiness. You find that the government is always in the public eye—the press is always reporting on government actions, and the public is quick to outrage. Government agencies, therefore, place a business value on “optics”—how something appears to the observant public. In an oversight environment that is quick to assign blame, government is highly risk averse (i.e., it places high business value on things that mitigate risk).

And then summarized as:

…the compliance requirements are not an obstacle, but rather an expression of a deeper business need that the team must still address.

Which is to say: you wanted this, and so I am giving it to you.

The Agile Bureaucracy

The word “bureaucracy” is something like the word “legacy.” You only describe something as legacy software when you don’t like the software. Otherwise, you just call it your software. Similarly, as Schwartz outlines, agile (and all software processes) are insanely bureaucratic, full of rules, norms, and other “governance.” We just happen to like all those rules, so we don’t think of them as bureaucracy. As he writes:

While disavowing rules, the Agile community is actually full of them. This is understandable, because rules are a way of bringing what is considered best practices into everyday processes. What would happen if we made exceptions to our rules—for instance, if we entertained the request: “John wants to head out for a beer now, instead of fixing the problem that he just introduced into the build?” If we applied the rules capriciously or based on our feelings, they would lose some of their effectiveness, right? That is precisely what we mean by sine ira et studio in bureaucracy. Mike Cohn, for example, tells us that “improving technical practices is not optional.”15 The phrase not optional sounds like another way of saying that the rule is to be applied “without anger or bias.” Mary Poppendieck, coauthor of the canonical works on Lean software development, uses curiously similar language in her introduction to Greg Smith and Ahmed Sidky’s book on adopting Agile practices: “The technical practices that Agile brings to the table—short iterations, test-first development, continuous integration—are not optional.” I’ve already mentioned Schwaber and Sutherland’s dictum that “the Development Team isn’t allowed to act on what anyone else [other than the product owner] says.”17 Please don’t hate me for this, Mike, Mary, Ken, and Jeff, but that is the voice of the command-and-control bureaucrat. “Not optional,” “not allowed,” – I don’t know about you, but these phrases make me think of No Parking and Curb Your Dog signs.

These are the kind of thought-trains that only ever evoke “well, of course my intention wasn’t the awful!” from the other side. It’s like with the ITIL and the NRA, gun-nut people: their goal wasn’t to put in place a thought-technology that harmed people, far from it.

Gentled nestled in his wry tone and style (which you can image I love), you can feel some hidden hair pulling about the unintended consequences of Agile confidence and decrees. I mean, the dude is the CIO of a massive government agency, so he be throwing process optimism against brick walls daily and too late into the night.

Learning bureaucracies

The cure, as ever, is to not only to be smart and introspective, but to make evolution and change part of your bureaucracy:

Rules become set in stone and can’t change with circumstances. Rigidity discourages innovation. Rules themselves come to seem arbitrary and capricious. Their original purpose gets lost and the rules become goals rather than instruments. Bureaucracies can become demoralizing for their employees.

So, you know, make sure you allow for change. It’s probably good to have some rules and governance around that too.

Often, people just won’t change, or, there is no magic to thaw the frozen mind

Under the auspices of “how can we DevOps around here?” I’ve had numerous conversations with organizations who feel like they can’t get their people to change over to the practices and mind-set that leads to doing better software. The DevOps community has spent a lot of time trying to gently bring along and even brain-hack resistant people and there are numerous anecdotes and studies that doing things in DevOps/cloud native/agile/etc. way not only make businesses run more smoothy and profitably, but make the actual lives of staff better (interested in leaving work at 5pm and not working on weekends?)

Still, for whatever reasons, the idea of “the frozen-middle” is reoccurring. In government work, there’s often the frozen-ground” with staff throwing down their heels as well.

Most of the chatter on this topic is at the team level, and usually teams in already flexible, new organizations (like all those logos on scare-slides in presentations from people like Pivotal: UBER! AIRBNB! TESLA! SOFTWARE-IS-EATING-THE-WORLD, INC.!) If you’re already in a company that can change its culture every year, the pain of these “frozen” people is almost nil in comparison to being in government, a 50-100 years old company, or otherwise “in the real world.”

Right here, I’m supposed to establish empathy with these frozen people to get them to change. I should stop calling them “frozen people” and treating them like “resources,” and think of them as people. That is all true, but what do I do after I’ve cavorted with them in empathy splash-pads for months on end and we still can’t get them to configure the firewall so we can do the DevOps? What do you do with enterprise architects who have become specialists in telling you how impossibly fucked up everything is? How do you work with all those “frozen middle-managers” who are acting as a “shit umbrella” (or maybe in this case, a “rainbow umbrella”?) to all your fanciful DevOps notions?

There’s one last ditch tactic: money and firing. When it comes to changing the culture in a large orginizations, this is the big hammer that upper-level management has. That is, the middle-manager’s manger is the one who has to tools to fix the problem. By design, the surest, easiest, sanctioned way to change behavior is to change cash pay-outs and other compensation based on not only the performance of people (management and staff), but their willingness to adapt and change to new, better ways of operating. That is: you reward and punish people based on them doing what you told them to do, or not.

There’s all sorts of “little and cheap” things to do instead of bringing out the big hammer:

  • Giving out acrylic trophies, showering people in attention from high-level execs
  • Fame and glory internally and externally
  • T-shirts (no, really! They work amazingly well)
  • Sanctioning goofing off (“we may have a 15 day holiday policy, but, you know, you should leave work at 4 every day and don’t log vacation”)

Matt Curry covers most of these management-hacks in a great talk on changing large organizations. But let’s assume those aren’t working.

At this point, “leadership” (middle-management’s management) needs to reward and punish the unfreezing or freezing. If middle-management changes (“unfreezes”), they get paid more. If they don’t unfreeze at least their pay needs to freeze, and at “best” they need to get quarantined or fired (as a tone-note, back when I was young and the idea of getting “laid-off” or “RIFed” started being used I reprogrammed myself to only ever say “fired” in reference to loosing your job: those soft, BS words obscure the fact that you’ve been, well, fired – so, feel free to s/// whatever more HR-correct phrase you like into there).

If you look at the patterns of change in most companies going cloud native, they do exactly this. Organizations that successful change so that they can start doing software better set up separate “labs” where the “digital transformation” happens. These are often logically and physically separate: they often have new names and are often “off-site” or well hidden, in the skunk-works style. They do this because if they stay mentally and physically in the “old” system, they end up getting frozen simply due to proximity. The long-term, managerial strategy, here, is an application of the strangler pattern to the “legacy” organization: eventually, the new “labs” organization, through value to the business and sheer size, becomes the “normal” organization.

The staff process they use – almost as a side-effect! – is to only move over people who are willing to change. I’d argue that “skills” and aptitude have little to do with the selection of people: any person in IT can learn new ways of typing and right clicking if the company trains them and helps them. We’re all smart, here.

What happens is that you cull the frozen people, either leaving them behind in the old organization (where there’s likely plenty of work to be done keeping the legacy systems alive!), or just outright firing them. That’s all brutal and not in the rainbows and sandals spirit of DevOps, but it’s what I see happening out there to successfully change the fate of large organizations’ IT departments.

You don’t really hear about this in cheery keynotes at conferences, but at dark bars and (I shit you not) in quiet parking garages I’ve talked with several executives who paint out this unfreezing-by-attrition process as something that’s worked in their organization. One of them even said “we should have gotten rid of more people.”

This pattern gets more difficult in highly labor-regulated industries – like government work and government contractors – but setting up brand new organizations to shed the frozen old ways is at least something to try.

And, barring all else, as I like to say, if nothing is working, and people won’t actually change, Pivotal is always hiring.

If that’s too grim, don’t despair! My buddies have a much light-hearted, hopefully whack at the problem in a recent discussion of ours:

The problem with the “DevOps tools” category, and the RightScale Cloud Survey

There’s always some good year/year stuff in this survey. I’ll have to check it out soon.

Meanwhile, some coverage from Serdar Yegulalp:

“DevOps tools”:

This category is going to be increasingly weird to cover. Here, what they really mean are “containers and container orchestration tools…oh, and configuration management tools.”

Arguably, you’d put something like Cloud Foundry in there if you have a gumbo of three different software categories, each used by people who want to follow DevOps practices and get the benefits of improving software. We haven’t put public numbers recently, but back in September of 2015 Pivotal Cloud Foundry had $100m in annual bookings, and the growth has continued.

Here’s how 451 characterized our customer base and foot-print out there:

While Pivotal Labs’ early success was among startups, PCF was designed for organizations needing to increase velocity and change their application delivery process, which appeals to large enterprises. About 90% of Pivotal’s revenue comes from enterprise customers today. The company reports that its number of paying customers is in the hundreds, with average deal sizes in the high six-figure range. It also says most of its customers are using PCF on top of VMware’s IaaS offering. AWS is currently in second place among PCF deployments with Azure as a close third that is gaining momentum among the Fortune 100.

All of that means: Pivotal Cloud Foundry has many customers running in production (with the resulting revenue) and, I’d argue, much market share. If you throw in IBM and MicroFocus/HPE’s Cloud Foundry distro revenue, plus OSS use (like at 18F), you’ve got a large chunk of Cloud Foundry usage out there, which should translate into a large chunk of “doing software better (with DevOps)” marketshare for all of Cloud Foundry.


Like I said, coverage will be weird for awhile as people sort it out. Kind of like “cloud” was in 2009, and pretty much every “PaaS” category now, though it’s getting better. I’m looking forward to what they sort out in the new(ish?) DevOps practice at IDC.

More coverage of the survey:

Private, public, hybrid:

One key trend RightScale’s tracked over 2014, 2015, and 2016 has been the use of private cloud through OpenStack, VMware, and related products. This time around, while private cloud adoption hasn’t fallen off a cliff, it’s edged down: 72 percent of respondents this year run a private cloud, versus 77 percent last year and 63 percent the prior year. Hybrid cloud has shown a similar trajectory of 67 percent in 2017, 71 percent in 2016, and 58 percent in 2015.

After margin of error, pretty slow, but a good enough pace I guess.

Make no mistake: AWS is cloud king, and it’ll safely remain that way for a long time. But Microsoft Azure has made remarkable amount of progress in the last year. In 2016, only 20 percent of respondents were running apps there; this year, it’s 34 percent. With enterprise users specifically, the growth was even more dramatic, from 26 percent to 43 percent.

And, Oracle and DigitalOcean:

Azure’s growth isn’t coming at the expense of business from other cloud providers. Oracle Cloud and DigitalOcean, the two decliners on the list, didn’t have enough business to lose in the first place to constitute Azure’s jump: respectively, 4 and 5 percent in 2016, and 3 and 2 percent this year.

Now, of course all this hinges on what exactly they mean by cloud, what the sample base is (just RightScale customers?), etc. But, anything that’s multi-year is useful.

See also Barb’s brief coverage, e.g.: “Azure, meanwhile, saw a bigger jump with 43% of respondents on Azure now, compared to 26% last year.”

Source: Docker’s tops for devops, AWS is the cloud king, InfoWorld

Gary Gruver interview on scaling DevOps

I always like his focus in speeding up the release cycle as a forcing function for putting continuous integration in place, both leading to improving how an organization’s software:

I try not to get too caught up in the names. As long as the changes are helping you improve your software development and delivery processes then who cares what they are called. To me it is more important to understand the inefficiencies you are trying to address and then identify the practice that will help the most. In a lot of respects DevOps is just the agile principle of releasing code on a more frequent basis that got left behind when agile scaled to the Enterprise. Releasing code in large organizations with tightly coupled architectures is hard. It requires coordinating different code, environment definitions, and deployment processes across lots of different teams. These are improvements that small agile teams in large organizations were not well equipped to address. Therefore, this basic agile principle of releasing code to the customer on a frequent basis got dropped in most Enterprise agile implementations. These agile teams tended to focus on problems they could solve like getting signoff by the product owner in a dedicated environment that was isolated from the complexity of the larger organization.


You can hide a lot of inefficiencies with dedicated environments and branches but once you move to everyone working on a common trunk and more frequent releases those problems will have to be address. When you are building and releasing the Enterprise systems at a low frequency your teams can brute force their way through similar problems every release. Increasing the frequency will require people to address inefficiencies that have existed in your organization for years.

On how organization size changes your managerial tactics:

If it is a small team environment, then DevOps is more about giving them the resources they need, removing barriers, and empowering the team because they can own the system end to end. If it is a large complex environment, it is more about designing and optimizing a large complex deployment pipeline. This is not they type of challenges that small empowered team can or will address. It takes a more structured approach with people looking across the entire deployment pipeline and optimizing the system.

The rest of the interview is good stuff. Also, I reviewed his book back in November; the book is excellent.