Turns out of course it’s not just Developer Time To Suck that is shrinking. Operations is heading the same way. Folks at Pivotal are saying that operating systems don’t matter, as we’ve moved further up the stack. Cloud Native is a proxy for saying much the same thing. But then, something is being written right now that will supplant Kubernetes. If we’re not running our own environments in house, operations disposability become increasingly realistic. Cattle not pets, for everything.
A recent rendition of one of my standard talks at the Austin DevOps Meetup. See the slides as well.
This month’s column is on finding and vetting consultants.
Check out the piece!
New types of software and delivery mechanisms (SaaS, mobile) mean new problems and scale:
“We were so used to dealing with tens of servers and suddenly it was hundreds and thousands of servers,” which in turn created more work for the development teams.
“The digital expansion of business equals more work and firefighting,” Cox said.
Less time spent doing dumb-shit:
employees used to spend the eight hours of the park closed every night, manually updating each server. Now only one person can update the whole fleet in 30 minutes.
Some guiding principals and management challenges:
Cox said that leading a change of this order of magnitude involved three crucial ingredients:
1. Collaboration: break down silos, mutual objectives.
2. Curiosity: keep experimenting.
3. Courage: candor, challenge, no blaming or witch-hunting.
But these can come with its own leadership challenges, including:
• The politics of command and control.
• How new leadership can take a company in a new direction.
• The blame bias of who versus what.
And, some good motivation:
We keep moving forward, opening up new doors, doing more things because we’re curious.
Everyone always wants to know metrics. While the answer is always a solid “it depends – I mean, what are your business goals and then we can come up with some KPIs,” there’s a reoccurring set of technical metrics. Nicole lists some off:
These IT performance metrics capture speed and stability of software delivery: lead time for changes (from code commit to code deploy), deployment frequency, mean time to restore (MTTR), and change fail rate. It’s important to capture all of these because they are in tension with each other (speaking to both speed and stability, reflecting priorities of both the dev and ops sides of the team), and they reflect overall goals of the team. These metrics as a whole have also been shown to drive organizational performance.
And, then, further summarized by Daniel Bryant:
Key metrics for IT performance capture speed and stability of software delivery, and include: lead time for changes (from code commit to code deploy), deployment frequency, mean time to restore (MTTR), and change fail rate.
Also in the interview, a concise DevOps definition:
I define DevOps as a technology transformation that drives value to organizations through an ability to deliver code with both speed and stability.
See the rest.
Continuing the DevOpsDays Austin interview series, Barton and I talked with Patrick Debois on a variety of topics:
Check out the full show notes for more and podcast subscription options.
My The Register column this month is on scaling DevOps/cloud-native teams to the entire organization. It’s easy to build one team that does software in a new and exciting way, but how can you move to two teams, five teams, and then 100’s? It goes over the amalgamation of a few case studies and plenty of over-the-top gonzo analogies, per usual.
It was a pretty good episode:
In preparation for his DevOpsDays Atlanta talk, Josh and Coté (well, mostly Coté) talk about the relationship between microservices and DevOps. They use the CAMS framing to go over how microservices could provide the architectural requirements to make DevOps possible.
What does it really mean to “run like Google”? Is that even a good idea? Andrew Shafer comes back to the podcast to talk with Coté about how the Google SRE book and the newly announced Google CRE program start addressing those questions. We discuss some of the general principals, and “small” ones too that are in those bodies of work and how they represent an interesting evolution of it IT management is done. Many of the concepts that the DevOps and cloud-native community talks about pop in Google’s approach to operations and software delivery, providing a good, hyper-scale case study of how to do IT management and software development for distributed applications. We also discuss Pivotal’s involvement in the Google CRE program.