Scaling DevOps in large organizations – My April Register column

apololypse_now_smell_of_napalm_in_the_morning

My The Register column this month is on scaling DevOps/cloud-native teams to the entire organization. It’s easy to build one team that does software in a new and exciting way, but how can you move to two teams, five teams, and then 100’s? It goes over the amalgamation of a few case studies and plenty of over-the-top gonzo analogies, per usual.

Check it out, and check out past ones if you’re curious for more.

The news from Docker-land, plus, the money being fought over – Notebook

With DockerCon this week, there’s no end of Docker quotables and items. Here’s my collection

General momentum

Once landed in an account, Docker usage grows their CEO says:

There has also been expansion within customers, with organizations that start with Docker expanding their usage on average by five times within six months

Way back in 2015, the (now annual?) DataDog study of Docker usage among their customers said that 2/3 of companies that try Docker adopt it. Which is all to say: once it gets in, it spreads.

Moby

A toolkit for putting together docker stacks:

In essence, Moby is the build system that creates Docker Community Edition, which is akin to Fedora, and Docker Enterprise is derived from Moby and is akin to Red Hat Enterprise Linux. Link

People got all freaked out. I’d even say “freaked the fuck out.” Competitors, of course, gloated, if only in silence. Criticism of handling the announcement aside (ideally, you wouldn’t like to kick up a stink), I feel like it was more like a tempest in a teapot.

Docker momentum/penetration and types of applications/workloads

Global 2000 customers have somewhere on the order of thousands to tens of thousands of applications, and across these major firms, less than 5 percent of the applications have been containerized so far. While somewhere between 5 percent and 10 percent of the applications that are being containerized are net-new, microservices-style applications that everyone is talking about all the time, the other 90 percent to 95 percent are just lifting and shifting legacy applications from bare metal or virtual machines to containers. Link

VMware threat…or just legacy gobbling?

Docker bounces back and forth between “replacement for VMware” and “a different thing, so don’t worry about VMware.” In this round of Docker news, there’s been some strong pull towards the “replacement for VMware” camp. To be fair, it’s more like doing both:

In general, says Johnston, customers who move from bare metal or VMs to Docker containers can provision, scale, and deploy applications up to 75 percent faster, and those moving from bare metal to containers can save 50 percent on compute and those who are moving from VMs will save around 25 percent. Link

This might also come from the obvious move to start gobbling up legacy (more accurately “existing”) applications. Here, Docker had two customer reference:

Northern Trust, a leading international financial services company, experienced  deployment times that were 4X faster and noted a 2X improvement in infrastructure utilization

And, Microsoft IT:

Microsoft is not only a partner in this program; their IT organization is also a beta customer.  Microsoft IT increased app density 4X with zero impact to performance and were able to reduce their infrastructure costs by a third.

There was also a story of Visa using Docker:

Kocherlakota said Visa is aiming to move as many workloads at it can to the container model to help improve overall efficiency.

See more on this legacy migration stuff and the program with Avanade, Cisco, HP, and Microsoft from Docker’s Scott Johnson.

Major vendors

Other tech companies are often cautious about working with Docker. They’re not really certain about how it helps or threatens their position in the IT stack and, therefore, their ability to sell higher profit margin products and services. No one wants to become the x86 manufacturer of the cloud (read: low margin, commodity).

I’ve noticed this cautiousness slightly melting as more and more vendors are at least putting their stuff in Docker images and, on the public cloud front, supporting the use of Docker. My company, Pivotal, ingests Docker images.

A brief whack at why Microsoft cares, from Christopher Tozzi:

Although there remains work to do to get Docker on Windows ready for prime time, the platform will be important in helping Windows Server stay as nimble as Linux environments in hosting the workloads of the future…. Microsoft’s interest in Docker may seem strange. Microsoft already offers traditional virtual machine products, most notably Hyper-V. In some respects, Docker containers compete with virtual machine platforms…. But that’s not necessarily the case. Depending on how they’re used, containers can complement virtual machines, rather than replace them. If you use virtual machines to host the environment in which Docker runs, your Docker environment becomes more scalable and portable than it would be if it ran on bare metal. That’s likely the type of use case Microsoft envisions for containers on Windows.

More from Nick Martin on Microsoft and Docker.

Oracle bundling middleware in Docker containers:

Oracle becomes the latest enterprise IT vendor to jump on the Docker container bandwagon as it seeks to expand its reach in the public cloud market. Among the container-based application, middleware and development tools made available on the container platform are Oracle’s MySQL database and its WebLogic server. Those tools are in addition to the more than 100 images of Oracle products already available on Docker Hub, its cloud-based image registry.

So, what’s going on here? Staking a claim on The New Stack

I’m often asked to explain all the various cloud stacks, to help Pivotal buyers sort out what CaaS, PaaS, cloud-native, and “cloud strategy” means. They’re trying to figure out their planning for building out new IT, for “doing DevOps.” It’s a mess out there w/r/t to figuring all this out if you’re not a vendor or analyst who’s steeped in this shoggoth every day.

In all the Docker, container, and cloud-native wars, the revenue battle for vendors is mostly about two things:

  1. The pool of money in simply migrating the VMware workload to a new, more efficient layer, hence the ongoing attention to “the VMware threat” that Docker poses). I’m not sure how big this market is because, as a disruptive shift (cf. Linux vs. UNIX vs. Windows vs. z) part of it is reducing the overall spend through lower prices and more efficient usage. But, the existing virtualization market is best described as “fucking huge.”
  2. Fighting over who “owns” (and therefore collects the most profit from) the stack that companies are using to build and run their software. By my estimate, this is something like around a $20-25bn market in the future. You can see a Spanish Civil War like precursor going on in the Java application server market; it’s spreading to a “World War” with respect to all custom software stacks.

On that second point, here’s my latest attempt to describe how things are shaking out category/definition wise:

Of all the SPI cloud categories, PaaS is the most problematic place as all us vendors hate the PaaS term and are trying to re-define what it means. I would break PaaS into two categories currently: (1.) container orchestration, and, (2.) cloud platform.

Container orchestration takes an IaaS and manages the installation and configuration of container images on your new cloud. By “images” here, I mean that you’ve chosen to put your software (probably custom written software, not packaged software) into containers (or the delegated way we do it with buildpacks in CF), specified how all the different nodes are wired together with all the ACLs and configuration, and then given it over to the orchestration software to deploy those containers, set the configuration, and do the ongoing health-checks/remediation.

Ideally, the orchestration platform should also have “day 2” tools to help you monitor and manager (“fix”) problems that happen in production. I assume things like kubernetes, the Docker/Moby constellation of things, Mesosphere, etc. fit here.

People are obsessed with container orchestration now and it’s pretty much all anyone talks about. I think all this is what’s becoming known as “CaaS” – Containers as a Service.

(On this next section, I’m extremely monetarily biased, of course:) A cloud platform either has or depends on an orchestration layer, but adds in integrated middle-ware, ALM tools (from basics like “cf push”, and an overall programming and deployment model with all the tools and enforcements. Heroku is the classic example here in public cloud, and now Cloud Foundry (CF) has taken over this model in public and private cloud, the second (it seems) where most of the usage and money is, at least in the enterprise space. I’d argue, that CF is the enterprise market-leader (by revenue at least, but increasingly penetration in the F500 – while Pivotal has impressive numbers, throw in the other CF distros and it’s even larger, no doubt); at the very least, “the highest growth and in enterprise production usage.” That all depends how you slice it, and of course my slicing favors me.

A cloud platform “pulls together” everything into a fully working “cloud” that deploy and provisions the servers, builds/maintains/deploys the containers, takes care of your networking configuration and concerns (inc. firewalls, etc.), and configs/manages all the middleware needed (e.g. “I want a database” means you just ask for it, instead of having to configure it and make container images of it and specify how it all works together).

The end goal of a cloud platform is the original end-goal of a PaaS: developers don’t have to “setup” any of the infrastructure or, really, middleware (databases, queues, etc.) that they use: they just write the “business logic” of their applications.

All this standardization is technically “restrictive” (developers can’t just install anything they download off the Internet, it has to be integrated into the platform). This is why we often call this model “opinionated,” but it follows the same contract/promises model that Google SREs follow: we promise we can support your applications in production if you use only the things we support, otherwise it’s all on you.

However, the benefit of such opinions is a huge jump in productivity as we see at all our customers: one Pivotal customer manages 1,000+ applications (all angles toward very frequent, DevOps-style releases for fast feedback loops and all that small batch stuff) with just 4 PCF operations staff, etc.

Our DIY white paper makes the case that snow-flaking this all out is a bad idea. At the very least, if you build your own platform, you should try to just have one used organization wide.

In comparing CaaS and cloud platform, the key distinction to me is that a cloud platform bundles and integrates together all your middleware and “services” frameworks. For example, if you want to do microservices with all the bulk-heads and such, that functionality should be built into the cloud platform – you should have to go read-up how to set most of that up. PCF, of course, has Spring Cloud and more for that. All of the systems management tools (thing used in production to detect and fix problems) should also be built in, or the cloud platform should be instrumented so deeply that third party tools can do the managing as well.

Now, these two categories are likely to converge, and then the discussion will just be which cloud platforms are more featureful and better. It’ll be like battling Java application servers.

I haven’t made one of my own “burger” stacks of all this in a long time, but I think (again, highly biased) the ones we use for PCF are pretty good:

More

In case you don’t know, working at Pivotal, I obviously have a stake in how all this turns out, so I’m biased on multiple angles of the above whether I want to be or not. 

With no competition, government websites often have no incentive to be good

In contrast to agile, private-sector companies, the public sector does not face any pressure from competition. When it comes time to renew your license, there is only one place for you to do that: and, unfortunately for Americans, that’s the DMV. With no competitive forces, government agencies do not have to innovate or take bold risks when it comes to digital.

And, as ever, being smart about using updated tools and new methods yield huge productivity results:

While running technology for Obama’s WhiteHouse.gov, open-source solutions enabled our team to deliver projects on budget and up to 75% faster than alternative proprietary-software options. More than anything, open-source technology allows governments to utilize a large ecosystem of developers, which enhances innovation and collaboration while driving down the cost to taxpayers.

While open source has different cost dynamic, I’d suggest that simply switching to new software to get the latest features and mindset that the software imbues gives you a boost. Open source, when picked well, will come with that community and an ongoing focus on updates: older software that has long been abandoned by the community and vendors will stall out and become stale, open or not.

With most large organizations, and especially government, simply doing something will give you a huge boost in all your KPIs in the short term. Picking a thriving, vibrant stack is critical for long term success. Otherwise, five or ten years from now, whether using open or closed source, you’ll end up in the same spot, dead in the water and sucking.

Link

DIUx working in streamlining IT projects at the DoD

Since May 2016, DIUx has completed 21 contracts using other transaction (OT) authority and the average time is 78 days, Shah said at the New America Foundation Future of War summit in Washington.

The mission of DIUx, he said, “is to do agile culture change.…We are never going to be the acquisition arm of the Department of Defense, we’re not the R&D arm of the department.”
DIUx has so far comprised $42 million in program funding, which Shah characterized as a “rounding error of a rounding error” of the DOD budget.

Hey, they’re trying over there in the government. It ain’t easy. I’ve meet with some of the folks there and they sure seem genuine about fixing things up and curious to work closer with the civilian IT world.

When I meet with military people they use the word “agile” over and over: meaning, they’re incredibly interested in modernizing. It’s just the tiny matter of figuring out how to get from here to there.

Link

Vanguard’s thinking on microservices

Breaking up the monolith with good, old fashioned, OO-think:

Instead, Vanguard has begun a journey to break apart our monolithic legacy systems piece-by-piece by replacing them with microservices over time. With a microservices architecture, we remove the business logic and data logic from our applications and replace it with a set of re-usable modules of code that are built and deployed as independent entities. We then compliment this architecture by chunking out our user interfaces into modular purpose-built components.

De-coupling for stability and resiliency, among other things:

This service-based approach to application architecture provides a variety of advantages over the jumble of code that defines a non-modular monolithic application. First, services reduce redundancy by making sure there is only one copy of application logic for a given capability – regardless of how many applications leverage that logic. In the long run, this leads to lower development costs and increases speed to market. Second, since these services are deployed independently and built in a resilient manner, outages in one area of an application are less likely to bring down an entire system. In some instances, several of our services can be down without our clients being aware of a loss in functionality thanks to the ability of our applications to automatically react to a service that isn’t available. Finally, services enable our applications to scale easier. The marriage of cloud and services means we can quickly spin up infrastructure to handle surges in the number of transactions we need to handle without needing to scale up an entire application.

Vanguard CIO: Why we’re on a journey to evolve to a microservices architecture

Pivotal Conversations: “Running like Google,” the CRE Program & Pivotal, with Andrew Shafer

The summary:

What does it really mean to “run like Google”? Is that even a good idea? Andrew Shafer comes back to the podcast to talk with Coté about how the Google SRE book and the newly announced Google CRE program start addressing those questions. We discuss some of the general principals, and “small” ones too that are in those bodies of work and how they represent an interesting evolution of it IT management is done. Many of the concepts that the DevOps and cloud-native community talks about pop in Google’s approach to operations and software delivery, providing a good, hyper-scale case study of how to do IT management and software development for distributed applications. We also discuss Pivotal’s involvement in the Google CRE program.

Check out the SoundCloud listing, or download the MP3 directly.

If compliance is so important, bake it into the platform

Can we take that governance and work with the platform team to codify, to automate that which they were doing on a per application basis – that’s, quiet frankly slowing down the delivery of the software – can we take that governance and can we have them work with the platform team to codfiy, to actually automate on a per application basis, have them expose that as a service on the platform

Cornelia Davis on governance and cloud-native, “Who Does What? Mapping Cloud Foundry Activities and Entitlements to IT Roles,” August 2016

In other words: you should not only automate the audit three-ring binders of compliance, but enforce as much as possible in the platform.

The rest of the talk is good stuff on how think through re-arranging your organization to be all DevOps-y, with the help of Pivotal Cloud Platform to automate all the infrastructure and middleware stuff:

Cloud-Native Cookbook – beyond “survival is not mandatory”

I started a new booklet project, the Cloud Native Cookbook.

The premise is this:

The premise of this book is to collect specific, tactical advice transitioning to a cloud-native organization. The reader is someone who “gets it” when it comes to agile, DevOps, cloud native, and All the Great Things. Their struggle is actually putting it all in place. Any given organization has all of it’s own, unique advantages and disadvantages, so any “fix” will be situational, of course.

This cookbook draws from actual experiences of what worked and didn’t work to try to help organizations hack out a path to doing software better. While we’ll allow ourselves some “soft,” cultural things here and there, each of the “recipes” should be actionable, tangible items. At the very least, the rainbows and unicorns stuff should have concrete examples, e.g., how do you get people to actually pair program when they think it’s a threat to their self-worth?

As with my previous cloud-native booklet, I have this one open for comments as I’m working on it. It’d be great to get your input.

Here’s some slides I’ve been using around all this.

Pacing cloud-native transformation, and actually doing the work to increase productivity

I like to tell large organizations that compared to the break-neck pace of “the silicon valley mindset,” they can operate at a leisurely pace. That pace is usually fast for these enterprises, but their problem set and risk profile is a lot different than hats on cats. Abby has a nice, short write-up that hits on this topic among others:

By the end of his first year, Safford and his teams had built prototypes and market tests and finished 16 new software projects.

At Home Depot, they were at about 140 to 150 projects after a year or so. However, it’s common in the first year to do a lot of replatforming of “simple,” mostly cloud-native compatible apps in there. You can do these at a pretty fast clip, with the rule of thumb being 10 apps in 10 weeks. This is in addition to new applications, but explains high numbers like those at Home Depot. I suspect the Allstate numbers are mostly net-new apps, though.

Goals:

Safford’s eventual goal is to shift Allstate software development to 70 percent extreme agile programming and 30 percent traditional scrum and waterfall. Where developers used to spend only 20 percent of their time coding software, today up to 90 percent of their days are spent programming. Each of his CompoZed development labs around the world has the same startup look and feel, including scooters parked in the hallways. This is not your grandfather’s insurance company anymore.

What you hear over and over again from organizations going cloud-native is that developers were spending lots of time in meetings, checking email, and otherwise not coding (and, yes, by “coding” I don’t mean just recklessly LOC‘ing it up without design, and all that). Management had to spend much effort to get them back to coding.

As I fecklessly tell my seven year old when he’s struggling with homework: the only way to finish this quickly is to actually do the work.

(Also: nice write-up from Abby!)

Source: Don’t Forget People and Process in Your Digital Transformation