Vanguard’s thinking on microservices

Breaking up the monolith with good, old fashioned, OO-think:

Instead, Vanguard has begun a journey to break apart our monolithic legacy systems piece-by-piece by replacing them with microservices over time. With a microservices architecture, we remove the business logic and data logic from our applications and replace it with a set of re-usable modules of code that are built and deployed as independent entities. We then compliment this architecture by chunking out our user interfaces into modular purpose-built components.

De-coupling for stability and resiliency, among other things:

This service-based approach to application architecture provides a variety of advantages over the jumble of code that defines a non-modular monolithic application. First, services reduce redundancy by making sure there is only one copy of application logic for a given capability – regardless of how many applications leverage that logic. In the long run, this leads to lower development costs and increases speed to market. Second, since these services are deployed independently and built in a resilient manner, outages in one area of an application are less likely to bring down an entire system. In some instances, several of our services can be down without our clients being aware of a loss in functionality thanks to the ability of our applications to automatically react to a service that isn’t available. Finally, services enable our applications to scale easier. The marriage of cloud and services means we can quickly spin up infrastructure to handle surges in the number of transactions we need to handle without needing to scale up an entire application.

Vanguard CIO: Why we’re on a journey to evolve to a microservices architecture

Pivotal Conversations: “Running like Google,” the CRE Program & Pivotal, with Andrew Shafer

The summary:

What does it really mean to “run like Google”? Is that even a good idea? Andrew Shafer comes back to the podcast to talk with Coté about how the Google SRE book and the newly announced Google CRE program start addressing those questions. We discuss some of the general principals, and “small” ones too that are in those bodies of work and how they represent an interesting evolution of it IT management is done. Many of the concepts that the DevOps and cloud-native community talks about pop in Google’s approach to operations and software delivery, providing a good, hyper-scale case study of how to do IT management and software development for distributed applications. We also discuss Pivotal’s involvement in the Google CRE program.

Check out the SoundCloud listing, or download the MP3 directly.

If compliance is so important, bake it into the platform

Can we take that governance and work with the platform team to codify, to automate that which they were doing on a per application basis – that’s, quiet frankly slowing down the delivery of the software – can we take that governance and can we have them work with the platform team to codfiy, to actually automate on a per application basis, have them expose that as a service on the platform

Cornelia Davis on governance and cloud-native, “Who Does What? Mapping Cloud Foundry Activities and Entitlements to IT Roles,” August 2016

In other words: you should not only automate the audit three-ring binders of compliance, but enforce as much as possible in the platform.

The rest of the talk is good stuff on how think through re-arranging your organization to be all DevOps-y, with the help of Pivotal Cloud Platform to automate all the infrastructure and middleware stuff:

Cloud-Native Cookbook – beyond “survival is not mandatory”

I started a new booklet project, the Cloud Native Cookbook.

The premise is this:

The premise of this book is to collect specific, tactical advice transitioning to a cloud-native organization. The reader is someone who “gets it” when it comes to agile, DevOps, cloud native, and All the Great Things. Their struggle is actually putting it all in place. Any given organization has all of it’s own, unique advantages and disadvantages, so any “fix” will be situational, of course.

This cookbook draws from actual experiences of what worked and didn’t work to try to help organizations hack out a path to doing software better. While we’ll allow ourselves some “soft,” cultural things here and there, each of the “recipes” should be actionable, tangible items. At the very least, the rainbows and unicorns stuff should have concrete examples, e.g., how do you get people to actually pair program when they think it’s a threat to their self-worth?

As with my previous cloud-native booklet, I have this one open for comments as I’m working on it. It’d be great to get your input.

Here’s some slides I’ve been using around all this.

Pacing cloud-native transformation, and actually doing the work to increase productivity

I like to tell large organizations that compared to the break-neck pace of “the silicon valley mindset,” they can operate at a leisurely pace. That pace is usually fast for these enterprises, but their problem set and risk profile is a lot different than hats on cats. Abby has a nice, short write-up that hits on this topic among others:

By the end of his first year, Safford and his teams had built prototypes and market tests and finished 16 new software projects.

At Home Depot, they were at about 140 to 150 projects after a year or so. However, it’s common in the first year to do a lot of replatforming of “simple,” mostly cloud-native compatible apps in there. You can do these at a pretty fast clip, with the rule of thumb being 10 apps in 10 weeks. This is in addition to new applications, but explains high numbers like those at Home Depot. I suspect the Allstate numbers are mostly net-new apps, though.


Safford’s eventual goal is to shift Allstate software development to 70 percent extreme agile programming and 30 percent traditional scrum and waterfall. Where developers used to spend only 20 percent of their time coding software, today up to 90 percent of their days are spent programming. Each of his CompoZed development labs around the world has the same startup look and feel, including scooters parked in the hallways. This is not your grandfather’s insurance company anymore.

What you hear over and over again from organizations going cloud-native is that developers were spending lots of time in meetings, checking email, and otherwise not coding (and, yes, by “coding” I don’t mean just recklessly LOC‘ing it up without design, and all that). Management had to spend much effort to get them back to coding.

As I fecklessly tell my seven year old when he’s struggling with homework: the only way to finish this quickly is to actually do the work.

(Also: nice write-up from Abby!)

Source: Don’t Forget People and Process in Your Digital Transformation

The role of enterprise architects in cloud-native organizations

My colleague Richard has a nice post suggesting the new functions enterprise architects can play in a cloud-native organization. I like this one in particular, help make the change:

Champion new team organization patterns. As an architect, you can bring developers and operations teams together. Recognize that functional silos slow down delivery. A DevOps-type approach really works. Architects are perfectly positioned to pioneer new team structures that increase velocity and customer attentiveness.

It’s brief, but there’s plenty of other good chunks of advice in there.

Source: How to Remaster Enterprise Architecture for a Cloud-Native World

Making mainframe applications more agile, Gartner – Highlights

In a report giving advice to mainframe folks looking to be more Agile, Gartner’s Dale Vecchio and Bill Swanton give some pretty good advice for anyone looking to change how they do software.

Here’s some highlights from the report, entitled “Agile Development and Mainframe Legacy Systems – Something’s Got to Give”

Chunking up changes:

  1. Application changes must be smaller.
  2. Automation across the life cycle is critical to being successful.
  3. A regular and positive relationship must exist between the owner of the application and the developers of the changes.


This kind of effort may seem insurmountable for a large legacy portfolio. However, an organization doesn’t have to attack the entire portfolio. Determine where the primary value can be achieved and focus there. Which areas of the portfolio are most impacted by business requests? Target the areas with the most value.

An example of possible change:

About 10 years ago, a large European bank rebuilt its core banking system on the mainframe using COBOL. It now does agile development for both mainframe COBOL and “channel” Java layers of the system. The bank does not consider that it has achieved DevOps for the mainframe, as it is only able to maintain a cadence of monthly releases. Even that release rate required a signi cant investment in testing and other automation. Fortunately, most new work happens exclusively in the Java layers, without needing to make changes to the COBOL core system. Therefore, the bank maintains a faster cadence for most releases, and only major changes that require core updates need to fall in line with the slower monthly cadence for the mainframe. The key to making agile work for the mainframe at the bank is embracing the agile practices that have the greatest impact on effective delivery within the monthly cadence, including test-driven development and smaller modules with fewer dependencies.

It seems impossible, but you should try:

Improving the state of a decades-old system is often seen as a fool’s errand. It provides no real business value and introduces great risk. Many mainframe organizations Gartner speaks to are not comfortable doing this much invasive change and believing that it can ensure functional equivalence when complete! Restructuring the existing portfolio, eliminating dead code and consolidating redundant code are further incremental steps that can be done over time. Each application team needs to improve the portfolio that it is responsible for in order to ensure speed and success in the future. Moving to a services-based or API structure may also enable changes to be done effectively and quickly over time. Some level of investment to evolve the portfolio to a more streamlined structure will greatly increase the ability to make changes quickly and reliably. Trying to get faster with good quality on a monolithic hairball of an application is a recipe for failure. These changes can occur in an evolutionary way. This approach, referred to in the past as proactive maintenance, is a price that must be paid early to make life easier in the future.

You gotta have testing:

Test cases are necessary to support automation of this critical step. While the tooling is very different, and even the approaches may be unique to the mainframe architecture, they are an important component of speed and reliability. This can be a tremendous hurdle to overcome on the road to agile development on the mainframe. This level of commitment can become a real roadblock to success.

Another example of an organization gradually changing:

When a large European bank faced wholesale change mandated by loss of support for an old platform, it chose to rewrite its core system in mainframe COBOL (although today it would be more likely to acquire an off-the-shelf core banking system). The bank followed a component-based approach that helped position it for success with agile today by exposing its core capabilities as services via standard APIs. This architecture did not deliver the level of isolation the bank could achieve with microservices today, as it built the system with a shared DBMS back-end, as was common practice at the time. That coupling with the database and related data model dependencies is the main technical obstacle to moving to continuous delivery, although the IT operations group also presents cultural obstacles, as it is satis ed with the current model for managing change.

A reminder: all we want is a rapid feedback cycle:

The goal is to reduce the cycle time between an idea and usable software. In order to do so, the changes need to be smaller, the process needs to be automated, and the steps for deployment to production must be repeatable and reliable.

The ALM technology doesn’t support mainframes, and mainframe ALM stuff doesn’t support agile. A rare case where fixing the tech can likely fix the problem:

The dilemma mainframe organizations may face is that traditional mainframe application development life cycle tools were not designed for small, fast and automated deployment. Agile development tools that do support this approach aren’t designed to support the artifacts of mainframe applications. Modern tools for the building, deploying, testing and releasing of applications for the mainframe won’t often t. Existing mainframe software version control and conguration management tools for a new agile approach to development will take some effort — if they will work at all.

Use APIs to decouple the way, norms, and road-map of mainframes from the rest of your systems:

wrapping existing mainframe functions and exposing them as services does provide an intermediate step between agile on the mainframe and migration to environments where agile is more readily understood.

Contrary to what you might be thinking, the report doesn’t actually advocate moving off the mainframe willy-nilly. From my perspective, it’s just trying to suggest using better processes and, as needed, updating your ALM and release management tools.

Read the rest of the report over behind Gartner’s paywall.

How JPMC is making IT more innovative with PaaS, public and private

wocintech (microsoft) - 154

A good, pretty long overview of JPMorgan Chase’s plans for doing cloud with a PaaS focus. Some highlights.

More than just private-IaaS and DIY-platforms:

Like most large U.S. banks, JPMorgan Chase has had some version of a private cloud for years, with virtualized servers, storage and networks that can be shared in a flexible way throughout the organization.

The bank is upgrading its private cloud to “platform as a service” — in other words, the cloud service will manage the infrastructure (servers, storage, and networks), so that developers don’t have to worry about that stuff.

On the multi-/hybrid-cloud thing:

By the second half of 2017, the bank plans to run proprietary applications on the public cloud. At the same time, it’s building a new, modern internal cloud, code-named Gaia.

While “hybrid-cloud” has been tedious vendor-marketing-drivel over the past ten years, pretty much all of the large organizations I work with at Pivotal have exactly this approach. Public, private, whatever: we want to do it all.

Shifting their emphasis innovation:

“We aren’t looking to decrease the amount of money the firm is spending on technology. We’re looking to change the mix between run-the-bank costs versus innovation investment,” he said. “We’ve got to continue to be really aggressive in reducing the run-the bank costs and do it in a very thoughtful way to maintain the existing technology base in the most efficient way possible.” …Dollars saved by using lower-cost cloud infrastructure and platforms will be reinvested in technology, he said.

On appreciating the scale of “large organizations” that drive their very real challenges with adopting new ways of running IT:

The bank has 43,000 employees in IT; almost 19,000 are developers.

Good luck having the “we have no process by design” process with that setup.

On security, there’s a nice, almost syllogistic re-framing of “cloud security here”:

For years, banks have worried about using the public cloud out of security concerns and fears of what their regulators will say. Ever since the 2013 Target data breach, in which hackers stole card information from 40 million customers by breaking into the computers of an air conditioning company Target used, regulators have strongly urged banks to carefully vet and monitor all third parties, with a specific focus on security.

“We’re spending a significant amount of time to ensure that any applications we choose to run on a public cloud will have the same level of security and controls as those run internally,” Deasy said.

Most notable corporate security breeches over the year have involved on-premises IT (like the HVAC example above). The point is not to make sure that “cloud is as secure as [all that on-prem IT that’s been the source of most security problems in the past], but to make sure that all IT has a rigorous approach to security. “Cloud” isn’t the security problem, doing a shitty job at security is the security problem.

Source: Unexpected Champion of Public Clouds: JPMorgan CIO Dana Deasy, Penny Crosman, American Banker

Often, people just won’t change, or, there is no magic to thaw the frozen mind

Under the auspices of “how can we DevOps around here?” I’ve had numerous conversations with organizations who feel like they can’t get their people to change over to the practices and mind-set that leads to doing better software. The DevOps community has spent a lot of time trying to gently bring along and even brain-hack resistant people and there are numerous anecdotes and studies that doing things in DevOps/cloud native/agile/etc. way not only make businesses run more smoothy and profitably, but make the actual lives of staff better (interested in leaving work at 5pm and not working on weekends?)

Still, for whatever reasons, the idea of “the frozen-middle” is reoccurring. In government work, there’s often the frozen-ground” with staff throwing down their heels as well.

Most of the chatter on this topic is at the team level, and usually teams in already flexible, new organizations (like all those logos on scare-slides in presentations from people like Pivotal: UBER! AIRBNB! TESLA! SOFTWARE-IS-EATING-THE-WORLD, INC.!) If you’re already in a company that can change its culture every year, the pain of these “frozen” people is almost nil in comparison to being in government, a 50-100 years old company, or otherwise “in the real world.”

Right here, I’m supposed to establish empathy with these frozen people to get them to change. I should stop calling them “frozen people” and treating them like “resources,” and think of them as people. That is all true, but what do I do after I’ve cavorted with them in empathy splash-pads for months on end and we still can’t get them to configure the firewall so we can do the DevOps? What do you do with enterprise architects who have become specialists in telling you how impossibly fucked up everything is? How do you work with all those “frozen middle-managers” who are acting as a “shit umbrella” (or maybe in this case, a “rainbow umbrella”?) to all your fanciful DevOps notions?

There’s one last ditch tactic: money and firing. When it comes to changing the culture in a large orginizations, this is the big hammer that upper-level management has. That is, the middle-manager’s manger is the one who has to tools to fix the problem. By design, the surest, easiest, sanctioned way to change behavior is to change cash pay-outs and other compensation based on not only the performance of people (management and staff), but their willingness to adapt and change to new, better ways of operating. That is: you reward and punish people based on them doing what you told them to do, or not.

There’s all sorts of “little and cheap” things to do instead of bringing out the big hammer:

  • Giving out acrylic trophies, showering people in attention from high-level execs
  • Fame and glory internally and externally
  • T-shirts (no, really! They work amazingly well)
  • Sanctioning goofing off (“we may have a 15 day holiday policy, but, you know, you should leave work at 4 every day and don’t log vacation”)

Matt Curry covers most of these management-hacks in a great talk on changing large organizations. But let’s assume those aren’t working.

At this point, “leadership” (middle-management’s management) needs to reward and punish the unfreezing or freezing. If middle-management changes (“unfreezes”), they get paid more. If they don’t unfreeze at least their pay needs to freeze, and at “best” they need to get quarantined or fired (as a tone-note, back when I was young and the idea of getting “laid-off” or “RIFed” started being used I reprogrammed myself to only ever say “fired” in reference to loosing your job: those soft, BS words obscure the fact that you’ve been, well, fired – so, feel free to s/// whatever more HR-correct phrase you like into there).

If you look at the patterns of change in most companies going cloud native, they do exactly this. Organizations that successful change so that they can start doing software better set up separate “labs” where the “digital transformation” happens. These are often logically and physically separate: they often have new names and are often “off-site” or well hidden, in the skunk-works style. They do this because if they stay mentally and physically in the “old” system, they end up getting frozen simply due to proximity. The long-term, managerial strategy, here, is an application of the strangler pattern to the “legacy” organization: eventually, the new “labs” organization, through value to the business and sheer size, becomes the “normal” organization.

The staff process they use – almost as a side-effect! – is to only move over people who are willing to change. I’d argue that “skills” and aptitude have little to do with the selection of people: any person in IT can learn new ways of typing and right clicking if the company trains them and helps them. We’re all smart, here.

What happens is that you cull the frozen people, either leaving them behind in the old organization (where there’s likely plenty of work to be done keeping the legacy systems alive!), or just outright firing them. That’s all brutal and not in the rainbows and sandals spirit of DevOps, but it’s what I see happening out there to successfully change the fate of large organizations’ IT departments.

You don’t really hear about this in cheery keynotes at conferences, but at dark bars and (I shit you not) in quiet parking garages I’ve talked with several executives who paint out this unfreezing-by-attrition process as something that’s worked in their organization. One of them even said “we should have gotten rid of more people.”

This pattern gets more difficult in highly labor-regulated industries – like government work and government contractors – but setting up brand new organizations to shed the frozen old ways is at least something to try.

And, barring all else, as I like to say, if nothing is working, and people won’t actually change, Pivotal is always hiring.

If that’s too grim, don’t despair! My buddies have a much light-hearted, hopefully whack at the problem in a recent discussion of ours: