Anywhere there is lack of speed, there is massive business vulnerability:
Speed to deliver a product or service to customers.
Speed to perform maintenance on critical path equipment.
Speed to bring new products and services to market.
Speed to grow new businesses.
Speed to evaluate and incubate new ideas.
Speed to learn from failures.
Speed to identify and understand customers.
Speed to recognize and fix defects.
Speed to recognize and replace business models that are remnants of the past.
Speed to experiment and bring about new business models.
Speed to learn, experiment, and leverage new technologies.
Speed to solve customer problems and prevent reoccurrence.
Speed to communicate with customers and restore outages.
Speed of our website and mobile app.
Speed of our back-office systems.
Speed of answering a customer’s call.
Speed to engage and collaborate within and across teams.
Speed to effectively hire and onboard.
Speed to deal with human or system performance problems.
Speed to recognize and remove constructs from the past that are no longer effective.
Speed to know what to do.
Speed to get work done.
— John Mitchell, Duke Energy.
When enterprises need to change urgently, in most cases, The Problem is with the organization, the system in place. Individuals, like technology, are highly adaptable and can change. They’re both silly putty that wiggle into the cracks as needed. It’s the organization that’s obstinate and calcified.
How the organization works, it’s architecture, is the totally the responsibility of the leadership team. That ream owns it just like a product team owns their software. Leadership’s job is to make sure the organization is healthy, thriving, and capable.
DevOps’ great contribution to IT is treating culture as programmable. How your people work is as agile and programmable as the software. Executives, management, and enterprise architects — leadership — are product managers, programmers, and designers. The organization is their product. They pay attention to their customers — the product teams and the platform engineers — and do everything possible to get the best outcomes, to make the product, the organization, as productive and well designed as possible.
I’ve tried to collect together what’s worked for numerous organizations going through — again, even at the end, gird your brain-loins, and pardon me here — digital transformation. Of course, as in all of life, the generalized version of Orwell’s 6th rule applies: “break any of these rules rather than doing anything barbarous.
As you discover new, better ways of doing software I’d ask you to share those learnings a widely as possible, especially outside of your organization. There’s very little written on the topic of how regular, large organization managing the transformation to becoming software-driven enterprises.
Know that if your organization is dysfunctional, is always late and over budget, that it’s your fault. Your staff may be grumpy, may seem under-skilled, and your existing infrastructure and application may be pulling you down like a black-hole. All of that is your product: you own it.
As I recall, a conclusion is supposed to be inspirational instead of a downer. So, here you go. You have the power to fix it. Hurry up and get to work.
We had assumed that alignment would occur naturally because teams would view things from an enterprise-wide perspective rather than solely through the lens of their own team. But we’ve learned that this only happens in a mature organization, which we’re still in the process of becoming. — Ron van Kemenade, ING.
The enterprise architect’s role in all of this deserves some special attention. Traditionally, in most large organizations, enterprise architects define the governance and shared technologies. They also enforce these practices, often through approval processes and review boards. An enterprise architect (EA) is seldom held in high regard by developers in traditional organizations. Teams (too) often see EAs as “enterprise astronauts,” behind on current technology and methodology, meddling too much in day-to-day decisions, sucking up time with change-advisory boards (CABs), and forever working on work that’s irrelevant to “the real work” done in product teams.
If cruel, this sentiment often has truth to it. “If I’m doing 8 or 15 releases a week,” HCSC’s Mark Ardito says, “how am I going to get through all those CABs?” While traditional EAs may do “almost nothing” of value for high performing organizations, the role does play a significant part in cloud native leadership.
First, and foremost, EAs are part of leadership, acting something like the engineer to the product manager on the leadership team. An EA should intimately know the current and historic state of the IT department, and also should have a firm grasp on the actual business IT supports.
While EAs are made fun of for ever defining their enterprise architecture diagrams, that work is a side-effect of meticulously keeping up with the various applications, services, systems and dependencies in the organization. Keeping those diagrams up-to-date is a hopeless task, but the EAs who make them at least have some knowledge of your existing spaghetti of interdependent systems. As you clean-up this bowl of noodles, EAs will have more insights into the overall system. Indeed, tidying up that wreckage is an under appreciate task.
The EA’s dirty hands
I like to think of the work EAs do as gardening the overall organization. This contrasts with the more tops-down idea of defining and governing the organization, down to technologies and frameworks used by each team. Let’s look at some an EAs gardening tasks.
Setting technology & methodology defaults
Even if you take an extreme, developer friendly position, saying that you’re not going to govern what’s inside each application, there are still numerous points of governance about how the application is packaged, deployed, how it interfaces and integrates with other applications and services, how it should be instrumented to be managed, and so on. In large organizations, EAs should play a large role in setting these “defaults.” There may be reasons to deviate, but they’re the prescribed starting points.
I think that it’s important that as you’re doing this you do have to have some standards about providing a tap, or an interface, or something to be able to hook anything you’re building into a broader analytics ecosystem called a data-lake — or whatever you want to call it — that at least allows me to get at your data. It’s not you know, like “hey I wrote this thing using a gRPC and golang and you can’t get at my data!” No you got to have something where people can get at it, at the very least.
Beyond software, EAs can also set the defaults for the organization’s meatware, all the process, methodology, and other “code” that actual people execute. Before Home Depot started standardizing their process, Tony McCully says, “everyone was trying to be agile and there was this very disjointed fragmented sort of approach to it… You know I joke that we know we had 40 scrum teams and we were doing it 25 different ways.” Clearly, this is not ideal, and standardizing how your product teams operate is better.
First, someone has to define all the applications and services that all those product teams form around. At a small scale, the teams themselves can do this, but as you scale up to 1,000’s of people and 100’s of teams, gathering together aStar Wars scale Galactic Senate is folly. EAs are well suited to define the teams, often using domain-driven design (DDD) to first find and then form the “domains” that define each team. A DDD analysis can turn quickly into its own crazy wall of boxes and arrows, of course. Hopefully, EAs can keep the lines as helpfully straight as possible.
Rather than checking in on how each team is operating, EAs should generally focus on the outcomes these teams have. Following the rule of team autonomy (described elsewhere in this booklet), EAs should regularly check on each team’s outcomes to determine any modifications needed to the team structures. If things are going well, whatever’s going on inside that black box must be working. Otherwise, the team might need help, or you might need to create new teams to keep the focus small enough to be effective.
Most cloud native architectures use microservices, hopefully, to safely remove dependencies that can deadlock each team’s progress as they wait for a service to update. At scale, it’s worth defining how microservices work as well, for example: are they event based, how is data passed between different services, how should service failure be handled, and how are services versioned?
Again, a senate of product teams can work at a small scale, but not on the galactic scale. EAs clearly have a role in establishing the guidance for how microservices are done and what type of policy is followed. As ever, this policy shouldn’t be a straight-jacket. The era of SOA and ESBs has left the industry suspicious of EAs defining services. Those systems became cumbersome and slow moving, not to mention expensive in both time and software licensing. We’ll see if microservices avoid that fate, but keeping the overall system light-weight and nimble is clearly a gardening that EAs are well suited for.
As we’ll discuss later, at the center of every cloud native organization is a platform. This platform standardizes and centralizes the runtime environment, how software is packaged and deployed, how it’s managed in production, and otherwise removes all the toil and sloppiness from traditional, bespoke enterprise application stacks. Most of the platform cases studies I’ve been using, for example, are from organizations using Pivotal Cloud Foundry.
Occasionally, EAs become the product managers for these platforms. The platform embodies the organization’s actual enterprise architecture and evolving the platform, thus, evolves the architecture. Just as each product team orients their weekly software releases around helping their customers and users, the platform operations team runs the platform as a product.
EAs might also get involved with the tools groups that provide the build pipeline and other shared services and tools. Again, these tools embody part of the overall enterprise architecture, more of the running cogs behind all those boxes and arrows.
As a side-effect of product managing the platform and tools, EAs can establish and enforce governance. The packaging, integration, runtime, and other “opinions” expressed in the platform can be crafted to force policy compliance. That’s a command-and-control way of putting it, and you certainly don’t want your platform to be restrictive. Instead, by implementing the best possible service or tool, you’re getting product teams to follow policy and best practices by bribing them with easy of use and toil-reduction.
It’s the same as always
I’ve highlighted just three areas EA contribute to in a cloud native organization. There are more, many of which will depend on the peccadilloes of your organization, for example:
Identifying and solving sticky cultural change issues is one such, situational topic. EAs will often know individual’s histories and motivations, giving them insights into how to deal with grumps that want to stall change.
EA groups are well positioned to track, test, and recommend new technologies and methodologies. This can become an “enterprise astronaut” task of being too far afield of actual needs and not understanding what teams need day-to-day, of course. But, coupled with being a product manager for the organizations’ platform, scouting out new technologies can be grounded in reality.
EAs are well positioned to negotiate with external stakeholders and blockers. For example, as covered later, auditors often end-up liking the new, small batch and platform-driven approach to software because it affords more control and consistency. Someone has to work with the auditors to demonstrate this and be prepared to attend endless meetings that product team members are ill-suited and ill-tempered for.
What I’ve found is that EAs do what they’ve always done. But, as with other roles, EAs are now equipped with better process and technology to do their jobs. They don’t have to be forever struggling eyes in the sky and can actually get to the job of architecting, refactoring, and programming the enterprise architecture. Done well, this architecture becomes a key asset for the organization, often the key asset of IT.
The CIO is the enterprise architect and arbitrates the quality of the IT systems in the sense that they promote agility in the future. The systems could be filled with technical debt but, at any given moment, the sum of all the IT systems is an asset and has value in what it enables the company to do in the future. The value is not just in the architecture but also in the people and the processes. It’s an intangible asset that determines the company’s future revenues and costs and the CIO is responsible for ensuring the performance of that asset in the future.
Hopefully the idea of architecting and then actually creating and gardening that enterprise asset is attractive to EAs. In most cases, it is. Like all technical people, they pine for the days when they actually wrote software. Now’s their chance to get back to it.
In banking, you don’t often get a clean slate like you would at some of the new tech companies. To transform banking, you not only need to be equipped with the latest technology skills, you also need to transform the culture and skill sets of existing teams, and deal with legacy infrastructure. — Siew Choo Soh, DBS Bank
Most organizations have a damaging mismatch between the culture of service management and the strategic need to become a product organization. In a product culture, you need the team to take on more responsibility, essentially all of the responsibility, for the full life cycle of the product. Week-to-week they need to experiment with new features and interpret feedback from users. In short, they need to become innovators.
Service delivery cultures, in contrast, tend more towards a culture of following up-front specification, process, and verification. Too often when put into practice, IT Service Management (ITSM) becomes a governance bureaucracy that drives project decision. This governance-driven culture tends to be much slower at releasing software than a product culture.
[A] key goal for DevOps teams is the establishment of a high cadence of trusted, incremental production releases. The CAB meeting is often seen as the antithesis of this: a cumbersome and infrequent process, sucking a large number of people into a room to discuss whether a change is allowed to go ahead in a week or two, without in reality doing much to ensure the safe implementation of that change.
Mainstream organizational management work has helpful definitions of culture: “Culture can be seen in the norms and values that characterize a group or organization,” O’Reilly and Tushman write, “that is, organizational culture is a system of shared values and norms that define appropriate attitudes and behaviors for its members.”
Jez Humble points out another definition, from Edgar Schein:
[Culture is] a pattern of shared tacit assumptions that was learned by a group as it solved its problems of external adaptation and internal integration, that has worked well enough to be considered valid and, therefore, to be taught to new members as the correct way to perceive, think, and feel in relation to those problems.
We should take “culture,” then, to mean the mindset used by people in the organization to make day-to-day decisions, policy, and best practices. I’m as guilty as anyone else for dismissing “culture” as simple, hollow acts like allowing dogs under desks and ensuring that there’s six different ways to make coffee in the office. Beyond trivial pot-shots, paying attention to culture is important because it drives of how people work and, therefore, the business outcomes they achieve.
Year after year, the DevOps reports show that “high performing” organizations are much more generative than pathologically…as you would suspect from the less than rosy words chosen to describe “power-oriented” cultures. It’s easy to identify your organization as pathological and equally easy to realize that’s unhelpful. Moving from the bureaucratic column to the generative column, however, is where most IT organizations struggle.
Core values of product culture
There are two layers of product culture, at least that I’ve seen over the years and boiled down. The first layer describes the attitudes of product people, the second the management tactics you put in place to get them to thrive.
Risk takers — I don’t like this term much, but it means something very helpful and precise in the corporate world, namely, that people are willing to do something that has a high chance of failing. The side that isn’t covered enough is that they’re also focused on safety. “Don’t surf if you can’t swim,” as Andrew Clay Shafer summed it up. Risk takers ensure they know how to “swim” and they build safety nets into their process. They follow a disciplined approach that minimizes the negative consequences of failure. The small batch process, for example, with its focus on a small unit of work (a minimal amount of damage if things go wrong and an easier time diagnosing what caused the error) and studying the results, good and bad, creates a safe, disciplined method for taking risks.
People focused — products are meant to be used by people, whether as “customers” or “employees.” The point of everything I’m discussing here is to make software that better helps people, be that delivering a product the like using or one that allows them to be productive, getting banking done as quickly as possible so they can get back to living their life, to lengthen DBS Bank’s vision. Focusing on people, then, is what’s needed. Too often, some people are focused on process and original thinking, sticking to those precepts even if they prove to be ineffective. People-focused staff will instead be pragmatic, looking to observe how their software is helping or hindering the people we call “users.” They’ll focus on making people’s lives better, not achieving process excellence, making schedules and dates, or filling out request tickets correctly.
Finding people like this can seem like winning the lottery. Product-focused people certainly are hard to find and valuable, but they’re a lot less rare than you’d think. More importantly, you can create them by putting the right kind of management policy and nudges in place. A famous quip by Adrian Cockcroft then at Netflix, now at Amazon) illustrates this. As he recounts:
[A]t a CIO summit I got the comment “we don’t have these Netflix superstar engineers to do the things you’re talking about”, and when I looked around the room at the company names my response was “we hired them from you and got out of their way.”
There is no talent shortage, just shortage of management imagination and gumption. As most recently described in the 2018 DORA DevOps report, over and over again, research finds that the following gumptions give you the best shot at creating a thriving, product-centric culture: autonomy, trust, and voice. Each of these three support and feed into each other as we’ll see.
People who’re told exactly what to do tend not to innovate. Their job is not to think of new ways to solve problems more efficiently and quickly, or solve them at all. Instead, their job is to follow the instructions. This works extremely well when you’re building IKEA furniture, but following instructions is a port fit when the problem set is unknown, when you don’t even know if you know that you don’t know.
Your people and the product teams need to autonomy to study their users, theorize how to solve their problems, and fail their way to success. Pour on too much command-and-control, and they’ll do exactly what you don’t want: they’ll follow your orders perfectly. A large part of a product-centric organization’s ability to innovate is admitting that people closest to the users — the product team — are the most informed about what features to put into the software and even what the user’s problems are. You, the manager, should be overseeing multiple teams and supporting them by working with the rest of the organization. You’ll lack the intimate, day-to-day knowledge of the users and their problems. Just as a the business analysts and architects in a waterfall process are too distant from the actual work, you will be too and will make the same errors.
Establishing and communicating goals, but letting the team decide how the work will be done.
Removing roadblocks by keeping rules simple.
Allowing the team to change rules if the rules are obstacles to achieving the goals.
Letting the team prioritize good outcomes for customers, even if it means bending the rules.
This list is a good start. As ever, apply a small batch mentality to how you’re managing this change and adapt according to your findings.
There are some direct governance and technology changes needed to give teams this autonomy. The product teams need a platform and production tools that allow them to actually manage the full-life cycle of their product. “[I]f you say to your team that ‘when you build it you also run it,’” Rabobanks’ Vincent Oostindië says, “you cannot do that with a consolidated environment. You cannot say to a team ‘you own that stuff, and by the way somebody else can also break it.’”
Taking risks, suggesting new features, resolving problems in production, and otherwise innovating in software requires a great deal of trust, both from management and of management. The DORA report defines trust, in this context as “how much a person believes their leader or manager is honest, has good motives and intentions, and treats them fairly.”
To succeed at digital transformation, the people in the product teams must trust management. Changing from a services-driven organization of a product organization requires a great deal of upheaval and discomfort. Staff are being asked to behave much differently than they’ve been told to in the past. The new organization can seem threatening to careers. People will gripe and complain, casting doubt on success. Management needs to first demonstrate that their desire to change can be trusted. Doing things like celebrating failures, rewarding people for using the new methods, and spending money on the trappings of the new organization (like free breakfast or training) will demonstrate management commitments.
Just as staff must trust management, managers must trust the product teams to be responsible and independent. This means managers can’t constantly check in on and meddle in the day-to-day affairs of product teams. Successful managers will find it all too tempting to get their hands dirty and volunteer to help out with problems. Getting too involved on a day-to-day basis is likely to hurt more than help, however.
Felten Buma uses Finding Nemo as a metaphor for the trust managers must have in their product teams…if you’ll pardon a cartoon reference in this book. Nemo’s father, Marlin, is constantly worried about and micromanaging his son, having been shocked by the death of his wife, Nemo’s mother. They’re fish as you might recall, so his mother was eaten one day. Not only that, but Nemo has a weak flipped on one side. Overall, this means Nemo’s father is a helicopter parent, but is also forever telling Nemo that he’s not skilled enough can’t do risky things, like swimming beyond the reef. While most leaders haven’t experienced the loss of one of their parents from fish’s meal-making, they’ve likely experienced some disasters in the past that could make them helicopter managers, always looking to “help” staff with advice about what works and doesn’t work. As in the movie, until that manager actually trusts the product team and demonstrates that trust by backing off, the product teams will lack the full moral and self-trust needed to perform well.
Buma suggests an exercise to help transform helicopter managers. In a closed meeting of managers, ask them to each share one of their recent corporate failures. Whether or not you discuss how it was fixed is immaterial to the exercise, the point is to have the managers practice being vulnerable and then show them that their career doesn’t end. Then, to practice giving up control, ask them to deligrate an important task of theirs to someone else. Buma says that surprisingly, most managers find these two tasks very hard and some outright reject it. Those managers who can go through these two exercises are likely mentally prepared to be good, transformational leaders.
The third leg of transformative leadership is giving product teams voice. Once teams trust management and start acting more autonomously, they’ll need to have the freedom to speak up and suggest ways to improve not only the product, but the way they work. A muzzled product team is much less valuable than one that can speak freely. As the DORA report defines it:
Voice is how strongly someone feels about their ability and their team’s ability to speak up, especially during conflict — for example, when team members disagree, when there are system failures or risks, and when suggesting ideas to improve their work.
Put another way, you don’t want people to be “courageous.” Instead, you want open discussions of failure and how to improve to be common and ordinary, “boring,” not “brave.” The opposite of giving your team’s voice is suppressing their suggestions, dismissing them, and explaining why such thinking is dangerous or “won’t work here.” Traditional managers tend to be deeply offended when “their” staff speaks to the rest of the organization independently, when they “go around” their direct line managers. This kind of thinking is a good indication that the team lacks true voice. While it’s certain more courteous to involve your manager in such discussions, management should trust teams to be autonomous enough to do the right thing.
In an organization like the US Air Force, where you literally have to ask permission to “speak freely,” giving product teams voice can seem impossible. To solve this problem, the Kessel Run team devised a relatively simple fix: they asked the airmen and women to wear civilian clothes when they were working on their products. Without the explicit reminder of rank that a uniform and insignia enforces, team members found it easier to talk freely with each other, regardless of rank. Of course, managers also explicitly told and encouraged this behavior. Other organizations like Allstate have used this same sartorial trick, encouraging managers to change from button-up shirts and suits to t-shirts and hoodies instead. Dress can be surprisingly key for changing culture. As a Nissan factory manager put it, “[i]f I go out to the plant in a $400 suit and tie, people don’t talk to me so freely.”
Managing ongoing culture change
Improving culture is a never ending process. Pivotal, for example has created an excellent, beloved culture over the past 25 years but is still constantly monitoring and improving it. And while I might sigh at yet another employee survey to fill out, the company has demonstrated that it actually listens and changes. This is very rare for any company and it shows how much work is needed to maintain a good culture.
Employee surveys are a good way to monitor progress. You should experiment with what to put in these surveys, and even other means of getting feedback on your organization’s culture. Dick’s Sporting Goods narrowed down to ENPS as small and efficient metric. Longer term, Dick’s Jason Williams says that they’ve seen some former employees come back to their team, another good piece of feedback for how well you’re managing your organization’s cultural change.
How you react to these surveys and feedback is even more important than gathering the feedback. Just as you expect your product teams to go through a small batch process, reacting to feedback from users, you should cycle through organizational improvement theories, paying close attention to the feedback you get from surveys and other means.
The ultimate feedback, of course, will be if you achieve the business goals derived from your strategy. But, you need to make sure that success isn’t at the cost of incurring cultural debt that will come due in the future. This debt often comes due in the form of stressed out staff leaving or, worse, going silent and no longer telling you about what they’re learning from failures. Then you’re back in the same situation you were trying to escape from all this digital transformation, an organization that’s scared and static, rather than savvy and successful.
When you’re changing, you need to know what you’re changing to. It’s also handy to know how you’re going to change, and, equally, how you’re not going to change. In organizations, vision and strategy are the tools management uses to define why and how change happens.
Use vision to set your goals and inspiration
“Vision” can be a bit slippery. Often it means a concise phrase of hope that can actually happen, if only after a lot of work. Andy Zitney’s vision of starting on Monday and shipping on Friday is a classic example of vision. Vision statements are often more than a sentence, but they give the organization a goal and the inspiration needed to get there. Everyone wants to know “why I’m here,” which the vision should do, helping stave off any corporate malaise and complacency.
Vision refers to a picture of the future with some implicit or explicit commentary on why people should strive to create that future. In a change process, a good vision serves three important purposes. First, by clarifying the general direction for change, by saying the corporate equivalent of “we need to be south of here in a few years instead of where we are today,” it simplifies hundreds or thousands of more detailed decisions. Second, it motivates people to take action in the right direction, even if the initial steps are personally painful. Third, it helps coordinate the actions of different people, even thousands and thousands of individuals, in a remarkably fast and efficient way.
Creating and describing this vision is one of the first tasks a leader, and then their team, needs to do. Otherwise, your staff will just keep muddling through yesterday’s success, unsure of what to change, let alone, why to change. In IT, a snappy vision also keeps people focused on the right things instead of focusing on IT for IT’s sake. “Our core competency is ‘fly, fight, win’ in air and space,” says the US Air Force’s Bill Marion, for example, “It is not to run email servers or configure desktop devices.”
The best visions are simple, even quippy sentences. “Live more, bank less” is a great example from DBS Bank. “[W]e believe that our biggest competitors are not the other banks,” DBS’s Siew Choo Soh says. Instead, she continues, competitive threats are coming new financial tech companies “who are increasingly coming into the payment space, as well as the loan space.”
DBS Bank’s leadership believes that focusing on the best customer experience in banking will fend off these competitors and, better, help DBS become one of the leading banks in the world. This isn’t just based on rainbow whimsey, but strategic data: in 2017, 63% of total income and a 72% of profits came from digital customers. Focusing on that customer set and spreading whatever magic brought in that much profit to the “analog customers” is clearly a profitable course of action.
“We believe that we need to reimagine banking to make banking simple, seamless, as well as invisible to allow our customers to live more bank less,” Soh says. A simple vision like that is just the tip the of the iceberg but it can easily be expanded into strategy and specific, detailed actions that will benefit DBS Bank for years to come. Indeedm DBS has already won several awards, including Global Finance Magazine’s best bank in the world for 2018.
Creating an actionable strategy
“Strategy” has many, adorably nuanced and debated definitions. Like enterprise architecture, it’s a term that at first seems easily knowable, but becomes more obtuse as you stare into the abyss. A corporate strategy defines how a company will create, maintain, and grow business value. At the highest level, the strategy is usually increasing investor returns, usually through increasing the company’s stock price (via revenue, profits, or investor’s hopes and dreams thereof), paying out dividends, or engineering the acquisition of the company at a premium. In not-for-profit organizations, “value” often means how effective and efficiently the organization can execute its mission, be that providing clean water, collecting taxes, or defending a country. The pragmatic part of strategy is cataloging the tools the organization has at its disposal to achieve, maintain, and grow that value. More than specifying which tools to use, strategy also says what the company will not do.
People often fail at writing down useful strategy and vision. They want to serve their customers, be the best in their industry, and other such thin bluster. I like to use the check cashing test to start defining an organization’s strategy. Your organization always want to make more money with good profits. Well, check cashing is a profit rich, easy business. You just need a pile of cash and good insurance for when you get robbed. Do you want to cash checks? No? OK, then we know at least one thing you don’t want to do…
How broad or narrow is your product or service offering?
Why should customers prefer your product or service to a competitor’s?
What are the competencies you possess that others can’t easily imitate?
How do you make money in these segments?
Strategy should explain how to deliver on the vision with your organization’s capabilities, new capabilities enabled by technologies, customers needs and jobs to be done, your market, and your competitors. “This is where strategy plays an important role,” Kotter says, “Strategy provides both a logic and a first level of detail to show how a vision can be accomplished.”
A strategy for the next 10 years of growth at Dick’s Sporting Goods
Dick’s Sporting Goods, the largest sporting good retailer in the US, provides a recent example of translating higher level vision and strategy. As described by Jason Williams, over the past 10 years Dick’s rapidly built out its e-commerce and omni-channel capabilities, an enviable feat for any retailer. As always, success created a new set of problems, esp. for IT. It’s worth reading William’s detailed explanation of these challenges:
With this rapid technological growth, we’ve created disconnects in our overall enterprise view. There were a significant number of store technologies that we’ve optimized or added on to support our e-commerce initiatives. We’ve created an overly complex technology landscape with pockets of technical debt, we’ve invested heavily in on premise hardware — in the case of e-commerce you have to plan for double peak, that’s a lot of hardware just for one or two days of peak volume. Naturally, this resulted in a number of redundant services and applications, specifically we have six address verification services that do the same thing. And not just technical issues, we often had individuals and groups that have driven for performance, but it doesn’t align to our corporate strategy. So why did we start this journey? Because of our disconnect in enterprise view, we lack that intense product orientation that a lot of our competitors already had.
These types of “disconnects” and “pockets of technical debt” are universal problems in enterprises. Just as with Dick’s, these problems are usually not the result of negligence and misfeasance, but of the actions needed to achieve and maintain rapid growth.
To clear the way for the next 10 years of success, Dick’s put a new IT strategy in place, represented by 4 pillars:
Product architecture — creating an enterprise architecture based around the business, for example, pricing, catalog, inventory, and other business functions. This focus helps shift from a function and service centric mindset to product-centric mindset.
Modern software development practices — using practices like test-driven development, pairing, CI/CD, lean design, and all the proven, agile best practices.
Software architecture — using a microservices architecture, open source, following 12 factor principles to build cloud native applications on-top of Pivotal Cloud Foundry. This defines how software will be created, reducing the team’s toil so that they can focus on product design and development.
Balanced teams — finally, as Williams describes it, having a unified, product-centric team is the “the most critical part” of Dick’s strategy. The preceding three provider the architectural and infrastructural girding to shift IT from service delivery over to product delivery.
Focusing on these four areas gives staff very clear goals which translate easily into next steps and day-to-day work. Nine months into executing this strategy, Dick’s has achieved tangible success: they’ve created 31 product teams, increased developer productivity by 25%, ramped their testing up to 70% coverage, and improved the customer experience by increasing page load time and delivering more features, more frequently.
Keep your strategy agile
Finally, keep your strategy agile. While your vision is likely to remain more stable year to year, how you implement it might need to change. External forces will put pressure on a perfectly sound strategy: new government regulations or laws could change your organization’s needs, Amazon might finally decide to bottom out your market. Figure out a strategy review cycle to check your assumptions and course correct your strategy as needed. That is, apply a small batch approach to strategy.
Organizations usually review and change strategy on an annual basis as part of corporate planning, which is usually little more than a well orchestrated fight between business units for budget. While this is an opportunity to review and adjust strategy, it’s at the whim of finance’s schedule and the mercurial tactics of other business units.
Annual planning is also an unhelpfully waterfall-centric process, as pointed out by Mark Schwartz in The Art of Business Value. “The investment decision is fixed,” he writes, but “the product owner or other decision-maker then works with that investment and takes advantage of learnings to make the best use possible of the investment within the scope of the program. We learn on the scale of single requirements, but make investment decisions on the scale of programs or investment themes — thus the impedance mismatch.”
A product approach doesn’t thrive in that annual, fixed mindset. Do at least an additional strategy review each year, and many more in the first few years as you’re learning about your customers and product with each release. Don’t let your strategy get hobbled by the fetters of the annual planning and budget cycle.
By now, the reasons to improve how your organization does software are painfully obvious. Countless executives feel this urgency in their bones, and have been saying so for years:
“There’s going to be more change in the next five to ten years than there’s been in the last 50” — Mary Barra, CEO, GM
Intuitively, we know that business cycles are now incredibly fast: old companies die out, or are forced to dramatically change, and new companies rise to the top…soon to be knocked down by the new crop of sharp-toothed ankle biters.
Innosight’s third study of companies’ ability to maintain leadership positions estimates that by 2018, 50% of the companies on the S&P 500 will drop off, replaced by competitors and new market entrants. Staying at the top of your market-heap is getting harder and harder.
Profesor Rita McGrath has dubbed this the age of “transient advantage,” which is an apt way of describing how long — not very! — a company can rely on yesterday’s innovations. A traditional approach to corporate strategy is too slow moving, as she says: “[t]he fundamental problem is that deeply ingrained structures and systems designed to extract maximum value from a competitive advantage become a liability when the environment requires instead the capacity to surf through waves of short-lived opportunities.” Instead, organizations must be more agile: “to win in volatile and uncertain environments, executives need to learn how to exploit short-lived opportunities with speed and decisiveness.”
Software defined businesses
“We’re in the technology business. Our product happens to be banking, but largely that’s delivered through technology.” — Brian Porter, CEO, Scotiabank
We’re now solidly in an innovation phase of the business cycle. Organizations must become faster and more agile in strategy formulation, execution, and adaptation to changing markets. Again and again, IT is at the center of how startups enter new markets (often, disruptively) and how existing enterprises retain and grow market-share.
Organizations are seeking to become software defined businesses. In this mode of thinking, custom written software isn’t just a way of “digitizing” analog process (like making still lengthy mortgage applications or insurance claims processes “paperless”), but the mission critical tool for executing and evolving business models.
While software might have played merely a supporting role in the business for so long, successful organizations are casting software as the star. “It’s no longer a business product conversation, it’s a software product that drives the business and drives the market,” McKesson’s Andy Zitney says, later adding, “[i]t’s about the business, but business equals software now.”
Retail is the most obvious example. There’s an anecdote that Home Depot realized how important innovation was to them when they found out that Amazon sold more hammers than Homer. While other retailers languish, Home Depot grew revenue 7.5% year-over-year in Q4 2017. This isn’t solely due to software, but controlling its own software destiny has played a large part. As CIO Matt Carey says of competition from Amazon, “I don’t run their roadmap; I run my roadmap.”
These cases can seem pedestrian compared to self-driving cars and AIs that will (supposedly) create cyber-doctors. However, unlike these gee-whiz technologies, these small changes work incredibly fast and have large impacts.
Organizations often focus on the process, not the software
Most large organizations have massive IT departments, and equally large pools of developers working on software. However, many of these organizations haven’t updated their software practices and technologies for a decade or more. The results are predictable as three years of a Cutter Consortium survey shows. The study found that just 30% of respondents felt that IT helped their business innovate. As the chart below shows, this has fallen from about 50 percent in 2013:
This usefulness gap continues because IT departments are using an old approach to software. IT departments still rely on three-tier architectures, process hardened, dedicated infrastructure “service management” processes, and use functional organizations and long release cycles to (they believe) reliably produce software. I have to assume that this “waterfall” method was highly innovative and better than alternatives at the time…years and years ago.
In trying to be reliable and cost effective, IT departments have become excellent at process, even projects. In the 1990s, IT was in chaos with a shift from mainframes to Unix, then to Linux and Windows Server. On the desktop, the Windows GUI took over, but then the web browser hit mid-decade and added a whole new set of transitions and worries. Oh, and then there was the Internet, and the tail-end of massive ERP stand-ups that were changing core business processes. With all this chaos, IT often failed even on the simplest task like changing a password. Addressing this, the IT community created a school of thought called IT Service Management (ITSM) that sought to understand and codify each thing IT did for the business, conceptualizing those things as “services”: email, supply chain management, CRM, and, yes, changing passwords. Ticket desks were used to manage each process, and project management practices erected to lovingly cradle requests to create and change each IT service.
The result was certainly better than nothing, and much better than chaos. However, the ITSM age too often resulted in calcified IT departments that focused more on doing process perfectly than delivering useful services, that is, “business value.” The paladins of ITSM are quick to say this was never the intention, of course. It’s hard to know who’s the blame, or if we just need Jeffersonian table-flipping every ten years to hard reboot IT. Regardless, the traditional way of running IT is now a problem.
Most militaries, for example, can take anywhere between five to 12 years to roll out a new application. In this time, the nature of warfare can change many times over, a generations of soldiers can churn through the ranks, and the original requirements can change. Release cycles of even a year often result in the paradox of requirements perfection. In the best case scenario, the software you specified a year ago is delivered completely to spec, well tested, and fully function. But now, a year later, new competitor and customer demands nullifies the requirements from 12 months ago: that software is no longer needed.
Stretch this out to ten years, and you can see why the likes of US Air Force are treating transforming their software capabilities as a top priority. As General James “Mike” Holmes, Commander, Air Combat Command put it, “[y]ears of institutional risk aversion have led to the strategic dilemma plaguing us today: replacing our 30- year old fleet on a 30-year timeline.”
It’s easy to dismiss this as government work at its worst, clearly nothing like private industry. I’d challenge you, though, to find a large, multinational enterprise that doesn’t suffer from a similar software malaise. This misalignment is clearly unacceptable. IT needs to drastically change or it risks slowing down their organization’s innovation.
Small Batch Thinking
“If you aren’t embarrassed by the first version of your product, you shipped too late.” — Reid Hoffman, LinkedIn co-founder and former PayPal COO
How is software done right, then? Over the past 20 years, I’ve seen successful organizations use the same, general process: continuously executing small batches of software, over short iterations that put a rapid feedback loop in place. IT organizations that follow this process are delivering a different type of outcome than a set of requirements. They’re giving their organization the ability to adapt and change monthly, weekly, even daily.
By “small batches,” I mean identifying the problem to solve, formulating a theory of how to solve the problem, creating a hypothesis that can prove or disprove the theory, doing the smallest amount of application development and deployment needed to test your hypothesis, deploying the new code to production, observing how users interact with your software, and then using those observations to improve your software. The cycle, of course, repeats itself.
This whole process should take at most a week — hopefully just a day. All of these small batches, of course, add up over time to large pieces of software, but in contrast to a “large batch” approach, each small batch of code that survives the loop has been rigorously validated with actual users. Schools of thought such asLean Startup reduce this practice to helpfully simple sayings like “think, make, check.” Meanwhile, the Observe, Orient, Decide, Act (OODA) loop breaks the cycle down into even more precision. However you label and chart the small batch cycle, make sure you’re following a hypothesis driven cycle instead of assuming up-front that you know what how your software should be implemented.
As Liberty Mutual’s’ Chris Bartlow says, “document this hypothesis right because if you are disciplined in doing that you actually can have a more measurable outcome at the end where you can determine was my experiment successful or not.” This discipline gives you a tremendous amount of insight into decisions about the your software — features to add, remove, or modify. A small batch process gives you a much richer, fact-based ability to drive decisions.
“When you get to the stoplight on the circle [the end of a small batch loop] and you’re ready to make a decision on whether or not you want to continue, or whether or not you want to abandon the project, or experiment [more], or whether you want to pivot, I think [being hypothesis driven] gives you something to look back on and say, ‘okay, did my hypothesis come true at all,” Bartlow says, “is it right on or is it just not true at all?”
Long-term, you more easily avoid getting stuck in the “that’s the way we’ve always done it” lazy river current. The record of your experiments will also serve as an excellent report of your progress, even something auditors will cherish once you explain that log to them. These well-documented and tracked records are also your ongoing design history that you rely on to improve your software. The log helps makes even your failures valuable because you’ve proven something that does not work and, thus, should be avoided in the future. You avoid the cost and risk of repeating bad decisions.
This is the realm of multi-year projects that either underwhelm or are consistently late. As one manager at a large organization put it, “[w]e did an analysis of hundreds of projects over a multi-year period. The ones that delivered in less than a quarter succeeded about 80 percent of the time while the ones that lasted more than a year failed at about the same rate.”
No stranger to lengthy projects with, big, up-front analysis, the US Air Force is starting to think in terms of small batches for its software as well. “A [waterfall] mistake could cost $100 million, likely ending the career of anyone associated with that decision. A smaller mistake is less often a career-ender and thus encourages smart and informed risk-taking,” said M. Wes Haga.
Shift to user-centric design
If a small batch approach is the tool your organization now wields, a user-centric approach to software design is the ongoing activity you enable. There’s little new about taking a user-centric approach to software. What’s different is how much more efficient and fast creating good user experience and design is done thanks to highly networked applications and cloud-automated platforms.
When software was used exclusively behind the firewall and off networks as desktop applications, software creators had no idea how their software was being used. Well, they knew when there were errors because users fumed about bugs. Users never reported how well things were going when everything was working as planned. Worse, users didn’t report when things were just barely good enough and could be improved. This meant that software teams had very little input into what was actually working well in their software. They were left to, more or less, just make it up as they went along.
This feedback deficit was accompanied by slow release cycles. The complex, costly infrastructure used required a persnickety process of hardware planning, staging, release planning, and more operations work before deploying to production. Even the developers’ environments, needed to start any work, often took months to provision. Resources were scarce and expensive, and the lack of comprehensive automation across compute, storage, networking, and overall configuration required much slow, manual work.
The result of these two forces was, in retrospect, a dark age of software design. Starting in the mid-2000s, the ubiquity of always-on users and cloud automation removed these two hurdles.
Because applications were always hooked up to the network, it was now possible to observe every single interaction between a user and the software. For example, a 2009 Microsoft study found that only about one third of features added to the web properties achieved the team’s original goals — that is, were useful and considered successful. If you can quickly know which features are effective and ineffective, you can more rapidly improve your software, even eliminating bloat and the costs associated with unused, but expensive to support code.
By 2007, it was well understood that cloud automation dramatically reduced the amount of manual work needed to deploy software. The problem was evenly distributing those benefits beyond Silicon Valley and companies unfettered by the slow cycles of large enterprise. Just over 10 years later, we’re finally seeing cloud efficiencies spreading widely through enterprises. For example, Comcast realized a 75 percent lift in velocity and time to market when they used a cloud platform to automated their software delivery pipeline and production environment.
When you can gather, and thus, analyze all user interactions as well as deploy new releases at will, you can finally put a small batch cycle in place. And. this, you can create better user interaction and product design. And as we’ve seen in the past ten years, well designed products handily win out and bring in large profits.
Good design is worth spending time on. As Forrester consistently finds, organizations that focus on design tend to perform better financially than those that don’t. As such, design can be a highly effective competitive tool. Looking at the relationship between good design and revenue growth, Forrester found that organizations that focus on better design have a 14% lead on those that don’t. For example, “in two industries, cable and retail, leaders outperformed laggards by 24 percentage and 26 percentage points, respectively.”
I haven’t done a great job at describing what exactly good design looks like, let alone what the day-to-day work is. Let’s next look at simple case study with clear business results as an example.
The IRS historically used call centers to provide basic account information and tax payment services. Call centers are expensive and error prone: one study found that only 37% of calls were answered. Over 60% of people calling the IRS for help were simply hung-up on! With the need to continually control costs and deliver good service, the IRS had to do something.
In the consumer space, solving this type of account management problem has long been taken care of. It’s pretty easy in fact; just think of all the online banking systems you use and how you pay your monthly phone bills. But at the IRS, viewing your transactions had yet to be digitized.
When putting software around this, the IRS first thought that they should show you your complete history with the IRS, all of your transactions, as seen in the before UI example above. This confused users and most of them still wanted to pick up the phone. Think about what a perfect failure that is: the software worked exactly as designed and intended, it was just the wrong way to solve the problem.
Thankfully, because the IRS was following a small batch process, they caught this very quickly, and iterated through different hypotheses of how to solve the problem until they hit on a simple finding: when people want to know how much money they owe the IRS, they only want to know how much money they owe the IRS. When this version of the software was tested, most people didn’t want to use the phone.
Now, if the IRS was on a traditional 12 to 18 months cycle (or longer!) think of how poorly this would have gone, the business case would have failed, and you would probably have a dim view of IT and the IRS. But, by thinking about software in an agile, small batch way, the IRS did the right thing, not only saving money, but also solving people’s actual problems.
This project has great results: after some onerous up-front red-tape transformation, the IRS put an app in place which allows people to look up their account information, check payments due, and pay them. As of October 2017, there have been over 2 million users and the app has processed over $440m in payments. Clearly, a small batch success.
Create business agility with small batches
A small batch approach delivers value very early in the process with incremental releases of feature to production. This contrasts to a large batch approach which waits until the very end to (attempt to) deliver all of the value in one big lump. Of course, delivering early doesn’t delivering 1 year’s worth of work in one week. Instead, it means delivering just enough software to validate your design with user feedback.
Delivering early also allows you to prioritize your backlog, the list of requirements to implement. Organizations delivering weekly often find that a feature has been implemented “enough” and further development on the feature can be skipped. For example, to give people their hotel stay invoice, just allowing them to print a stripped down webpage might suffice instead of writing the code that creates and downloads a PDF. Once further development on that feature is de-prioritised, the team can decided to bring a new feature to the top of the backlog, likely ahead of schedule. This flexibility in priorities is one of the core of reasons agile software delivery makes business more agile and innovative.
Done properly a small batch approach also gives you a steady, reliable release train. This concept means that each week, your product teams will deliver roughly the same amount of “value” to production. Here, “value” means whatever changes they make to the software in production: typically, this is adding code that creates new features of modifies existing ones, but it could also be performance, security improvements, patches that ensure the software runs properly.
A functioning small batch process, then, gives you business agility and predictability. Trying out multiple ideas is now much cheaper, one of the keys to innovating new products and business models. The traditional, larger batch approach often requires millions of dollars in budget, driving the need for high-level approval, driving the need…to wait for the endless round of meetings and finance decisions, often on the annual budget cycle. This too often killed off ideas, as Allstate’s Opal Perry explains: “by the time you got permission, ideas died.” But with an MVP approach, as she contrasts, “a senior manager has $50,000 or $100,000 to do a minimum viable product” and can explore more ideas.
Case study: the lineworker knows best at Duke Energy
The team working on this went further than just trusting the VP’s first instincts, doing some field research with the actual line-workers. After getting to know the line-workers, they discovered a solution that redinfed the business problem. While the VP’s map would be a fine dashboard and give more information to central office, what really helped was developing a job assignment application for line-workers. This app would let line-workers locate their peers to, for example, partner with them on larger jobs, and avoid showing up at the same job. The app also introduced an Uber-like queue of work where line-workers could self-select which job to do next.
In retrospect this change seems obvious, but it’s only because the business paid attention to the feedback loop and user research and then reprioritized their software plans accordingly.
Transforming is easy…right?
Putting small batch thinking in place is no easy task: how long would it take you, currently, to deploy a single line of code, from a whiteboard to actually running in production? If you’re like most people, following the official process, it’d take weeks — just getting on the change review board’s schedule would take a week or more, and hopefully key approvers aren’t on vacation. This single-line of code thought experiment will start to flesh out what you need to do — rather, fix — to switch over to doing small batches.
Transforming one team, one piece of software isn’t easy, but it’s often very possible. Improving two applications usually works. How do you go about switching 10 applications over to a small batch process? How about 500?
Supporting hundreds of applications and teams — plus the backing services that support these applications — is a horse of a different color, rather, a drove of horses of many different colors. There’s no comprehensive manual for doing small batches at large scale, but in recent years several large organizations have been stampeding through the thicket. Thankfully, many of them have shared their success, failures, and, most importantly, lessons learned. We’ll look at their learnings next, with an eye, of course, at taming your organization’s big batch bucking.
Speed is the currency of business today and speed is the common attribute that differentiates companies and industries going forward. Anywhere there is lack of speed, there is massive business vulnerability:
● Speed to deliver a product or service to customers.
● Speed to perform maintenance on critical path equipment.
● Speed to bring new products and services to market.
● Speed to grow new businesses.
● Speed to evaluate and incubate new ideas.
● Speed to learn from failures.
● Speed to identify and understand customers.
● Speed to recognize and fix defects.
● Speed to recognize and replace business models that are remnants of the past.
● Speed to experiment and bring about new business models.
● Speed to learn, experiment, and leverage new technologies.
● Speed to solve customer problems and prevent reoccurrence.
● Speed to communicate with customers and restore outages.
● Speed of our website and mobile app.
● Speed of our back-office systems.
● Speed of answering a customer’s call.
● Speed to engage and collaborate within and across teams.
● Speed to effectively hire and onboard.
● Speed to deal with human or system performance problems.
● Speed to recognize and remove constructs from the past that are no longer effective.
● Speed to know what to do.
● Speed to get work done.
Continuous innovation only works with an enterprise that embraces speed and the data required to measure it. By creating conditions for continuous innovation, we must bring about speed. While this is hard, it has a special quality that makes the job a little easier. Through data, speed is easy to measure.
Innovation, on the other hand, can be extremely difficult to measure. For example, was that great quarterly revenue result from innovation or market factors? Was that product a one hit wonder or result of innovation? How many failures do we accept before producing a hit? These questions are not answerable. But we can always capture speed and measure effects of new actions. For example, we can set compliance expectations on speed and measure those results.
Speed is not only the key measurement, it becomes a driver for disruptive innovation. Business disruption has frequently arisen from startups and new technologies, not seeking optimization, but rather discovering creative ways to rethink problems to address speed. Uber is about speed. Mobile is about speed. IoT is about speed. Google is about speed. Drones are about speed. AirBnB is about speed. Amazon is about speed. Netflix is about speed. Blockchain is about speed. Artificial Intelligence is about speed.
Continuous Innovation then is the result of an enterprise, driven by speed, which is constantly collecting data, developing and evaluating ideas, experimenting and learning, and through creativity and advancing technologies, is constructing new things to address ever evolving customer needs.
Skilled, experienced team members are obviously valuable and can temper the risk failure by quickly delivering software. Everyone would like the mythical 10x developer, and would even settle for a 3 to 4x “full stack developer.” Surely, management often thinks, doing something as earth-shattering as “digital transformation” only works with highly skilled developers. You see this in surveys all the time: people say that lack of skills is a popular barrier to improving their organization’s software capabilities.
This mindset is one of the first barriers to scaling change. Often, an initial, team of “rockstars” has initial success, but attempts to clone them predictably fails and scaling up change is stymied. It’s that “lack of skills” chimera again. It’s impossible to replicate these people, and companies rarely want to spend the time and money to actually train existing staff.
Worse, when you use the only ninjas need apply tactic, the rest of the organization loses faith that they could change as well. “When your project is successful,” Jon Osborn explains, “and they look at your team, and they see a whole bunch of rockstars on it, then the excuse comes out, ‘well, you took all the top developers, of course you were successful.’”
Instead of only recruiting elite developers, also staff your initial teams with a meaningful dose of normals. This will not only help win over the rest of the organization as you scale, but also means you can actually find people. A team with mixed skill levels also allows you train your “junior” people on the job, especially when they pair with your so called “rockstars.”
Rockstars known to destroy hotel rooms
I met a programmer with 10x productivity once. He was a senior person and required 10 programmers to clean up his brilliant changes. –Anonymous on the c2 wiki
Usually what people find, of course, is that this rockstar/normal distinction is situational and the result of a culture that awards the lone wolf hero instead of staff that helps and supports each other. Those mythical 10x developers are lauded because of a visual cycle of their own creation. At some point, they spaghetti coded out some a complicated and crucial part of the system “over the weekend,” saving the project. Once in production, strange things started happening to that piece of code, and of course our hero was the only one who could debug the code, once again, over the weekend. This cycle repeats itself, and we laud this weekend coder, never realizing they’re actually damaging our business.
Relying on these heros, ninjas, rockstars, or what have you is a poor strategy in a large organization. Save the weekend coding for youngsters in Ramen chomping startups that haven’t learned better yet. “Having a team dynamic and team structure that the rest of the organization can see themselves in,” Osborn goes on to say, “goes a long way towards generating a buy in that you’re actually being successful and not cheating by using all your best resources.”
When possible, recruiting volunteers is the best option for your initial projects, probably for the first year. Forcing people to change how they work is a recipe for failure, esp. at first. You’ll need motivated people who are interested in change or, at least, will go along with it instead of resisting it.
Osborn describes this tactic at Great American Insurance Group: “We used the volunteer model because we wanted excited people who wanted to change, who wanted to be there, and who wanted to do it. I was lucky that we could get people from all over the IT organisation, operations included, on the team… it was a fantastic success for us.”
This might be difficult at first, but as a leader of change you need to start finding and cultivating these change-ready volunteers. Again, you don’t necessarily want rockstars, so much as open minded people who enjoy trying new things.
Rotating out to spread the virus of digital transformation
Few organizations have the time or budget-will to train their staff. Management seems to think that a moist bedding of O’Reilly books in a developer’s dark room will suddenly pop-up genius skills like mushrooms. Rotating pairing in product teams addresses this problem in a minimally viable way inside a team: team members learn from each other on a daily basis. Event better, staff is actually producing value as they learn instead of sitting in a neon-light buzzing conference room working on dummy applications.
To scale this change, you can selectively rotate staff out of a well functioning team into newer teams. This seeds their expertise through the organization, and once you repeat this over and over, knowledge will spread faster. One person will work with another, becoming two skilled people, who each work with another person, become four skilled people, then eight, and so on. Organizations like Synchrony go so far as the randomly shuffle desks every six months to ensure people are moving around.
More than just skill transfer and on the job training, rotating other staff through your organization will help spread trust in the new process. People tend to trust their peers more than leaders throwing down change from high, and much more than external “consultants,” and worse, vendor shills like myself. As ever, building this trust through the organization is key to scaling change.
Orange France is one of the many examples of this strategy in practice. After the initial success revitalizing their SMB customer service app, Orange started rotating developers to new teams. Developers that worked on the new mobile application pair with Orange developers from other teams, the website team. As ever with pairing, they both teach the peers how to apply agile and improve the business with better software at the same time. Talking about his experience with rotating pairing, Orange’s Xavier Perret says that “it enabled more creativity in the end. Because then you have different angles, [a] different point of view. As staff work on new parts of the system they get to know the big picture better and being “more creative problem solving” to each new challenge, Perret ads.
While you may start with ninjas, you can take a cadre of volunteers and slowly by surely build up a squad of effective staff that can spread transformation throughout your organization. All with less throwing stars and trashed hotel rooms than those 10x rockstars leave in their wake.
The phrase “digital transformation” is mostly bull-shit, but then again, it’s perfect. The phrase means executing a strategy to innovate new business models driven by rapidly delivered, well designed, and agile software. For many businesses, fixing their long dormant, lame software capabilities is an urgent need: companies like Amazon loom as over-powering competitors in most every industry. More threatening, clever, existing enterprises have honed their ability software capabilities over the past five years.
Liberty Mutual, for example, entered a new insurance market on the other side of the world in 6 months, doubling the average close rate. Home Depot has grown it’s online business by around $1bn each of the past four years, is the #2 ranked digital retailer by Gartner L2, and is adding more than 1,000 technical hires in 2018. The US Air Force modernized their air tanker scheduling process in 120 days, driving $1m in fuel savings each week, and leading to canceling a long-standing $745m contract that hadn’t delivered a single line of code in five years.
Whatever businesses you’re in, able, ruthless competition is coming from all sides: new entrants and existing behemoths. Their success is driven by an agile, cloud-driven software strategy that transforms their organizations into agile businesses.
Let’s take a breath.
That’s some full tilt bluster, but we’ve been in an era of transient advantage for a long time. Businesses need every tool they can lay hands on to grow their business, sustain their existing cash-flows, and fend off competitors. IT has always been a powerful tool for enabling strategies, as they say, but in the past 10 years seemingly helpful but actually terrible practices like outsourcing have ruined most IT department’s ability to create useful software for the businesses they supposedly support.
These organizations need to improve how they do software to transform their organizations into programmable businesses.
Studying how large organizations plan for, initially fail, and then succeed at this kind of transformation is what I spend my time doing. This book (which I’m still working on) collects together what I’ve found so far, and is constructed from the actual experiences and stories of people of who’ve suffered through the long journey to success.
Enjoy! And next time someone rolls their eyes at the phrase “digital transformation,” ask them, “well, what better phrase you got, chuckle-head?”
I’m posting draft chapters of this book as I MVP-polish them up. In sort of the right order, here they are:
If a strategy is presented in the boardroom but employees never see, is it really a strategy? Obviously, not. Leadership too often believes that the strategy is crystal clear but staff usually disagree. For example, in a survey of 1,700 leaders and staff, 69% of leaders said their vision was “pragmatic and could easily translated into concrete projects and initiatives.” Employees, had a glummer picture: only 36% agreed.
Your staff likely doesn’t know the vision and strategy. More than just understanding it, they rarely know how they can help. As Boeing’s Nikki Allen put it:
In order to get people to scale, they have to understand how to connect the dots. They have to see it themselves in what they do — whether it’s developing software, or protecting and securing the network, or provisioning infrastructure — they have to see how the work they do every day connects back to enabling the business to either be productive, or generate revenue.
There’s little wizardry to communicating strategy. First, it has to be compressible. But, you already did that when you established your vision and strategy…right? Next, you push it through all the mediums and channels at your disposal to tell people over and over again. Chances are, you have “town hall” meetings, email lists, and team meetings up and down your organization. Recording videos and podcasts of you explaining the vision and strategy is helpful. Include strategy overviews in your public speaking because staff often scrutinizes these recordings. While “Enterprise 2.0” fizzled out several years ago, Facebook has trained all us to follow activity streams and other social flotsam. Use those habits and the internal channels you have to spread your communication.
You also need to include examples of the strategy in action, what worked and didn’t work. As with any type of persuasion, getting people’s peers to tell their stories are the best examples. Google and others find that celebrating failure with company-wide post mortems is instructive, career-ending crazy as that may sound. Stories of success and failure are valuable because you can draw a direct line between high-level vision to fingers on keyboard. If you’re afraid of sharing too much failure, try just opening up status metrics to staff. Leadership usually underestimates the value of organization-wide information radiators, but staff usually wants that information to stop prairie dogging through their 9 to 5.
As you’re progressing, getting feedback is key: do people understand it? Do people know what to do to help? If not, then it’s time to tune your messages and mediums. Again, you can apply a small batch process to test out new methods of communicating. While I find them tedious, staff surveys help: ask people if they understand your strategy. Be to also ask if know how to help execute the strategy.
Manifestos can help decompose a strategy into tangible goals and tactics. The insurance industry is on the cusp of turbulent competitive landscape. To call it “disruptive,” would be too narrow. To pick one sea of chop, autonomous vehicles are “changing everything about our personal auto line and we have to change ourselves,” says Liberty Mutual’s Chris Bartlow. New technologies are only one of many fronts in Liberty’s new competitive landscape. Every existing insurance company and cut-throat competitors like Amazon are using new technologies to both optimize existing business models and introduce new ones.
“We have to think about what that’s going to mean to our products and services as we move forward,” Bartlow says. Getting there required re-engineering Liberty’s software capabilities. Like most insurance companies, mainframes and monoliths drove their success over past decades. That approach worked in calmer times, but now Liberty is refocusing their software capability around innovation more than optimization. Liberty is using a stripped down set of three goals to make this urgency and vision tangible.
“The idea was to really change how we’re developing software. To make that real for people we identified these bold, audacious moves — or ‘BAMS,’” says Liberty Mutual’s John Heveran:
These BAMs grounded Liberty’s strategy, giving staff very tangible, if audacious, goals. With these in mind, staff could start thinking about how they’d achieve those goals. This kind of manifesto, makes strategy actionable.
So far, it’s working. “We’re just about cross the chasm on our DevOps and CI/CD journey,” says Liberty’s Miranda LeBlanc. “I can say that because we’re doing about 2,500 daily builds, with over a 1,000 production deployments per a day,” she adds. These numbers are tracers of putting a small batch process in place that’s used to improve the business. They now support around 10,000 internal users at Liberty and are better provisioned for the long ship ride into insurance’s future.
Choosing the right language is important for managing IT transformation. For example, most change leaders suggest dumping the term “agile.” At this point, near 25 years into “agile,” everyone feels like they’re agile experts. Whether that’s true is irrelevant. You’ll faceplam your way through transformation if you’re pitching switching to a methodology people believe they’ve long mastered.
It’s better to pick your own branding for this new methodology. If it works, steal the buzzwords du jour, from “cloud native,” DevOps, or serverless. Creating your own brand is even better. As we’ll discuss later, Allstate created a new name, CompoZed Labs, for its transformation effort. Using your own language and branding can help bring smug staff onboard and involved. “Oh, we’ve always done that, we just didn’t call it ‘agile,’” sticks-in-the-mud are fond of saying as they go off to update their Gantt charts.
Make sure people understand why they’re going through all this “digital transformation.” And make even more sure they know how to implement the vision and strategy, or, as you start thinking, our strategy.
Lone wolves rarely succeed at transforming business models and behavior at large organizations. True to the halo effect, you’ll hear about successful lone wolves often. What you don’t hear about are all the lone wolves who limped off to die alone. Even CEOs and boards often find that change-by-mandate efforts fail. “Efforts that don’t have a powerful enough guiding coalition can make apparent progress for a while,” as Kotter summarizes, “But, sooner or later, the opposition gathers itself together and stops the change.”
Organizations get big by creating and sustaining a portfolio of revenue sources, likey over decades. While these revenue sources may transmogrify from cows to dogs, if frightened or backed into a corner, hale but mettlesome upstarts will are usually trampled by the status quo stampede. At the very least, they’re constantly protecting their neck from frothy, sharp-tooth jackals. You have to work with those cows and canines, often forming “committees.” Oh, and, you know, they might actually be helpful.
How you use this committee is situation. It might be the placate enemies who’d rather see you fail than succeed, looking to salvage corporate resources from the HMS Transformation’s wreak. The old maxim to keep your friends close and your enemies closer summarizes this tactic well. Getting your “enemies” committed to and involved in your project is an obvious, facile suggestion, but it’ll keep them at bay. You’ll need to remove my cynical tone from your committee and actually rely on them for strategic and tactical input, support in budgeting cycles, and, eventually, involvement in your change.
For example, a couple years back I was working with all the C-level executives at a large retailer. They’d come together to understand IT’s strategy to become a software defined business. Of course, IT could only go so far and needed the the actual lines of business to support and adopt that change. The IT executives explained how transforming to a cloud native organization would improve the company’s software capabilities in the morning. In the afternoon, they all started defining a new application focused on driving repeat business, using the very techniques discussed in the morning. This workshopping solidified IT’s relationship with key lines of business and started working transforming those businesses. It also kicked off real, actual work on the initiative. By seeing the benefits of the new approach in action, IT also won over the CFO who’d been the most skeptical.
As this anecdote illustrates, building an alliance often requires serving your new friends. IT typically has little power to drive change, especially after decades of positioning themselves as a service bureau instead of a core enabler of growth. As seen in the Duke lineworker case above, asking the business what they’d like changed is more effective than presuming to know. As that case also shows, a small batch process discovers what actually needs to happen despite the business’ initial theories. But, getting there requires a more of a “the customer is always right” approach on IT’s part.
Now, there are many tactics for managing this committee; as ever Kotter does an excellent job of cataloging them in Leading Change. In particular, you want to make sure the committee members remain engaged. Good executives can quickly smell a waste of time and will start sending junior staff if the wind of change smells stale (wouldn’t you do the same?). You need to manage their excitement, treating them as stakeholder and customers, not just collaborators. Luckily, most organizations I’ve spoken with find that cloud native technologies and methodologies so vastly improve their software capabilities, in such a short amount of time that winning over peers is easy. As one executive a year intro their digital transformation program told me, “holy-@$!!%!@-cow we are starting to accelerate. It’s getting hard to not overdo it. I have business partners lined up out the door.”
Tracking the health of your overall innovation machine can be both overly simplified and overly complex. What you want to measure is how well you’re doing at software development and delivery as it relates to improving your organization’s goals. You’ll use these metrics to track how your organization is doing at any given time and, when things go wrong, get a sense of what needs to be fixed. As ever with management, you can look at this as a part of putting a small batch process in place: coming up with theories for how to solve your problems and verifying if the theory worked in practice or not.
All that monitoring
In IT most of the metrics you encounter are not actually business oriented and instead tell you about the health of your various IT systems and processes: how many nodes are left in a cluster, how much network traffic customers are bringing in, how many open bugs development has, or how many open tickets the help desk is dealing with on average.
All of these metrics can be valuable, just as all of them can be worthless in any given context. Most of these technical metrics, coupled with ample logs, are needed to diagnose problems as they come and go. In recent years, there’ve been many advances in end-to-end tracing thanks to tools like Zipkin and Spring Sleuth. Log management is well into its newest wave of improvements, and monitoring and IT management analytics are just ascending another cycle of innovation — they call it “observability” now, that way you know it’s different this time!
Instead of looking at all of these technical metrics, I want to look at a few common metrics that come up over and over in organizations that are improving their software capabilities.
Six common cloud native metrics
Some metrics consistently come up when measuring cloud native organizations:
Lead time is how long it takes to go from an idea to running code in production, it measures how long your small batch loop takes. It includes everything in-between from specifying the idea, writing the code and testing it, passing any governance and compliance needs, planning for deployment and management, and then getting it up and running in production.
If your lead time is consistent enough, you have a grip on IT’s capability to help the business by creating and deploying new software and features. Being this machine for innovation through software is, as you’ll hopefully recall, the whole point of all this cloud native, agile, DevOps, and digital transformation stuff.
As such, you want to monitoring your lead closely. Ideally, it should be a week. Some organizations go longer, up to two weeks, and some are even shorter, like daily. Target and then track an interval that makes sense for you. If you see your lead time growing, then you should stop everything, find the bottlenecks and fix them. If the bottlenecks can’t be fixed, then you probably need to do less each release.
Velocity shows how many features are typically deployed each week. Whether you call features “stories,” “story points,” “requirements,” or whatever else, you want to measure how many of them the team can complete each week; I’ll use the term “story.” Velocity tells you three things:
Your progress to improving and ongoing performance — at first, you want to find out what your team’s velocity is. They will need to “calibrate” on what they’re capable of doing each week. Once you establish this base line, if it goes down something is going wrong and you can investigate.
How much the team can deliver each week — once you know how many features your team can deliver each week, you can more reliability plan your road-maps. If a team can only deliver, for example, 3 stories each week, asking them to deliver 20 stories in a month is absurd. They’re simply not capable of doing that. Ideally, this means your estimates are no longer, well, always wrong.
If the the scope of features is getting too big or too small — if previously, reliability performing team’s velocity starts to drop, it means that they’re scoping their stories incorrectly: they’re taking on too much work, or someone is forcing them to. On the other hand, if the team is suddenly able to deliver more stories each week or finds themselves with lots of extra time each week, it means they should take on more stories each week.
There are numerous ways to first calibrate on the number of stories a team can deliver each week and managing that process at first is very important. As they calibrate, your teams will, no doubt, get it wrong for many releases, which is to be expected (and one of the motivations in picking small projects at first instead of big, important ones). Other reports like burn down charts can help illustrate how the team’s velocity is getting closer to delivering across major releases (or in each release) and help you monitor any deviation from what’s normal.
In general, you want your software to be as responsive as possible. That is, you want it to be fast. We often think of speed in this case, how fast is the software running and how fast can it respond to requests? Latency is a slightly different way of thinking about speed, namely, how long does a request take end-to-end to process, returning back to the user.
Latency is different than the raw “speed” of the network. For example, a fast network will send a static file very quickly, but if the request requires connecting to a database to create and then retrieve a custom view of last week’s Austrian sales, it will take awhile and, thus, the latency will be much longer than downloaded an already made file.
From a user’s perspective, latency is important because an application that takes 3 minutes to respond versus 3 milliseconds might as well be “unavailable.” As such, latency is often the best way to measure if your software is working.
Measuring latency can be tricky….or really simple. Because it spans the entire transaction, you often need to rely on patching together a full view — or “trace” — of any given user transaction. This can be done by looking at locks, doing real or synthetic user-centric tracing, and using any number of application performance monitoring (APM) tools. Ideally, the platform you’re using will automatically monitor all user requests and also put together catalog all of the sub-processes and sub-sub-processes that make up the entire request. That way, you can start to figure why things are so slow.
Often, your systems and software will tell when there’s an error: an exception is thrown in the application layer because the email service is missing, an authentication service is unreachable so the user can’t login, a disk is failing to write data. Tracking and monitoring these errors is, obviously, a good idea. Some of them will range from “is smoke coming out of the box?” to more obtuse ones like servicing being unreachable because DNS is misconfigured. Oftentimes, errors are roll-ups of other problems: when a web server fails, returning a 500 response code, it means something went wrong, but doesn’t the error doesn’t usually tell you what happened.
Error rates also occur before production, while the software is being developed and tested. You can look at failed tests as error rates, as well as broken builds and failed compliance audits.
Fixing errors in development can be easier and more straight forward, whereas triaging and sorting through errors in production is an art. What’s important to track with errors is not just that one happened, but the rate at which they happen, perhaps errors per second. You’ll have to figure out an acceptable level of errors because there will be many of them. What you do about all these errors will be driven by your service targets. These targets may be foisted on you in the form of heritage Service Level Agreements or you might have been lucky enough to negotiate some sane targets.
Chances are, a certain rate of errors will be acceptable (have you ever noticed that sometimes, you just need to reload a web-page?) Each part of your stack will throw off and generate different errors: some are meaningless (perhaps they should be more warnings or even just informative notices, e.g., “you’re using an older framework that might be deprecated sometime in the next 30 years) and others could be too costly, or even impossible to fix (“1% of user’s audio uploads fail because their upload latency and bandwidth is too slow”). And some errors may be important above all else: if an email server is just losing emails every 5 minutes…something is terribly wrong.
Generally, errors are collected from logs, but you could also poll the service in question and it might send alerts to your monitoring systems, be that an IT management system or just your phone.
If you can accept the reality that things will go wrong with software, how quickly you can fix those problems becomes a key metric. It’s bad when an error happens, but it’s really bad if it takes you a long time to fix it.
Tracking mean-time-to-repair is an ongoing measurement of how quickly you can recovering from errors. As with most metrics, this gives you a target to improve towards and then allows you to make sure you’re not getting worse.
If you’re following cloud native practices and using a good platform, you can usually shrink your MTTR with the ability to roll back changes. If a release turns out to be bad (an error), you can back it out quickly, removing the problem. This doesn’t mean you should blithely roll out bad releases, of course.
Measuring MTTR might require tracking support tickets and otherwise manually tracking the time between incident detection and fix. As you automate remediations, you might be able to easily capture those rates. As with most of these metrics, what becomes important in the long term is tracking changes to your acceptable MTTR and figuring out why the negative changes are happening.
Everyone wants to measure cost, and there are many costs to measure. In addition to the time spent developing software and the money spent on infrastructure, there are ratios you’ll want to track like number of applications to platform operators. Typically, these kinds of ratios give you a quick sense of how efficiently IT runs. If each application takes one operator, something is probably missing from your platform and process. T-Mobile, for example, manages 11,000 containers in production with just 8 platform operators.
There are also less direct costs like opportunity and value lost due to waiting on slow release cycles. For example, the US Air Force calculated that is saved $391M by modernizing it’s software methodology. The point is that you need to obviously track the cost of what you’re doing, but you also need to track the costs of doing nothing, which might be much higher.
Of course, none of the metrics so far has measured the most valuable, but difficult metric: value delivered. How do you measure your software’s contribution to your organization’s goals? Measuring how the process and tools you use contributes to those goals is usually harder. This is the dicey plain of correlation versus causation.
Somehow, you need to come up with a scheme that shows and tracks how all this cloud native stuff you’re spending time and money on is helping the business grow. You want to measure value delivered over time to:
Prove that you’re valuable and should keep living and get more funding,
Figure out when you’re failing to deliver so that you can fix it
There are a few prototypes of linking cloud native activities to business value delivered. Let’s look at a few examples:
As described in the case study above, when the IRS replaced call centers with poor availability with software, IT delivered clear business value. Latency and error rates decreased dramatically (with phone banks, only 37% of calls made it through) and the design improvements they discovered led to increased usage of the software, pulling people away from the phones. And, then, the results are clear: by the Fall of 2017, the this application had collected $440m in back taxes.
Running existing businesses more efficiently is a popular goal, especially for large organizations. In this case, the value you deliver with cloud native will usually be speeding up businesses processes, removing wasted time and effort, and increasing quality. Duke Energy’s lineworker case is a good example, here. Duke gave lineworkers a better, highly tuned application that queue and coordinate their work in the field. The software increased lineworker’s productivity and reduced waste, directly creating business value in efficiencies.
The US Air Force’s tanker scheduling case study is another good example here: by adapting a cloud native, software model they were able to ship the first version in 120 days and started saving $100,000’s in fuel costs each week. Additionally, the USAF computed the cost of delay — using the old methods that took longer — at $391M, a handy financial metric to consider.
And, then, of course, there comes raw competition. This most easily manifests itself as time-to-market, either to match competitors or get new features out before them. Liberty Mutual’s ability to enter the Australian motorcycle market, from scratch, in six months is a good example. Others, like Comcast demonstrate competing with major disruptors like Netflix.
It’s easy to get very nuanced and detailed when you’re mapping IT to business value. You need to keep things as simple as possible, or, put another way, only as complex as needed. As with the example above, clearly link your cloud native efforts to straight forward business goals. Simply “delivering on our commitment to innovation” isn’t going to cut it. If you’re suffering under vague strategic goals, make them more concrete before you start using them to measure yourself. On the other end, just lowering costs might be a bad goal to shoot for. I talk with many organizations who used outsourcing to deliver on the strategic goal of lowering costs and now find themselves incapable of creating software at the pace their business needs to compete.
Fleshing out metrics
I’ve provided a simplistic start at metrics above. Each layer of your organization will want to add more detail to get better telemetry on itself. Creating a comprehensive, umbrella metrics system is impossible, but there are many good templates to start with.
Pivotal has been developing a cloud native centric template of metrics, divided into 5 categories:
These metrics cover platform operations, product, and business metrics. Not all organizations will want to use all of the metrics, and there’s usually some that are missing. But, this 5 S’s template is a good place to start.
If you prefer to go down rabbit holes rather than shelter under umbrellas, there are more specialized metric frameworks to start with. Platform operators should probably start by learning how the Google SRE team measures and manages Google, while developers could start by looking at TK( need some good resource ).
Whatever the case, make sure the metrics you choose are
targeting the end goal of putting a small batch process in place to create better software,
reporting on your ongoing improvement towards that goal, and,
alerting you that you’re slipping and need to fix something…or find a new job.
“Compliance” will be one of your top bugbears as you improve how your organization does software. As numerous organizations have been finding, however, compliance is a solvable problem. You can even improve the quality of compliance and risk management in most cases with your new processes and tools, introducing more, reliable controls than traditional approaches.
I’ve seen three approaches to dealing with compliance, often used together as a sort of maturity model:
Ignore compliance, compliantly — select projects to work on that don’t need much compliance, if any. Eventually, you’ll want to work on projects that do, but this buys you time to learn by doing and building up a small series of successful projects.
Minimal Viable Compliance — often, the compliance requirements you must follow have built up over years, even decades. It’s very rare that any control is removed, but it’s very frequent that they should be. Find the smallest set of controls you actually need to satisfy.
Transform compliance — as you scale up your transformation efforts, like most organizations you’ll find that you have to work with auditors. Most organizations are finding that simply involving auditors in your software lifecycle from start to end not only helps you pass compliance with flying colors, but that it improves the actual compliance work.
But first, what exactly is “compliance”?
If you’re a large organization, chances are you’ll have a set of regulations you need to comply with. These are both self- and government-imposed. In software, the point of regulations is often to govern the creation of software, how it’s managed and in run in production, and how data is handled. The point of most compliance is risk management, e.g., making sure developers deliver what was asked for, making sure they follow protocol for tracking changes and who made them, making sure the code and the infrastructure is secure, and making sure that people’s personal data is not needlessly exposed.
Compliance often takes the form of a checklist of controls and verifications that must be passed. Auditors are staff that go through the process of establishing those lists, tracking down their status in your software, and also negotiating if each control must be followed or not. The auditors are often involved before and after the process to establish the controls and then verify that they were followed. It’s rare that auditors are involved during the process, which is a huge source of wasted time, it turns out. Getting involved after your software has been created requires much compliance archaeology and, sadly, much cutting and pasting between emails and spreadsheets, paired with infinite meeting scheduling.
When you’re looking to transform your software capabilities, this traditional approaches to compliance, however, often end up hurting businesses more than helping them. As Liberty Mutual’s David Ehringer describes it
The nature of the risk affecting the business is actually quite different: the nature of that risk is, kind of, the business disrupted, the business disappearing, the business not being able to react fast enough and change fast enough. So not to say that some of those things aren’t still important, but the nature of that risk is changing.
Ehringer says that many compliance controls are still important, but there are better ways of handling them without worsening the largest risk: going out of business because innovation was too late.
Let’s look at three ways that organizations are avoiding failure by compliance.
These may seem disconnected from anything that matters and, thus, not worth working on. Early on, though, the ability to get moving and prove that change is possible often trumps any business value concerns. You don’t want to eat these “empty calorie” projects too much, but it’s better than being killed off at the start.
Minimal Viable Compliance
Part of what makes compliance seem like toil is that many of the controls seem irrelevant. Over the years, compliance builds up like plaque in your steak-loving arteries. The various controls may have made sense at some time — often responding to some crisis that occured because this new control wasn’t followed. At other times, the controls may simply not be relevant to the way you’re doing software.
Clearing away old compliance
When you really peer into the audit abyss, you’ll often find out that many of the tasks and time bottlenecks are caused by too much ceremony and processes no longer needed to achieve the original goals of audibility. Target’s Heather Mickman recounts her experience with just such an audit abyss clean-up in The DevOps Handbook:
As we went through the process, I wanted to better understand why the TEAP-LARB [Target’s existing governance] process took so long to get through, and I used the technique of “the five whys”…which eventually led to the question of why TEAP-LARB existed in the first place. The surprising thing was that no one knew, outside of a vague notion that we needed some sort of governance process. Many knew that there had been some sort of disaster that could never happen again years ago, but no one could remember exactly what that disaster was, either.
As Boston Scientific’s CeeCee O’Connor says, finding your path to minimal viable compliance means you’ll actually need to talk with auditors and understand the compliance needs. You’ll likely need to negotiate if various controls are needed or not, more or less proving that they’re not. When working with auditors on an application that helped people manage a chronic condition, O’Connor group first mapped out what they called “the path to production.”
This was a value-stream like visual that showed all of the steps and processes needed to get the application into production, including, of course compliance steps. Representing each of these as sticky notes on a wall allowed the team to quickly work with auditors to go through each step — each sticky note — and ask if it was needed. Answering such a question requires some criteria, so applying lean they team asked the question “does this process add value for the customer?”
You’re already helping compliance
This mapping and systematic approach allowed the team and auditors to negotiate the actual set controls needed to get to production. At Boston Scientific, the compliance standards had built up over 15 years, growing thick, and this process helped thin them out, speeding up the software delivery cycle.
The opportunity to work with auditors will also let you demonstrate how many of your practices are already improving compliance. For example, pair programming means that all code is continuously being reviewed by a second person and detailed test suite reports show that code is being tested. Once you understand what your auditors need, there are likely other processes that you’re following that contribute to compliance.
Discussing his work at Boston Scientific, Pivotal’s Chuck D’Antonio describes a happy coincidence between lead design and compliance. When it comes to pacemakers and other medical devices, you’re only supposed to build exactly the software needed, removing any extraneous software that might bring bugs. This requirement matches almost exactly with one of the core ideas of minimum viable products and lean: only deliver the code needed. Finding these happy coincidences, of course, requires working closely with auditors. It’ll be worth a day or two of meetings and tours to show your auditors how you do software and ask them if anything lines up already.
Case Study: “It was way beyond what we needed to even be doing.”
Operating in five US states and insuring around 15 million people, health insurance provider HCSC is up to its eyeballs in regulations and compliance. As it started to transform, HCSC initially felt like getting over the compliance hurdle would be impossible. Mark Ardito recounts how easy it actually was once auditors were satisfied with how much better a cloud-native approach was:
Turns out it’s really easy to track a story in [Pivotal] Tracker to a commit that got made in git. So I know the SHA that was in git, that was that Tracker story. And then I know the Jenkins job that pushed it out to Cloud Foundry. And guess what? I have this in the tools. There’s logs of all these things happening. So slowly, I was able to start to prove out auditability just from Jenkins logs, git SHAs, things like that. So we started to see that it became easier and easier to prove audits instead of Word documents, Excel documents — you can type anything you want in a Word document! You can’t fake a log from git and you can’t fake a log in Jenkins or Cloud Foundry.
Automation makes auditors happier and removes huge, time-sucking bottlenecks.
While you may be able to avoid compliance or eliminate some controls, regulations are more likely unavoidable. Speeding up the compliance bottleneck, then, requires changing how compliance is done. Thankfully, using a build pipeline and cloud platforms provides a deep set of tools to speed up compliance. Even better, you’ll find cloud native tools and processes improve the actual quality and accuracy of compliance.
Compliance as code
Many of the controls auditors need can be satisfied by adding minor steps into your development process. For example, as Boston Scientific found, one of their auditors controls specified that a requirement had to be tracked through the development process. Instead of having to verify this after the team was code complete, they made sure to embed the story ID into each git commit, automated build, and deploy. Along these lines, the OpenControl project has put several years of effort into automating even the most complicated government compliance regimes. Chef’s InSpec project is also being used to automate compliance.
Pro-actively putting in these kinds of tracers is a common pattern form organizations that are looking to automate compliance. There’s often a small amount of scripting required to extract these tracers and present them in a human readable format, but that work is trivial in comparison to the traditional audit process.
This makes your entire stack of infrastructure and software a single, unique unit that must be audited each release. This creates a huge amount of compliance work that needs to be done even for a single line of code: everything must be checked from the dirt to screen. As Raytheon’s Keith Rodwell lays out, working with auditors, you can often show them that by using the same, centralized platform for all applications you can inherit compliance from the platform. This allows you to avoid the time taken to re-audit each layer in your stack.
The US federal government’s cloud.gov platform provides a good example of baking controls into the platform. 18F, the group that built and supports cloud.gov described how their platform, based on Cloud Foundry, takes care of 269 controls for product teams:
Out of the 325 security controls required for Moderate-impact systems, cloud.gov handles 269 controls, and 41 controls are a shared responsibility (where cloud.gov provides part of the requirement, and your applications provide the rest). You only need to provide full implementations for the remaining 15 controls, such as ensuring you make data backups and using reliable DNS (Domain Name System) name servers for your websites.
Organizations that bake controls into their platforms find that they can reduce the time to pass audits from months (if not years!) to just weeks or even days. The US Air Force has had similar success with this approach, bringing security certification down from 18 months to 30 days, sometimes even just 10.
Compliance as a service
Finally, as get deeper into dealing with compliance, you might even find that you work more closely with auditors. It’s highly unlikely that they’ll become part of your product team; though that could happen in some especially compliance-driven government and military work where being compliant is a huge part of the business value. However, organizations often find that auditors are involved closely throughout their software life-cycle. Part of this is giving auditors the tools to proactively check on controls first hand.
Home Depot’s Tony McCulley suggests giving auditors access to your continuous delivery process and deployment environment. This means auditors can verify compliance questions on their own instead of asking product teams to do that work. Effectively, you’re letting auditors peer into and even help out with controls in your software. Of course, this will only works if have a well-structured, standardized platform supporting your build pipeline with good UIs that non-technical staff can access.
Making compliance better
There have obviously been culture shocks. What is more interesting though is that the teams that tend to have the worst culture shock are not those typical teams that you might think of, audit or compliance. In fact, if you’re able to successfully communicate to them what you’re doing, DevOps and all of the associated practices seem like common sense. [Auditors] say, ‘Why weren’t we doing this before?’” — Manuel Edwards, E*TRADE, Jan 2016
Understanding and working with auditors gives the product team the chance to write software that more genuinely matches compliance needs.
The traceability of requirements, authorization, and automated test reports give auditors much more of the raw materials needed to verify compliance.
Automating compliance reporting and baking controls into the platform creates much more accurate audits and can give so called “controls” actual, programmatic control to enforce regulations.
As with any discussion that includes the word “automation,” some people take all of this to mean that auditors are no longer needed. That is, we can get rid of their jobs. This sentiment then gets stacked up into the eternal “they” antipattern: “well, they won’t change, so we can’t improve anything around here.
But, also as with any discussion that includes to word “automation,” things are not so clear. What all of these compliance optimizations point to is how much waste and extra work there is in the current approach to compliance.
This often means auditors working overtime, on the weekend, and over holidays. If you can improve the tools auditors use you don’t need to get rid of them. Instead, as we can do with previously overworked developers, you end up getting more value out of each auditor and, at the same time, they can go home on-time. As with developers, happy auditors mean a happier business.
Most companies don’t realize the amount of work required to fully transform their approach to creating and caring for software. Scaling up the improvements learned and put into place by your initial teams relies on building trust and understanding in the overall organization. For whatever reason, most people in large organizations are resistant to change and, what with the frequent introduction of process improvement programs, skeptical of the flavor of the week of the syndrome. A large part of scaling up digital transformation, then, is internal marketing. And it’s a lot more than most people anticipate.
Once you nail down some, initial, successful applications, start a program to tell the rest of the organization about these projects. This is beyond the usual email newsletter mention, often quickly leading to internal “summits” with speakers from your organization going over lessons learned and advice for starting new cloud native projects.
You have to promote your change, educate people, and overall “sell” it to people who either don’t care, don’t know, or are resistant. These events can piggyback on whatever monthly “brown-bag” sessions you have and should be recorded for those who can’t attend. Somewhat early on, management should set aside time and budget to have more organized summits. You could, for example, do a half day event with three to four talks by team members from successful projects, having them walk through the project’s history and advice on getting started.
This internal marketing works hand-in-hand with starting small and building up a collection of successful projects. As you’re working on these initial projects, spend time to document the “case studies” of what worked and didn’t work, and track the actual business metrics to demonstrate how the software helped your organization. You don’t so much want to just how fast you can now move, but you want to show how doing software done in this new way is strategic for the business.
Content-wise, what’s key in this process is for staff to talk with each other, about your organization’s’ own software, market, and challenges faced. I find that organizations often think that they face unique challenges. Each organization does have unique hang-ups and capabilities, so people in those organization tend to be most interested in how they can apply the wonders of cloud native to their jobs, regardless of whatever success they might hear about at conferences or, worse, vendors with an obvious bias. Hearing from each other often gets beyond this sentiment that “change can’t happen here.”
Once your organization starts hearing about these success, you’ll be able to break down some of the objections that stop the spread positive change. As Amy Patton at SPS Commerce put it, “having enough wins, like that, really helped us to keep the momentum going while we were having a culture change like DevOps.”
Winning over process stakeholders
The IRS provides an example of using release meetings to slowly win over resistant middle-management staff and stakeholders. Stakeholders felt uncomfortable letting these detailed requirements evolve over each iteration. As with most people who’re forcedencouraged to move from waterfall to agile, they were skeptical that the final software would have all the features they initially wanted.
While the team was, of course, verifying these evolving requirements with actual, in production user testing, stakeholders were uncomfortable. These skeptics were used to comfort of lots of up-front analysis and requirements, exactly spelling out which features would be implemented. To start getting over this skepticism, the team used their release meetings to show off how well the process was working, demonstrating working code and lessons learned along the way. These meetings went from five skeptics to packed, standing room only meetings with over 45 attendees. As success was built up and the organizational grape-vine filled with tales of wins, interest grew and with it, trust in the new system.
The next step: training by doing
As the organizations above and others like Verizon and Target demonstrate, internal marketing must be done “in the small,” like this IRS case and, eventually, “in the large” with internal summits.
Scaling up from marketing activities is often done with intensive, hands-on training workshops called “dojos.” These are highly structured, guided, but real release cycles that give participants the chance to learn the technologies and styles of development. And because they’re working on actual software, you’re delivering business value along the way: it’s training and doing.
These sessions also enable the organization to learn the new pace and patterns of cloud native development, as well as set management expectations. As Verizon’s Ross Clanton put it recently:
The purpose of the dojo is learning, and we prioritize that over everything else. That means you have to slow down to speed up. Over the six weeks, they will not speed up. But they will get faster as an outcome of the process.
Scaling up any change to a large organization is mostly done by winning over the trust of people in that organization, from individual contributors, to middle-management, to “leadership.” Because IT has been so untrustworthy for so many decades — how often are projects not only late and over-budget, but then anemic and lame when finally delivered? — the best way to win over that trust is to actually learn by doing and then market that success relentlessly.
Change is hard, but possible, or, It’s the still the case that you should stop hitting yourself
Improving is never easy, and that’s certainly true when it comes to how large organizations improve how they do software. While it can seem like a curse, I’m lucky to talk with people at organizations who are struggling to improve. By the nature of the work Pivotal does, we spend a lot of time talking with organizations who want to be more agile, shift to a DevOps approach, and other wise (to a buzz-phrase) become “cloud native.”
The road to better software is paved with white collar pain. It’s like (as I’m told) when you start working out: it hurts, for, like, several years, and then you sort of start to enjoy it and maybe can live 0.7 years longer.
Let’s look at a couple of those common pains.
The Pain of legacy process (aka “culture”)
Perhaps the most frequent question is something like:
“How do we reconcile [old processes we don’t like] with DevOps [or whatever new way of doing things we now want to do?”
At that point, you ask “does our current process do that”? If not, then you have to get executives to change how the organization runs. There’s no short cuts or easy cuts, you just have to do it over the course of *years*.
In contrast, to “virtualize,” you sort of just install VMware and after a few years you have huge ROI and savings. (Granted, the truth of virtualization is that it ruffled all sorts of feathers in IT departments in the 2000s and people were all up in arms and chickens without heads running around and cats and dogs living together …we just forget all that ;>)
Put another way “you’re going to change and eliminate those processes. GET READY FOR A SURPRISE!”
One of the chief characteristics of large organizations is that you have to convince the organization to actually do anything. We have these visions that executives in large organizations can actually make the trains run on time, as it were. Nope.
Thus, when it comes to change, you have to spend much of your time doing internal marketing to sell it up the chain, to your peers, and to your own organization.
To my mind, the only ways to do internal marketing well is to either (1.) already be successful, or, (2.) get executives at other companies to tell you and your organization How They Did It And That It Worked.
The first is just a recursive noop (“success breeds success” and other such nonsense). Thankfully, when it comes to the second, there’s a lot help now-a-days, primary in the form of there change agents who’ve gone through this themselves.
How did these executives succeed? By actually trying: picking small projects at first, learning the new way, succeeding (and hiding failures), then trying bigger things, and then telling people about it by making money. After all, success breeds success, right?
They also fire, er, “re-allocate” a lot of people, which they don’t talk about a lot in big glitzy keynotes but do over drinks in loud bars.
So, let me direct you to some “actual” people who’ve gone through all this:
Tony McCulley from Home Depot has two good talks walking antiseptically through two years of Home Depot changing how they do software. Check out the first from 2015, then the update from this year. (He’ll be speaking at Gartner’s appdev conference this December if you’re looking for some pre-Holidays jollies.).
There’s many more “talks” that aren’t recorded, you just have to find the right people and sit down with them to chat.
How do we migrate legacy software?
There are no good answers here. This is like someone with terminal lung cancer asking for help on stopping smoking. I suppose that’s gas-lighting…but if “legacy” is what’s holding you back it means you’re not managing technical debt well. Stick all your enterprise architects in a room (maybe even have an open bar!) and gently ask them, “so…what would you say you do around here?”
Updating legacy software is hard. The problem with “lift and shift” (which many vendors like to wrap fancy slides around) is that the “cloud native” benefits you’re looking to get are not only from the platform you run your software on, but from how the software itself is written (and, then, obviously, then, how you manage and operate it in production).
Sure, you could just dump some three tier, MVC, hairball into a WAR file and spoot it out into some container orchestrated cloud thing…but all you’ll new have is a big lever that says “reboot” on it. With brute-migration there won’t be:
All the resiliency advantages of little blue/green man deploys, canary parties, feature flag burnings, bulk-bin heads, etc.,
The ability to to start deploying weekly or even daily to improve how software is done (i.e., “you don’t operate in an agile way”)
Worse: the original problem still isn’t fixed. The next time you need to pay down your technical debt so you can improve/do things in a better way, you’ll still have the same old crap weighing you down, just with a different compression format and file extension.
I think there’s plenty of “hacks” to be had to extend legacy software’s value; that is, not having to spend time and money on updating/refactor/rewriting them). I hear Oracle has some bridge-themed tools if you like your current parking arrangements, and there’s always queues, amirght? You could probably do a lot worse than doing some BCG matrix trust-falls to find your low-priority, little used apps and shipping them off to an MSP, AWS, or one of those data centers that’s sitting there purring like an old cat with crusty eyes and renal issues.
The point of Whatever The New Approach You Want To Do is: when you want to write software in the best way possible, do it the new way, not the way you’ve been doing.
There’s some more instructive help from people like my pals Kenny and Rohit, to be sure. You can find plenty of content like this that speaks to how to start picking away at the scabs of legacy. As with peeling off any scab, it’s important to know that the skin underneath it is healed, or you just re-bleed. That’s probably how you should treat migrating legacy applications.
Like I said: no good answers here, just lots of work and risk of bleeding.
If you’re struggling with stopping hitting yourself, the best next step is to find other people who’re struggling and to talk with them. You then need to “see it to believe it” and, then, really, just start trying. There’s no universal bromide or DVD you can install. There is, however, a way of thinking — a process even — you can apply, namely: learning and slowly changing towards the better.
This post is pretty old and possibly out of date. There’s updates on this topic and more in my book, Monolithic Transformation.
Every journey begins with a single step, they say. What they don’t tell you is that you need to pick your step wisely. And there’s also step two, and three, and then all the n + 1 steps. Picking your initial project is important because you’ll be learning the ropes of a new way of developing and running software, and hopefully of running your business.
Choosing these first projects wisely is also important for internal marketing and momentum purposes: the smell of success is the best deodorant, as they say, so you want your initial projects to be successful. And…if they’re not, you want to quietly sweep them under the rug so no one notices. Few things will ruin the introduction of a new, proven way of operating into a large organization than failure’s foetidly. Following Larman’s Law, the organization will do anything it can — consciously and unconsciously — to stop change. One sign of weakness early, and your cloud journey will be threatened by status quo zombies.
Project picking peccadilloes
Your initial projects should be material to the business, but low risk. They should be small enough that you can quickly show success in the order of months, and also technically feasible for cloud technologies. These shouldn’t be “science projects” or automation of low value office activities: no virtual reality experiments or conference room schedulers (unless those are core to your business). On the other hand, you don’t want to do something too big, like “migrate the .com site.” As Christopher Tretina recounts Comcast’s initial cloud native ambitions:
We started out last year with a very grandiose vision.. And it didn’t take us too long to realize we had bit off a little more than we could choose. So around mid-year, last year, we pivoted and really tried to hone in and focus on ‘what are just the main services we wanted to deploy that’ll get us the most benefit?’
Your initial projects should also allow you to test out the entire software lifecycle, all the way from conception, to coding, to deployment, to running in production. Learning is a key goal of these initial projects and you’ll only do that by going through the full cycle. As Home Depot’s Anthony McCulley describes the applications chosen in the first 6 or so months of their cloud native roll-out: “they were real apps, I would just say that they were just, sort of, scoped in such a way that if there was something wrong it wouldn’t impact an entire business line.” In Home Depot’s case, the applications chosen were projects like managing (and charging for!) late returns for tool rentals and running the in-store custom paint desk.
A special case for initial projects is picking a microservice to deploy. This is not as perfect case as a full-on, human-facing project, but will allow you to test out cloud native principals. The microservice could be something like a fraud detection or address canonicalization service. This is one approach to migrating legacy applications in reverse order, a strangler from within!
Picking projects by portfolio analysis
There are several ways to select your initial projects following the above criteria. Many Pivotal customers use a method perfected over the past 25 years by Pivotal Labs called “discovery.” In the abstract, it follows the usual BCG matrix approach but builds in intentional scrappiness to ensure that you can quickly do a portfolio analysis with the limited time and attention you can secure from all the stakeholders. The goal is to get a ranked list of projects to do based on the organization’s priorities and the “easiness” of the projects.
First, gather all the relevant stakeholders. This should include a mixture of people from “the business” and IT side, as well as the actual team that will be doing the initial projects. This discovery session is typically led by a facilitator, usually a Pivotal Labs person familiar with coaxing a room through this process.
The facilitator will hand out stacks of sticky notes and markers, asking everyone to write down projects that they think are valuable. What “valuable” is will depend on each stakeholder. We’d hope that the more business minded of them would have a list of corporate initiatives and goals in their heads (or a more formal one they brought to the meeting). One approach used in Lean is to ask management “if we could do one thing better, what would it be?” and start from there, maybe with some five why’s spelunking.
The participants then put up their sticky notes in this quadrant, forcing themselves not to weasel out and put the notes on the lines. Once everyone has done this, you get a good sense of projects that all stakeholders think are important, sorted by the criteria I mentioned above: material to the business (“important”) and low risk (“easy”).
If all of the notes are clustered in one quadrant (usually, in the upper right, of course), the facilitator will redo the 2×2 lines to just that quadrant, forcing the decision and narrowing down on just projects to “do now.” The process might repeat itself over several rounds. To force a ranking of projects you might also use techniques like dot voting which will force the participants to really think about how they would prioritize the projects. At the end, you should have a list of projects, ranked by the consensus of the stakeholders in the room.
Like I said: “scrappy.”
Planning out the initial project
Of course, you may want to refine your list even more, but to get moving, the next step is to pick the top project and start breaking down what to do next. How you proceed here is highly dependent on how your product teams break down tasks into stories, iterations, and releases (or epics, sagas, or whatever cutesy terms you like for “bucket of stuff scoped at some hierarchical level with purposefully vague responsibility and temporal connotations”).
This chart measures application instances in Pivotal Cloud Foundry which does not map exactly to a single application. What’s important is the general shape and acceleration of this curve as they became more familiar with the approach and the platform.
Another Pivotal customer in the telco space started with about 10 unique applications at first and expanded to 100 applications just over half a year later. These were production applications used to manage millions of customer account management and billing tasks.
How do you start: simple
It all sounds simple, and that’s part of the point. When learning something new, you want to start as simple as possible, but not simpler.
This post is pretty old and possibly out of date. There’s updates on this topic and more in my book, Monolithic Transformation.
There are a handful of “standard” roles used in typical Agile and DevOps teams. Any application of Agile and DevOps is highly contextual and crafted to fit your organization and goals. This document goes over the typical roles and their responsibilities and discusses how Pivotal often sees companies staffing teams. It draws on “the Pivotal Way,” our approach to creating software for clients, recommendations for staffing the operations roles for Pivotal Cloud Foundry (our cloud platform), and recent recommendations and studies from agile and DevOps literature and research.
I don’t discuss definitions for “Agile” or “DevOps” except to say: organizations that want to improve the quality (both in lack of bugs and “design” quality as in “the software is useful”) and uptime/resilience of their software look to the product development practices of agile software development and DevOps to achieve such results. I see DevOps as inclusive of “Agile” in this discussion, and so will use the terms interchangeably. For more background, see one of my recent presentations on this topic.
Finally, there is no fully correct and stable answer to the question of what roles and responsibilities are in teams like this. They fluctuate constantly as new practices are discovered and new technologies remove past operational constraints, or create new ones! This is currently my best understanding of all this based both organizations that are using Cloud Foundry and those who are using other cloud platforms. As you discover new, better ways, I’d encourage you to reach out to me so we can update this document.
Core Roles for DevOps Teams
There are generally two types of teams we see: “business capabilities teams” who work on the actual software or services (“the application”) in question and “agile operations teams” as seen below:
While not depicted in the diagram above, the amount of staff in each layer dramatically reduces as you go “down” the stack. The most number of people are in the business capability layer working on the actual applications and services, many less individuals are working on creating and customizing capabilities in Pivotal Cloud Foundry (e.g., a service to interact with a reservation or ERP system that is unique to the organization), while many, many less “operators” work at keeping the cloud platform up and running, updated, and handle the hardware and networking issues.
Making the Applications — Business Capability Roles
These teams work on the application or services being delivered. The composition of these teams changes over time as each team “gels,” learning the necessary operations skills to be “DevOps” oriented and master the domain knowledge needed to make good design choices.
These are the core roles in this layer:
Developer/Engineer — those who not only “code,” but gain the operations knowledge needed to support the application in production, with the help of…
Operations — those who work with developers on production needs and help support the application in production. This role may lessen over time as developers become more operations aware, or it may remain a dedicated role.
Product Owner/Product Manager — the “owner” of the application in development that specifies what should be done each iteration, prioritizing the list of “requirements.”
Designer — studies how users interact with the software and systematically designs ways to improve that interaction. For user-facing applications this also includes visual design.”.
These are roles that are not always needed and sometimes be fulfilled partially by shared, but designated to the team staff:
Tester — staff that helps with the effort ensure the software does what was intended and functions properly.
Architect — in large organization, the role or architect is often used to ensure that individual teams are aligning with the larger organization’s goals and strategy, while also acting as a consultative enabler to help teams be successful and share knowledge.
Data scientist — if data analysis is core to the application being developed, getting the right analysis and domain skills will be key.
These are the programmers, or “software developers.” Through the practice of pairing, knowledge is quickly spread amongst developers, ensuring that there are no “empires” built, and addresses the risks of a low “bus factor.”Developers are also encouraged to “rotate” through various roles from front to back-end to get good exposure to all parts of the project. By using a cloud platform, like Pivotal Cloud Foundry, developers are also able to package and deploy code on their own through the continuous integration and continuous delivery tools.
Developers are not expected to be experts at operations concerns, but by relying on the self-service and automation capabilities of cloud platforms do not need to “wait” for operations staff to perform configuration management tasks to deploy applications. Over time, with this reliance on a cloud platform which cleanly specifies how to best build applications so that they’re easily supported in production, developers gain enough operations knowledge to work without dedicated operations support.
The amount of developers on each team is variable, but so far, following the two pizza team rule of thumb, we see anywhere from 1 to 3 pairs, that is 2 to 6, and sometimes more.
In a cloud native mode, until business capabilities teams have learned the necessary skills to operate applications on their own, they will need operations support. This support will come in the form of understanding (and co-learning!) how the cloud platform works, and assistance troubleshooting applications in production. Early on you should plan to have heavy operations involvement tohelp collaboration with developers and share knowledge, mostly around getting the best from the cloud platform in place. You may need to “assign” operations staff to the team at the beginning, making them so called designated operations staff instead of dedicated, as explained in Effective DevOps.
Many teams find that the operations role never leaves the team, which is perfectly normal. Indeed, the desired end-state is that the application teams have all the development and operations skills and knowledge needed to be successful.
As two cases to guide your understanding of this role:
Etsy has approximately 15 operations engineers to a few hundred other engineers, according to Effective DevOps.
As re-told by Diego Lapiduz at 18F, early on when teams were learning how to use and operate cloud.gov, he and a handful of other operations staff spent much of their time with development teams, getting intimately familiar with each application. Now, because the practice of designated operations staff is less needed, he and his operations peers are less involved and have little knowledge of the applications in use…which is good, and as intended!
This is the role that defines and guides the requirements of the application. It is also one of the roles that varies in responsibilities the most across products. At its core, this role is the “owner” of the software under development. In that respect, they help prioritize, plan, and deliver software that meets your requirements. Someone has to be “the final word” on what happens in high functioning teams like this. The amount of control vs. consensus driven management is the main point of variability in this role, plus the topic areas that the product owner must be knowledge of.
It’s best to approach the product owner as a “breadth first role”: they have to understand the business, the customer, and the technical capabilities. This broad knowledge helps them make sure they’re making the right prioritization decisions.
In Pivotal Labs engagements, this role is often performed by a Pivotal employee, pairing up with one of your staff to train and transfer knowledge. Whether during a project or through workshops, they help you understand the Pivotal, iterative development approach, as well as mentor and train your internal staff to help them learn agile methods and skills, which will enable them to move on with confidence when the engagement is complete.
One of the major lessons of contemporary software is that design matters, a tremendous amount more than previously believed. The “small batch” mentality of learning and improving software afforded by cloud platforms like Pivotal Cloud Foundry gives you the ability to design more rapidly and with more data-driven precision than ever before. Hence, the role of a designer is core to cloud native teams.
The designer focuses on identifying the feature set for the application and translating that to a user experience for the development team. Activities may include completing the information architecture, user flows, wireframes, visual design, and high-fidelity mock-ups and style guides. Most importantly, designers have to “get out of the building” and not only see what actual users are doing with the software, but get to know those users and their needs intimately.
While the product manager, and overall team are charged with testing their software, some organizations either want or need additional testing. Often this is “exploratory testing” where a third party (the tester[s]) are trying to systematically find the edge cases and other “bugs” the development team didn’t think of.
It’s worth questioning the need for separate testers if you find yourself in that situation to make sure you need them. Much routine “QA” is now automated (and can, thus, be done by the team and automated CI/CD pipelines), but you may want exploratory, manual testing in addition to what the team is already doing and verification that the software does as promised and functions under acceptable duress. But even that can be automated in some situations as the Chaos Monkey and Chaos Lemur show.
Traditionally, this role is responsible for conducting enterprise analysis, design, planning, and implementation, using a “big picture” approach to ensure a successful development and execution of strategy. Those goals can still exist in many large organizations, but the role of an architect is evolving to be an enabler for more self-sufficient, and decoupled teams. Too often this role has become a “Dr. No” in most large organizations, so care must be taken to ensure that the architect supports the team, not the other way around.
Architects are typically more senior technical staff who are “domain experts.” They may also be more technically astute and in a consultative way help ensure the long-term quality and flexibility of the software that the team creates, share best practices with teams, and otherwise enables the teams to be successful. As such, this role may be a fully dedicated one who, hopefully, still spends much of their time coding so as not to “go soft” and lose not only the trust of developers but an intimate enough knowledge of technology to know what’s possible and not possible in contemporary software.
Data Science (partial/optional)
If your application includes a large amount of data analysis, you should consider including a data scientist role on the team. This role can follow the dedicated/designated pattern as discussed with the operations role above.
Data Science today is where design might have been a few years ago. It is not considered to be a primary role within a product team, but more and more products today are introducing a level of insight not seen before. Google Now surfaces contextual information; SwiftKey offers word predictions based on swipe patterns; Netflix offers recommendations based on what other people are watching; and Uber offers predictive arrival times of their drivers. These features help turn transactional products into smart product.
There are many other roles that can and do exist in IT organizations. These are roles like database administrators (DBAs), security operations, network operations, or storage operations. In general, as with any “tool,” you should use what you need when you need it. However, as with the architect role above, any role must reorient itself to enabling the core teams rather than “governing” them. As the DevOps community has discussed at length for nearly ten years, the more you divide up your staffing by function, the further you move from a small, integrated team, and your goal of consistently and regularly building quality software will become harder.
Agile Operations Roles
Roles here focus on operating, supporting, and extending the cloud platform in use. These roles typically sit “under” the Agile and DevOps teams, so the discussion here is briefer. Each role is described in term of roles and responsibilities typically encountered in Pivotal Cloud Foundry installs. These can vary by organization and deployment (public vs. private cloud, the need for multi-cloud support, types of IaaS used, etc.)
These are typically the “operations” people described above and serve as a supporting and oversight function to the business capabilities teams, whether designated or dedicated. Typical responsibilities are:
Manages lifecycle and release management processes for apps running in Pivotal Cloud Foundry.
Responsible for the continuous delivery process to build, deploy, and promote Pivotal Cloud Foundry applications.
Ensures apps have automated functional tests that are used by the continuous delivery process to determine successful deployment and operation of applications.
Ensures monitoring of applications is configured and have rules / alerts for routine and exceptional application conditions.
Acts as second level support for applications, triaging issues, and disseminating them to the platform operator, platform developer or application developer as required.
A highly related, sometimes overlapping, role is that of centralized development tool providers. This role creates, sources, and manages the tools used by developers all the way from commonly used libraries to, version control and project management tools, to maintaining custom written frameworks. Companies like Netflix maintain “tools teams” like this, often open sourcing projects and practices they develop.
The is the typical “sys admin” for the cloud platform itself:
Manages IaaS infrastructure that Pivotal Cloud Foundry is deployed to, or co-ordinates with the team that does.
Installs and configures Pivotal Cloud Foundry.
Performs capacity, availability, issue, and change management processes for Pivotal Cloud Foundry.
Scales Pivotal Cloud Foundry, forecasting, adding, and removing IaaS and physical capacity as required.
Upgrades Pivotal Cloud Foundry.
Ensures management and monitoring tools are integrated with Pivotal Cloud Foundry and have rules / alerts for routine and exceptional operations conditions.
This team and its roles are responsible for extending the capabilities of the cloud platform in use. What this role does per organization can vary, but common tasks of this role for organizations using Pivotal Cloud Foundry are to:
Makes enhancements to existing buildpack(s) and builds new buildpack(s) for the platform. Builds service broker(s) to manage lifecycle of external resources and make them available to Pivotal Cloud Foundry apps.
Build Pivotal Cloud Foundry tiles with associated BOSH releases and service brokers to enable managed services in Pivotal Cloud Foundry.
Manages release and promotion process for buildpacks, service brokers, and tiles across Pivotal Cloud Foundry deployment topology.
Integrates Pivotal Cloud Foundry APIs with external tool(s) when required.
Physical Infrastructure Operations
While not commonly covered in this type of discussion, someone has to maintain the hardware and data centers. In a cloud native organization this function is typically so highly abstracted and automated — if not outsourced to a service provider or public cloud altogether — that it it does not often play a major role on cloud native operations. However, especially at first as your organization is transforming this new way of operating, you will need to work with physical infrastructure operations staff, whether in-house or with your outsourcer.
Supporting Best Practices
The use of a cloud platform
Much of the above is predicated on the use of a cloud platform. A cloud platform is a system, such as Pivotal Cloud Foundry, that provides the runtime and production needs to support applications on-top of cloud infrastructure (public and private IaaS), and often, as in the case of Pivotal Cloud Foundry, with fully integrated middleware services that are natively supported (such as databases, queues, and application development frameworks).
Two of the key ways of illustrating the efficiency of using Pivotal Cloud Foundry are (a.) the ratios of operators to application running in the platform, and, (b.) reduction in lead time. Here are some relevant metrics for Pivotal Cloud Foundry:
One large financial institution is currently supporting 145 applications with two operations staff.
When planning the first product developed on Pivotal Cloud Foundry, CoreLogic allocated a team of 12 developers with four QA staff and a management team. The goal was to deliver the product in 14 months. Instead, the project ended up requiring only a product manager, one user experience designer and six engineers who delivered the desired product in just six months. As the second, third, and next projects roll on, you can expect delivery time to reduce even more.
A large retail user of Pivotal Cloud Foundry was running over 250 applications in non-production and nearly 100 applications in production after only 4 months using the platform.
Without a cloud platform like Pivotal Cloud Foundry in place, organizations will find it difficult, if not impossible, to achieve the infrastructure efficiencies in both development and at runtime needed to operate at full speed and effectiveness.
For reference, the below diagram based on Pivotal Cloud Foundry, is a “white board” of the functions a cloud platform provides:
While some sizing guidelines have been listed throughout, there are no hard and fast rules. The notion a “two pizza team” is a well accepted theory of software team sizes. As Amazon CEO Jeff Bezos is said to have decreed: if a team couldn’t be fed with two pizzas, it’s too big. Without making too many assumptions about the size of a “large” pizza or how many slices each individual needs to eat, you could estimate this to around six to fifteen people. This may vary, of course, but the point is to keep things as small as possible to minimize communication overhead, context switching, and responsibility evasion due to “that’s not my job,” among other “wastes.”
The assumption is teams this small is that many of these small teams will organize to create big, overall results. Architectural approaches like microservices that emphasize loose coupling and defensive use of services help enable that coordination at a technical level, while making sure that teams are aware of and aligned to overall organizational goals help coordinate the “meatware.” This coordination is difficult, to be sure, but start looking at your organization a set of autonomous units working together, rather than one giant team. As an example, Pivotal Cloud Foundry engineering is composed of around 300 developers spread over 40 loosely coupled teams.
Dedicated, integrated teams lead to the best outcomes
Agile-think and DevOps teams seek to put every role and, thus, person, needed to deliver the software product from inception to running in production on the same team, with their time fully dedicated to that task. These are often called “integrated teams” or “balanced teams.”
IT has long organized itself in the opposite way, separating out roles (and, thus people) into distinct teams like operators, QA, developers, and DBAs in the hopes of optimizing the usage of these roles. This is the so called “silo” approach. When you’re operating in the more exploratory, rapid, agile cloud native fashion, this attempt at “local optimization” creates “global optimization” problems due to hand-offs between teams (leading to wastage from time and communication errors). A silo approach can also poor people interactions across team which damages the software quality and availability. “That’s not my responsibility” often leads it to being no one’s responsibility.
Most organizations have a “project” mindset when it comes to software as well. The functional roles emerge from their silos as needed to work on the project, and then disband once done. Good software development benefits from a “product” mindset where the team, instead, stays dedicated to the product.
Operating in this way, of course, is not always possible, if only at first as organizations switch over their mind-set from “siloed” teams to dedicated teams. So care must be taken when slicing a person up between different teams: intuitively we understand that people cannot truly “multitask,” and numerous studies have shown the high cost of context switching and spreading a person’s attention across many different projects. In discussing roles, keep in mind that the further you are from truly integrated, dedicated teams, the less efficiency and productivity you’ll get from people on those teams, and, thus, in the project as a whole.
As described in the introduction, this is a quick overview of the roles and responsibilities for Agile and DevOps teams. The book Effective DevOpscontains much discussion of these topics and background on studies and proof-points. For a slightly more “enterprise-y” take, see the concepts in disciplined agile delivery which should be read as slightly “pre-cloud native” but nonetheless helpful.
Additionally, Pivotal Labs, and the Pivotal Customer Success and Transformation teams can discuss these topics further and help transform your organization accordingly. With over twenty years of experience and as the maintainers of Pivotal Cloud Foundry, one of the leading cloud platforms, we have been learning and perfecting these methods for sometime.
In contrast to the way traditional organizations operate, cloud native enterprises are typically comprised of self-motivated and directed teams. This reduces the amount of time it takes to make decisions, deploy code, and see if the results helped move the needle. More than just focusing on speed of decision making and execution, building up these intrinsically motivatedteams helps spark creativity, fueling innovative thinking. Instead of being motivated by metrics like number of tickets closed or number of stories/points implemented, these teams are motivated by ideas like increased customer satisfaction with faster forms processing or helping people track and improve their health.
In my experience, one of the first steps in shifting how your people think — from the Pavlovian KPI bell — is to clearly explain your strategy and corporate principals. Having worked in strategy roles in the past, I’ve learned how poorly companies articulate their goals, constraints, and strategy. While there are many beautiful (and not so beautiful!) presentations that extol a company’s top motivations and finely worded strategies, most employees are left wondering how they can help day to day.
Whether you’re a leader or an individual contributor, you need to know the actionable details of the overall strategy and goals. Knowing your company’s strategy, and how it will implement that strategy, is not only necessary for breeding self-motivated and self-managed people, but it also comes in handy when you’re looking to apply agile and lean principles to your overall continuous delivery pipeline. Tactically, this means taking the time to create detailed maps of how your company executes its strategy. For example, you might do value-stream mapping, alignment maps, or something like value chain mapping. This is a case where I, again, find that companies have much less than they think they do — often, organizations have piles of diagrams and documents — but — very rarely have something that illustrates everything — from having an idea for a feature to getting it in customer’s hands. A cloud native enterprise will always seek to be learning from and improving that end-to-end process. So, it’s required to map your entire business process out and have everyone understand it. Then, we all know how we deliver value.
The process of value-stream mapping, for example, can be valuable for simply finding waste (so much so that the authors of Learning to See cite only focusing on waste removal as a lean anti-pattern). As one anecdote goes — after working for several weeks to finally get everything up on the big white board, one of the executives looked up at all the steps and time wasted in their process to get software out the door and said, “there’s a whole lot of stupid up there.” The goal of these exercises is not only removing stupid, but also to focus on and improve the processes.