This is an excerpt from a new blog post of mine covering a recent panel with platform engineers from Charles Schwab. I’ve made some slight changes.
Ensuring platform scalability and resilience starts with making sure applications are architected appropriately. Nowadays, that usually means using a cloud native architecture.
The guidelines for creating cloud native applications are well known and proven. “Generally speaking, follow the 12 factor app pattern, have a stateless application, and deploy it as a microservice to PCF, that is our guidance,” Anis says. “It’s pretty simple.” When he says “PCF,” he’s using the older name (Pivotal Cloud Foundry) for the Tanzu Platform for Cloud Foundry, which is built to support cloud native applications.
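To make “12 factor” and “stateless” a little more concrete, here is a minimal Python sketch of two of the ideas: configuration comes from the environment, and the request handler keeps nothing in memory between calls. This is my illustration, not Schwab’s code. `PORT` is the variable Cloud Foundry sets for the listening port, but `DATABASE_URL`, `Settings`, `load_settings`, and `handle_request` are names I made up for the example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Configuration read from the environment, never hard-coded."""
    port: int
    database_url: str


def load_settings(env=None):
    """Build Settings from environment variables.

    DATABASE_URL is a hypothetical variable name used for illustration.
    """
    env = os.environ if env is None else env
    return Settings(
        port=int(env.get("PORT", "8080")),
        database_url=env["DATABASE_URL"],
    )


def handle_request(settings, user_id):
    """A stateless handler: the result depends only on its arguments.

    Nothing is cached in module-level variables, so any instance of the
    app can answer any request, and instances can be added or removed.
    """
    return {"user_id": user_id, "served_on_port": settings.port}
```

Because the handler holds no state of its own, the platform can run two instances or two hundred without the app needing to know.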
Establishing and enforcing that style of development is a role for the platform engineering group that I think is under-appreciated: they need to specify which types of application architecture work on the platform. It’s tempting to think in the more traditional way, where the operations staff have to support whatever variety of application architectures comes their way. And, in large organizations built up through years of acquisitions, this is an inescapable reality for parts of their app portfolio.
But, when the platform team can drive consistency in application architecture, they can start to make promises about resilience, reliability, supportability, scalability, and the other “ilities.” A platform engineering team that specifies what types of architectures the platform supports is putting in place a contract. “Write your applications this way, and we can ensure that they run well in production.” Site Reliability Engineering thinking brought this idea of “contracts” into enterprise operations, and the Schwab team talks about that way of thinking frequently.
That “contract” extends beyond the app architecture. One example is the continuous integration and continuous deployment (CI/CD) pipeline. In contrast to traditional approaches where individual developers or development teams create their own build pipelines, many platform teams standardize the CI/CD pipeline. This allows platform teams to control how applications are built, configured, and ultimately deployed. For Charles Schwab, this kind of thinking is key. As Rajesh puts it, “today anything which goes to the platform is via automation.” That automation lets the team control app configuration and put in controls for things like quota, security groups, and other operations configuration.
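Here is a rough sketch of what one of those pipeline controls could look like: a check that runs before anything is deployed and rejects requests that exceed quota or ask for a security group the platform team has not approved. Everything in it (`PlatformPolicy`, `DeployRequest`, `check_request`, the field names, and the limits) is hypothetical and mine, not something the Schwab team described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformPolicy:
    """Limits the platform team sets once, for every app (hypothetical)."""
    max_memory_mb: int
    max_instances: int
    allowed_security_groups: frozenset


@dataclass(frozen=True)
class DeployRequest:
    """What an app team asks the pipeline to deploy (hypothetical)."""
    app_name: str
    memory_mb: int
    instances: int
    security_groups: tuple = ()


def check_request(request, policy):
    """Return a list of policy violations; an empty list means deployable."""
    violations = []
    if request.memory_mb > policy.max_memory_mb:
        violations.append(
            f"memory {request.memory_mb}MB exceeds quota {policy.max_memory_mb}MB"
        )
    if request.instances > policy.max_instances:
        violations.append(
            f"{request.instances} instances exceeds limit {policy.max_instances}"
        )
    unknown = sorted(set(request.security_groups) - policy.allowed_security_groups)
    if unknown:
        violations.append(f"security groups not allowed: {', '.join(unknown)}")
    return violations
```

The point is less the specific checks than where they live: in one pipeline owned by the platform team, instead of in hundreds of hand-built ones.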
To me, what the Charles Schwab team is doing is making sure they have the controls in place to scale how they manage all those applications. This removes a burden from the application developers, and also allows the platform team to manage the apps in production. Introducing this consistency comes in handy when the team needs to scale applications. “If you want to sync your apps, you don’t have to reach out to the hundreds of application owners to do the deployments at a platform level,” Rajesh says.
One of the fundamental principles of Cloud Foundry is that developers should not build and package the containers for their applications. Instead, developers use buildpacks to specify how their applications should be built and containerized. This allows the platform team to control and automate those application builds. In another talk from Explore, Scott Rosenberg from TeraSky gave a great overview of why this principle is a good idea. A lot of the benefits of using buildpacks are focused on security, but there are basic operations benefits as well.
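If you haven’t used buildpacks, the basic idea is that the platform looks at what the developer pushed and decides how to build it. The Python sketch below is a toy version of that detection step, under my own assumptions: real buildpacks each run their own detection logic, and the marker files and the names `DETECTORS` and `detect_buildpack` are illustrative, not the actual rules any buildpack uses.

```python
# Toy detection table: buildpack name -> files that suggest it applies.
# The marker files are illustrative, not the exact rules of real buildpacks.
DETECTORS = (
    ("python", ("requirements.txt", "Pipfile")),
    ("nodejs", ("package.json",)),
    ("go", ("go.mod",)),
)


def detect_buildpack(filenames):
    """Return the first buildpack whose marker file is present, else None."""
    present = set(filenames)
    for name, markers in DETECTORS:
        if present.intersection(markers):
            return name
    return None
```

The developer only supplies source code. The platform team owns the table and the build steps behind it, which is what lets them patch and rebuild apps centrally.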
The Schwab panel was a great talk, covering lots of platform engineering topics. One you’ll also want to check out is the team structure they use. Instead of just one platform engineering team, they have two: a developer-facing one and an infrastructure-facing one. The first has all the feels of standard platform engineering, the second is more SRE vibe-y. This is the division they’ve learned over the past seven or so years, so, you know: it’s probably valid. Read the rest of the blog post, and definitely check out the recording of the panel.
Enterprise Philosophy and The First Wave of AI - AI is (too) expensive for what it does, and thus far clunky, so it will need to start in the enterprise space where companies can get good ROI.
IBM AI simply not up to the job of replacing staff - If the AI tools don’t work well, it’s hard to get good results. Also, the enterprise AI coding assistant needs to know a lot more than PHP.
Google Cloud rolls out new Gemini models, AI agents, customer engagement suite - Big round-up of new Gemini stuff: “agents.”
Gemini at Work 2024: How customers use Google Cloud AI products - More on the “agent” metaphor/thought technology Gemini is using. Plus, a really great list of one-liner, enterprise AI use cases. See this even longer list.
Most mainframe application rewrites fail the first time - “Refactoring mainframe applications commonly results in failure on the first try, according to a Forrester survey of over 300 IT professionals commissioned by Rocket Software.” And, from the survey sponsor: ‘“Starting from scratch and rewriting the apps rarely goes well,” Buckellew said. “It can lead to massive cost overruns and it can take years. When you’re in a long rewrite project – that’s when bad things happen and projects get canceled.”’ // As always, there’s bias in a vendor sponsored survey, but you know, seems like it’d be true. It’s software after all.
How to Keep Learning at Work — Even When You Feel Fried - Enterprise Mindfulness: find out what you want to focus on and personally find valuable, then use your own motivation to structure work that fits your own goals. Also: try to have less toil/cognitive load. // This also falls into the biz-self-help bucket of “the default answer is ‘no,’ unless you can make a personal business case for ‘yes.’”
3 Key Practices for Perfecting Cloud Native Architecture - Some notes on cloud native architecture and app patterns. That is, apps you want to run in container platforms like Cloud Foundry or Kubernetes.
Beyond Infrastructure as Code: System Initiative Goes Live - “In practice, as Jacob has pointed out, this has led to unwieldy, hard-to-update and difficult-to-understand systems built on static definitions. The tools are tightly tied to a version control, making them brittle and difficult to work with. And only elite companies, such as Google, can deploy multiple times in a day with this approach as Jacob (and others) have argued.” // Hey, man, if it works and it handles 60% to 80% of the configuration management goop out there, it’ll be great.
Talks I’m giving, places I’ll be, and other plans.
Cloud Foundry Day EU, Karlsruhe, Oct 9th. VMware Explore Barcelona, speaking, Nov 4th to 7th. GoTech World, speaking, Bucharest, Nov 12th and 13th. SREday Amsterdam, speaking, Nov 21st, 2024.
Discounts! SREday Amsterdam: 20% off with the code SRE20DAY. Cloud Foundry Day: 20% off with the code CFEU24VMW20.
My friend Whitney Lee and I are rebooting the Software Defined Interviews podcast. We’ve been circling around starting a podcast together for a couple of years now, and I think we’ve finally achieved escape velocity. We’ll start with me interviewing her. She’s got one of the most interesting stories of anyone in tech, and she’s one of a kind. Then we’ll get on to all sorts of people. My filter is interviewing people I want to talk with but either haven’t in the comfortable podcast format or have never talked with. The two of us will make a good podcast vibe, I think. Anyhow, if you’re not already subscribed to it, look it up and subscribe. I’m hoping we get something out soon, maybe even next week.