What is a service mesh? Why do you need a service mesh? And which is the best service mesh?

Sep 19, 2023

The infrastructure drives the app architecture

A cloud native application is typically designed as a bunch of little components that coordinate with each other over a network. They may use events instead, and while that isn’t the same as point-to-point network communications, it follows the same idea: you have a bunch of independent-ish bundles of code that work together, as needed, instead of just one big chunk of code that does all the work. This is, you know, a distributed application. “Message passing” is one of the dreams of object oriented programming and Internet apps.1

Microservices!

Why you use a service mesh

Anyhow, if you’re doing all of that, you need a way to manage all that network traffic. Each little bit of code has to know how to contact the other bits of code and work with them - so-called “east-west traffic.”2 You need a registry that catalogs all those bits of code. You need to know information about each chunk of code: the version, how to connect to it, how to authenticate with it. You need to somehow make a call over the network, that is, get a network connection. You want it to be secure and encrypted, like, always nowadays (I don’t really know what mTLS is, but EBC decks are fucking rife with it, so it must be great). And then the people running that network want to manage it: if some chunks of code are too chatty and filling up your series of tubes with too much crap, you want to throttle them. You want to gather metrics about your series of tubes and the messages sent down them. You know: network management. And, when you’re using it with Kubernetes, you want it to all think like and work with Kubernetes: how you configure and deploy it (yaml!), how configuration is rolled out and drift is handled. Etc. Etc. (Check out Ivan McPhee’s service mesh overview for a lot more details and the vendors in the space.)
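To make that list a little less hand-wavy, here is a toy sketch in Python of three of those jobs: the registry, the throttling, and the metrics. Everything in it is made up for illustration (`ServiceRecord`, `ToyMesh`, `route`, the address, the limit of two calls); it is not the API of Istio, Linkerd, Consul, or any other real mesh, which do this work in proxies and a control plane rather than in a Python dict.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    """What a caller needs to know about one chunk of code (hypothetical shape)."""
    name: str
    version: str
    address: str          # how to connect to it
    requires_mtls: bool   # how to authenticate with it


@dataclass
class ToyMesh:
    """In-memory stand-in for a mesh: registry + throttling + metrics."""
    max_calls_per_window: int = 3
    registry: dict = field(default_factory=dict)       # name -> ServiceRecord
    window_counts: dict = field(default_factory=dict)  # (src, dst) -> calls this window
    total_calls: dict = field(default_factory=dict)    # (src, dst) -> calls allowed, ever
    throttled: dict = field(default_factory=dict)      # (src, dst) -> calls refused, ever

    def register(self, record):
        """Catalog a service so other services can find it by name."""
        self.registry[record.name] = record

    def route(self, source, destination):
        """Look up the destination, enforce the rate limit, record metrics.

        Returns the ServiceRecord if the call is allowed, None if throttled.
        """
        record = self.registry.get(destination)
        if record is None:
            raise LookupError(f"no service registered as {destination!r}")
        key = (source, destination)
        if self.window_counts.get(key, 0) >= self.max_calls_per_window:
            # Too chatty: refuse the call and count the refusal.
            self.throttled[key] = self.throttled.get(key, 0) + 1
            return None
        self.window_counts[key] = self.window_counts.get(key, 0) + 1
        self.total_calls[key] = self.total_calls.get(key, 0) + 1
        return record

    def reset_window(self):
        """Start a new rate-limit window; lifetime metrics are kept."""
        self.window_counts.clear()


# Usage: "cart" calls "billing" three times with a limit of two per window.
mesh = ToyMesh(max_calls_per_window=2)
mesh.register(ServiceRecord("billing", "1.4.2", "10.0.0.7:8443", True))
results = [mesh.route("cart", "billing") for _ in range(3)]
print([r.version if r else "throttled" for r in results])
```

The point of the sketch is only that the caller asks for “billing” by name and something in the middle decides where that goes, whether it is allowed, and how to count it. Encryption, retries, and the Kubernetes-style configuration are left out entirely.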

What drives me bonkers about this is that, like, this is what the Internet does. Why don’t we just use Internet primitives to do all of this? Why do we need to layer a whole new network management layer on top of all the layers? Even more maddening, when you go up the stack into the application layer: the developers there have written all of their own stuff that handles all this functionality. You look at something like the projects in Spring Cloud and they’re, you know, doing all of this too. I’ve started to think that each of these layers happens because the people in the layers above you don’t want to talk with the network admins.

Anyhow, back to service meshes. They are handy! They do important things! For example, they help you run your applications across multiple clouds and Kubernetes clusters (is that the right phrasing?), add in customized layers of security, and so forth. Big ol’ enterprises need all of this. I mean, everyone does.

So, what’s up with the whole category of service mesh? Well, Gartner is not so hot on it:

The hype around service mesh software has mostly settled down, and the market has not grown as much as was once anticipated. This raises questions about the usefulness and ROI of service meshes for most organizations. “Market Guide for Service Mesh,” August 2nd, 2023, Gartner.

The report notes that service meshes are used outside of Kubernetes as well. It’s like a whole new marbling of a layer around and inside your existing layers, be they VMs or containers. Yay…? Ivan’s take is a little less dire, simply urging you to take it slow before choosing which service mesh to use:

Avoid adopting a service mesh based purely on consumer trends, industry hype, or widespread adoption. Instead, take the time to understand the problem you’re trying to solve. Explore the potential tradeoffs in terms of performance and resource consumption. Evaluate your support requirements against your in-house resources and skills (many open-source service meshes rely on community support). Once you’ve created a short list, choose a service mesh—and microservices-based application development partner—that works best with your software stack. Ivan McPhee, GigaOm, August 2023.

Filling in the gaps

When I first heard about the notion of a service mesh long ago, my first reaction was basically “wait, I thought Kubernetes already did that?” This was the first in a long series of that reaction over the years. It turns out Kubernetes didn’t do a lot of the things I assumed it did. This was an instance of confusing outcomes with capabilities: for all the praise Kubernetes gets for improving operations and developer productivity, I’d assumed it, like, had those capabilities. But, in fact, many of the outcomes Kubernetes achieves are done by layering in all sorts of other projects, products, and ways of working.3 Ivan’s report does a good job cataloging all those capabilities: your eyes can start to glaze over after a while, so be sure to read the vendor profiles in reverse alphabetical order!

So, you need a service mesh to get all of that basic, distributed app functionality. This is fine! That’s how Kubernetes was designed, whether the overall community over the years treated it as such or not: “platform for building platforms,” “a life of its own,” and all that.

That Gartner report identifies a key trend in the ongoing rollout of Kubernetes. People don’t want to pay for things, and this leads to a lot of unplanned-for work on their part to integrate all the free components together and deal with them:

The current service mesh market is largely dominated by open-source offerings such as Consul, Istio and Linkerd. However, Gartner client inquiries about service meshes consistently show open-source service meshes suffer from difficulty of use, and a lack of sufficient skills for effective engineering, administration and operational upkeep. The lack of mature DevOps practices can increase the operational burden. These challenges substantially increase as the number of deployed container pods and services grows exponentially, especially in a multicloud environment.

Hey, you get what you pay for. For vendors, this does mean one important product management and strategy decision: you need an easy to download, easy to get up and running, and totally free on-ramp to your paid-for product. I mean: that’s just late 2000s open core and early public cloud basics, right?

That Gartner report is good reading if you have access to it.

On your radar

I’m guessing you don’t have access to Gartner, so you’ll probably be interested in this GigaOm report from Ivan McPhee (have I referred to it already here yet?), which you can read thanks to my employer VMware. It’s equally good, though not as strident. Here is their radar:

“The placement towards the center of their radar recognizes our innovation and maturity as well as spotlights the forward-thinking integration strategy VMware embodies,” Darin Zook.

We also discussed the service mesh concept and space on last week’s Tanzu Talk podcast (as a podcast or in video form). Also, check out this interview about service meshes on our podcast from July of this year.

Relative to your interests

Second Wave DevOps - The tools keep changing: “Let’s face facts: our implementation is what’s letting us down. What worked for John and Paul in 2009 is, in broad strokes, exactly what we have been asking every single DevOps practitioner to do since. We’ve replaced all the individual tools in the system multiple times (look at the CNCF Cloud Native Landscape for the evidence): less automated infrastructure, more infrastructure as code; less monitoring, more observability; less data centers, more cloud; less svn, more git; less virtual machines, more docker; less capistrano, more kubernetes; less hudson, more github actions. The problem isn’t that we haven’t optimized each individual part of the system enough. We’ve built more efficient tooling at every step. But the way the whole system is put together? The experience of using it? That’s basically identical to how it was in 2009, and it’s the reason we’re stuck.” There are two fronts to the “DevOps is dead” rhetorical war now: one from the platform engineering crowd and one from the faction within the DevOps crowd itself.
Did I Make a Mistake Selling Del.icio.us to Yahoo? - Plan to never get past slide one: “Any decision was an endless discussion. I remember once, we had to present to a senior vice-president. We had a 105-slide deck prepared, and we didn’t get past the second slide because they ratholed about one fucking slide. It was a miserable environment.”
iOS 17 release: everything you need to know about Apple’s big updates - A concise list. The journaling app comes out later this year.

Survey: Majority of US Workers Are Already Using Generative AI Tools, But Company Policies Trail Behind - “The new survey finds that 56 percent of workers are using generative AI on the job, with nearly 1 in 10 employing the technology on a daily basis. Yet just 26 percent of respondents say their organization has a policy related to the use of generative AI, with another 23 percent reporting such a policy is under development.”

Logoff

I was at SHIFT in Zadar, Croatia this week. I presented my platform talk on a huge stage! This was an arena and the stage was in the center, so you were surrounded by the audience. That’s not normal: usually, the audience is all in front of you. When I’m presenting, I tend to pick out three or five people in the audience that I look at. You, of course, want to pick out people who are smiling and paying attention to you. They give you energy, and also help you figure out if your approach and content are working. In this case, I forced myself to circle around the stage, finding those people in all directions.

If you find yourself “in the round” like this, try to move around so that you can find more of those positive vibe people.

Also, the morning of I had some kind of anxiety attack. You know, the kind where there’s nothing to actually worry about and yet it feels like there’s everything to worry about. It wasn’t about speaking at all. In fact, I was looking forward to finally getting up there because I knew it’d drive out that general panic attack thing. And, it worked! Public speaking is a safe, calming space for me.

Man: I sound so old! Smalltalk - blerg, blerg!

Chris: “I know, let’s call it ‘east-west communication!’ - now let’s get to lunch.” Avery: “Hey, Chris. You know that the whole rest of the (western) world always starts with ‘west’ then goes to ‘east,’ like, imitating the way we read, left to right?” Chris: “fuck you, Avery! We need to get to Chuy’s before the line is too long!” Avery: “…er…Tufte…?”

As ever with ways of working, I’m always left wondering “have you tried just working that new way without a major swap out of a new technology?”

Coté