Axway Blog

Microgateway Blog Series, Part 3: Whodunit?


When things break in a microservice architecture, determining the root cause is challenging: where did it fail, and why? So many things can go wrong that there are many places to look. Performing root cause analysis in a microservice environment has been referred to, tongue in cheek, as “solving a murder mystery.”

Imagine that all of your microservices are running in production when you are suddenly awoken by a frantic voice: “All of our orders are being rejected and no one knows why!” It’s time to roll out of bed and grab your pipe and magnifying glass, because you’re about to start playing detective.

Now imagine that you’re playing a game of Cluedo (known as Clue in North America). Cluedo/Clue is a murder mystery game. The object is to determine who murdered the game’s victim, “Dr. Black” (“Mr. Boddy” in Clue), where the crime took place, and which weapon was used. You win by answering three questions: who, where, and how.

In Clue, the following items are at play to determine the solution:

Where: 9 rooms (Kitchen, Ballroom, Conservatory, Dining Room, Billiard Room, Library, Lounge, Hall, Study)

How: 6 weapons (candlestick, knife, lead pipe, revolver, rope, and spanner/wrench)

Who: 10 suspects (the original six: Colonel Mustard, Miss Scarlet, Mr. Green, Mrs. Peacock, Mrs. White, and Professor Plum; plus the newer suspects M. Brunette, Madam Rose, Sgt. Gray, and Miss Peach).

So, with 9 rooms, 6 weapons, and 10 suspects, there are 9 × 6 × 10 = 540 possible solutions!
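The count above is simply the size of the Cartesian product of the three card categories. This short Python sketch enumerates that product and shows how ruling out cards, as players do during the game, shrinks the set of possible answers. The function name `candidate_solutions` is our own, purely for illustration.

```python
from itertools import product

# The three card categories, exactly as listed above.
ROOMS = ["Kitchen", "Ballroom", "Conservatory", "Dining Room", "Billiard Room",
         "Library", "Lounge", "Hall", "Study"]
WEAPONS = ["candlestick", "knife", "lead pipe", "revolver", "rope", "spanner/wrench"]
SUSPECTS = ["Colonel Mustard", "Miss Scarlet", "Mr. Green", "Mrs. Peacock",
            "Mrs. White", "Professor Plum", "M. Brunette", "Madam Rose",
            "Sgt. Gray", "Miss Peach"]


def candidate_solutions(ruled_out=frozenset()):
    """Return every (who, where, how) triple that contains no eliminated card."""
    return [
        (who, where, how)
        for who, where, how in product(SUSPECTS, ROOMS, WEAPONS)
        if who not in ruled_out and where not in ruled_out and how not in ruled_out
    ]


if __name__ == "__main__":
    print(len(candidate_solutions()))  # 540
    # Seeing the Kitchen and rope cards rules out one room and one weapon:
    # 8 rooms * 5 weapons * 10 suspects.
    print(len(candidate_solutions({"Kitchen", "rope"})))  # 400
```

Every card a detective sees removes a whole slice of the product, which is why the game is winnable by elimination.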

With a microservice deployment, the complexity is far greater. Let’s investigate…

Where: A microservice deployment, consisting of small, isolated services, is typically deployed onto some form of abstraction or virtualization: a cloud provider, virtual machines, containers, and in some cases all of them at once! On top of this abstraction sits our actual software (the business logic). Now throw in all the other components that could be at play: load balancer, DNS, API Gateway, web app, datastore, identity provider, … The list of places where something could go wrong can be daunting and seemingly endless. In addition, the location of the failure is often the symptom, not the cause. For example, a service may fail only because the database it depends on is unavailable.
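To make the symptom-versus-cause point concrete, here is a minimal Python sketch. The component names, the `DEPENDENCIES` map, and the `HEALTHY` snapshot are all hypothetical; in a real deployment that information would have to come from your own monitoring. Starting at the component where the error surfaced, it follows only failing dependencies and reports the deepest failing components as the likely culprits.

```python
from typing import Dict, List, Set

# Hypothetical topology: each component maps to the components it depends on.
DEPENDENCIES: Dict[str, List[str]] = {
    "load-balancer": ["api-gateway"],
    "api-gateway": ["order-service", "identity-provider"],
    "order-service": ["inventory-service", "orders-db"],
    "inventory-service": ["inventory-db"],
    "identity-provider": [],
    "orders-db": [],
    "inventory-db": [],
}

# Hypothetical health snapshot: True means the component's own check passes.
HEALTHY: Dict[str, bool] = {
    "load-balancer": True,
    "api-gateway": False,
    "order-service": False,
    "inventory-service": True,
    "identity-provider": True,
    "orders-db": False,
    "inventory-db": True,
}


def probable_root_causes(symptom: str) -> List[str]:
    """Walk down from a component and return the failing components whose
    own dependencies are all healthy: likely causes rather than symptoms.
    Components with unknown health are treated as failing."""
    causes: List[str] = []
    seen: Set[str] = set()

    def visit(node: str) -> None:
        if node in seen:
            return
        seen.add(node)
        failing_deps = [d for d in DEPENDENCIES.get(node, []) if not HEALTHY.get(d, False)]
        for dep in failing_deps:
            visit(dep)
        # A failing node with no failing dependencies is where the trail ends.
        if not HEALTHY.get(node, False) and not failing_deps:
            causes.append(node)

    visit(symptom)
    return causes


if __name__ == "__main__":
    # The error surfaces in the order service, but the database is the cause.
    print(probable_root_causes("order-service"))  # ['orders-db']
```

The rule is deliberately naive: it assumes a component with a failing dependency is failing because of that dependency, which will not always be true, so treat its answer as a lead rather than a verdict.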

How: Ephemeral services, running in a distributed manner across a vast network of virtual machines and interacting with other services that run the same way: what could possibly go wrong? To put it mildly, anything. During my time managing SaaS deployments, I’ve seen many different murder weapons at play. Even seemingly benign ones can have deadly consequences that destabilize our microservice environments. Here are a few examples:

Who: The who is the most critical question to solve. Even if we understand the where and the how, unless we solve the who, the murders will keep happening. We must find the culprit(s) and eradicate them. Unfortunately, identifying the culprit can be difficult, because “who” is a nuanced idea: it could be a mistake by a human or an automated process, a system failure, a software failure, or any of a multitude of other failures. Consider these examples:

Prevent the Murder

The best advice is to prevent the crime from happening in the first place. A couple of obvious starting points:

Microservice Forensics

Since we cannot always prevent the murder, the good news is that a microgateway can play the role of Hercule Poirot or Sherlock Holmes by providing the following in your microservice deployment:

As with any technology, there is no silver bullet, but the microgateway does give you a silver magnifying glass that can help you become the detective and solve the whodunit mystery of a service outage!
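One common forensic technique that a gateway sitting in the request path is well placed to apply is stamping every request with a correlation ID and logging that ID at each hop, so a single failed request can be traced end to end. The Python sketch here is generic and hypothetical: it is not the Axway microgateway's API, the `X-Correlation-ID` header is a widespread convention rather than a standard, and the services and the database outage are simulated. It replays the rejected-orders scenario from the start of this post.

```python
import logging
import uuid
from typing import Dict, List

# Assumed header name: a common convention, not a standard.
CORRELATION_HEADER = "X-Correlation-ID"

# Every hop appends here so the trail can be reconstructed afterwards.
AUDIT_LOG: List[Dict[str, str]] = []
log = logging.getLogger("forensics")


def record(hop: str, headers: Dict[str, str], outcome: str) -> None:
    """Log one hop of a request, keyed by its correlation ID."""
    entry = {"id": headers[CORRELATION_HEADER], "hop": hop, "outcome": outcome}
    AUDIT_LOG.append(entry)
    log.info("%s %s %s", entry["id"], hop, outcome)


def gateway(headers: Dict[str, str]) -> str:
    """Hypothetical gateway: reuse an incoming ID or mint a new one, then forward."""
    headers = dict(headers)
    headers.setdefault(CORRELATION_HEADER, str(uuid.uuid4()))
    record("gateway", headers, "forwarded")
    return order_service(headers)


def order_service(headers: Dict[str, str]) -> str:
    """Simulated service: the symptom shows up here as a rejected order."""
    try:
        return orders_db(headers)
    except ConnectionError:
        record("order-service", headers, "rejected order")
        return "rejected"


def orders_db(headers: Dict[str, str]) -> str:
    """Simulated outage: the database is the real culprit."""
    record("orders-db", headers, "unavailable")
    raise ConnectionError("database unavailable")


def trail(correlation_id: str) -> List[str]:
    """Reconstruct the path one request took, in the order it was logged."""
    return [f"{e['hop']}: {e['outcome']}" for e in AUDIT_LOG if e["id"] == correlation_id]


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    print(gateway({CORRELATION_HEADER: "order-42"}))  # rejected
    for step in trail("order-42"):
        print(step)
```

Following one ID gives the detective a timeline for a single request (gateway, then database, then the rejected order) instead of a wall of unrelated log lines.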
