Deployed by leading companies such as Netflix, Expedia, PayPal, and eBay, microservices are fast becoming a popular architecture pattern. The movement towards microservices is driven primarily by the need for agility and scale in modern businesses. In a microservices architecture, services are fine-grained and protocols are lightweight, which improves modularity. This makes the application easier to understand, develop, and test. It also makes the application more resilient to architecture erosion, which lets DevOps teams roll out incremental improvements faster.

Consequences

Like all major technological advancements, migrating to a microservices architecture has both expected and unintended consequences. Microservices have a strong impact on areas such as scheduling, consistency, resource management, security, and updates. However, they also make the spatiotemporal behavior of the application harder to understand and manage.

Figure: From Monoliths to Microservices (source: “Seer: Leveraging Big Data to Navigate the Increasing Complexity of Cloud Debugging,” HotCloud 2018)

Performance assurance in this new environment requires a whole new approach for a number of reasons.

Performance Assurance

Microservices exhibit performance profiles that differ from those of traditional monolithic applications in several ways. Consider, for example, the impact of “the network within”, an important aspect of microservice applications.

  • Large number of thin components – Instead of a small number of thick components, the application now comprises a large number of thin components. Many of them are ephemeral, change their number of instances over time, and can be spread across multiple servers and data centers. Multiple versions can coexist while rolling updates occur.
  • East-west communication – These components communicate frequently with each other by serializing data and sending it over the network, rather than by sharing it in memory. This east-west communication can cross server and data center boundaries, and can even cross the Internet to reach other microservices, where it appears as north-south traffic.
  • Communication to computation ratio – The communication to computation ratio can shift dramatically as the application spends significantly more time in networking: as much as 41%, up from 18% previously [Gan et al.]. Unlike in a monolithic application, every microservice in the call chain performs a similar amount of communication work, which affects performance in different ways.

These changes raise several questions about application performance. You may ask:

  • As many microservices communicate with each other and rely on a hierarchy of virtual machines, where is the performance bottleneck?
  • How do I know if the problem is due to a change in the application or a change in the infrastructure?
  • As workload to the application changes, how does the application behave?
  • What about the impact of a new code update on the microservices?
  • Why has a bottleneck that was in one location yesterday moved somewhere else today?
  • How do I find the real problem when I have so many ‘alerts’?

These are the questions that keep performance engineers and production Ops teams awake at night.

The Good and The Scary

The on-ramp to the cloud is open and busy. The value of using the cloud is a foregone conclusion. Adoption is further spurred by the standardization of the processes, frameworks, and tools used to develop and deploy cloud applications. Businesses welcome their newfound freedom and agility.

Unfortunately, the ‘black box’ nature of containers hinders visibility into the performance of microservices, which adds new levels of complexity for those tasked with managing the user experience [1]. This places greater demands on Production Operations teams, who must keep applications running efficiently while keeping pace in the agility race. They are grappling with large volumes of data using their existing tools.

Are the current tools adequate for the job?  Is it time to build a larger war room to manage the tsunami of monitoring data? Where do we go from here?

We would love to hear your thoughts. Send a note to info@opscruise.com.

[1] Brian Solis, “As Digital Darwinism Evolves, Enterprise Organizations Must Learn New Ways To Adapt And Innovate,” Forbes, Dec. 11, 2018.