A few years back, when businesses saw elasticity and speed as the prime reasons to move to the cloud, automation was booming. Users liked being able to provision virtual machines or workloads through a self-service portal, and were delighted to have their underlying infrastructure ready within minutes instead of waiting for weeks, as they had with traditional IT. But things changed quite rapidly.
Microservices and associated challenges
Over the past few years, cloud markets have exploded, because cloud platforms today offer much more than pure hosting and automated provisioning. With the advent of microservices, an application can be broken down into much smaller pieces of code that are loosely coupled, work together as a complete ecosystem, and cater to functional and non-functional requirements. Those pieces are extremely lightweight and can be placed on cloud platforms to leverage the underlying elasticity, making an application auto-scalable and self-healing.
Moreover, one of the main reasons microservices with containers are on everyone's table is their ability to make applications portable. With the current trend towards heterogeneity, where people look to hybrid clouds to maintain control while still leveraging cloud benefits, such portability is a must. Imagine an application that can be developed on a public cloud in an isolated development environment and then easily ported back to a private cloud, where it undergoes code merges and production lifecycle events, is published using a private backend, and is exposed to its audience.
Today, such cloud-native applications have optimised the way applications are written, but they have increased operational complexity in the process. Given dynamic portability and auto scaling, and the growing number of stakeholders involved in building a single application (multiple developers building multiple microservices independently), it is imperative to keep track of whether each microservice is working as expected and, more importantly, whether the interactions among those microservices are happening smoothly. If an application is not responding as expected, being unable to trace the issue rapidly, if not immediately, has a huge impact.
So here are some of the questions that arise when an application decides to have a coffee with a microservices architecture:
· How do I observe the system as a whole? How do I verify that the microservices architecture is working as expected, that all services are functioning and talking to each other in the required manner?
· In case of a failure or issue, how do I detect the cause? Multiple small pieces of the application run in different containers, and any one of them could impact the complete system, so how do I trace an issue down?
· If one of the services fails, how do I keep the impact minimal and avoid a cascading effect, given that the many dependencies between services mean one can easily impact another?
· How can I test against failures, considering that many microservices interacting with each other make for a complex ecosystem?
· How do I control how an application scales under such circumstances?
A service mesh comes to the rescue for the challenges mentioned above. It essentially observes and handles inter-service networking in a microservices architecture, and it brings a lot of flexibility to service-to-service networking by making communication between service instances optimised, secure, reliable and fast. At a high level, a service mesh provides the following capabilities:
Service Registry: The service mesh maintains a dynamic database of all services, service instances, their condition and their location, so that every time an instance is powered on, powered off, modified, goes bad or comes back up, the change is registered in the database.
Service Discovery: The service mesh maintains the state of instances. It keeps a list of available, healthy service instances, which is essential for distributing load across them; in effect, this is built-in load balancing logic.
Intelligent Routing: The service mesh controls traffic between services using dynamic route configuration, meaning it allows communication between services and sends it over the most optimised route.
Encryption: The service mesh offloads the encryption overhead from services or applications and performs encryption of requests and responses between services.
Authentication: As with encryption, a service mesh offloads the authentication overhead from the respective services.
Circuit Breaker Pattern: If a microservice instance fails, the service mesh opens a circuit breaker, meaning it isolates the faulty instance and stops sending it traffic, and applies health-check logic to bring the instance back into rotation once it has recovered.
Sidecar proxy: The service mesh uses a proxy instance, called a sidecar, for each service instance. The sidecars are responsible for inter-service communication and for functions such as monitoring and encryption.
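To make the registry, discovery and circuit breaker ideas a little more concrete, here is a minimal Python sketch. It is not how any particular service mesh is implemented: the `ServiceRegistry` and `Instance` names, the failure threshold and the cooldown are all hypothetical, and the breaker is simplified (an isolated instance is simply re-admitted after a cooldown, without a separate half-open state).

```python
import itertools
import time
from dataclasses import dataclass


@dataclass
class Instance:
    """One running copy of a service (hypothetical model, not a mesh API)."""
    address: str
    healthy: bool = True
    failures: int = 0
    opened_at: float = 0.0


class ServiceRegistry:
    """Tracks instances per service and hands out healthy ones round-robin."""

    def __init__(self, failure_threshold=3, cooldown_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed value
        self.cooldown_seconds = cooldown_seconds    # assumed value
        self.clock = clock
        self.instances = {}  # service name -> list of Instance
        self.counters = {}   # service name -> round-robin counter

    def register(self, service, address):
        # Service registry: record a new instance as it comes up.
        self.instances.setdefault(service, []).append(Instance(address))
        self.counters.setdefault(service, itertools.count())

    def deregister(self, service, address):
        # Service registry: forget an instance that was shut down.
        self.instances[service] = [
            i for i in self.instances.get(service, []) if i.address != address
        ]

    def healthy_instances(self, service):
        # Service discovery: only instances whose circuit is closed.
        now = self.clock()
        result = []
        for inst in self.instances.get(service, []):
            # Re-admit an isolated instance once the cooldown has passed.
            if not inst.healthy and now - inst.opened_at >= self.cooldown_seconds:
                inst.healthy = True
                inst.failures = 0
            if inst.healthy:
                result.append(inst)
        return result

    def pick(self, service):
        # Load balancing: simple round-robin over healthy instances.
        candidates = self.healthy_instances(service)
        if not candidates:
            raise LookupError(f"no healthy instance for {service!r}")
        index = next(self.counters[service]) % len(candidates)
        return candidates[index]

    def report(self, instance, success):
        # Circuit breaker: isolate an instance after repeated failures.
        if success:
            instance.failures = 0
            return
        instance.failures += 1
        if instance.failures >= self.failure_threshold:
            instance.healthy = False
            instance.opened_at = self.clock()
```

A caller would `register` instances as they start, call `pick` for each request, and `report` the outcome so that repeatedly failing instances drop out of rotation. In a real mesh this bookkeeping lives in the sidecar proxies and the control plane rather than in application code.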
Istio, backed by Google, IBM, and Lyft, is currently the best-known service mesh architecture. At the time of writing, Kubernetes, which was originally designed by Google, is the only container orchestration framework supported by Istio.
What is Istio?
Istio is an open source service mesh project with a number of contributors, chiefly Google, Lyft, Red Hat and IBM.
Istio provides powerful service mesh features that help achieve the required granularity of insight into the health of all connected services in a microservices architecture. Its role as a service mesh makes it an ideal data source for observability information, particularly in a microservices environment, which helps solve the issues associated with a microservices architecture.
Istio provides distributed tracing functionality, which helps achieve the required transparency into what is going on under the covers. It makes it easy to manage a network of deployed services by providing the following features:
Transparency of traffic behavior with intelligent routing, awareness of retries, failovers, and fault injection.
Automatic Load Balancing.
Configurable APIs supporting access controls, rate limits and quotas.
Automatic metrics, logs, and traces for all traffic within a cluster, including cluster ingress and egress.
Secure service-to-service communication in a cluster with strong identity-based authentication and authorization.
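As an illustration of what a rate limit means in practice, here is a minimal token bucket sketch in Python. A token bucket is one common way to enforce such limits; whether a given mesh uses exactly this algorithm is an assumption here, and the `TokenBucket` class and its parameters are hypothetical names, not part of Istio. In a mesh, limits like these are enforced by the proxies on behalf of the service rather than written into the application.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate_per_second`."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = clock()

    def allow(self):
        """Return True if the request may pass, False if it is over the limit."""
        now = self.clock()
        elapsed = now - self.updated
        # Refill tokens for the time that has passed, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Each incoming request calls `allow()`; a request that gets `False` would typically be rejected or delayed by whatever sits in front of the service.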
As described in the earlier section, a service mesh uses a sidecar proxy. Istio uses Envoy, a high-performance distributed proxy originally built by Lyft, as its sidecar proxy.
Envoy mediates all inbound and outbound traffic for all services in the service mesh. Istio leverages Envoy’s many built-in features, for example:
- Dynamic service discovery
- Load balancing
- TLS termination
- HTTP/2 and gRPC proxies
- Circuit breakers
- Health checks
- Staged rollouts with %-based traffic split
- Fault injection
- Rich metrics
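Two of the features above, %-based traffic split and fault injection, come down to making a weighted random decision per request. The Python sketch below shows the idea only; it is not Envoy's implementation, and the function names and version labels are hypothetical.

```python
import random


def choose_version(weights, rng=random):
    """Pick a service version according to percentage weights.

    `weights` maps a version label to its share of traffic in percent,
    for example {"v1": 90, "v2": 10}. The shares must add up to 100.
    """
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    versions = list(weights)
    shares = [weights[v] for v in versions]
    return rng.choices(versions, weights=shares, k=1)[0]


def should_inject_fault(percent, rng=random):
    """Decide whether this request should fail on purpose.

    `percent` is the share of requests (0 to 100) to abort, which lets
    you test how callers cope when a dependency misbehaves.
    """
    return rng.random() * 100 < percent
```

For a staged rollout you would start with something like `{"v1": 90, "v2": 10}` and shift the weights gradually as confidence in the new version grows. Passing a seeded `random.Random` as `rng` makes the behaviour repeatable in tests.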
Deploying Envoy as a sidecar allows Istio to extract the traffic insights and behavior that are essential to service mesh functionality. With the help of such insights and behaviour analytics, monitoring systems can provide information about the behavior of the entire mesh, which addresses the challenges that are born along with a microservices architecture.