Azure Kubernetes Service: Essential Best Practices for Business Continuity

In today’s digital landscape, where businesses rely heavily on cloud infrastructure and containerized applications, ensuring continuous availability and seamless operation is paramount. Azure Kubernetes Service (AKS), a managed container orchestration service by Microsoft Azure, offers a robust and scalable platform for deploying, managing, and scaling containerized applications. However, to fully leverage the power of AKS and ensure business continuity, it is crucial to follow essential best practices.

In this blog post, we will explore the key best practices that organizations should consider when using Azure Kubernetes Service. We will delve into various aspects of business continuity, focusing on strategies and techniques that can help minimize disruptions, enhance reliability, and ensure the smooth operation of containerized applications in an AKS environment. By implementing these best practices, businesses can mitigate potential risks, proactively address failures, and maximize the uptime of their applications.

Whether you are planning to migrate existing applications to AKS or embark on a new containerization journey, this guide will equip you with the knowledge and tools necessary to establish a resilient and highly available environment on Azure Kubernetes Service.

What are you waiting for? Let’s begin.

Essential Best Practices for Business Continuity in Azure Kubernetes Service

Here are the top best practices that will help keep your application running smoothly.

Best Practice 1: Deploy multiple Azure Kubernetes Service clusters in nearby, paired regions

To protect your application from potential region failures, it is recommended to deploy multiple Azure Kubernetes Service (AKS) clusters in regions that are in close proximity. But what if one region fails? Where should you deploy your AKS clusters? Here are some of the important factors you should consider:

1. AKS Region Availability: Choose regions where the AKS service is available. To check whether it is available in your region, see Azure Products by Region | Microsoft Azure.

2. Paired Regions: When setting up your AKS cluster, it is important to think about two things together: Azure Kubernetes Service region availability and paired regions. These paired regions are designed to work together in case of a major problem, disaster, or failure in one of the regions.

Consider the below image where we have paired two regions: Region 1 as East US and Region 2 as West US.

When deploying an Azure Kubernetes Service (AKS) cluster, it’s important to consider using paired regions. Why? Because in the event of a disaster or error causing a failure in Region 1 (for example, East US), having a paired Region 2 (for example, West US) ensures continuity and protection for your application.
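As a rough illustration of the pairing idea, the lookup below maps a primary region to its pair. The table is only a small, illustrative subset, the short region names are used for convenience, and the helper name `paired_region` is hypothetical. Check the current Azure documentation for the full list before relying on it.

```python
# Illustrative subset of Azure paired regions; verify against the current
# Azure documentation before relying on it.
PAIRED_REGIONS = {
    "eastus": "westus",
    "eastus2": "centralus",
    "northeurope": "westeurope",
}


def paired_region(region: str) -> str:
    """Return the paired region, looking the pair up in both directions."""
    for primary, secondary in PAIRED_REGIONS.items():
        if region == primary:
            return secondary
        if region == secondary:
            return primary
    raise KeyError(f"no pair recorded for region {region!r}")


print(paired_region("eastus"))  # westus
```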

3. Redundancy and Availability: There are three options for making your application redundant and available in AKS.

  • Hot-site is the most expensive and highest level of redundancy. It is a fully functional site that mirrors the primary site, including all hardware, software, and data. This means that in the event of a disaster, the organization can switch to the hot-site immediately with no interruption to business operations.
  • Warm-site is a less expensive option than a hot-site, but it still provides a high level of redundancy. A warm-site has all the necessary hardware and software, but the data is not mirrored on the site. This means that in the event of a disaster, the organization will need to restore the data from backups to the warm-site. However, the warm-site is still operational, so the organization can resume business operations relatively quickly.
  • Cold-site is the least expensive option for disaster recovery. A cold-site is a basic facility with the necessary infrastructure, such as power and cooling. However, the cold-site does not have any hardware or software, so the organization will need to bring its own equipment and install its own software. This means that in the event of a disaster, the organization will experience a significant amount of downtime while they get the cold-site up and running.

Here is a summary of these options and how they affect availability, latency, and cost.

Configuration   Availability           Latency   Cost
Hot/Hot         Always available       Low       High
Hot/Warm        Usually available      Medium    Medium
Hot/Cold        Not always available   High      Low

Service 1: Hot/Hot – This is the highest level of service availability. In the Hot/Hot configuration, two identical copies of the service run in separate regions, say Region 1 and Region 2. If one region goes down, the service continues to be available in the other region.

In this configuration both regions already hold the data, so a regional failure should not cause data loss. For example, if Region 1 (such as East US) experiences a failure, the paired Region 2 (such as West US) takes over the application with little or no delay. This failover happens automatically and keeps your application running.

Service 2: Hot/Warm – This is a lower level of service availability. In the Hot/Warm configuration, there is one active copy of the service running in one region and a warm copy of the service running in another region. The warm copy has all the necessary hardware and software, but the data is not mirrored to it.

For example, if there is a failure in Region 1, the paired Region 2 will assume control of the application. However, there may be a delay of a few minutes before Region 2 becomes fully activated. This delay occurs because it takes some time to mirror the data and set the configurations and during this time the application is not fully accessible. However, once Region 2 is activated, it takes over the workload and ensures the continued availability and functionality of the application.

Service 3: Hot/Cold – In the Hot/Cold configuration, there is one active copy of the service running in one region and a cold copy of the service running in another region. This means that if Region 1 goes down, Region 2 cannot take over immediately because it is inactive. You need to deploy, install, and configure the service in Region 2 to make it active, and this process takes a lot of time. There is a risk of data loss during this process, so make sure your data is backed up before proceeding.

Unlike other service types where failover is automatic, in this hot/cold configuration, manual deployment and configuration of resources in Region 2 are required. Once Region 2 is properly deployed and configured, it can be activated to take control of the workload.
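To make the trade-off concrete, here is a minimal sketch that encodes the three configurations with the qualitative levels from the summary table and picks the cheapest one whose failover latency is still acceptable. The ranking of Low, Medium, and High as 1, 2, and 3 and the function name `cheapest_option` are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Qualitative levels from the summary table, ranked so they can be compared.
LEVEL = {"Low": 1, "Medium": 2, "High": 3}


@dataclass(frozen=True)
class Redundancy:
    name: str
    availability: str
    latency: str  # how long failover takes
    cost: str


OPTIONS = [
    Redundancy("Hot/Hot", "Always available", "Low", "High"),
    Redundancy("Hot/Warm", "Usually available", "Medium", "Medium"),
    Redundancy("Hot/Cold", "Not always available", "High", "Low"),
]


def cheapest_option(max_latency: str) -> Redundancy:
    """Pick the lowest-cost option whose failover latency is acceptable."""
    acceptable = [o for o in OPTIONS if LEVEL[o.latency] <= LEVEL[max_latency]]
    return min(acceptable, key=lambda o: LEVEL[o.cost])


print(cheapest_option("Medium").name)  # Hot/Warm
```

If only a Low failover latency is acceptable, the same function returns Hot/Hot, which is the most expensive option.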

Don’t forget to add a CI/CD Pipeline

When deploying your application, make sure to add an additional step to your CI/CD (Continuous Integration/Continuous Deployment) pipeline to deploy the application to redundant Azure Kubernetes Service clusters available in multiple regions.

What if we do not deploy the application to multiple regions and keep the same copy of the application in all of them? The regions will end up running inconsistent copies of the application, so after a failure the fallback region will not be running the version deployed in the primary region. As a result, users won’t be able to access the most recent code updates. Therefore, it’s essential to update both regions together through the CI/CD pipeline to avoid any inconsistencies in code versions.

Consider the below diagram that shows the process of deploying Azure Kubernetes Service with CI/CD pipeline.

Once the user saves the code to the source control repository, such as GitHub, Azure Pipelines automatically initiates the build and testing process. If the build and tests pass successfully, the code is deployed to an AKS cluster located in the East US region. Additionally, the same code is deployed to another AKS cluster situated in the West US region. This ensures that the application becomes available in both regions, allowing users from both regions to access and utilize the application.

The CI/CD pipeline plays a crucial role in deploying your application to multiple regions.
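The multi-region deployment step can be sketched as a loop over clusters that applies the same image everywhere. This is only a sketch: the kubectl context names, the registry name, and the image tag are hypothetical, and the code builds the commands without running them.

```python
from typing import List

# Hypothetical kubectl context names, one per regional AKS cluster.
CLUSTER_CONTEXTS = ["aks-eastus", "aks-westus"]


def deploy_commands(image: str, deployment: str, container: str) -> List[List[str]]:
    """Build one `kubectl set image` command per cluster, all using the same image."""
    return [
        ["kubectl", "--context", ctx, "set", "image",
         f"deployment/{deployment}", f"{container}={image}"]
        for ctx in CLUSTER_CONTEXTS
    ]


# Hypothetical image reference; every cluster receives exactly this tag.
commands = deploy_commands("myregistry.azurecr.io/web:1.4.2", "web", "web")
for cmd in commands:
    print(" ".join(cmd))
```

A pipeline step could pass each command to `subprocess.run(cmd, check=True)` so that a failure in either region fails the whole deployment.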

For more detail, read Unlocking the Full Potential of CI/CD Pipeline for Azure Kubernetes Services.

Best Practice 2: Use Azure Traffic Manager to route the traffic to the application deployed in the Azure Kubernetes Service cluster

What if you have many Azure Kubernetes Service clusters in different regions? For that, you should use Traffic Manager to manage how traffic flows to the applications in each cluster. Why? Because Azure Traffic Manager works like a traffic controller and spreads the network traffic across different regions accordingly.

Also, the Traffic Manager helps you to decide where to send users based on the response time of each cluster or set a priority for routing traffic. This will ensure a smooth and efficient user experience by distributing the load among the different clusters and regions.

What if you have only one AKS cluster? For that, you can connect to the application using its service IP or DNS name. But when you have multiple Azure Kubernetes Service clusters, you need to publish the application to a Traffic Manager DNS name instead. This DNS name is set up to point to the services running on each AKS cluster.

To do this, you define these services as Traffic Manager endpoints, which are basically the load balancer IPs of each service. By configuring it this way, you can direct network traffic from one region’s Traffic Manager endpoint to the endpoint in a different region.

Consider the above image that illustrates how a DNS service can work with a traffic manager to improve the user’s experience when accessing a website or application.

  1. The first time the user sends a query for a website, such as Google, the DNS service forwards the query to the traffic manager, which checks the status and performance of all the available endpoints that host Google.
  2. The traffic manager then chooses the best endpoint for the user, based on criteria such as proximity, load, speed, and reliability. For example, if one endpoint is closer to the user’s location, has less traffic, and responds faster than others, it will be selected as the best endpoint.
  3. The traffic manager then sends back the response to the user with the IP address of the selected endpoint. The user can then access Google through that endpoint.
  4. If the user sends another query to Google while the DNS response is still cached, it will not go through the traffic manager again, but directly to the selected endpoint. This saves time and resources for both the user and the traffic manager.
  5. However, if the selected endpoint becomes unavailable or slow for some reason, the traffic manager will detect it and select another endpoint for the user. This is how Traffic Manager supports a disaster recovery (DR) strategy.
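The selection logic in the steps above can be approximated with a small simulation. It is a simplified model and not Traffic Manager's actual implementation: the endpoint names, latencies, and the `pick_endpoint` function are made up for illustration, and only the priority and response-time methods mentioned earlier are modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Endpoint:
    name: str
    priority: int       # lower number = preferred (priority routing)
    latency_ms: float   # measured response time (performance routing)
    healthy: bool = True


def pick_endpoint(endpoints: List[Endpoint], method: str) -> Endpoint:
    """Return the endpoint a DNS answer would point to, skipping unhealthy ones."""
    candidates = [e for e in endpoints if e.healthy]
    if not candidates:
        raise RuntimeError("no healthy endpoints")
    if method == "priority":
        return min(candidates, key=lambda e: e.priority)
    if method == "performance":
        return min(candidates, key=lambda e: e.latency_ms)
    raise ValueError(f"unknown routing method: {method}")


east = Endpoint("aks-eastus", priority=1, latency_ms=80.0)
west = Endpoint("aks-westus", priority=2, latency_ms=35.0)

print(pick_endpoint([east, west], "priority").name)     # aks-eastus
print(pick_endpoint([east, west], "performance").name)  # aks-westus
east.healthy = False  # simulate a regional failure
print(pick_endpoint([east, west], "priority").name)     # aks-westus
```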

Best Practice 3: Route the Application with Azure Front Door Service

Azure Front Door is a cloud-based content delivery network (CDN) service from Microsoft that helps to make your website or application accessible to users all over the world. It works like a super-fast and reliable network that connects your users to your website’s content. Whether it’s static (like images or videos) or dynamic (like interactive features), Azure Front Door ensures that your content reaches users quickly and securely.

Azure Front Door uses the Microsoft global edge network, with hundreds of points of presence distributed around the world, close to your enterprise and consumer end users.

Now, consider the diagram that illustrates a network architecture for a website that uses Azure Front Door to deliver fast, reliable, and secure content to users.

In the above image, the website is accessed through Azure Front Door. Here, Azure Front Door acts as a global HTTP load balancer and failover solution that routes user requests to the best available region based on performance, availability, and routing rules. The Edge location represents the location of the user who is accessing the website hosted in AKS.

Azure Front Door Service uses a special split TCP-based anycast protocol to quickly connect your users to the closest Front Door Point of Presence (POP). It offers several useful features:

  1. TLS termination: Azure Front Door can handle the encryption and decryption of TLS (Transport Layer Security) for your application, providing a secure connection between the users and your application.
  2. Custom domain: You can configure Azure Front Door to use your own custom domain name, making it easier for users to access your application with a familiar URL. For example, instead of the default Front Door hostname, users can reach the application at any meaningful name on your own domain.
  3. Web application firewall: Front Door includes a built-in web application firewall that helps protect your application from common web attacks and ensures the security of your data.
  4. URL Rewrite: This feature allows you to modify or rewrite URLs as they pass through Azure Front Door, enabling you to implement custom routing rules or redirect requests.
  5. Session affinity: Azure Front Door supports session affinity, which means that subsequent requests from the same user are directed to the same backend server, ensuring a consistent experience.
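To show what session affinity means in practice, here is a toy router that pins a session to a backend using a cookie. It is a conceptual sketch and not how Azure Front Door is implemented: the backend names, the cookie format, and the `AffinityRouter` class are all hypothetical.

```python
import itertools
from typing import Dict, Optional, Tuple

BACKENDS = ["aks-eastus", "aks-westus"]  # hypothetical backend names


class AffinityRouter:
    """Round-robin for new sessions; a cookie pins later requests to one backend."""

    def __init__(self) -> None:
        self._next_backend = itertools.cycle(BACKENDS)
        self._sessions: Dict[str, str] = {}
        self._ids = itertools.count(1)

    def route(self, cookie: Optional[str] = None) -> Tuple[str, str]:
        """Return (backend, cookie). A missing or unknown cookie starts a new session."""
        if cookie in self._sessions:
            return self._sessions[cookie], cookie
        cookie = f"session-{next(self._ids)}"
        self._sessions[cookie] = next(self._next_backend)
        return self._sessions[cookie], cookie


router = AffinityRouter()
backend, cookie = router.route()   # first request, no cookie yet
repeat, _ = router.route(cookie)   # follow-up request presents the cookie
print(backend == repeat)           # True
```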

To determine the most suitable solution for your application traffic, carefully review the specific requirements of your application. Consider factors like performance, security, and customization needs to make an informed decision about using Azure Front Door Service.

Best Practice 4: Use Virtual Network Peering to Connect Virtual Networks

To allow communication between clusters, you can connect virtual networks using virtual network peering. Virtual Network Peering acts like a bridge that connects the networks together. This way, even if the clusters are in different regions, they can still communicate efficiently using Microsoft’s backbone network, which has a high bandwidth.

Before you establish virtual network peering with your AKS clusters, make sure you are using the standard Load Balancer in your AKS cluster. This step is important because it ensures that the Kubernetes services within your clusters can be accessed through virtual network peering. In simpler terms, it sets things up so that the services in your Azure Kubernetes Service clusters can communicate across the connected virtual networks.
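One more prerequisite, not covered above, is that peered virtual networks cannot have overlapping address spaces. A quick check with Python's standard `ipaddress` module might look like the sketch below. The CIDR ranges and the `can_peer` helper are illustrative assumptions.

```python
import ipaddress


def can_peer(cidr_a: str, cidr_b: str) -> bool:
    """Virtual networks can be peered only if their address spaces do not overlap."""
    return not ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


print(can_peer("10.1.0.0/16", "10.2.0.0/16"))    # True
print(can_peer("10.1.0.0/16", "10.1.128.0/17"))  # False
```

Planning distinct address ranges for each regional cluster up front avoids having to re-create a virtual network later.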

Best Practice 5: Store container images in Azure Container Registry and enable geo-replication

When deploying and running your application on Azure Kubernetes Service, it is important to have a designated storage location for storing and retrieving the images. The best place to store the images for AKS is Azure Container Registry.

Azure Container Registry is a fully managed Azure service that allows you to store and manage your container images. A container image is a software package containing everything you need to run an application or service, such as code, libraries, dependencies, and configuration files.

Container Registry is integrated with AKS, offering a useful feature called multimaster geo-replication. This feature automatically duplicates your container images to different Azure regions around the world.

Consider the above image. It shows how the images are stored in a geo-replicated container registry, and how you can still retrieve those images if there is a system failure. The container registry has two regions, Region 1 and Region 2, which are geographically distributed locations that host the same images. If Region 1 fails or becomes unavailable for some reason, then Region 2 can still access or retrieve the images and serve them to the users.

Consider the above image. It shows a scenario where the images are stored in separate registries that are not geo-replicated. If Region 1 fails or becomes unavailable for some reason, then Region 2 can still access the image from its own registry, but because the image is not synchronized between these container registries, it may use a different version of the image than the one in East US. This can cause inconsistency and errors in your application or service.

For better performance and availability, it is recommended to use Container Registry’s geo-replication. This involves creating a registry in each region where you have an Azure Kubernetes Service cluster and deploying the images with CI/CD pipelines so that all the images conform to the same version of the application.

By doing this, each Azure Kubernetes Service cluster will pull container images from its local registry in the same region. This helps reduce latency and ensures that your AKS clusters have faster access to the necessary container images.
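A simple sanity check for this setup is to compare the regions where clusters run with the regions where the registry is replicated. The sketch below assumes you already have both lists from your own inventory. The region names and the `regions_without_replica` helper are hypothetical.

```python
from typing import Iterable, Set


def regions_without_replica(cluster_regions: Iterable[str],
                            replica_regions: Iterable[str]) -> Set[str]:
    """Return cluster regions that have no registry replica in the same region."""
    return set(cluster_regions) - set(replica_regions)


missing = regions_without_replica(["eastus", "westus"], ["eastus"])
print(missing)  # {'westus'}
```

Any region returned here would have to pull images across regions, which adds latency and a dependency on the other region.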

Best Practice 6: Avoid Storing Service State Inside the Container

Service state is the information that a service needs to do its job. It can be data held in memory or stored on disk, including the data structures and variables that the service uses to perform its tasks.

Consider the diagram below, which shows why you should store the service state in a database rather than inside the container:

The application has two components: a web server and a database. The web server is responsible for handling requests from clients and the database stores the application’s state.

You can see in the diagram that the state of the application is stored in the database, not in a container. Why? Because if the web server container crashes, the database will still be available. The web server can be recreated, and it will still be able to connect to the database and resume serving requests.

What if you store the service state inside the container?

Storing service state inside the container is not recommended because it has some drawbacks, such as:

  • The data doesn’t persist when the container stops or is deleted, and it can be difficult to get the data out of the container if another process needs it.
  • The data is tightly coupled to the host machine where the container is running, and you can’t easily move it somewhere else.
  • Writing to the container’s writable layer requires a storage driver to manage the filesystem, which reduces performance as compared to using data volumes.

Where to store the service state?

Instead, you should use an Azure platform as a service (PaaS) that supports multi-region replication, such as Azure Cosmos DB, Azure SQL Database, or Azure Storage. These services offer benefits such as:

  • High availability and durability of your data across multiple regions
  • Automatic failover and load balancing of your requests
  • Consistent and scalable performance for your applications
  • Encryption at rest and in transit for your data
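The difference between the two approaches can be shown with a small simulation. In this sketch, `ExternalStore` is only an in-memory stand-in for a managed data service, and all class names are hypothetical. A real service would call the client library of whichever data service you choose.

```python
from typing import Dict


class ExternalStore:
    """Stand-in for a managed data service that outlives any container."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def increment(self, key: str) -> int:
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]


class StatefulCounter:
    """Keeps the count in process memory, so it is lost on restart."""

    def __init__(self) -> None:
        self._count = 0

    def handle(self) -> int:
        self._count += 1
        return self._count


class StatelessCounter:
    """Keeps the count in the external store; the container holds no state."""

    def __init__(self, store: ExternalStore) -> None:
        self._store = store

    def handle(self) -> int:
        return self._store.increment("visits")


store = ExternalStore()
stateful, stateless = StatefulCounter(), StatelessCounter(store)
stateful.handle()
stateless.handle()

# Simulate a container crash: new instances, same external store.
stateful, stateless = StatefulCounter(), StatelessCounter(store)
print(stateful.handle())   # 1 (the earlier request is forgotten)
print(stateless.handle())  # 2 (the earlier request is preserved)
```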

Best Practice 7: Create a Storage Migration Plan

If you are using Azure Storage for your applications’ data and you have multiple Azure Kubernetes Service clusters in different regions, it’s important to ensure that your storage is synchronized. This means keeping the data consistent across all regions.

There are two common ways to replicate your storage:

  1. Infrastructure-based asynchronous replication: Asynchronously replicate the data from primary regions to the secondary regions by setting infrastructure-level replication mechanisms provided by Azure.
  2. Application-based asynchronous replication: Design your application in such a way that it handles the replication of data across regions. The application itself manages the synchronization of data between primary and backup regions.

Infrastructure-based Asynchronous Replication

Sometimes, your applications need to store data that should persist even if a pod is deleted. In Kubernetes, you can achieve this by using persistent volumes. Persistent volumes act as storage units that can be attached to a virtual machine (VM) in a cluster. They are then made available to the pods running on those VMs.

Persistent volumes stay available to the pods even if the pods are moved to different VMs within the same cluster. This ensures that your data remains accessible and preserved throughout the lifecycle of your application.
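Pods usually request this kind of storage through a PersistentVolumeClaim. The sketch below builds such a claim as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The claim name `app-data`, the size, and the storage class `managed-csi` are assumptions. Run `kubectl get storageclass` to see what your cluster actually offers.

```python
import json


def pvc_manifest(name: str, size: str, storage_class: str) -> dict:
    """Build a PersistentVolumeClaim manifest that pods can mount by claim name."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# "app-data" and "managed-csi" are assumed names, used only for illustration.
manifest = pvc_manifest("app-data", "5Gi", "managed-csi")
print(json.dumps(manifest, indent=2))
```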

Different storage solutions, such as Gluster, Ceph, Rook, and Portworx, offer their own recommendations for disaster recovery and data replication.

Consider the image above, where you set up a shared storage location where applications can write their data. This data is then replicated across different regions, ensuring it is available even if there is an issue in one region. Applications read the replicated data locally, which gives them fast and efficient access to it.

Application-based Asynchronous Replication

Currently, Kubernetes does not have a built-in way to perform application-based asynchronous replication. However, because containers and Kubernetes are designed to work with various applications and programming languages, you can use traditional methods to achieve replication.

In this case, the applications themselves handle the replication of storage requests. This means that when data needs to be replicated, the applications are responsible for writing the data to the underlying data storage in each cluster.

By using the flexibility and extensibility of containers and Kubernetes, you can implement replication logic within your applications using the programming languages and techniques that work best for your specific needs. This allows you to replicate data across clusters, ensuring that it is consistently stored in each cluster’s underlying data storage.
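As one possible shape for such replication logic, the sketch below acknowledges a write as soon as it lands in the local store and copies it to a secondary store in a background thread. The two dictionaries are stand-ins for the data stores in each region, and the `ReplicatedStore` class is hypothetical. A production version would also need retries, ordering guarantees, and conflict handling.

```python
import queue
import threading
from typing import Dict


class ReplicatedStore:
    """Writes locally first, then copies each write to a secondary in the background."""

    def __init__(self) -> None:
        self.primary: Dict[str, str] = {}
        self.secondary: Dict[str, str] = {}
        self._pending: queue.Queue = queue.Queue()
        threading.Thread(target=self._replicate, daemon=True).start()

    def write(self, key: str, value: str) -> None:
        self.primary[key] = value        # acknowledged once the local write lands
        self._pending.put((key, value))  # replication happens later

    def _replicate(self) -> None:
        while True:
            key, value = self._pending.get()
            self.secondary[key] = value  # stand-in for a write to the other region
            self._pending.task_done()

    def wait_until_replicated(self) -> None:
        """Block until every queued write has reached the secondary."""
        self._pending.join()


store = ReplicatedStore()
store.write("order-42", "paid")
store.wait_until_replicated()
print(store.secondary["order-42"])  # paid
```

Because the replication is asynchronous, a failure between the local write and the background copy can lose the most recent writes, which is the trade-off to weigh against faster local acknowledgement.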


Azure Kubernetes Service offers a powerful platform for deploying and managing containerized applications at scale. By adhering to the essential best practices outlined in this blog post, businesses can ensure business continuity, minimize disruptions, and maximize the availability and reliability of their applications in an AKS environment. As organizations increasingly embrace cloud-native architectures, adopting these best practices becomes vital for achieving seamless operations and maintaining a competitive edge in today’s digital landscape. By implementing these practices, organizations can minimize downtime, increase application availability, enhance security, and protect critical data and services, thereby ensuring uninterrupted business operations and delivering a seamless experience to their customers.
