Azure landing zone design best practices


I recently had to design an Azure landing zone for a customer who wants to migrate their workloads from on-premises to Azure. This article explains the best practices applied in that landing zone design. I have divided it into multiple Azure areas:

Azure foundational components

Subscriptions design:

Azure provides four levels of scope: management groups, subscriptions, resource groups, and resources. While designing your architecture, you can apply management settings at any of these four levels. The level you select determines how widely the setting is applied, and lower levels inherit settings from higher levels. For example, when you apply a policy to a subscription, the policy is applied to all resource groups and resources in that subscription. When you apply a policy to a resource group, the policy is applied to that resource group and all its resources, but other resource groups do not get that policy assignment.
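The inheritance rule can be sketched in a few lines of Python. This is only an illustrative model, not an Azure SDK call: the `Scope` class and the policy and resource names are hypothetical and exist only to show how a setting assigned at a higher scope reaches everything below it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    """One node in the scope hierarchy (hypothetical model, not an Azure SDK type)."""
    name: str
    parent: Optional["Scope"] = None
    policies: set = field(default_factory=set)

    def effective_policies(self) -> set:
        """Policies assigned at this scope plus everything inherited from higher scopes."""
        inherited = self.parent.effective_policies() if self.parent else set()
        return inherited | self.policies


# Hypothetical hierarchy: one subscription, two resource groups, one VM.
subscription = Scope("sub-prod", policies={"allowed-locations"})
rg_app = Scope("rg-app", parent=subscription, policies={"require-env-tag"})
rg_data = Scope("rg-data", parent=subscription)
vm = Scope("vm-web-01", parent=rg_app)

print(sorted(vm.effective_policies()))       # subscription policy + rg-app policy
print(sorted(rg_data.effective_policies()))  # subscription policy only
```

The VM picks up both the subscription-level and the resource-group-level policy, while the second resource group only sees the subscription-level one.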

If your organization has several subscriptions, you may need a way to efficiently manage access, policies, and compliance for those subscriptions. Azure management groups provide a level of scope above subscriptions. You organize subscriptions into containers called management groups and apply your governance conditions to the management groups. Management groups enable:

  • Organizational alignment for your Azure subscriptions through custom hierarchies and grouping.
  • Targeting of Azure policies and spending budgets across subscriptions and inheritance down the hierarchies.
  • Compliance and cost reporting by the organization (business/teams).

Figure: Example management group hierarchy in a corporate environment

BEST PRACTICE: Use a multi-subscription model that separates production, management, and non-production subscriptions. Move each workload into its respective subscription, and avoid placing all your workloads in one subscription.

BEST PRACTICE: Always create an intermediate root management group between the tenant root group and the other management groups, and place the landing zone and sandbox/non-production management groups underneath it. Organization-level governance and policies can be scoped at the intermediate management group so that they apply to all subscriptions. Each branch of the hierarchy can then carry additional policies that are specific to its workloads and required governance. This way you do not need to alter the tenant root management group at the top level.

Azure tags:

BEST PRACTICE: Always tag your Azure resources so that they are better organized for management and billing. For example, create an environment tag with the values Dev, Prod, and QA; it can be used for billing or for classifying environments. Also, keep in mind that:

  1. Each resource or resource group can have a maximum of 50 tag name/value pairs.
  2. Tags applied to the resource group are not inherited by the resources in that resource group.
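Because resource group tags are not inherited, a deployment script has to copy the ones it wants onto each resource itself. The following is a minimal sketch of that idea; the function names, tag names, and values are hypothetical and no Azure API is called.

```python
MAX_TAGS = 50  # limit stated above: at most 50 tag name/value pairs


def validate_tags(tags: dict) -> None:
    """Raise ValueError if the tag set exceeds the per-resource limit."""
    if len(tags) > MAX_TAGS:
        raise ValueError(f"{len(tags)} tags exceeds the limit of {MAX_TAGS}")


def tags_to_apply(resource_tags: dict, group_tags: dict, inherit: tuple) -> dict:
    """Build the tag set for a resource by explicitly copying selected
    resource-group tags, since they are not inherited automatically."""
    merged = {name: value for name, value in group_tags.items() if name in inherit}
    merged.update(resource_tags)  # tags set directly on the resource take priority
    validate_tags(merged)
    return merged


# Hypothetical example values.
group_tags = {"environment": "Prod", "costCenter": "1234"}
resource_tags = {"owner": "web-team"}
result = tags_to_apply(resource_tags, group_tags, inherit=("environment",))
print(result)
```

Only the `environment` tag is copied from the resource group here; `costCenter` stays on the group because it was not listed in `inherit`.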

Resource locks:

A common concern with resources provisioned in Azure is the ease with which they can be deleted. A careless administrator can accidentally erase months of work with a few steps. Azure Resource Manager locks allow organizations to put a structure in place that prevents the accidental deletion of resources in Azure.

  1. You can associate the lock with a subscription, resource group, or resource.
  2. Locks are inherited by child resources.
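Lock inheritance can be modelled as "the most restrictive lock on the path wins". The sketch below assumes the two Azure lock levels, CanNotDelete and ReadOnly; the helper functions themselves are hypothetical and only illustrate the rule.

```python
from typing import Optional

# Restrictiveness order: no lock < CanNotDelete < ReadOnly
LOCK_RANK = {None: 0, "CanNotDelete": 1, "ReadOnly": 2}


def effective_lock(locks_on_path: list) -> Optional[str]:
    """Return the most restrictive lock found along the path
    subscription -> resource group -> resource (None means no lock)."""
    return max(locks_on_path, key=lambda lock: LOCK_RANK[lock], default=None)


def can_delete(locks_on_path: list) -> bool:
    """Deletion is blocked by any lock on the path."""
    return effective_lock(locks_on_path) is None


def can_modify(locks_on_path: list) -> bool:
    """Modification is blocked only by a ReadOnly lock."""
    return effective_lock(locks_on_path) != "ReadOnly"


# A CanNotDelete lock on the resource group protects a resource with no lock of its own.
path = [None, "CanNotDelete", None]
print(effective_lock(path), can_delete(path), can_modify(path))
```

In this example the resource can still be modified but not deleted, because it inherits the resource group's lock.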

BEST PRACTICE: Use resource locks wherever applicable. Lock critical subscriptions, resource groups, and resources to prevent their accidental deletion or modification.

Resource groups:

BEST PRACTICE: Create multiple resource groups to organize your resources. For example, create a Dev resource group for dev resources and a Prod resource group for prod resources.

Identity and access management:

BEST PRACTICE: When you are planning a hybrid environment, always plan the placement of your domain controllers (DC) and DNS servers (your name resolution strategy) in Azure. Once you plan disaster recovery or migrate workloads to Azure, the VMs will need to use DC and DNS servers located in Azure. Avoid keeping them only on-premises, because that can cause latency issues. Moreover, the role-based access control defined on-premises needs to be replicated for the migrated workloads, so place DC and DNS VMs in Azure and sync Azure AD with the on-premises DC.

BEST PRACTICE: Place DNS, SMTP, authentication (for example, a RADIUS authentication server), DHCP, and DC servers in a trusted zone (a separate trusted subnet) in Azure, and provide connectivity to all the VMs via a route table. Placing these servers directly alongside the production workloads is not a good idea for security.

Access management for cloud resources is a critical function for any organization that is using the cloud. Role-based access control (RBAC) helps you manage who has access to Azure resources, what they can do with those resources, and what areas they have access to.

BEST PRACTICE: Use Azure role-based access control to manage subscriptions and resources, and document the groups, roles, and scopes in your architecture.

BEST PRACTICE: Always configure Azure AD Privileged Identity Management (PIM) for high-privilege accounts to ensure zero standing access and least privilege for those accounts. It can help to avoid:

  • a malicious actor getting access.
  • an authorized user inadvertently impacting a sensitive resource.

Azure PIM provides time-based and approval-based role activation to mitigate the risks of excessive, unnecessary, or misused access permissions on resources. Key features of Privileged Identity Management:

  • Provides just-in-time and time-bound (start and end dates) privileged access to Azure AD and Azure resources
  • Approval-based access to activate privileged roles
  • Enforce multi-factor authentication to activate RBAC roles
  • Justification-based access to understand why users need a particular role
  • Get notified when privileged roles are activated
  • Conduct access audit reviews to ensure users still need roles for their work to complete
  • Download audit history for internal or external auditing purposes
  • Prevents accidental removal of the last active Global Administrator role assignment

BEST PRACTICE: Always use Azure multi-factor authentication for any users who have access to the Azure environment.

BEST PRACTICE: Create an emergency access account that is limited to emergency or “break glass” scenarios where normal administrative accounts can’t be used. Restrict use of the emergency account to the times when it is absolutely necessary. An emergency access (break glass) account is required in case your users are unable to authenticate due to MFA or some other reason.

Networking and connectivity:

BEST PRACTICE: When implementing site-to-site (S2S) VPN, use a separate gateway for each partner’s site-to-site connectivity. Terminating the partner’s S2S connection and the customer’s S2S connection on the same VPN gateway could lead to a transitive connection between the partner and customer networks. Isolate customer and partner traffic to avoid this security risk.

The correct design is to create a separate VNet with its own gateway subnet and establish the partner’s S2S connectivity from it. Remember that you can create only one gateway subnet inside a VNet, so you need a separate VNet and gateway subnet for partner S2S connectivity.

BEST PRACTICE: Always plan for highly available cross-premises connectivity. There are three options available here:

  1. Multiple on-premises VPN devices for on-premises device redundancy.
     Diagram: multiple on-premises sites with private IP subnets and on-premises VPN devices connected to an active Azure VPN gateway, with a standby gateway available, to reach subnets hosted in Azure.
  2. Active-active Azure VPN gateway for Azure gateway redundancy.
     Diagram: an on-premises site with private IP subnets and an on-premises VPN device connected to two active Azure VPN gateways to reach subnets hosted in Azure.
  3. Dual redundancy, active-active on both sides, for redundancy at the customer side and the Azure side.
     Diagram: a dual-redundancy scenario.

BEST PRACTICE: If you cannot implement the above configurations due to cost concerns, then at least use a VPN gateway SKU that supports zone redundancy for higher resilience to zone-level failures. These SKUs have AZ as a suffix, for example VpnGw1AZ through VpnGw5AZ.

BEST PRACTICE: Never use a common firewall for traffic that originates from the internet and traffic that originates on-premises. Using only one set of firewalls for both is a security risk because it provides no perimeter between the two sets of networks. Using separate firewall layers reduces the complexity of checking security rules and makes it clear which rules correspond to which incoming network requests. So always plan to isolate traffic that originates from the internet from traffic that originates on-premises.

A good design segregates the internet traffic from the internal traffic.

BEST PRACTICE: When you plan the address space for your VNet, always make sure enough IP addresses are planned for scaling and future requirements. For example, suppose you will need to add 2000 servers in the near future and your VNet address space is 10.121.0.0/26, which contains only 64 addresses. That address space design is wrong; use something like 10.121.0.0/21, which contains 2048 addresses. Keep in mind that Azure reserves five addresses in every subnet, so the usable count is slightly lower, and a larger range such as /20 leaves more headroom.
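The sizing arithmetic is easy to check with Python's standard `ipaddress` module. This is a rough sketch that assumes the whole range is used as a single subnet and that Azure reserves five addresses per subnet; the helper names are my own.

```python
import ipaddress

AZURE_RESERVED_PER_SUBNET = 5  # assumption: Azure reserves 5 addresses in each subnet


def usable_hosts(cidr: str) -> int:
    """Addresses left for workloads when the range is used as one subnet."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AZURE_RESERVED_PER_SUBNET, 0)


def smallest_prefix_for(hosts: int) -> int:
    """Longest prefix (smallest IPv4 range) that still fits `hosts` in a single subnet."""
    for prefix in range(29, 0, -1):  # start at /29, the smallest IPv4 subnet Azure supports
        if 2 ** (32 - prefix) - AZURE_RESERVED_PER_SUBNET >= hosts:
            return prefix
    raise ValueError("too many hosts for an IPv4 network")


print(usable_hosts("10.121.0.0/26"))  # 59
print(usable_hosts("10.121.0.0/21"))  # 2043
print(smallest_prefix_for(2000))      # 21
```

With several subnets the reserved addresses are deducted once per subnet, so real designs need more headroom than this single-subnet figure suggests.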

BEST PRACTICE: Always keep the management jump boxes in a separate subnet, and if you are using a hub-and-spoke architecture, ensure that the correct route tables and connectivity are established between the spokes and the hub.

BEST PRACTICE: To mitigate distributed denial-of-service (DDoS) attacks, use Azure DDoS Protection Standard protection plans to help protect all public endpoints (for example, web applications hosted on VMs and accessible from the internet) within the virtual networks.

BEST PRACTICE: Even though this is not a network design recommendation, I would like to mention it here: do not use preview features in Azure. For example, at the time of writing this article, Azure Defender is in preview. Also, make sure to check that the features you need are available in the region where you are going to deploy your workload. For example, if Azure Firewall Premium is not available in the South India region, you cannot use that region for your Azure workload if your design mandates Azure Firewall Premium.

Operations

BEST PRACTICE: Ensure that the DR strategy you are designing satisfies the RPO and RTO requirements and that they are clearly documented in your architecture. You should have automation in place to fail over from the primary site to the secondary site. Also, make sure that the technologies used in Azure support your business requirements.

BEST PRACTICE: Use monitoring tools like Datadog or ScienceLogic (Agentless monitoring) for monitoring the VMs and Applications. If you are not using these external tools please make sure to use Azure native monitoring and collect these logs for better alerting and correlation.

BEST PRACTICE: Make sure you are fully aware of the SLAs for the different Azure services used in your architecture, and that the compound SLA meets the application SLA requirement. For example, if Azure Site Recovery takes a minimum of 15 minutes to restore the VMs but your recovery time objective is 10 minutes, Azure Site Recovery will not meet your requirement.
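A compound SLA for services that all have to be up at the same time is the product of the individual SLAs. Here is a minimal sketch of that calculation together with the RTO check from the example; the SLA percentages are hypothetical and should be replaced with the published SLA of each service you actually use.

```python
from math import prod


def compound_sla(slas: list) -> float:
    """Compound SLA of services chained in series: every service must be
    available, so the individual availabilities multiply."""
    return prod(slas)


def meets_rto(recovery_minutes: float, rto_minutes: float) -> bool:
    """True if the expected recovery time fits within the recovery time objective."""
    return recovery_minutes <= rto_minutes


# Hypothetical figures: a 99.95% service in front of a 99.99% service.
combined = compound_sla([0.9995, 0.9999])
print(f"{combined:.6%}")

# The example from the text: a 15-minute restore against a 10-minute RTO.
print(meets_rto(recovery_minutes=15, rto_minutes=10))  # False
```

The compound figure is always lower than the weakest individual SLA, which is why it has to be compared against the application requirement rather than each service on its own.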

BEST PRACTICE: Always use deployment automation (DevOps) and infrastructure as code (IaC) wherever possible to deploy the infrastructure.

Security and Compliance

BEST PRACTICE: Always deploy Azure Policy for better security and governance. There is a list of built-in Azure policies that can be used in the environment.

BEST PRACTICE: Use antimalware protection for Azure VMs.

BEST PRACTICE: Always use Azure Security Center (a security posture management tool) to monitor the security landscape. It is recommended to use Azure Security Center with Azure Defender enabled instead of the free version. Defender-enabled Azure Security Center provides some critical security features to secure your workloads in Azure.

BEST PRACTICE: Plan your architecture to enable patching and update management solutions, for example by writing an Ansible playbook for patching. You can also write PowerShell patch automation runbooks in an Azure Automation account.

I hope this article will be useful while designing the Azure landing zone.
