I recently came across a requirement to design an Azure landing zone for a customer who wants to migrate workloads from on-premises to Azure. This article explains the best practices applied in that landing zone design. I have divided it into the following areas:
- Azure foundational components
- Identity and access management
- Networking and connectivity design
- Security and Compliance design
Azure foundational components
Azure provides four levels of scope: management groups, subscriptions, resource groups, and resources. While designing your architecture, you can apply management settings at any of these four levels. The level you select determines how widely the setting is applied. Lower levels inherit settings from higher levels. For example, when you apply a policy to a subscription, the policy applies to all resource groups and resources in that subscription. When you apply a policy to a resource group, it applies to the resource group and all its resources. However, another resource group doesn’t have that policy assignment.
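The inheritance rule can be illustrated with a small Python model. This is only a sketch of the concept, not an Azure API: the `Scope` class and all scope and policy names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    """One node in the hierarchy: management group, subscription, resource group, or resource."""
    name: str
    parent: Optional["Scope"] = None
    policies: set = field(default_factory=set)  # policies assigned directly at this scope

    def effective_policies(self) -> set:
        """Policies assigned here plus everything inherited from higher scopes."""
        inherited = self.parent.effective_policies() if self.parent else set()
        return inherited | self.policies


# Hypothetical hierarchy, for illustration only.
subscription = Scope("sub-prod", policies={"allowed-locations"})
rg_app = Scope("rg-app", parent=subscription, policies={"require-tags"})
rg_data = Scope("rg-data", parent=subscription)
vm = Scope("vm-web-01", parent=rg_app)

print(sorted(vm.effective_policies()))       # subscription policy + rg-app policy
print(sorted(rg_data.effective_policies()))  # subscription policy only
```

The VM picks up both assignments, while the sibling resource group sees only the subscription-level policy.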
If your organization has several subscriptions, you may need a way to efficiently manage access, policies, and compliance for those subscriptions. Azure management groups provide a level of scope above subscriptions. You organize subscriptions into containers called management groups and apply your governance conditions to the management groups. Management groups enable:
- Organizational alignment for your Azure subscriptions through custom hierarchies and grouping.
- Targeting of Azure policies and spending budgets across subscriptions, with inheritance down the hierarchy.
- Compliance and cost reporting by organization (business/teams).
BEST PRACTICE: Use a multi-subscription model that separates production, management, and non-production subscriptions. Move each workload into its respective subscription, and avoid placing all workloads in a single subscription.
BEST PRACTICE: Always create an intermediate root management group between the tenant root group and your other management groups, and place the landing zone and sandbox/non-production management groups underneath it. Organization-level governance and policies can then be scoped at the intermediate management group, where they apply to all subscriptions, while lower levels of the hierarchy carry policies specific to a workload or governance requirement. This way you never have to alter the tenant root management group at the top level.
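One possible shape for such a hierarchy is sketched below in Python. The group names are hypothetical ("Contoso" stands in for your organization), and the nesting shows only the levels discussed above.

```python
# Hypothetical names throughout; "Contoso" stands in for your organization.
HIERARCHY = {
    "Tenant Root Group": {
        "Contoso (intermediate root)": {
            "Landing Zones": {},
            "Sandbox / Non-Production": {},
        }
    }
}


def render(tree: dict, depth: int = 0) -> list:
    """Return the hierarchy as indented lines, one management group per line."""
    lines = []
    for name, children in tree.items():
        lines.append("  " * depth + name)
        lines.extend(render(children, depth + 1))
    return lines


print("\n".join(render(HIERARCHY)))
```

Organization-wide policies would be assigned at the intermediate root, so the tenant root group itself stays untouched.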
BEST PRACTICE: Always tag your Azure resources so that they are better organized for management and billing. For example, create an environment tag with values such as Dev, Prod, and QA, which can then be used for billing or for classifying environments. Also keep in mind that:
- Each resource or resource group can have a maximum of 50 tag name/value pairs.
- Tags applied to a resource group are not inherited by the resources in that resource group.
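Because tags are not inherited, any tag that should appear on both a resource group and its resources has to be copied explicitly. The Python sketch below shows one way to reason about that, together with the 50-tag limit. The function names and tag values are hypothetical, and nothing here calls Azure.

```python
MAX_TAGS = 50  # limit stated above: name/value pairs per resource or resource group


def validate_tags(tags: dict) -> None:
    """Raise if a tag set exceeds the per-resource limit."""
    if len(tags) > MAX_TAGS:
        raise ValueError(f"{len(tags)} tags exceeds the limit of {MAX_TAGS}")


def propagate_tags(group_tags: dict, resource_tags: dict) -> dict:
    """Copy resource-group tags onto a resource; the resource's own values win on conflict."""
    merged = {**group_tags, **resource_tags}
    validate_tags(merged)
    return merged


# Hypothetical tag values.
rg_tags = {"environment": "Prod", "costCenter": "1234"}
vm_tags = {"owner": "web-team", "environment": "QA"}
print(propagate_tags(rg_tags, vm_tags))
```

Letting the resource's own value win is a design choice for this sketch; you may prefer the resource group to override instead.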
A common concern with resources provisioned in Azure is the ease with which they can be deleted. A careless administrator can accidentally erase months of work with a few steps. Azure Resource Manager locks allow organizations to put a structure in place that prevents the accidental deletion of resources in Azure.
- You can associate the lock with a subscription, resource group, or resource.
- Locks are inherited by child resources.
BEST PRACTICE: Use resource locks wherever applicable. Lock critical subscriptions, resource groups, and resources to prevent accidental deletion or modification.
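Azure offers two lock levels, CanNotDelete and ReadOnly, and the most restrictive lock in the inheritance chain takes precedence. The Python sketch below models that rule. It is a simplified illustration with hypothetical helper names, not a call into Azure Resource Manager.

```python
from enum import IntEnum


class Lock(IntEnum):
    """Lock levels ordered from least to most restrictive."""
    NONE = 0
    CAN_NOT_DELETE = 1
    READ_ONLY = 2


def effective_lock(own: Lock, *ancestors: Lock) -> Lock:
    """A resource is governed by the strictest lock on itself or any parent scope."""
    return max([own, *ancestors])


def can_delete(lock: Lock) -> bool:
    return lock == Lock.NONE


def can_modify(lock: Lock) -> bool:
    return lock != Lock.READ_ONLY


# A VM with no lock of its own, inside a resource group locked CanNotDelete.
vm_lock = effective_lock(Lock.NONE, Lock.CAN_NOT_DELETE)
print(vm_lock.name, can_delete(vm_lock), can_modify(vm_lock))
```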
BEST PRACTICE: Create multiple resource groups to organize resources, for example a dev resource group for dev resources and a prod resource group for prod resources.
Identity and access management
BEST PRACTICE: When planning a hybrid environment, always plan the placement of your domain controllers (DC) and DNS servers (for name resolution) in Azure. Once you implement disaster recovery or migrate workloads to Azure, the VMs will need to use DC and DNS servers placed in Azure. Avoid leaving them only on-premises, because that can cause latency issues. Moreover, the role-based access control defined on-premises needs to be replicated for the migrated workloads, so place DC and DNS VMs in Azure and sync Azure AD with the on-premises DC.
BEST PRACTICE: Place DNS, SMTP, authentication (for example, a RADIUS server), DHCP, and DC servers in a trusted zone (a separate trusted subnet) in Azure, and provide connectivity to all VMs via a route table. Placing these servers directly alongside production workloads is not a good idea from a security standpoint.
Access management for cloud resources is a critical function for any organization that is using the cloud. Role-based access control (RBAC) helps you manage who has access to Azure resources, what they can do with those resources, and what areas they have access to.
BEST PRACTICE: Use Azure role-based access control to manage subscriptions and resources, and document the groups, roles, and scopes in your architecture.
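An RBAC role assignment combines a security principal (who), a role (what), and a scope (where), and an assignment at a parent scope also covers its children. The Python sketch below shows one way to document and check such assignments. The group names and the all-zero subscription ID are placeholders, and the prefix matching is a simplification of how Azure evaluates scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleAssignment:
    principal: str  # who: a user, group, or service principal
    role: str       # what: for example the built-in Reader or Contributor role
    scope: str      # where: a resource ID prefix


def applies_to(scope: str, resource_id: str) -> bool:
    """True if resource_id is the scope itself or sits underneath it."""
    scope = scope.rstrip("/").lower()
    rid = resource_id.rstrip("/").lower()
    return rid == scope or rid.startswith(scope + "/")


def roles_for(principal: str, resource_id: str, assignments: list) -> set:
    """All roles the principal holds on the resource, including inherited ones."""
    return {a.role for a in assignments
            if a.principal == principal and applies_to(a.scope, resource_id)}


SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"  # placeholder ID
assignments = [
    RoleAssignment("grp-ops", "Reader", SUB),
    RoleAssignment("grp-ops", "Contributor", SUB + "/resourceGroups/rg-prod"),
]

print(roles_for("grp-ops", SUB + "/resourceGroups/rg-prod/providers/Microsoft.Compute/virtualMachines/vm1", assignments))
print(roles_for("grp-ops", SUB + "/resourceGroups/rg-dev", assignments))
```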
BEST PRACTICE: Always configure Azure AD Privileged Identity Management (PIM) for high-privilege accounts to enforce zero standing access and least privilege. This helps to prevent:
- a malicious actor getting access.
- an authorized user inadvertently impacting a sensitive resource.
Azure PIM provides time-based and approval-based role activation to mitigate the risks of excessive, unnecessary, or misused access permissions on resources. Key features of Privileged Identity Management:
- Provides just-in-time and time-bound (start and end dates) privileged access to Azure AD and Azure resources
- Approval based access to activate privileged roles
- Enforce multi-factor authentication to activate RBAC roles
- Justification-based access to understand why users need a particular role
- Get notified when privileged roles are activated
- Conduct access reviews to ensure users still need their roles to complete their work
- Download audit history for internal or external auditing purposes
- Prevents accidental removal of the last active Global Administrator role assignment
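The just-in-time idea behind these features can be modelled in a few lines of Python. This is a conceptual sketch only: the `ActivationRequest` class and its fields are hypothetical and do not correspond to the PIM API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ActivationRequest:
    user: str
    role: str
    justification: str
    mfa_passed: bool
    approved: bool
    start: datetime
    duration: timedelta


def is_active(req: ActivationRequest, now: datetime) -> bool:
    """The role is usable only if every gate passed and we are inside the time window."""
    gates_ok = req.mfa_passed and req.approved and bool(req.justification.strip())
    return gates_ok and req.start <= now < req.start + req.duration


start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
req = ActivationRequest("alice", "Contributor", "Patch window for web tier",
                        mfa_passed=True, approved=True,
                        start=start, duration=timedelta(hours=2))

print(is_active(req, start + timedelta(hours=1)))  # inside the window
print(is_active(req, start + timedelta(hours=3)))  # window has expired
```

Once the window closes the access disappears on its own, which is what removes standing privilege.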
BEST PRACTICE: Always use Azure multi-factor authentication for all users who have access to the Azure environment.
BEST PRACTICE: Create an emergency access account that is limited to emergency or “break glass” scenarios where normal administrative accounts can’t be used. Restrict use of the emergency account to the times when it is absolutely necessary. An emergency access (break glass) account is required in case your users are unable to authenticate because of MFA or some other issue.
Networking and connectivity
BEST PRACTICE: When implementing site-to-site (S2S) VPN, use a separate gateway for each partner’s S2S connectivity. Using the same VPN gateway to terminate both a partner’s S2S connection and the customer’s S2S connection could create a transitive connection between the partner and customer networks. Isolate customer traffic from partner traffic to avoid this security risk.
The correct design is to create a separate VNet with its own gateway subnet and establish the partner’s S2S connectivity from it. Remember that a VNet can contain only one gateway subnet, so partner S2S connectivity needs its own VNet and gateway subnet.
BEST PRACTICE: Always plan for highly available cross-premises connectivity. There are three options available:
1. Multiple on-premises VPN devices, for redundancy on the on-premises side.
2. An active-active Azure VPN gateway, for redundancy on the Azure side.
3. Dual redundancy, with active-active on both sides, for redundancy at both the customer side and the Azure side.
BEST PRACTICE: If you cannot implement the above configurations because of cost, at least use a VPN gateway SKU that supports zone redundancy, for higher resilience to zone-level failures. These SKUs have AZ as a suffix, for example VpnGw1AZ.
BEST PRACTICE: Never use a common firewall for traffic that originates from the internet and traffic that originates on-premises. Using one set of firewalls for both is a security risk because it provides no perimeter between the two networks. Separate firewall layers reduce the complexity of checking security rules and make it clear which rules correspond to which incoming network requests. Always plan to isolate traffic that originates from the internet from traffic that originates on-premises.
A good design segregates internet traffic from internal traffic.
BEST PRACTICE: When you plan the address space for your VNet, always make sure enough IP addresses are reserved for scaling and future requirements. For example, if you expect to add 2,000 servers in the near future and your VNet address space is 10.121.0.0/26, that space contains only 64 addresses, so the design is wrong. Use something like 10.121.0.0/21, which contains 2,048 addresses. Keep in mind that Azure reserves five addresses in every subnet, so the usable count is slightly lower.
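The sizing arithmetic can be checked with Python’s standard `ipaddress` module. This is a rough sketch: the helper names are hypothetical, and the prefix estimate ignores subnet alignment, so treat its result as a lower bound on the space you need.

```python
import ipaddress

AZURE_RESERVED_PER_SUBNET = 5  # Azure reserves five addresses in every subnet


def total_addresses(cidr: str) -> int:
    """Number of addresses in the CIDR block, before any Azure reservations."""
    return ipaddress.ip_network(cidr).num_addresses


def smallest_prefix_for(hosts: int, subnets: int = 1) -> int:
    """Smallest IPv4 block (longest prefix) that fits the hosts plus reserved addresses."""
    needed = hosts + AZURE_RESERVED_PER_SUBNET * subnets
    host_bits = (needed - 1).bit_length()  # ceil(log2(needed)) for needed >= 1
    return 32 - host_bits


print(total_addresses("10.121.0.0/26"))       # the undersized space from the example
print(total_addresses("10.121.0.0/21"))       # the suggested space
print(smallest_prefix_for(2000))              # all servers in one subnet
print(smallest_prefix_for(2000, subnets=10))  # servers spread across ten subnets
```

With ten subnets the reserved addresses push the requirement past 2,048, so the estimate moves from a /21 to a /20.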
BEST PRACTICE: Always keep management jumpboxes in a separate subnet, and if you are using a hub-and-spoke architecture, ensure that the correct route tables and connectivity are established between the hub and the spokes.
BEST PRACTICE: To mitigate distributed denial-of-service (DDoS) attacks, use Azure DDoS Protection Standard plans to help protect all public endpoints (such as web applications hosted on VMs and accessible from the internet) within your virtual networks.
BEST PRACTICE: Although this is not a network design recommendation, I would like to mention it here: do not use preview features in Azure. For example, at the time of writing this article, Azure Defender is in preview. Also make sure to check feature availability in the region where you are going to deploy your workload. For example, if Azure Firewall Premium is not available in the South India region, you cannot use that region for your Azure workload if your design mandates Azure Firewall Premium.
BEST PRACTICE: Ensure that the DR strategy you design satisfies the RPO and RTO requirements and that these are clearly documented in your architecture. You should have automation in place to fail over from the primary site to the secondary site. Also make sure that the technologies used in Azure support your business requirements.
BEST PRACTICE: Use monitoring tools such as Datadog or ScienceLogic (agentless monitoring) to monitor VMs and applications. If you are not using such external tools, make sure to use Azure-native monitoring and collect the logs for better alerting and correlation.
BEST PRACTICE: Make sure you are fully aware of the SLA of each Azure service used in your architecture, and that the compound SLA meets the application’s SLA requirement. For example, if Azure Site Recovery takes a minimum of 15 minutes to restore the VMs but your recovery time objective is 10 minutes, Azure Site Recovery will not meet the requirement.
BEST PRACTICE: Always use deployment automation practices (DevOps) and infrastructure as code (IaC) wherever possible to deploy the infrastructure.
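When services depend on each other in series, the compound SLA is the product of the individual SLAs, so it is always lower than the weakest one. The Python sketch below shows the calculation and the RTO comparison from the example. The SLA percentages are made-up illustrative values, not published Azure figures.

```python
import math


def composite_sla(*slas: float) -> float:
    """Availability of services chained in series: every one must be up, so multiply."""
    return math.prod(slas)


def meets_rto(restore_minutes: float, rto_minutes: float) -> bool:
    """True if the expected restore time fits within the recovery time objective."""
    return restore_minutes <= rto_minutes


# Illustrative values only; look up the real SLA of each service you use.
two_tier = composite_sla(0.9995, 0.9999)
three_tier = composite_sla(0.9995, 0.9999, 0.999)
target = 0.999

print(f"{two_tier:.4%}", two_tier >= target)
print(f"{three_tier:.4%}", three_tier >= target)
print(meets_rto(restore_minutes=15, rto_minutes=10))  # the example from the text
```

Adding the third dependency drops the chain below the 99.9% target even though each service individually meets it.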
Security and compliance
BEST PRACTICE: Always deploy Azure Policy for better security and governance. Azure provides a list of built-in policies that can be used in the environment.
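An Azure Policy rule is a JSON document with an `if` condition and a `then` effect. The Python sketch below builds a rule in that shape which denies resources outside a list of allowed regions, and evaluates it with a toy function. The region list is an assumption for illustration, and `evaluate` is a hypothetical helper that understands only the single `field`/`notIn` condition used here, not the full policy language.

```python
import json
from typing import Optional

# Regions chosen for illustration only.
policy_rule = {
    "if": {"field": "location", "notIn": ["southindia", "centralindia"]},
    "then": {"effect": "deny"},
}


def evaluate(rule: dict, resource: dict) -> Optional[str]:
    """Return the rule's effect if the condition matches, otherwise None."""
    condition = rule["if"]
    value = resource.get(condition["field"])
    if value not in condition["notIn"]:
        return rule["then"]["effect"]
    return None


print(json.dumps(policy_rule, indent=2))
print(evaluate(policy_rule, {"name": "vm1", "location": "westeurope"}))
print(evaluate(policy_rule, {"name": "vm2", "location": "southindia"}))
```

In practice a built-in policy for allowed locations may already cover this case, so check the built-in list before writing a custom definition.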
BEST PRACTICE: Use antimalware protection for Azure VMs.
BEST PRACTICE: Always use Azure Security Center (a security posture management tool) to monitor the security landscape. Enable Azure Defender in Azure Security Center instead of using the free tier. Security Center with Defender enabled provides critical security features to secure your workloads in Azure.
BEST PRACTICE: Plan your architecture to enable patching and update management solutions, for example by writing Ansible playbooks for patching. You can also write PowerShell patch automation runbooks in an Azure Automation account.
I hope this article is useful when designing an Azure landing zone.