How to deploy Databricks in your private VNet without exposing public IP addresses (VNet Injection)?

I recently came across a situation where a customer wanted to deploy Databricks into their own private network for security reasons. The default installation of Databricks creates its own virtual network, and you do not have any control over it. Microsoft recently added the ability to deploy Databricks into your own private VNet (described below).

There are several limitations to this approach when you deploy Azure Databricks in your own VNet:

1. Databricks creates its own NSG (network security group), whose name does not follow your enterprise naming convention.

2. Once Databricks is deployed and you create a cluster in it, you will find that it creates public IP addresses. This is a big security risk, because some organizations do not allow public IP addresses to be part of a deployment.

So how do we solve this issue? The challenge is to deploy Databricks in a private VNet without exposing public IP addresses. Here are the step-by-step instructions:

1. Create a resource group.

2. Create a VNet with enough address space to make room for the two Databricks subnets created in step 4.
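To size the address space, it can help to plan the two Databricks subnets up front. The sketch below uses Python's standard `ipaddress` module to carve two equally sized subnets out of a VNet range. The 10.10.0.0/16 range and the /24 subnet size are illustrative assumptions, and the /26 minimum subnet size is an assumption you should confirm against the current Azure Databricks documentation.

```python
import ipaddress

# Assumption: Azure Databricks VNet injection needs subnets no smaller than /26.
# Confirm this limit in the current Azure Databricks documentation.
MIN_SUBNET_PREFIX = 26


def plan_databricks_subnets(vnet_cidr: str, subnet_prefix: int = 24):
    """Carve the first two /subnet_prefix subnets out of vnet_cidr.

    Returns (public_subnet, private_subnet) as ipaddress network objects.
    """
    vnet = ipaddress.ip_network(vnet_cidr)
    if subnet_prefix > MIN_SUBNET_PREFIX:
        raise ValueError(
            f"/{subnet_prefix} is smaller than the assumed minimum /{MIN_SUBNET_PREFIX}"
        )
    if subnet_prefix < vnet.prefixlen + 1:
        raise ValueError(f"{vnet} is too small to hold two /{subnet_prefix} subnets")
    # subnets() yields consecutive, non-overlapping blocks of the requested size.
    candidates = vnet.subnets(new_prefix=subnet_prefix)
    public_subnet = next(candidates)
    private_subnet = next(candidates)
    return public_subnet, private_subnet


if __name__ == "__main__":
    public, private = plan_databricks_subnets("10.10.0.0/16", subnet_prefix=24)
    print("public :", public)   # 10.10.0.0/24
    print("private:", private)  # 10.10.1.0/24
```

Leaving the rest of the /16 free keeps room for growth, but the remaining space must not be used for other resources inside the Databricks subnets themselves.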

3. Now create two network security groups (NSGs). Make sure their names adhere to your organization’s naming convention. These two NSGs will be attached to the two subnets we create in the next steps.
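As an illustration, the following sketch derives both NSG names from a single convention so they stay consistent. The pattern `nsg-<workload>-<env>-<region>-dbw-<role>` is purely hypothetical; substitute your organization’s own convention. The name-validity regex reflects the usual Azure networking name rules, but treat it as an assumption and check it against Azure’s resource naming documentation.

```python
import re

# Assumption: typical Azure networking name rules (1-80 characters; letters,
# digits, '_', '.', '-'; start with a letter or digit; end with a letter,
# digit or '_'). Verify against Azure's naming rules for your environment.
NSG_NAME_PATTERN = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9_.-]{0,78}[A-Za-z0-9_])?$")


def nsg_names(workload: str, env: str, region: str) -> dict:
    """Build public/private NSG names from a hypothetical convention:
    nsg-<workload>-<env>-<region>-dbw-<public|private>
    """
    names = {}
    for role in ("public", "private"):
        name = f"nsg-{workload}-{env}-{region}-dbw-{role}".lower()
        if not NSG_NAME_PATTERN.fullmatch(name):
            raise ValueError(f"invalid NSG name: {name!r}")
        names[role] = name
    return names


if __name__ == "__main__":
    print(nsg_names("analytics", "test", "eastus"))
```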

4. Now create two subnets: one private subnet and one public subnet.

  • The public subnet allows communication with the Azure Databricks control plane.
  • The private subnet allows only cluster-internal communication.

Do not deploy other Azure resources on the subnets used by your Azure Databricks workspace. Sharing a subnet with other resources, such as virtual machines, prevents managed updates to the intent policy for the subnet.

5. Assign the public NSG (created in step 3) to the public subnet and delegate the subnet to the Microsoft.Databricks/workspaces service.

6. Assign the private NSG (created in step 3) to the private subnet and delegate the subnet to the Microsoft.Databricks/workspaces service.
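If you define the subnets in an ARM template rather than in the portal, steps 5 and 6 come down to two properties on each subnet: `networkSecurityGroup` and a `delegations` entry for `Microsoft.Databricks/workspaces`. The Python sketch below emits such subnet definitions as JSON for a virtual network’s `subnets` array. The subnet names, address prefixes, NSG names, and delegation names are hypothetical placeholders.

```python
import json


def databricks_subnet(name: str, address_prefix: str, nsg_name: str) -> dict:
    """Return an ARM-style subnet definition (for a VNet's "subnets" array)
    that attaches an NSG and delegates the subnet to Databricks."""
    return {
        "name": name,
        "properties": {
            "addressPrefix": address_prefix,
            # ARM template expression that resolves the NSG created in step 3.
            "networkSecurityGroup": {
                "id": f"[resourceId('Microsoft.Network/networkSecurityGroups', '{nsg_name}')]"
            },
            # Subnet delegation to the Databricks workspace service.
            "delegations": [
                {
                    "name": f"databricks-del-{name}",  # hypothetical delegation name
                    "properties": {"serviceName": "Microsoft.Databricks/workspaces"},
                }
            ],
        },
    }


if __name__ == "__main__":
    subnets = [
        databricks_subnet("databricks-public-subnet", "10.10.0.0/24",
                          "nsg-analytics-test-eastus-dbw-public"),
        databricks_subnet("databricks-private-subnet", "10.10.1.0/24",
                          "nsg-analytics-test-eastus-dbw-private"),
    ]
    print(json.dumps(subnets, indent=2))
```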

7. Use this Azure deployment template to deploy Databricks. You can copy the template from databricks/101-databricks-secure-cluster-connectivity-with-vnet-injection at master · anildwarepo/databricks (github.com). My special thanks to Anil Dwarakanath for this useful template. I have tweaked it to add tags.

There is another template that can be used as well, linked from the Microsoft documentation (https://azure.microsoft.com/en-us/resources/templates/101-databricks-all-in-one-template-for-vnet-injection/), but it has some drawbacks. You usually design your network resources (VNet, subnets, etc.) well before Azure Databricks is deployed, and this template provides no flexibility for that: it creates everything during the Databricks deployment. If you have already created your Azure networking resources, the template will try to recreate them. Moreover, it does not solve the public IP address issue.

{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
      "workspaceName": {
        "type": "string",
        "metadata": {
          "description": "The name of the Azure Databricks workspace to create."
        }
      },
      "pricingTier": {
        "defaultValue": "premium",
        "allowedValues": [
          "trial",
          "standard",
          "premium"
        ],
        "type": "string",
        "metadata": {
          "description": "The pricing tier of workspace."
        }
      },
      "customVirtualNetworkId": {
        "type": "string",
        "metadata": {
          "description": "The complete ARM resource Id of the custom virtual network."
        }
      },
      "customPublicSubnetName": {
        "type": "string",
        "defaultValue": "databricks-public-subnet",
        "metadata": {
          "description": "The name of the public subnet in the custom VNet."
        }
      },
      "customPrivateSubnetName": {
        "type": "string",
        "defaultValue": "databricks-private-subnet",
        "metadata": {
          "description": "The name of the private subnet in the custom VNet."
        }
      },
      "location": {
        "type": "string",
        "defaultValue": "[resourceGroup().location]",
        "metadata": {
          "description": "Location for all resources."
        }
      }
    },
    "variables": {
      "managedResourceGroupId": "[subscriptionResourceId('Microsoft.Resources/resourceGroups', variables('managedResourceGroupName'))]",
      "managedResourceGroupName": "[concat('databricks-rg-', parameters('workspaceName'), '-', uniqueString(parameters('workspaceName'), resourceGroup().id))]"
    },
    "resources": [
      {
        "comments": "The resource group specified will be locked after deployment.",
        "type": "Microsoft.Databricks/workspaces",
        "apiVersion": "2018-04-01",
        "name": "[parameters('workspaceName')]",
        "location": "[parameters('location')]",
        "sku": {
          "name": "[parameters('pricingTier')]"
        },
        "tags": {
          "Application": "myApplicationName",
          "Cost Center": "111111",
          "Tier": "Test"
        },
        "properties": {
          "managedResourceGroupId": "[variables('managedResourceGroupId')]",
          "parameters": {
            "customVirtualNetworkId": {
              "value": "[parameters('customVirtualNetworkId')]"
            },
            "customPublicSubnetName": {
              "value": "[parameters('customPublicSubnetName')]"
            },
            "customPrivateSubnetName": {
              "value": "[parameters('customPrivateSubnetName')]"
            },
            "enableNoPublicIp": {
              "value": true
            }
          }
        }
      }
    ]
  }

And here is the parameter file:

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "workspaceName": {
      "value": null
    },
    "pricingTier": {
      "value": "premium"
    },
    "customVirtualNetworkId": {
      "value": null
    },
    "customPublicSubnetName": {
      "value": "databricks-public-subnet"
    },
    "customPrivateSubnetName": {
      "value": "databricks-private-subnet"
    }
  }
}
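Before deploying, a quick sanity check can catch two easy mistakes: a template that does not set `enableNoPublicIp` to `true`, and a parameter file that still contains `null` placeholders (here, `workspaceName` and `customVirtualNetworkId`). This is a minimal sketch, assuming both files are loaded with `json.load`; it only accepts a literal `true`, so a template that sets the flag through an expression (for example a parameter reference) would also be reported.

```python
import json

REQUIRED_PARAMETERS = {
    "workspaceName",
    "customVirtualNetworkId",
    "customPublicSubnetName",
    "customPrivateSubnetName",
}


def template_problems(template: dict) -> list:
    """Report problems in a Databricks VNet-injection ARM template."""
    problems = []
    missing = REQUIRED_PARAMETERS - set(template.get("parameters", {}))
    if missing:
        problems.append(f"missing template parameters: {sorted(missing)}")
    workspaces = [
        r for r in template.get("resources", [])
        if r.get("type") == "Microsoft.Databricks/workspaces"
    ]
    if not workspaces:
        problems.append("no Microsoft.Databricks/workspaces resource found")
    for ws in workspaces:
        ws_params = ws.get("properties", {}).get("parameters", {})
        # Only a literal JSON true passes; expressions are reported as well.
        if ws_params.get("enableNoPublicIp", {}).get("value") is not True:
            problems.append(f"{ws.get('name')}: enableNoPublicIp is not true")
    return problems


def unfilled_parameters(parameter_file: dict) -> list:
    """Names of parameters in a parameter file whose value is still null."""
    return sorted(
        name
        for name, entry in parameter_file.get("parameters", {}).items()
        if entry.get("value") is None
    )


if __name__ == "__main__":
    # Hypothetical, trimmed copies of the two files shown above.
    template = json.loads("""{
      "parameters": {"workspaceName": {}, "customVirtualNetworkId": {},
                     "customPublicSubnetName": {}, "customPrivateSubnetName": {}},
      "resources": [{"type": "Microsoft.Databricks/workspaces",
                     "name": "[parameters('workspaceName')]",
                     "properties": {"parameters": {"enableNoPublicIp": {"value": true}}}}]
    }""")
    parameters = json.loads("""{
      "parameters": {"workspaceName": {"value": null},
                     "customVirtualNetworkId": {"value": null},
                     "pricingTier": {"value": "premium"}}
    }""")
    print(template_problems(template))      # []
    print(unfilled_parameters(parameters))  # ['customVirtualNetworkId', 'workspaceName']
```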

8. Now deploy the template using the template deployment (custom template) option in the Azure portal and fill in the parameter values.

The virtual network ID is the full resource ID of the VNet created in step 2. You can find it on the VNet’s Properties page.
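The resource ID follows the standard ARM format `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Network/virtualNetworks/<vnet-name>`. If you prefer the Azure CLI to the portal, the sketch below builds that ID and an argument list for `az deployment group create`, passing the two parameters that the parameter file leaves as `null`. The subscription ID, resource group, VNet, workspace, and template file names are hypothetical, and the command is only printed, not executed.

```python
import re
import shlex

VNET_ID_PATTERN = re.compile(
    r"^/subscriptions/[^/]+/resourceGroups/[^/]+"
    r"/providers/Microsoft\.Network/virtualNetworks/[^/]+$"
)


def vnet_resource_id(subscription_id: str, resource_group: str, vnet_name: str) -> str:
    """Build the ARM resource ID of the VNet created in step 2."""
    vnet_id = (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/virtualNetworks/{vnet_name}"
    )
    if not VNET_ID_PATTERN.match(vnet_id):
        raise ValueError(f"unexpected VNet resource ID: {vnet_id}")
    return vnet_id


def deployment_command(resource_group: str, template_file: str,
                       workspace_name: str, vnet_id: str) -> list:
    """argv for deploying the template with the Azure CLI (not executed here)."""
    return [
        "az", "deployment", "group", "create",
        "--resource-group", resource_group,
        "--template-file", template_file,
        "--parameters",
        f"workspaceName={workspace_name}",
        f"customVirtualNetworkId={vnet_id}",
    ]


if __name__ == "__main__":
    vnet_id = vnet_resource_id(
        "00000000-0000-0000-0000-000000000000", "rg-analytics-test", "vnet-analytics-test"
    )
    cmd = deployment_command("rg-analytics-test", "azuredeploy.json",
                             "dbw-analytics-test", vnet_id)
    # To actually deploy: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems with the resource ID.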

9. Once the template is deployed, create a cluster and check whether it generates any public IP addresses.

You will see that no public IP addresses are generated.
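To confirm this without clicking through the portal, you can list the resources in the workspace’s managed resource group (the `databricks-rg-...` group defined in the template’s variables) while a cluster is running, for example with `az resource list --resource-group <managed-resource-group> --output json`, and look for `Microsoft.Network/publicIPAddresses` entries. The Python sketch below does that filtering; the sample input is hypothetical and trimmed to the `name` and `type` fields.

```python
import json

PUBLIC_IP_TYPE = "microsoft.network/publicipaddresses"


def public_ip_resources(resources: list) -> list:
    """Return the names of public IP resources in parsed `az resource list` output."""
    return [
        r.get("name", "<unnamed>")
        for r in resources
        # Compare case-insensitively; resource type casing can vary.
        if r.get("type", "").lower() == PUBLIC_IP_TYPE
    ]


if __name__ == "__main__":
    # Hypothetical sample, trimmed to the fields used here. In practice, load the
    # saved CLI output instead, e.g. json.load(open("managed-rg-resources.json")).
    sample = json.loads("""[
      {"name": "workers-nic-example", "type": "Microsoft.Network/networkInterfaces"},
      {"name": "dbstorageexample", "type": "Microsoft.Storage/storageAccounts"}
    ]""")
    found = public_ip_resources(sample)
    print(found if found else "no public IP addresses found")
```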

I hope this is useful!

Comments

  1. Excellent post, rajaniesh. This approach addressed a major security hole in the out-of-the-box Azure Databricks implementation. It also solved a problem we had with workspaces creating dynamic public IPs, which left us unable to get a handle on the list of IPs to add to our allowed IP lists when connecting to on-prem resources or Snowflake warehouses. I added an Azure NAT Gateway with a single static IP and attached it to the public subnet created with your template. That solved the issue nicely and at little cost. Azure’s suggestion was to create a firewall with route tables and an NVA, which is way overkill and costly for a simple solution.
    Thank you