Exciting news! The Databricks CLI has undergone a major transformation. It now covers all Databricks REST API operations and supports every Databricks authentication type. The best part? Windows users can now join in and install the new CLI just as easily as macOS and Linux users do with Homebrew.
This blog aims to provide comprehensive guidance on Databricks CLI, covering installation instructions, authentication setup, and practical examples of various Databricks CLI commands. By the end of this blog, you will have a clear understanding of how to utilize Databricks CLI effectively in your workflow.
No more exclusion; it’s a unified experience across platforms! Expect a smoother, more efficient, and more powerful workflow as the new Databricks CLI propels you into the future of data-driven work. 🚀🌟
Table of Contents
What is Databricks CLI?
The Databricks CLI is a command-line interface tool that allows you to automate the Databricks platform from your terminal, command prompt, or automation scripts. It is built on top of the Databricks REST APIs and implements the Databricks client unified authentication standard, which protects user and business data.
Some of the benefits of Databricks CLI are:
- You can use the CLI to run custom commands that are not available in the web UI.
- You can automate tasks such as creating and managing clusters of any size.
- You can use the Databricks CLI on Linux, Windows, and macOS operating systems.
- You can save time and avoid switching between multiple browser tabs and workspaces.
How to Install
To install Databricks CLI on Windows, follow these simple steps:
STEP 1: Go to the GitHub repository of Databricks CLI.
STEP 2: In the “Releases” section, locate the correct .zip file for your machine’s operating system and architecture. Download it.
STEP 3: Follow your operating system’s documentation to extract the contents of the downloaded .zip file. This process may involve using a built-in utility or a third-party application to unzip the file.
STEP 4: After extraction, you will see a folder with the same name as the .zip file.
STEP 5: Open the folder, and inside, you’ll find the Databricks CLI executable file.
STEP 6: At this point, you have a choice: you can either keep the Databricks CLI executable in this folder or move/copy it to another location on your computer for easier access. You should add the folder path to your PATH environment variable so Databricks CLI commands can be run from anywhere.
STEP 7: Open a terminal or command prompt, navigate to the location where the Databricks CLI executable is located, and run the command:
databricks -v
The -v flag displays the version information. Alternatively, you can run the databricks version command in the terminal or command prompt to get the same information.
To install the Databricks CLI on macOS and Linux, you can use Homebrew. For easy installation, click the link – Install Databricks CLI on macOS and Linux.
How to set up the authentication
In this section, you’ll find a step-by-step guide to authenticate using the Databricks personal access token authentication method, widely recognized for its security and reliability.
Databricks Personal Access Token Authentication
A Databricks personal access token is a long-lived token generated by a user. It can be used to authenticate to the Databricks APIs. Databricks personal access token authentication uses this token to verify the identity of the calling Databricks entity, be it a Databricks user account or a Databricks service principal.
You need to follow two main steps to set up Databricks personal access token authentication:
STEP 1: Create an Access Token
Below are the steps to create a Databricks personal access token for a Databricks user:
- Navigate to your Databricks workspace and click on your Databricks username in the top bar. From the dropdown menu, select “User Settings.“
- In the “User Settings” page, go to the “Access tokens” tab and click on “Generate new token.”
- Optionally, you can provide a comment to help identify this token in the future and adjust the token’s default lifetime (which is set to 90 days). If you prefer a token that never expires (not recommended), simply leave the “Lifetime (days)” box empty.
- Click on “Generate” to create the personal access token.
- The newly generated token will be displayed. Make sure to copy it for later use, and then click “Done” to complete the process.
Remember, if you lose the token, you will have to generate a new one. You cannot retrieve the old token. Also, tokens provide full access to Databricks APIs, so they should be kept secure.
STEP 2: Create Configuration Profile
A configuration profile is a set of settings that includes authentication details such as the Databricks workspace URL and the access token value. Each configuration profile is assigned a programmatic name, such as “DEV” or “PROD,” to distinguish and manage multiple setups efficiently.
To create a configuration profile, run the following command:
databricks configure --host <workspace-url> --profile <configuration-profile-name>
After that, enter your Databricks personal access token at the prompt, and the command will create an entry in the .databrickscfg file:
[configuration-profile-name]
host = <workspace-url>
token = <the token you created in the previous step>
You can manually create a configuration profile by using your favorite text editor to create a file named .databrickscfg in your ~ (user home) folder on Unix, Linux, or macOS, or in your %USERPROFILE% (user home) folder on Windows, if you do not already have one. Do not forget the dot (.) at the beginning of the file name. Add the following contents to this file:
[configuration-profile-name]
host = https://adb-1236286498149800.0.azuredatabricks.net
token = <the token you created in the previous step>
The host field is the workspace URL of your Databricks workspace.
The token field is the value of your Databricks personal access token.
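Because .databrickscfg uses INI syntax, you can inspect it from a script with Python’s standard configparser module. Below is a minimal sketch; the DEV profile name, file name, and token value are placeholders, not real credentials:

```python
import configparser
from pathlib import Path

# Write a sample .databrickscfg-style file. Normally this lives in your
# home folder; a local file name is used here purely for illustration.
sample = """\
[DEV]
host = https://adb-1236286498149800.0.azuredatabricks.net
token = dapi-example-token
"""
Path("sample.databrickscfg").write_text(sample)

# Parse the file and read the DEV profile's settings.
config = configparser.ConfigParser()
config.read("sample.databrickscfg")
host = config["DEV"]["host"]
token = config["DEV"]["token"]
print(host)  # https://adb-1236286498149800.0.azuredatabricks.net
```

The same two keys, host and token, are what the CLI itself reads when you pass --profile DEV.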
The host field holds the unique per-workspace URL for your workspace. The workspace ID is the unique identifier for the workspace; it appears immediately after the adb- prefix and before the . (dot). For example, if your workspace instance name is “adb-1236286498149800.0,” the URL would look like this: “https://adb-1236286498149800.0.azuredatabricks.net“.
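Following the rule above, the workspace ID can be pulled out of a per-workspace URL with a small regular expression; a sketch:

```python
import re

def workspace_id(url: str) -> str:
    """Extract the workspace ID: the digits between 'adb-' and the first dot."""
    match = re.search(r"adb-(\d+)\.", url)
    if match is None:
        raise ValueError(f"not a per-workspace URL: {url}")
    return match.group(1)

print(workspace_id("https://adb-1236286498149800.0.azuredatabricks.net"))
# 1236286498149800
```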
You can determine the per-workspace URL for your workspace in two ways:
- When you are logged in:
- In the top bar of the Databricks UI, click on your username.
- From the dropdown menu, select “Workspaces“.
- The per-workspace URL will be displayed in the URL bar of your web browser.
- By selecting the resource:
- In the Databricks UI, click on the “Resources” tab.
- In the “Workspaces” section, select the workspace whose per-workspace URL you want to determine.
- The per-workspace URL will be displayed in the URL field of the workspace details page.
Once you have created a configuration profile, you can use it to authenticate to Databricks in your code. The specific code that you need to use will vary depending on the tool or SDK that you are using.
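For example, a script that calls the REST API directly would combine the profile’s host and token into a bearer-token Authorization header. The sketch below only builds the request pieces and makes no network call; the Clusters List endpoint path (/api/2.0/clusters/list) is shown as one common target, and the host and token values are placeholders:

```python
def build_request(host: str, token: str, api_path: str) -> tuple[str, dict]:
    """Combine a profile's host and token into a full URL and auth header."""
    url = host.rstrip("/") + api_path
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers

url, headers = build_request(
    "https://adb-1236286498149800.0.azuredatabricks.net",
    "dapi-example-token",
    "/api/2.0/clusters/list",
)
print(url)
# https://adb-1236286498149800.0.azuredatabricks.net/api/2.0/clusters/list
```

An HTTP client such as the standard-library urllib (or the Databricks SDK, which reads the configuration profile for you) would then send the request with these headers.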
Databricks CLI commands
Databricks CLI commands are designed to simplify various tasks and enable easy interactions with Databricks. These commands can be categorized into two types:
- Group Flags
- Command Groups
Group Flags are specific options that can be applied to multiple Databricks CLI commands. They allow you to customize the behavior of commands according to your requirements. These flags are typically used as modifiers for the commands, and they are specified as options when invoking the commands. Group Flags are helpful for setting global options that affect multiple commands consistently.
The --profile flag specifies the profile to use for authentication, and the --output flag specifies the format in which the output should be displayed.
Here is an example of a command that uses both flags:
databricks sql --query "SELECT * FROM my_table" --profile my-profile --output json
This command will run the SQL command SELECT * FROM my_table on the Databricks SQL warehouse, using the profile my-profile and displaying the results in JSON format.
For example, the -d flag can be used to specify the Databricks URL.
databricks sql --query "SELECT * FROM my_table" -d https://my-databricks-url-as-described-above
This command will run the SQL command SELECT * FROM my_table on the Databricks SQL warehouse, using the Databricks URL https://my-databricks-url-as-described-above.
Here is one of the global flags in the Databricks CLI:

| Flag | Description |
| --- | --- |
| -o | Write the output as text or as JSON |
Command Groups are organized based on the functionality or task they perform. It allows you to access multiple related commands under a single namespace, making it easier to remember and use them efficiently. Here is the list:
- Cluster – Create, Manage, and delete clusters
- Notebook – Run, manage, and share notebooks
- Data – Load, unload, and manage data
- Job – Submit, monitor, and cancel jobs
- User – Manage users and groups
- Role – Manage roles and permissions
1. Cluster Command: Consider the below example where we will use the Cluster command to create a new cluster, get the list of the clusters, and delete a cluster.
# Create a new cluster named my-cluster with 4 workers of type Standard_D2_v2
databricks clusters create --cluster-name my-cluster --node-type-id Standard_D2_v2 --num-workers 4
# List all of the clusters in your Databricks account
databricks clusters list
# Delete the cluster with the ID my-cluster
databricks clusters delete --cluster-id my-cluster
A list of other cluster commands:
databricks clusters describe – to describe a cluster
databricks clusters resize – to resize a cluster
databricks clusters start – to start a cluster
databricks clusters stop – to stop a cluster
Describe a cluster:
databricks clusters describe --cluster-id my-cluster
This command will describe the cluster with the ID my-cluster.
Resize a cluster:
databricks clusters resize --cluster-id my-cluster --num-workers 8
This command will resize the cluster with the ID my-cluster to have 8 workers.
Start a cluster:
databricks clusters start --cluster-id my-cluster
This command will start the cluster with the ID my-cluster.
Stop a cluster:
databricks clusters stop --cluster-id my-cluster
For more information on the Databricks clusters command, you can run the following command:
databricks clusters --help
2. Notebook Command: Here is an example where we will use the Notebook command to create and run a notebook, get the list of the notebooks, and delete a notebook.
# Create a new notebook named my-notebook in the Python language
databricks notebook create --notebook-path my-notebook --language python
# Run the notebook my-notebook on a cluster with the ID my-cluster-ID
databricks notebook run --notebook-path my-notebook --cluster-id my-cluster-ID
# List all of the notebooks in your Databricks account
databricks notebook list
# Delete the notebook with the ID my-notebook
databricks notebook delete --notebook-id my-notebook
A list of other Notebook commands:
databricks notebook describe – describes a notebook
databricks notebook share – shares a notebook
databricks notebook unshare – unshares a notebook
Describe a notebook:
databricks notebook describe --notebook-id my-notebook
This command will describe the notebook with the ID my-notebook.
Share a notebook:
databricks notebook share --notebook-id my-notebook --user my-username
This command will share the notebook with the user my-username.
Unshare a notebook:
databricks notebook unshare --notebook-id my-notebook --user my-username
This command will unshare the notebook from the user my-username.
3. Data Command: Below is an example where we will use the Data command to load data into your table, unload the data from the table, and describe data in the table.
# Load data from a CSV file into a Databricks table
databricks data load --table-name my-table --format csv --path path-to-csv-file
# Unload data from a Databricks table to a CSV file
databricks data unload --table-name my-table --format csv --path path-to-output-csv-file
# Describe a Databricks table
databricks data describe --table-name my-table
A list of other data commands:
databricks data sample – samples data from a Databricks table
databricks data history – shows the history of data loads and unloads for a Databricks table
databricks data sample --table-name my-table --num-rows 100
This command will return a sample of 100 rows from the table named my-table.
databricks data history --table-name my-table
This command will show the history of changes to the table named my-table.
4. Job Command: Consider the below example where we will use the Job command to submit a job, display the job list, check the status of your job, and also cancel your job.
# Create a new job that will run the Python file at the path path-to-my-python-file
databricks jobs create --job-name my-job --python-file path-to-my-python-file
# Submit a job for execution. The job will be created and executed in the background.
databricks jobs submit --job-name my-job --python-file path-to-my-python-file
# Run a job
databricks jobs run --job-id my-job
# List all of the jobs in your Databricks account
databricks jobs list
# Get the status of the job with the ID my-job
databricks jobs get-status --job-id my-job
# Cancel the job with the ID my-job
databricks jobs cancel --job-id my-job
# Delete a job
databricks jobs delete --job-id my-job
5. User Command: Here is an example where we will use the User command:
# Create a new user
databricks user create --username my-username --password my-password
# List all of the users in your Databricks account
databricks user list
# Get the details of a user
databricks user info --username my-username
# Delete a user
databricks user delete --username my-username
A list of other User commands:
databricks user grant – grants a role to a user
databricks user revoke – revokes a role from a user
databricks user change-password – changes the password for a user
databricks user impersonate – impersonates a user
Grant a role to a user:
databricks user grant --username my-username --role my-role
This command will grant the role my-role to the user named my-username.
Revoke a role from a user:
databricks user revoke --username my-username --role my-role
This command will revoke the role my-role from the user named my-username.
Change the password for a user:
databricks user change-password --username my-username --password my-new-password
This command will change the password for the user named my-username.
Impersonate a user:
databricks user impersonate --username my-username
This command will impersonate the user named my-username. You will be able to run commands as that user until you exit the impersonation session.
6. Role Command: Here is an example where we will use the Role command:
# Create a new role
databricks role create --role-name my-role --description "This is my new role."
# List all of the roles in your Databricks account
databricks role list
# Get the details of a role
databricks role info --role-name my-role
# Delete a role
databricks role delete --role-name my-role
# Grant the CREATE_CLUSTERS permission to the role named my-role
databricks role grant --role-name my-role --permission CREATE_CLUSTERS
# Revoke the CREATE_CLUSTERS permission from the role named my-role
databricks role revoke --role-name my-role --permission CREATE_CLUSTERS
How to use Databricks CLI Commands?
Here are some usage examples of the Databricks CLI:
Example 1: To list CLI command groups, run:
databricks -h
Example 2: To display the help for a command, run:
databricks clusters list -h
Example 3: To list all the Databricks clusters that you have in your workspace, run:
databricks clusters list
Example 4: To display the name of an Azure Databricks cluster with the specified Cluster ID.
You can use the utility jq to extract a specific element from the JSON output of a cluster command. Here’s how to do it:
Run the following command to get the JSON output of the cluster command:
databricks clusters describe --cluster-id my-cluster
This command will output the JSON representation of the cluster with the ID my-cluster to the console.
Save the JSON output to a file named cluster.json.
Run the following command to use jq to extract the name of the cluster from the file:
jq '.name' cluster.json
This command will print the name of the cluster to the console.
For example, if the name of the cluster is my-cluster, the output of the command will be:
"my-cluster"
Here is a more complete example of how to use jq to extract the name of an Azure Databricks cluster with the specified Cluster ID:
# Get the JSON output of the cluster command
cluster_json=$(databricks clusters describe --cluster-id my-cluster)
# Save the JSON output to a file
echo "$cluster_json" > cluster.json
# Use jq to extract the name of the cluster
name=$(jq '.name' cluster.json)
# Print the name of the cluster
echo "$name"
This script will first get the JSON output of the cluster command and save it to a file named cluster.json. Then it will use jq to extract the name of the cluster from the file and print it to the console.
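If jq is not installed, the same extraction can be done with Python’s standard json module. A sketch, using a trimmed, hypothetical sample of the JSON a cluster command might return:

```python
import json

# A trimmed, hypothetical sample of cluster JSON (the real output
# contains many more fields).
cluster_json = '{"name": "my-cluster", "num_workers": 4}'

# Parse the JSON and pull out the same '.name' element jq would.
cluster = json.loads(cluster_json)
print(cluster["name"])  # my-cluster
```

Unlike jq '.name', which prints the value with surrounding quotes, Python prints the bare string.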
Example 5: To export a workspace directory to the local filesystem, run:
databricks workspace export --path path-to-workspace-directory --local-path path-to-local-directory
For example, to export the workspace directory my-workspace to the local directory /tmp/my-workspace, you would run the following command:
databricks workspace export --path my-workspace --local-path /tmp/my-workspace
This command will export the contents of the workspace directory my-workspace to the local directory /tmp/my-workspace.
Example 6: To import a local directory of notebooks to a workspace, run:
databricks workspace import --local-path /path/to/local/directory --workspace-path /path/to/workspace/directory
For example, to import the notebooks in the local directory /tmp/my-notebooks to the workspace directory my-workspace, you would run the following command:
databricks workspace import --local-path /tmp/my-notebooks --workspace-path my-workspace
Here are some additional details about the command:
- The --local-path flag specifies the path to the local directory that contains the notebooks to be imported.
- The --workspace-path flag specifies the path to the workspace directory where the notebooks will be imported.
- The command will overwrite any notebooks in the workspace directory with the same name as the notebooks in the local directory.
- The command will only import notebooks that have supported source-file extensions, such as .py, .scala, .sql, and .r.
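The import step above effectively filters a local directory by notebook extensions. Here is a sketch of that filtering logic; the extension set is an illustrative assumption covering common source formats, not the CLI's exact list:

```python
from pathlib import Path

# Assumed set of importable source extensions (illustrative, not exhaustive).
NOTEBOOK_EXTENSIONS = {".py", ".scala", ".sql", ".r"}

def importable(filenames):
    """Keep only files whose extension looks like a notebook source file."""
    return [f for f in filenames if Path(f).suffix.lower() in NOTEBOOK_EXTENSIONS]

print(importable(["etl.py", "report.sql", "readme.md", "Model.Scala"]))
# ['etl.py', 'report.sql', 'Model.Scala']
```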
Example 7: To copy a small dataset to the Databricks filesystem (DBFS), run:
databricks fs cp /path/to/local/dataset dbfs:/path/to/dataset
For example, to copy the dataset my-dataset.csv from the local directory /tmp to the DBFS path /user/my-username/my-dataset.csv, you would run the following command:
databricks fs cp /tmp/my-dataset.csv dbfs:/user/my-username/my-dataset.csv
Example 8: To run a SQL command on a Databricks SQL warehouse, run:
databricks sql --query <sql_command>
For example, to run the SQL command SELECT * FROM my_table, you would run the following command:
databricks sql --query "SELECT * FROM my_table"
This command will run the SQL command SELECT * FROM my_table on the Databricks SQL warehouse.
This article has provided a comprehensive overview of Databricks CLI, covering its functionalities, authentication setup, and a wide range of commands applicable to clusters, notebooks, data, jobs, users, and roles. By following the step-by-step instructions, you can now easily set up authentication and efficiently manage your Databricks environment. I hope you found this article informative and enjoyable, empowering you to use Databricks CLI effectively in your data engineering and analytics tasks.