
So I am completely new to Terraform, and I found that by putting this in my Terraform main.tf I can create the Azure Databricks infrastructure:

resource "azurerm_databricks_workspace" "bdcc" {
  depends_on = [
    azurerm_resource_group.bdcc
  ]

  name                = "dbw-${var.ENV}-${var.LOCATION}"
  resource_group_name = azurerm_resource_group.bdcc.name
  location            = azurerm_resource_group.bdcc.location
  sku                 = "standard"

  tags = {
    region = var.BDCC_REGION
    env    = var.ENV
  }
}

And I also found here that by using this I can even create a particular notebook in this Azure Databricks workspace:

resource "databricks_notebook" "notebook" {
  content_base64 = base64encode(<<-EOT
    # created from ${abspath(path.module)}
    display(spark.range(10))
    EOT
  )
  path = "/Shared/Demo"
  language = "PYTHON"
}

But since I am new to this, I am not sure in what order I should put these pieces of code together.

It would be nice if someone could point me to a full example of how to create a notebook via Terraform on Azure Databricks.

Thank you in advance!


1 Answer


In general you can put these objects in any order - it's Terraform's job to detect dependencies between objects and create/update them in the correct order. For example, you don't need depends_on in the azurerm_databricks_workspace resource, because Terraform will see that the resource group must exist before the workspace can be created, so workspace creation will follow creation of the resource group. Terraform also tries to apply changes in parallel where possible.
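To illustrate the implicit dependency, here is a minimal sketch of the resource group alongside the workspace from the question (the resource group's name and location arguments are assumptions for illustration; only the references matter):

resource "azurerm_resource_group" "bdcc" {
  # hypothetical naming, mirroring the workspace's convention
  name     = "rg-${var.ENV}-${var.LOCATION}"
  location = var.LOCATION
}

# No depends_on needed: the attribute references below are enough for
# Terraform to order the workspace after the resource group.
resource "azurerm_databricks_workspace" "bdcc" {
  name                = "dbw-${var.ENV}-${var.LOCATION}"
  resource_group_name = azurerm_resource_group.bdcc.name     # implicit dependency
  location            = azurerm_resource_group.bdcc.location # implicit dependency
  sku                 = "standard"
}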

But because of this, things become slightly more complex when you have the workspace resource together with workspace objects, like notebooks, clusters, etc. Since there is no explicit dependency, Terraform will try to create the notebook in parallel with the workspace, and it will fail because the workspace doesn't exist yet - usually you will get a message about an authentication error.

The solution is to add an explicit dependency between the notebook & the workspace, plus you need to configure the Databricks provider's authentication to point to the newly created workspace (there are differences between user & service principal authentication - you can find more information in the docs). In the end your code would look like this:

resource "azurerm_databricks_workspace" "bdcc" {
  name                = "dbw-${var.ENV}-${var.LOCATION}"
  resource_group_name = azurerm_resource_group.bdcc.name
  location            = azurerm_resource_group.bdcc.location
  sku                 = "standard"

  tags = {
    region = var.BDCC_REGION
    env    = var.ENV
  }
}

provider "databricks" {
  host = azurerm_databricks_workspace.bdcc.workspace_url
}

resource "databricks_notebook" "notebook" {
  depends_on = [azurerm_databricks_workspace.bdcc]
  ...
}

Unfortunately, there is no way to put depends_on at the provider level, so you will need to add it to every Databricks resource that is created together with the workspace. The usual best practice is to have one module for workspace creation & a separate module for the objects inside the Databricks workspace.
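A hypothetical layout for that split (module names, file layout, and variable names here are purely illustrative, not a prescribed structure) might look like this in the root main.tf:

# modules/workspace: azurerm_resource_group + azurerm_databricks_workspace
module "workspace" {
  source   = "./modules/workspace"
  env      = var.ENV
  location = var.LOCATION
}

# modules/workspace-objects: databricks_notebook, clusters, etc.
# Passing the workspace URL as an input both configures the Databricks
# provider inside the module and orders this module after the workspace.
module "workspace_objects" {
  source        = "./modules/workspace-objects"
  workspace_url = module.workspace.workspace_url
}

This keeps the provider configuration and the depends_on bookkeeping contained in one place instead of scattered across every Databricks resource.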

P.S. I would recommend reading a book or some documentation on Terraform. For example, Terraform: Up & Running is a very good intro.

answered 2021-12-13T13:16:28.957