If you want to save files to Azure Data Lake Storage Gen2 from Azure Databricks, please refer to the following steps:
- Create an Azure Data Lake Storage Gen2 account.
az login
az storage account create \
    --name <account-name> \
    --resource-group <resource-group> \
    --location westus \
    --sku Standard_RAGRS \
    --kind StorageV2 \
    --enable-hierarchical-namespace true
- Create a service principal and assign the Storage Blob Data Contributor role to it, scoped to the Data Lake Storage Gen2 storage account.
az login
az ad sp create-for-rbac -n "MyApp" --role "Storage Blob Data Contributor" \
    --scopes /subscriptions/<subscription>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>
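The command returns an `appId`, `password`, and `tenant`, which map to the client ID, client secret, and tenant ID used in the mount configuration below. Rather than pasting the secret into a notebook, you can read it from a Databricks secret scope. A minimal sketch, assuming a scope named `my-scope` holding a key `sp-client-secret` (both names are hypothetical and must be created beforehand):

# Read the service principal's client secret from a Databricks secret scope.
# "my-scope" and "sp-client-secret" are assumed names; create them first,
# e.g. with the Databricks CLI: databricks secrets create-scope --scope my-scope
client_secret = dbutils.secrets.get(scope="my-scope", key="sp-client-secret")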
- Create a Spark cluster in Azure Databricks.
- Mount the Azure Data Lake Storage Gen2 filesystem in Azure Databricks (Python).
# OAuth settings for the service principal; <appId>, <clientSecret> and <tenant>
# come from the output of `az ad sp create-for-rbac` above
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "<appId>",
           "fs.azure.account.oauth2.client.secret": "<clientSecret>",
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant>/oauth2/token",
           # create the container (filesystem) on first access if it does not exist
           "fs.azure.createRemoteFileSystemDuringInitialization": "true"}

# Mount the container so it is reachable through DBFS paths under /mnt/flightdata
dbutils.fs.mount(
    source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/folder1",
    mount_point = "/mnt/flightdata",
    extra_configs = configs)
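To confirm the mount worked, you can list its contents (a quick sanity check, not part of the original steps):

# List the mounted directory; an empty result is fine for a fresh container
display(dbutils.fs.ls("/mnt/flightdata"))
# To remount with different settings, unmount first:
# dbutils.fs.unmount("/mnt/flightdata")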
- Save JSON to Azure Data Lake Storage Gen2.
# The third argument (True) overwrites the file if it already exists
dbutils.fs.put("/mnt/flightdata/<file name>", """
<json string>
""", True)