In a Python 3 notebook in Azure Databricks, when I run this command:
%scala
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._
val config = Config(Map(
  "url" -> "serverName.database.windows.net:1433",
  "databasename" -> "dbName",
  "user" -> "user@domain.com",
  "password" -> "password",
  "encrypt" -> "true",
  "trustServerCertificate" -> "false",
  "hostNameInCertificate" -> "*.database.windows.net",
  "loginTimeout" -> "30",
  "authentication" -> "ActiveDirectoryPassword",
  "dbTable" -> "dbo.TableName"
))
val collection = sqlContext.read.sqlDB(config)
collection.show()
I get the error:
java.lang.NoClassDefFoundError: com/microsoft/aad/adal4j/AuthenticationException
This database requires ActiveDirectoryPassword. I can connect from my own machine with pyodbc using the credentials above, but I can't get any connection to work from Databricks. This is an Azure Databricks Standard tier workspace (not Premium). Any ideas?
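For reference, the local connection that works looks roughly like this (a minimal pyodbc sketch; the ODBC driver name is an assumption, and the placeholder values match the config above):

import pyodbc

# Minimal sketch of the connection that works from my machine.
# The ODBC driver version is an assumption; the placeholders
# (server, database, user, password, table) match the Scala config above.
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=serverName.database.windows.net,1433;"
    "Database=dbName;"
    "Uid=user@domain.com;"
    "Pwd=password;"
    "Encrypt=yes;"
    "TrustServerCertificate=no;"
    "Authentication=ActiveDirectoryPassword;"
)
cursor = conn.cursor()
cursor.execute("SELECT TOP 1 * FROM dbo.TableName")
print(cursor.fetchone())
conn.close()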
Update: Thanks to Mark for his answer. It turns out that importing jars the default way in Azure Databricks puts them on the application classpath rather than the system classpath, which is what causes this error (per: https://forums.databricks.com/questions/706/how-can-i-attach-a-jar-library-to-the-cluster-that.html). To work around it, I used the code below (change "clusterName" to the actual name of the cluster):
%scala
// This code block only needs to be run once to create the init script for the cluster (file remains on restart)
// Create dbfs:/databricks/init/ if it doesn't exist.
dbutils.fs.mkdirs("dbfs:/databricks/init/")
// Display the list of existing global init scripts.
display(dbutils.fs.ls("dbfs:/databricks/init/"))
// Create a directory named (clusterName) using Databricks File System - DBFS.
dbutils.fs.mkdirs("dbfs:/databricks/init/clusterName/")
// Create the adal4j script.
dbutils.fs.put("/databricks/init/clusterName/adal4j-install.sh","""
#!/bin/bash
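# Download the adal4j jar into the jar directories the driver daemon scans at
# startup, so the missing class ends up on the system classpath rather than
# the application classpath.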
wget --quiet -O /mnt/driver-daemon/jars/adal4j-1.6.0.jar http://central.maven.org/maven2/com/microsoft/azure/adal4j/1.6.0/adal4j-1.6.0.jar
wget --quiet -O /mnt/jars/driver-daemon/adal4j-1.6.0.jar http://central.maven.org/maven2/com/microsoft/azure/adal4j/1.6.0/adal4j-1.6.0.jar""", true)
// Check that the cluster-specific init script exists.
display(dbutils.fs.ls("dbfs:/databricks/init/clusterName/"))
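Because the script lives under dbfs:/databricks/init/clusterName/, it runs automatically the next time a cluster named clusterName starts; after the restart, the original %scala read succeeds. To confirm the jar landed where the script put it, a quick check from a regular Python cell should work (a sketch; the path is the one targeted by the init script, and the file:/ scheme makes dbutils.fs list the driver's local disk rather than DBFS):

# Run after the cluster restart; lists the driver-local jar directory
# populated by the init script above.
display(dbutils.fs.ls("file:/mnt/driver-daemon/jars/"))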