I am using Azure Databricks to build a simple batch job that copies data from the Databricks file system to another location.

In a notebook cell, I ran this command:

val df = spark.read.text("abfss://" + fileSystemName + "@" + storageAccountName + ".dfs.core.windows.net/fec78263-b86d-4531-ad9d-3139bf3aea31.txt")

The source file name is: fec78263-b86d-4531-ad9d-3139bf3aea31.txt

But when I run the cell, I get this error message:

HEAD https://bassamsacc01.dfs.core.windows.net/bassamdatabricksfs01/fec78263-b86d-4531-ad9d-3139bf3aea31.txt?timeout=90    
StatusCode=403
StatusDescription=This request is not authorized to perform this operation using this permission.
ErrorCode=
ErrorMessage=
        at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:134)
        at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.services.AbfsClient.getPathProperties(AbfsClient.java:353)
        at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFileStatus(AzureBlobFileSystemStore.java:498)
        at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:405)
        at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1439)
        at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:47)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:386)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:366)
        at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:355)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:355)
        at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:927)
        at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:893)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3220206233807239:1)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-3220206233807239:46)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$read$$iw$$iw$$iw$$iw.<init>(command-3220206233807239:48)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$read$$iw$$iw$$iw.<init>(command-3220206233807239:50)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$read$$iw$$iw.<init>(command-3220206233807239:52)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$read$$iw.<init>(command-3220206233807239:54)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$read.<init>(command-3220206233807239:56)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$read$.<init>(command-3220206233807239:60)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$read$.<clinit>(command-3220206233807239)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$eval$.$print$lzycompute(<notebook>:7)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$eval$.$print(<notebook>:6)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$eval.$print(<notebook>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
        at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
        at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
        at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
        at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
        at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
        at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570)
        at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:215)
        at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$repl$1(ScalaDriverLocal.scala:202)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:714)
        at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:667)
        at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:202)
        at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$10(DriverLocal.scala:396)
        at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:238)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
        at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:233)
        at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:230)
        at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:49)
        at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:275)
        at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:268)
        at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:49)
        at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:373)
        at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:653)
        at scala.util.Try$.apply(Try.scala:213)
        at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:645)
        at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:486)
        at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:598)
        at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:391)
        at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:337)
        at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:219)
        at java.lang.Thread.run(Thread.java:748)

At first glance this looks like an authentication problem when accessing the file system hosted in the Azure storage account, but I don't know how to add the appropriate configuration string.
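For context, the "appropriate string" here usually means a Spark configuration that supplies credentials for the storage account. A minimal Scala sketch of the simplest variant, access-key authentication, with placeholder values (everything in angle brackets is an assumption, not from the original post):

// Hypothetical sketch: authenticate to ADLS Gen2 with the storage account access key.
// In practice the key should come from a Databricks secret scope, not a string literal.
spark.conf.set(
  "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
  "<storage-account-access-key>")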


Please upvote if this question is helpful. Thanks in advance.

1 Answer

From the error message, you have not granted your service principal the correct role on the Data Lake Storage Gen2 account. A 403 (rather than a 401) means the request authenticated successfully, but the principal is not authorized to perform the operation.

To fix this, navigate to the storage account in the portal -> Access control (IAM) -> add a role assignment granting your service principal the Storage Blob Data Contributor role, as shown below. Note that a new role assignment can take a few minutes to propagate.

[Screenshot: adding the Storage Blob Data Contributor role assignment in the Azure portal]

For more details, refer to this documentation - Create and grant permissions to service principal.
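Once the role is assigned, the notebook-side configuration for service principal (OAuth) access looks roughly like the following Scala sketch. The angle-bracket values are placeholders for your application (client) ID, client secret, and directory (tenant) ID; in practice the secret should be read from a Databricks secret scope:

// Hypothetical values; substitute your own service principal details.
val storageAccountName = "<storage-account-name>"
spark.conf.set(s"fs.azure.account.auth.type.$storageAccountName.dfs.core.windows.net", "OAuth")
spark.conf.set(s"fs.azure.account.oauth.provider.type.$storageAccountName.dfs.core.windows.net",
  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(s"fs.azure.account.oauth2.client.id.$storageAccountName.dfs.core.windows.net", "<application-id>")
spark.conf.set(s"fs.azure.account.oauth2.client.secret.$storageAccountName.dfs.core.windows.net", "<client-secret>")
spark.conf.set(s"fs.azure.account.oauth2.client.endpoint.$storageAccountName.dfs.core.windows.net",
  "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

With the role assigned and this configuration in place, the spark.read.text call from the question should succeed.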

answered 2020-08-10T03:03:56.570