I was able to fix the earlier error by creating a directory on HDFS, uploading the file into the /user/ path there, and then pointing the spark_read_csv function at that directory.
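The directory creation and upload can be driven from R as well; a minimal sketch, assuming the CSV sits in the local working directory and the hdfs client is on the PATH (the equivalent shell commands are hdfs dfs -mkdir -p and hdfs dfs -put):

# Create the target directory on HDFS and copy the local CSV into it
system2("hdfs", c("dfs", "-mkdir", "-p", "/user/ruser/secondary"))
system2("hdfs", c("dfs", "-put", "SECONDARYtwo.csv", "/user/ruser/secondary/"))

With the file in place, spark_read_csv can point at the directory: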
secondary_two_tbl <- spark_read_csv(sc, "SECONDARYtwo.csv",
                                    path = "/user/ruser/secondary/")
This, however, produced a new error:
Error: org.apache.spark.sql.AnalysisException: It is not allowed to add database prefix `SECONDARYtwo` for the TEMPORARY view name.;
at org.apache.spark.sql.execution.command.CreateViewCommand.<init>(views.scala:79)
at org.apache.spark.sql.Dataset$$anonfun$createOrReplaceTempView$1.apply(Dataset.scala:2421)
at org.apache.spark.sql.Dataset$$anonfun$createOrReplaceTempView$1.apply(Dataset.scala:2415)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2603)
at org.apache.spark.sql.Dataset.createOrReplaceTempView(Dataset.scala:2415)
at org.apache.spark.sql.Dataset.registerTempTable(Dataset.scala:2385)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sparklyr.Handler.handleMethodCall(handler.scala:118)
at spar
The clue is in the first line of the trace: Spark parses the "." in "SECONDARYtwo.csv" as a database.table separator, so "SECONDARYtwo" is treated as a database prefix, which is not allowed for a temporary view name. I therefore dropped the ".csv" part from the name argument and ran spark_read_csv again:
tbl_secondary_two <- spark_read_csv(sc, "SECONDARYtwo",
                                    path = "/user/ruser/secondary/")
This time it worked.
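To avoid this for arbitrary file names, the view name can be derived from the file name before the call. A small sketch using base R only (safe_name is a hypothetical helper, not part of sparklyr):

safe_name <- function(file) {
  # Drop the extension, then replace any leftover dots with underscores,
  # since Spark parses "." in a view name as a database.table separator
  gsub("\\.", "_", tools::file_path_sans_ext(basename(file)))
}

safe_name("SECONDARYtwo.csv")
#> [1] "SECONDARYtwo"

tbl_secondary_two <- spark_read_csv(sc, safe_name("SECONDARYtwo.csv"),
                                    path = "/user/ruser/secondary/")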