excel - Reading Excel files as a stream in Spark 2.0.0

Question

I have a set of Excel files that need to be read from Spark (2.0.0) as they are dropped into a local directory. The Scala version is 2.11.8.

I tried the readStream method of SparkSession, but I could not read the files as a stream. Reading an Excel file statically works:

val df = spark.read.format("com.crealytics.spark.excel").option("sheetName", "Data").option("useHeader", "true").load("Sample.xlsx")

Is there another way to read Excel files from a local directory as a stream?

Any answer would be helpful.

Thanks

Changes made:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").config("spark.sql.warehouse.dir","file:///D:/pooja").appName("Spark SQL Example").getOrCreate()
// Schema inference for streaming file sources is a session-level setting
spark.conf.set("spark.sql.streaming.schemaInference", true)
import spark.implicits._
val dataFrame = spark.readStream.format("csv").option("inferSchema",true).option("header", true).load("file:///D:/pooja/sample.csv")
// A streaming DataFrame cannot be shown with show(); start the query and block until it stops
dataFrame.writeStream.format("console").start().awaitTermination()

Updated code:

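import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("Spark SQL Example").getOrCreate()
spark.conf.set("spark.sql.streaming.schemaInference", true)
import spark.implicits._
// com.crealytics.spark.excel has no streaming source, which produces the error below
val df = spark.readStream.format("com.crealytics.spark.excel").option("header", true).load("file:///filepath/*.xlsx")
// Calling awaitTermination() here would block forever, so the SQL query below would never run
val query = df.writeStream.format("memory").queryName("tab").start()
query.processAllAvailable()
val res = spark.sql("select * from tab")
res.show()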

Error:

Exception in thread "main" java.lang.UnsupportedOperationException: Data source com.crealytics.spark.excel does not support streamed reading

Can anyone help me solve this?

score 0 · Accepted Answer

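For a streaming DataFrame you must provide a schema; DataStreamReader currently does not support option("inferSchema", true|false). Instead, you can enable the SQLConf setting spark.sql.streaming.schemaInference, which has to be set at the session level.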
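A minimal sketch of the first option, supplying the schema explicitly so that no inference setting is needed. The column names id and name are hypothetical placeholders for the real columns of the CSV files, and the watched directory file:///D:/pooja/ is an assumption reused from the question; streaming file sources monitor a directory for new files.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object CsvStreamWithSchema {
  // Hypothetical column layout; replace with the actual columns of the CSV files
  val schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = true),
    StructField("name", StringType, nullable = true)
  ))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("Spark SQL Example").getOrCreate()

    // Explicit schema instead of inferSchema / spark.sql.streaming.schemaInference
    val df = spark.readStream
      .schema(schema)
      .option("header", true)
      .csv("file:///D:/pooja/") // assumed directory; new CSV files dropped here are picked up

    // Print each micro-batch to the console and keep the query running
    df.writeStream.format("console").start().awaitTermination()
  }
}
```

This only addresses the schema requirement for a source that supports streaming (CSV here); it does not make the Excel data source streamable.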

You can refer to it here.
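Since the error shows that com.crealytics.spark.excel only supports batch reads, one possible workaround (not part of the accepted answer) is to poll the directory yourself and batch-read each new file. The sketch below is a hypothetical helper, ExcelDirectoryPoller, written in plain Scala; the actual Spark read is passed in as the load function so the polling logic stays independent of Spark.

```scala
import java.io.File

// Hypothetical helper: poll a directory for new .xlsx files, since the
// Excel data source cannot be used with readStream
object ExcelDirectoryPoller {

  // Return absolute paths of .xlsx files in `dir` that are not yet in `seen`
  def newExcelFiles(dir: File, seen: Set[String]): Seq[String] = {
    val files = Option(dir.listFiles()).getOrElse(Array.empty[File])
    files.toSeq
      .filter(f => f.isFile && f.getName.toLowerCase.endsWith(".xlsx"))
      .map(_.getAbsolutePath)
      .filterNot(seen.contains)
      .sorted
  }

  // One polling step: pass each new file to `load` and return the updated set of processed paths
  def pollOnce(dir: File, seen: Set[String], load: String => Unit): Set[String] = {
    val fresh = newExcelFiles(dir, seen)
    fresh.foreach(load)
    seen ++ fresh
  }
}
```

In a driver program you would call pollOnce in a loop with a Thread.sleep between iterations, passing something like path => spark.read.format("com.crealytics.spark.excel").option("sheetName", "Data").option("useHeader", "true").load(path) as load. Keeping the set of processed paths in memory means it is lost on restart; persist it somewhere if that matters.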
