Situation: I'm producing a delta folder with data from a previous streaming query A, and reading it back later from another DF, as shown here:
DF_OUT.writeStream.format("delta").(...).start("path")
(...)
DF_IN = spark.readStream.format("delta").load("path")
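
For context, a fuller, self-contained sketch of what the elided parts look like in my program (the rate source, output mode, and checkpoint location below are placeholders standing in for my real setup, not the actual values):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("DeltaPipeline").getOrCreate()

// Query A: stream a placeholder source into the delta folder.
val DF_OUT = spark.readStream.format("rate").load()
DF_OUT.writeStream
  .format("delta")
  .outputMode("append")
  .option("checkpointLocation", "path/_checkpoint")  // assumed checkpoint dir
  .start("path")

// Chained query: read the same delta folder back as a stream.
val DF_IN = spark.readStream.format("delta").load("path")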
1 - When I try to read it this way in a subsequent readStream within the same program (chained queries of an ETL pipeline), I end up with the exception below.
2 - However, when I run the same code in the scala REPL, it runs smoothly.
Not sure what is happening there, but it sure is puzzling.
org.apache.spark.sql.AnalysisException: Table schema is not set. Write data into it or use CREATE TABLE to set the schema.;
at org.apache.spark.sql.delta.DeltaErrors$.schemaNotSetException(DeltaErrors.scala:365)
at org.apache.spark.sql.delta.sources.DeltaDataSource.sourceSchema(DeltaDataSource.scala:74)
at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:209)
at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:95)
at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:95)
at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:33)
at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:171)
at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:225)
at org.apache.spark.ui.DeltaPipeline$.main(DeltaPipeline.scala:114)