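My Spark job contains multiple queries. When I start the job, I see that each query opens its own connection to Kafka and that the queries do not share data with each other. How can I cache the data so that the same data is not read multiple times?
I tried caching the table with the command "CACHE TABLE cache_table;", but then I get: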
Queries with streaming sources must be executed with writeStream.start();;
kafka
org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:374)
org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:37)
org.apach
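
For context, the trace above comes from checkForBatch: CACHE TABLE is executed as a batch operation, so it is rejected when the table is backed by a streaming source. One commonly suggested way to read each Kafka micro-batch once and share it between several outputs is foreachBatch with persist/unpersist. The following is only a minimal sketch, assuming PySpark 2.4 or later with the Kafka source package on the classpath. The broker address, topic name, output path, checkpoint path, and the helper names (make_batch_handler, count_rows, write_values) are all hypothetical and not taken from the original job.

```python
def make_batch_handler(sinks):
    """Build a foreachBatch callback that caches each micro-batch once
    and passes the cached DataFrame to every sink in turn."""
    def handle(batch_df, batch_id):
        batch_df.persist()  # later actions on this batch reuse the cached rows
        try:
            for sink in sinks:
                sink(batch_df, batch_id)
        finally:
            batch_df.unpersist()  # release the cache even if a sink fails
    return handle


def main():
    # Imported here so the helper above can be used without a Spark install.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("shared-kafka-read").getOrCreate()

    # A single streaming source, so only one streaming query reads from Kafka.
    source = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
        .option("subscribe", "events")  # assumed topic
        .load()
    )

    # Two illustrative "queries" that both consume the same cached batch.
    def count_rows(df, batch_id):
        print(batch_id, df.count())

    def write_values(df, batch_id):
        (df.selectExpr("CAST(value AS STRING) AS value")
           .write.mode("append")
           .parquet("/tmp/events_values"))  # assumed output path

    query = (
        source.writeStream
        .foreachBatch(make_batch_handler([count_rows, write_values]))
        .option("checkpointLocation", "/tmp/events_checkpoint")  # assumed path
        .start()
    )
    query.awaitTermination()
```

Calling main() from the job's entry point starts one streaming query instead of several. Each sink then runs as an ordinary batch action on the cached micro-batch, which is why the "must be executed with writeStream.start()" check is not triggered inside the callback.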