Are you trying to read an XML file into a DataFrame, or to parse XML stored in a column into a DataFrame (nested XML)?
To read a file, try:
// rowTag names the XML element that becomes one row
val df = spark.read
  .format("xml")
  .option("rowTag", "book")
  .load("books.xml")
Or, if the XML is held in a column of an existing DataFrame:
import com.databricks.spark.xml.functions.from_xml
import com.databricks.spark.xml.schema_of_xml
import spark.implicits._
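val df = ... // DataFrame with XML strings in column 'payload'
// Infer a schema from the XML strings in the column
val payloadSchema = schema_of_xml(df.select("payload").as[String])
// Parse each XML string into a struct column using the inferred schema
val parsed = df.withColumn("parsed", from_xml($"payload", payloadSchema))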
https://github.com/databricks/spark-xml
(Compatible with Spark 2.4.x and 3.x, and with Scala 2.12.)