python - Great Expectations on the same column name in different arrays

Asked 2021-11-30T14:24:44.883

Viewed 44 times

Suppose we have a PySpark DataFrame with the following schema:

root
 |-- struct1: struct (nullable = true)
 |    |-- struct2: struct (nullable = true)
 |    |    |-- array1: array (nullable = true)
 |    |    |    |-- **struct3: struct (containsNull = true)
 |    |    |    |    |-- name1: string (nullable = true)**
 |    |    |    |    |-- name2: string (nullable = true)
 |    |    |-- array2: array (nullable = true)
 |    |    |    |-- **struct4: struct (containsNull = true)
 |    |    |    |    |-- name1: string (nullable = true)**
 |    |    |    |    |-- name3: string (nullable = true)

Note that struct3 and struct4 share a common column name, name1 (see the lines marked with **).

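Question: How can we run an expectation on the name1 column that comes from struct3? Will Great Expectations be confused about which struct to run the expectation on? In particular, would the following command confuse Great Expectations: batch.expect_column_to_exist('name1')?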

Even if we flatten the data and end up with columns like struct3.name1 and struct4.name1, would this still confuse Great Expectations?
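To make the flattening idea concrete, here is a minimal sketch of deriving one distinct flat column name per nested field, so that the two name1 fields no longer collide. It assumes a simplified, hypothetical model of the schema (plain Python dicts and lists instead of Spark types) and a hypothetical helper, flat_names. The array element structs (struct3, struct4) are identified by their parent arrays, array1 and array2. It does not call Great Expectations or Spark, and it makes no claim about how Great Expectations resolves nested names.

```python
# Hypothetical, simplified model of the schema in the question:
# a dict is a struct, a one-element list is an array of that element type,
# and a string is a leaf type.
schema = {
    "struct1": {
        "struct2": {
            "array1": [{"name1": "string", "name2": "string"}],
            "array2": [{"name1": "string", "name3": "string"}],
        }
    }
}


def flat_names(node, path=()):
    """Return (dotted_path, flat_alias) pairs for every leaf field."""
    if isinstance(node, dict):
        # Struct: recurse into each field, extending the path with its name.
        pairs = []
        for key, child in node.items():
            pairs.extend(flat_names(child, path + (key,)))
        return pairs
    if isinstance(node, list):
        # Array: descend into the element type; the path is unchanged.
        return flat_names(node[0], path)
    # Leaf: the dotted path addresses the field, the alias is a flat name.
    return [(".".join(path), "_".join(path))]


pairs = flat_names(schema)
for dotted, alias in pairs:
    print(f"{dotted} -> {alias}")
```

Each alias embeds the full path, so the name1 under array1 and the name1 under array2 get different flat names. In PySpark, pairs like these could feed a select that aliases each nested field, but that step is not shown here because it depends on how the arrays should be expanded.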

0 answers
