我有一个数据框
+---------------------------------------------------------------+---+
|family_name |id |
+---------------------------------------------------------------+---+
|[[John, Doe, Married, 999-999-9999],[Jane, Doe, Married,Wife,]]|id1|
|[[Tom, Riddle, Single, 888-888-8888]] |id2|
+---------------------------------------------------------------+---+
root
|-- family_name: string (nullable = true)
|-- id: string (nullable = true)
我希望将列转换fam_name
为命名结构的数组
`family_name` array<struct<f_name:string,l_name:string,status:string,ph_no:string>>
我能够转换family_name
为数组,如下所示
val sch = ArrayType(ArrayType(StringType))
val fam_array = data
.withColumn("family_name_clean", regexp_replace($"family_name", "\\[\\[", "["))
.withColumn("family_name_clean_clean1", regexp_replace($"family_name_clean", "\\]\\]", "]"))
.withColumn("ar", toArray($"family_name_clean_clean1"))
//.withColumn("ar1", from_json($"ar", sch))
fam_array.show(false)
fam_array.printSchema()
+---------------------------------------------------------------+---+--------------------------------------------------------------+-------------------------------------------------------------+-----------------------------------------------------------------------+
|family_name |id |family_name_clean |family_name_clean_clean1 |ar |
+---------------------------------------------------------------+---+--------------------------------------------------------------+-------------------------------------------------------------+-----------------------------------------------------------------------+
|[[John, Doe, Married, 999-999-9999],[Jane, Doe, Married,Wife,]]|id1|[John, Doe, Married, 999-999-9999],[Jane, Doe, Married,Wife,]]|[John, Doe, Married, 999-999-9999],[Jane, Doe, Married,Wife,]|[[John, Doe, Married, 999-999-9999], [Jane, Doe, Married, Wife, ]]|
|[[Tom, Riddle, Single, 888-888-8888]] |id2|[Tom, Riddle, Single, 888-888-8888]] |[Tom, Riddle, Single, 888-888-8888] |[[Tom, Riddle, Single, 888-888-8888]] |
+---------------------------------------------------------------+---+--------------------------------------------------------------+-------------------------------------------------------------+-----------------------------------------------------------------------+
root
|-- family_name: string (nullable = true)
|-- id: string (nullable = true)
|-- family_name_clean: string (nullable = true)
|-- family_name_clean_clean1: string (nullable = true)
|-- ar: array (nullable = true)
| |-- element: string (containsNull = true)
sch
是所需类型的模式变量。
如何将列转换ar
为array<struct<>>
?
编辑:
我正在使用 Spark 2.3.2