Check the code below.
scala> df.show(false)
+-------------------+-----+
|hobbies            |name |
+-------------------+-----+
|[Books, Music, Gym]|sam  |
|[Books, Swimming]  |Steve|
|[Gym, Music]       |Alex |
+-------------------+-----+
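The answer assumes df already exists. To reproduce the DataFrame shown above, a minimal sketch could look like the following. The local master and the app name are assumptions for illustration only; inside spark-shell, `spark` and its implicits are already in scope, so only the `val df = ...` part is needed there.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimentation; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("hobbies-example") // hypothetical name
  .getOrCreate()
import spark.implicits._

// One row per person: an array of hobbies plus the person's name.
val df = Seq(
  (Seq("Books", "Music", "Gym"), "sam"),
  (Seq("Books", "Swimming"), "Steve"),
  (Seq("Gym", "Music"), "Alex")
).toDF("hobbies", "name")
```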
Explode hobbies, then use groupBy & collect_list twice:
- group by hobbies and collect the list of names
- group by names and collect the list of hobbies
scala> :paste
// Entering paste mode (ctrl-D to finish)
df
.withColumn("hobbies",explode($"hobbies"))
.groupBy($"hobbies").agg(collect_list($"name").as("names")) // For Hobbies List
.groupBy($"names").agg(collect_list($"hobbies").as("hobbies")) // For Name List (groups on the "names" column built in the previous step)
.select(collect_list(to_json(struct($"hobbies",$"names"))).as("data")) // Final Json Output
.show(false)
// Exiting paste mode, now interpreting.
+--------------------------------------------------------------------------------------------------------------------------------------------+
|data |
+--------------------------------------------------------------------------------------------------------------------------------------------+
|[{"hobbies":["Swimming"],"names":["Steve"]}, {"hobbies":["Books"],"names":["sam","Steve"]}, {"hobbies":["Music","Gym"],"names":["sam","Alex"]}]|
+--------------------------------------------------------------------------------------------------------------------------------------------+
Formatted output
[
{"hobbies": ["Swimming"], "names": ["Steve"]},
{"hobbies": ["Books"], "names": ["sam", "Steve"]},
{"hobbies": ["Music", "Gym"], "names": ["sam", "Alex"]}
]
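To see why the two groupings produce this result, the same steps can be traced with plain Scala collections, without Spark. This is only an illustrative sketch: the object name `HobbyGroups` and the method `group` are made up for this example, and the hobbies inside each group and the groups themselves are sorted here purely to make the result deterministic.

```scala
object HobbyGroups {
  // (hobbies, name) rows mirroring the DataFrame in the question
  val rows: Seq[(Seq[String], String)] = Seq(
    (Seq("Books", "Music", "Gym"), "sam"),
    (Seq("Books", "Swimming"), "Steve"),
    (Seq("Gym", "Music"), "Alex")
  )

  // Returns (hobbies, names) pairs; hobbies shared by the same list of people are merged.
  def group(rows: Seq[(Seq[String], String)]): Seq[(Seq[String], Seq[String])] = {
    // Step 1: "explode" into one (hobby, name) pair per array element
    val exploded = for ((hobbies, name) <- rows; hobby <- hobbies) yield (hobby, name)

    // Step 2: group by hobby and collect the names
    val namesByHobby: Map[String, Seq[String]] =
      exploded.groupBy(_._1).map { case (hobby, pairs) => hobby -> pairs.map(_._2) }

    // Step 3: group by the collected names list and collect the hobbies
    namesByHobby.toSeq
      .groupBy(_._2)
      .toSeq
      .map { case (names, pairs) => (pairs.map(_._1).sorted, names) }
      .sortBy(_._1.mkString(","))
  }

  def main(args: Array[String]): Unit =
    group(rows).foreach { case (hobbies, names) => println(s"$hobbies -> $names") }
}
```

Step 3 only merges hobbies whose names lists are equal element by element. In Spark, collect_list does not guarantee element order after a shuffle, so wrapping the first aggregation in sort_array (for example `sort_array(collect_list($"name"))`) may make the second groupBy more robust.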