I have a DataFrame:
val df = spark.sqlContext.createDataFrame(Seq(
  ("100", "3", "sceince", "A"),
  ("100", "3", "maths", "B"),
  ("100", "3", "maths", "F"),
  ("100", "3", "computesrs", "null"),
  ("101", "2", "maths", "E"),
  ("101", "2", "computesrs", "C"),
  ("102", "2", "maths", "null"),
  ("102", "2", "computesrs", "C"),
  ("100", "2", "maths", "D"),
  ("100", "2", "computesrs", "C")
)).toDF("Rid", "class", "subject", "teacher")
scala> df.show
+---+-----+----------+-------+
|Rid|class|   subject|teacher|
+---+-----+----------+-------+
|100|    3|   sceince|      A|
|100|    3|     maths|      B|
|100|    3|     maths|      F|
|100|    3|computesrs|   null|
|101|    2|     maths|      E|
|101|    2|computesrs|      C|
|102|    2|     maths|   null|
|102|    2|computesrs|      C|
|100|    2|     maths|      D|
|100|    2|computesrs|      C|
+---+-----+----------+-------+
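I have to pivot this DataFrame into a fixed number of columns (5 period/teacher pairs), grouped by Rid and class. The subject column may have n distinct values, but for each Rid and class combination the subject and teacher values have to be emitted as key-value pairs of columns.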
Expected DataFrame:
+---+-----+-------+--------------+----------+--------------+-------+--------------+----------+--------------+-------+--------------+
|Rid|class|period1|periodteacher1|period2   |periodteacher2|period3|periodteacher3|period4   |periodteacher4|period5|periodteacher5|
+---+-----+-------+--------------+----------+--------------+-------+--------------+----------+--------------+-------+--------------+
|100|3    |sceince|A             |maths     |B             |maths  |F             |computesrs|              |       |              |
|100|2    |maths  |D             |computesrs|C             |       |              |          |              |       |              |
|101|2    |maths  |E             |computesrs|C             |       |              |          |              |       |              |
|102|2    |maths  |              |computesrs|C             |       |              |          |              |       |              |
+---+-----+-------+--------------+----------+--------------+-------+--------------+----------+--------------+-------+--------------+
Any suggestions?
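
For reference, here is a minimal sketch of one direction that might work: number the rows inside each (Rid, class) group and pivot on that number instead of on subject. It assumes a spark-shell session in which df from above is already defined, and that the input row order is what defines the period order. The names maxPeriods, ord and slot are made up for illustration.

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, first, monotonically_increasing_id, row_number}

// Assumption: run in spark-shell after `df` has been created as shown above.
val maxPeriods = 5 // hypothetical name for the fixed number of period slots

// Assumption: input row order defines the period order, so tag each row with
// an increasing id before numbering rows inside each (Rid, class) group.
val byGroup = Window.partitionBy("Rid", "class").orderBy("ord")
val numbered = df
  .withColumn("ord", monotonically_increasing_id())
  .withColumn("slot", row_number().over(byGroup))

// Pivot on the slot number. Listing the values 1..maxPeriods up front fixes the
// output columns even when no group has that many rows; rows whose slot is
// larger than maxPeriods are dropped.
val pivoted = numbered
  .groupBy("Rid", "class")
  .pivot("slot", 1 to maxPeriods)
  .agg(first("subject").as("subject"), first("teacher").as("teacher"))

// With several aggregates, Spark names pivoted columns "<slot>_<alias>";
// rename them to the period/periodteacher layout.
val periodCols = (1 to maxPeriods).flatMap { i =>
  Seq(
    col(s"`${i}_subject`").as(s"period$i"),
    col(s"`${i}_teacher`").as(s"periodteacher$i")
  )
}
val allCols = Seq(col("Rid"), col("class")) ++ periodCols
val result = pivoted.select(allCols: _*)

result.show(false)
```

Unused slots come out as real nulls rather than blanks; result.na.fill("") might be one way to blank them. Note that the sample data stores the literal string "null" for a missing teacher, which this sketch leaves untouched.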