样本数据集:
$, Claw "OnCreativity" (2012) [Himself]
$, Homo Nykytaiteen museo (1986) [Himself] <25>
Suuri illusioni (1985) [Guests] <22>
$, Steve E.R. Sluts (2003) (V) <12>
$hort, Too 2012 AVN Awards Show (2012) (TV) [Himself - Musical Guest]
2012 AVN Red Carpet Show (2012) (TV) [Himself]
5th Annual VH1 Hip Hop Honors (2008) (TV) [Himself]
American Pimp (1999) [Too $hort]
我使用以下代码创建了一个键值对 RDD:
To split data: val actorTuple = actor.map(l => l.split("\t"))
To make KV pair: val actorKV = actorTuple.map(l => (l(0), l(l.length-1))).filter{case(x,y) => y != "" }
控制台上的键值 RDD 输出:
Array(($, Claw,"OnCreativity" (2012) [Himself]), ($, Homo,Nykytaiteen museo (1986) [Himself] <25>), ("",Suuri illusioni (1985) [Guests] <22>), ($, Steve,E.R. Sluts (2003) (V) <12>).......
但是,由于数据集的性质,很多行都有这个“”作为键,即空白(参见上面的 RDD 输出),所以,如果它是,我想要一个函数将前一行的演员复制到这一行空的。如何做到这一点。