这是一个用 scala 编写的 spark 流程序。它每 1 秒计算来自套接字的单词数。结果将是字数,例如,从时间 0 到 1 的字数,然后从时间 1 到 2 的字数。但我想知道是否有某种方法可以改变这个程序,以便我们可以累积字数?即从时间 0 到现在的字数。
val sparkConf = new SparkConf().setAppName("NetworkWordCount")
val ssc = new StreamingContext(sparkConf, Seconds(1))
// Create a socket stream on target ip:port and count the
// words in input stream of \n delimited text (eg. generated by 'nc')
// Note that no duplication in storage level only for running locally.
// Replication necessary in distributed scenario for fault tolerance.
val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
ssc.start()
ssc.awaitTermination()