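I am trying to pull Twitter data using the REST API in Zeppelin. I tried both registerAsTable and registerTempTable, and neither works. Please help me resolve the error. Running the Zeppelin tutorial code produces the following error:
error: value registerAsTable is not a member of org.apache.spark.rdd.RDD[Tweet] ).foreachRDD(rdd=> rdd.registerAsTable("tweets")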
In the Zeppelin interpreter settings, add the external dependency org.apache.bahir:spark-streaming-twitter_2.11:2.0.0 from the GUI, then run the following with spark-2.0.1:
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.{ SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel
import scala.io.Source
//import org.apache.spark.Logging
import java.io.File
import org.apache.log4j.Logger
import org.apache.log4j.Level
import sys.process.stringSeqToProcess
import scala.collection.mutable.HashMap
/** Configures the Oauth Credentials for accessing Twitter */
def configureTwitterCredentials(apiKey: String, apiSecret: String, accessToken: String, accessTokenSecret: String) {
val configs = new HashMap[String, String] ++= Seq(
"apiKey" -> apiKey, "apiSecret" -> apiSecret, "accessToken" -> accessToken, "accessTokenSecret" -> accessTokenSecret)
println("Configuring Twitter OAuth")
configs.foreach{ case(key, value) =>
if (value.trim.isEmpty) {
throw new Exception("Error setting authentication - value for " + key + " not set")
}
val fullKey = "twitter4j.oauth." + key.replace("api", "consumer")
System.setProperty(fullKey, value.trim)
println("\tProperty " + fullKey + " set as [" + value.trim + "]")
}
println()
}
// Configure Twitter credentials. The values below are placeholders for illustration only and will not work.
val apiKey = "7AVLnhssAqumpgY6JtMa59w6Tr"
val apiSecret = "kRLstZgz0BYazK6nqfMkPvtJas7LEqF6IlCp9YB1m3pIvvxrRZl"
val accessToken = "79438845v6038203392-CH8jDX7iUSj9xmQRLpHqLzgvlLHLSdQ"
val accessTokenSecret = "OXUpYu5YZrlHnjSacnGJMFkgiZgi4KwZsMzTwA0ALui365"
configureTwitterCredentials(apiKey, apiSecret, accessToken, accessTokenSecret)
import org.apache.spark.{ SparkConf, SparkContext}
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.SparkContext._
val ssc = new StreamingContext(sc, Seconds(2))
val tweets = TwitterUtils.createStream(ssc, None)
val twt = tweets.window(Seconds(10))
//twt.print
val sqlContext= new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
case class Tweet(createdAt:Long, text:String)
val tweet = twt.map(status=>
Tweet(status.getCreatedAt().getTime()/1000, status.getText())
)
tweet.foreachRDD(rdd=>rdd.toDF.registerTempTable("tweets"))
ssc.start()
//ssc.stop()
After that, run queries against the table from another Zeppelin paragraph:
%sql select createdAt, text from tweets limit 50
val data = sc.textFile("/FileStore/tables/uy43p2971496606385819/testweet.json")
// Convert the RDD to a DataFrame
val inputs = data.toDF()
inputs.createOrReplaceTempView("tweets")
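An RDD cannot be registered as a table, but a DataFrame can. Convert the RDD to a DataFrame, then register the resulting DataFrame as a temporary table or save it as a table.
You can convert an RDD to a DataFrame as follows: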
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
rdd.toDF()
See "How to convert rdd object to dataframe in spark" and http://spark.apache.org/docs/latest/sql-programming-guide.html
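For Spark 2.x, the same RDD-to-DataFrame conversion can probably be sketched with SparkSession instead of SQLContext. This is a minimal sketch rather than the tutorial's own code: the sample rows are made up for illustration, and it assumes a spark-shell or Zeppelin paragraph where a Spark 2.x session is available.

```scala
import org.apache.spark.sql.SparkSession

// Same shape as the Tweet case class used in the question.
case class Tweet(createdAt: Long, text: String)

// Reuses the existing session if one is already running (as in Zeppelin).
val spark = SparkSession.builder().getOrCreate()
import spark.implicits._ // brings toDF() into scope for RDDs of case classes

// Hypothetical sample data, for illustration only.
val rdd = spark.sparkContext.parallelize(Seq(Tweet(1L, "hello"), Tweet(2L, "world")))

// An RDD has no table-registration method; convert it to a DataFrame first.
val df = rdd.toDF()
df.createOrReplaceTempView("tweets")

spark.sql("select createdAt, text from tweets limit 50").show()
```

In the streaming case from the question, the same two calls (toDF followed by a view registration) would go inside foreachRDD.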