
I am trying to pull Twitter data through the REST API in Zeppelin. I tried two options, registerAsTable and registerTempTable, and neither of them works. Please help me resolve the error. I get the following error when running the Zeppelin tutorial code:

error: value registerAsTable is not a member of org.apache.spark.rdd.RDD[Tweet]
).foreachRDD(rdd=> rdd.registerAsTable("tweets")


3 Answers


In the Zeppelin interpreter settings, add the external dependency org.apache.bahir:spark-streaming-twitter_2.11:2.0.0 from the GUI, then run with spark-2.0.1:
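If you prefer not to go through the GUI, the same artifact can also be loaded from a %dep paragraph, run before the Spark interpreter starts in the note (a minimal sketch; the coordinates are the same Bahir package as above):

%dep
// must run before any Spark paragraph has started the interpreter
z.reset()
z.load("org.apache.bahir:spark-streaming-twitter_2.11:2.0.0")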

import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.{ SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel
import scala.io.Source
//import org.apache.spark.Logging
import java.io.File
import org.apache.log4j.Logger
import org.apache.log4j.Level
import sys.process.stringSeqToProcess

import scala.collection.mutable.HashMap
/** Configures the Oauth Credentials for accessing Twitter */
def configureTwitterCredentials(apiKey: String, apiSecret: String, accessToken: String, accessTokenSecret: String) {
  val configs = new HashMap[String, String] ++= Seq(
    "apiKey" -> apiKey, "apiSecret" -> apiSecret, "accessToken" -> accessToken, "accessTokenSecret" -> accessTokenSecret)
  println("Configuring Twitter OAuth")
  configs.foreach{ case(key, value) =>
    if (value.trim.isEmpty) {
      throw new Exception("Error setting authentication - value for " + key + " not set")
    }
    val fullKey = "twitter4j.oauth." + key.replace("api", "consumer")
    System.setProperty(fullKey, value.trim)
    println("\tProperty " + fullKey + " set as [" + value.trim + "]")
  }
  println()
}


// Configure Twitter credentials; the values below are placeholders and will not work
val apiKey = "7AVLnhssAqumpgY6JtMa59w6Tr"
val apiSecret = "kRLstZgz0BYazK6nqfMkPvtJas7LEqF6IlCp9YB1m3pIvvxrRZl"
val accessToken = "79438845v6038203392-CH8jDX7iUSj9xmQRLpHqLzgvlLHLSdQ"
val accessTokenSecret = "OXUpYu5YZrlHnjSacnGJMFkgiZgi4KwZsMzTwA0ALui365"
configureTwitterCredentials(apiKey, apiSecret, accessToken, accessTokenSecret)

import org.apache.spark.{ SparkConf, SparkContext}
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.SparkContext._

val ssc = new StreamingContext(sc, Seconds(2))

val tweets = TwitterUtils.createStream(ssc, None)
val twt = tweets.window(Seconds(10))

//twt.print


val sqlContext= new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

case class Tweet(createdAt:Long, text:String)

val tweet = twt.map(status=>
  Tweet(status.getCreatedAt().getTime()/1000, status.getText())
)


tweet.foreachRDD(rdd=>rdd.toDF.registerTempTable("tweets"))
ssc.start()
//ssc.stop()

Afterwards, run some queries against the table in another Zeppelin cell:

%sql select createdAt, text  from tweets   limit 50
Answered 2016-11-05T05:28:39.163
val data = sc.textFile("/FileStore/tables/uy43p2971496606385819/testweet.json");

// Convert the RDD to a DataFrame (toDF needs the implicit conversions from the Spark 2.x session)
import spark.implicits._

val inputs = data.toDF();
inputs.createOrReplaceTempView("tweets");
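To check the view that was just registered, you can query it from the same notebook, for example (a minimal sketch, assuming the Spark 2.x SparkSession is available as spark, as it is in Zeppelin and Databricks notebooks):

// Query the temp view created above
val sample = spark.sql("SELECT * FROM tweets LIMIT 10")
sample.show()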
Answered 2017-06-04T20:25:50.590

An RDD cannot be registered as a table, but a DataFrame can. Convert the RDD to a DataFrame first, then register the resulting DataFrame as a temp table or write it out as a table.

You can convert an RDD to a DataFrame as follows:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
rdd.toDF()
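A slightly fuller sketch of the same idea, using a hypothetical Person case class and sample data, that also registers the resulting DataFrame so it can be queried with SQL:

import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int)

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Convert an RDD of case-class instances to a DataFrame
val peopleDF = sc.parallelize(Seq(Person("Ann", 30), Person("Bob", 25))).toDF()

// Register it as a temp table and query it with SQL
peopleDF.registerTempTable("people")
sqlContext.sql("SELECT name FROM people WHERE age > 26").show()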

See How to convert rdd object to dataframe in spark and http://spark.apache.org/docs/latest/sql-programming-guide.html

Answered 2016-02-11T09:04:21.470