
From my investigation of Spark SQL, I understand that more than 2 tables cannot be joined directly; we have to use subqueries to make it work. So I am using subqueries and am able to join 3 tables:

with the following query:

"选择姓名、年龄、性别、dpi.msisdn、subscriptionType、maritalStatus、isHighARPU、ipAddress、startTime、endTime、isRoaming、dpi.totalCount、dpi.website FROM (SELECT subsc.name、subsc.age、subsc.gender、subsc. msisdn、subsc.subscriptionType、subsc.maritalStatus、subsc.isHighARPU、cdr.ipAddress、cdr.startTime、cdr.endTime、cdr.isRoaming FROM SUBSCRIBER_META subsc、CDR_FACT cdr WHERE subsc.msisdn = cdr.msisdn AND cdr.isRoaming = 'Y ') temp, DPI_FACT dpi WHERE temp.msisdn = dpi.msisdn";

But when I tried to join 4 tables following the same pattern, it threw the exception below:

java.lang.RuntimeException: [1.517] failure: identifier expected

Query joining 4 tables:

SELECT name, dueAmount FROM (SELECT name, age, gender, dpi.msisdn, subscriptionType, maritalStatus, isHighARPU, ipAddress, startTime, endTime, isRoaming, dpi.totalCount, dpi.website FROM (SELECT subsc.name, subsc.age, subsc.gender, subsc.msisdn, subsc.subscriptionType, subsc.maritalStatus, subsc.isHighARPU, cdr.ipAddress, cdr.startTime, cdr.endTime, cdr.isRoaming FROM SUBSCRIBER_META subsc, CDR_FACT cdr WHERE subsc.msisdn = cdr.msisdn AND cdr.isRoaming = 'Y') temp, DPI_FACT dpi WHERE temp.msisdn = dpi.msisdn) inner, BILLING_META billing WHERE inner.msisdn = billing.msisdn

Can anyone help me get this query working?

Thanks in advance. The full error is below:

09/02/2015 02:55:24 [ERROR] org.apache.spark.Logging$class: Error running job streaming job 1423479307000 ms.0
java.lang.RuntimeException: [1.517] failure: identifier expected

 SELECT name, dueAmount FROM (SELECT name, age, gender, dpi.msisdn, subscriptionType, maritalStatus, isHighARPU, ipAddress, startTime, endTime, isRoaming, dpi.totalCount, dpi.website FROM (SELECT subsc.name, subsc.age, subsc.gender, subsc.msisdn, subsc.subscriptionType, subsc.maritalStatus, subsc.isHighARPU, cdr.ipAddress, cdr.startTime, cdr.endTime, cdr.isRoaming FROM SUBSCRIBER_META subsc, CDR_FACT cdr WHERE subsc.msisdn = cdr.msisdn AND cdr.isRoaming = 'Y') temp, DPI_FACT dpi WHERE temp.msisdn = dpi.msisdn) inner, BILLING_META billing where inner.msisdn = billing.msisdn
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    ^
        at scala.sys.package$.error(package.scala:27)
        at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:60)
        at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:73)
        at org.apache.spark.sql.api.java.JavaSQLContext.sql(JavaSQLContext.scala:49)
        at com.hp.tbda.rta.examples.JdbcRDDStreaming5$7.call(JdbcRDDStreaming5.java:596)
        at com.hp.tbda.rta.examples.JdbcRDDStreaming5$7.call(JdbcRDDStreaming5.java:546)
        at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:274)
        at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:274)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:527)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:527)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:41)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
        at scala.util.Try$.apply(Try.scala:161)
        at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:172)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

1 Answer


The exception occurs because you used "inner", a reserved keyword in Spark SQL, as a subquery alias. Avoid using reserved keywords as custom identifiers in Spark SQL.
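
For example, renaming the offending alias should let the parser accept your query. Here is your 4-table query with the alias changed from inner to joined (the name is arbitrary; any non-reserved identifier works):

SELECT name, dueAmount FROM (SELECT name, age, gender, dpi.msisdn, subscriptionType, maritalStatus, isHighARPU, ipAddress, startTime, endTime, isRoaming, dpi.totalCount, dpi.website FROM (SELECT subsc.name, subsc.age, subsc.gender, subsc.msisdn, subsc.subscriptionType, subsc.maritalStatus, subsc.isHighARPU, cdr.ipAddress, cdr.startTime, cdr.endTime, cdr.isRoaming FROM SUBSCRIBER_META subsc, CDR_FACT cdr WHERE subsc.msisdn = cdr.msisdn AND cdr.isRoaming = 'Y') temp, DPI_FACT dpi WHERE temp.msisdn = dpi.msisdn) joined, BILLING_META billing WHERE joined.msisdn = billing.msisdn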

answered 2015-03-18T15:32:31.853