1

我在亚马逊的 EMR 上使用 ipython notebook,带有 toree 内核。我想从我的 Hive 表中读取一些数据。

query_cmd = ("select  uuid, collect_list(distinct newsid) as news from \ 
        sog_l1screen.v1_test_dw_l1_display_orc_dt where dt>='20170709' and dt<'20170810'\
        and system_setting_area='BR' and action=2 group by uuid")
original_df = spark.sql(query_cmd)

它告诉我

Name: org.apache.toree.interpreter.broker.BrokerException
Message: Traceback (most recent call last):
  File "/tmp/kernel-PySpark-d8cc3c23-661f-478e-9cd8-35f54481581c/pyspark_runner.py", line 189, in <module>
    eval(compiled_code)
  File "<string>", line 3, in <module>
  File "/usr/lib/spark/python/pyspark/sql/session.py", line 541, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
AnalysisException: u"Table or view not found: `sog_l1screen`.`v1_test_dw_l1_display_orc_dt`; line 1 pos 57;\n'Aggregate ['uuid], ['uuid, 'collect_list('newsid) AS news#2]\n+- 'Filter ((('dt >= 20170709) && ('dt < 20170810)) && (('system_setting_area = BR) && ('action = 2)))\n   +- 'UnresolvedRelation `sog_l1screen`.`v1_test_dw_l1_display_orc_dt`\n"

StackTrace: org.apache.toree.interpreter.broker.BrokerState$$anonfun$markFailure$1.apply(BrokerState.scala:163)
org.apache.toree.interpreter.broker.BrokerState$$anonfun$markFailure$1.apply(BrokerState.scala:163)
scala.Option.foreach(Option.scala:257)
org.apache.toree.interpreter.broker.BrokerState.markFailure(BrokerState.scala:162)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:280)
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
py4j.commands.CallCommand.execute(CallCommand.java:79)
py4j.GatewayConnection.run(GatewayConnection.java:214)
java.lang.Thread.run(Thread.java:748)

sog_l1screen是我的数据库,v1_test_dw_l1_display_orc_dt是表。我确信它们存在于我的 Hive 中,我可以使用 Hive 触摸它们,或者将上面的代码写入一个.py文件,然后写入spark-submit这个文件。那么,如何使用 ipython notebook 从我的 Hive 表中读取数据?

4

0 回答 0