0

我正在尝试使用配置单元操作创建一个简单的工作流程。我正在使用 Cloudera 快速入门 VM (CDH 5.12)。以下是我的工作流程的组成部分:

1) top_n_products.hql

create table instacart.top_n as
(
select * from
(
select row_number() over (order by no_of_times_ordered desc)as num_rank, product_id, product_name, no_of_times_ordered 
from
(
select A.product_id, B.product_name, count(*) as no_of_times_ordered from 
instacart.order_products__train as A
left outer join
instacart.products as B
on A.product_id=B.product_id
group by A.product_id, B.product_name
)C
)D
where num_rank <= ${N}
);

2) hive-config.xml

我基本上已经将 /etc/hive/conf 中的默认 hive-site.xml 复制到我的工作流工作区文件夹中,并将其重命名为 hive-config.xml

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://127.0.0.1/metastore?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>cloudera</value>
  </property>

  <property>
    <name>hive.hwi.war.file</name>
    <value>/usr/lib/hive/lib/hive-hwi-0.8.1-cdh4.0.0.jar</value>
    <description>This is the WAR file with the jsp content for Hive Web Interface</description>
  </property>

  <property>
    <name>datanucleus.fixedDatastore</name>
    <value>true</value>
  </property>

  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>false</value>
  </property>

  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://127.0.0.1:9083</value>
    <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
  </property>
</configuration>

3) 工作流属性

在配置单元操作中,我设置了以下内容: - 将 HIVE XML、作业 XML 路径设置为我的 hive-config.xml - 还将 hive-config.xml 添加到文件 - 在工作流属性中,设置我的工作区的路径 - 定义我的查询中的参数 N

我的 Hive 操作属性的屏幕截图

当我尝试运行它失败的工作流时,stderr 抛出以下错误:

Log Type: stderr

            Log Upload Time: Mon Nov 20 19:49:04 -0800 2017

            Log Length: 2759
          SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/filecache/130/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Nov 20, 2017 7:47:34 PM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get
WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information.
Nov 20, 2017 7:47:35 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
.
.
.
.
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

下面是生成的 workflow.xml 和 job.properties:

1) 工作流 XML:

<workflow-app name="Top_N_Products" xmlns="uri:oozie:workflow:0.5">
  <global>
      <job-xml>hive-config.xml</job-xml>
  </global>
    <start to="hive-87ac"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="hive-87ac" cred="hcat">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
              <job-xml>hive-config.xml</job-xml>
            <script>top_n_products.hql</script>
              <param>N={N}</param>
            <file>hive-config.xml#hive-config.xml</file>
        </hive>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>

2)job.properties

security_enabled=False
send_email=False
dryrun=False
nameNode=hdfs://quickstart.cloudera:8020
jobTracker=localhost:8032
N=10

请注意,hive 查询通过 Hive 查询编辑器运行得非常好。配置工作流程时我是否遗漏了什么?任何帮助表示赞赏!

谢谢,德布

4

0 回答 0