
I have been trying to create a table containing a column from another table, but the Hive CLI consistently fails to do so.

Here is the query:

CREATE TABLE tweets_id_sample AS
SELECT
   id
FROM tweets_sample;

The CLI error that accompanies this Hive query is as follows:

Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201310250853_0023, Tracking URL = http://sandbox:50030/jobdetails.jsp?jobid=job_201310250853_0023
Kill Command = /usr/lib/hadoop/libexec/../bin/hadoop job  -kill job_201310250853_0023
Hadoop job information for Stage-1: number of mappers: 7; number of reducers: 0
2013-10-26 07:40:37,273 Stage-1 map = 0%,  reduce = 0%
2013-10-26 07:41:21,570 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201310250853_0023 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://sandbox:50030/jobdetails.jsp?jobid=job_201310250853_0023
Examining task ID: task_201310250853_0023_m_000008 (and more) from job job_201310250853_0023
Examining task ID: task_201310250853_0023_m_000000 (and more) from job job_201310250853_0023

Task with the most failures(4):
-----
Task ID:
  task_201310250853_0023_m_000000

URL:
  http://sandbox:50030/taskdetails.jsp?jobid=job_201310250853_0023&tipid=task_201310250853_0023_m_000000
-----
Diagnostic Messages for this Task:

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 7   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

After checking the job tracker, the task and all of its attempts (up until the job was killed) show the same error:

java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:425)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:365)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
    ... 22 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: org.openx.data.jsonserde.JsonSerDe
    at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:463)
    at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:479)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:90)
    ... 22 more
Caused by: java.lang.ClassNotFoundException: org.openx.data.jsonserde.JsonSerDe
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
    at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:422)
    ... 24 more

The same query above works in Hive Beeswax.

I have been able to run these kinds of queries in Hive Beeswax without trouble. The same query as above (with a different table name) works and produces the following log:

13/10/26 07:51:30 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
13/10/26 07:51:30 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
13/10/26 07:51:30 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
13/10/26 07:51:30 INFO ql.Driver: <PERFLOG method=Driver.run>
13/10/26 07:51:30 INFO ql.Driver: <PERFLOG method=TimeToSubmit>
13/10/26 07:51:30 INFO ql.Driver: <PERFLOG method=compile>
13/10/26 07:51:30 INFO parse.ParseDriver: Parsing command: use default
13/10/26 07:51:30 INFO parse.ParseDriver: Parse Completed
13/10/26 07:51:30 INFO ql.Driver: Semantic Analysis Completed
13/10/26 07:51:30 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
13/10/26 07:51:30 INFO ql.Driver: </PERFLOG method=compile start=1382799090878 end=1382799090880 duration=2>
13/10/26 07:51:30 INFO ql.Driver: <PERFLOG method=Driver.execute>
13/10/26 07:51:30 INFO ql.Driver: Starting command: use default
13/10/26 07:51:30 INFO ql.Driver: </PERFLOG method=TimeToSubmit start=1382799090878 end=1382799090880 duration=2>
13/10/26 07:51:30 INFO ql.Driver: </PERFLOG method=Driver.execute start=1382799090880 end=1382799090924 duration=44>
OK
13/10/26 07:51:30 INFO ql.Driver: OK
13/10/26 07:51:30 INFO ql.Driver: <PERFLOG method=releaseLocks>
13/10/26 07:51:30 INFO ql.Driver: </PERFLOG method=releaseLocks start=1382799090924 end=1382799090924 duration=0>
13/10/26 07:51:30 INFO ql.Driver: </PERFLOG method=Driver.run start=1382799090878 end=1382799090924 duration=46>
13/10/26 07:51:30 INFO ql.Driver: <PERFLOG method=compile>
13/10/26 07:51:30 INFO parse.ParseDriver: Parsing command: CREATE TABLE tweets_id_sample_ui AS
   SELECT
      id
FROM tweets_sample
13/10/26 07:51:30 INFO parse.ParseDriver: Parse Completed
13/10/26 07:51:30 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
13/10/26 07:51:30 INFO parse.SemanticAnalyzer: Creating table tweets_id_sample_ui position=13
13/10/26 07:51:30 INFO parse.SemanticAnalyzer: Completed phase 1 of Semantic Analysis
13/10/26 07:51:30 INFO parse.SemanticAnalyzer: Get metadata for source tables
13/10/26 07:51:31 INFO parse.SemanticAnalyzer: Get metadata for subqueries
13/10/26 07:51:31 INFO parse.SemanticAnalyzer: Get metadata for destination tables
13/10/26 07:51:31 INFO parse.SemanticAnalyzer: Completed getting MetaData in Semantic Analysis
13/10/26 07:51:31 INFO ppd.OpProcFactory: Processing for FS(286)
13/10/26 07:51:31 INFO ppd.OpProcFactory: Processing for SEL(285)
13/10/26 07:51:31 INFO ppd.OpProcFactory: Processing for TS(284)
13/10/26 07:51:31 INFO optimizer.GenMRFileSink1: using CombineHiveInputformat for the merge job
13/10/26 07:51:31 INFO physical.MetadataOnlyOptimizer: Looking for table scans where optimization is applicable
13/10/26 07:51:31 INFO physical.MetadataOnlyOptimizer: Found 0 metadata only table scans
13/10/26 07:51:31 INFO parse.SemanticAnalyzer: Completed plan generation
13/10/26 07:51:31 INFO ql.Driver: Semantic Analysis Completed
13/10/26 07:51:31 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:id, type:bigint, comment:null)], properties:null)
13/10/26 07:51:31 INFO ql.Driver: </PERFLOG method=compile start=1382799090924 end=1382799091259 duration=335>
13/10/26 07:51:31 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
13/10/26 07:51:31 INFO ql.Driver: <PERFLOG method=Driver.execute>
13/10/26 07:51:31 INFO ql.Driver: Starting command: CREATE TABLE tweets_id_sample_ui AS
   SELECT
      id
FROM tweets_sample
Total MapReduce jobs = 3
13/10/26 07:51:31 INFO ql.Driver: Total MapReduce jobs = 3
13/10/26 07:51:31 INFO ql.Driver: </PERFLOG method=TimeToSubmit end=1382799091337>
Launching Job 1 out of 3
13/10/26 07:51:31 INFO ql.Driver: Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
13/10/26 07:51:31 INFO exec.Task: Number of reduce tasks is set to 0 since there's no reduce operator
13/10/26 07:51:31 INFO exec.ExecDriver: Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
13/10/26 07:51:31 INFO exec.ExecDriver: adding libjars: file:///usr/lib//hcatalog/share/hcatalog/hcatalog-core.jar,file:///usr/lib/hive/lib/json-serde-1.1.4-jar-with-dependencies.jar
13/10/26 07:51:31 INFO exec.ExecDriver: Processing alias tweets_sample
13/10/26 07:51:31 INFO exec.ExecDriver: Adding input file hdfs://sandbox:8020/data/oct25_tweets
13/10/26 07:51:31 INFO exec.Utilities: Content Summary not cached for hdfs://sandbox:8020/data/oct25_tweets
13/10/26 07:51:35 INFO exec.ExecDriver: Making Temp Directory: hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/-ext-10002
13/10/26 07:51:35 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/10/26 07:51:35 INFO io.CombineHiveInputFormat: CombineHiveInputSplit creating pool for hdfs://sandbox:8020/data/oct25_tweets; using filter path hdfs://sandbox:8020/data/oct25_tweets
13/10/26 07:51:35 INFO mapred.FileInputFormat: Total input paths to process : 964
13/10/26 07:51:39 INFO io.CombineHiveInputFormat: number of splits 7
Starting Job = job_201310250853_0024, Tracking URL = http://sandbox:50030/jobdetails.jsp?jobid=job_201310250853_0024
13/10/26 07:51:39 INFO exec.Task: Starting Job = job_201310250853_0024, Tracking URL = http://sandbox:50030/jobdetails.jsp?jobid=job_201310250853_0024
Kill Command = /usr/lib/hadoop/libexec/../bin/hadoop job  -kill job_201310250853_0024
13/10/26 07:51:39 INFO exec.Task: Kill Command = /usr/lib/hadoop/libexec/../bin/hadoop job  -kill job_201310250853_0024
Hadoop job information for Stage-1: number of mappers: 7; number of reducers: 0
13/10/26 07:51:48 INFO exec.Task: Hadoop job information for Stage-1: number of mappers: 7; number of reducers: 0
2013-10-26 07:51:48,788 Stage-1 map = 0%,  reduce = 0%
13/10/26 07:51:48 INFO exec.Task: 2013-10-26 07:51:48,788 Stage-1 map = 0%,  reduce = 0%
2013-10-26 07:52:00,853 Stage-1 map = 1%,  reduce = 0%
13/10/26 07:52:00 INFO exec.Task: 2013-10-26 07:52:00,853 Stage-1 map = 1%,  reduce = 0%
2013-10-26 07:52:02,037 Stage-1 map = 2%,  reduce = 0%
13/10/26 07:52:02 INFO exec.Task: 2013-10-26 07:52:02,037 Stage-1 map = 2%,  reduce = 0%
2013-10-26 07:52:04,048 Stage-1 map = 3%,  reduce = 0%
13/10/26 07:52:04 INFO exec.Task: 2013-10-26 07:52:04,048 Stage-1 map = 3%,  reduce = 0%
...
2013-10-26 07:54:30,400 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 141.58 sec
13/10/26 07:54:30 INFO exec.Task: 2013-10-26 07:54:30,400 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 141.58 sec
MapReduce Total cumulative CPU time: 2 minutes 21 seconds 580 msec
13/10/26 07:54:30 INFO exec.Task: MapReduce Total cumulative CPU time: 2 minutes 21 seconds 580 msec
Ended Job = job_201310250853_0024
13/10/26 07:54:30 INFO exec.Task: Ended Job = job_201310250853_0024
13/10/26 07:54:30 INFO exec.FileSinkOperator: Moving tmp dir: hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/_tmp.-ext-10002 to: hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/_tmp.-ext-10002.intermediate
13/10/26 07:54:30 INFO exec.FileSinkOperator: Moving tmp dir: hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/_tmp.-ext-10002.intermediate to: hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/-ext-10002
Stage-4 is filtered out by condition resolver.
13/10/26 07:54:30 INFO exec.Task: Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
13/10/26 07:54:30 INFO exec.Task: Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
13/10/26 07:54:30 INFO exec.Task: Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
13/10/26 07:54:30 INFO ql.Driver: Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
13/10/26 07:54:30 INFO exec.Task: Number of reduce tasks is set to 0 since there's no reduce operator
13/10/26 07:54:30 INFO exec.ExecDriver: Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
13/10/26 07:54:30 INFO exec.ExecDriver: adding libjars: file:///usr/lib//hcatalog/share/hcatalog/hcatalog-core.jar,file:///usr/lib/hive/lib/json-serde-1.1.4-jar-with-dependencies.jar
13/10/26 07:54:30 INFO exec.ExecDriver: Processing alias hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/-ext-10002
13/10/26 07:54:30 INFO exec.ExecDriver: Adding input file hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/-ext-10002
13/10/26 07:54:30 INFO exec.Utilities: Content Summary not cached for hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/-ext-10002
13/10/26 07:54:30 INFO exec.ExecDriver: Making Temp Directory: hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/-ext-10001
13/10/26 07:54:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/10/26 07:54:30 INFO mapred.FileInputFormat: Total input paths to process : 7
13/10/26 07:54:30 INFO io.CombineHiveInputFormat: number of splits 1
Starting Job = job_201310250853_0025, Tracking URL = http://sandbox:50030/jobdetails.jsp?jobid=job_201310250853_0025
13/10/26 07:54:31 INFO exec.Task: Starting Job = job_201310250853_0025, Tracking URL = http://sandbox:50030/jobdetails.jsp?jobid=job_201310250853_0025
Kill Command = /usr/lib/hadoop/libexec/../bin/hadoop job  -kill job_201310250853_0025
13/10/26 07:54:31 INFO exec.Task: Kill Command = /usr/lib/hadoop/libexec/../bin/hadoop job  -kill job_201310250853_0025
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
13/10/26 07:54:39 INFO exec.Task: Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2013-10-26 07:54:39,392 Stage-3 map = 0%,  reduce = 0%
13/10/26 07:54:39 INFO exec.Task: 2013-10-26 07:54:39,392 Stage-3 map = 0%,  reduce = 0%
2013-10-26 07:54:48,505 Stage-3 map = 87%,  reduce = 0%
13/10/26 07:54:48 INFO exec.Task: 2013-10-26 07:54:48,505 Stage-3 map = 87%,  reduce = 0%
2013-10-26 07:54:49,510 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 6.95 sec
13/10/26 07:54:49 INFO exec.Task: 2013-10-26 07:54:49,510 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 6.95 sec
2013-10-26 07:54:50,517 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 6.95 sec
13/10/26 07:54:50 INFO exec.Task: 2013-10-26 07:54:50,517 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 6.95 sec
2013-10-26 07:54:51,525 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
13/10/26 07:54:51 INFO exec.Task: 2013-10-26 07:54:51,525 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
MapReduce Total cumulative CPU time: 6 seconds 950 msec
13/10/26 07:54:51 INFO exec.Task: MapReduce Total cumulative CPU time: 6 seconds 950 msec
Ended Job = job_201310250853_0025
13/10/26 07:54:51 INFO exec.Task: Ended Job = job_201310250853_0025
13/10/26 07:54:51 INFO exec.FileSinkOperator: Moving tmp dir: hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/_tmp.-ext-10001 to: hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/_tmp.-ext-10001.intermediate
13/10/26 07:54:51 INFO exec.FileSinkOperator: Moving tmp dir: hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/_tmp.-ext-10001.intermediate to: hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/-ext-10001
Moving data to: hdfs://sandbox:8020/apps/hive/warehouse/tweets_id_sample_ui
13/10/26 07:54:51 INFO exec.Task: Moving data to: hdfs://sandbox:8020/apps/hive/warehouse/tweets_id_sample_ui from hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/-ext-10001
13/10/26 07:54:51 INFO exec.DDLTask: Default to LazySimpleSerDe for table tweets_id_sample_ui
13/10/26 07:54:51 INFO hive.metastore: Trying to connect to metastore with URI thrift://sandbox:9083
13/10/26 07:54:51 INFO hive.metastore: Waiting 1 seconds before next connection attempt.
13/10/26 07:54:52 INFO hive.metastore: Connected to metastore.
13/10/26 07:54:53 INFO exec.StatsTask: Executing stats task
Table default.tweets_id_sample_ui stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 10972500, raw_data_size: 0]
13/10/26 07:54:54 INFO exec.Task: Table default.tweets_id_sample_ui stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 10972500, raw_data_size: 0]
13/10/26 07:54:54 INFO ql.Driver: </PERFLOG method=Driver.execute start=1382799091328 end=1382799294689 duration=203361>
MapReduce Jobs Launched: 
13/10/26 07:54:54 INFO ql.Driver: MapReduce Jobs Launched: 
Job 0: Map: 7   Cumulative CPU: 141.58 sec   HDFS Read: 1762842930 HDFS Write: 10972500 SUCCESS
13/10/26 07:54:54 INFO ql.Driver: Job 0: Map: 7   Cumulative CPU: 141.58 sec   HDFS Read: 1762842930 HDFS Write: 10972500 SUCCESS
Job 1: Map: 1   Cumulative CPU: 6.95 sec   HDFS Read: 10973519 HDFS Write: 10972500 SUCCESS
13/10/26 07:54:54 INFO ql.Driver: Job 1: Map: 1   Cumulative CPU: 6.95 sec   HDFS Read: 10973519 HDFS Write: 10972500 SUCCESS
Total MapReduce CPU Time Spent: 2 minutes 28 seconds 530 msec
13/10/26 07:54:54 INFO ql.Driver: Total MapReduce CPU Time Spent: 2 minutes 28 seconds 530 msec
OK
13/10/26 07:54:54 INFO ql.Driver: OK
13/10/26 07:54:56 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
13/10/26 07:54:56 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.

Here is what does work with my Hive CLI:

  • If a view is created instead of a table, the above query works (a sketch follows this list).
  • Empty tables can be created.
  • Tables can be created from HDFS files (e.g., the tweets_sample table from the first code block was created from an HDFS file).
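
As a concrete illustration of the first bullet, a minimal sketch of the view variant that does succeed (the view name tweets_id_sample_v is my own placeholder, not a name from the original session):

CREATE VIEW tweets_id_sample_v AS
SELECT
   id
FROM tweets_sample;

Presumably this works because CREATE VIEW is a metastore-only operation: no MapReduce job runs, so the SerDe class is never loaded, and the error would only surface later when the view itself is queried.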

Here is the query, executed via the Hive CLI, that created tweets_sample:

CREATE EXTERNAL TABLE tweets_sample (
   id BIGINT,
   created_at STRING,
   source STRING,
   favorited BOOLEAN,
   retweet_count INT,
   retweeted_status STRUCT<
      text:STRING,
      user:STRUCT<screen_name:STRING,name:STRING>>,
   entities STRUCT<
      urls:ARRAY<STRUCT<expanded_url:STRING>>,
      user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
      hashtags:ARRAY<STRUCT<text:STRING>>>,
   text STRING,
   user STRUCT<
      screen_name:STRING,
      name:STRING,
      friends_count:INT,
      followers_count:INT,
      statuses_count:INT,
      verified:BOOLEAN,
      utc_offset:STRING, -- was INT but nulls are strings
      time_zone:STRING>,
   in_reply_to_screen_name STRING,
   year INT,
   month INT,
   day INT,
   hour INT
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/data/oct25_tweets'
;

At this point, I am stuck on how to resolve this.

Other notes:

The environment I am working in is as follows:

  • Hortonworks Sandbox v1.3 on Oracle VM VirtualBox
  • I am working through Hortonworks Tutorial #13
  • The Hive Beeswax queries were executed from the Hue UI as the user "hue"
  • The Hive CLI queries were executed as the user "root" (also tested as the user "hue")

1 Answer


Solution:

This can be resolved by having Hive add the jar to its classpath from within the Hive CLI, like so:

hive> ADD JAR [path to JSON SerDe jar file];

For example:

hive> ADD JAR /usr/lib/hive/lib/json-serde-1.1.4-jar-with-dependencies.jar;

Hive confirms the addition by returning the following:

Added /usr/lib/hive/lib/json-serde-1.1.4-jar-with-dependencies.jar to class path
Added resource: /usr/lib/hive/lib/json-serde-1.1.4-jar-with-dependencies.jar

The above must be executed at the start of every Hive session.
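
If re-running ADD JAR by hand every session becomes tedious, one option is to put the statement in a ~/.hiverc file, which the Hive CLI executes automatically at startup. A minimal sketch, assuming the same jar path as above:

-- contents of ~/.hiverc, run automatically when the Hive CLI starts
ADD JAR /usr/lib/hive/lib/json-serde-1.1.4-jar-with-dependencies.jar;

Alternatively, the hive.aux.jars.path property in hive-site.xml can point at the jar so it is shipped with every job; verify either approach against the documentation for your Hive version.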

Explanation:

The query posed in the original question produces the error because of its SELECT ... FROM clause. Submitting the following query to the Hive CLI on its own hits the same error:

SELECT
   id
FROM tweets_sample;

The rows of the source table tweets_sample are stored as JSON and read through the JSON SerDe. This can be seen from the query, shown at the end of the question, that created tweets_sample:

CREATE EXTERNAL TABLE tweets_sample (
   id BIGINT,
   ...
   hour INT
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/data/oct25_tweets';

By default, Hive does not know how to parse rows in this format or extract columns from them. Note that even before the JSON SerDe jar file is added, the following query works fine:

SELECT *
FROM tweets_sample;

This query works because Hive does not need to extract elements from specific columns within each row, and so it does not need to know the row format.
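
To make the contrast concrete, the two behaviors side by side in a fresh CLI session, before any ADD JAR (the comments are mine, summarizing the explanation above):

hive> SELECT * FROM tweets_sample;
-- works: whole rows are passed through without per-column extraction
hive> SELECT id FROM tweets_sample;
-- fails: extracting the id column requires deserializing each row with the JSON SerDe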

By registering the JSON SerDe jar file before executing any query that touches this JSON SerDe-formatted data, as shown in the solution above, Hive knows how to execute such queries.
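
Putting it together, the complete working sequence in a single Hive CLI session (same jar path and query as above; the "    >" lines are the CLI's continuation prompt):

hive> ADD JAR /usr/lib/hive/lib/json-serde-1.1.4-jar-with-dependencies.jar;
hive> CREATE TABLE tweets_id_sample AS
    > SELECT
    >    id
    > FROM tweets_sample;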

Answered 2013-11-04T08:01:58.427