I am new to Avro and Hive, and while learning them I got confused about the use of

tblproperties('avro.schema.url'='somewhereinHDFS/categories.avsc').

If I run a create command like this:

create table categories (id Int , dep_Id Int , name String) 
stored as avrofile  
tblproperties('avro.schema.url'=
'hdfs://quickstart.cloudera/user/cloudera/data/retail_avro_avsc/categories.avsc')

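Why do I have to write id Int, dep_Id Int in the command above, even though I supply an avsc file that contains the full schema? When I omit the column list: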

create table categories stored as avrofile
tblproperties('avro/schema.url'=
'hdfs://quickstart.cloudera/user/cloudera/data/retail_avro_avsc/categories.avsc')

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException 
Encountered AvroSerdeException determining schema. 
Returning signal schema to indicate problem: 
Neither avro.schema.literal nor avro.schema.url specified, 
can't determine table schema)

Why does Hive need the schema to be specified even though the avsc file exists and already contains the schema?

2 Answers

Could you try doing it this way?

CREATE TABLE categories
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
    'avro.schema.url'='http://schema.avsc');

More information is available here: https://cwiki.apache.org/confluence/display/Hive/AvroSerDe
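
For comparison, here is a minimal sketch of the question's second command with the property key written as `avro.schema.url` (a dot, not a slash). The failing command spells the key `avro/schema.url`, which would explain why the error reports that neither `avro.schema.literal` nor `avro.schema.url` was specified. This assumes Hive 0.14 or later, where `STORED AS AVRO` is available, and that the schema file exists at the path given in the question.

```sql
-- Sketch only: the HDFS path is the one from the question and is assumed to exist.
-- No column list is given; the AvroSerDe is expected to derive the columns
-- from the .avsc file referenced by avro.schema.url.
CREATE TABLE categories
STORED AS AVRO
TBLPROPERTIES (
  'avro.schema.url'='hdfs://quickstart.cloudera/user/cloudera/data/retail_avro_avsc/categories.avsc');

-- Inspect the columns Hive derived from the schema file.
DESCRIBE categories;
```

If the table is created this way, the columns shown by `DESCRIBE` should come from the fields declared in `categories.avsc`, not from the DDL.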

Answered 2016-09-06T08:35:39.823

Create the external Hive table orders_sqoop from the given Avro schema file and Avro data files:

 hive> create external table if not exists orders_sqoop
        stored as avro
        location '/user/hive/warehouse/retail_stage.db/orders'
        tblproperties('avro.schema.url'='/user/hive/warehouse/retail_stage.db/orders_schema/orders.avsc');

The create table command above runs successfully and creates the orders_sqoop table.

Verify the table structure as follows:

hive> show create table orders_sqoop;
OK
CREATE EXTERNAL TABLE `orders_sqoop`(
  `order_id` int COMMENT '', 
  `order_date` bigint COMMENT '', 
  `order_customer_id` int COMMENT '', 
  `order_status` string COMMENT '')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION
  'hdfs://quickstart.cloudera:8020/user/hive/warehouse/retail_stage.db/orders'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='false', 
  'avro.schema.url'='/user/hive/warehouse/retail_stage.db/orders_schema/orders.avsc', 
  'numFiles'='2', 
  'numRows'='-1', 
  'rawDataSize'='-1', 
  'totalSize'='660906', 
  'transient_lastDdlTime'='1563093902')
Time taken: 0.125 seconds, Fetched: 21 row(s)

The table was created as expected.
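
As a further check, something like the following should list the derived columns and return a few rows. This is a sketch that assumes the Avro data files under the table location match the schema in orders.avsc; the column names are taken from the `show create table` output above.

```sql
-- List the columns Hive derived from orders.avsc.
DESCRIBE orders_sqoop;

-- Read a few rows to confirm the data files can be deserialized with that schema.
SELECT order_id, order_date, order_customer_id, order_status
FROM orders_sqoop
LIMIT 5;
```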

Answered 2019-07-14T09:06:09.093