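I have a Spark SQL 2.1.1 job running on a YARN cluster in cluster mode, in which I want to create an empty external Hive table (partitions with locations will be added in a later step).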

CREATE EXTERNAL TABLE IF NOT EXISTS new_table (id BIGINT, StartTime TIMESTAMP, EndTime TIMESTAMP) PARTITIONED BY (year INT, month INT, day INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

When I run the job, I get the error:

CREATE EXTERNAL TABLE must be accompanied by LOCATION

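But when I run the same query in the Hive Editor in Hue, it runs just fine. I tried to find an answer in the Spark SQL 2.1.1 documentation but came up empty.

Does anyone know why Spark SQL is stricter about this query?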


TL;DR EXTERNAL with no LOCATION is not allowed.
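A minimal sketch of the statement in a form that satisfies this rule: the question's DDL with a table-level LOCATION clause appended. The directory `/path/to/new_table` is a placeholder, and `ExternalTableDdl` and `withLocation` are hypothetical helper names, not part of Spark. The resulting string would then be passed to `spark.sql(...)` in a Hive-enabled SparkSession; partition-specific locations can still be attached later (for example with `ALTER TABLE ... ADD PARTITION ... LOCATION`).

```scala
object ExternalTableDdl {
  // The statement from the question, unchanged ('\\t' yields a literal \t in the SQL text).
  val baseDdl: String =
    "CREATE EXTERNAL TABLE IF NOT EXISTS new_table " +
      "(id BIGINT, StartTime TIMESTAMP, EndTime TIMESTAMP) " +
      "PARTITIONED BY (year INT, month INT, day INT) " +
      "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'"

  // Hypothetical helper: appends the LOCATION clause that Spark SQL requires
  // for EXTERNAL tables. LOCATION comes after ROW FORMAT in the grammar.
  def withLocation(ddl: String, location: String): String =
    s"$ddl LOCATION '$location'"

  def main(args: Array[String]): Unit = {
    // Placeholder directory; use the real base path of the table here.
    val ddl = withLocation(baseDdl, "/path/to/new_table")
    println(ddl)
    // In a Hive-enabled SparkSession you would run: spark.sql(ddl)
  }
}
```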

The ultimate answer is in Spark SQL's grammar definition file, SqlBase.g4.

There you can find the definition of CREATE EXTERNAL TABLE as createTableHeader:

CREATE TEMPORARY? EXTERNAL? TABLE (IF NOT EXISTS)? tableIdentifier

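This definition is used in the supported SQL statements.

Unless I'm mistaken, locationSpec is optional. That is according to the ANTLR grammar, though. The code may decide otherwise, and it looks like it does.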

scala> spark.version
res4: String = 2.3.0-SNAPSHOT

scala> val q = "CREATE EXTERNAL TABLE IF NOT EXISTS new_table (id BIGINT, StartTime TIMESTAMP, EndTime TIMESTAMP) PARTITIONED BY (year INT, month INT, day INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'"
scala> sql(q)
org.apache.spark.sql.catalyst.parser.ParseException:
Operation not allowed: CREATE EXTERNAL TABLE must be accompanied by LOCATION(line 1, pos 0)

== SQL ==
CREATE EXTERNAL TABLE IF NOT EXISTS new_table (id BIGINT, StartTime TIMESTAMP, EndTime TIMESTAMP) PARTITIONED BY (year INT, month INT, day INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
^^^

  at org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateHiveTable$1.apply(SparkSqlParser.scala:1096)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateHiveTable$1.apply(SparkSqlParser.scala:1064)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitCreateHiveTable(SparkSqlParser.scala:1064)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitCreateHiveTable(SparkSqlParser.scala:55)
  at org.apache.spark.sql.catalyst.parser.SqlBaseParser$CreateHiveTableContext.accept(SqlBaseParser.java:1124)
  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
  at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
  at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:70)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:69)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:68)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:97)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623)
  ... 48 elided

The default SparkSqlParser (with astBuilder being SparkSqlAstBuilder) has the following assertion, which leads to the exception:

if (external && location.isEmpty) {
  operationNotAllowed("CREATE EXTERNAL TABLE must be accompanied by LOCATION", ctx)
}
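To make the rule explicit, here is a simplified, standalone Scala model of that check. It is not Spark's code: `CreateTableSpec` and `ExternalTableCheck` are hypothetical names, and only the `external`/`location` condition and the message text come from the parser excerpt above.

```scala
// Hypothetical, simplified model of the two inputs the parser check looks at.
final case class CreateTableSpec(external: Boolean, location: Option[String])

object ExternalTableCheck {
  // Mirrors the condition in visitCreateHiveTable: EXTERNAL without LOCATION is rejected.
  def validate(spec: CreateTableSpec): Either[String, CreateTableSpec] =
    if (spec.external && spec.location.isEmpty)
      Left("Operation not allowed: CREATE EXTERNAL TABLE must be accompanied by LOCATION")
    else
      Right(spec)

  def main(args: Array[String]): Unit = {
    // The question's case: EXTERNAL, no LOCATION -> rejected
    println(validate(CreateTableSpec(external = true, location = None)))
    // EXTERNAL with a (placeholder) LOCATION -> accepted
    println(validate(CreateTableSpec(external = true, location = Some("/path/to/new_table"))))
    // Non-EXTERNAL (managed) table without LOCATION -> accepted
    println(validate(CreateTableSpec(external = false, location = None)))
  }
}
```

In this model a managed (non-EXTERNAL) table passes without a LOCATION, matching the condition in the parser, which only fires when both `external` is true and `location` is empty.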

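If you think this case should be allowed, I'd consider reporting an issue in Spark's JIRA. See SPARK-2825 for a strong argument in favor of supporting it:

As far as I know, CREATE EXTERNAL TABLE already works and should have the same semantics as in Hive.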

Answered 2017-05-31T09:02:09.490