
I have loaded an Excel file from S3 using the syntax below, but I am wondering about the options that need to be set here.

Why is it mandatory to set all of the options below when loading an Excel file? None of these options are mandatory for loading other file types such as csv, del, json, avro, etc.

val data = sqlContext.read.
format("com.crealytics.spark.excel").
option("location", s3path).
option("useHeader", "true").
option("treatEmptyValuesAsNulls", "true").
option("inferSchema","true").
option("addColorColumns", "true").
load(s3path)

I get the error below if any of the above options (except location) are not set:

sqlContext.read.format("com.crealytics.spark.excel").option("location", s3path).load(s3path)

Error message:

Name: java.lang.IllegalArgumentException
Message: Parameter "useHeader" is missing in options.
StackTrace:   at com.crealytics.spark.excel.DefaultSource.checkParameter(DefaultSource.scala:37)
          at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:19)
          at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:7)
          at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:345)
          at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
          at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132)
          at $anonfun$1.apply(<console>:47)
          at $anonfun$1.apply(<console>:47)
          at time(<console>:36)

1 Answer


Most of the options of spark-excel are mandatory, except for userSchema and sheetName.

You can always check this in the source code of the DataSource, which you can find here.

You have to keep in mind that these data source or data connector packages are implemented outside the Spark project, and each one comes with its own rules and parameters.
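The error in the question comes from exactly this kind of in-connector validation. As a rough illustration (a hypothetical sketch modeled on the checkParameter frame in the stack trace, not the actual spark-excel source), a data source can enforce its own required options like this:

```scala
// Hypothetical sketch of a connector's required-option check, modeled on the
// checkParameter frame in the stack trace above (not the real spark-excel code).
object RequiredOptionCheck {
  // Return the option's value, or fail the same way the connector does.
  def checkParameter(parameters: Map[String, String], name: String): String =
    parameters.getOrElse(
      name,
      throw new IllegalArgumentException(s"""Parameter "$name" is missing in options.""")
    )

  def main(args: Array[String]): Unit = {
    // Only "location" is set, so asking for "useHeader" reproduces the error.
    val opts = Map("location" -> "s3://bucket/report.xlsx")
    try {
      checkParameter(opts, "useHeader")
    } catch {
      case e: IllegalArgumentException =>
        println(e.getMessage) // Parameter "useHeader" is missing in options.
    }
  }
}
```

Because this check runs inside the connector's createRelation, Spark itself never sees a default for these options, which is why file formats built into Spark (csv, json, avro) behave differently.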

Answered on 2017-06-08T06:16:51.723