sql - SparkJob file name

Question

I'm using an HQL query that contains something similar to:

INSERT OVERWRITE TABLE ex_tb.ex_orc_tb
select *, SUBSTR(INPUT__FILE__NAME,60,4), CONCAT_WS('-', SUBSTR(INPUT__FILE__NAME,71,4), SUBSTR(INPUT__FILE__NAME,75,2), SUBSTR(INPUT__FILE__NAME,77,2))
 from ex_db.ex_ext_tb

When I run that command in Hive, it works fine.

When I run it through a PySpark HiveContext, I get this error instead:

pyspark.sql.utils.AnalysisException: u"cannot resolve 'INPUT__FILE__NAME' given input columns: [list_name, name, day, link_params, id, template]; line 2 pos 17"

Any ideas why this might be?

score 6 · Accepted Answer

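INPUT__FILE__NAME is a Hive-specific virtual column and is not supported in Spark.

Spark provides the input_file_name function, which should work in a similar way:

SELECT input_file_name() FROM df

However, it requires Spark 2.0 or later to work correctly with PySpark.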
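As a rough sketch of how this could be applied to the query in the question, the snippet below swaps the Hive virtual column for input_file_name() before handing the statement to Spark. The helper names to_spark_sql and run_query are made up for illustration, and spark is assumed to be an existing Spark 2.0+ SparkSession created with Hive support; none of this comes from the original post.

```python
import re

# Hive virtual column from the question; Spark SQL cannot resolve it.
HIVE_VIRTUAL_COLUMN = re.compile(r"\bINPUT__FILE__NAME\b", re.IGNORECASE)


def to_spark_sql(hql):
    """Replace Hive's INPUT__FILE__NAME with Spark's input_file_name().

    Hypothetical helper: a plain text substitution, not a SQL parser.
    """
    return HIVE_VIRTUAL_COLUMN.sub("input_file_name()", hql)


def run_query(spark, hql):
    """Run the rewritten statement on an existing SparkSession.

    Assumption: `spark` was built with enableHiveSupport() on Spark 2.0+,
    so the Hive tables named in the query are visible.
    """
    return spark.sql(to_spark_sql(hql))


# The query from the question, unchanged apart from layout.
HQL = """
INSERT OVERWRITE TABLE ex_tb.ex_orc_tb
SELECT *,
       SUBSTR(INPUT__FILE__NAME, 60, 4),
       CONCAT_WS('-',
                 SUBSTR(INPUT__FILE__NAME, 71, 4),
                 SUBSTR(INPUT__FILE__NAME, 75, 2),
                 SUBSTR(INPUT__FILE__NAME, 77, 2))
FROM ex_db.ex_ext_tb
"""

# Statement that would be passed to spark.sql(...)
SPARK_SQL = to_spark_sql(HQL)
```

The SUBSTR offsets are copied from the question as-is. They depend on the exact length of the file path, so it is worth checking them against what input_file_name() actually returns in your environment before relying on the result.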
