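This query goes one step further than the query in this link. In this case I add 1 or 2 more columns to process, and Spark throws an error that prints the query plan.
It says "Resolved attribute(s) fnlwgt_bucketed#152530 missing", which does not seem true: if I run the same code on fewer than 3 columns, it works like a charm, so I think I can safely assume it is not a bug in my query or code.
So is it an out-of-memory error? My guess is that, internally, because there are so many records in memory, some of them get dropped when the data spills. This is purely my assumption. Any insight on this? Has any of you run into a problem like this?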
py4j.protocol.Py4JJavaError: An error occurred while calling o21.sql.
: org.apache.spark.sql.AnalysisException: Resolved attribute(s) fnlwgt_bucketed#152530 missing from occupation#17,high_income#25,fnlwgt#13,education#14,marital-status#16,relationship#18,workclass#12,sex#20,id_num#10,native_country#24,race#19,education-num#15,hours-per-week#23,age_bucketed#152432,capital-loss#22,age#11,capital-gain#21,fnlwgt_bucketed#99009 in operator !Project [id_num#10, age#11, workclass#12, fnlwgt#13, education#14, education-num#15, marital-status#16, occupation#17, relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22, hours-per-week#23, native_country#24, high_income#25, age_bucketed#152432, fnlwgt_bucketed#152530, if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else UDF:bucketizer_0(cast(hours-per-week#23 as double)) AS hours-per-week_bucketed#152299]. Attribute(s) with the same name appear in the operation: fnlwgt_bucketed. Please check if the right attribute(s) are used.;;
Project [id_num#10, age#11, workclass#12, fnlwgt#13, education#14, education-num#15, marital-status#16, occupation#17, relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22, hours-per-week#23, native_country#24, high_income#25, age_bucketed#48257, fnlwgt_bucketed#99009, hours-per-week_bucketed#152299, age_bucketed_WoE#152431, WoE#152524 AS fnlwgt_bucketed_WoE#152529]
+- Join Inner, (fnlwgt_bucketed#99009 = fnlwgt_bucketed#152530)
:- SubqueryAlias bucketed
: +- SubqueryAlias a
: +- Project [id_num#10, age#11, workclass#12, fnlwgt#13, education#14, education-num#15, marital-status#16, occupation#17, relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22, hours-per-week#23, native_country#24, high_income#25, age_bucketed#48257, fnlwgt_bucketed#99009, hours-per-week_bucketed#152299, WoE#152426 AS age_bucketed_WoE#152431]
: +- Join Inner, (age_bucketed#48257 = age_bucketed#152432)
: :- SubqueryAlias bucketed
: : +- SubqueryAlias a
: : +- Project [id_num#10, age#11, workclass#12, fnlwgt#13, education#14, education-num#15, marital-status#16, occupation#17, relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22, hours-per-week#23, native_country#24, high_income#25, age_bucketed#48257, fnlwgt_bucketed#99009, if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else UDF:bucketizer_0(cast(hours-per-week#23 as double)) AS hours-per-week_bucketed#152299]
: : +- Project [id_num#10, age#11, workclass#12, fnlwgt#13, education#14, education-num#15, marital-status#16, occupation#17, relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22, hours-per-week#23, native_country#24, high_income#25, age_bucketed#48257, if (isnull(cast(fnlwgt#13 as double))) null else if (isnull(cast(fnlwgt#13 as double))) null else if (isnull(cast(fnlwgt#13 as double))) null else UDF:bucketizer_0(cast(fnlwgt#13 as double)) AS fnlwgt_bucketed#99009]
: : +- Project [id_num#10, age#11, workclass#12, fnlwgt#13, education#14, education-num#15, marital-status#16, occupation#17, relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22, hours-per-week#23, native_country#24, high_income#25, if (isnull(cast(age#11 as double))) null else if (isnull(cast(age#11 as double))) null else if (isnull(cast(age#11 as double))) null else UDF:bucketizer_0(cast(age#11 as double)) AS age_bucketed#48257]
: : +- Relation[id_num#10,age#11,workclass#12,fnlwgt#13,education#14,education-num#15,marital-status#16,occupation#17,relationship#18,race#19,sex#20,capital-gain#21,capital-loss#22,hours-per-week#23,native_country#24,high_income#25] csv
: +- SubqueryAlias woe_table
And the plan goes on like this.
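For what it's worth, here is a minimal sketch of how the error message itself can be read, in plain Python with no Spark involved. It splits the message into the "missing" attribute and the attributes that are available, then compares the numeric IDs after the `#`. The helper names `parse_attrs` and `explain_missing` are made up for this illustration, and the message string is a shortened copy of the one in the trace above.

```python
import re

# Shortened copy of the AnalysisException text from the trace above.
message = (
    "Resolved attribute(s) fnlwgt_bucketed#152530 missing from "
    "occupation#17,high_income#25,fnlwgt#13,age_bucketed#152432,"
    "fnlwgt_bucketed#99009 in operator !Project [...]"
)


def parse_attrs(text):
    """Map each column name to the set of IDs found in `name#id` tokens."""
    attrs = {}
    for name, expr_id in re.findall(r"([\w-]+)#(\d+)", text):
        attrs.setdefault(name, set()).add(int(expr_id))
    return attrs


def explain_missing(msg):
    """For each missing attribute, list the IDs available under the same name."""
    missing_part, rest = msg.split(" missing from ", 1)
    available_part = rest.split(" in operator ", 1)[0]
    missing = parse_attrs(missing_part)
    available = parse_attrs(available_part)
    return {
        name: {
            "missing": sorted(ids),
            "available": sorted(available.get(name, set())),
        }
        for name, ids in missing.items()
    }


result = explain_missing(message)
print(result)
```

On this message the sketch reports that `fnlwgt_bucketed` is requested with ID 152530 while the only `fnlwgt_bucketed` available to the operator has ID 99009. That matches the last sentence of the exception ("Attribute(s) with the same name appear in the operation"): the column name is present, but under a different ID than the one the `Project` refers to.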