I have a DataFrame with a column "EVENT_ID" whose data type is string. I am running the FPGrowth algorithm on it, but it throws the following error:
Py4JJavaError: An error occurred while calling o1711.fit.
:java.lang.IllegalArgumentException: requirement failed:
The input column must be array, but got string.
The EVENT_ID column has the following values:
E_34503_Probe
E_35203_In
E_31901_Cbc
I am using the code below to convert the string column to ArrayType:
df2 = df.withColumn("EVENT_ID", df["EVENT_ID"].cast(types.ArrayType(types.StringType())))
But I get the following error:
Py4JJavaError: An error occurred while calling o1874.withColumn.
: org.apache.spark.sql.AnalysisException: cannot resolve '`EVENT_ID`' due to data type mismatch: cannot cast string to array<string>;;
How can I convert this column to an array type, or run the FPGrowth algorithm on a string column?