Follow-up to: How to limit FPGrowth itemsets to 2 or 3. I am trying to export the association-rules output of FPGrowth to a .csv file in Python using pyspark. After running for nearly 8-10 hours it throws an error. My machine has enough disk space and memory.
The association rule output looks like this:

```
Antecedent  Consequent  Lift
['A','B']   ['C']       1
```
The code is in the linked question: How to limit FPGrowth itemsets to 2 or 3. I just added the following lines:

```
ar = ar.coalesce(24)
ar.write.csv('/output', header=True)
```
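For context, the rules DataFrame presumably comes from pyspark.ml's FPGrowth, whose associationRules output matches the Antecedent/Consequent/Lift columns shown above. A minimal sketch of that setup (the toy data and parameters are placeholders, not the actual ones from the linked question):

```
from pyspark.ml.fpm import FPGrowth

# Toy transactions; the real input is built as in the linked question.
df = spark.createDataFrame(
    [(0, ['A', 'B', 'C']), (1, ['A', 'B']), (2, ['A', 'C'])],
    ['id', 'items'])

fp = FPGrowth(itemsCol='items', minSupport=0.5, minConfidence=0.6)
model = fp.fit(df)

# associationRules is a DataFrame with antecedent, consequent,
# confidence and lift columns -- this is the `ar` being written out.
ar = model.associationRules
```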
Configuration used:
```
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession

conf = SparkConf().setAppName("App")
conf = (conf.setMaster('local[*]')  # local mode: everything runs in the driver JVM
        .set('spark.executor.memory', '200G')
        .set('spark.driver.memory', '700G')
        .set('spark.driver.maxResultSize', '400G'))  # 8,45,10
sc = SparkContext.getOrCreate(conf=conf)
spark = SparkSession(sc)
```
This kept running and consumed 1000 GB of my C:/ drive.
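The C:/ usage looks like Spark spilling shuffle and temporary data to its local scratch directory, which defaults to the system temp dir on the OS drive. If that is the case, spark.local.dir is the setting that redirects it; a minimal sketch, assuming a drive with more room (the D:/spark-temp path is a placeholder, not from the original post):

```
# Redirect Spark's scratch/spill space away from C:/
# ('D:/spark-temp' is a placeholder path).
conf = conf.set('spark.local.dir', 'D:/spark-temp')
```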
Is there any efficient way to save the output in .CSV or .XLSX format?
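One more thing worth noting about the CSV step itself: Spark's CSV writer cannot serialize array columns, so the antecedent/consequent arrays generally have to be flattened to strings first. A minimal sketch using concat_ws (column names assumed from the output shown above):

```
from pyspark.sql.functions import concat_ws

# Flatten the array columns into comma-separated strings so that
# write.csv can serialize them, keeping a modest partition count.
flat = (ar
        .withColumn('antecedent', concat_ws(',', 'antecedent'))
        .withColumn('consequent', concat_ws(',', 'consequent')))
flat.coalesce(24).write.csv('/output', header=True)
```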
The error is:

```
Py4JJavaError: An error occurred while calling o207.csv.
org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
at org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:664)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 9.0 failed 1 times, most recent failure: Lost task 10.0 in stage 9.0 (TID 226, localhost, executor driver): java.io.IOException: There is not enough space on the disk
at java.io.FileOutputStream.writeBytes(Native Method)
```

The progress:

```
19/07/15 14:12:32 WARN TaskSetManager: Stage 1 contains a task of very large size (26033 KB). The maximum recommended task size is 100 KB.
19/07/15 14:12:33 WARN TaskSetManager: Stage 2 contains a task of very large size (26033 KB). The maximum recommended task size is 100 KB.
19/07/15 14:12:38 WARN TaskSetManager: Stage 4 contains a task of very large size (26033 KB). The maximum recommended task size is 100 KB.
[Stage 5:> (0 + 24) / 24][Stage 6:> (0 + 0) / 24][I 14:14:02.723 NotebookApp] Saving file at /app1.ipynb
[Stage 5:==> (4 + 20) / 24][Stage 6:===> (4 + 4) / 24]
```