pandas - Is a DataFrame created with the toPandas() method distributed across the Spark cluster?

Question

I am reading a CSV with:

data=sc.textFile("filename") 

Df = Sparksql.create dataframe()

Pdf = Df.toPandas()

Is Pdf now distributed across the Spark cluster, or does it reside in the host environment?

score 1 · Accepted Answer

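No. The result of toPandas() is an ordinary pandas DataFrame held entirely in the driver's memory; it is not distributed across the cluster.

As the PySpark source code for DataFrame says: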

    .. note:: This method should only be used if the resulting Pandas's DataFrame is expected
        to be small, as all the data is loaded into the driver's memory.
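Since everything ends up on the driver, one way to act on that note is to cap the number of rows before collecting. The following is only a minimal sketch: `to_pandas_capped` and `max_rows` are hypothetical names invented for illustration, not part of PySpark. `limit()` and `toPandas()` are the real Spark DataFrame methods it relies on.

```python
def to_pandas_capped(df, max_rows=10000):
    """Collect at most `max_rows` rows of a Spark DataFrame to the driver.

    Hypothetical helper, not part of PySpark. `df` is assumed to be a
    pyspark.sql.DataFrame (anything exposing limit() and toPandas()).
    """
    if max_rows < 0:
        raise ValueError("max_rows must be non-negative")
    # limit() is still a distributed, lazy operation on the cluster;
    # toPandas() is the step that pulls the rows into driver memory
    # and returns a plain, local pandas DataFrame.
    return df.limit(max_rows).toPandas()
```

For the question's code this would be called as `Pdf = to_pandas_capped(Df, max_rows=1000)`. The returned object is a regular pandas DataFrame, so any later operation on it runs on the driver only, not on the executors.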

