apache-spark - Why doesn't my HDFS capacity stay constant?

Question

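I am running a PySpark job on Dataproc, and my total HDFS capacity does not stay constant.

As you can see in the first chart, the remaining HDFS capacity keeps falling even though the used HDFS capacity is small. Why is remaining + used not constant?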

score 1 · Accepted Answer

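The "used" series in the monitoring chart is actually "DFS Used"; it does not include "Non DFS Used". If you open the HDFS UI from the Component Gateway web interfaces, you should see something like this: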

Configured Capacity  :   232.5 GB
DFS Used     :   38.52 GB
Non DFS Used     :   45.35 GB
DFS Remaining    :   148.62 GB
DFS Used%    :   16.57 %
DFS Remaining%   :   63.92 %

The formulas are:

DFS Remaining = Total Disk Space - max(Reserved Space, Non-DFS Used) - DFS Used

Configured Capacity = Total Disk Space - Reserved Space

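Reserved space is controlled by the dfs.datanode.du.reserved property, which defaults to 0. So in your case it is the non-DFS used space that gets deducted. Non-DFS usage is space on the DataNode disks taken by files outside HDFS (for example, local temporary data written by a job), so as it grows, DFS Remaining shrinks even while DFS Used stays small. There is a similar question about this.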
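The formulas above can be checked against the numbers from the HDFS UI with a short sketch. The function and variable names here are made up for illustration, and the reserved space is assumed to be 0 (the default mentioned above), so total disk space equals the configured capacity:

```python
def configured_capacity(total_disk_gb, reserved_gb):
    """Configured Capacity = Total Disk Space - Reserved Space."""
    return total_disk_gb - reserved_gb


def dfs_remaining(total_disk_gb, reserved_gb, non_dfs_used_gb, dfs_used_gb):
    """DFS Remaining = Total Disk Space - max(Reserved Space, Non-DFS Used) - DFS Used."""
    return total_disk_gb - max(reserved_gb, non_dfs_used_gb) - dfs_used_gb


# Values taken from the HDFS UI output shown above (all in GB).
reserved = 0.0               # assumption: dfs.datanode.du.reserved left at its default
total_disk = 232.5 + reserved  # Configured Capacity + Reserved Space
non_dfs_used = 45.35
dfs_used = 38.52

remaining = dfs_remaining(total_disk, reserved, non_dfs_used, dfs_used)

print(f"Configured Capacity:      {configured_capacity(total_disk, reserved):.2f} GB")
print(f"DFS Remaining:            {remaining:.2f} GB")
print(f"DFS Used + DFS Remaining: {dfs_used + remaining:.2f} GB")
```

The computed DFS Remaining comes out at about 148.63 GB, which matches the 148.62 GB in the UI up to rounding. The sum of DFS Used and DFS Remaining is smaller than the configured capacity by the non-DFS used amount, which is why a chart that plots only those two series does not add up to a constant.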
