I have a parallel job consisting of a dataset, a sequential file, and a Lookup stage that joins them.
The sequential file contains 15,811 rows. It imports fine (I can see that in the log).
The problem is in the Lookup stage - it throws the following error:
LOOKUP,0: Could not map table file "/var/opt/ascential/adm/DataSet1/lookuptable.20140330.spzjazc (size 4191844864 bytes)": Not enough space
Error finalizing / saving table /tmp/dynLUT18950c3139ce
From what I have read on the IBM site and on other forums, one possible solution might be to increase the number of nodes. So I changed my APT configuration file from 1 node to 6 nodes:
{
    node "node1"
    {
        fastname "xxx"
        pools ""
        resource disk "/var/opt/ascential/adm/DataSet1" {pools ""}
        resource scratchdisk "/var/opt/ascential/adm/Scratch1" {pools ""}
    }
    node "node2"
    {
        fastname "xxx"
        pools ""
        resource disk "/var/opt/ascential/adm/DataSet2" {pools ""}
        resource scratchdisk "/var/opt/ascential/adm/Scratch2" {pools ""}
    }
    node "node3"
    {
        fastname "xxx"
        pools ""
        resource disk "/var/opt/ascential/adm/DataSet3" {pools ""}
        resource scratchdisk "/var/opt/ascential/adm/Scratch3" {pools ""}
    }
    node "node4"
    {
        fastname "xxx"
        pools ""
        resource disk "/var/opt/ascential/adm/DataSet4" {pools ""}
        resource scratchdisk "/var/opt/ascential/adm/Scratch4" {pools ""}
    }
    node "node5"
    {
        fastname "xxx"
        pools ""
        resource disk "/var/opt/ascential/adm/DataSet5" {pools ""}
        resource scratchdisk "/var/opt/ascential/adm/Scratch5" {pools ""}
    }
    node "node6"
    {
        fastname "xxx"
        pools ""
        resource disk "/var/opt/ascential/adm/DataSet6" {pools ""}
        resource scratchdisk "/var/opt/ascential/adm/Scratch6" {pools ""}
    }
}
Despite this, I still get the same error, and I noticed that the job only writes to the first DataSet folder: there is a file named /var/opt/ascential/adm/DataSet1/lookuptable.20140330.spzjazc whose size keeps growing until it reaches ~4 GB, at which point the job fails and the file is deleted.
Since there is only that one file, I assume the job is not actually running on multiple nodes. Is that correct? How can I force it to run on all 6 nodes so that I can get past the 4 GB limit?
Are there any other workarounds?
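One quick way to check the single-partition suspicion is to count how many lookup-table partition files actually appear across the six resource disks while the job runs; with a working 6-node configuration you would expect one file per node rather than a single growing file under DataSet1. This is just a sketch run on the engine host; the `DataSet*` glob assumes the directory layout from the configuration above:

```shell
# Count lookup-table partition files across all resource disks
# (one file per node is expected if the job really runs 6-way).
count=$(ls /var/opt/ascential/adm/DataSet*/lookuptable.* 2>/dev/null | wc -l)
echo "partition files: $count"
```

If this reports only one file while the job is running, the stage is indeed executing on a single partition.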