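I have an application that creates zombie processes on the cluster every 1-2 seconds. I do use Process in my application, but only when I receive a specific command, and no such command is being received right now.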
String command = "helm install release xxx";
LOGGER.debug("handle Install request : command [{}]", command);
waitForNormalTermination(Runtime.getRuntime().exec(command), INSTALL_TIMEOUT, TimeUnit.SECONDS, name);

private void waitForNormalTermination(Process process, int timeout, TimeUnit unit, String release) throws Exception {
    try {
        if (!process.waitFor(timeout, unit)) {
            throw new TimeoutException("Timeout while executing " + process.info().commandLine().orElse(null));
        }
        if (process.exitValue() != 0) {
            String errorStreamOutput = IOUtils.toString(process.getErrorStream(), StandardCharsets.UTF_8);
            if (errorStreamOutput != null && errorStreamOutput.contains("release: not found")) {
                throw new ReleaseNotFoundException(release);
            }
            throw new Exception("Process termination was abnormal, exit value: [" + process.exitValue() + "], command:[" + process.info().commandLine().orElse(null) + "] error returned:[" + errorStreamOutput + "]");
        }
    } finally {
        // Added here to simplify the snippet; in the real code every process is destroyed like this before leaving the method.
        process.destroy();
    }
}
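This is probably not what produces the zombies, but the method above only reads stderr after the process has exited. If the child writes more than the pipe buffer holds, it can block and waitFor will time out. A minimal sketch of one way around that, assuming Java 9+ and a POSIX `sh` for the demo; `CommandRunner`, `Result` and `run` are hypothetical names, not part of the original application:

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class CommandRunner {

    /** Exit code plus captured stderr of a finished command (hypothetical helper type). */
    static final class Result {
        final int exitCode;
        final String stderr;

        Result(int exitCode, String stderr) {
            this.exitCode = exitCode;
            this.stderr = stderr;
        }
    }

    static Result run(List<String> command, long timeout, TimeUnit unit)
            throws IOException, InterruptedException, TimeoutException {
        // stderr goes to a temp file, so the child can never block on a full pipe.
        File errFile = File.createTempFile("cmd-stderr", ".log");
        try {
            Process process = new ProcessBuilder(command)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD) // stdout is not needed here
                    .redirectError(errFile)
                    .start();
            try {
                if (!process.waitFor(timeout, unit)) {
                    throw new TimeoutException("Timeout while executing " + command);
                }
                String stderr = new String(Files.readAllBytes(errFile.toPath()), StandardCharsets.UTF_8);
                return new Result(process.exitValue(), stderr);
            } finally {
                // Only a process that is still running needs to be killed;
                // wait briefly afterwards so its exit status is collected.
                if (process.isAlive()) {
                    process.destroyForcibly();
                    process.waitFor(5, TimeUnit.SECONDS);
                }
            }
        } finally {
            errFile.delete();
        }
    }

    public static void main(String[] args) throws Exception {
        // Harmless demo command; the real caller would pass the helm command as a list.
        Result r = run(List.of("sh", "-c", "echo oops 1>&2; exit 3"), 10, TimeUnit.SECONDS);
        System.out.println("exit=" + r.exitCode + " stderr=" + r.stderr.trim());
    }
}
```

Passing the command as a list (for example `List.of("helm", "install", "release", "xxx")`) also avoids the whitespace tokenizing that `Runtime.exec(String)` performs.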
Here is what I did:
#1 - Added a destroy() call on the Process in my code
#1b - Built and published the image
#2 - Killed my pod in the cluster
#3 - The pod was recreated with the new image
#4 - Looked at the node where I had zombies (it is the same node my application runs on).
I killed the java process that was generating zombies. I had over 12,000 zombies; now I'm back down to 4,200.
#5 - Ran: ps aux | grep 'Z' | wc -l
in a loop to see whether new zombies appear... and yes, they are still increasing.
Now I have this: root@test-pcl111:~# ps aux | grep 'Z' | wc -l
4487
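Note that grep 'Z' matches any line containing a capital Z, including the ps header (VSZ) and the grep process itself, so the number above is slightly inflated. A stricter count can be taken from the state field of /proc/<pid>/stat. The following is only a sketch, assuming a Linux node with /proc mounted; `ZombieCounter`, `zombieParent` and `zombiesByParent` are hypothetical names:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

class ZombieCounter {

    /**
     * Parses the content of /proc/<pid>/stat ("pid (comm) state ppid ...").
     * Returns the parent PID if the state is Z, otherwise -1.
     */
    static int zombieParent(String statLine) {
        // comm may itself contain spaces or parentheses, so split after the LAST ')'.
        int close = statLine.lastIndexOf(')');
        if (close < 0) {
            return -1;
        }
        String[] fields = statLine.substring(close + 1).trim().split("\\s+");
        if (fields.length < 2 || !fields[0].equals("Z")) {
            return -1;
        }
        return Integer.parseInt(fields[1]);
    }

    /** Counts zombie processes per parent PID by scanning the numeric entries under procRoot. */
    static Map<Integer, Integer> zombiesByParent(Path procRoot) throws IOException {
        Map<Integer, Integer> counts = new TreeMap<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(procRoot, "[0-9]*")) {
            for (Path dir : dirs) {
                try {
                    String stat = new String(Files.readAllBytes(dir.resolve("stat")), StandardCharsets.UTF_8);
                    int ppid = zombieParent(stat);
                    if (ppid >= 0) {
                        counts.merge(ppid, 1, Integer::sum);
                    }
                } catch (IOException e) {
                    // The process went away while scanning (or the entry is unreadable); skip it.
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        zombiesByParent(Paths.get("/proc"))
                .forEach((ppid, n) -> System.out.println("ppid " + ppid + ": " + n + " zombie(s)"));
    }
}
```

Grouping by parent PID shows directly which process is failing to reap its children, which is the same information the ps-based approach gives but without the false matches.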
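I ran this: kubectl logs iep-iep-codec-staging-7596fccd85-jkn68 --follow
in another terminal, to see whether there was any activity...
Even though there is no activity on my side apart from a few periodic REST calls (polling from other applications), the zombies still increase every 1-2 seconds. At this point nothing is calling the method that creates a new Process.
What am I missing?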
EDIT: I created a small script that prints the zombies on the node, grouped by the application (parent process) that owns them.
#!/bin/bash
# Count zombie (<defunct>) processes per parent PID and store "<count> <ppid>" lines.
ps -eo ppid,comm | grep "<defunct>" | awk '{print $1}' | sort | uniq -c > /tmp/zombie.file
Files="/tmp/zombie.file"
# For each parent: print how many zombies it owns, then the parent's own ps entry.
while read -r Count Ppid
do
    echo "Zombies found = $Count"
    ps -f "$Ppid"
done < "$Files"
echo " "
echo " "
echo "Running docker containers are "
# that line was to grep only our containers from our private repo
#docker ps | grep private-repository
echo " "
echo " "
echo "the PID of those docker containers"
for value in $(docker ps | grep private-repository | cut -d ' ' -f1); do
    docker inspect --format '{{ .State.Pid }}' "$value"
done
EDIT
I have the same problem with Containerd. It looks like the problem comes from the exec probes.