
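I have an application that creates zombie processes in my cluster every 1-2 seconds. I do use Process in my application, but only when I receive a specific command, and that is not happening right now.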

String command = "helm install release xxx";
LOGGER.debug("handle Install request : command [{}]", command);
waitForNormalTermination(Runtime.getRuntime().exec(command), INSTALL_TIMEOUT, TimeUnit.SECONDS, name);

private void waitForNormalTermination(Process process, int timeout, TimeUnit unit, String release) throws Exception {
    try {
        if (!process.waitFor(timeout, unit)) {
            throw new TimeoutException("Timeout while executing " + process.info().commandLine().orElse(null));
        }

        if (process.exitValue() != 0) {
            // stderr is only read here, after the process has already exited
            String errorStreamOutput = IOUtils.toString(process.getErrorStream(), StandardCharsets.UTF_8);
            if (errorStreamOutput != null && errorStreamOutput.contains("release: not found")) {
                throw new ReleaseNotFoundException(release);
            }

            throw new Exception("Process termination was abnormal, exit value: [" + process.exitValue() + "], command:[" + process.info().commandLine().orElse(null) + "] error returned:[" + errorStreamOutput + "]");
        }
    } finally {
        // Added here to simplify the snippet; in the real code every process is destroyed like this before exiting the method.
        process.destroy();
    }
}
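A minimal sketch of how the helper above could also drain the child's output while waiting, and kill and wait for a child that times out. The names `CommandRunner` and `Result`, and the choice to merge stderr into stdout, are assumptions for illustration and are not part of the original code. As far as I know the JVM reaps its own direct children on Linux, so a helper like this mainly rules out the Java side; it does nothing for zombies whose parent is some other process.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not the original application's code.
final class CommandRunner {

    /** Exit code plus everything the child wrote to stdout and stderr. */
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    static Result run(List<String> command, long timeout, TimeUnit unit)
            throws IOException, InterruptedException, TimeoutException {
        // Merge stderr into stdout so a single reader can drain both.
        Process process = new ProcessBuilder(command).redirectErrorStream(true).start();

        // Drain the output concurrently so a chatty child cannot block on a full pipe.
        CompletableFuture<String> output = CompletableFuture.supplyAsync(() -> {
            try (InputStream in = process.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            } catch (IOException e) {
                return ""; // the stream was closed while the child was being killed
            }
        });

        try {
            if (!process.waitFor(timeout, unit)) {
                process.destroyForcibly();
                process.waitFor(); // wait until the killed child has actually exited
                throw new TimeoutException("Timeout while executing " + String.join(" ", command));
            }
            return new Result(process.exitValue(), output.join());
        } finally {
            // Covers the interrupted case: never leave the child running behind us.
            if (process.isAlive()) {
                process.destroyForcibly();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java CommandRunner.java <command> [args...]");
            return;
        }
        Result result = run(List.of(args), 30, TimeUnit.SECONDS);
        System.out.println("exit=" + result.exitCode + " output=" + result.output);
    }
}
```

Passing the command as a list (for example `List.of("helm", "install", "release", "xxx")`) avoids the whitespace tokenizing that `Runtime.exec(String)` does.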

Here is what I did:

#1 - added a destroy() call on the process in my code
#1b - built and published the image
#2 - killed my pod in the cluster
#3 - the pod was recreated with the new image
#4 - looked at the node where I had zombies (it is the same node my application runs on).
        I killed the java process that was generating zombies. I had over 12,000 zombies; now I'm back down to 4200.
#5 - ran:  ps aux | grep 'Z' | wc -l
       in a loop to see whether new zombies appear, and yes, they are still increasing.
       Now I have this: root@test-pcl111:~# ps aux | grep 'Z' | wc -l
       4487
  I also ran: kubectl logs iep-iep-codec-staging-7596fccd85-jkn68 --follow
  in another terminal, to see whether there was any activity.

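Even though there is no activity on my side apart from a few periodic REST calls (polling from other applications), the zombies still increase every 1-2 seconds. At this point nothing is calling the method that creates a new Process(..).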

What am I missing?

Edit: I created a small script that prints the zombies on the node, grouped by the application (parent process) they belong to.

#!/bin/bash
ps -eo ppid,comm | grep "<defunct>" | awk '{print $1}' | sort | uniq -c > /tmp/zombie.file
Files="/tmp/zombie.file"
Lines=$(cat $Files | tr -s ' ' | cut -d ' ' -f2,3)

i=0;

for Line in $Lines
do
   if [[ $i -eq 0 ]]
     then
       echo "Zombies found = $Line"
       i=1
   else
       ps -f $Line
       i=0
   fi
done

echo " "
echo " "

echo "Running docker containers are "

# that line was to grep only our containers from our private repo
#docker ps | grep private-repository

echo " "
echo " "

echo "the PID of those docker containers"
for value in $(docker ps | grep private-repository  | cut -d ' ' -f1); do
  docker inspect --format '{{ .State.Pid }}' $value
done
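The same grouping of zombies by parent PID can be sketched in Java by reading `/proc/<pid>/stat` directly, which avoids relying on the `ps` output format. This is only an illustration: the class name `ZombieCounter` and its methods are made up here, and it assumes a Linux `/proc`. When run inside a container that does not share the host PID namespace, it only sees that container's processes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic helper, assuming a Linux /proc filesystem.
final class ZombieCounter {

    /**
     * Parses one /proc/<pid>/stat line ("pid (comm) state ppid ...").
     * Returns the parent PID if the process is a zombie (state Z), otherwise -1.
     */
    static int zombieParent(String statLine) {
        // comm may contain spaces or ')', so everything after the LAST ')' is used.
        int close = statLine.lastIndexOf(')');
        if (close < 0) {
            return -1;
        }
        String[] rest = statLine.substring(close + 1).trim().split("\\s+");
        if (rest.length < 2 || !rest[0].equals("Z")) {
            return -1;
        }
        return Integer.parseInt(rest[1]);
    }

    /** Counts zombies per parent PID under the given proc root. */
    static Map<Integer, Integer> countByParent(Path procRoot) throws IOException {
        Map<Integer, Integer> counts = new TreeMap<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(procRoot, "[0-9]*")) {
            for (Path dir : dirs) {
                try {
                    byte[] raw = Files.readAllBytes(dir.resolve("stat"));
                    // ISO-8859-1 never fails on odd bytes in the command name.
                    int ppid = zombieParent(new String(raw, StandardCharsets.ISO_8859_1));
                    if (ppid >= 0) {
                        counts.merge(ppid, 1, Integer::sum);
                    }
                } catch (IOException e) {
                    // The process exited between listing and reading; skip it.
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        countByParent(Paths.get("/proc")).forEach((ppid, n) ->
                System.out.println(n + " zombie(s) with parent PID " + ppid));
    }
}
```

If the parent PID reported for the zombies is not the java process, the `Process` handling in the application is unlikely to be what creates them.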

Edit

I have the same problem with Containerd. It looks like the problem is with the exec probes.
