linux - Is there a good way to detect stale NFS mounts?

Question

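I have a procedure that I want to start only if several tests complete successfully.

One test I need is that all of my NFS mounts are alive and well.

Can I do better than the brute-force approach: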

mount | sed -n "s/^.* on \(.*\) type nfs .*$/\1/p" | 
while read -r mount_point ; do
  timeout 10 ls "$mount_point" >& /dev/null || echo "stale $mount_point" ;
done

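Here timeout is a utility that runs the command in the background and kills it after a given time if no SIGCHLD was caught before the time limit, returning success or failure in the obvious way.

In English: parse the output of mount and check (bounded by a timeout) every NFS mount point. Optionally (not in the code above), break on the first stale mount.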

score 9 · Accepted Answer

A colleague of mine ran into your script. This doesn't avoid a "brute force" approach, but if I may in Bash:

while read _ _ mount _; do 
  read -t1 < <(stat -t "$mount") || echo "$mount timeout"; 
done < <(mount -t nfs)

mount can list NFS mounts directly. read -t (a shell builtin) can time out a command. stat -t (terse output) still hangs like an ls*. ls yields unnecessary output, risks false positives on huge/slow directory listings, and requires permissions to access - which would also trigger a false positive if it doesn't have them.

while read _ _ mount _; do 
  read -t1 < <(stat -t "$mount") || lsof -b 2>/dev/null|grep "$mount"; 
done < <(mount -t nfs)

We're using it with lsof -b (non-blocking, so it won't hang either) to determine the source of the hangs.

Thanks for the pointer!

* test -d (a shell builtin) would work instead of stat (a standard external) as well, but read -t returns success only if it doesn't time out and reads a line of input. Since test -d doesn't use stdout, a (( $? > 128 )) errorlevel check on it would be necessary - not worth the legibility hit, IMO.
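For comparison, here is a minimal Python sketch of the same probe. It is not part of the answer above: the function name probe_mount and its three result strings are made up for illustration. It runs the external stat command and keeps "timed out" separate from "exited with an error", which is the distinction the exit-status check in the footnote is after. It deliberately does not wait for the child after killing it, on the assumption that a process blocked on a dead server may not go away promptly.

```python
import subprocess


def probe_mount(path, timeout=1.0):
    """Run `stat path` and report 'ok', 'failed', or 'timeout' (hypothetical helper)."""
    proc = subprocess.Popen(
        ["stat", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        returncode = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best effort only: send SIGKILL but do not wait for the child,
        # since waiting could block the caller as well.
        proc.kill()
        return "timeout"
    return "ok" if returncode == 0 else "failed"
```

A caller would loop over the mount points and treat anything other than "ok" as suspect.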

score 7 · Accepted Answer

Took me some time, but here is what I found that works in Python:

import signal, os, subprocess

class Alarm(Exception):
    pass

def alarm_handler(signum, frame):
    raise Alarm

pathToNFSMount = '/mnt/server1/' # or you can implement some function 
                                 # to find all the mounts...

signal.signal(signal.SIGALRM, alarm_handler)
signal.alarm(3)  # 3 seconds
try:
    # Popen (unlike call) returns a process object that has communicate()
    proc = subprocess.Popen(['stat', pathToNFSMount], stderr=subprocess.PIPE, stdout=subprocess.PIPE)
    stdoutdata, stderrdata = proc.communicate()
    signal.alarm(0)  # reset the alarm
except Alarm:
    print("Oops, taking too long!")

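Remarks:

Credit to the answer here.
You could also use the alternative:

os.fork() and os.stat()

Check whether the fork has finished, and if it has timed out you can kill it. You will need to work with time.time() and so on.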
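A rough sketch of that os.fork() / os.stat() alternative might look like the following. The function name stat_with_timeout, the poll interval, and the three result strings are assumptions for illustration, not something from the original answer. The potentially blocking os.stat() call runs only in the child, while the parent polls with waitpid() until a time.time() deadline passes.

```python
import os
import signal
import time


def stat_with_timeout(path, timeout=3.0, poll_interval=0.05):
    """Return 'ok', 'error', or 'timeout' for os.stat(path) run in a forked child."""
    pid = os.fork()
    if pid == 0:
        # Child: do the call that may hang, then exit without running
        # any of the parent's cleanup handlers.
        status = 1
        try:
            os.stat(path)
            status = 0
        except OSError:
            status = 1
        finally:
            os._exit(status)

    # Parent: poll for the child until the deadline.
    deadline = time.time() + timeout
    while time.time() < deadline:
        finished, status = os.waitpid(pid, os.WNOHANG)
        if finished == pid:
            ok = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
            return "ok" if ok else "error"
        time.sleep(poll_interval)

    # Deadline passed: send SIGKILL but do not block waiting for the child.
    # The child is left unreaped here, which a long-running caller would
    # want to clean up later.
    os.kill(pid, signal.SIGKILL)
    return "timeout"
```

time.monotonic() would be a safer clock than time.time() if the system clock can jump, but the sketch keeps the call named in the answer.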

score 6 · Accepted Answer

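In addition to the previous answers, which hang under some circumstances, this snippet checks all suitable mounts, kills with signal KILL, and is tested with CIFS too: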

grep -v tracefs /proc/mounts | cut -d' ' -f2 | \
  while read -r m; do \
    timeout --signal=KILL 1 ls -d "$m" > /dev/null || echo "$m"; \
  done
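The snippet above takes the second field of every /proc/mounts line. If you only want network filesystems, a small parser can filter on the third field (the filesystem type) instead. The sketch below is an illustration only: the function names and the set of type names (nfs, nfs4, cifs) are assumptions you may need to adjust. It also decodes the octal escapes that /proc/mounts uses for characters such as spaces (\040), which plain cut does not handle.

```python
import re

# Assumed set of filesystem types to treat as network mounts.
NETWORK_FS_TYPES = frozenset({"nfs", "nfs4", "cifs"})

_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def unescape_mount_field(field):
    """Decode octal escapes such as \\040 (space) used in /proc/mounts fields."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def network_mount_points(mounts_text, fs_types=NETWORK_FS_TYPES):
    """Return mount points whose filesystem type is in fs_types."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        if fields[2] in fs_types:
            points.append(unescape_mount_field(fields[1]))
    return points
```

On a real system the input would come from reading /proc/mounts, for example `network_mount_points(open("/proc/mounts").read())`.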

score 5 · Accepted Answer

You could write a C program and check for ESTALE.

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <iso646.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int main(){
    struct stat st;
    int ret;
    ret = stat("/mnt/some_stale", &st);
    if(ret == -1 and errno == ESTALE){
        printf("/mnt/some_stale is stale\n");
        return EXIT_SUCCESS;
    } else {
        return EXIT_FAILURE;
    }
}
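The same errno check can be sketched in Python, where a failed os.stat() raises OSError carrying the errno value. The function name stat_outcome and its return strings are hypothetical. Like the C program, this has no timeout of its own, so it can block on a mount whose server is not responding.

```python
import errno
import os


def stat_outcome(path):
    """Return 'ok', 'stale', or 'error: <ERRNO NAME>' for os.stat(path)."""
    try:
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ESTALE:
            return "stale"
        # Report any other failure by its symbolic errno name where known.
        return "error: " + errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"
```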

score 3 · Accepted Answer

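If you don't mind waiting for the command to finish because of the stale file system, writing a C program that checks for ESTALE is a good option. If you want to implement a "timeout" option, the best way I've found to implement it (in a C program) is to fork a child process that tries to open the file. You then check whether the child process has successfully read a file in the file system within the allotted time.

Here is a small proof-of-concept C program to do this: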

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>   // kill()
#include <sys/wait.h>


void readFile();
void waitForChild(int pid);


int main(int argc, char *argv[])
{
  int pid;

  pid = fork();

  if(pid == 0) {
    // Child process.
    readFile();
  }
  else if(pid > 0) {
    // Parent process.
    waitForChild(pid);
  }
  else {
    // Error
    perror("Fork");
    exit(1);
  }

  return 0;
}

void waitForChild(int child_pid)
{
  int timeout = 2; // 2 seconds timeout.
  int status;
  int pid;

  while(timeout != 0) {
    pid = waitpid(child_pid, &status, WNOHANG);
    if(pid == 0) {
      // Still waiting for a child.
      sleep(1);
      timeout--;
    }
    else if(pid == -1) {
      // Error
      perror("waitpid()");
      exit(1);
    }
    else {
      // The child exited.
      if(WIFEXITED(status)) {
        // Child was able to call exit().
        if(WEXITSTATUS(status) == 0) {
          printf("File read successfully!\n");
          return;
        }
      }
      printf("File NOT read successfully.\n");
      return;
    }
  }

  // The child did not finish and the timeout was hit.
  kill(child_pid, 9);
  printf("Timeout reading the file!\n");
}

void readFile()
{
  int fd;

  fd = open("/path/to/a/file", O_RDWR);
  if(fd == -1) {
    // Error
    perror("open()");
    exit(1);
  }
  else {
    close(fd);
    exit(0);
  }
}

score 3 · Accepted Answer

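I wrote https://github.com/acdha/mountstatus, which uses an approach similar to the one UndeadKernel mentioned, and which I've found to be the most robust: it's a daemon that periodically scans all mounted filesystems by forking a child process that attempts to list the top-level directory, SIGKILLs it if it fails to respond within a certain timeout, and records both successes and failures to syslog. This avoids issues with certain client implementations (e.g. older Linux) that never trigger timeouts for certain classes of error, with NFS servers that are partially responsive but won't respond to actual calls such as listdir, and so on.

I don't publish them, but the included Makefile uses fpm to build rpm and deb packages with an Upstart script.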

score 1 · Accepted Answer

Another way, using a shell script. Works well for me:

#!/bin/bash
# Purpose:
# Detect Stale File handle and remove it
# Script created: July 29, 2015 by Birgit Ducarroz
# Last modification: --
#

# Detect stale file handles and write the affected paths into a file
df 2>&1 | grep 'Stale file handle' | awk '{print $2}' > NFS_stales.txt
# Remove : ‘ and ’ characters from the output
sed -r -i -e 's/://' -e 's/‘//' -e 's/’//' NFS_stales.txt

# Not used: replace space by a new line
# stales=`cat NFS_stales.txt && sed -r -i ':a;N;$!ba;s/ /\n /g' NFS_stales.txt`

# read NFS_stales.txt output file line by line then unmount stale by stale.
#    IFS='' (or IFS=) prevents leading/trailing whitespace from being trimmed.
#    -r prevents backslash escapes from being interpreted.
#    || [[ -n $line ]] prevents the last line from being ignored if it doesn't end with a \n (since read returns a non-zero exit code when it encounters EOF).

while IFS='' read -r line || [[ -n "$line" ]]; do
    echo "Unmounting due to NFS Stale file handle: $line"
    umount -fl "$line"
done < "NFS_stales.txt"
#EOF
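The extraction step of this script (grep, awk, then sed to strip the colon and quotes) can also be sketched as one regular expression. The message shape below, "df: ‘/path’: Stale file handle" with optional quotes, is an assumption inferred from the characters the script removes, and the function name stale_paths is made up. Unlike the awk '$2' approach, this keeps paths that contain spaces intact.

```python
import re

# Assumed df error shape: "df: ‘/mnt/x’: Stale file handle" (quotes optional).
_STALE_LINE = re.compile(
    r"^df: [‘'\"]?(?P<path>.+?)[’'\"]?: Stale file handle\s*$"
)


def stale_paths(df_output):
    """Return the paths that df reported with 'Stale file handle'."""
    paths = []
    for line in df_output.splitlines():
        match = _STALE_LINE.match(line)
        if match:
            paths.append(match.group("path"))
    return paths
```

The input would be the combined stdout and stderr of df, as in the `df 2>&1` call in the script.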
