bash - Using a named pipe to compare an existing file with the result of a heavy process

Question

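I am trying to figure out a way to compare an existing file with the result of a process (a heavy one, not to be repeated) and, if they differ, to overwrite the existing file with that result, without having to write the result to a temp file first (it would be a large temp file, roughly the same size as the existing file: let's try to be efficient and not take up twice the space).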

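I would like to replace the regular file /tmp/replace_with_that with a fifo (see below), but of course doing so with the code below would only lock up the script, since the fifo /tmp/replace_with_that cannot be read until the existing file has been compared with the named pipe /tmp/test_against_this.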

#!/bin/bash

mkfifo /tmp/test_against_this
: > /tmp/replace_with_that    

echo 'A B C D' >/some/existing/file

{
  #A very heavy process not to repeat;
  #Solved: we used a named pipe.
  #Its large output should not be sent to a file
  #To solve: using this code, we write the output to a regular file

  for LETTER in "A B C D E"  
  do  
      echo $LETTER      
  done  

} | tee /tmp/test_against_this /tmp/replace_with_that >/dev/null &  

if cmp -s /some/existing/file /tmp/test_against_this
then  
    echo Exact copy
    #Don't do a thing to /some/existing/file
else
    echo Differs
    #Clobber /some/existing/file with /tmp/replace_with_that
    cat /tmp/replace_with_that >/some/existing/file
fi  

rm -f /tmp/test_against_this  
rm -f /tmp/replace_with_that
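
The lock-up described in the question comes from how FIFOs are opened: opening one for writing blocks until some process opens it for reading. The sketch below only illustrates that behavior. The scratch directory, the `demo_fifo` name and the time limits are assumptions made up for the demonstration, and `timeout` is assumed to be the GNU coreutils utility.

```shell
#!/bin/bash
# Hypothetical scratch location for the demonstration
demo_dir=$(mktemp -d)
demo_fifo="$demo_dir/demo_fifo"
mkfifo "$demo_fifo"

# No reader has the FIFO open, so the writer blocks while opening it;
# timeout kills it after 1 second and exits with status 124
timeout 1 bash -c 'echo data >"$1"' _ "$demo_fifo"
writer_without_reader=$?

# With a reader waiting in the background, the same write completes
cat "$demo_fifo" >"$demo_dir/received" &
timeout 5 bash -c 'echo data >"$1"' _ "$demo_fifo"
writer_with_reader=$?
wait

received=$(<"$demo_dir/received")
echo "without reader: $writer_without_reader, with reader: $writer_with_reader"
rm -rf "$demo_dir"
```

In the question's script, tee would have to open both FIFOs for writing, but nothing reads the second one until cmp has finished with the first, so neither side can make progress.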

score 0 · Accepted Answer

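I think I would recommend a different approach:

Generate an MD5/SHA1/SHA256/whatever hash of the existing file
Run the heavy process and replace the output file
Generate a hash of the new file
If the hashes match, the files are the same; if not, the new file is different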
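
The steps above could be sketched as follows. This is only an illustration under stated assumptions: `file_hash` and `heavy_process` are hypothetical helpers, SHA-256 via `sha256sum` (GNU coreutils) is one arbitrary choice of hash, and a temp file stands in for /some/existing/file.

```shell
#!/bin/bash
# Hypothetical helper: print the SHA-256 digest of a file
# (the first field of sha256sum's output)
file_hash() {
    sha256sum "$1" | cut -d' ' -f1
}

# Hypothetical stand-in for the heavy process
heavy_process() {
    echo 'A B C D E'
}

target=$(mktemp)                   # stand-in for /some/existing/file
echo 'A B C D' >"$target"

old_hash=$(file_hash "$target")    # 1. hash the existing file
heavy_process >"$target"           # 2. run the process, replacing the file
new_hash=$(file_hash "$target")    # 3. hash the new file

# 4. compare the two digests
if [[ "$old_hash" == "$new_hash" ]]
then
    result='Exact copy'
else
    result='Differs'
fi
echo "$result"
```

Only the two short digests are kept around, so no second copy of the data is needed. Note that in this form the file is always rewritten; the hashes only tell you afterwards whether its content changed.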

score 0 · Accepted Answer

For the sake of completeness, my own answer (I wanted to explore the use of pipes):

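I was trying to find a way to compare a stream and an existing file on the fly, without overwriting the existing file unnecessarily (leaving it as is if the stream and the file are exact copies), and without creating temp files that can sometimes be large (the product of a heavy process such as mysqldump, for instance). The solution had to rely on pipes only (named and anonymous), and maybe a few very small temp files.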

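The checksum solution suggested by twalberg is just fine, but md5sum calls on large files are processor intensive (and processing time grows linearly with file size). cmp is faster, especially on files that differ, since it can stop at the first differing byte.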
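
As a minimal sketch of the cmp side of that comparison: cmp accepts `-` to mean standard input, so a stream can be checked against a file directly. `stream_matches_file` is a hypothetical helper name, and the temp file and sample data are made up for the example.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the data on stdin is byte-for-byte
# identical to the file given as $1 ("-" tells cmp to read stdin)
stream_matches_file() {
    cmp -s "$1" -
}

reference=$(mktemp)
printf 'A B C D\n' >"$reference"

if printf 'A B C D\n' | stream_matches_file "$reference"
then same_stream='Exact copy'
else same_stream='Differs'
fi

if printf 'A B C D E\n' | stream_matches_file "$reference"
then other_stream='Exact copy'
else other_stream='Differs'
fi

echo "$same_stream / $other_stream"
rm -f "$reference"
```

The comparison consumes the stream, so on its own it can say whether the data differs but leaves nothing to write back to the file. That is why the functions in this answer duplicate the stream with tee.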

Sample invocation of the functions listed below:

#!/bin/bash

mkfifo /tmp/fifo

mysqldump --skip-comments $HOST $USER $PASSWORD $DB >/tmp/fifo &

create_or_replace /some/existing/dump /tmp/fifo

#This also works, but depending on the anonymous fifo setup, seems less robust

create_or_replace /some/existing/dump <(mysqldump --skip-comments $HOST $USER $PASSWORD $DB)

The functions:

#!/bin/bash

checkdiff(){
    local originalfilepath="$1"
    local differs="$2"
    local streamsize="$3"
    local timeoutseconds="$4"
    local originalfilesize
    local stoptime

    #Size of the existing file in bytes (GNU stat syntax)
    originalfilesize=$(stat -c '%s' "$originalfilepath")

    #Hackish: we can't know for sure when the wc subprocess will have produced the streamsize file,
    #so poll until it is non-empty or the timeout expires
    stoptime=$(( $(date +%s) + timeoutseconds ))
    while [[ ! -s "$streamsize" ]] && (( stoptime > $(date +%s) )); do sleep 0.1; done

    if [[ ! -s "$streamsize" ]] || (( originalfilesize == $(head -n 1 "$streamsize") ))
    then
        #Using streams that were exact copies of files to compare with,
        #on average, with just a few test runs:
        #diff slowest, md5sum 2% faster than diff, and cmp method 5% faster than md5sum
        #Did not test, but on large unequal files,
        #cmp method should be way ahead of the 2 other methods
        #since equal files is the worst case scenario for cmp

        #diff -q --speed-large-files <(sort "$originalfilepath") <(sort -) >"$differs"
        #( [[ $(md5sum "$originalfilepath" | cut -b-32) = $(md5sum - | cut -b-32) ]] && : || echo -n '1' ) >"$differs" 
        if cmp -s "$originalfilepath" -
        then
            : >"$differs.tmp"    #Empty flag file: exact copy
        else
            echo -n '1' >"$differs.tmp"
        fi
    else
        #Sizes differ: no need to compare the contents
        echo -n '1' >"$differs.tmp"
    fi

    #Publish the flag in one step so that a reader never sees a half-written file,
    #then drain what is left of the stream so that tee is not killed by SIGPIPE
    mv -f "$differs.tmp" "$differs"
    cat >/dev/null
}

create_or_replace(){

    local originalfilepath="$1"
    local newfilepath="$2" #Should be a pipe, but could be a regular file
    local differs="$originalfilepath.differs"
    local streamsize="$originalfilepath.size"
    local timeoutseconds=30
    local stoptime

    if [[ -f "$originalfilepath" ]]
    then
        #Cleanup
        rm -f "$differs" "$streamsize" "$streamsize.tmp"

        #cat the pipe, get its size in bytes (wc -c, to match stat), check for differences between the stream and the file and pipe the stream into the original file if the checks show a diff
        #The size file is written under a temporary name and renamed, so it only appears once it is complete
        #Caveat: wc only reports the size at end of stream, and nothing reads tee's other outputs before then,
        #so a stream larger than the pipe buffers will stall until the timeouts expire
        cat "$newfilepath" |
        tee >(wc -c - | cut -f1 -d' ' >"$streamsize.tmp" && mv -f "$streamsize.tmp" "$streamsize") >(checkdiff "$originalfilepath" "$differs" "$streamsize" "$timeoutseconds") | {

                #Hackish: we can't know for sure when the checkdiff subprocess will have produced the differs file
                stoptime=$(( $(date +%s) + timeoutseconds ))
                while [[ ! -f "$differs" ]] && (( stoptime > $(date +%s) )); do sleep 0.1; done

                #No flag file (timeout) or a non-empty flag file both mean "replace the original"
                if [[ ! -f "$differs" ]] || [[ -n $(head -n 1 "$differs") ]]
                then
                    cat >"$originalfilepath"
                else
                    cat >/dev/null #Exact copy: discard the stream
                fi
        }

        #Cleanup
        rm -f "$differs" "$streamsize" "$streamsize.tmp"

    else
        cat "$newfilepath" >"$originalfilepath"
    fi
}
