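I am trying to figure out a way to compare an existing file with the result of a process (a heavy one, not to be repeated) and, if they differ, overwrite the existing file with that result, without having to write the result to a temp file. It would be a large temp file, about the same size as the existing file: let's try to be efficient and not take up twice the space.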

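I would like to replace the regular file /tmp/replace_with_that with a fifo (see below), but of course doing that with the code below would just lock up the script, since the fifo /tmp/replace_with_that cannot be read before the existing file has been compared with the named pipe /tmp/test_against_this.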

#!/bin/bash

mkfifo /tmp/test_against_this
: > /tmp/replace_with_that    

echo 'A B C D' >/some/existing/file

{
  #A very heavy process, not to be repeated.
  #Solved: the comparison reads from a named pipe.
  #Its large output should not be sent to a file.
  #Still to solve: with this code, the output is also written to a regular file.

  for LETTER in "A B C D E"  
  do  
      echo $LETTER      
  done  

} | tee /tmp/test_against_this /tmp/replace_with_that >/dev/null &  

if cmp -s /some/existing/file /tmp/test_against_this
then  
    echo Exact copy
    #Don't do a thing to /some/existing/file
else
    echo Differs
    #Clobber /some/existing/file with /tmp/replace_with_that
    cat /tmp/replace_with_that >/some/existing/file
fi  

rm -f /tmp/test_against_this  
rm -f /tmp/replace_with_that

2 Answers

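I think I would recommend a different approach:

  1. Generate an MD5/SHA1/SHA256/whatever hash of the existing file
  2. Run the heavy process and replace the output file
  3. Generate a hash of the new file
  4. If the hashes match, the files are identical; if not, the new file is different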
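
The four steps above could be sketched roughly as follows. This is a minimal sketch, not the answerer's code: the helper name `regenerate_and_report` is hypothetical, `sha256sum` (GNU coreutils) is assumed to be installed, and `printf` stands in for the heavy process. As in step 2, the file is rewritten in every case; the hashes only tell you afterwards whether the content changed.

```shell
#!/bin/bash

# Hypothetical helper (not from the original answer).
# Usage: regenerate_and_report FILE COMMAND [ARGS...]
# Hashes FILE, overwrites it with the output of COMMAND, hashes it again,
# and prints "Exact copy" or "Differs".
regenerate_and_report() {
    local target="$1"
    shift
    local old_hash new_hash

    # Step 1: hash the existing file (assumes GNU sha256sum is installed)
    old_hash=$(sha256sum <"$target" | cut -d' ' -f1)

    # Step 2: run the heavy process and replace the file with its output
    "$@" >"$target"

    # Step 3: hash the new file
    new_hash=$(sha256sum <"$target" | cut -d' ' -f1)

    # Step 4: matching hashes mean identical content
    if [[ "$old_hash" == "$new_hash" ]]; then
        echo "Exact copy"
    else
        echo "Differs"
    fi
}

# Demo on a throwaway file; printf stands in for the heavy process
demo_file=$(mktemp)
printf 'A B C D\n' >"$demo_file"
regenerate_and_report "$demo_file" printf 'A B C D\n'
regenerate_and_report "$demo_file" printf 'A B C D E\n'
rm -f "$demo_file"
```

In the demo, the first call regenerates identical content and the second appends a letter, so the two calls should report an exact copy and a difference respectively.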
answered 2013-05-17T16:02:23.580

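For the sake of completeness, here is my own answer (I wanted to explore the use of pipes):

I was trying to find a way to compare a stream with an existing file on the fly, without needlessly overwriting the existing file (if the stream and the file are exact copies, leave the file as is), and without creating temp files that can sometimes be large (the product of a heavy process such as mysqldump, for example). The solution had to rely on pipes only (named and anonymous), plus maybe a few very small temp files.

The checksum solution suggested by twalberg is fine, but calling md5sum on large files is processor intensive (and the processing time grows linearly with file size). cmp is faster, since it can stop at the first differing byte.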
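
cmp can read one of its two inputs from standard input, which is what allows a stream to be compared with a file directly. A minimal sketch of just that comparison step, assuming a cmp that accepts `-` for standard input; the helper name `stream_matches_file` is hypothetical and `printf` stands in for the heavy process. The comparison consumes the stream, so on its own this does not cover replacing the file.

```shell
#!/bin/bash

# Hypothetical helper (not part of the functions in this answer).
# Usage: stream_matches_file FILE COMMAND [ARGS...]
# Returns 0 if the output of COMMAND is byte-for-byte identical to FILE.
stream_matches_file() {
    local file="$1"
    shift
    # "-" makes cmp read the stream from standard input; -s suppresses output.
    # The pipeline's exit status is cmp's: 0 = identical, 1 = different.
    "$@" | cmp -s "$file" -
}

demo_file=$(mktemp)
printf 'A B C D\n' >"$demo_file"

if stream_matches_file "$demo_file" printf 'A B C D\n'; then
    echo "Exact copy"
else
    echo "Differs"
fi

if stream_matches_file "$demo_file" printf 'A B C D E\n'; then
    echo "Exact copy"
else
    echo "Differs"
fi

rm -f "$demo_file"
```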

Example calls of the functions listed below:

#!/bin/bash

mkfifo /tmp/fifo

mysqldump --skip-comments $HOST $USER $PASSWORD $DB >/tmp/fifo &

create_or_replace /some/existing/dump /tmp/fifo

#This also works, but depending on the anonymous fifo setup, seems less robust

create_or_replace /some/existing/dump <(mysqldump --skip-comments $HOST $USER $PASSWORD $DB)
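
Both calls rely on a writer and a reader being attached to the pipe at the same time: opening a fifo for writing blocks until something opens it for reading, which is why the writer is started in the background. A minimal sketch of that pattern with a unique path and cleanup; the helper name `count_via_fifo` is hypothetical, and `wc -c` stands in for `create_or_replace` as the reader.

```shell
#!/bin/bash

# Hypothetical helper, for illustration only.
# Usage: count_via_fifo COMMAND [ARGS...]
# Sends the output of COMMAND through a named pipe and prints its size in bytes.
count_via_fifo() {
    local dir fifo writer_pid count

    # A private directory avoids clashing with a fixed path such as /tmp/fifo
    dir=$(mktemp -d) || return 1
    fifo="$dir/fifo"
    mkfifo "$fifo" || { rm -rf "$dir"; return 1; }

    # The writer runs in the background: opening the fifo for writing
    # blocks until a reader opens the other end
    "$@" >"$fifo" &
    writer_pid=$!

    # The reader; wc -c stands in for the real consumer
    count=$(wc -c <"$fifo")

    wait "$writer_pid"
    rm -rf "$dir"

    # Arithmetic expansion strips any padding some wc implementations add
    echo "$((count))"
}

count_via_fifo printf 'A B C D E\n'
```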

The functions:

#!/bin/bash

checkdiff(){
    local originalfilepath="$1"
    local differs="$2"
    local streamsize="$3"
    local timeoutseconds="$4"
    local originalfilesize=$(stat -c '%s' "$originalfilepath")
    local starttime
    local stoptime

    #Hackish: we can't know for sure when the wc subprocess will have produced the streamsize file
    starttime=$(date +%s)
    stoptime=$(( $starttime + $timeoutseconds ))
    while ([[ ! -f "$streamsize" ]] && (( $stoptime > $(date +%s) ))); do :; done;

    if ([[ ! -f "$streamsize" ]] || (( $originalfilesize == $(cat "$streamsize" | head -1) )))
    then
        #Using streams that were exact copies of files to compare with,
        #on average, with just a few test runs:
        #diff slowest, md5sum 2% faster than diff, and cmp method 5% faster than md5sum
        #Did not test, but on large unequal files,
        #cmp method should be way ahead of the 2 other methods
        #since equal files is the worst case scenario for cmp

        #diff -q --speed-large-files <(sort "$originalfilepath") <(sort -) >"$differs"
        #( [[ $(md5sum "$originalfilepath" | cut -b-32) = $(md5sum - | cut -b-32) ]] && : || echo -n '1' ) >"$differs" 
        ( cmp -s "$originalfilepath" - && : || echo -n '1' ) >"$differs"
    else
        echo -n '1' >"$differs"
    fi
}

create_or_replace(){

    local originalfilepath="$1"
    local newfilepath="$2" #Should be a pipe, but could be a regular file
    local differs="$originalfilepath.differs"
    local streamsize="$originalfilepath.size"
    local timeoutseconds=30
    local starttime
    local stoptime

    if [[ -f "$originalfilepath" ]]
    then
        #Cleanup
        [[ -f "$differs" ]] && rm -f "$differs"
        [[ -f "$streamsize" ]] && rm -f "$streamsize"

        #cat the pipe, get its size in bytes, check for differences between the stream and the file,
        #and pipe the stream into the original file if the checks show a diff
        #wc -c counts bytes, matching the byte size reported by stat -c '%s' in checkdiff
        cat "$newfilepath" |
        tee >(wc -c - | cut -f1 -d' ' >"$streamsize") >(checkdiff "$originalfilepath" "$differs" "$streamsize" "$timeoutseconds") | {

                #Hackish: we can't know for sure when the checkdiff subprocess will have produced the differs file
                starttime=$(date +%s)
                stoptime=$(( $starttime + $timeoutseconds ))
                while ([[ ! -f "$differs" ]] && (( $stoptime > $(date +%s) ))); do :; done;

                [[ ! -f "$differs" ]] || [[ ! -z $(cat "$differs" | head -1) ]] && cat - >"$originalfilepath"
        }

        #Cleanup
        [[ -f "$differs" ]] && rm -f "$differs"
        [[ -f "$streamsize" ]] && rm -f "$streamsize"

    else
        cat "$newfilepath" >"$originalfilepath"
    fi
}
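
The two loops marked "Hackish" spin without sleeping while they wait for a flag file. A minimal sketch of a polling helper that could stand in for them; the name `wait_for_file` is hypothetical, and it assumes a `sleep` that accepts fractional seconds (GNU coreutils does) and a producer that makes the file appear only once its content is complete, for example by writing under a temporary name and renaming it.

```shell
#!/bin/bash

# Hypothetical helper, not part of the original functions.
# Usage: wait_for_file PATH TIMEOUT_SECONDS
# Returns 0 as soon as PATH exists as a regular file, 1 once the timeout expires.
wait_for_file() {
    local path="$1"
    local timeoutseconds="$2"
    local stoptime=$(( $(date +%s) + timeoutseconds ))

    until [[ -f "$path" ]]; do
        (( $(date +%s) >= stoptime )) && return 1
        # Assumes a sleep that accepts fractional seconds (GNU coreutils does)
        sleep 0.1
    done
    return 0
}

# Demo: a background job publishes a flag file after a short delay.
# It writes under a temporary name and renames it, so the file only
# appears under its final name once its content is complete.
demo_dir=$(mktemp -d)
( sleep 0.3; echo -n '1' >"$demo_dir/differs.tmp"; mv "$demo_dir/differs.tmp" "$demo_dir/differs" ) &

if wait_for_file "$demo_dir/differs" 5; then
    echo "flag file ready: $(cat "$demo_dir/differs")"
else
    echo "timed out"
fi

wait
rm -rf "$demo_dir"
```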
answered 2013-05-20T15:37:29.367