bash - How to build a WinMerge equivalent in Linux

Question

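A friend recently asked how to compare two folders in Linux and then run meld on any text files that differ. I'm slowly getting the hang of the Linux philosophy of piping many fine-grained utilities together, and I put together the following solution. My question is: how can I improve this script? There seems to be quite a bit of redundancy, and I'd like to learn better ways of writing Unix scripts.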

#!/bin/bash

dir1=$1
dir2=$2

# show files that are different only
cmd="diff -rq $dir1 $dir2"
eval $cmd # print this out to the user too
filenames_str=`$cmd`

# remove lines that represent only one file, keep lines that have
# files in both dirs, but are just different
tmp1=`echo "$filenames_str" | sed -n '/ differ$/p'` 

# grab just the first filename for the lines of output
tmp2=`echo "$tmp1" | awk '{ print $2 }'`

# convert newlines sep to space
fs=$(echo "$tmp2") 

# convert string to array
fa=($fs) 

for file in "${fa[@]}"
do
    # drop first directory in path to get relative filename
    rel=`echo $file | sed "s#${dir1}/##"`

    # determine the type of file
    file_type=`file -i $file | awk '{print $2}' | awk -F"/" '{print $1}'`

    # if it's a text file send it to meld
    if [ $file_type == "text" ]
    then
        # throw out error messages with &> /dev/null
        meld $dir1/$rel $dir2/$rel &> /dev/null
    fi 
done

Please keep or improve the readability of the answer. Shorter answers that are harder to understand do not qualify.

score 0 · Accepted Answer

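This is an old question, but let's work on it a bit just for fun, without thinking about the end goal (probably SCM) or about the tools that already do this in a better way. Let's focus on the script itself.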

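In the OP's script there is a lot of string processing inside bash using tools such as sed and awk, sometimes more than once in the same command line, or inside a loop that runs n times (once per file).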

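That's fine, but it is worth remembering:

Every time the script calls one of these programs, it creates a new process in the operating system, which is expensive in both time and resources. So the fewer programs are called, the better the script performs:
- diff: 2 times (1 of them only to print to the user)
- sed: 1 time to process the diff result, plus 1 time per file
- awk: 1 time to process the sed result, plus 2 times per file (to process the file result)
- file: 1 time per file
This does not apply to echo, read, test and other bash builtins, since no external program is executed for them.
meld is the final command that shows the files to the user, so it does not count.
Even with builtins, piping with | has a cost, because the shell must create the pipe, duplicate handles, and possibly fork a subshell (which is itself a process). Again: the fewer, the better.
The messages printed by diff depend on the locale, so on a system that is not set to English the whole script will not work.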
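
As a small illustration of the process-cost point, the path and MIME-type manipulations that the original script hands to sed and awk can also be done with shell parameter expansion, which runs inside the shell and starts no external program. This is only a sketch: the function names relative_to and mime_major and the sample values are hypothetical, chosen for this example.

```bash
#!/bin/bash

# Hypothetical helper: strip a leading "dir/" from a path without calling sed.
relative_to() {
    local dir=$1 path=$2
    printf '%s\n' "${path#"$dir"/}"
}

# Hypothetical helper: keep only the major MIME type (the part before "/")
# without calling awk.
mime_major() {
    local mime=$1
    printf '%s\n' "${mime%%/*}"
}

relative_to "a" "a/src/main.c"               # prints: src/main.c
mime_major "text/plain; charset=us-ascii"    # prints: text
```

Both helpers rely only on printf and parameter expansion, so calling them once per file adds no sed or awk process to the loop.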

With that in mind, let's clean up the original script a bit while keeping the OP's logic:

#!/bin/bash

dir1=$1
dir2=$2

# (1) show files that are different only; LC_ALL=C forces the untranslated
# English messages for this diff call, even if LC_ALL or LC_MESSAGES is set
LC_ALL=C diff -rq $dir1 $dir2 | 
    # (2) remove lines that represent only one file, keep lines that have
    # files in both dirs, but are just different, delete all but left filename
    sed '/ differ$/!d; s/^Files //; s/ and .*//' |
    # (3) determine the type of file
    file -i -f - | 
    # (4) for each file
    while IFS=":" read file file_type
    do
        # (5) drop first directory in path to get relative filename
        rel=${file#$dir1}
        # (6) if it's a text file send it to meld
        if [[ "$file_type" =~ "text/" ]]
        then
            # throw out error messages with &> /dev/null
            meld ${dir1}${rel} ${dir2}${rel} &> /dev/null
        fi 
    done

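A little explanation:

A single command chain, cmd1 | cmd2 | ..., where the output (stdout) of each command is the input (stdin) of the next.
Run sed only once to perform 3 operations (separated by ;) on the diff output:
- delete the lines that do not end with " differ"
- remove "Files " from the beginning of the remaining lines
- delete everything from " and " to the end of the remaining lines
Run the file command once to process the list of files read from stdin (option -f -).
Use the bash while statement to read two values, separated by :, from each line of stdin.
Use bash variable substitution to extract the file name from a variable.
Use a bash test to compare the file type against a regular expression.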
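
To see the three sed operations in isolation, the same expression can be fed a few canned lines in the format that diff -rq prints under an English locale. This is a sketch for illustration: the wrapper name left_names_of_differing and the sample file names are hypothetical.

```bash
#!/bin/bash

# Hypothetical helper: reduce `diff -rq` output (read from stdin) to the
# left-hand file name of every pair reported as differing.
left_names_of_differing() {
    sed '/ differ$/!d; s/^Files //; s/ and .*//'
}

# Canned sample lines, no real directories needed.
printf '%s\n' \
    'Files a/notes.txt and b/notes.txt differ' \
    'Only in a: old.log' \
    'Files a/src/main.c and b/src/main.c differ' |
    left_names_of_differing
# prints:
# a/notes.txt
# a/src/main.c
```

The "Only in" line is dropped by the first operation, and the other two operations trim what remains down to the left-hand path.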

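For clarity, I did not take into account that file and directory names may contain spaces. In that case both scripts will fail. To avoid this, every reference to a file or directory name variable must be enclosed in double quotes.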
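
A minimal sketch of what that quoting could look like for the read loop, assuming input lines in the "path: mime/type; ..." shape printed by file -i. The function name text_pairs is hypothetical, and it prints the pair of paths instead of launching meld so that it can be tried without a display. It uses a case pattern in place of the [[ ]] test; both are shell builtins. Names that contain ":" or " and " would still confuse the parsing and are not handled here.

```bash
#!/bin/bash

# Hypothetical helper: read "path: mime/type; ..." lines from stdin and print
# the left and right path, separated by a tab, for every text file.
# Every expansion of a name is double-quoted so spaces stay in one piece.
text_pairs() {
    local dir1=$1 dir2=$2 file file_type rel
    while IFS=":" read -r file file_type
    do
        # drop the first directory to get the relative file name
        rel=${file#"$dir1"}
        case $file_type in
            *text/*) printf '%s\t%s\n' "${dir1}${rel}" "${dir2}${rel}" ;;
        esac
    done
}

printf '%s\n' 'my dir/read me.txt: text/plain; charset=us-ascii' |
    text_pairs "my dir" "other dir"
# prints the two paths "my dir/read me.txt" and "other dir/read me.txt",
# separated by a tab
```

In the real script the printf would be replaced by meld "${dir1}${rel}" "${dir2}${rel}", and the diff call would likewise receive "$dir1" and "$dir2" in quotes.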

I did not use awk, because it is powerful enough to replace almost the whole script ;-)
