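Parallel processing
I would use GNU parallel. It is not installed by default, but it is available from the default package repositories of most Linux distributions. It works like this: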
parallel echo ::: arg1 arg2
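will execute echo arg1 and echo arg2 in parallel.
So the easiest approach is to write a script that synchronizes a single server, in bash/perl/python or whatever you prefer, and execute it like this: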
parallel ./script ::: server1 server2
The script could look like this:
#!/bin/sh
#$0 holds program name, $1 holds first argument.
#$1 will get passed from GNU/parallel. we save it to a variable.
server="$1"
lftp -e "find .; exit" "$server" >"$server-files.txt"
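lftp seems to be available for Linux as well, so you don't need to change your FTP client.
To run at most 30 instances at a time, pass -j30 like this: parallel -j30 echo ::: 1 2 3
Reading the file list
Now, how do you turn the specification file containing <server>|... entries into GNU parallel arguments? Easy: first, filter the file so that only the hostnames remain: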
sed 's/|.*$//' server-list.txt
sed replaces text using regular expressions, among other things. This expression strips everything (.*) from the first | up to the end of the line ($), including the | itself. (While | normally means the alternation operator in regular expressions, in sed's default syntax it has to be escaped to work like that, as \| in GNU sed; unescaped, it is just a plain |.)
So now you have a list of servers. How do you pass them to your script? With xargs! xargs appends each whitespace-separated item it reads from standard input as an additional argument to the command you give it. For example
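As a quick check of the filter, the sketch below runs the same sed expression on a small sample list. The host names and the text after each | are made up for illustration; only the <server>|... shape comes from the question.

```shell
#!/bin/sh
# Hypothetical sample data: one "<server>|<anything>" entry per line.
list=$(mktemp)
printf '%s\n' \
    'ftp1.example.com|/pub|weekly' \
    'ftp2.example.com|/incoming' \
    'ftp3.example.com' > "$list"

# Same expression as above: delete from the first | to the end of the line.
# A line that contains no | at all is passed through unchanged.
hosts=$(sed 's/|.*$//' "$list")
printf '%s\n' "$hosts"

rm -f "$list"
```

Each output line is then a bare host name, ready to be handed to the next stage of the pipeline.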
printf '1\n2\n' | xargs echo fixed_argument
will run
echo fixed_argument 1 2
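To see how xargs groups its input, the sketch below feeds it three items spread over two lines. By default xargs packs as many items as fit onto one command line; the standard -n 1 option limits each invocation to a single item. The variable names are only for illustration.

```shell
#!/bin/sh
# Default behaviour: all items land on one command line,
# i.e. xargs runs: echo fixed_argument 1 2 3
all_at_once=$(printf '1\n2 3\n' | xargs echo fixed_argument)
printf '%s\n' "$all_at_once"

# With -n 1, xargs runs echo once per item, so three lines are printed.
one_each=$(printf '1\n2 3\n' | xargs -n 1 echo fixed_argument)
printf '%s\n' "$one_each"
```

Because the split happens on any whitespace, this approach suits host names, which contain no spaces.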
So in your case you should do
sed 's/|.*$//' server-list.txt | xargs parallel -j30 ./script :::
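If GNU parallel cannot be installed, a similar effect is possible with xargs alone, assuming your xargs supports -P (GNU findutils and the BSDs do; it is not part of POSIX). In this sketch run_all is a hypothetical helper, and the worker command is passed as a parameter so that the idea can be tried with a harmless stand-in before pointing it at ./script.

```shell
#!/bin/sh
# run_all LIST WORKER [ARGS...]
# Strip everything from the first | on each line of LIST, then run
# WORKER once per host name, with at most 30 running at the same time.
#   -n 1  : pass one host name per invocation
#   -P 30 : up to 30 processes in parallel (GNU/BSD extension, not POSIX)
run_all() {
    list=$1
    shift
    sed 's/|.*$//' "$list" | xargs -n 1 -P 30 "$@"
}

# Intended use with the script from above:
#   run_all server-list.txt ./script
```

Unlike GNU parallel, xargs does nothing to keep the output of concurrent jobs apart, so each worker should write to its own file.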
Caveats
Make sure the parallel tasks do not all save their results to the same file; otherwise the file will get corrupted. The coreutils are simple and do not implement any locking mechanism unless you add one yourself. That's why I redirected the output to $server-files.txt rather than files.txt.
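If a single combined listing is needed in the end, the per-server files can be merged in one sequential step after all tasks have finished, which avoids concurrent writes altogether. A minimal sketch, assuming the output files follow the $server-files.txt naming used above; merge_results, the "== server ==" header lines and the output file name are hypothetical.

```shell
#!/bin/sh
# merge_results LIST OUT
# Concatenate <server>-files.txt for every server named in LIST into OUT,
# preceded by a header line naming the server.
# Run this only after all parallel tasks have exited.
merge_results() {
    list=$1
    out=$2
    : > "$out"    # start from an empty output file
    sed 's/|.*$//' "$list" | while IFS= read -r server; do
        # Skip servers that produced no result file.
        [ -f "$server-files.txt" ] || continue
        printf '== %s ==\n' "$server" >> "$out"
        cat "$server-files.txt" >> "$out"
    done
}

# Example call:
#   merge_results server-list.txt all-files.txt
```

Only one process writes to the combined file here, so no locking is required.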