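Parallel processing
I would use GNU parallel. It isn't distributed by default, but it can be installed from the default package repositories of most Linux distributions. It works like this: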
parallel echo ::: arg1 arg2
will execute echo arg1 and echo arg2 in parallel.
So the easiest approach is to write a script that synchronizes one server, in bash/perl/python - whatever you prefer - and execute it like this:
parallel ./script ::: server1 server2
The script could look like this:
#!/bin/sh
#$0 holds program name, $1 holds first argument.
#$1 will get passed from GNU/parallel. we save it to a variable.
server="$1"
lftp -e "find .; exit" "$server" >"$server-files.txt"
lftp seems to be available for Linux as well, so you don't need to change your FTP client.
To run at most 30 instances at a time, pass -j30 like this:
parallel -j30 echo ::: 1 2 3
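Reading the file list
Now, how do you transform a specification file containing <server>|... entries into GNU parallel arguments? Easy - first, filter the file so that it contains just the host names: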
sed 's/|.*$//' server-list.txt
sed is used to replace things using regular expressions, and more. This command strips the first | and everything after it (.*) up to the end of the line ($). (While | normally means the alternation operator in regular expressions, in sed's default syntax it has to be escaped to work like that; unescaped, it is just a plain |.)
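To see the filter on its own, here is a minimal sketch that runs it against a made-up list. The file contents (alice, /srv/data and so on) are hypothetical; only the <server>|... shape comes from the question. The second command assumes a sed that supports -E (extended syntax), where the | has to be escaped to stay literal.

```shell
#!/bin/sh
# Hypothetical sample list; everything after the first | is invented for illustration.
list=$(mktemp)
printf '%s\n' 'server1|alice|/srv/data' 'server2|bob|/var/www' > "$list"

# Default (basic) syntax: an unescaped | is a literal character.
hosts=$(sed 's/|.*$//' "$list")

# Extended syntax (-E): | is the alternation operator, so it is escaped here.
hosts_ere=$(sed -E 's/\|.*$//' "$list")

echo "$hosts"
rm -f "$list"
```

Both variables should end up holding the same two host names, one per line.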
So now you have a list of servers. How do you pass them to your script? With xargs! xargs appends each line of its input as an additional argument to your executable. For example
printf '1\n2\n' | xargs echo fixed_argument
will run
echo fixed_argument 1 2
So in your case you should do
sed 's/|.*$//' server-list.txt | xargs parallel -j30 ./script :::
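Before launching 30 transfers, it can help to check what command line xargs is going to build. A minimal sketch, assuming a made-up two-line list: putting echo in front of parallel makes xargs print the command instead of running it, so this works even where GNU parallel is not installed.

```shell
#!/bin/sh
# Hypothetical sample list in the <server>|... format; the extra fields are made up.
list=$(mktemp)
printf '%s\n' 'server1|alice|/srv/data' 'server2|bob|/var/www' > "$list"

# xargs appends the host names after ":::"; echo prints the result instead of executing it.
preview=$(sed 's/|.*$//' "$list" | xargs echo parallel -j30 ./script :::)
echo "$preview"
rm -f "$list"
```

Once the printed line looks right, drop the echo to run it for real.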
Caveats
Be sure not to save the results to the same file from each parallel task, otherwise the file will get corrupted - coreutils are simple and don't do any locking unless you implement it yourself. That's why I redirected the output to $server-files.txt rather than files.txt.
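If you need one combined listing in the end, merge the per-server files after all tasks have finished. The sketch below is only an illustration: it uses plain shell background jobs in place of GNU parallel, and list_files is a hypothetical stand-in for the real lftp call, so that it runs anywhere.

```shell
#!/bin/sh
# Hypothetical stand-in for the real script: prints two fake entries per server.
list_files() {
    name="$1"
    printf '%s: file-a\n%s: file-b\n' "$name" "$name"
}

workdir=$(mktemp -d)

# One output file per task, named after the server, so no two tasks share a file.
for server in server1 server2 server3; do
    list_files "$server" > "$workdir/$server-files.txt" &
done
wait    # block until every background task has finished

# Merge only after all writers are done; the glob does not match all-files.txt.
cat "$workdir"/server*-files.txt > "$workdir/all-files.txt"
line_count=$(wc -l < "$workdir/all-files.txt" | tr -d ' ')
echo "$line_count"
```

The same order applies with parallel: let it return first, then cat the $server-files.txt files together.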