multithreading - Multithreaded shell script

Question

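Can anyone help me write a multithreaded shell script?

Basically I have two files: one contains more than 10K lines (main_file) and the other contains about 200 lines (sub_file). Those 200 lines are the repeated strings, sorted, from the main file. For each string, I am trying to write the matching lines into a separate file using the command below.

I have already collected the repeated strings into sub_file. Those strings appear at random positions in main_file.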

a=0
while IFS= read -r line
do
    a=$((a+1))
    users[$a]=$line
    # quote the expansions so patterns and file names with spaces survive
    egrep "${line}" "$main_file" >> "$line"
done < "$sub_file"

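Running this in a single thread takes a long time, so I am thinking of using multiple threads (processes) to finish the job in the shortest possible time.

Please help me out...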

score 2 · Accepted Answer

The tool you need for that is GNU parallel:

parallel egrep '{}' "$main_file" '>' '{}' < "$sub_file"

You can adjust the number of jobs processed with the option -P:

parallel -P 4 egrep '{}' "$main_file" '>' '{}' < "$sub_file"

Please see the manual for more info.
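
If GNU parallel is not installed, roughly the same fan-out can be sketched with xargs -P, which GNU and BSD xargs both support (it is not in POSIX). This is a swapped-in alternative, not the command from the answer, and the sample data and the temporary directory are made up for illustration. It assumes every pattern is also safe to use as a file name.

```shell
#!/usr/bin/env bash
# Sketch: one grep per pattern, at most 4 running at a time, via xargs -P.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data standing in for main_file and sub_file.
main_file=main_file
sub_file=sub_file
printf '%s\n' 'alice login' 'bob logout' 'alice logout' 'carol login' > "$main_file"
printf '%s\n' 'alice' 'bob' 'carol' > "$sub_file"

# Turn newlines into NULs so each line becomes exactly one xargs argument.
# sh -c receives the pattern as $1 and the main file as $2; the redirect
# must happen inside sh -c, otherwise it would apply to xargs itself.
tr '\n' '\0' < "$sub_file" |
    xargs -0 -P 4 -I{} sh -c 'grep -E -- "$1" "$2" > "$1"' _ {} "$main_file"

# Show how many lines landed in each per-pattern file.
grep -c '' alice bob carol
```

Note that grep exits non-zero when a pattern matches nothing, which makes xargs report failure, so a real script may want to handle that case explicitly.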

By the way, to make sure that you don't process a line twice, you could make the input unique:

awk '!a[$0]++' "$sub_file" | parallel -P 4 egrep '{}' "$main_file" '>' '{}'
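
To see what the awk filter in front of parallel does on its own, here is a small sketch with made-up input. The array is named seen here for readability; the counter for a line is 0 the first time that line appears, so the negated post-increment is true and the line is printed, and on every later occurrence it is non-zero and the line is skipped. Unlike sort -u, the original order is preserved.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical sub_file with repeated lines.
sub_file=$(mktemp)
printf '%s\n' 'alice' 'bob' 'alice' 'carol' 'bob' > "$sub_file"

# Keep only the first occurrence of each line, in input order.
unique=$(awk '!seen[$0]++' "$sub_file")
printf '%s\n' "$unique"
```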

score 0 · Accepted Answer

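Note: this is taken from an earlier post of mine. It does not apply directly, but it is very similar and can be adapted.

I have a file 1.txt with the following contents.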

-----cat 1.txt-----
1234
5678
1256
1234
1247

I also have 3 more files in a folder.

-----ls -lrt-------
A1.txt
A2.txt
A3.txt

The three files share a similar format with different data values (all three are tab-delimited).

-----cat A1.txt----
A   X   1234    B   1234
A   X   5678    B   1234
A   X   1256    B   1256

-----cat A2.txt----
A   Y   8888    B   1234
A   Y   9999    B   1256
A   X   1234    B   1256

-----cat A3.txt----
A   Y   6798    C   1256

My goal is to search all of A1, A2 and A3 (only column 3 of the tab-delimited files) for the text in 1.txt, and the output must be redirected to the file matches.txt as shown below.

Code:
/home/A1.txt:A   X   1234    B   1234
/home/A1.txt:A   X   5678    B   1234
/home/A1.txt:A   X   1256    B   1256
/home/A2.txt:A   X   1234    B   1256

The following should work.

cat A*.txt | tr -s '\t' '|' > combined.dat

{ while IFS= read -r myline; do
# extract the field to look up and strip any carriage return
recset=$(echo "$myline" | cut -f19 -d '|' | tr -d '\r')
var=$(grep -- "$recset" 1.txt | wc -l)
if [[ $var -ne 0 ]]; then
echo "$myline" >> final.dat
fi
done } < combined.dat

{ while IFS= read -r myline; do
recset=$(echo "$myline" | cut -f19 -d '|' | tr -d '\r')
var=$(grep -- "$recset" 1.txt | wc -l)
if [[ $var -ne 0 ]]; then
echo "$myline" >> final2.dat
fi
done } < combined.dat

Using AWK

awk 'NR==FNR{a[$0]=1}$3 in a{print FILENAME":"$0}' 1.txt A* > matches.txt

For pipe-delimited files

awk -F'|' 'NR==FNR{a[$0]=1}$3 in a{print FILENAME":"$0}' 1.txt A* > matches.txt
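
The AWK commands above rely on the NR==FNR idiom: the condition is true only while the first file (1.txt) is being read, so its lines are stored as keys, and every later file is then filtered on column 3. The sketch below recreates the sample files from this post in a temporary directory and runs the tab-delimited variant. The explicit -F'\t', the next statement and the array name keys are additions for clarity, not part of the original one-liner.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Recreate the sample files from the post (tab-separated columns).
printf '%s\n' 1234 5678 1256 1234 1247 > 1.txt
printf 'A\tX\t1234\tB\t1234\nA\tX\t5678\tB\t1234\nA\tX\t1256\tB\t1256\n' > A1.txt
printf 'A\tY\t8888\tB\t1234\nA\tY\t9999\tB\t1256\nA\tX\t1234\tB\t1256\n' > A2.txt
printf 'A\tY\t6798\tC\t1256\n' > A3.txt

# First file: remember each line as a key, then skip to the next record.
# Later files: print "file:line" when column 3 is one of the stored keys.
awk -F'\t' 'NR==FNR { keys[$0] = 1; next }
            $3 in keys { print FILENAME ":" $0 }' 1.txt A*.txt > matches.txt

cat matches.txt
```

FILENAME is printed exactly as the file was named on the command line, so the /home/ prefix in the expected output only appears if the files are passed as /home/A*.txt.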
