unix - Replace a value if it exists in a txt file

Question

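Good morning everyone. I have a file, data.ped, made up of thousands of columns and hundreds of rows. The first 6 columns and first 4 rows of the file look like this: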

186 A_Han-4.DG 0 0 1 1
187 A_Mbuti-5.DG 0 0 1 1
188 A_Karitiana-4.DG 0 0 1 1
191 A_French-4.DG 0 0 1 1

I have another file, ids.txt, which looks like this:

186 Ignore_Han(discovery).DG
187 Ignore_Mbuti(discovery).DG
188 Ignore_Karitiana(discovery).DG
189 Ignore_Yoruba(discovery).DG
190 Ignore_Sardinian(discovery).DG
191 Ignore_French(discovery).DG
192 Dinka.DG
193 Dai.DG

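What I need (in unix) is to replace each value in the first column of data.ped with the value from the second column of the matching row in ids.txt. For example, I want to replace the value "186" in the first column of data.ped with "Ignore_Han(discovery).DG" from the second column of ids.txt (because "186" is in the first column of that same row in ids.txt). So the resulting file, output.ped, must look like this: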

Ignore_Han(discovery).DG A_Han-4.DG 0 0 1 1
Ignore_Mbuti(discovery).DG A_Mbuti-5.DG 0 0 1 1
Ignore_Karitiana(discovery).DG A_Karitiana-4.DG 0 0 1 1
Ignore_French(discovery).DG A_French-4.DG 0 0 1 1

The values in the first column of data.ped are a subset of the values in the first column of ids.txt, so there is always a match.

Edit:

I tried this:

awk 'NR==FNR{a[$1]=$2; next} $1 in a{$1=a[$1]; print}' ids.txt data.ped

But when I check the result:

cut -f 1-6 -d " " output.ped

I get this strange output:

A_Han-4.DG 0 0 1 1y).DG
A_Mbuti-5.DG 0 0 1 1y).DG
A_Karitiana-4.DG 0 0 1 1y).DG
A_French-4.DG 0 0 1 1y).DG

Whereas if I use this command:

cut -f 1-6 -d " " output.ped | less

I get:

Ignore_Han(discovery).DG^M A_Han-4.DG 0 0 1 1
Ignore_Mbuti(discovery).DG^M A_Mbuti-5.DG 0 0 1 1
Ignore_Karitiana(discovery).DG^M A_Karitiana-4.DG 0 0 1 1
Ignore_French(discovery).DG^M A_French-4.DG 0 0 1 1

And I don't know why there is a ^M on every line.

score 1 · Accepted Answer

awk 'NR==FNR{a[$1]=$2; next} $1 in a{$1=a[$1]} 1' ids.txt data.ped

Output:

Ignore_Han(discovery).DG A_Han-4.DG 0 0 1 1
Ignore_Mbuti(discovery).DG A_Mbuti-5.DG 0 0 1 1
Ignore_Karitiana(discovery).DG A_Karitiana-4.DG 0 0 1 1
Ignore_French(discovery).DG A_French-4.DG 0 0 1 1

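This is a classic awk task, with many variations depending on your requirements. Here we replace the first field of data.ped only if its value is found in ids.txt; otherwise the line is printed unchanged. If you want to drop the lines that don't match: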

awk 'NR==FNR{a[$1]=$2; next} $1 in a{$1=a[$1]; print}' ids.txt data.ped

The input files do not need to be sorted, and the order of the second file is preserved.
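For readers new to the two-file idiom, here is the first command again, spread over several lines with comments. It is only a sketch: the array name map and the temporary directory are choices made for this illustration, and the sample data is copied from the question.

```shell
# Work in a throwaway directory (illustration only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample inputs copied from the question.
cat > ids.txt <<'EOF'
186 Ignore_Han(discovery).DG
187 Ignore_Mbuti(discovery).DG
188 Ignore_Karitiana(discovery).DG
189 Ignore_Yoruba(discovery).DG
190 Ignore_Sardinian(discovery).DG
191 Ignore_French(discovery).DG
192 Dinka.DG
193 Dai.DG
EOF

cat > data.ped <<'EOF'
186 A_Han-4.DG 0 0 1 1
187 A_Mbuti-5.DG 0 0 1 1
188 A_Karitiana-4.DG 0 0 1 1
191 A_French-4.DG 0 0 1 1
EOF

awk '
    # NR equals FNR only while the first file (ids.txt) is being read:
    # remember id -> new name, then skip to the next line.
    NR == FNR { map[$1] = $2; next }

    # Lines of the second file (data.ped): swap field 1 if it is a known id.
    $1 in map { $1 = map[$1] }

    # A bare true pattern prints every line, changed or not.
    1
' ids.txt data.ped > output.ped

cat output.ped
```

Note that assigning to $1 makes awk rebuild the line with its output field separator (a single space by default), so a tab-separated data.ped would come out space-separated unless OFS is set.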

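Update:

If you have Ctrl-M characters in your input (the ^M that less shows is a carriage return, \r), remove them first:

tr -d '\r' < file > file.tmp && mv file.tmp file

for each file you use. In general, I recommend running dos2unix on any text file that may contain ^M (\r) characters, which usually come from editing under DOS/Windows.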
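If you would rather not rewrite the input files, a possible alternative is to strip the trailing carriage return inside awk itself, before any other rule runs. This is a sketch, not part of the original answer; it assumes the stray \r only appears at the end of each line, as with DOS/Windows line endings. The sample files are a shortened version of those in the question, with ids.txt written using \r\n endings on purpose.

```shell
# Work in a throwaway directory (illustration only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# ids.txt with DOS line endings (\r\n), data.ped with Unix line endings.
printf '186 Ignore_Han(discovery).DG\r\n191 Ignore_French(discovery).DG\r\n' > ids.txt
printf '186 A_Han-4.DG 0 0 1 1\n191 A_French-4.DG 0 0 1 1\n' > data.ped

awk '
    # Drop a trailing carriage return from every input line of both files.
    # Changing $0 makes awk split the fields again, so $2 is clean below.
    { sub(/\r$/, "") }

    NR == FNR { map[$1] = $2; next }
    $1 in map { $1 = map[$1] }
    1
' ids.txt data.ped > output.ped

cat output.ped
```

This leaves the original files untouched, at the cost of one extra substitution per line.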

score 0 · Accepted Answer

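Use the join command to join the two files on their first column (join expects both files to be sorted on that column, which the sample files already are):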

join ids.txt data.ped > temp

You can then use the cut command to remove the first column, for example:

cut -d " " -f 2- temp > output.ped
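If the real files are not already sorted on the first column, join will complain or silently miss matches. A minimal sketch of the same approach with an explicit sort step follows; the file names ids.sorted and data.sorted are made up for this example, and LC_ALL=C is set so that sort and join agree on the ordering.

```shell
# Work in a throwaway directory (illustration only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Use the same collation for sort and join.
export LC_ALL=C

# Shortened sample data; data.ped is deliberately out of order.
printf '186 Ignore_Han(discovery).DG\n189 Ignore_Yoruba(discovery).DG\n191 Ignore_French(discovery).DG\n' > ids.txt
printf '191 A_French-4.DG 0 0 1 1\n186 A_Han-4.DG 0 0 1 1\n' > data.ped

# Sort both files on the join field (column 1).
sort -k1,1 ids.txt  > ids.sorted
sort -k1,1 data.ped > data.sorted

# join prints: key, remaining ids.txt fields, remaining data.ped fields.
# cut then drops the key, leaving the new name in column 1.
join ids.sorted data.sorted | cut -d ' ' -f 2- > output.ped

cat output.ped
```

Unlike the awk approach, the rows of output.ped come out in sorted key order rather than in the original order of data.ped, and ids without a match in data.ped are dropped.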
