bash - Grep a file using two columns as input

Question

I have a file containing the following lines:

"ALMEREWEG               ";" 45  ";"      ";"ZEEWOLDE                ";"3891ZN"
"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 51  ";"      ";"ZEEWOLDE                ";"3891ZN"
"ALMEREWEG               ";" 52  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

I have a second file containing the following lines:

3891ZP;50;
3891ZN;53;A
3891ZN;53;B
3891ZN;54;

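Now I want to grep the first file using the patterns in the second file, where:

A) column 1 of the second file appears in column 5 of the first file; and

B) column 2 of the second file appears in column 2 of the first file.

My question: how can I do this?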

Update, 7 July 2013: I updated the file2 format to show a third column (matching on the number alone is enough).

score 3 · Accepted Answer

One way, using awk:

awk -F';' '
NR==FNR {
  a[$1]=$2
  next
}
{
  line=$0
  gsub(/\"/,"")
  gsub(/ *; */,";")
  if (a[$5]==$2) {
    print line
    line=""
  }
}' file2 file1

Output:

"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

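To make the two gsub calls in the script above easier to follow, here is a minimal sketch of what they do to a single record. The function name normalize_fields is hypothetical and only serves the illustration; the sample line is taken from the question.

```shell
# Hypothetical helper: print the house number and postcode that awk sees
# once the quotes and padding have been stripped from one file1 record.
normalize_fields() {
  printf '%s\n' "$1" | awk -F';' '{
    gsub(/"/, "")          # drop the double quotes
    gsub(/ *; */, ";")     # drop the padding around each semicolon
    print $2 "|" $5        # $0 was modified, so awk has re-split the fields
  }'
}

normalize_fields '"ALMEREWEG               ";" 45  ";"      ";"ZEEWOLDE                ";"3891ZN"'
# prints: 45|3891ZN
```

Because gsub without a third argument rewrites $0, the fields are split again on the cleaned text, which is why $2 and $5 can be compared directly with the values from file2.
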
score 2 · Accepted Answer

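Borrowing heavily from @JS, I offer the following improved solution. The problem with his code is that if you have several house numbers in the same postcode, it matches only the last one. You can solve this by building a compound associative array key (if that is the right name for it; basically, the two fields joined together):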

Create a file postcode.awk:

BEGIN {
  FS=";"
}
# loop around as long as the total number of records read
# is equal to the number of records read in this file
# in other words - loop around the first file only
NR==FNR {
  a[$1,$2]=1 # create one array element for each $1/$2 pair
  next
}
# loop around all the elements of the second file:
# since we're done processing the first file
{
  # copy the original line before modifying it
  line=$0
  # take out the double quotes
  gsub(/\"/,"")
  # take out the spaces on either side of the semicolons
  gsub(/ *; */,";")
  # see if the associative array element exists:
  if (a[$5,$2]==1) {
    # echo the original line that matched:
    print line
  }
}

Using the following test file file1 (I added a line to show the edge case):

"ALMEREWEG               ";" 45  ";"      ";"ZEEWOLDE                ";"3891ZN"
"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 52  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

and the key file file2 (again with a line added):

3891ZP;50
3891ZP;52
3891ZN;53

You will see that JS's code does not match the line with number 50.

But my code does:

awk -f postcode.awk file2 file1

produces:

"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 52  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

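The difference between the two keying schemes can be seen in isolation. The sketch below is only an illustration (the function name count_keys and the array names single and pair are made up for it): it stores the same two key lines once under the postcode alone and once under the postcode/number pair.

```shell
# Hypothetical demo: two key lines with the same postcode, stored under a
# single-field key and under a composite key.
count_keys() {
  printf '3891ZP;50\n3891ZP;52\n' | awk -F';' '{
    single[$1] = $2        # the second line overwrites the first
    pair[$1, $2] = 1       # one element per postcode/number pair
  }
  END {
    for (k in single) s++
    for (k in pair) p++
    print "single=" s " last=" single["3891ZP"] " pairs=" p
  }'
}

count_keys
# prints: single=1 last=52 pairs=2
```

With the single-field key only house number 52 survives, so the line for number 50 can never match. A membership test such as `($5,$2) in a` should work in place of `a[$5,$2]==1` as well, and it has the side benefit of not creating an empty element for every line that is looked up.
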
score 0 · Accepted Answer

You can use something like sed to construct the patterns for grep:

$ grep -Ef <(sed -r 's/(.*);(.*)/^[^;]*;[^;]*\2[^;]*;([^;]*;){2}[^;]*\1/' file2) file1
"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

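To see what grep actually receives, it may help to run the sed substitution on a single key line. This is a small sketch, assuming the two-column key format; the function name build_pattern is hypothetical, and -E is used as the portable spelling of GNU sed's -r.

```shell
# Hypothetical helper: show the regular expression sed builds from one key line.
build_pattern() {
  printf '%s\n' "$1" |
    sed -E 's/(.*);(.*)/^[^;]*;[^;]*\2[^;]*;([^;]*;){2}[^;]*\1/'
}

build_pattern '3891ZP;50'
# prints: ^[^;]*;[^;]*50[^;]*;([^;]*;){2}[^;]*3891ZP
```

The generated expression skips the first field, looks for the house number anywhere inside the second field, skips two more fields, and then looks for the postcode in the fifth. Since the number is matched as a substring, a key of 50 would also select a house numbered 150.
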
score 0 · Accepted Answer

I use bash's IFS and read, then pass the columns to grep:

# read file2 line by line (-r keeps backslashes literal)
while IFS= read -r line ; do
    # split the line into columns on ';'
    IFS=';' read -r -a col <<< "$line"
    # the expression can be refined but should work well as is
    grep -e " ${col[1]}  \";\".*;.*\";\"${col[0]}" file1
done < file2

Output:

"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"
