I have a file with the following lines:

"ALMEREWEG               ";" 45  ";"      ";"ZEEWOLDE                ";"3891ZN"
"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 51  ";"      ";"ZEEWOLDE                ";"3891ZN"
"ALMEREWEG               ";" 52  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

I have a second file with the following lines:

3891ZP;50;
3891ZN;53;A
3891ZN;53;B
3891ZN;54;

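Now I want to grep the first file using the lines of the second file as patterns, such that:

A) column 1 of file 2 exists in column 5 of file 1; and

B) column 2 of file 2 exists in column 2 of file 1.

My question: how can I do this?

Update 7 July 2013: I updated the file2 format to reflect a third column (matching on the number alone is enough).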

4 Answers

One way, with awk:

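awk -F';' '
# first file (file2): remember one house number per postcode
# (a later line with the same postcode overwrites the earlier number)
NR==FNR {
  a[$1]=$2
  next
}
# second file (file1): normalise a copy of the line, then compare
{
  line=$0
  gsub(/"/,"")          # drop the double quotes
  gsub(/ *; */,";")     # drop the padding around each separator
  if (a[$5]==$2) {      # postcode is in field 5, house number in field 2
    print line          # print the original, unmodified line
  }
}' file2 file1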

Output:

"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"
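To see why the two gsub calls are enough, here is a minimal sketch (the function name `show_fields` is hypothetical, not part of the answer) that applies the same two substitutions to a line of file1 and prints the cleaned 2nd and 5th fields. Because gsub modifies `$0`, awk re-splits the record with FS, so `$2` and `$5` afterwards hold the trimmed values that the comparison uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the answer's two clean-up substitutions
# to each line on stdin and print the resulting 2nd and 5th fields.
show_fields() {
  awk -F';' '{
    gsub(/"/, "")          # drop the double quotes
    gsub(/ *; */, ";")     # drop spaces around each separator
    # gsub on $0 re-splits the record, so $2 and $5 are now trimmed
    print $2 "|" $5
  }'
}

# Example with one line from file1
printf '%s\n' '"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"' | show_fields
# prints: 50|3891ZP
```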
answered 2013-07-05T15:34:18.400
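Borrowing heavily from @JS, I offer the following improved solution. The problem with his code is that if the same postcode has several house numbers in file2, only the last one is kept, so only that one can match. Using a composite key for the associative array gets around this: a[$1,$2] stores one element per postcode/number pair (awk joins the two subscripts into a single key using SUBSEP).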
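A minimal sketch of what the composite key does, using a hypothetical function `subsep_demo` and the sample postcode: two numbers for the same postcode become two separate elements instead of overwriting each other, `(k1, k2) in a` tests for a pair without creating it, and the two-subscript form is the same key as the plain concatenation `k1 SUBSEP k2`.

```shell
#!/usr/bin/env bash
# Hypothetical demo of awk's composite (SUBSEP-joined) array keys.
subsep_demo() {
  awk 'BEGIN {
    a["3891ZP", "50"] = 1    # key is "3891ZP" SUBSEP "50"
    a["3891ZP", "52"] = 1    # same postcode, second number: no overwrite
    if (("3891ZP", "50") in a) print "50 found"; else print "50 missing"
    if (("3891ZP", "52") in a) print "52 found"; else print "52 missing"
    if (("3891ZP", "51") in a) print "51 found"; else print "51 missing"
    # the two-subscript form is just string concatenation with SUBSEP
    if (("3891ZP" SUBSEP "50") in a) print "same key"
  }'
}

subsep_demo
# prints:
# 50 found
# 52 found
# 51 missing
# same key
```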

Create a file postcode.awk:

BEGIN {
  FS=";"
}
# NR (records read so far) equals FNR (records read from the current
# file) only while reading the first file, i.e. the key file
NR==FNR {
  a[$1,$2]=1 # create one array element for each $1/$2 pair
  next
}
# every remaining record comes from the second file (the data file)
{
  # copy the original line before modifying it
  line=$0
  # take out the double quotes
  gsub(/"/,"")
  # take out the spaces on either side of the semicolons
  gsub(/ *; */,";")
  # see if the postcode/number pair exists as a key
  # ("in" tests for the key without creating an empty element)
  if (($5,$2) in a) {
    # echo the original line that matched
    print line
  }
}

Using the following test file, file1 (I added a line to show an edge case):

"ALMEREWEG               ";" 45  ";"      ";"ZEEWOLDE                ";"3891ZN"
"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 52  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

and the key file, file2 (again with an extra line):

3891ZP;50
3891ZP;52
3891ZN;53

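you'll see that JS's code does not match the line with number 50: a["3891ZP"] is first set to 50 and then overwritten with 52.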

But my code does:

awk -f postcode.awk file2 file1

produces:

"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 52  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"
answered 2013-07-05T21:59:48.130
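You can use something like sed to build the patterns for grep (the <(...) process substitution needs bash or a similar shell):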

$ grep -Ef <(sed -E 's/^([^;]*);([^;]*).*/^[^;]*;[^;0-9]*\2[^;0-9]*;([^;]*;){2}[^;]*\1/' file2) file1
"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"
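To see what grep actually receives, the pattern-building step can be run on its own. This is a minimal sketch wrapped in a hypothetical function `make_patterns`. The expression `^([^;]*);([^;]*).*` captures only the first two fields of each file2 line, so the third column from the update is ignored; a greedy `(.*);(.*)` splits at the last `;` instead, so `3891ZP;50;` would yield the postcode `3891ZP;50` and an empty number. The `[^;0-9]*` around the number keeps, for example, `50` from matching inside ` 150 `. It assumes a sed that accepts `-E`, as GNU and BSD sed do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn each "postcode;number[;suffix]" line on stdin
# into an extended regex that checks field 2 (number) and field 5
# (postcode) of file1.
make_patterns() {
  sed -E 's/^([^;]*);([^;]*).*/^[^;]*;[^;0-9]*\2[^;0-9]*;([^;]*;){2}[^;]*\1/'
}

printf '3891ZP;50;\n' | make_patterns
# prints: ^[^;]*;[^;0-9]*50[^;0-9]*;([^;]*;){2}[^;]*3891ZP
```

In bash it plugs into the same approach as the answer: grep -Ef <(make_patterns < file2) file1.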
answered 2013-07-05T14:10:21.657
I used bash's IFS and read to split each line, then passed the columns to grep:

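# read file2 line by line, splitting each line on ';'
# (any third column, such as a letter suffix, goes into the throwaway _)
while IFS=';' read -r postcode number _ ; do
    # field 2 of file1 is padded as " <number>  ", field 5 holds the postcode;
    # the expression can be refined but works on the sample data as is
    grep -e " ${number}  \";\".*;.*\";\"${postcode}" file1
done < file2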

Output:

"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"
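With the updated file2, which lists 3891ZN;53 twice (suffixes A and B), the loop runs grep once per file2 line and therefore prints the 53 line twice. One way around that, sketched here as a hypothetical function `match_unique`, is to reduce the key file to unique postcode/number pairs with cut and sort -u before grepping. It keeps the answer's pattern, so it still relies on the sample's padding (the number followed by exactly two spaces and the closing quote), and its output follows the sorted key order rather than the order of file1.

```shell
#!/usr/bin/env bash
# Hypothetical variant: de-duplicate postcode;number pairs before grepping,
# so a pair listed twice in the keys file (e.g. with suffixes A and B)
# prints each matching line of the data file only once.
match_unique() {
  local keys=$1 data=$2
  # keep only the first two fields, then drop duplicate pairs
  cut -d';' -f1,2 "$keys" | sort -u |
    while IFS=';' read -r postcode number; do
      # same pattern as the answer: " <number>  " in field 2, postcode in field 5
      # (a pair with no matching line is not treated as an error)
      grep -e " ${number}  \";\".*;.*\";\"${postcode}" "$data" || true
    done
}
```

Usage: match_unique file2 file1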
answered 2013-07-05T14:18:23.587