awk - print rows with a condition on field data

Question

I have a data file:

cell   input     out    type      fun            level
AI20   A1,A2      Z     comb    ((A1A2))           2
IA2    A1,A2,A3   Z     comb    ((!A1A2)A3)        3
XOR    A1,A2,B1   Z     comb    (((A1A2)B1)        3
IAD    A1,A2,A3   Z     comb    (!((A1A2)A3))      3
INV    I1         ZN    comb    (!I1)              1  
BUF  A1,A2,A3,B1  Z     comb    (!(((A1A2)A3)B1))  4

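From this data, I want to print rows whose level field (6th column) values together sum to 7.

Here, for a level sum of 7, we could choose the AI20, BUF and INV rows, whose levels add up to 2 + 4 + 1 = 7, and print them.
Or we could choose XOR, IAD and INV, which give 3 + 3 + 1 = 7, and print them. Any random selection of rows is fine, as long as the level sum is 7.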

The output could be:

cell   input     out    type      fun            level
AI20   A1,A2      Z     comb    ((A1A2))           2
INV    I1         ZN    comb    (!I1)              1  
BUF  A1,A2,A3,B1  Z     comb    (!(((A1A2)A3)B1))  4

Or the output could also be:


cell   input     out    type      fun            level
XOR    A1,A2,B1   Z     comb    (((A1A2)B1)        3
IAD    A1,A2,A3   Z     comb    (!((A1A2)A3))      3
INV    I1         ZN    comb    (!I1)              1

I tried this with awk:

awk '{{ sum[i] += $6} for (i=1;i<8;i++) print $0}' file

But this prints every line 7 times instead of the desired output.

Part 2. The problem continues from part 1.

File 2 with data:

cell   input  out  type   fun  level
CLK    C       Z    seq   Cq   1      
DFk    C,Cp    Q    seq   IQ   1
DFR    D,C     Qn   seq   IN   1
SKN    SE,Q    Qp   seq   Iq   1

Output for part 2:

cell   input     out    type      fun            level
AI20   A1,A2      Z     comb    ((A1A2))           2
INV    I1         ZN    comb    (!I1)              1  
BUF  A1,A2,A3,B1  Z     comb    (!(((A1A2)A3)B1))  4
CLK    C          Z     seq      Cq                1
XOR    A1,A2,B1   Z     comb    (((A1A2)B1)        3
IAD    A1,A2,A3   Z     comb    (!((A1A2)A3))      3
INV    I1         ZN    comb    (!I1)              1
DFk    C,Cp       Q     seq      IQ                1
IA2    A1,A2,A3   Z     comb    ((!A1A2)A3)        3
XOR    A1,A2,B1   Z     comb    (((A1A2)B1)        3
INV    I1         ZN    comb    (!I1)              1

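The output for part 2 works like this: once we have a set of rows from file1 whose level sum is 7, insert the first row of file2 after it. Then check the level-sum-7 condition again and, if it holds, insert the second row of file2. Then check for a level sum of 7 once more and, if it holds, insert the third row of file2. This is to be done 3 times.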

score 2 · Accepted Answer

Here is an awk solution for this job:

cat rnd.awk
function rnd(max) {        # generate a random number between 2 and max
   return int(rand()*(max-1))+2
}
BEGIN {
   srand()                 # seed random generation
}
NR == 1 {                  # for header row
   print                   # print header record
   next
}
{
   rec[NR] = $0            # save each record in rec array with NR as key 
   num[NR] = $NF           # save last column in num array with NR as key
}
END {
   while(1) {              # infinite loop
      r = rnd(NR)          # generate a random number between 2 and NR
      if (!seen[r]++)      # populate seen array with this random number
         s += num[r]       # get aggregate sum from num array

      if (s == 7)          # if sum is 7 then break the loop
         break
      else if (s > 7) {    # if sum > 7 then restart the loop
         delete seen
         s = 0
         continue
      }
   }
   for (j in seen)         # for each val in seen print rec array
      print rec[j]
}

Use it as:

awk -f rnd.awk file

cell   input     out    type      fun            level
AI20   A1,A2      Z     comb    ((A1A2))           2
INV    I1         ZN    comb    (!I1)              1
BUF  A1,A2,A3,B1  Z     comb    (!(((A1A2)A3)B1))  4

Then again:

awk -f rnd.awk file

cell   input     out    type      fun            level
IA2    A1,A2,A3   Z     comb    ((!A1A2)A3)        3
XOR    A1,A2,B1   Z     comb    (((A1A2)B1)        3
INV    I1         ZN    comb    (!I1)              1
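
Part 2 of the question is not covered by the script above. One possible way to approach it, sketched here in Python rather than awk, is to repeat the random selection three times and append the next data row of file 2 after each group. The function names `random_rows_with_sum` and `interleave` and the inline copies of the data are assumptions made for illustration, and header handling is left out.

```python
import random

def random_rows_with_sum(rows, target):
    """Return a random subset of rows whose last fields sum to target.

    Takes rows greedily from a shuffled copy and retries if the pass
    ends below the target. Loops forever if no subset can reach it.
    """
    while True:
        total, chosen = 0, []
        for row in random.sample(rows, len(rows)):   # shuffled copy
            level = int(row.split()[-1])
            if total + level <= target:
                total += level
                chosen.append(row)
            if total == target:
                return chosen

def interleave(file1_rows, file2_rows, target=7, repeats=3):
    """Build `repeats` groups from file 1, each followed by one file 2 row."""
    out = []
    for extra in file2_rows[:repeats]:
        out.extend(random_rows_with_sum(file1_rows, target))
        out.append(extra)
    return out

# data rows of both files from the question, without the header lines
file1 = [
    "AI20   A1,A2      Z     comb    ((A1A2))           2",
    "IA2    A1,A2,A3   Z     comb    ((!A1A2)A3)        3",
    "XOR    A1,A2,B1   Z     comb    (((A1A2)B1)        3",
    "IAD    A1,A2,A3   Z     comb    (!((A1A2)A3))      3",
    "INV    I1         ZN    comb    (!I1)              1",
    "BUF  A1,A2,A3,B1  Z     comb    (!(((A1A2)A3)B1))  4",
]
file2 = [
    "CLK    C       Z    seq   Cq   1",
    "DFk    C,Cp    Q    seq   IQ   1",
    "DFR    D,C     Qn   seq   IN   1",
    "SKN    SE,Q    Qp   seq   Iq   1",
]

result = interleave(file1, file2)
print("\n".join(result))
```

Each run prints three groups of file 1 rows with a level sum of 7, each followed by the next row of file 2. The rows chosen vary from run to run.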

score 2 · Accepted Answer

Efficiency matters in two places in this problem:

generating all possible combinations;
retrieving the right rows once a combination is known.

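The first part depends heavily on how many distinct values "level" can take. If you have hundreds of different values, the number of possible combinations that give the required sum will be very large, so this is the part of the algorithm you will want to optimize.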
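
As a rough illustration of this first step, the sketch below enumerates the multisets of level values that add up to the target, using each value at most as often as it occurs in the file. The function name `level_combinations` and the hard-coded level list are assumptions made for the example, and it assumes all levels are positive integers.

```python
from collections import Counter

def level_combinations(levels, target):
    """Yield sorted tuples of level values that sum to target.

    Each distinct value is used at most as many times as it occurs
    in `levels`. Hypothetical helper; assumes positive integer levels.
    """
    counts = sorted(Counter(levels).items())   # [(value, occurrences), ...]

    def extend(start, remaining, chosen):
        if remaining == 0:
            yield tuple(chosen)
            return
        for i in range(start, len(counts)):
            value, available = counts[i]
            # try using this value 1..available times
            for k in range(1, available + 1):
                if value * k > remaining:
                    break
                yield from extend(i + 1, remaining - value * k,
                                  chosen + [value] * k)

    yield from extend(0, target, [])

# level column of the sample file from the question
levels = [2, 3, 3, 3, 1, 4]
combos = list(level_combinations(levels, 7))
print(combos)   # [(1, 2, 4), (1, 3, 3), (3, 4)]
```

Working on distinct values with their counts, rather than on individual rows, keeps the search small when many rows share the same level.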

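The second part depends on the number of lines in the file. To handle it, I would build a hash table whose keys are the "level" values and whose values are arrays of strings, each string being one of your lines. Once you have a given combination of level values, you can generate an (almost unlimited) number of row combinations almost instantly with the following steps:

1. Retrieve the array of strings associated with each level value present in the combination;
2. Retrieve a random string from each of those arrays;
3. Repeat the process to get as many row combinations as you need for that combination of level values.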
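
The steps above can be sketched as follows. The helper names `build_buckets` and `pick_rows` and the inline copy of the sample data are assumptions for illustration, and the combination of level values is taken as given.

```python
import random
from collections import Counter, defaultdict

def build_buckets(rows):
    """Map each level value (last field) to the list of rows having it."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[int(row.split()[-1])].append(row)
    return buckets

def pick_rows(buckets, combination):
    """Pick random distinct rows whose levels match `combination`."""
    picked = []
    for level, needed in Counter(combination).items():
        # sample() draws without replacement, so no row is repeated;
        # it raises ValueError if a bucket has fewer rows than needed
        picked.extend(random.sample(buckets[level], needed))
    return picked

# data rows of the sample file from the question, without the header
rows = [
    "AI20   A1,A2      Z     comb    ((A1A2))           2",
    "IA2    A1,A2,A3   Z     comb    ((!A1A2)A3)        3",
    "XOR    A1,A2,B1   Z     comb    (((A1A2)B1)        3",
    "IAD    A1,A2,A3   Z     comb    (!((A1A2)A3))      3",
    "INV    I1         ZN    comb    (!I1)              1",
    "BUF  A1,A2,A3,B1  Z     comb    (!(((A1A2)A3)B1))  4",
]

buckets = build_buckets(rows)
selection = pick_rows(buckets, (1, 3, 3))   # levels 1 + 3 + 3 = 7
print("\n".join(selection))
```

Calling `pick_rows` again with the same combination gives another random set of rows, without re-reading the file or recomputing the combination.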

score 1 · Accepted Answer

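The following function returns a random combination of rows whose level column sums to the target (currently 7, per your question). It works with any data frame (as long as it has a numeric "level" column) and any target: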

import random

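def get_one(df, target):
    # Retry until a random pick of distinct rows sums exactly to target.
    # Note: loops forever if no subset of rows can reach the target.
    while True:
        indices = []
        total = 0
        while total < target:
            # rows that still fit in the remaining budget
            fits = df[(df['level'] <= target - total) & (df['level'] > 0)]
            candidates = list(set(fits.index) - set(indices))
            if not candidates:      # dead end: start over
                break
            ind1 = random.choice(candidates)
            indices.append(ind1)
            total += df.loc[ind1, 'level']
        if total == target:
            return df.loc[indices, :]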

To get a result, just call the function with df and your target as arguments:

>>> get_one(df, 7)

cell   input     out    type      fun            level
AI20   A1,A2      Z     comb    ((A1A2))           2
INV    I1         ZN    comb    (!I1)              1  
BUF  A1,A2,A3,B1  Z     comb    (!(((A1A2)A3)B1))  4

If you want a different total, change the argument, for example:

>>> get_one(df, 10)
>>> get_one(df, 15)

etc.
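
The answer does not show how `df` is built. A minimal sketch, assuming pandas is installed and the file is whitespace-separated with a header line as in the question, might look like this. The inline `data` string stands in for the real file.

```python
import io
import pandas as pd

# inline copy of the sample file from the question
data = """\
cell   input     out    type      fun            level
AI20   A1,A2      Z     comb    ((A1A2))           2
IA2    A1,A2,A3   Z     comb    ((!A1A2)A3)        3
XOR    A1,A2,B1   Z     comb    (((A1A2)B1)        3
IAD    A1,A2,A3   Z     comb    (!((A1A2)A3))      3
INV    I1         ZN    comb    (!I1)              1
BUF  A1,A2,A3,B1  Z     comb    (!(((A1A2)A3)B1))  4
"""

# columns are separated by runs of whitespace; the first line is the header
df = pd.read_csv(io.StringIO(data), sep=r"\s+")
print(df[["cell", "level"]])
```

For a file on disk, the same call with a path in place of the `io.StringIO` object (for example `pd.read_csv("file", sep=r"\s+")`) should produce a data frame that can be passed to `get_one`.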
