
I have a template in the form of a list of tuples, shown below, which I will instantiate using dataframe joins.

rule = [('#1', 'X', 'Y'), ('#2', 'X', 'Z'), ('#3', 'Z', 'Y')]

I also have the instances of each component of the template, stored as a dictionary.

rComp_substitution = {
 ('#1', 'X', 'Y'):           pred  subj  obj
                   0  nationality  BART  USA,

 ('#2', 'X', 'Z'):            pred  subj      obj
                   0  placeOfBirth  BART  NEWYORK
                   1     hasFather  BART   HOMMER,

 ('#3', 'Z', 'Y'):           pred     subj  obj
                   0    locatedIn  NEWYORK  USA
                   1  nationality   HOMMER  USA
}

The instance for each component is a pandas DataFrame with three columns. For ('#1', 'X', 'Y'), #1 corresponds to pred, X to subj, and Y to obj.
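A minimal sketch of that positional correspondence, assuming each instance table has the columns pred, subj, obj in that order: zipping those column names with the template tuple gives a rename mapping, so the columns end up named after the template labels. The names component, instance, mapping and labelled are illustrative only.

```python
import pandas as pd

# One template component and its instance table (values copied from the question).
component = ('#1', 'X', 'Y')
instance = pd.DataFrame({'pred': ['nationality'], 'subj': ['BART'], 'obj': ['USA']})

# Positional correspondence: pred -> '#1', subj -> 'X', obj -> 'Y'.
mapping = dict(zip(['pred', 'subj', 'obj'], component))

# rename() returns a new DataFrame, so the original instance table is left untouched.
labelled = instance.rename(columns=mapping)
print(labelled)
```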

For example, first instantiate ('#1', 'X', 'Y') and ('#2', 'X', 'Z').

We can check which variable ('#1', 'X', 'Y') and ('#2', 'X', 'Z') have in common.
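Finding the shared variable can be sketched with a set lookup instead of chained if/elif comparisons. The helper common_variables below is hypothetical (not from the question); it only looks at positions 1 and 2 of each tuple, and it returns an empty list when two components share nothing instead of leaving a variable undefined.

```python
rule = [('#1', 'X', 'Y'), ('#2', 'X', 'Z'), ('#3', 'Z', 'Y')]

def common_variables(comp_a, comp_b):
    """Hypothetical helper: variables (tuple positions 1 and 2) shared by two components."""
    vars_b = set(comp_b[1:])
    # Keep the order in which the variables appear in comp_a.
    return [v for v in comp_a[1:] if v in vars_b]

print(common_variables(rule[0], rule[1]))  # ['X']
print(common_variables(rule[0], rule[2]))  # ['Y']
```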

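Then, in each dataframe, take the column that holds the common variable X (here subj) and join the two dataframes using it as the key, which gives the instances of ('#1', 'X', 'Y'), ('#2', 'X', 'Z').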
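For illustration, this is what the join looks like once the columns are already named after the template labels (pred to #1 or #2, subj to X, obj to Y or Z), assuming that renaming has been done beforehand. The two frames hold the values from the question; df1, df2 and joined are illustrative names. An inner pd.merge on X pairs the single #1 row with both #2 rows where X is BART.

```python
import pandas as pd

# Instances of ('#1', 'X', 'Y') and ('#2', 'X', 'Z'), with columns already
# named after the template labels instead of pred/subj/obj.
df1 = pd.DataFrame({'#1': ['nationality'], 'X': ['BART'], 'Y': ['USA']})
df2 = pd.DataFrame({'#2': ['placeOfBirth', 'hasFather'],
                    'X': ['BART', 'BART'],
                    'Z': ['NEWYORK', 'HOMMER']})

# Inner join on the shared variable X: each row of df1 is paired with
# every row of df2 that has the same X value.
joined = pd.merge(df1, df2, on='X', how='inner')
print(joined)
```

Because each triple stays spread over its own columns, there is no need to build nested comVar/triples lists by hand.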

Here is my code:

import pandas as pd

depth = 0
# step1: check the common variable of rule[depth] and rule[depth+1]
current_subj = rule[depth][1]   # 'X'
current_obj = rule[depth][2]    # 'Y'
next_subj = rule[depth+1][1]    # 'X'
next_obj = rule[depth+1][2]     # 'Z'
if current_subj == next_subj or current_subj == next_obj:
    comVar = current_subj
elif current_obj == next_subj or current_obj == next_obj:
    comVar = current_obj
else:
    # Without this branch, comVar would be undefined below (NameError)
    raise ValueError("rule[depth] and rule[depth+1] share no variable")

# step2: create current_rComp keyed by the common variable, for joining dataframes
current_rComp = rComp_substitution[rule[depth]]
unified_rComp = []
for row in current_rComp.itertuples(index=False):
    if comVar == current_subj:
        unified_rComp.append([row.subj, [list(row)]])
    elif comVar == current_obj:
        unified_rComp.append([row.obj, [list(row)]])
current_rComp = pd.DataFrame(unified_rComp, columns=['comVar', 'triples'])

# step3: create next_rComp keyed by the common variable, for joining dataframes
next_rComp = rComp_substitution[rule[depth+1]]
unified_rComp = []
for row in next_rComp.itertuples(index=False):
    if comVar == next_subj:
        unified_rComp.append([row.subj, [list(row)]])
    elif comVar == next_obj:
        unified_rComp.append([row.obj, [list(row)]])
next_rComp = pd.DataFrame(unified_rComp, columns=['comVar', 'triples'])

# step4: join current_rComp and next_rComp with the common variable as key
partial_proof_path = pd.merge(current_rComp, next_rComp, how='inner', on='comVar')
print(partial_proof_path)

The output of this code is:

  comVar                   triples_x                        triples_y
0   BART  [[nationality, BART, USA]]  [[placeOfBirth, BART, NEWYORK]]
1   BART  [[nationality, BART, USA]]      [[hasFather, BART, HOMMER]]

I think this code is too long. Is there a simpler way to do the same thing?


1 Answer


Input data:

import pandas as pd

rComp_substitution = {('#1', 'X', 'Y'): pd.DataFrame({'pred': ['nationality'], 'subj': ['BART'], 'obj': ['USA']}),
                      ('#2', 'X', 'Z'): pd.DataFrame({'pred': ['placeOfBirth', 'hasFather'], 'subj': ['BART', 'BART'], 'obj': ['NEWYORK', 'HOMMER']}),
                      ('#3', 'Z', 'Y'): pd.DataFrame({'pred': ['locatedIn', 'nationality'], 'subj': ['NEWYORK', 'HOMMER'], 'obj': ['USA', 'USA']})}

rules = list(rComp_substitution.keys())

Main function:

def merge_from_common_key(rule0, rule1):
    # Load dataframes (copies, so the originals in rComp_substitution keep their column names)
    df0 = rComp_substitution[rule0].copy()
    df1 = rComp_substitution[rule1].copy()

    # Rename ["pred", "subj", "obj"] to the labels of ruleN, e.g. ["#1", "X", "Y"]
    df0.columns = list(rule0)
    df1.columns = list(rule1)

    # Find the common key(s) and merge the two dataframes
    key = df0.columns.intersection(df1.columns).tolist()
    if not key:
        raise ValueError(f"{rule0} and {rule1} share no variable")
    df = pd.merge(df0, df1, on=key)

    # Build the new dataframe (uses the first shared variable as "common")
    return pd.DataFrame({"common": df[key[0]].values.tolist(),
                         "left": df[list(rule0)].values.tolist(),
                         "right": df[list(rule1)].values.tolist()})

Usage:

>>> merge_from_common_key(rules[0], rules[1])

  common                      left                          right
0   BART  [nationality, BART, USA]  [placeOfBirth, BART, NEWYORK]
1   BART  [nationality, BART, USA]      [hasFather, BART, HOMMER]
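The same renaming idea can be extended from one pair to the whole template. A minimal sketch, assuming every instance table has the columns pred, subj, obj: each table is renamed to its template labels, then all tables are chained with functools.reduce and pd.merge. When on is omitted, pd.merge joins on every column the two frames share, so the last join matches on both Z and Y. The helpers labelled and instantiate are hypothetical names, not part of the answer above.

```python
from functools import reduce

import pandas as pd

rComp_substitution = {
    ('#1', 'X', 'Y'): pd.DataFrame({'pred': ['nationality'], 'subj': ['BART'], 'obj': ['USA']}),
    ('#2', 'X', 'Z'): pd.DataFrame({'pred': ['placeOfBirth', 'hasFather'],
                                    'subj': ['BART', 'BART'],
                                    'obj': ['NEWYORK', 'HOMMER']}),
    ('#3', 'Z', 'Y'): pd.DataFrame({'pred': ['locatedIn', 'nationality'],
                                    'subj': ['NEWYORK', 'HOMMER'],
                                    'obj': ['USA', 'USA']}),
}
rule = [('#1', 'X', 'Y'), ('#2', 'X', 'Z'), ('#3', 'Z', 'Y')]

def labelled(component):
    """Hypothetical helper: the component's instances, columns renamed to its template labels."""
    df = rComp_substitution[component]
    return df.rename(columns=dict(zip(['pred', 'subj', 'obj'], component)))

def instantiate(template):
    """Hypothetical helper: inner-join all components; with no `on`, pd.merge joins on all shared columns."""
    return reduce(lambda left, right: pd.merge(left, right, how='inner'),
                  (labelled(c) for c in template))

result = instantiate(rule)
print(result)
```

For the sample data this should leave two rows, one for each path from BART to USA (via NEWYORK and via HOMMER).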
answered 2021-05-11T21:11:33.387