There is an elegant solution to this problem using numpy:
import numpy as np

def compare_files( f1name, f2name, f3name, ctc1, ctc2, columns, TOL=0.001 ):
    f1 = np.loadtxt( f1name, delimiter=',', ndmin=2 )
    f2 = np.loadtxt( f2name, delimiter=',', ndmin=2 )
    # check[r1, r2] is True when row r1 of f1 and row r2 of f2 agree
    # within TOL on every compared column pair
    check = np.logical_and.reduce(
        [np.absolute(np.subtract.outer(f1[:,i], f2[:,j])) < TOL
         for i,j in zip(ctc1,ctc2)] )
    # row indices of the matching pairs: ind1 indexes f1, ind2 indexes f2
    ind1, ind2 = np.nonzero( check )
    sources = {'f1': (f1, ind1), 'f2': (f2, ind2)}
    # build the output one column at a time, in the order given by columns
    new = np.column_stack( [sources[f][0][sources[f][1], c] for f,c in columns] )
    np.savetxt(f3name, new, delimiter=',', fmt='%f')
The function is general and can be applied to the case described in your question as follows:
f1name = 'one.csv'
f2name = 'two.csv'
f3name = 'three.csv'
ctc1 = [0,1]  # columns to compare from file 1
#       ^ ^
#       | |   # these arrows just emphasize which column is compared with which
#       v v
ctc2 = [0,1]  # columns to compare from file 2
columns = [['f2',0],  # file 2 column 0
           ['f2',1],  # file 2 column 1
           ['f1',4],  # file 1 column 4
           ['f2',2]]  # file 2 column 2
TOL = 0.001
compare_files( f1name, f2name, f3name, ctc1, ctc2, columns, TOL )
Here ctc1 and ctc2 tell the function which columns to compare (ctc), and columns tells it how to build the new file. In this example the output is built from columns 0 and 1 of f2, column 4 of f1, and column 2 of f2, in that order.
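To make the comparison step more concrete, here is a minimal sketch of how the tolerance test turns into a matrix of matching row pairs. The arrays a and b are made-up toy data, not the files from the question, and it assumes each compared column is tested independently against the same tolerance.

```python
import numpy as np

# Hypothetical toy data: two rows standing in for file 1, three for file 2.
a = np.array([[1.0000, 5.0],
              [2.0000, 6.0]])
b = np.array([[2.0004, 6.0],
              [1.0000, 9.0],
              [1.0003, 5.0]])
TOL = 0.001

# One boolean matrix per compared column pair; entry [r1, r2] is True when
# row r1 of a and row r2 of b differ by less than TOL in that column.
per_column = [np.abs(np.subtract.outer(a[:, i], b[:, j])) < TOL
              for i, j in zip([0, 1], [0, 1])]

# A pair of rows counts as a match only if every compared column matches.
check = np.logical_and.reduce(per_column)

# Row indices of the matching pairs, in row-major order.
rows_a, rows_b = np.nonzero(check)
print(check)
print(rows_a, rows_b)
```

Row 0 of a matches row 2 of b, and row 1 of a matches row 0 of b. Row 1 of b agrees with row 0 of a in the first column only, so it is rejected. Note that the mask has one entry per pair of rows, so memory grows with the product of the two file lengths.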
Testing with one.csv:
12.23496740, -11.95760385, 3, 5, 11.1, 4
12.58295928, -11.39857395, 4, 7, 12.3, 6
12.42572572, -11.09478502, 2, 5, 12.3, 8
12.58300286, -11.95762569, 5, 11, 3.4, 7
and two.csv:
12.43, -11.0948, .7, 3
12.43, -11.0948, .7, 3
12.4257, -11.0948, .7, 3
12.43, -11.0948, .7, 3
12.5830, -11.3986, .2, 4
gives a three.csv:
12.583000,-11.398600,12.300000,0.200000
12.425700,-11.094800,12.300000,0.700000
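
As a sanity check, the same matching can be reproduced in memory without touching the file system. This is only a sketch: match_rows is a hypothetical helper, not part of the answer's function, it uses plain broadcasting in place of np.subtract.outer, and it assumes the last output column comes from column 2 of two.csv, which is what the three.csv shown above contains.

```python
import numpy as np

def match_rows(f1, f2, ctc1, ctc2, columns, tol=0.001):
    """Hypothetical in-memory variant: return the merged rows as an array."""
    # Broadcasting a column vector against a row vector gives all pairwise
    # differences, the same matrix np.subtract.outer would produce.
    check = np.logical_and.reduce(
        [np.abs(f1[:, i][:, None] - f2[:, j][None, :]) < tol
         for i, j in zip(ctc1, ctc2)])
    ind1, ind2 = np.nonzero(check)
    picks = {'f1': (f1, ind1), 'f2': (f2, ind2)}
    return np.column_stack([picks[f][0][picks[f][1], c] for f, c in columns])

# The sample rows of one.csv and two.csv from above.
one = np.array([[12.23496740, -11.95760385, 3, 5, 11.1, 4],
                [12.58295928, -11.39857395, 4, 7, 12.3, 6],
                [12.42572572, -11.09478502, 2, 5, 12.3, 8],
                [12.58300286, -11.95762569, 5, 11, 3.4, 7]])
two = np.array([[12.43, -11.0948, .7, 3],
                [12.43, -11.0948, .7, 3],
                [12.4257, -11.0948, .7, 3],
                [12.43, -11.0948, .7, 3],
                [12.5830, -11.3986, .2, 4]])

result = match_rows(one, two, [0, 1], [0, 1],
                    [('f2', 0), ('f2', 1), ('f1', 4), ('f2', 2)])
print(result)
```

Only two pairs pass the tolerance: the second row of one.csv with the last row of two.csv, and the third row of one.csv with the third row of two.csv. The rows with 12.43 differ from 12.42572572 by about 0.0043, which is above the 0.001 tolerance.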