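You can't `merge` two strings. I think you're confused about what `os.path.join` returns: it returns a string, not a `DataFrame`. You have to actually read `DataFrame`s in from the files named `JJ` and `WW`, and then perform the `merge`.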
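To make the point about `os.path.join` concrete, here is a minimal sketch. The directory name `data` and the `.csv` extensions are assumptions made up for illustration; only the names `JJ` and `WW` come from the question.

```python
import os

# Hypothetical directory and file names, for illustration only.
path_jj = os.path.join("data", "JJ.csv")
path_ww = os.path.join("data", "WW.csv")

# os.path.join only builds a path string; it never opens or reads the file.
print(type(path_jj).__name__)
print(isinstance(path_ww, str))
```

Both values are plain strings, which is why they have to be passed through something like `read_csv` before there is anything to merge.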
Here's a full example of writing two `DataFrame`s to disk, reading them back in with `read_csv`, and then merging them on the column `group`:
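In [48]: import numpy as np; import pandas as pd; from numpy.random import randn; from pandas import DataFrame, read_csv
In [49]: df1 = DataFrame(randn(10, 1), columns=['a'])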
In [50]: df1['group'] = np.random.choice(['b', 'c'], size=len(df1))
In [51]: df2 = DataFrame(randn(10, 1), columns=['b'])
In [52]: df2['group'] = np.random.choice(['b', 'c'], size=len(df2))
In [53]: df1.to_csv('df1.csv', index=False)
In [54]: cat df1.csv
a,group
-1.590035935931282,b
0.5496398501891229,c
-0.6484689548035797,b
0.19162302248253205,b
-0.9852064283582675,c
0.5975155551821989,b
0.29443634291217047,b
-0.7929994157215382,b
-1.9546460886048795,b
0.19195457928475546,c
In [55]: df2.to_csv('df2.csv', index=False)
In [56]: cat df2.csv
b,group
-1.2874060006117918,c
1.1037959548210117,b
0.47172389260467507,c
0.12802538607490285,c
-0.8753708425917293,b
-0.09187827793091947,b
1.140204215271196,c
0.4862940170888638,b
-1.1080430563137758,b
-1.3698112665693232,c
In [57]: df1_csv = read_csv('df1.csv', index_col=None)
In [58]: df2_csv = read_csv('df2.csv', index_col=None)
In [59]: df1_csv
Out[59]:
a group
0 -1.590 b
1 0.550 c
2 -0.648 b
3 0.192 b
4 -0.985 c
5 0.598 b
6 0.294 b
7 -0.793 b
8 -1.955 b
9 0.192 c
In [60]: df2_csv
Out[60]:
b group
0 -1.287 c
1 1.104 b
2 0.472 c
3 0.128 c
4 -0.875 b
5 -0.092 b
6 1.140 c
7 0.486 b
8 -1.108 b
9 -1.370 c
In [61]: df3 = pd.merge(df1_csv, df2_csv, on='group')
In [62]: df3
Out[62]:
a group b
0 -1.590 b 1.104
1 -1.590 b -0.875
2 -1.590 b -0.092
3 -1.590 b 0.486
4 -1.590 b -1.108
5 -0.648 b 1.104
6 -0.648 b -0.875
7 -0.648 b -0.092
8 -0.648 b 0.486
9 -0.648 b -1.108
10 0.192 b 1.104
11 0.192 b -0.875
12 0.192 b -0.092
13 0.192 b 0.486
14 0.192 b -1.108
15 0.598 b 1.104
16 0.598 b -0.875
17 0.598 b -0.092
18 0.598 b 0.486
19 0.598 b -1.108
20 0.294 b 1.104
21 0.294 b -0.875
22 0.294 b -0.092
23 0.294 b 0.486
24 0.294 b -1.108
25 -0.793 b 1.104
26 -0.793 b -0.875
27 -0.793 b -0.092
28 -0.793 b 0.486
29 -0.793 b -1.108
30 -1.955 b 1.104
31 -1.955 b -0.875
32 -1.955 b -0.092
33 -1.955 b 0.486
34 -1.955 b -1.108
35 0.550 c -1.287
36 0.550 c 0.472
37 0.550 c 0.128
38 0.550 c 1.140
39 0.550 c -1.370
40 -0.985 c -1.287
41 -0.985 c 0.472
42 -0.985 c 0.128
43 -0.985 c 1.140
44 -0.985 c -1.370
45 0.192 c -1.287
46 0.192 c 0.472
47 0.192 c 0.128
48 0.192 c 1.140
49 0.192 c -1.370
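The merged frame has 50 rows because `group` is not unique on either side, so the default inner merge pairs every row of one frame with every row of the other that shares the same key. The sketch below checks that arithmetic using only the group labels copied from the two CSV files printed above; it uses `collections.Counter` rather than pandas, purely to keep the illustration dependency-free.

```python
from collections import Counter

# Group labels copied, in order, from df1.csv and df2.csv above.
groups1 = list("bcbbcbbbbc")
groups2 = list("cbccbbcbbc")

count1 = Counter(groups1)
count2 = Counter(groups2)

# For each key present in both frames, an inner merge yields
# (rows with that key on the left) * (rows with that key on the right).
rows_per_group = {g: count1[g] * count2[g] for g in count1.keys() & count2.keys()}
total_rows = sum(rows_per_group.values())

print(rows_per_group)
print(total_rows)
```

This gives 35 rows for `b` and 15 for `c`, matching rows 0 to 34 and 35 to 49 of `df3`.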
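A couple of other things:
Don't use `is` to compare objects for equality; use `==`. `is` checks whether two names refer to the same object, so for equality it only works reliably in the case of small integers, and even then you shouldn't depend on it, because that's an implementation detail of CPython.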
Instead of checking the file name with `str.endswith`, just iterate over the files you want by globbing first:
import glob
import os

for f in glob.glob(os.path.join(path, '*J.csv')):
    # glob returns the full path, so measure only the file name
    if len(os.path.basename(f)) == 12:
        pass  # do all the thingz!
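On the first point above, a small sketch of how `is` and `==` differ. The variable names are made up for illustration.

```python
# Two distinct list objects with equal contents.
a = [1, 2, 3]
b = [1, 2, 3]

values_equal = (a == b)  # compares contents
same_object = (a is b)   # compares identity

print(values_equal, same_object)

# None is a singleton, so identity is the conventional check there.
missing = None
is_missing = missing is None
print(is_missing)
```

The lists compare equal with `==` but are not the same object, so `is` reports `False`. Checking against `None` is the usual case where `is` is the right tool.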