I have a long dataframe with daily dates starting from 1999. I apply a filter to the original_dataframe to create a new_dataframe_1 and another filter to create new_dataframe_2.

How do I create a third dataframe which contains only the rows that new_dataframe_1 and new_dataframe_2 have in common?

new_dataframe_1

    A   B   C   D
1   a   b   c   d
2   a   b   c   d
3   a   b   c   d
4   a   b   c   d


new_dataframe_2

    A   B   C   D
3   a   b   c   d
4   a   b   c   d
5   a   b   c   d
6   a   b   c   d


new_dataframe_3 = intersection of new_dataframe_1 and new_dataframe_2


    A   B   C   D
3   a   b   c   d
4   a   b   c   d

1 Answer

If you want the columns from both DataFrames joined together, do an inner join:

import pandas as pd

df1 = pd.DataFrame({'A': range(5)}, index=list('abcde'))
df2 = pd.DataFrame({'B': range(10,20,2)}, index=list('AbCdE'))

print(df1)
#    A
# a  0
# b  1
# c  2
# d  3
# e  4

print(df2)
#     B
# A  10
# b  12
# C  14
# d  16
# E  18

print(df1.join(df2, how='inner'))

yields

   A   B
b  1  12
d  3  16

If you only want to select columns from one of the DataFrames, reindex on the intersection of the two indexes:

import pandas as pd

df1 = pd.DataFrame({'A': range(5)}, index=list('abcde'))
df2 = pd.DataFrame({'A': range(5)}, index=list('AbCdE'))
print(df1)
#    A
# a  0
# b  1
# c  2
# d  3
# e  4

print(df2)
#    A
# A  0
# b  1
# C  2
# d  3
# E  4

print(df1.reindex(df1.index.intersection(df2.index)))

yields

   A
b  1
d  3
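
Applied to the situation in the question, the same idea might look like the sketch below. The frame contents and the two filter conditions are made up for illustration (the question does not show them); only the index intersection followed by reindex comes from the answer above.

```python
import pandas as pd

# Hypothetical stand-in for original_dataframe: daily dates starting in 1999.
dates = pd.date_range("1999-01-01", periods=10, freq="D")
original_dataframe = pd.DataFrame(
    {"A": range(10), "B": range(10, 20)}, index=dates
)

# Two illustrative filters (assumptions, not the asker's real conditions).
new_dataframe_1 = original_dataframe[original_dataframe["A"] >= 3]
new_dataframe_2 = original_dataframe[original_dataframe["B"] <= 16]

# Keep only the rows whose index labels appear in both filtered frames.
common_index = new_dataframe_1.index.intersection(new_dataframe_2.index)
new_dataframe_3 = new_dataframe_1.reindex(common_index)

print(new_dataframe_3)
```

This relies on both filtered frames sharing the original index, so a common label means a common row.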

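There are also df1.loc and df1.ix (the latter was deprecated in pandas 0.20 and removed in 1.0), but df1.reindex appears to be faster: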

In [33]: idx1 = df1.index    
In [34]: idx2 = df2.index

In [35]: %timeit df1.loc[idx1.intersection(idx2)]
1000 loops, best of 3: 269 µs per loop

In [36]: %timeit df1.ix[idx1.intersection(idx2)]
1000 loops, best of 3: 276 µs per loop

In [37]: %timeit df1.reindex(idx1.intersection(idx2))
10000 loops, best of 3: 186 µs per loop
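
The timings above are from 2013 and will differ across pandas versions and machines. A rough way to re-measure outside IPython is sketched below, using the standard timeit module and leaving out df1.ix; the variable names and the loop count of 1000 are arbitrary choices, not part of the original answer.

```python
import timeit

import pandas as pd

df1 = pd.DataFrame({'A': range(5)}, index=list('abcde'))
df2 = pd.DataFrame({'A': range(5)}, index=list('AbCdE'))
idx1 = df1.index
idx2 = df2.index

# Both approaches should select the same rows before we compare speed.
via_loc = df1.loc[idx1.intersection(idx2)]
via_reindex = df1.reindex(idx1.intersection(idx2))
assert via_loc.equals(via_reindex)

# Total seconds for 1000 calls of each approach.
loc_seconds = timeit.timeit(
    lambda: df1.loc[idx1.intersection(idx2)], number=1000
)
reindex_seconds = timeit.timeit(
    lambda: df1.reindex(idx1.intersection(idx2)), number=1000
)

print(f"loc:     {loc_seconds:.4f} s per 1000 calls")
print(f"reindex: {reindex_seconds:.4f} s per 1000 calls")
```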
answered 2013-10-16T18:10:53.373