python - Remove rows whose values are all duplicates in a Pandas DataFrame (Python)

Question

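I want to remove any row of a pandas DataFrame in which every value is the same. The data looks like this (note: the first column is the index (a date), followed by four columns of data):

1983-02-16 512 517 510 514
1983-02-17 513 520 513 517
1983-02-18 500 500 500 500 <-- duplicate values
1983-02-21 505 505 496 496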

Removing the row of duplicate values should leave this:

1983-02-16 512 517 510 514
1983-02-17 513 520 513 517
1983-02-21 505 505 496 496

I can only find how to do this by column, not by row. Many thanks in advance,

Peter

score 1 · Accepted Answer

A slightly more elegant/dynamic (but possibly less performant) version:

In [11]: msk = df1.apply(lambda col: df[1] != col).any(axis=1)
Out[11]:
0     True
1     True
2    False
3     True
dtype: bool

In [12]: msk.index = df1.index  # iloc doesn't support masking

In [13]: df1.loc[msk]
Out[13]:
              1    2    3    4
1983-02-16  512  517  510  514
1983-02-17  513  520  513  517
1983-02-21  505  505  496  496
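
For reference, the same masking idea can be sketched as a self-contained script for a recent pandas release. This is only a sketch under assumptions, not the answerer's exact code: the frame name `df` and the inline sample data are taken from the question, and `DataFrame.ne` stands in for the `apply`/lambda comparison.

```python
import io

import pandas as pd

content = """\
1983-02-16 512 517 510 514
1983-02-17 513 520 513 517
1983-02-18 500 500 500 500
1983-02-21 505 505 496 496"""

# header=None labels the data columns 1..4 once column 0 becomes the index.
df = pd.read_csv(io.StringIO(content), sep=r"\s+", header=None,
                 index_col=0, parse_dates=[0])

# Compare every column against the first data column, row by row.
differs = df.ne(df[1], axis=0)

# Keep a row if at least one of its values differs from the first one.
msk = differs.any(axis=1)

result = df.loc[msk]
print(result)
```

Because the mask here is built from `df` itself, it already shares the frame's date index, so no index reassignment is needed before passing it to `.loc`.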

score 0 · Accepted Answer

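import io
import pandas as pd

content = '''\
1983-02-16 512 517 510 514
1983-02-17 513 520 513 517
1983-02-18 500 500 500 500
1983-02-21 505 505 496 496'''
# io.StringIO lets pandas read the text as a file (io.BytesIO only accepts bytes on Python 3)
df = pd.read_table(io.StringIO(content), parse_dates=[0], header=None, sep=r'\s+',
                   index_col=0)
# True for rows in which all four columns hold the same value
index = (df[1] == df[2]) & (df[1] == df[3]) & (df[1] == df[4])
# df.ix was removed in pandas 1.0; df.loc accepts the same boolean mask
df = df.loc[~index]
print(df)

yields

              1    2    3    4
0                             
1983-02-16  512  517  510  514
1983-02-17  513  520  513  517
1983-02-21  505  505  496  496

df.loc can be used to select rows with a boolean mask. df = df.loc[~index] selects all rows where index is False.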
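
The explicit column-by-column comparison above only works when the column labels are known in advance. As a hedged alternative that neither answer uses, counting the distinct values in each row with `DataFrame.nunique(axis=1)` handles any number of columns. The sketch below builds the sample frame directly; the names `distinct_per_row` and `filtered` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [[512, 517, 510, 514],
     [513, 520, 513, 517],
     [500, 500, 500, 500],
     [505, 505, 496, 496]],
    index=pd.to_datetime(["1983-02-16", "1983-02-17",
                          "1983-02-18", "1983-02-21"]),
    columns=[1, 2, 3, 4],
)

# Number of distinct values in each row; 1 means every column is equal.
distinct_per_row = df.nunique(axis=1)

# Keep only the rows that contain more than one distinct value.
filtered = df.loc[distinct_per_row > 1]
print(filtered)
```

Note that `nunique` ignores NaN by default, so a frame with missing values may need `dropna=False` to treat them as a distinct value.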
