python - How do I delete the rows of a csv file whose first-column string also appears in the first column of another csv?

Question

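I have 2 csv files. I need to delete every row of the first file whose first column contains a string that is also found in the first column of the second file. The head of table 1 is: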

Genus	FAGR	当代艺术馆	MUBR	慕哈
1-14-0-20-45-16	0	0	40	0
1-14-0-20-46-22	0	0	0	169
2-02-FULL-61-13	0	0	0	27
2-12-FULL-35-15	56	182	435	311

The head of table 2 is:

Genus	FAGR	当代艺术馆	MUBR
1-14-0-20-46-22	0	0	0
2-02-FULL-61-13	0	0	0
21-14-0-10-47-8-A	0	0	0
AAA536-G1	0	0	0

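The expected output file contains the rows of file 1, except those that match the first 2 rows of the second file (the first columns share the strings 1-14-0-20-46-22 and 2-02-FULL-61-13). When the complete files are compared, everything in file 2 must be removed from file 1.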

I have been going through the pandas "Indexing and selecting data" guide, but I still can't find a solution, probably because I'm a newbie.

I tried the posted solution, as follows:

df1 = generagrouped_df
df2['drop_key'] = 'DROP'
output = pd.merge(
    left = df1,
    right = df2,
    how = 'left',
    left_on = ['Genus'],
    right_on = ['Genus']
)
output.drop(output[output['drop_key'] == 'DROP'].index, inplace = True)

The error message is KeyError: 'drop_key' (see below):

KeyError                                  Traceback (most recent call last)
<ipython-input-103-67d27afa824b> in <module>()
----> 1 output.drop(output[output['drop_key'] == 'DROP'].index, inplace = True)


/Users/AnaPaula/opt/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
2925             if self.columns.nlevels > 1:
2926                 return self._getitem_multilevel(key)
-> 2927             indexer = self.columns.get_loc(key)
2928             if is_integer(indexer):
2929                 indexer = [indexer]

/Users/AnaPaula/opt/anaconda2/lib/python2.7/site-packages/pandas/core/indexes/base.pyc in get_loc(self, key, method, tolerance)
2657                 return self._engine.get_loc(key)
2658             except KeyError:
-> 2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))
2660         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2661         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'drop_key'

Can you think of a solution? Thanks, AP

score 0 · Accepted Answer

Try adding a new column to the csv file that holds the keys to drop, then drop the rows that match that condition by index:

import pandas as pd

file1 = pd.read_csv('file_1.csv')
file2 = pd.read_csv('file_2.csv')

# Assign the keyword drop to the file with the strings you're looking
# to drop from your final solution.
file2['drop_key'] = 'DROP'

# Merge the files together
output = pd.merge(
    left = file1,
    right = file2,
    how = 'left',
    left_on = ['str_col'],
    right_on = ['str_col']
)

# Drop the rows that have the keyword 'DROP'
output.drop(output[output['drop_key'] == 'DROP'].index, inplace = True)

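Note that left_on and right_on should be the name of the column that contains the strings you want to match. Those names were not available in the screenshot you provided, so I assumed the name str_col.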
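
As a rough sketch of the same idea on the sample rows from the question, the block below uses Genus as the key column and keeps only the FAGR and MUBR value columns for brevity. Building the right-hand side from the key column alone is a choice made here, not part of the original answer; it keeps the merge from adding suffixed copies of the value columns that both files share. The last two lines show one possible explanation for a KeyError: 'drop_key': if the left frame already carries a drop_key column (an assumption, for example left over from an earlier run), pandas renames both copies to drop_key_x and drop_key_y.

```python
import pandas as pd

# Sample rows from the question, reduced to three columns for brevity.
file1 = pd.DataFrame({
    "Genus": ["1-14-0-20-45-16", "1-14-0-20-46-22",
              "2-02-FULL-61-13", "2-12-FULL-35-15"],
    "FAGR": [0, 0, 0, 56],
    "MUBR": [40, 0, 0, 435],
})
file2 = pd.DataFrame({
    "Genus": ["1-14-0-20-46-22", "2-02-FULL-61-13",
              "21-14-0-10-47-8-A", "AAA536-G1"],
    "FAGR": [0, 0, 0, 0],
    "MUBR": [0, 0, 0, 0],
})

# Use only the key column of file2, so the merge adds nothing to
# file1's columns except drop_key.
keys = file2[["Genus"]].drop_duplicates().assign(drop_key="DROP")

merged = pd.merge(file1, keys, how="left", on="Genus")

# Rows without a match get NaN in drop_key, so they pass the filter.
output = merged[merged["drop_key"] != "DROP"].drop(columns="drop_key")
print(output["Genus"].tolist())

# Hypothetical collision: the left frame already has a drop_key column,
# so both copies are suffixed and a plain "drop_key" lookup would fail.
collided = pd.merge(file1.assign(drop_key="x"), keys, how="left", on="Genus")
print(list(collided.columns))
```

Printing list(output.columns) right after the merge is a quick way to check which of these situations applies to your own data.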

score 0 · Accepted Answer

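I found the solution. Since everything in file 2 has to be removed from file 1, I ran the following command, which specifies only the first column as the one to compare, and it worked: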

df1.loc[pd.merge(df1, df2, on=['Genus'], how='left', indicator=True)['_merge'] == 'left_only']

Thanks for your time! AP
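
For readers who want to try this on the sample rows from the question, here is a minimal self-contained sketch. It assumes the key column is named Genus and keeps only two value columns for brevity. The one-liner above indexes df1 with a mask built from the merged frame, which lines up as long as df1 has its default integer index and the Genus values in df2 are unique; the first variant below filters the merged frame itself, so it does not depend on that. The second variant, using Series.isin, is an alternative that was not part of the original answers and skips the merge entirely.

```python
import pandas as pd

# Sample rows from the question, reduced to three columns for brevity.
df1 = pd.DataFrame({
    "Genus": ["1-14-0-20-45-16", "1-14-0-20-46-22",
              "2-02-FULL-61-13", "2-12-FULL-35-15"],
    "FAGR": [0, 0, 0, 56],
    "MUBR": [40, 0, 0, 435],
})
df2 = pd.DataFrame({
    "Genus": ["1-14-0-20-46-22", "2-02-FULL-61-13",
              "21-14-0-10-47-8-A", "AAA536-G1"],
    "FAGR": [0, 0, 0, 0],
    "MUBR": [0, 0, 0, 0],
})

# Variant 1: left merge on the key column with indicator=True, then keep
# the rows that were found only in df1.
merged = df1.merge(df2[["Genus"]].drop_duplicates(),
                   on="Genus", how="left", indicator=True)
via_merge = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")

# Variant 2: boolean mask, True for rows whose key is absent from df2.
via_isin = df1[~df1["Genus"].isin(df2["Genus"])]

print(via_merge)
print(via_isin)
```

Either result can then be written out with to_csv, passing index=False to leave the row index out of the file.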
