
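I have a pandas DataFrame and a pandas Series of identifiers, and I want to filter the DataFrame down to the rows whose identifiers appear in the Series. To get a row's identifier, I need to concatenate the DataFrame's first two columns. I've tried various filtering approaches, but so far none of them seems to work. Here's what I've tried: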

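1) I tried adding a column of booleans to the DataFrame that is True if the row corresponds to one of the identifiers and False otherwise (hoping to use the new column for filtering afterwards):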

df["isInAcids"] = (df["AcNo"] + df["Sortcode"]) in acids

where acids is the Series containing the identifiers.

However, this gives me:

TypeError: unhashable type
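
A likely explanation, sketched with made-up data: for a pandas Series, `x in s` checks whether `x` is one of the Series' index labels, and that lookup has to hash `x`. A Series is not hashable, hence the TypeError. The element-wise membership test is `Series.isin`, which can build the boolean column directly. The sample values below are hypothetical stand-ins for the question's data.

```python
import pandas as pd

# Hypothetical stand-ins for the question's df and acids.
df = pd.DataFrame({"AcNo": ["111", "222"], "Sortcode": ["01", "02"]})
acids = pd.Series(["11101", "99999"])

combined = df["AcNo"] + df["Sortcode"]  # a Series, one identifier per row

# "combined in acids" tries to hash the whole Series -> TypeError.
try:
    combined in acids
except TypeError as exc:
    print("TypeError:", exc)

# Element-wise test: is each row's identifier among the values of acids?
df["isInAcids"] = combined.isin(acids)
print(df)
```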

2) I tried filtering with apply:

df[df.apply(lambda x: x["AcNo"] + x["Sortcode"] in acids, axis = 1)]

This doesn't raise an error, but the length of the DataFrame stays the same, so it doesn't seem to filter anything.
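
Inside `apply`, each row produces a plain string, so `in acids` no longer raises. It still tests the index labels of `acids` (0, 1, 2, ...) rather than its values, though, so it will not find the identifiers. A minimal sketch with hypothetical data, converting the values to a set first:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"AcNo": ["111", "222"], "Sortcode": ["01", "02"]})
acids = pd.Series(["11101", "99999"])

print("11101" in acids)       # False: looks among the index labels 0 and 1
print("11101" in set(acids))  # True: looks among the values

acid_set = set(acids)  # build once; each row then does a cheap set lookup
mask = df.apply(lambda row: row["AcNo"] + row["Sortcode"] in acid_set, axis=1)
print(df[mask])
```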

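3) I added a new column containing the concatenated strings/identifiers and then tried to filter on it (see "Filter dataframe rows if value in column is in a set list of values"):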

df["ACIDS"] = df["AcNo"] + df["Sortcode"]
df[df["ACIDS"].isin(acids)]

But again, the DataFrame doesn't change.

I hope this makes sense...

Any suggestions as to where I might be going wrong? Thanks, Anne


I think you're asking for something like the following:

In [1]: other_ids = pd.Series(['a', 'b', 'c', 'c'])

In [2]: df = pd.DataFrame({'ids': ['a', 'b', 'c', 'f'], 'vals': [1, 2, 3, 4]})

In [3]: df
Out[3]: 
  ids  vals
0   a     1
1   b     2
2   c     3
3   f     4

In [4]: other_ids
Out[4]: 
0    a
1    b
2    c
3    c
dtype: object

In this case, the Series other_ids plays the role of your Series acids. We want to select just the rows of df whose id is in other_ids. To do that we'll use the Series method .isin():

In [5]: df.ids.isin(other_ids)
Out[5]: 
0     True
1     True
2     True
3    False
Name: ids, dtype: bool

This gives a boolean Series that we can use as a mask to index into df:

In [6]: df[df.ids.isin(other_ids)]
Out[6]: 
  ids  vals
0   a     1
1   b     2
2   c     3

This is essentially your third attempt. If it still doesn't work for you, post a sample of your DataFrame and I'll update this answer.
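
Two possible pitfalls, offered as assumptions since the question doesn't show the full code or the column dtypes. First, `df[mask]` returns a new DataFrame and leaves `df` itself untouched, so the result has to be assigned (or printed) to see the filtering. Second, if AcNo and Sortcode are numeric, `+` adds the numbers instead of concatenating them, so nothing can match a string identifier (and a numeric sort code would also have lost any leading zeros, so reading these columns as strings, e.g. with `dtype=str` in `read_csv`, may be safer). A minimal sketch with made-up values:

```python
import pandas as pd

# Hypothetical sample data: AcNo and Sortcode are the question's column names,
# but the values (and their numeric dtype) are made up.
df = pd.DataFrame({
    "AcNo": [12345678, 87654321, 11112222],
    "Sortcode": [401234, 209876, 301122],
})
acids = pd.Series(["12345678401234", "11112222301122"])

# With numeric columns, "+" adds the numbers instead of joining them,
# so none of the results can equal a string identifier in acids.
summed = df["AcNo"] + df["Sortcode"]
print(summed.isin(acids).any())  # False

# Cast both parts to str before concatenating, then test membership.
key = df["AcNo"].astype(str) + df["Sortcode"].astype(str)

# Boolean indexing returns a new object; df itself still has all its rows.
filtered = df[key.isin(acids)]
print(len(df), len(filtered))  # 3 2
print(filtered)
```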

Reading your question again, maybe the trouble is that your ids are in two columns of df? A DataFrame doesn't have an isin method (at least in the pandas version this answer was written against), but we can get around that with something like:

In [26]: df = pd.DataFrame({'ids': ['a', 'b', 'f', 'f'], 'ids2': ['e', 'f', 'c', 'f'], 'vals': [1, 2, 3, 4]})

In [27]: df
Out[27]: 
  ids ids2  vals
0   a    e     1
1   b    f     2
2   f    c     3
3   f    f     4

In [28]: df.ids.isin(other_ids) | df.ids2.isin(other_ids)
Out[28]: 
0     True
1     True
2     True
3    False
dtype: bool

The element-wise OR operator | combines the two boolean Series from the two isin() calls, giving True wherever either column's id is in other_ids. Then, as before, we can index with this boolean Series:

In [29]: new = df.loc[df.ids.isin(other_ids) | df.ids2.isin(other_ids)]

In [30]: new
Out[30]: 
  ids ids2  vals
0   a    e     1
1   b    f     2
2   f    c     3
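
In current pandas versions `DataFrame.isin` does exist, so the two-column check can also be written by testing both columns at once and reducing each row with `any(axis=1)`. A minimal sketch reusing the example data above, assuming a recent pandas; note that it passes a plain list, because `DataFrame.isin` treats a Series argument as index-aligned values to compare row by row rather than as a set of allowed values.

```python
import pandas as pd

other_ids = pd.Series(['a', 'b', 'c', 'c'])
df = pd.DataFrame({'ids': ['a', 'b', 'f', 'f'],
                   'ids2': ['e', 'f', 'c', 'f'],
                   'vals': [1, 2, 3, 4]})

# Boolean DataFrame: True where a cell's value appears in other_ids.
# A list is passed on purpose; a Series would be aligned on its index.
hits = df[['ids', 'ids2']].isin(other_ids.tolist())

# Keep a row if either id column matched (row-wise OR).
new = df[hits.any(axis=1)]
print(new)
```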
answered 2013-07-11T15:28:22.930