python - How to use a regex or endswith() when conditionally subsetting a pandas df column?

Question

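I want to use .endswith() or a regular expression when conditionally subsetting my dataframe on the Sender name column.

The dataframe df has two columns, Sender email and Sender name, which I use to define a subsetting rule that selects all mail from a specific shop and a specific email address of that shop: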

df = df[(df["Sender name"]=="Shop_name") & (df["Sender email"]=="reply@shop.com")]

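But then I found that there are also mails from buy@shop.com, noreply@shop.com, and so on. Is there a way to neatly cover all of these mailboxes with something like *@shop.com in the second condition?
I tried using endswith(), but could not figure out how to make it work on a Series object. I suppose I could first build a list of all the addresses in the column and then check whether the sender's address is in it with pd.Series.isin. But maybe there is something more elegant?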

score 2 · Accepted Answer

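Use Series.str.endswith, or Series.str.contains with a regex. In the regex, $ anchors the match at the end of the string, and . is escaped as \. because an unescaped . is a special regex character that matches any character: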

df1 = df[(df["Sender name"]=="Shop_name") & (df["Sender email"].str.endswith("@shop.com"))]

Or:

df1 = df[(df["Sender name"]=="Shop_name") & (df["Sender email"].str.contains(r"@shop\.com$"))]
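To make the difference between the two variants concrete, here is a minimal sketch on a made-up Series (the addresses and the `na=False` argument are illustrative assumptions, not part of the question). It shows that `str.endswith` tests a literal suffix, while `str.contains` interprets its pattern as a regex, so the dot should be escaped:

```python
import pandas as pd

# Hypothetical sample addresses, including a near miss and a missing value.
emails = pd.Series(
    ["reply@shop.com", "buy@shop.com", "x@shopXcom", "reply@shop.com.au", None]
)

# Literal suffix test, no regex involved.
# na=False treats missing values as non-matches instead of propagating NaN.
literal = emails.str.endswith("@shop.com", na=False)

# Regex with the dot escaped and the match anchored at the end of the string.
escaped = emails.str.contains(r"@shop\.com$", na=False)

# Unescaped dot matches any character, so "x@shopXcom" also matches.
unescaped = emails.str.contains("@shop.com$", na=False)

print(literal.tolist())
print(escaped.tolist())
print(unescaped.tolist())
```

Passing `na=False` is worth considering whenever the column may contain missing values, because a mask containing NaN cannot be used directly for boolean indexing.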

score 1 · Accepted Answer

Using `.query`

Since pandas >= 0.25.0 we can use the pandas .query method (with .eq and str.endswith) and use backticks (`) to refer to column names that contain spaces:

df.query('`Sender name`.eq("Shop_name") & `Sender email`.str.endswith("@shop.com")')

Output

       Sender email Sender name
2    reply@shop.com   Shop_name
3      buy@shop.com   Shop_name
4  noreply@shop.com   Shop_name

Example dataframe used:

# Example dataframe
import pandas as pd

df = pd.DataFrame({'Sender email':['ex@example.com', 'ex2@example.com', "reply@shop.com", "buy@shop.com", "noreply@shop.com"],
                   'Sender name': ['example', 'example', 'Shop_name', 'Shop_name', 'Shop_name']})

       Sender email Sender name
0    ex@example.com     example
1   ex2@example.com     example
2    reply@shop.com   Shop_name
3      buy@shop.com   Shop_name
4  noreply@shop.com   Shop_name
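As a rough check that the `.query` form selects the same rows as a plain boolean mask, the sketch below runs both on the example dataframe. Passing `engine="python"` is an assumption added here, not part of the answer: it avoids depending on how the optional numexpr engine handles `.str` method calls.

```python
import pandas as pd

df = pd.DataFrame({
    "Sender email": ["ex@example.com", "ex2@example.com",
                     "reply@shop.com", "buy@shop.com", "noreply@shop.com"],
    "Sender name": ["example", "example", "Shop_name", "Shop_name", "Shop_name"],
})

# Boolean-mask version: both conditions combined with &.
mask = (df["Sender name"] == "Shop_name") & df["Sender email"].str.endswith("@shop.com")
by_mask = df[mask]

# .query version: backticks quote column names that contain spaces.
# engine="python" is an assumption made for this sketch (see the note above).
by_query = df.query(
    '`Sender name`.eq("Shop_name") & `Sender email`.str.endswith("@shop.com")',
    engine="python",
)

# Compare the two selections row for row.
print(by_mask.equals(by_query))
```

Which form to prefer is mostly a matter of taste: the mask is easier to debug step by step, while `.query` keeps long conditions on one line.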
