python - Creating new DataFrames from a condition

Question

I'm trying to create 2 separate DataFrames from the following code:

import pandas as pd

sport = ('basketball','volleyball','football')
science = ('biology','chemistry','physics')

sportdf = pd.DataFrame(columns = ['Name','Interest'])
sciencedf = pd.DataFrame(columns = ['Name','Interest'])

data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']] 
  
df = pd.DataFrame(data, columns = ['Name', 'Interest'])

s = []
q = []
for i in range(len(df)):
    if df.loc[i,"Interest"] in sport:
        s.append(df.loc[i,"Name"])
        s.append(df.loc[i,"Interest"])
        df_length = len(s)
        sportdf.loc[df_length] = s
        print(df)
    else:
        q.append(df.loc[i,"Name"])
        q.append(df.loc[i,"Interest"])
        df_length = len(q)
        #sciencedf.loc[df_length] = q

The expected output is that sportdf has a single row with 'tom' and 'volleyball', while sciencedf has two rows: 'nick', 'chemistry' and 'juli', 'physics'.

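However, with the code above I managed to create sportdf but not sciencedf, because the list q ends up as ['nick','chemistry','juli','physics'], which is four values for a two-column row. I could split it some other way and then add it, but I feel I'm making this 100 times harder than it really is. To summarize: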

for every row in df:
    if the cell of the 'Interest' column is in the sport tuple:
        add the row to the sportdf
    if it is not (else):
        add the row to the sciencedf

score 1 · Accepted Answer

The pandas isin function is the solution: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html

The code below should help you:

import pandas as pd

sport = ('basketball','volleyball','football')
science = ('biology','chemistry','physics')
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']] 
df = pd.DataFrame(data, columns = ['Name', 'Interest'])

# just two lines, each filtering with an isin condition
sciencedf = df.loc[df['Interest'].isin(science)]
sportdf = df.loc[df['Interest'].isin(sport)]

print(sciencedf)
print(sportdf)

Output:

   Name   Interest
1  nick  chemistry
2  juli    physics

  Name    Interest
0  tom  volleyball
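
If every interest that is not a sport should end up in sciencedf, as the else branch in the question suggests, the same mask can be negated with ~ instead of testing against a second tuple. A minimal sketch, assuming that reading of the question; the name is_sport, the .copy() calls, and reset_index(drop=True) are additions for illustration and are not part of the answer above.

```python
import pandas as pd

sport = ('basketball', 'volleyball', 'football')
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']]
df = pd.DataFrame(data, columns=['Name', 'Interest'])

# Boolean Series: True where the interest is one of the sports
is_sport = df['Interest'].isin(sport)

# .copy() makes each result independent of df;
# reset_index(drop=True) renumbers the rows from 0 (optional)
sportdf = df[is_sport].copy().reset_index(drop=True)
sciencedf = df[~is_sport].copy().reset_index(drop=True)

print(sportdf)
print(sciencedf)
```

Without reset_index, the filtered frames keep the row labels they had in df (0 for tom, 1 and 2 for nick and juli).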

score 0 · Accepted Answer

Starting from your data:

import pandas as pd

sport = ('basketball','volleyball','football')
science = ('biology','chemistry','physics')

data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']]

Then you can use list comprehensions with an if clause to build the data each DataFrame needs:

  
sportdata = [ [name, interest] for name, interest in data if interest in sport]
sciencedata = [ [name, interest] for name, interest in data if interest in science]

After that, you can build each DataFrame as usual:

sportdf = pd.DataFrame(sportdata, columns = ['Name', 'Interest'])
sciencedf = pd.DataFrame(sciencedata, columns = ['Name', 'Interest'])
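
If the rows only exist as a DataFrame rather than as the original data list, the same comprehensions can run over the DataFrame's rows. This is a rough sketch under that assumption; the intermediate name rows is made up for illustration.

```python
import pandas as pd

sport = ('basketball', 'volleyball', 'football')
science = ('biology', 'chemistry', 'physics')
df = pd.DataFrame(
    [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']],
    columns=['Name', 'Interest'],
)

# Turn the DataFrame back into a list of [name, interest] rows
rows = df[['Name', 'Interest']].values.tolist()

# Same comprehensions as above, reading from rows instead of data
sportdata = [[name, interest] for name, interest in rows if interest in sport]
sciencedata = [[name, interest] for name, interest in rows if interest in science]

sportdf = pd.DataFrame(sportdata, columns=['Name', 'Interest'])
sciencedf = pd.DataFrame(sciencedata, columns=['Name', 'Interest'])
```

Because the frames are rebuilt from plain lists, each one gets a fresh index starting at 0 instead of keeping the row labels from df.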

score 0 · Accepted Answer

You can use the .query method. The following solution was tested with Python 3.7. I think this solution is cleaner.

import pandas as pd
sport = ('basketball','volleyball','football')
science = ('biology','chemistry','physics')

sportdf = pd.DataFrame(columns = ['Name','Interest'])
sciencedf = pd.DataFrame(columns = ['Name','Interest'])

data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']] 
  
df = pd.DataFrame(data, columns = ['Name', 'Interest'])

# Only two lines
sportdf = df.query(f"Interest == {sport}")
sciencedf = df.query(f"Interest == {science}")

print(sportdf)
print(sciencedf)

Output:

  Name    Interest
0  tom  volleyball

   Name   Interest
1  nick  chemistry
2  juli    physics
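
The f-string above pastes the tuple's text into the query string. As an alternative sketch, query can also refer to a Python variable from the calling scope by prefixing its name with @, which avoids building the expression by hand. This variant is an illustration and was not part of the tested answer above.

```python
import pandas as pd

sport = ('basketball', 'volleyball', 'football')
science = ('biology', 'chemistry', 'physics')
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']]
df = pd.DataFrame(data, columns=['Name', 'Interest'])

# @sport and @science refer to the Python variables defined above
sportdf = df.query("Interest in @sport")
sciencedf = df.query("Interest in @science")

print(sportdf)
print(sciencedf)
```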
