I am trying to create 2 separate data frames from the following code:

import pandas as pd

sport = ('basketball','volleyball','football')
science = ('biology','chemistry','physics')

sportdf = pd.DataFrame(columns = ['Name','Interest'])
sciencedf = pd.DataFrame(columns = ['Name','Interest'])

data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']] 
  
df = pd.DataFrame(data, columns = ['Name', 'Interest'])

s = []
q = []
for i in range(len(df)):
    if df.loc[i,"Interest"] in sport:
        s.append(df.loc[i,"Name"])
        s.append(df.loc[i,"Interest"])
        df_length = len(s)
        sportdf.loc[df_length] = s
        print(df)
    else:
        q.append(df.loc[i,"Name"])
        q.append(df.loc[i,"Interest"])
        df_length = len(q)
        #sciencedf.loc[df_length] = q 

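The expected output is that the sportdf data frame has one row with "tom" and "volleyball", while sciencedf has "nick" with "chemistry" and "juli" with "physics".

However, with the code above I managed to create sportdf but not sciencedf, because the list q ends up as ['nick', 'chemistry', 'juli', 'physics']. I could split it some other way and then add it, but I feel like I am making this 100 times harder than it actually is. To summarize: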

for every row in df:
    if the cell of the 'Interest' column is in the sport tuple:
        add the row to the sportdf
    if it is not (elif):
        add the row to the sciencedf

3 Answers

The pandas isin function is the solution: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html

The code below should help:

import pandas as pd

sport = ('basketball','volleyball','football')
science = ('biology','chemistry','physics')
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']] 
df = pd.DataFrame(data, columns = ['Name', 'Interest'])

# Two lines are enough: filter rows with an isin condition on the Interest column
sciencedf = df.loc[df['Interest'].isin(science)]
sportdf = df.loc[df['Interest'].isin(sport)]

print(sciencedf)
print(sportdf)

Output:

   Name   Interest
1  nick  chemistry
2  juli    physics

  Name    Interest
0  tom  volleyball
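
As a variation on the isin approach, the question's if/else logic can be sketched with a single boolean mask and its negation. This is only a minimal sketch: it assumes every interest that is not in the sport tuple belongs in sciencedf, and the name is_sport and the reset_index(drop=True) calls are additions that are not part of the answer above.

```python
import pandas as pd

sport = ('basketball', 'volleyball', 'football')
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']]
df = pd.DataFrame(data, columns=['Name', 'Interest'])

# Boolean Series: True where Interest is one of the sport values
is_sport = df['Interest'].isin(sport)

# Rows matching the mask go to sportdf
sportdf = df[is_sport].reset_index(drop=True)
# Assumption: everything else is treated as science (the "else" branch)
sciencedf = df[~is_sport].reset_index(drop=True)

print(sportdf)
print(sciencedf)
```

Resetting the index is optional. Without it, each frame keeps the row labels of the original df, as in the output shown above.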
Answered 2020-10-16T01:41:52.183

Using your data:

import pandas as pd

sport = ('basketball','volleyball','football')
science = ('biology','chemistry','physics')

data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']] 

Then you can use list comprehensions with an if clause to build the data needed for each data frame:

  
sportdata = [ [name, interest] for name, interest in data if interest in sport]
sciencedata = [ [name, interest] for name, interest in data if interest in science]
 

After that, you can build each data frame as usual:

sportdf = pd.DataFrame(sportdata, columns = ['Name', 'Interest'])
sciencedf = pd.DataFrame(sciencedata, columns = ['Name', 'Interest'])
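
If an explicit loop like the one in the question is preferred, the same idea can be sketched as follows. The loop in the question fails because s and q are flat lists that keep growing across iterations, so q reaches four values for a two-column frame. Building a fresh list per row avoids that. The names sport_rows and science_rows are hypothetical, and the else branch assumes that anything that is not a sport counts as science.

```python
import pandas as pd

sport = ('basketball', 'volleyball', 'football')
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']]
df = pd.DataFrame(data, columns=['Name', 'Interest'])

sport_rows = []    # one [name, interest] list per sport row
science_rows = []  # one [name, interest] list per remaining row

for name, interest in df.itertuples(index=False):
    row = [name, interest]  # a fresh list for every row, never reused
    if interest in sport:
        sport_rows.append(row)
    else:
        # Assumption: any interest that is not a sport is science
        science_rows.append(row)

# Build each frame once, after the loop
sportdf = pd.DataFrame(sport_rows, columns=df.columns)
sciencedf = pd.DataFrame(science_rows, columns=df.columns)

print(sportdf)
print(sciencedf)
```

Collecting rows in plain lists and constructing each frame once at the end is usually cheaper than assigning into a data frame row by row inside the loop.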
Answered 2020-10-16T01:35:10.617

You can use the .query method. The following solution was tested with Python 3.7. I think this solution is clearer.

import pandas as pd
sport = ('basketball','volleyball','football')
science = ('biology','chemistry','physics')

data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']] 
  
df = pd.DataFrame(data, columns = ['Name', 'Interest'])

# Only two lines
sportdf = df.query(f"Interest == {sport}")
sciencedf = df.query(f"Interest == {science}")

print(sportdf)
print(sciencedf)

Output:

  Name    Interest
0  tom  volleyball

   Name   Interest
1  nick  chemistry
2  juli    physics
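
A possible alternative to interpolating the tuple into the query string is to reference Python variables from inside the expression with the @ prefix, which DataFrame.query supports. This is a minimal sketch, not the answer's own code: the names sport_list and science_list are hypothetical, and the tuples are converted to lists only as a precaution.

```python
import pandas as pd

sport = ('basketball', 'volleyball', 'football')
science = ('biology', 'chemistry', 'physics')
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']]
df = pd.DataFrame(data, columns=['Name', 'Interest'])

# Hypothetical helper names; lists are used as a precaution
sport_list = list(sport)
science_list = list(science)

# "@name" inside the expression refers to a Python variable in scope
sportdf = df.query("Interest in @sport_list")
sciencedf = df.query("Interest in @science_list")

print(sportdf)
print(sciencedf)
```

Keeping the values out of the string avoids depending on how the tuple happens to be rendered by the f-string.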
Answered 2020-10-16T01:58:54.547