python - Assign one value to a group of rows in pandas

Question

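I have a large DataFrame with three main columns of interest: individual ID, education, and year. I want to create a new variable called education1985 (educ85 in the example) that assigns each individual's 1985 education to all of that individual's rows, regardless of year. I want to do this without a loop because my data is very large. I also want to do it without knowing the individuals' IDs in advance. Below is an example that includes the data I need to create.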

# This is the initial data frame:
import pandas as pd

individual_id = [1,1,1,1,2,2,2,2,2,4,4,4,4,5,5,6,6,6,7,7,8,9,9,9,9,9,9]
education = [1,1,2,3,1,2,3,3,3,2,2,3,4,3,4,4,4,5,1,2,2,1,2,3,3,3,3]
year = [1984,1985,1986,1987,1983,1984,1985,1986,1987,1985,1986,1987,1989,1984,1985,1984,1985,1986,1985,1986,1985,1983,1984,1985,1986,1987,1987]
df = pd.DataFrame()
df["id"] = individual_id
df["education"] = education
df["year"] = year

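# And the desired outcome is to create the variable educ85:
# (the list below is derived from the data above: each individual's
# 1985 education, repeated on every row of that individual)
educ85 = [1,1,1,1,3,3,3,3,3,2,2,2,2,4,4,4,4,4,1,1,2,3,3,3,3,3,3]

df2 = df.copy()
df2["educ85"] = educ85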

Thank you very much!

score 0 · Accepted Answer

Try this:

df = df.join(
    other=df[df.year==1985][['id','education']].set_index('id').rename(
        columns={'education': 'educ85'}
    ), 
    how='left', 
    on='id'
)
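
To make the behavior of the join easier to see, here is a minimal self-contained sketch of the same idea on a small hypothetical frame (the data and the names `educ85` and `joined` are made up for illustration). Individual 3 deliberately has no 1985 row, to show that a left join leaves `NaN` there. The sketch assumes each individual has at most one 1985 row; duplicate 1985 rows would multiply the joined rows.

```python
import pandas as pd

# Small hypothetical frame; id 3 has no 1985 row on purpose.
df = pd.DataFrame({
    "id":        [1, 1, 2, 2, 3],
    "education": [1, 2, 3, 4, 5],
    "year":      [1984, 1985, 1985, 1986, 1986],
})

# One row per id holding that id's 1985 education, indexed by id.
educ85 = (
    df.loc[df["year"] == 1985, ["id", "education"]]
      .set_index("id")
      .rename(columns={"education": "educ85"})
)

# The left join matches df's "id" column against educ85's index,
# so every row of an individual receives that individual's 1985 value.
joined = df.join(educ85, on="id", how="left")
print(joined)
```

Because of the missing value for individual 3, the `educ85` column in this sketch is stored as floats rather than integers.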

score 0 · Accepted Answer

I tend to prefer relatively simple approaches.

Initial data setup.

import pandas as pd

individual_id = [1,1,1,1,2,2,2,2,2,4,4,4,4,5,5,6,6,6,7,7,8,9,9,9,9,9,9]
education = [1,1,2,3,1,2,3,3,3,2,2,3,4,3,4,4,4,5,1,2,2,1,2,3,3,3,3]
year = [1984,1985,1986,1987,1983,1984,1985,1986,1987,1985,1986,1987,1989,1984,1985,1984,1985,1986,1985,1986,1985,1983,1984,1985,1986,1987,1987]
df = pd.DataFrame()
df["id"] = individual_id
df["education"] = education
df["year"] = year

Filter the data down to 1985, then build a dictionary that maps each ID to its education.

d1985 = df[df.year == 1985]
d1985 = dict(zip(d1985.id, d1985.education))

Then create the new DataFrame:

df2 = df.copy()
df2['educ85'] = df2.id.replace(d1985)
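
One caveat with `replace`: it leaves values unchanged when the dictionary has no entry for them, so an individual with no 1985 row would keep their raw ID in `educ85`. A possible variant, sketched here on a small hypothetical frame (the data and the name `lookup` are made up for illustration), uses `Series.map`, which returns `NaN` for IDs missing from the lookup. It assumes at most one 1985 row per individual.

```python
import pandas as pd

# Small hypothetical frame; id 3 has no 1985 row on purpose.
df = pd.DataFrame({
    "id":        [1, 1, 2, 2, 3],
    "education": [1, 2, 3, 4, 5],
    "year":      [1984, 1985, 1985, 1986, 1986],
})

# Lookup Series: index is the id, values are the 1985 education.
lookup = df.loc[df["year"] == 1985].set_index("id")["education"]

df2 = df.copy()
# map() looks each id up in the Series; ids absent from it become NaN.
df2["educ85"] = df2["id"].map(lookup)
print(df2)
```

Whether `NaN` or the untouched ID is the better fallback depends on the data; `NaN` at least makes the missing 1985 observation visible.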
