
My pandas DataFrame has a column of country names. I want to apply different filters to that column using if-else conditions, and add a new column to the DataFrame based on those conditions.

Current DataFrame:

Company Country
BV 	Denmark
BV 	Sweden
DC 	Norway
BV 	Germany
BV 	France
DC 	Croatia
BV 	Italy
DC 	Germany
BV 	Austria
BV 	Spain

I have tried the following, but with this approach I have to spell out the countries again and again.

bookings_d2.loc[(bookings_d2.Country == 'Denmark') | (bookings_d2.Country == 'Norway'), 'Country'] = bookings_d2.Country

In R, I am currently using if-else conditions like the ones below, and I want to implement the same thing in Python.

R code example 1: ifelse(bookings_d2$COUNTRY_NAME %in% c('Denmark','Germany','Norway','Sweden','France','Italy','Spain','Germany','Austria','Netherlands','Croatia','Belgium'), as.character(bookings_d2$COUNTRY_NAME), 'Others')

R code example 2: ifelse(bookings_d2$country %in% c('Germany'), ifelse(bookings_d2$BOOKING_BRAND %in% c('BV'), 'Germany_BV', 'Germany_DC'), bookings_d2$country)

Expected DataFrame:

Company Country
BV 	Denmark
BV 	Sweden
DC 	Norway
BV 	Germany_BV
BV 	France
DC 	Croatia
BV 	Italy
DC 	Germany_DC
BV 	Others
BV 	Others


3 Answers


I'm not sure exactly what you are trying to achieve, but I guess it is something like this:

import pandas as pd

df = pd.DataFrame({'country': ['Sweden', 'Spain', 'China', 'Japan'], 'continent': [None] * 4})

  country continent
0  Sweden      None
1   Spain      None
2   China      None
3   Japan      None


df.loc[(df.country=='Sweden') | ( df.country=='Spain'), 'continent'] = "Europe"
df.loc[(df.country=='China') | ( df.country=='Japan'), 'continent'] = "Asia"

  country continent
0  Sweden    Europe
1   Spain    Europe
2   China      Asia
3   Japan      Asia
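
When many countries need a label, one `.loc` line per group gets repetitive. One possible alternative is a dictionary lookup with `Series.map`. This is only a sketch: the `continent_of` dictionary, the extra 'Brazil' row, and the 'Other' fallback are illustrative assumptions and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'country': ['Sweden', 'Spain', 'China', 'Japan', 'Brazil']})

# Hypothetical lookup table (an assumption for illustration); extend as needed.
continent_of = {
    'Sweden': 'Europe',
    'Spain': 'Europe',
    'China': 'Asia',
    'Japan': 'Asia',
}

# map() yields NaN for countries missing from the dict; fillna() supplies a default.
df['continent'] = df['country'].map(continent_of).fillna('Other')
print(df)
```

Each country is then listed once, in the dictionary, instead of once per condition.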

You can also use a Python list comprehension, for example:

df['continent'] = ["Europe" if (x == "Sweden" or x == "Denmark") else "Other" for x in df.country]
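
The same list-comprehension idea might be extended to the nested `ifelse` from the question by iterating over two columns at once with `zip`. A minimal sketch, assuming the column names `Company` and `Country` from the question; the helper `label` and the four sample rows are hypothetical names and data introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Company': ['BV', 'DC', 'BV', 'DC'],
    'Country': ['Denmark', 'Germany', 'Germany', 'Norway'],
})

def label(company, country):
    """Mirror the nested R ifelse: only 'Germany' rows get a company suffix."""
    if country != 'Germany':
        return country
    return 'Germany_BV' if company == 'BV' else 'Germany_DC'

# zip() pairs each row's company with its country.
df['Country'] = [label(c, k) for c, k in zip(df['Company'], df['Country'])]
print(df)
```

This runs in a Python-level loop, so on large frames the vectorized approaches are likely faster.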
answered 2019-08-23T04:50:27.773

You can use:

Example 1: Use Series.isin together with numpy.where, or with loc, in which case the mask has to be inverted with ~

# Austria and Spain removed from the list so that they become 'Others'
L = ['Denmark','Germany','Norway','Sweden','France','Italy',
     'Germany','Netherlands','Croatia','Belgium']

df['Country'] = np.where(df['Country'].isin(L), df['Country'], 'Others')

Alternative:

df.loc[~df['Country'].isin(L), 'Country'] ='Others'
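
For reference, here is a self-contained sketch of Example 1 run against the data from the question. It uses `Series.where`, a pandas method that keeps values where the condition is True and substitutes the second argument elsewhere, as a third way to write the same step. The variable name `keep` is an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Company': ['BV', 'BV', 'DC', 'BV', 'BV', 'DC', 'BV', 'DC', 'BV', 'BV'],
    'Country': ['Denmark', 'Sweden', 'Norway', 'Germany', 'France',
                'Croatia', 'Italy', 'Germany', 'Austria', 'Spain'],
})

# Countries to keep as-is (Austria and Spain are deliberately left out).
keep = ['Denmark', 'Germany', 'Norway', 'Sweden', 'France', 'Italy',
        'Netherlands', 'Croatia', 'Belgium']

# Keep the value where isin() is True, otherwise replace it with 'Others'.
df['Country'] = df['Country'].where(df['Country'].isin(keep), 'Others')
print(df)
```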

Example 2: Use numpy.select or nested np.where

m1 = df['Country'] == 'Germany'
m2 = df['Company'] == 'BV'
df['Country'] = np.select([m1 & m2, m1 & ~m2],['Germany_BV','Germany_DC'], df['Country'])

Alternative:

df['Country'] = np.where(~m1, df['Country'],
                np.where(m2, 'Germany_BV','Germany_DC'))
print (df)
  Company     Country
0      BV     Denmark
1      BV      Sweden
2      DC      Norway
3      BV  Germany_BV
4      BV      France
5      DC     Croatia
6      BV       Italy
7      DC  Germany_DC
8      BV      Others
9      BV      Others
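
Both examples could also be folded into a single `numpy.select` call. The following is a sketch under that assumption, not the answerer's exact code; the function name `relabel_countries` and the parameter `keep` are hypothetical.

```python
import numpy as np
import pandas as pd

def relabel_countries(df, keep):
    """Return a copy of df with 'Country' relabelled in one np.select call."""
    is_germany = df['Country'] == 'Germany'
    is_bv = df['Company'] == 'BV'
    in_keep = df['Country'].isin(keep)

    out = df.copy()
    # np.select takes the first condition that is True for each row,
    # and falls back to `default` when none is.
    out['Country'] = np.select(
        [is_germany & is_bv, is_germany & ~is_bv, ~in_keep],
        ['Germany_BV', 'Germany_DC', 'Others'],
        default=df['Country'],
    )
    return out

df = pd.DataFrame({
    'Company': ['BV', 'BV', 'DC', 'BV', 'BV', 'DC', 'BV', 'DC', 'BV', 'BV'],
    'Country': ['Denmark', 'Sweden', 'Norway', 'Germany', 'France',
                'Croatia', 'Italy', 'Germany', 'Austria', 'Spain'],
})
keep = ['Denmark', 'Germany', 'Norway', 'Sweden', 'France', 'Italy',
        'Netherlands', 'Croatia', 'Belgium']

result = relabel_countries(df, keep)
print(result)
```

Because the Germany conditions come first in the list, the order of the two steps no longer matters.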
answered 2019-08-23T05:45:49.647

You can do it like this:

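country_others = ['Poland', 'Switzerland']

# Append the company to 'Germany' rows, giving e.g. 'Germany_BV'
is_germany = df['Country'] == 'Germany'
df.loc[is_germany, 'Country'] = 'Germany_' + df.loc[is_germany, 'Company']
# Label DC rows whose country is in country_others as 'Others'
df.loc[(df['Company'] == 'DC') & (df['Country'].isin(country_others)), 'Country'] = 'Others'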
answered 2019-08-23T04:58:28.183