54

this is a rather similar question to this question but with one key difference: I'm selecting the data I want to change not by its index but by some criteria.

If the criteria I apply return a single row, I'd expect to be able to set the value of a certain column in that row in an easy way, but my first attempt doesn't work:

>>> d = pd.DataFrame({'year':[2008,2008,2008,2008,2009,2009,2009,2009], 
...                   'flavour':['strawberry','strawberry','banana','banana',
...                   'strawberry','strawberry','banana','banana'],
...                   'day':['sat','sun','sat','sun','sat','sun','sat','sun'],
...                   'sales':[10,12,22,23,11,13,23,24]})

>>> d
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     12  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana     24  2009

>>> d[d.sales==24]
   day flavour  sales  year
7  sun  banana     24  2009

>>> d[d.sales==24].sales = 100
>>> d
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     12  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana     24  2009

So rather than setting 2009 Sunday's Banana sales to 100, nothing happens! What's the nicest way to do this? Ideally the solution should use the row number, as you normally don't know that in advance!

4

3 回答 3

79

Many ways to do that

1

In [7]: d.sales[d.sales==24] = 100

In [8]: d
Out[8]: 
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     12  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana    100  2009

2

In [26]: d.loc[d.sales == 12, 'sales'] = 99

In [27]: d
Out[27]: 
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     99  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana    100  2009

3

In [28]: d.sales = d.sales.replace(23, 24)

In [29]: d
Out[29]: 
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     99  2008
2  sat      banana     22  2008
3  sun      banana     24  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     24  2009
7  sun      banana    100  2009
于 2013-07-18T17:15:28.100 回答
14

Not sure about older version of pandas, but in 0.16 the value of a particular cell can be set based on multiple column values.

Extending the answer provided by @waitingkuo, the same operation can also be done based on values of multiple columns.

d.loc[(d.day== 'sun') & (d.flavour== 'banana') & (d.year== 2009),'sales'] = 100
于 2015-05-13T07:48:46.753 回答
5

Old question, but I'm surprised nobody mentioned numpy's .where() functionality (which can be called directly from the pandas module).

In this case the code would be:

d.sales = pd.np.where(d.sales == 24, 100, d.sales)

To my knowledge, this is one of the fastest ways to conditionally change data across a series.

于 2019-01-22T23:13:36.833 回答