
I would like to take a slice of a dataframe and then set the value of the first item in that slice, without copying the dataframe. For example:

import numpy
import pandas
df = pandas.DataFrame(numpy.random.rand(3,1))
df[df[0]>0][0] = 0

The mask here is irrelevant and only for the example; it selects the whole dataframe again. The point is that doing it as in the example raises a SettingWithCopyWarning (understandably). I have also tried slicing first and then using iloc/ix/loc, and using iloc twice, i.e. something like:

df.iloc[df[0]>0,:][0] = 0
df[df[0]>0,:].iloc[0] = 0

Neither of these works. Again, I don't want to make a copy of the dataframe, even if it is just the sliced version.

EDIT: It seems there are two ways: using a mask or idxmax. The idxmax method works if your index is unique, and the mask method if it is not. In my case the index is not unique, which I forgot to mention in the initial post.
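
For reference, here is a minimal sketch of why the chained form leaves df untouched while a single .loc call writes in place. The fixed values and the 0.5 threshold stand in for the random data above and are assumptions made for reproducibility. Note that the .loc call sets every matching row, not only the first one.

```python
import warnings

import numpy as np
import pandas as pd

# Fixed values instead of random ones, so the result is reproducible.
df = pd.DataFrame(np.array([[0.2], [0.7], [0.9]]))

with warnings.catch_warnings():
    # Silence the chained-assignment warning for this demonstration only.
    warnings.simplefilter("ignore")
    # df[df[0] > 0.5] builds a new object; the assignment lands on that
    # temporary object and is thrown away.
    df[df[0] > 0.5][0] = 0

chained_result = df[0].tolist()      # still the original values

# One indexing call with a row mask and a column label writes into df itself.
df.loc[df[0] > 0.5, 0] = 0
single_step_result = df[0].tolist()  # both rows above 0.5 are now 0.0
```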

4 Answers

I think you can use idxmax to get the index of the first True value and then set the value with loc:

np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)))
print (df)
   0
0  1
1  3
2  0
3  0
4  3

print ((df[0] == 0).idxmax())
2

df.loc[(df[0] == 0).idxmax(), 0] = 100
print (df)
     0
0    1
1    3
2  100
3    0
4    3

df.loc[(df[0] == 3).idxmax(), 0] = 200
print (df)
     0
0    1
1  200
2  100
3    0
4    3
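
One caveat with idxmax: on a mask that contains no True at all it still returns the first label, so the assignment would silently hit the wrong row. A possible guard is sketched below; set_first_match is a hypothetical helper name, not part of pandas, and it assumes a unique index.

```python
import pandas as pd

df = pd.DataFrame({0: [1, 3, 0, 0, 3]})

def set_first_match(frame, mask, column, value):
    """Set `column` to `value` on the first row where `mask` is True.

    Hypothetical helper; assumes `frame` has a unique index.
    Returns True if a row was changed, False if nothing matched.
    """
    if not mask.any():
        # idxmax on an all-False mask would return the first label.
        return False
    frame.loc[mask.idxmax(), column] = value
    return True

changed = set_first_match(df, df[0] == 0, 0, 100)  # hits label 2
missed = set_first_match(df, df[0] == 7, 0, 100)   # no row equals 7
```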

Edit:

Solution for a non-unique index:

np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)), index=[1,2,2,3,4])
print (df)
   0
1  1
2  3
2  0
3  0
4  3

df = df.reset_index()
df.loc[(df[0] == 3).idxmax(), 0] = 200
df = df.set_index('index')
df.index.name = None
print (df)
     0
1    1
2  200
2    0
3    0
4    3

Edit 1:

Solution with a MultiIndex:

np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)), index=[1,2,2,3,4])
print (df)
   0
1  1
2  3
2  0
3  0
4  3

df.index = [np.arange(len(df.index)), df.index]
print (df)
     0
0 1  1
1 2  3
2 2  0
3 3  0
4 4  3

df.loc[(df[0] == 3).idxmax(), 0] = 200
df = df.reset_index(level=0, drop=True)

print (df)
     0
1    1
2  200
2    0
3    0
4    3

Edit 2:

Solution with a double cumsum:

np.random.seed(1)
df = pd.DataFrame([4,0,4,7,4], index=[1,2,2,3,4])
print (df)
   0
1  4
2  0
2  4
3  7
4  4

mask = (df[0] == 0).cumsum().cumsum()
print (mask)
1    0
2    1
2    2
3    3
4    4
Name: 0, dtype: int32

df.loc[mask == 1, 0] = 200
print (df)
     0
1    4
2  200
2    4
3    7
4    4
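
A short sketch of the intermediate steps may help to see why the second cumsum is needed. The variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([4, 0, 4, 7, 4], index=[1, 2, 2, 3, 4])

cond = df[0] == 0
first = cond.cumsum()    # stays at 1 for every row from the first hit on
second = first.cumsum()  # strictly increases from the first hit on

only_first_hit = second == 1  # True on exactly one row

print(first.tolist())
print(second.tolist())
print(only_first_hit.tolist())
```

Since the first cumsum is at least 1 on every row from the first hit onwards, the second cumsum can equal 1 only on that first hit.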
answered 2017-02-28T18:30:03.490

Consider the dataframe df

df = pd.DataFrame(dict(A=[1, 2, 3, 4, 5]))

print(df)

   A
0  1
1  2
2  3
3  4
4  5

Create some arbitrary slice slc

slc = df[df.A > 2]

print(slc)

   A
2  3
3  4
4  5

Access the first row of slc within df by using index[0] and loc

df.loc[slc.index[0]] = 0
print(df)

   A
0  1
1  2
2  0
3  4
4  5
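
With a non-unique index, selecting by label behaves differently: loc writes to every row that carries the label. The sketch below contrasts that with a positional write through iloc. The duplicated index and the names by_label and by_position are assumptions for illustration, the two copies exist only so the variants can be compared, and argmax presumes that at least one row matches.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5]}, index=[1, 2, 2, 3, 4])
mask = df["A"] > 1  # the first match is the first row labelled 2

# Variant 1: label of the first matching row, then loc.
by_label = df.copy()  # copy only to compare the two variants
by_label.loc[by_label[mask].index[0], "A"] = 0  # hits both rows labelled 2

# Variant 2: position of the first matching row, then iloc.
by_position = df.copy()
first_pos = mask.to_numpy().argmax()  # position of the first True
by_position.iloc[first_pos, by_position.columns.get_loc("A")] = 0

print(by_label["A"].tolist())
print(by_position["A"].tolist())
```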
answered 2017-03-05T22:10:59.137

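import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(6,1),index=[1,2,2,3,3,3])
df[1] = 0
df.columns=['a','b']
# flag rows outside the slice with b == 1 (single .loc call, no chained indexing)
df.loc[df['a']>=0.5,'b']=1
# DataFrame.sort was removed from pandas; sort_values is the replacement
df=df.sort_values(['b','a'],ascending=[False,True])
# note: with duplicate labels, .loc writes to every row carrying the selected label
df.loc[df[df['b']==0].index.tolist()[0],'a']=0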

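With this approach no extra copy of the dataframe is created, but an extra column is introduced, which can be dropped after processing. To select any item rather than the first, you can change the last line as follows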

df.loc[df[df['b']==0].index.tolist()[n],'a']=0

to change any nth item in the slice

df

          a  
1  0.111089  
2  0.255633  
2  0.332682  
3  0.434527  
3  0.730548  
3  0.844724  

df after slicing and flagging

          a  b
1  0.111089  0
2  0.255633  0
2  0.332682  0
3  0.434527  0
3  0.730548  1
3  0.844724  1

After changing the value of the first item in the slice (flagged with 0) to 0

          a  b
3  0.730548  1
3  0.844724  1
1  0.000000  0
2  0.255633  0
2  0.332682  0
3  0.434527  0
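
If the helper column and the sort are not wanted, the nth item can also be addressed by position. This is only a sketch: set_nth_match is a hypothetical helper, n is assumed to be a non-negative 0-based position, and the fixed values replace the random data for reproducibility.

```python
import numpy as np
import pandas as pd

def set_nth_match(frame, mask, column, value, n=0):
    """Set `column` to `value` on the n-th row (0-based) where `mask` is True.

    Hypothetical helper; works by position, so duplicate index labels are fine.
    Raises IndexError if fewer than n + 1 rows match.
    """
    positions = np.flatnonzero(mask.to_numpy())
    if n >= len(positions):
        raise IndexError(f"only {len(positions)} matching rows, asked for n={n}")
    frame.iloc[positions[n], frame.columns.get_loc(column)] = value

df = pd.DataFrame({"a": [0.1, 0.2, 0.3, 0.4, 0.7, 0.8]}, index=[1, 2, 2, 3, 3, 3])
set_nth_match(df, df["a"] < 0.5, "a", 0.0, n=1)  # second row below 0.5
```

Here only the first of the two rows labelled 2 is changed, and the row order stays as it was.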
answered 2017-03-09T17:22:51.513

So, using some of the answers, I managed to find a way to do this:

np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)))
print(df)
   0
0  1
1  3
2  0
3  0
4  3
df.loc[(df[0] == 0).cumsum()==1,0] = 1
print(df)
   0
0  1
1  3
2  1
3  0
4  3

Essentially, this uses a mask built with cumsum.
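
One thing to watch with a single cumsum: the running count stays at 1 until the next hit, so any non-matching rows directly after the first hit are flagged as well. A possible variant, sketched here on assumed example data with a duplicated index, also requires the condition itself to hold.

```python
import pandas as pd

df = pd.DataFrame([4, 0, 4, 7, 4], index=[1, 2, 2, 3, 4])
cond = df[0] == 0

# cumsum stays at 1 after the first hit, so this also flags later rows.
loose_mask = cond.cumsum() == 1

# Requiring the condition itself as well keeps only the first hit.
strict_mask = cond & (cond.cumsum() == 1)

df.loc[strict_mask, 0] = 200
print(df)
```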

answered 2017-03-27T18:37:31.247