python - Python Pandas indexing

Question

Sorry if this is a simple question; I've tried to look for a solution but can't find anything.

My code goes like this:

Given zip1, create a boolean index that selects the observations (other zip codes) for which the calculation has not been done yet (marked with 666):
```
I = (df['zip1'] == zip1) & (df['Distances'] == 666)
```
Perform some calculation:
```
distances = calc(zip1,df['zip2'][I])
```

So far so good: I've checked the distances variable, and it has the correct values and the correct size.

Put the distances in the right place:
```
df['Distances'][I] = distances
```

But this last part updates df['Distances'] to nonsense values FOR ALL observations with df['zip1'] == zip1, instead of only the ones selected by I.

I've checked the boolean array I before the df['Distances'][I] = distances command and it looks fine. Any ideas would be greatly appreciated.

score 0 · Accepted Answer

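What you are attempting is called chained assignment, and it does not work the way you expect: the first indexing step can return a copy rather than a view, which is why you are seeing the error.

There is more information about it here, and in related questions: this and this.

So you should use .loc (or .ix on old pandas versions; .ix was removed in pandas 1.0) like this: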

```
df.loc[I, 'Distances'] = distances
```
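To make the fix concrete, here is a minimal, self-contained sketch of the whole flow from the question. The sample data and the body of `calc` are made up for illustration (the real `calc` is not shown in the question); only the indexing pattern is the point.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 666 marks a distance that has not been computed yet.
df = pd.DataFrame({
    "zip1": [10001, 10001, 10001, 20002],
    "zip2": [10002, 10003, 10004, 20003],
    "Distances": [666.0, 1.5, 666.0, 666.0],
})

def calc(zip1, zip2_values):
    # Stand-in for the real distance calculation (an assumption for this sketch).
    return np.abs(zip2_values.to_numpy() - zip1).astype(float)

zip1 = 10001

# Boolean mask: rows for this zip1 whose distance is still the 666 placeholder.
I = (df["zip1"] == zip1) & (df["Distances"] == 666)

# Read the matching zip2 values with a single .loc call.
distances = calc(zip1, df.loc[I, "zip2"])

# Write back with a single .loc call: rows and column are selected together,
# so the assignment targets df itself rather than an intermediate object.
df.loc[I, "Distances"] = distances

print(df)
```

Because the row mask and the column label go into one `.loc[...]` call, there is no intermediate `df['Distances']` object whose view-or-copy status matters, and only the rows where `I` is True are modified.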
