Sorry if this is a simple question; I've looked for a solution but can't find anything.

My code goes like this:

  • given zip1, create an index to select observations (other zipcodes) where some calculation has not been done yet (666)

    I = (df['zip1'] == zip1) & (df['Distances'] == 666)
    
  • perform some calculation

    distances = calc(zip1,df['zip2'][I])
    

So far so good: I've checked the distances variable, and it holds the correct values in a correctly sized array.

  • put the distance variable in the right place

    df['Distances'][I] = distances
    

But this last part sets df['Distances'] to nonsense values for ALL observations with df['zip1'] == zip1, instead of only the ones selected by I.

I've checked the boolean array I before the df['Distances'][I] = distances command and it looks fine. Any ideas would be greatly appreciated.

1 Answer

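What you are attempting is called chained assignment, and it doesn't work the way you think because it returns a copy rather than a view, hence the error you are seeing.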

There is more information about it and related issues here and here.

So you should use .loc or .ix like so:

    df.loc[I, 'Distances'] = distances
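
To make the difference concrete, here is a minimal, self-contained sketch of the same workflow using .loc. The sample data and the body of calc are made up for illustration (the question does not show them); only the masking and assignment pattern is the point. Note that .ix was removed in pandas 1.0, so on current versions .loc is the one to use.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 666 marks distances that have not been computed yet.
df = pd.DataFrame({
    "zip1": [10001, 10001, 10001, 20002],
    "zip2": [10002, 10003, 10004, 20003],
    "Distances": [666.0, 1.5, 666.0, 666.0],
})


def calc(zip1, zip2_values):
    # Stand-in for the real distance calculation (an assumption, not the asker's code).
    return np.abs(zip2_values.to_numpy() - zip1).astype(float)


zip1 = 10001

# Boolean mask: rows for this zip1 whose distance is still the 666 placeholder.
I = (df["zip1"] == zip1) & (df["Distances"] == 666)

distances = calc(zip1, df.loc[I, "zip2"])

# A single .loc call selects rows and column together, so the assignment
# is applied to df itself rather than to an intermediate object.
df.loc[I, "Distances"] = distances

print(df)
```

Only the rows selected by I change; the row that already had a computed distance and the row with a different zip1 are left alone.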
answered 2013-10-30T20:46:54.683