python - KeyError: 150L when accessing data in a pandas DataFrame after dropping some rows

Question

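I have a function that iterates over a pandas DataFrame and drops rows that have consecutive duplicates in a particular column. Afterwards, I try to return the running sum of that column in a list, but I get a KeyError. I'm not sure what it means.

Minimal code: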

dropRows = [] #stores rows indices to drop
#Sanitize the data to get rid of consecutive duplicates
for indx, val in enumerate(df.removeConsecutives): #for all the values
    if(indx == 0): #skip first indx
        continue

    if (val == df.removeConsecutives[indx-1]): #this is duplicate value as the last one
        dropRows.append(indx)

sanitizedData = df.drop(dropRows)

#Create Timestamps based on RTC
listOfSums= [0] #first sum is zero
sum = 0 #running total of seconds for timestamps
for indx, val in enumerate(sanitizedData.removeConsecutives):

    sum += sanitizedData.removeConsecutives[indx]

    listOfSums.append(sum) #add running sum to list

The traceback points to this line:

    listOfSums.append(sum) #add running sum to list

This is the error:

C:\Users\JohnDoe\Anaconda\lib\site-packages\pandas\index.pyd in pandas.index.IndexEngine.get_value (pandas\index.c:2987)()

C:\Users\JohnDoe\Anaconda\lib\site-packages\pandas\index.pyd in pandas.index.IndexEngine.get_value (pandas\index.c:2802)()

C:\Users\JohnDoe\Anaconda\lib\site-packages\pandas\index.pyd in pandas.index.IndexEngine.get_loc (pandas\index.c:3528)()

C:\Users\JohnDoe\Anaconda\lib\site-packages\pandas\hashtable.pyd in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:7032)()

C:\Users\JohnDoe\Anaconda\lib\site-packages\pandas\hashtable.pyd in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6973)()

KeyError: 150L

I am using IPython from a distribution that installs all the packages (pandas, numpy, SciPy, etc.) from a single installer, which is why the path shows Anaconda.

score 2 · Accepted Answer

Here:

for indx, val in enumerate(sanitizedData.removeConsecutives):
    sum += sanitizedData.removeConsecutives[indx]

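You are using enumerate, so your indx variable runs from 0 up to one less than the number of rows in sanitizedData. However, the removeConsecutives Series is not indexed by consecutive integers. Maybe it once was, but not after you used drop, which removes the rows and leaves the remaining index labels unchanged.

Example: you have a df with 300 rows. You find a duplicate at row 150 and drop it. Now your df has 299 rows, with index labels 0-149 and 151-299. But indx goes from 0 to 298 and tries to access label 150, which no longer exists. Hence KeyError: 150L. If you use the following, it may work: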

for indx, val in enumerate(sanitizedData.removeConsecutives):
    sum += sanitizedData.removeConsecutives.iloc[indx]
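
To make the index gap concrete, here is a small sketch on made-up data (the four values are hypothetical, not from the question). It shows that label-based lookup fails on the dropped label while position-based lookup with .iloc succeeds, and that iterating over the values directly avoids the lookup altogether:

```python
import pandas as pd

# Hypothetical data: row 2 repeats the value in row 1.
df = pd.DataFrame({"removeConsecutives": [5, 7, 7, 3]})
sanitizedData = df.drop([2])

# drop() keeps the original labels, so the index now has a gap.
remaining_labels = list(sanitizedData.index)

# Plain [] on an integer-indexed Series looks up by label; label 2 is gone.
try:
    sanitizedData.removeConsecutives[2]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# .iloc looks up by position, so position 2 is the third remaining row.
third_value = sanitizedData.removeConsecutives.iloc[2]

# Iterating over the values needs no index lookup at all.
listOfSums = [0]
total = 0  # named 'total' to avoid shadowing the built-in sum()
for val in sanitizedData.removeConsecutives:
    total += val
    listOfSums.append(total)
```

Another option, assuming the original labels are not needed later, is to call reset_index(drop=True) on the result of drop so the labels are consecutive again.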

That addresses your question as asked, but I would suggest looking at drop_duplicates and sum.
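
As a rough sketch of what a loop-free version might look like, on hypothetical data: note that drop_duplicates removes every repeated value, not only consecutive repeats, so this sketch swaps in a comparison against shift() for the consecutive case and uses cumsum for the running total. Neither of those two calls is named in the answer above; they are assumptions about what fits the question.

```python
import pandas as pd

# Hypothetical data with consecutive repeats (7, 7 and 3, 3) and a later 7.
df = pd.DataFrame({"removeConsecutives": [5, 7, 7, 3, 3, 7]})
col = df["removeConsecutives"]

# Keep a row only when its value differs from the previous row's value.
# The first row is compared with NaN, so it is always kept.
keep = col != col.shift()
sanitizedData = df[keep].reset_index(drop=True)

# Running total, with a leading zero as in the question's listOfSums.
listOfSums = [0] + sanitizedData["removeConsecutives"].cumsum().tolist()

# For comparison: drop_duplicates also removes the later, non-consecutive 7.
all_unique = df.drop_duplicates(subset="removeConsecutives")
```

With this data the shift-based filter keeps 5, 7, 3, 7, whereas drop_duplicates keeps only 5, 7, 3, so which one is appropriate depends on whether non-consecutive repeats should survive.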
