1

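This code works, but I can't help feeling it's a hack, especially the "offset" part. I had to put it in because otherwise, every time I do a del, all the remaining index values in deletes are off by one.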

    # remove outliers > devs # of std deviations
    devs = 1
    deletes = []
    for num, duration in enumerate(durations):
        if (duration > (mean_duration + (devs * std_dev_one_test))) or \
            (duration < (mean_duration - (devs * std_dev_one_test))):
            deletes.append(num)
    offset = 0
    for delete in deletes:
        del durations[delete - offset]
        del dates[delete - offset]
        offset += 1

Any ideas on how to make this better?

4 Answers

4

Build a list of keepers as you iterate over the list:

def isKeeper( duration ):
    if (duration > (mean_duration + (devs * std_dev_one_test))) or \
            (duration < (mean_duration - (devs * std_dev_one_test))):
        return False
    return True

durations = [duration for duration in durations if isKeeper(duration)]
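
The comprehension above filters only durations, while the question also keeps a parallel dates list in step with it. A minimal sketch of one way to filter both together, assuming the two lists have the same length. The sample values and the name `is_keeper` are made up for illustration, and `statistics.pstdev` is an assumption about how the standard deviation was computed:

```python
from statistics import mean, pstdev

# Hypothetical sample data; the original post does not show its values.
durations = [10.0, 11.0, 9.0, 10.5, 50.0, 9.5]
dates = ["07-01", "07-02", "07-03", "07-04", "07-05", "07-06"]

devs = 1
mean_duration = mean(durations)
std_dev_one_test = pstdev(durations)  # population std dev (an assumption)

def is_keeper(duration):
    """True when duration lies within devs standard deviations of the mean."""
    return abs(duration - mean_duration) <= devs * std_dev_one_test

# Pair each duration with its date, then keep only the pairs that pass.
kept = [(d, day) for d, day in zip(durations, dates) if is_keeper(d)]

# Split the surviving pairs back into two parallel lists.
durations = [d for d, _ in kept]
dates = [day for _, day in kept]
```

Because each duration travels with its date, no index bookkeeping is needed and the two lists cannot drift out of sync.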

Answered 2012-07-07T00:58:22.470
3

Maybe something like this:

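import numpy as np

myList = [1, 2, 3, 4, 5, 6, 7, 3, 4, 5, 3, 5, 99]

mean_duration = np.mean(myList)
std_dev_one_test = np.std(myList)

def drop_outliers(x):
    # Return a boolean so that falsy values such as 0 are not dropped by filter()
    return abs(x - mean_duration) <= std_dev_one_test

# In Python 3, filter() returns an iterator, so wrap it in list()
myList = list(filter(drop_outliers, myList))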

Result:

>>> myList
[1, 2, 3, 4, 5, 6, 7, 3, 4, 5, 3, 5]
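
If NumPy is not available, roughly the same filter can be sketched with the standard library. This assumes `statistics.pstdev` (the population standard deviation) as the stand-in for `np.std`, whose default is also the population form. The names `my_list`, `within_one_std`, and `filtered` are illustrative only:

```python
from statistics import mean, pstdev

my_list = [1, 2, 3, 4, 5, 6, 7, 3, 4, 5, 3, 5, 99]

mean_duration = mean(my_list)
std_dev_one_test = pstdev(my_list)  # population std dev, like np.std's default

def within_one_std(x):
    """True when x is within one standard deviation of the mean."""
    return abs(x - mean_duration) <= std_dev_one_test

# A list comprehension builds the filtered list directly.
filtered = [x for x in my_list if within_one_std(x)]
print(filtered)
```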
Answered 2012-07-07T00:52:08.790
1

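Is the problem that removing items from the list shifts the indices, and you are compensating with the offset?

If so, just delete from back to front. Deleting an item only shifts the items after it, so the indices you have not yet processed stay valid.

So iterate from the last item toward the front of the list.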
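
A minimal sketch of the back-to-front deletion described above, assuming `deletes` holds the outlier indices in ascending order, as the question's loop produces them. The sample data is hypothetical:

```python
# Hypothetical sample data standing in for the question's lists.
durations = [10.0, 11.0, 50.0, 9.0, 60.0, 10.5]
dates = ["07-01", "07-02", "07-03", "07-04", "07-05", "07-06"]
deletes = [2, 4]  # indices flagged as outliers, in ascending order

# Walk the flagged indices from highest to lowest: deleting at index i
# only shifts the items after i, so the smaller indices remain valid.
for index in reversed(deletes):
    del durations[index]
    del dates[index]

print(durations)
print(dates)
```

With this ordering the `offset` counter from the question is no longer needed.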

These SO questions may be of interest: Delete many elements of list (python) and Python: Removing list element while iterate over list

Another good SO discussion can be found here: Remove items from a list while iterator (thanks to @PaulMcGuire for suggesting it in the comments)

Answered 2012-07-07T00:13:18.400
0

If your data set is small, you can invert the logic and keep values instead of deleting them:

# keep values within devs standard deviations of the mean
devs = 1
keeps = []
for duration in durations:
    if (duration <= (mean_duration + (devs * std_dev_one_test))) and \
        (duration >= (mean_duration - (devs * std_dev_one_test))):
        keeps.append(duration)

Answered 2012-07-07T00:19:22.810