对于我正在基准测试的算法,我需要测试列表的某些部分(可能很长,但主要填充 0,偶尔填充 1)。这个想法是,在 n 个项目的列表中,其中 d 是感兴趣的,期望每个项目都有缺陷的概率为 d/n。因此,检查一组大小 d/n(出于信息论的原因,它是根据 floor 和 log 函数定义的——它使算法的分析更容易)。
算法:
1./如果n <= 2*d -2(即超过一半的列表被1填充)只需依次查看每个项目
2./ if n > 2*d -2:检查一组大小为 aplha (= floor(binarylog(l/d), l = n - d + 1, d = number of 1s)。如果有一个1,对组进行二分搜索以找到缺陷并设置 d = d - 1 和 n = n - 1 - x(x = 组的大小减去缺陷)。如果没有一个,则设置 n = n - groupSize 并转到 1(即检查列表的其余部分)。
但是,当在随机位置用 10 个 1 填充列表时,算法会找到除单个 1 之外的所有内容,然后在检查空列表的同时继续循环。
我认为问题在于,在丢弃包含全 0 的组时,我没有正确修改说明下一轮从哪里开始的参考,这导致我的算法失败。
这是函数的相关部分:
import math
def binary_search(inList):
low = 0
high = len(inList)
while low < high:
mid = (low + high) // 2
upper = inList[mid:high]
lower = inList[low:mid]
if any(lower):
high = mid
elif any(upper):
low = mid + 1
elif mid == 1:
return mid
else:
# Neither side has a 1
return -1
return mid
def HGBSA(inList, num_defectives):
n = len(inList)
defectives = []
#initialising the start of the group to be tested
start = 0
while num_defectives > 0:
defective = 0
if(n <= (2*num_defectives - 2)):
for i in inList:
if i == 1:
num_defectives = num_defectives - 1
n = n - 1
defectives.append(i)
else:
#params to determine size of group
l = n - num_defectives + 1
alpha = int(math.floor(math.log(l/num_defectives, 2)))
groupSize = 2**alpha
end = start + groupSize
group = inList[start:end]
#print(groupSize)
#print(group)
if any(group):
defective = binary_search(group)
defective = start + defective
defectives.append(defective)
undefectives = [s for s in group if s != 1]
n = n - 1 - len(undefectives)
num_defectives = num_defectives - 1
print(defectives)
else:
n = n - groupSize
start = start + groupSize
print(defectives)
return defectives
以下是该函数当前通过的测试:
from GroupTesting import HGBSA
#idenitify a single defective
inlist = [0]*1024
inlist[123] = 1
assert HGBSA(inlist, 1) == [123]
#identify two defectives
inlist = [0]*1024
inlist[123] = 1
inlist[789] = 1
assert inlist[123] == 1
assert inlist[789] == 1
assert HGBSA(inlist, 2) == [123, 789]
zeros = [0]*1024
ones = [1, 101, 201, 301, 401, 501, 601, 701, 801, 901]
for val in ones:
zeros[val] = 1
assert HGBSA(zeros, 10) == ones
即它找到一个确定性地放置在列表中的 1、2 和 10 个 1,但是这个测试:
zeros = [0] * 1024
ones = [1] * 10
l = zeros + ones
shuffle(l)
where_the_ones_are = [i for i, x in enumerate(l) if x == 1]
assert HGBSA(l, 10) == where_the_ones_are
已经暴露了bug。
此测试也因上面的代码而失败
#identify two defectives next to each other
inlist = [0]*1024
inlist[123] = 1
inlist[124] = 1
assert GT(inlist, 2) == [123, 124]
以下修改(如果没有缺陷则丢弃整个组,但仅丢弃有缺陷之前的组成员)通过“两个相邻”测试,但不通过“连续 10 个”或随机测试:
def HGBSA(inList, num_defectives):
n = len(inList)
defectives = []
#initialising the start of the group to be tested
start = 0
while num_defectives > 0:
defective = 0
if(n <= (2*num_defectives - 2)):
for i in inList:
if i == 1:
num_defectives = num_defectives - 1
n = n - 1
defectives.append(i)
else:
#params to determine size of group
l = n - num_defectives + 1
alpha = int(math.floor(math.log(l/num_defectives, 2)))
groupSize = 2**alpha
end = start + groupSize
group = inList[start:end]
#print(groupSize)
#print(group)
if any(group):
defective = binary_search(group)
defective = start + defective
defectives.append(defective)
undefectives = [s for s in group if s != 1 in range(0, groupSize//2)]
print(len(undefectives))
n = n - 1 - len(undefectives)
num_defectives = num_defectives - 1
start = start + defective + 1
#print(defectives)
else:
n = n - groupSize
start = start + groupSize
print(defectives)
return defectives
即问题是当一个组中有多个 1 被测试时,并且在第一个没有被检测到之后。代码通过的最佳测试是在整个列表中随机均匀分布的 1 并找到所有缺陷。
另外,我将如何创建测试以在将来捕获此类错误?