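I'm trying to generate a random list of 6 UNIQUE barcodes that are a Hamming distance of at least 3 from each other. The problem is that the program produces a list containing duplicate barcodes, and the Hamming distances are not correct. The code is below.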

import random

nucl_list = ['A', 'C', 'G', 'T']
length = 6
number = 6
attempts = 1000
barcode_list = []
tested = []

def make_barcode():
    """Generates a random barcode from nucl_list"""
    barcode = ''
    for i in range(length):
        barcode += random.choice(nucl_list)
    return barcode

def distance(s1, s2):
    """Calculates the hamming distance between s1 and s2"""
    length1 = len(s1)
    length2 = len(s2)
    # Initiate 2-D array
    distances = [[0 for i in range(length2 + 1)] for j in range(length1 + 1)]
    # Add in null values for the x rows and y columns
    for i in range(0, length1 + 1):
        distances[i][0] = i
    for j in range(0, length2 + 1):
        distances[0][j] = j

    for i in range(1, length1 + 1):
        for j in range(1,length2 + 1):
            cost = 0
            if s1[i - 1] != s2[j - 1]:
                cost = 1
            distances[i][j] = min(distances[i - 1][j - 1] + cost, distances[i][j - 1] + 1, distances[i - 1][j] + 1)
    min_distance = distances[length1][length2]

    for i in range(0, length1 + 1):
        min_distance = min(min_distance, distances[i][length2])
    for j in range(0, length2 + 1):
        min_distance = min(min_distance, distances[length1][j])
    return min_distance

def compare_barcodes():
    """Generates a new barcode and compares with barcodes in barcode_list"""
    new_barcode = make_barcode()
    # keep track of # of barcodes tested
    tested.append(new_barcode)
    if new_barcode not in barcode_list:
        for barcode in barcode_list:
            dist = distance(barcode, new_barcode)
            if dist >= 3:
                barcode_list.append(new_barcode)
            else:
                pass
    else:
        pass

# make first barcode

first_barc = ''
for i in range(length):
    first_barc += random.choice(nucl_list)
barcode_list.append(first_barc)

while len(tested) < attempts:
    if len(barcode_list) < number:
        compare_barcodes()
    else:
        break

barcode_list.sort()

print(barcode_list)

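I think my problem is in the final while loop: I want compare_barcodes to keep generating barcodes that meet the criteria (not duplicates, and not within the Hamming distance of any barcode already generated).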
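
For reference, the Hamming distance between two equal-length strings is the number of positions at which they differ. The distance function above instead fills a dynamic-programming table that allows insertions and deletions, so it can return a smaller value than the positional count. A minimal sketch of the positional definition in Python 3 (the name hamming_distance is hypothetical and not part of the original code):

```python
def hamming_distance(s1, s2):
    """Count the positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance needs strings of equal length")
    # zip pairs up the characters position by position
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


print(hamming_distance("ACGTAC", "ACGTTT"))  # differs in the last two positions
```

For example, ACGTAC and CGTACA differ at all six positions, so their Hamming distance is 6, even though one is just a shifted copy of the other.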

4 Answers

@Jkdc's answer is correct, +1 to him. Your original code was almost there. My suggestion is to move your if new_barcode not in barcode_list: condition into the inner for loop and make it if new_barcode not in barcode_list and distance(barcode, new_barcode) >= 3. That way no duplicates are added to the list, and the distance is only computed when new_barcode is not already in barcode_list.

def compare_barcodes():
    """Generates a new barcode and compares with barcodes in barcode_list"""
    new_barcode = make_barcode()
    # keep track of # of barcodes tested
    tested.append(new_barcode)
    for barcode in barcode_list:
        # Skip duplicates first, then require a distance of at least 3
        if new_barcode not in barcode_list and distance(barcode, new_barcode) >= 3:
            barcode_list.append(new_barcode)

Another suggestion: if you want to avoid duplicates, you can store your barcodes in a set, which holds unordered, unique elements.
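
The set idea could look roughly like this in Python 3. This is only a sketch: add_if_valid and the positional hamming helper are hypothetical names, and it checks the candidate against every stored barcode rather than reusing the distance function from the question.

```python
def hamming(s1, s2):
    # Assumes both strings have the same length
    return sum(a != b for a, b in zip(s1, s2))


def add_if_valid(candidate, barcodes, min_dist=3):
    """Add candidate to the set only if it is far enough from every member."""
    if all(hamming(candidate, b) >= min_dist for b in barcodes):
        barcodes.add(candidate)
        return True
    return False


barcodes = set()
add_if_valid("AAAAAA", barcodes)  # accepted: the set is empty
add_if_valid("AAAAAA", barcodes)  # rejected: distance 0 to the stored copy
add_if_valid("AAATTT", barcodes)  # accepted: distance 3 from AAAAAA
add_if_valid("AAATTA", barcodes)  # rejected: too close to both members
print(sorted(barcodes))
```

Because a duplicate is at distance 0 from its stored copy, the distance check alone already rejects it; the set mainly guarantees uniqueness if barcodes are added by other code paths.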

Answered 2015-04-21T16:20:58.493

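The issue is in your compare_barcodes().

Essentially, we use too_far to track whether dist >= 3. Once we have finished looping over barcode_list, we go back and check too_far. If it is not too_far, then we can append to the list.

The old logic appended to barcode_list every time it found dist >= 3, which of course happens more than once, depending on how many barcodes have already been added to the list.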

def compare_barcodes():
    """Generates a new barcode and compares with barcodes in barcode_list"""
    too_far = False
    new_barcode = make_barcode()
    # keep track of # of barcodes tested
    tested.append(new_barcode)
    if new_barcode not in barcode_list:
        for barcode in barcode_list:
            dist = distance(barcode, new_barcode)
            if dist >= 3:
                too_far = True
        if not too_far:
            barcode_list.append(new_barcode)

Edit: I just realized that you want the Hamming distance to be 3 or greater... in that case, just change if not too_far to if too_far.
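
One way to read the flag idea is to record whether any existing barcode is closer than 3 and to append only when none is. A minimal Python 3 sketch under that reading; too_close, try_add and the positional hamming helper are hypothetical names, not the code from this answer:

```python
def hamming(s1, s2):
    # Assumes both strings have the same length
    return sum(a != b for a, b in zip(s1, s2))


def try_add(new_barcode, barcode_list, min_dist=3):
    """Append new_barcode unless some existing barcode is too close to it."""
    too_close = False
    for barcode in barcode_list:
        if hamming(barcode, new_barcode) < min_dist:
            too_close = True
            break  # one close neighbour is enough to reject
    if not too_close:
        barcode_list.append(new_barcode)
    return not too_close


codes = ["AAAAAA"]
print(try_add("AAAAAT", codes))  # distance 1 from AAAAAA
print(try_add("AAATTT", codes))  # distance 3 from AAAAAA
print(codes)
```

The flag is checked once, after the loop, so the candidate is appended at most one time and only when every comparison passed.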

Answered 2015-04-21T16:10:10.627

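The problem is in your compare_barcodes() function. In the old version, as soon as it saw that the new barcode was 3 steps away from any one of the compared strings, it added the new string to the list. The code can be modified as follows.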

def compare_barcodes():
    """Generates a new barcode and compares with barcodes in barcode_list"""
    minDist = length
    new_barcode = make_barcode()
    # keep track of # of barcodes tested
    tested.append(new_barcode)
    if new_barcode not in barcode_list:
        for barcode in barcode_list:
            dist = distance(barcode, new_barcode)
            #if dist >= 3:
            #    barcode_list.append(new_barcode)
            #else:
            #    pass
            if dist < minDist:
                minDist = dist
        # Only accept the candidate when it is not a duplicate and its
        # closest existing barcode is at least 3 away
        if minDist >= 3:
            barcode_list.append(new_barcode)
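
The difference between the old and the new logic can be seen on a small example. This Python 3 sketch uses a hypothetical positional hamming helper and made-up barcodes; it is an illustration, not the code from this answer:

```python
def hamming(s1, s2):
    # Assumes both strings have the same length
    return sum(a != b for a, b in zip(s1, s2))


existing = ["AAAAAA", "TTTTTT"]
candidate = "AAAAAT"

distances = [hamming(b, candidate) for b in existing]
accept_old = any(d >= 3 for d in distances)  # old logic: one far barcode is enough
accept_new = min(distances) >= 3             # new logic: the closest one must be far

print(distances)   # [1, 5]
print(accept_old)  # True
print(accept_new)  # False
```

The candidate is only 1 step from AAAAAA, yet the old check accepts it because TTTTTT is 5 steps away. Tracking the minimum distance rejects it.
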
Answered 2015-04-21T16:18:17.113

I ended up making a new function to check the Hamming distances...

def compare_distances(new_barcode):
    """Compares the hamming_dist between new barcode and old barcodes"""
    # Count number of distances < 3
    count = 0
    global barcode_list
    for barcode in barcode_list:
        if distance(new_barcode, barcode) < 3:
            count += 1
    return count

def compare_barcodes():
    new_barcode = make_barcode()
    if new_barcode not in barcode_list:
        count = compare_distances(new_barcode)
        if count > 0:
            pass
        else:
            barcode_list.append(new_barcode)
    else:
        pass

# Initiate the functions to generate barcodes 
while len(barcode_list) < number:
    compare_barcodes()
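
A loop like the one above runs until enough barcodes are found, so it may be worth capping the number of attempts, as the attempts variable in the question does. Below is a self-contained Python 3 sketch of the whole procedure under some assumptions: generate_barcodes, max_attempts and rng are hypothetical names, and a positional hamming helper is swapped in for the distance function from the question.

```python
import random

NUCLEOTIDES = "ACGT"


def hamming(s1, s2):
    # Assumes both strings have the same length
    return sum(a != b for a, b in zip(s1, s2))


def generate_barcodes(number=6, length=6, min_dist=3, max_attempts=1000, rng=None):
    """Return up to `number` barcodes that are pairwise at least min_dist apart."""
    rng = rng or random.Random()
    barcodes = []
    for _ in range(max_attempts):
        if len(barcodes) == number:
            break
        candidate = "".join(rng.choice(NUCLEOTIDES) for _ in range(length))
        # Accept only if the candidate is far enough from every accepted barcode;
        # a duplicate has distance 0 and is rejected by the same test
        if all(hamming(candidate, b) >= min_dist for b in barcodes):
            barcodes.append(candidate)
    return sorted(barcodes)


result = generate_barcodes(rng=random.Random(0))  # seeded for repeatable runs
print(result)
```

If the cap is reached first, the function returns fewer than number barcodes, so the caller may want to check the length of the result.
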
Answered 2015-04-22T17:01:50.693