4

请将此问题移至Code Review -area。它更适合那里,因为我知道下面的代码是垃圾,我想要关键的反馈来完成重写。我几乎是在重新发明轮子。

# Description: you are given a bitwise pattern and a string
# you need to find the number of times the pattern matches in the string.
# The pattern is determined by markov chain.
# For simplicity, suppose the ones and zeros as unbiased coin flipping
# that stops as it hits the pattern, below.
#
# Any one liner or simple pythonic solution?

import random

def matchIt(yourString, yourPattern):
        """find the number of times yourPattern occurs in yourString"""

        count = 0
        matchTimes = 0

        # How can you simplify the for-if structures?
        # THIS IS AN EXAMPLE HOW NOT TO DO IT, hence Code-Smell-label
        # please, read clarifications in [Update]

        for coin in yourString:
            #return to base
            if  count == len(pattern):
                    matchTimes = matchTimes + 1
                    count = 0

            #special case to return to 2, there could be more this type of conditions
            #so this type of if-conditionals are screaming for a havoc
            if count == 2 and pattern[count] == 1:
                    count = count - 1

            #the work horse
            #it could be simpler by breaking the intial string of lenght 'l'
            #to blocks of pattern-length, the number of them is 'l - len(pattern)-1'
            if coin == pattern[count]:
                    count=count+1

        average = len(yourString)/matchTimes

        return [average, matchTimes]



# Generates the list
myString =[]
for x in range(10000):
    myString= myString + [int(random.random()*2)]

pattern = [1,0,0]
result = matchIt(myString, pattern)

print("The sample had "+str(result[1])+" matches and its size was "+str(len(myString))+".\n" +
        "So it took "+str(result[0])+" steps in average.\n" +
        "RESULT: "+str([a for a in "FAILURE" if result[0] != 8]))


# Sample Output
# 
# The sample had 1656 matches and its size was 10000.
# So it took 6 steps in average.
# RESULT: ['F', 'A', 'I', 'L', 'U', 'R', 'E']

[更新]

我将在这里稍微解释一下理论,也许可以这样简化问题。上面的代码尝试用A下面的转换矩阵构造马尔可夫链。100你可以想象成抛硬币的模式与之相对应。

>>> Q=numpy.matrix('0.5 0.5 0; 0 0.5 0.5; 0 0.5 0')     
>>> I=numpy.identity(3)
>>> I
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
>>> Q
matrix([[ 0.5,  0.5,  0. ],
        [ 0. ,  0.5,  0.5],
        [ 0. ,  0.5,  0. ]])
>>> A=numpy.matrix('0.5 0.5 0 0; 0 0.5 0.5 0; 0 0.5 0 0.5; 0 0 0 1')
>>> A
matrix([[ 0.5,  0.5,  0. ,  0. ],
        [ 0. ,  0.5,  0.5,  0. ],
        [ 0. ,  0.5,  0. ,  0.5],
        [ 0. ,  0. ,  0. ,  1. ]])

问题中的成为上面矩阵中average 8第一行的值的总和。N=(I-Q)^-1Q

>>> (I-Q)**-1
matrix([[ 2.,  4.,  2.],
        [ 0.,  4.,  2.],
        [ 0.,  2.,  2.]])
>>> numpy.sum(((I-Q)**-1)[0])
8.0

现在,您可能会看到这个显然只有模式匹配的问题变成了马尔可夫链。我看不出为什么你不能用类似于矩阵或矩阵的东西来代替混乱的 for-while-if 条件。我不知道如何实现它们,但迭代器可能是一种研究方法,特别是在需要分解的更多状态下。

但是 Numpy 出现了一个问题,这些东西-InfNaN用途是什么?(I-Q)**-1从矩阵中检查它们应该收敛到的值。是NN=I+Q+Q^2+Q^3+...=\frac{I-Q^{n}}{I-Q}

>>> (I-Q**99)/(I-Q)
matrix([[  2.00000000e+00,   1.80853571e-09,             -Inf],
        [             NaN,   2.00000000e+00,   6.90799171e-10],
        [             NaN,   6.90799171e-10,   1.00000000e+00]])
>>> (I-Q**10)/(I-Q)
matrix([[ 1.99804688,  0.27929688,        -Inf],
        [        NaN,  1.82617188,  0.10742188],
        [        NaN,  0.10742188,  0.96679688]])
4

3 回答 3

2
def matchIt(yourString, yourPattern):
        """find the number of times yourPattern occurs in yourString"""

您可以使用以下内容吗?

yourString.count(yourPattern)

在您的情况下,您可以创建myString一个包含 10000 个字符的真实字符串,也可以创建pattern一个字符串,然后以简单的 Python 方式计算出现次数。

编辑

pattern为您提供in text(可以是字符串或列表)的(重叠)出现次数的单行代码可能如下所示:

nbOccurences = sum(1 for i in xrange(len(text)-len(pattern)) if text[i:i+len(pattern)] == pattern)
于 2011-01-12T18:12:25.133 回答
1

好的 - 标准(-ish)字符串搜索:

def matchIt(needle, haystack):
    """
    @param needle:   string, text to seek
    @param haystack: string, text to search in

    Return number of times needle is found in haystack,
        allowing overlapping instances.

    Example: matchIt('abab','ababababab') -> 4
    """
    lastSeenAt = -1
    timesSeen = 0
    while True:
        nextSeen = haystack.find(needle, lastSeenAt+1)
        if nextSeen==-1:
            return timesSeen
        else:
            lastSeenAt = nextSeen
            timesSeen += 1

但是您想对数字列表执行此操作吗?没问题; 我们只需要使用 find() 方法创建一个列表类,如下所示:

import itertools
class FindableList(list):
    def find(self, sub, start=None, end=None):
        """
        @param sub: list, pattern to look for in self

        @param start: int, first possible start-of-list
            If not specified, start at first item

        @param: end: int, last+1 possible start-of-list
            If not specified, end such that entire self is searched

        Returns;
            Starting offset if a match is found, else -1
        """
        if start is None or start < 0:
            start = 0

        # N.B. If end is allowed to be too high,
        # zip() will silently truncate the list comparison
        # and you will probably get extra spurious matches.
        lastEnd = len(self) - len(sub) + 1
        if end is None or end > lastEnd:
            end = lastEnd

        rng = xrange if xrange else range
        iz  = itertools.izip
        isl = itertools.islice

        for pos in rng(start, end):
            if all(a==b for a,b in iz(sub, isl(self, pos, end))):
                return pos

        # no match found
        return -1

然后这个例子看起来像

matchIt([1,2,1,2], FindableList([1,2,1,2,1,2,1,2,1,2])) -> 4

你的代码变成:

# Generate a list
randIn = lambda x: int(x*random.random())
myString =[randIn(2) for i in range(10000)]

pattern = [1,0,0]
result = matchIt(pattern, myString)

print("The sample had {0} matches and its size was {1}.\n".format(result, len(myString)))
于 2011-01-12T18:50:46.280 回答
0

这还没有准备好。

类似的问题,但主要关注这里的图形库和类似的问题,但在C#中,可能有用。

与这个问题相关的文件是./networkx/generators/degree_seq.py(997 行,关于生成具有给定度数序列的图),./networkx/algorithms/mixing.py (line 20, function degree_assortativity(G) about probability based graphs)并且还要注意它的源代码引用了 92 个引用,不确定是否要重新发明轮子。对于 igraph,请阅读文件中convert.c关于加权边缘的第 835 行。您可以在此处获取Networkx的源代码,并在此处获取igraph的源代码。请注意,前者在 BSD 许可下使用 Python 完成,而 igraph 在 GNU (GPL) 下使用 C 完成。

要开始使用 Networkx,这是一条关于从其 jUnits -文件创建加权图的有用行test_convert_scipy.py

def create_weighted(self, G): 
    g = cycle_graph(4)
    e = g.edges()
    source = [u for u,v in e]
    dest = [v for u,v in e]
    weight = [s+10 for s in source]
    ex = zip(source, dest, weight)
    G.add_weighted_edges_from(ex)
    return G

因此,要创建您的马尔可夫链,请在此处提供有关有向加权图的帮助,可能是这样的:

>>> DG=nx.DiGraph()
>>> DG.add_weighted_edges_from([(0,0,0.5),(1,1,0.5),(3,3,1),(0,1,0.5),(1,2,0.5),(2,3,0.5), (2,1,0.5)])

or perhaps there is some ready markov chain generation tool as there are for some other stochastic processes, more here. Cannot find an algorithm to analyze the graph with excepted value or do trials with different sets as in your example, perhaps there is none, and you must stick with other repliers' solutions.

于 2011-01-13T18:03:01.563 回答