python - 如果它们是正值 k 分开，我如何从排名系统中删除值？

Question

假设我有以下代码：

def compute_ranks(graph, k):
    d = .8 #dampening factor
    loops = 10
    ranks = {}
    npages = len(graph)
    for page in graph:
        ranks[page] = 1.0 / npages
    for c in range(0, loops):
        newranks = {}
        for page in graph:
            newrank = (1-d) / npages
            for node in graph:
                if page in graph[node]:
                    newrank = newrank + d * (ranks[node]/len(graph[node]))
            newranks[page] = newrank
        ranks = newranks
     return ranks

好吧，现在假设我不想允许任何可以相互勾结的项目。如果我有一个项目字典

g = {'a': ['a', 'b', 'c'], 'b':['a'], 'c':['d'], 'd':['a']}

对于任何路径 A==>B，我不想允许来自 B==>A 的路径位于或低于我的数字 k 的距离。

例如，如果 k = 0，那么我不允许的唯一路径是 A==>A。

但是，如果 k = 2，那么我不会像以前那样允许链接 A==>A，但也不允许链接如 D==>A、B==>A 或 A==>C。

我知道这很令人困惑，我的大部分问题来自于不理解这意味着什么。

这是问题的记录：

# Question 2: Combatting Link Spam

# One of the problems with our page ranking system is pages can 
# collude with each other to improve their page ranks.  We consider 
# A->B a reciprocal link if there is a link path from B to A of length 
# equal to or below the collusion level, k.  The length of a link path 
# is the number of links which are taken to travel from one page to the 
# other.

# If k = 0, then a link from A to A is a reciprocal link for node A, 
# since no links needs to be taken to get from A to A.

# If k=1, B->A would count as a reciprocal link  if there is a link 
# A->B, which includes one link and so is of length 1. (it requires 
# two parties, A and B, to collude to increase each others page rank).

# If k=2, B->A would count as a reciprocal link for node A if there is
# a path A->C->B, for some page C, (link path of length 2),
# or a direct link A-> B (link path of length 1).

# Modify the compute_ranks code to 
#   - take an extra input k, which is a non-negative integer, and 
#   - exclude reciprocal links of length up to and including k from 
#     helping the page rank.

score 0 · Accepted Answer

一种可能的解决方案是引入一种检测合谋的递归方法。就像是：

def Colluding(p1,p2,itemDict,k):
    if p1 == p2:
        return True
    elif k == 0:
        return False
    else p2 in itemDict[p1]:
        return True
    for p in itemDict[p1]:
        if Colluding(p1,p,itemDict,k-1):
            return True
    return False

然后它说if item in itemDict[node]你会拥有if item in itemDict[node] and not Colluding(item,node,itemDict,k)或类似的东西。

如果在小深度（例如 A->B->A）处存在大量共谋链接，则进行深度优先搜索可能不是最佳选择，因为它们可能仅在经过几次全深度搜索后才能找到。在这种情况下，您也许可以找到另一种进行广度优先搜索的方法。此外，如果您有很多链接，则可能值得尝试转换为循环，因为如果您使用递归，Python 可能会出现堆栈溢出问题。递归算法是我首先想到的，因为使用递归遍历树似乎很自然。

注意：由于这是对家庭作业的帮助，因此我没有对其进行测试，并且某些比较可能不太正确，需要以某种方式进行调整。

score 0 · Accepted Answer

我想到了：

def Colluding(p1,p2,itemDict,k, time):
if k >= len(itemDict)/2:
    return True
if k == 0:
    if time == 0:
        if p1 == p2 and p1 in itemDict[p2]:
            return True
        else:
            return False
    if time == 1:
        if p1 in itemDict[p2]:
            return True
        else:
            return False
for p in itemDict[p1]:
    if Colluding(p1,p,itemDict,k-1,1) and p == p2:
        return True
    else:
        return False
return False

def compute_ranks(graph, k):
    d = 0.8 # damping factor
    numloops = 10
    ranks = {}
    npages = len(graph)
    for page in graph:
        ranks[page] = 1.0 / npages
    for i in range(0, numloops):
        newranks = {}
        for page in graph:
            newrank = (1 - d) / npages
            for node in graph:
                if page in graph[node] and not Colluding(page, node, graph, k,0):
                    newrank = newrank + d * (ranks[node]/len(graph[node]))
            newranks[page] = newrank
        ranks = newranks
    return ranks

感谢@Ken 为我指明了正确的方向。

score 0 · Accepted Answer

我也参加了这门课程。我提出了以下解决方案。基本上，我们需要的只是“所有互惠链接”。给定图形和级别 k，以下过程计算所有互惠链接。它返回一个元组列表，表示相互链接（node1，node2）。您需要做的就是检查（节点，页面）是否不在此列表中以计算排名。

def get_reciprocal_links(graph, k):
reci=[]
for node in g:
    for ele in g[node]:
        got = False
        work = []
        i=0
        if node == ele and i == k:
            reci.append((node,ele))
            break
        if ele in g:
            for e in g[ele]:
                work.append(e)

        while i < k:
            if node in work:
                reci.append((node,ele))
                got = True
                break
            else:
                i=i+1
                check=work.pop()
                if check in g:
                    for e in g[check]:
                        work.append(e)
        if got == True:
            continue
return reci

python - 如果它们是正值 k 分开，我如何从排名系统中删除值？

3 回答 3

Related

Reference