python - 从论文作者列表中计算鄂尔多斯数

Question

Erdős 数描述了一个人与数学家 Paul Erdős 之间的“合作距离”，以数学论文的作者身份衡量。要获得 Erdős 数，某人必须是与另一个拥有有限 Erdős 数的人的研究论文的合著者。Paul Erdős 的 Erdős 数为零。任何其他人的 Erdős 数是 k + 1，其中 k 是任何合著者中最小的 Erdős 数。维基百科。

给定作者（和论文）列表，我想为一组作者中的每一个生成一个 Erdős 编号。源数据如下（来自输入 .txt 文件）：

1(means only one scenario from this input, could have more than one from other entries)

4 3
Smith, M.N., Martin, G., Erdos, P.: Newtonian forms of prime factors
Erdos, P., Reisig, W.: Stuttering in petri nets
Smith, M.N., Chen, X.: First order derivates in structured programming
Jablonski, T., Hsueh, Z.: Selfstabilizing data structures
Smith, M.N.
Hsueh, Z.
Chen, X.

计算 Erdős 数的作者是：

Smith, M.N.
Hsueh, Z.
Chen, X.

我目前的计划是从每个条目中取出名称并形成名称列表（或列表列表）。但我不确定做到这一点的最佳方法。我应该使用什么？麻木？阅读线？

更新：输出应如下所示：

Scenario 1
Smith, M.N. 1
Hsueh, Z. infinity
Chen, X. 2

score 3 · Accepted Answer

为了更好地理解这个问题，请注意，这基本上只是未加权的单源最短路径问题，可以使用广度优先搜索来解决。您的问题中的图表定义为：

每个节点代表一个作者。
两个节点之间存在一条边，当且仅当有一篇论文由两个节点代表的两位作者共同撰写。

对于您的示例，图表如下：

雷西格
  |
  |
 鄂尔多斯——马丁
  | /
  | /
  | /
  | /
  | /
 史密斯——陈


雅布隆斯基——薛

因此，该算法最初将距离 0 分配给鄂尔多斯，将无穷大分配给其他人。然后当它迭代地访问邻居时，它分配的距离越来越大。所以下一次迭代将把距离（或者在这种情况下是 Erdos 数）分配给 Reisig、Martin 和 Smith。下一次也是最后一次迭代将为 Chen 分配距离 2。Jablonski 和 Hsueh 的距离将保持为无穷大。

使用邻接列表的图形表示：

e = 鄂尔多斯
r = Reisig
m = 马丁
s = 史密斯
c = 陈
j = 雅布隆斯基
h = 雪

e: 有效值
回覆
米：es
年代：欧共体
CS
Ĵ：小时
h: j

用Python解决它的代码：

import Queue

def calc_erdos(adj_lst):
    erdos_numbers = {}
    queue = Queue.Queue()
    queue.put(('Erdos, P.', 0))
    while not queue.empty():
        (cur_author, dist) = queue.get()
        if cur_author not in erdos_numbers:
            erdos_numbers[cur_author] = dist
        for author in adj_lst[cur_author]:
            if author not in erdos_numbers:
                queue.put((author, dist+1))
    return erdos_numbers

def main():
    num_scenario = int(raw_input())
    raw_input() # Read blank line
    for idx_scenario in range(1, num_scenario+1):
        [num_papers, num_queries] = [int(num) for num in raw_input().split()]
        adj_lst = {}
        for _ in range(num_papers):
            paper = raw_input()
            [authors, title] = paper.split(':')
            authors = [author.strip() for author in authors.split(',')]
            authors = [', '.join(first_last) for first_last in zip(authors[::2], authors[1::2])]
            # Build the adjacency list
            for author in authors:
                author_neighbors = adj_lst.get(author,set())
                for coauthor in authors:
                    if coauthor == author:
                        continue
                    author_neighbors.add(coauthor)
                adj_lst[author] = author_neighbors

        erdos_numbers = calc_erdos(adj_lst)

        print 'Scenario %d' % idx_scenario
        for _ in range(num_queries):
            author = raw_input()
            print '%s %s' % (author, erdos_numbers.get(author,'infinity'))

if __name__=='__main__':
    main()

输入：

1

4 3
Smith, MN, Martin, G., Erdos, P.：质因数的牛顿形式
Erdos, P., Reisig, W.：Petri 网中的口吃
Smith, MN, Chen, X.：结构化编程中的一阶导数
Jablonski, T., Hsueh, Z.：自稳定数据结构
明尼苏达州史密斯
薛，Z.
陈，X.

将输出：

方案 1
明尼苏达州史密斯 1
Hsueh, Z. 无穷大
陈 X. 2

注意：

更一般的问题可以描述为单源最短路径问题，最简单的解决方案是使用Djikstra 算法。

score 0 · Accepted Answer

我已经提交了对您的问题的编辑，以尝试澄清您希望实现的目标。基于此，我编写了以下代码来回答我认为您要问的问题：

f = ['Smith, M.N., Martin, G., Erdos, P.: Newtonian forms of prime factors',
'Erdos, P., Reisig, W.: Stuttering in petri nets',
'Smith, M.N., Chen, X.: First order derivates in structured programming',
'Jablonski, T., Hsueh, Z.: Selfstabilizing data structures']

author_en = {} # Dict to hold scores/author
coauthors = []
targets = ['Smith, M.N.','Hsueh, Z.','Chen, X.']

for line in f:
    # Split the line on the :
    authortext,papers = line.split(':')

    # Split on comma, then rejoin author (every 2)
    # See: http://stackoverflow.com/questions/9366650/string-splitting-after-every-other-comma-in-string-in-python
    authors = authortext.split(', ')
    authors = map(', '.join, zip(authors[::2], authors[1::2]))

    # Authors now contains a list of authors names    
    coauthors.append( authors )

    for author in authors:
        author_en[ author ] = None

author_en['Erdos, P.'] = 0 # Special case

在这一点上，我们现在有一个列表列表：每个列表包含来自给定出版物的共同作者和一个保存分数的字典。我们需要遍历每篇论文并为作者分配分数。我对 Erdos 分数的计算并不完全清楚，但您可能希望循环分数分配，直到没有变化 - 以解释影响早期分数的后来论文。

for ca in coauthors:
    minima = None
    for a in ca:
        if author_en[a] != None and ( author_en[a]<minima or minima is None ): # We have a score
            minima = author_en[a]

    if minima != None:
        for a in ca:
            if author_en[a] == None:
                author_en[a] = minima+1 # Lowest score of co-authors + 1

for author in targets:
    print "%s: %s" % ( author, author_en[author] )

score 0 · Accepted Answer

如果有人正在寻找 Java 实现：

import java.util.*;

public class P10044 {

public static void main(String[] args) {

    Scanner c = new Scanner(System.in);
    int cases = c.nextInt();

    for (int currentCase = 1; currentCase<=cases; currentCase++) {

        int p = c.nextInt();
        int n = c.nextInt();

        c.nextLine();

        HashMap<String, ArrayList<String>> graph = new HashMap<>();
        String[] testingNames = new String[n];
        ArrayList<String> authors = new ArrayList<>();

        HashMap<String, Integer> eNums = new HashMap<>();

        while (p-- > 0) {

            String[] paperAuthors = c.nextLine().split(":")[0].split("\\.,");
            for (int i = 0; i < paperAuthors.length; i++) {
                if (paperAuthors[i].charAt(paperAuthors[i].length() - 1) != '.')
                    paperAuthors[i] += '.';
                paperAuthors[i] = paperAuthors[i].trim();
            }

            for (String author : paperAuthors)
                if (!authors.contains(author))
                    authors.add(author);

            // create and update the graph
            for (String name : paperAuthors) {

                ArrayList<String> updatedValue;

                if (graph.keySet().contains(name))
                    updatedValue = graph.get(name);
                else
                    updatedValue = new ArrayList<>();

                for (String paperAuthor : paperAuthors)
                    if (!paperAuthor.equals(name))
                        updatedValue.add(paperAuthor);

                graph.put(name, updatedValue);

            }
        }


        //initialize the eNums map:
        for (String author : authors)
            if (!author.equals("Erdos, P."))
                eNums.put(author, Integer.MAX_VALUE);
            else
                eNums.put(author, 0);


        for (int i = 0; i < n; i++)
            testingNames[i] = c.nextLine();

        calculateEnums("Erdos, P.", graph, eNums);


        System.out.println("Scenario " + currentCase);
        for (String name : testingNames)
            if (!eNums.keySet().contains(name) || eNums.get(name) == Integer.MAX_VALUE)
                System.out.println(name + " infinity");
            else
                System.out.println(name + " " + eNums.get(name));

    }

}

private static void calculateEnums(String name, HashMap<String, ArrayList<String>> graph,
                                   HashMap<String, Integer> eNums) {

    ArrayList<String> notCalculated = new ArrayList<>();
    notCalculated.add(name);

    while (notCalculated.size() > 0) {
        String currentName = notCalculated.get(0);
        for (String connectedName : graph.get(currentName)) {
            if (eNums.get(connectedName) > eNums.get(currentName)) {
                eNums.put(connectedName, eNums.get(currentName) + 1);
                if(!notCalculated.contains(connectedName))
                    notCalculated.add(connectedName);
            }
        }
        notCalculated.remove(0);
    }

//        recursive implementation but will result in TLE

//        for(String connected: graph.get(name)) {
//            if (eNums.get(connected) > eNums.get(name)) {
//                eNums.put(connected, eNums.get(name) + 1);
//                calculateEnums(connected, graph, eNums);
//            }
//        }
}
}

python - 从论文作者列表中计算鄂尔多斯数

3 回答 3

Related

Reference