python - How do I use mincemeat to distribute the tasks defined in example.py to two client computers?

Question

I downloaded the mincemeat.py example from https://github.com/michaelfairley/mincemeatpy/zipball/v0.1.2

example.py is as follows:

#!/usr/bin/env python
import mincemeat

data = ["Humpty Dumpty sat on a wall",
        "Humpty Dumpty had a great fall",
        "All the King's horses and all the King's men",
        "Couldn't put Humpty together again",
       ]

datasource = dict(enumerate(data))

def mapfn(k, v):
    for w in v.split():
        yield w, 1

def reducefn(k, vs):
    result = sum(vs)
    return result

s = mincemeat.Server()
s.datasource = datasource
s.mapfn = mapfn
s.reducefn = reducefn

results = s.run_server(password="changeme")
print(results)

It is a word-count program.
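To see what the server should return before any client is involved, here is a minimal single-process sketch that applies example.py's mapfn and reducefn the way a map-reduce run does: map every datasource item, group the emitted (word, 1) pairs by word, then reduce each group. run_locally is a hypothetical helper for illustration; it is not part of mincemeat.

```python
from collections import defaultdict

data = ["Humpty Dumpty sat on a wall",
        "Humpty Dumpty had a great fall",
        "All the King's horses and all the King's men",
        "Couldn't put Humpty together again"]
datasource = dict(enumerate(data))

def mapfn(k, v):
    for w in v.split():
        yield w, 1

def reducefn(k, vs):
    return sum(vs)

def run_locally(datasource, mapfn, reducefn):
    """Hypothetical single-process stand-in for the server: map, group by key, reduce."""
    grouped = defaultdict(list)
    for k, v in datasource.items():
        for word, count in mapfn(k, v):
            grouped[word].append(count)   # group intermediate values by word
    return dict((word, reducefn(word, counts)) for word, counts in grouped.items())

results = run_locally(datasource, mapfn, reducefn)
print(results["Humpty"])  # 3
```

Because mapfn splits only on whitespace, the count is case- and punctuation-sensitive: "All" and "all" are counted as different words.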

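I connected two computers over a LAN. I used one computer as the server and ran example.py on it; on the second computer, acting as the client, I ran mincemeat.py with the following command line: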

python mincemeat.py -p changeme server-IP

It works fine.

Now I have connected 3 computers on a LAN through a router. I want one machine to act as the server and run example.py, and the other two machines to act as clients.

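I want to distribute the work across my two client machines. What is the procedure for distributing the map and reduce tasks to the two computers? How do I assign the tasks defined in example.py to the two client computers, each of which has its own IP address?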

score 2 · Accepted Answer

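The default example contains fewer than 50 words, so by the time you switch windows to start the second client, the first client has already finished processing the text. If you run the same program on a large text file instead, you will have time to add a second client. The following should work. In this example I used the plain-text version of the novel Ulysses (~1.5 MB) from Project Gutenberg.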
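Nothing in the setup assigns tasks to particular IP addresses: you simply run the same client command on each client machine, and, as far as the design suggests, each connected client asks the server for the next task whenever it is idle. The sketch below models that pull-based behaviour in one process, using threads as hypothetical "clients" and a queue holding one map task per datasource key. It is an assumption about how the scheduling works, written for illustration, not mincemeat's own code.

```python
import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

def mapfn(k, v):
    for w in v.split():
        yield w, 1

def simulate_clients(datasource, n_clients=2):
    """Hypothetical model of pull-based scheduling: an idle client takes the next task."""
    tasks = queue.Queue()
    for item in datasource.items():
        tasks.put(item)              # one map task per datasource key
    done_by = {}                     # datasource key -> (client name, pairs emitted)
    lock = threading.Lock()

    def client(name):
        while True:
            try:
                k, v = tasks.get_nowait()
            except queue.Empty:
                return               # no work left for this client
            pairs = list(mapfn(k, v))    # the actual map work
            with lock:
                done_by[k] = (name, len(pairs))

    threads = [threading.Thread(target=client, args=("client-%d" % i,))
               for i in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done_by

data = ["Humpty Dumpty sat on a wall",
        "Humpty Dumpty had a great fall",
        "All the King's horses and all the King's men",
        "Couldn't put Humpty together again"]
done_by = simulate_clients(dict(enumerate(data)))
print(done_by)  # which client handled each line varies from run to run
```

With only four tiny tasks, the first "client" often drains the queue before the second one starts, which is the same effect described above for real machines.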

On my machine (Intel Xeon @ 3.10 GHz), the run takes less than 30 seconds with 2 clients. So use a larger file or a list of files, or start the second client quickly.

#!/usr/bin/env python
import mincemeat

def file_contents(file_name):
    f = open(file_name)
    try:
        return f.read()
    finally:
        f.close()

novel_name = 'Ulysses.txt'

# The data source can be any dictionary-like object
datasource = {novel_name:file_contents(novel_name)}

def mapfn(k, v):
    for w in v.split():
        yield w, 1

def reducefn(k, vs):
    result = sum(vs)
    return result

s = mincemeat.Server()
s.datasource = datasource
s.mapfn = mapfn
s.reducefn = reducefn

results = s.run_server(password="changeme")
print(results)
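Note that this datasource has exactly one key. Assuming mincemeat creates one map task per datasource key, the whole novel is mapped by a single client, and only the reduce phase (one task per distinct word) can be shared between machines. A hypothetical chunk_text helper that splits the text into blocks of whole lines gives the server several map tasks to hand out; it is a sketch for illustration, not part of mincemeat.

```python
def chunk_text(text, lines_per_chunk=1000):
    """Hypothetical helper: split a large text into numbered chunks of whole lines."""
    lines = text.splitlines()
    return dict((start // lines_per_chunk, "\n".join(lines[start:start + lines_per_chunk]))
                for start in range(0, len(lines), lines_per_chunk))

# Stand-in for file_contents(novel_name): 2,500 short lines of text.
sample = "\n".join("line number %d" % i for i in range(2500))
datasource = chunk_text(sample)
print(sorted(datasource))  # [0, 1, 2]
```

In the script above this would become datasource = chunk_text(file_contents(novel_name)). Since mapfn ignores its key and splitting on line boundaries never cuts a word in half, the final word counts stay the same.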

For a directory of files, use the following example, which reads every .txt file in the folder textfiles.

#!/usr/bin/env python
import mincemeat
import glob

all_files = glob.glob('textfiles/*.txt')

def file_contents(file_name):
    f = open(file_name)
    try:
        return f.read()
    finally:
        f.close()

# The data source can be any dictionary-like object
datasource = dict((file_name, file_contents(file_name))
                  for file_name in all_files)

def mapfn(k, v):
    for w in v.split():
        yield w, 1

def reducefn(k, vs):
    result = sum(vs)
    return result

s = mincemeat.Server()
s.datasource = datasource
s.mapfn = mapfn
s.reducefn = reducefn

results = s.run_server(password="changeme")
print(results)
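The comment in the code says the datasource can be any dictionary-like object. With many large files, building a plain dict reads every file into the server's memory before any client connects. A lazy alternative is sketched below, assuming the server only iterates over the datasource's keys and looks values up by key (which is what "dictionary-like" suggests; check this against your mincemeat version). LazyFileSource is a hypothetical class, not part of mincemeat, and it works on both Python 2 and 3.

```python
import glob
try:
    from collections.abc import Mapping   # Python 3.3+
except ImportError:
    from collections import Mapping       # Python 2

class LazyFileSource(Mapping):
    """Hypothetical dict-like datasource: file name -> contents, read only on lookup."""

    def __init__(self, pattern):
        self._names = sorted(glob.glob(pattern))   # resolve the file list once
        self._name_set = set(self._names)

    def __getitem__(self, file_name):
        if file_name not in self._name_set:
            raise KeyError(file_name)
        with open(file_name) as f:                 # the file is read at lookup time
            return f.read()

    def __iter__(self):
        return iter(self._names)

    def __len__(self):
        return len(self._names)

datasource = LazyFileSource('textfiles/*.txt')
```

Each file still becomes one map task, and its full contents are still sent to whichever client handles it; the only change is that the server no longer holds every file in memory up front.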
