I'm working on a multi-part Python map/reduce job.
My first map prints to standard output so that the first reduce can pick it up.
The map's output looks like this:
frozenset([4]) 14
The reduce reads frozenset([4]) as the key and 14 as the value.
How can I extract just the [4] from the key to pass along in the reduce's output? The map looks like this:
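import sys

# Read every transaction from stdin as a list of ints.
data = sys.stdin.read()
dataset = []
for line in data.splitlines():
    dataset.append(map(int, line.strip().split(" ")))

# Collect each distinct item as a one-element candidate itemset.
c1 = []
for transaction in dataset:
    for item in transaction:
        if [item] not in c1:
            c1.append([item])
candidates = map(frozenset, c1)

# Count how many transactions contain each candidate.
sscnt = {}
for tid in dataset:
    for can in candidates:
        if can.issubset(tid):
            sscnt.setdefault(can, 0)
            sscnt[can] += 1

# Emit "key value" pairs; the key is printed as its repr, e.g. frozenset([4]).
for key, val in sscnt.items():
    print key, val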
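One way to sidestep the problem is to avoid printing the frozenset repr in the first place and emit the items as plain text instead. The following is only a sketch: the helper names `format_record` and `parse_record`, the comma-separated key, and the tab separator are all assumptions, not part of the original scripts. It is written so that it runs under both Python 2 and Python 3.

```python
def format_record(itemset, count):
    """Render one map output line as '<comma-separated items><TAB><count>'."""
    # Sort so the same itemset always produces the same key text.
    key_text = ",".join(str(item) for item in sorted(itemset))
    return "%s\t%d" % (key_text, count)


def parse_record(line):
    """Inverse of format_record: return (list of ints, count)."""
    key_text, count_text = line.rstrip("\n").split("\t")
    return [int(item) for item in key_text.split(",")], int(count_text)


if __name__ == "__main__":
    line = format_record(frozenset([4]), 14)
    print(line)
    print(parse_record(line))
```

With this format the map would print `format_record(key, val)` for each entry of sscnt, and the reduce would call `parse_record` on each input line instead of splitting on a space.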
The reduce looks like this:
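import sys

min_support = 12

# Read "key value" lines produced by the map.
sscnt = {}
for input_line in sys.stdin:
    input_line = input_line.strip()
    key, value = input_line.split(" ")
    # The key stays a string such as "frozenset([4])"; int(key) would raise ValueError.
    sscnt[key] = int(value)

# Keep only the keys whose support reaches the threshold.
retlist = []
for key in sscnt:
    support = sscnt[key]
    if support >= min_support:
        retlist.insert(0, key)
print retlist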
The reduce's output looks like this:
['frozenset([1])', 'frozenset([4])', 'frozenset([2])']
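Because the keys arrive as text such as 'frozenset([4])', one possible approach is to pull the integers back out of that text inside the reduce. The sketch below assumes the keys only ever contain integers; the helper names `items_from_key` and `parse_line` are hypothetical. It splits from the right because the repr of a multi-item key, for example frozenset([1, 2]), contains a space, which a plain split(" ") would break on. It runs under both Python 2 and Python 3.

```python
import re


def items_from_key(key_text):
    """Pull the integers out of text such as 'frozenset([4])'."""
    return [int(number) for number in re.findall(r"-?\d+", key_text)]


def parse_line(line):
    """Split one map output line into (items, count)."""
    # Split once from the right: the count is the last whitespace-separated
    # field, and everything before it is the key text.
    key_text, count_text = line.strip().rsplit(None, 1)
    return items_from_key(key_text), int(count_text)


if __name__ == "__main__":
    print(parse_line("frozenset([4]) 14"))
```

In the reduce, `parse_line(input_line)` could replace the split(" ") call, and the resulting list (or a tuple of it, if it is used as a dictionary key) could be stored instead of the raw string.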
The input data looks like this:
1 2 3 5 8
2 3 4 7
1 2 4 5 7
1 2 4 6 7
1 2 3 4 5
1 2 4 5 6
1 2 4 6 9
1 2 4 8
3 5 6 8
1 2 4 7
1 2 4 5
1 2 4 9
3 5 6 9
1 2 4 7
3 5 6
1 2 4 8
1 5 6
3 5 9
1 2 4 6
4 5 6 7