python - Grouping tuple columns so their sum is less than 1

Question

I need to split a list of items into groups so that, within each group, the sum of the negative base-2 logarithms of (1 - p), where p is each item's probability, is roughly 1.

So far I've come up with

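import itertools as iter  # the question refers to itertools as `iter` (this shadows the built-in iter)
import math

import numpy as np

probs = np.random.dirichlet(np.ones(50) * 100., size=1).tolist()
logs = [-1 * math.log(1 - x, 2) for x in probs[0]]
zipped = zip(range(0, 50), logs)

for key, igroup in iter.groupby(zipped, lambda x: x[1] < 1):
    print(list(igroup))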

That is, I draw a list of random probabilities, take the negative base-2 logarithm of 1 - p for each one, and then zip these values together with the item numbers.
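As a concrete check of the transformation just described, here is a minimal sketch on three hand-picked probabilities (illustrative values, not from the question). It uses math.log2, which computes the same thing as the question's math.log(x, 2), and numbers items from 1 to match the desired output further down:

```python
import math

# Hand-picked probabilities (illustrative only).
probs = [0.5, 0.75, 0.25]

# Negative base-2 log of the complement 1 - p, as in the question.
logs = [-math.log2(1 - p) for p in probs]

# Pair each value with a 1-based item number.
pairs = list(zip(range(1, len(logs) + 1), logs))
print([(i, round(v, 4)) for i, v in pairs])  # [(1, 1.0), (2, 2.0), (3, 0.415)]
```

Note that an item with p = 0.5 contributes exactly 1 on its own, so it would fill a group by itself.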

I then want to form groups by adding up the numbers in the second column of the tuples until the running sum reaches 1 (or slightly above it).
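One way to read this stopping rule (an editorial side note, not something stated in the question): because logarithms turn products into sums, a group's total reaching 1 is equivalent to the product of the complements 1 - p_i dropping to 1/2 or below. If the items were independent events with probabilities p_i (an assumption), that product would be the probability that none of the items in the group occurs.

$$
\sum_{i \in G} -\log_2(1 - p_i) \;=\; -\log_2 \prod_{i \in G} (1 - p_i) \;\ge\; 1
\quad\Longleftrightarrow\quad
\prod_{i \in G} (1 - p_i) \;\le\; \tfrac{1}{2}
$$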

I've tried:

for key, igroup in iter.groupby(zipped, lambda x: x[1]):
    for thing in igroup:
        print(list(iter.takewhile(lambda x: x < 1, iter.accumulate(igroup))))

and various other variations using itertools.accumulate, but I can't get it to work.

Does anyone have an idea what is going wrong? (I think I'm doing too much work.)

Ideally, the output should be something like

groups = [[1,2,3], [4,5], [6,7,8,9]]

and so on, i.e. these are the groups that satisfy this property.
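For reference, a small sketch of why the itertools attempts above don't produce running-sum groups. The (index, value) pairs are made up for illustration; like the question's values (roughly 0.03 each), they are all below 1:

```python
import itertools

zipped = [(0, 0.5), (1, 0.6), (2, 0.3)]

# groupby only groups *consecutive items with equal keys*. With key x[1] < 1
# every item has key True, so everything lands in one group.
groups = [list(g) for _, g in itertools.groupby(zipped, lambda x: x[1] < 1)]
print(groups)  # [[(0, 0.5), (1, 0.6), (2, 0.3)]]

# With key x[1], every float is distinct, so each group holds a single item.
print([len(list(g)) for _, g in itertools.groupby(zipped, lambda x: x[1])])  # [1, 1, 1]

# accumulate adds the tuples themselves, which concatenates them
# instead of summing the second column.
acc = list(itertools.accumulate(zipped))
print(acc)  # [(0, 0.5), (0, 0.5, 1, 0.6), (0, 0.5, 1, 0.6, 2, 0.3)]

# takewhile(lambda x: x < 1, ...) would then compare a tuple with an int,
# which raises TypeError in Python 3.
try:
    (0, 0.5) < 1
except TypeError as exc:
    print("TypeError:", exc)
```

In short, groupby partitions by a per-item key, while the desired grouping depends on a running total that restarts after each group.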

score 1 · Accepted Answer

Using numpy.ufunc.accumulate and a simple loop:

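import numpy as np

def group(xs, start=1):
    """Yield lists of consecutive 1-based indices whose values sum to at least 1.

    The element that brings a group's sum to 1 or more closes that group;
    leftover elements at the end form a final, smaller group.
    """
    last_sum = 0
    stop = start - 1  # keeps the final check valid for empty input
    for stop, acc in enumerate(np.add.accumulate(xs), start):
        if acc - last_sum >= 1:
            yield list(range(start, stop + 1))  # include element `stop`
            last_sum = acc
            start = stop + 1
    if start <= stop:
        yield list(range(start, stop + 1))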

probs = np.random.dirichlet(np.ones(50) * 100, size=1)
logs = -np.log2(1 - probs[0])
print(list(group(logs)))

Sample output:

[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35],
 [36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]]
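Two groups is roughly what these parameters should give. A back-of-the-envelope check, assuming each Dirichlet component sits near its mean of 100/5000 = 1/50 (with a concentration of 100 per component the spread around that mean is small):

```python
import math

n = 50
p = 1 / n                       # mean of each Dirichlet(100, ..., 100) component
per_item = -math.log2(1 - p)    # value contributed by a typical item
total = n * per_item            # expected sum over all 50 items

# Roughly how many items it takes for a group to reach a sum of 1.
items_per_group = math.ceil(1 / per_item)
print(round(per_item, 4), round(total, 3), items_per_group)  # 0.0291 1.457 35
```

A total of about 1.46 means one full group plus a leftover tail, and about 35 items per full group, which is consistent with the sample output above (actual runs vary because the probabilities are random).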

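Alternative

Using numpy.searchsorted (this splits wherever the overall running total crosses 1, 2, 3, ..., so boundaries can differ slightly from the loop above, which restarts the count after each group):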

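def group(xs, idx_start=1):
    xs = np.add.accumulate(xs)  # running totals
    # Position just past the element where the total first reaches 1, 2, ...;
    # len(xs) closes any leftover tail that never reaches the next threshold.
    thresholds = np.arange(1, np.ceil(xs[-1]))
    idxs = (np.searchsorted(xs, thresholds, side='left') + 1).tolist() + [len(xs)]
    return [list(range(i + idx_start, j + idx_start))
            for i, j in zip([0] + idxs, idxs) if i < j]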
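Since the question specifically tried itertools.accumulate, here is a pure-Python sketch (not part of the original answer) that uses accumulate with a resetting function: the running total restarts once it has reached the threshold, and the element that reaches it closes its group. The helper name group_indices and the threshold parameter are illustrative, not from the post:

```python
from itertools import accumulate

def group_indices(values, threshold=1.0, start=1):
    # Running sum that restarts after it has reached the threshold.
    running = accumulate(values, lambda acc, v: v if acc >= threshold else acc + v)
    groups, current = [], []
    for idx, total in enumerate(running, start):
        current.append(idx)
        if total >= threshold:   # this element closes the current group
            groups.append(current)
            current = []
    if current:                  # leftover tail that never reached the threshold
        groups.append(current)
    return groups

print(group_indices([0.5, 0.6, 0.3, 0.8, 0.4]))  # [[1, 2], [3, 4], [5]]
```

It accepts any iterable of numbers, including the logs array from the answer, and needs no NumPy.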
