2

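I have several NumPy arrays, each of shape (20000, 1). I want to randomly remove 25% of the rows from every array, so that the same rows are removed from each one. I found a rather tedious way of doing this: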

import numpy as np

a=np.array(range(1000))
b=np.array(np.random.rand(1000))
seed=np.random.randint(0,100000000)     #picking a random seed
np.random.seed(seed)      #Setting the same seed for each deletion
a[np.random.rand(*a.shape) < .25] = 0
np.random.seed(seed)
b[np.random.rand(*b.shape) < .25] = 0
a=a[a !=0]
b=b[b !=0]

There are several problems with this approach. For example, what if an array already contains zeros? Is there a better way to do this?
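
To make the zero problem concrete, here is a minimal sketch (not part of the original question). It assumes NumPy 1.17 or later for `np.random.default_rng`, uses one shared mask instead of reseeding, and forces row 0 to be kept purely for illustration. The names `drop`, `a_marked`, `b_marked`, `a_out` and `b_out` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed only so the sketch is repeatable

a = np.arange(1000)             # contains a genuine 0 at row 0
b = rng.random(1000) + 1.0      # values in [1, 2), so no zeros of its own

drop = rng.random(a.shape) < 0.25  # one shared "delete this row" mask
drop[0] = False                    # assumption for illustration: row 0 is meant to be kept

a_marked = a.copy()
b_marked = b.copy()
a_marked[drop] = 0                 # mark deleted rows with 0, as in the question
b_marked[drop] = 0

a_out = a_marked[a_marked != 0]    # also discards the genuine 0 at row 0
b_out = b_marked[b_marked != 0]

print(len(a_out), len(b_out))      # the two lengths differ by one
```

Because `a` loses a row that was never selected for deletion, the two results no longer line up row by row.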

4 Answers

6

Building on and extending Joel Cornett's solution:

import numpy as np

length = 20000
limit = int(0.75*length)
keep = np.random.permutation(length)[:limit]

newArray = oldArray[keep]
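
Note that indexing with a raw permutation also shuffles the surviving rows. If the original order matters, one option is to sort the kept indices before indexing. The following is a minimal sketch under that assumption, using `np.random.default_rng` (NumPy 1.17+); `first` and `second` are hypothetical stand-ins for the real arrays.

```python
import numpy as np

rng = np.random.default_rng()

length = 20000
limit = int(0.75 * length)

# Choose the rows to keep once, then sort so survivors stay in their original order.
keep = np.sort(rng.permutation(length)[:limit])

# Hypothetical stand-ins for the real (20000, 1) arrays.
first = np.arange(length).reshape(length, 1)
second = rng.random((length, 1))

# The same index array is applied to every array, so the same rows survive in each.
first_kept = first[keep]
second_kept = second[keep]
```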
answered 2012-08-04T19:37:24.557
1

Here is a non-numpy solution in very general terms:

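import random
import numpy as np

# random.sample needs an integer count, so truncate the product
to_keep = set(random.sample(range(total_rows), int(keep_ratio * total_rows)))

# do this for each array (a list is built first, because np.array cannot consume a generator):
new_array = np.array([item for index, item in enumerate(old_array) if index in to_keep])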
  • total_rows is the number of rows in each array (I think you said this was 20,000)

  • keep_ratio is the fraction of rows to keep, which according to you is 1 - 0.25 = 0.75

EDIT

You can also use numpy's compress() method.

import random
import numpy

to_keep = set(random.sample(range(total_rows), int(keep_ratio * total_rows)))
# compress() needs a concrete boolean sequence, not a generator
kompressor = [i in to_keep for i in range(total_rows)]

# axis=0 selects rows
new_array = numpy.compress(kompressor, old_array, axis=0)
answered 2012-08-04T19:20:28.870
1

Similar to Theodros's answer, but this preserves the original order of the elements:

import numpy as np

mask = np.ones(len(a), dtype=bool)
mask[:len(a) // 4] = False  # integer division, so the slice bound is an int in Python 3
np.random.shuffle(mask)

a = a[mask]
b = b[mask]
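
The mask approach extends naturally to any number of arrays. The sketch below wraps it in a hypothetical helper, `drop_same_rows`, which is not from the answer; it assumes all arrays have the same number of rows and that NumPy 1.17+ is available for `np.random.default_rng`.

```python
import numpy as np

def drop_same_rows(arrays, fraction=0.25, rng=None):
    """Remove the same randomly chosen rows from every array (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    n_rows = len(arrays[0])
    if any(len(arr) != n_rows for arr in arrays):
        raise ValueError("all arrays must have the same number of rows")
    mask = np.ones(n_rows, dtype=bool)
    mask[: int(fraction * n_rows)] = False  # mark the rows to drop
    rng.shuffle(mask)                       # scatter the dropped rows at random
    return [arr[mask] for arr in arrays]    # boolean indexing keeps the original order

a = np.arange(1000)
b = np.random.rand(1000)
a_small, b_small = drop_same_rows([a, b])
```

Since one mask is built and reused, no reseeding is needed and zeros in the data are not a problem.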
answered 2012-08-04T20:16:37.393
0

I don't know how well this works with numpy, but this is what I use in pure Python:

import random

total = len(a)
toss = int(0.25 * total)
keeping = [False] * toss + [True] * (total - toss)
random.shuffle(keeping)
a = [value for value, flag in zip(a, keeping) if flag]
b = [value for value, flag in zip(b, keeping) if flag]
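
As a possible variation, the standard library's `itertools.compress` performs the same filtering as the `zip` comprehensions. This is a minimal sketch with small hypothetical lists standing in for the real data:

```python
import random
from itertools import compress

# Hypothetical sample data; b is derived from a so the pairing is easy to check.
a = list(range(20))
b = [value * 10 for value in a]

total = len(a)
toss = int(0.25 * total)
keeping = [False] * toss + [True] * (total - toss)
random.shuffle(keeping)

# compress() yields only the items whose flag is true, preserving order.
a = list(compress(a, keeping))
b = list(compress(b, keeping))
```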
answered 2012-08-04T19:41:18.723